Intelligent Watermarking Techniques
Series on Innovative Intelligence
Editor: L. C. Jain (University of South Australia)
Published: Vol. 1
Virtual Environments for Teaching and Learning (eds. L. C. Jain, R. J. Howlett, N. S. Ichalkaranje & G. Tonfoni)
Vol. 2
Advances in Intelligent Systems for Defence (eds. L. C. Jain, N. S. Ichalkaranje & G. Tonfoni)
Vol. 3
Internet-Based Intelligent Information Processing Systems (eds. R. J. Howlett, N. S. Ichalkaranje, L. C. Jain & G. Tonfoni)
Vol. 4
Neural Networks for Intelligent Signal Processing (A. Zaknich)
Vol. 5
Complex Valued Neural Networks: Theories and Applications (ed. A. Hirose)
Vol. 6
Intelligent and Other Computational Techniques in Insurance (eds. A. F. Shapiro & L. C. Jain)
Forthcoming Titles:
Biology and Logic-Based Applied Machine Intelligence: Theory and Applications (A. Konar & L. C. Jain)
Levels of Evolutionary Adaptation for Fuzzy Agents (G. Resconi & L. C. Jain)
Intelligent Watermarking Techniques
Editors
Jeng-Shyang Pan National Kaohsiung University of Applied Sciences, Taiwan
Hsiang-Cheh Huang National Chiao Tung University, Taiwan
Lakhmi C. Jain University of South Australia
World Scientific
New Jersey · London · Singapore · Shanghai · Hong Kong · Taipei · Bangalore
Published by
World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: Suite 202, 1060 Main Street, River Edge, NJ 07661
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.
INTELLIGENT WATERMARKING TECHNIQUES (with CD-ROM)
Copyright © 2004 by World Scientific Publishing Co. Pte. Ltd.
All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN 981-238-955-5
Printed in Singapore by World Scientific Printers (S) Pte Ltd
PREFACE
Watermarking techniques involve concealing information within text or images and transmitting this information to the receiver with minimum distortion. This is a very new area of research. These techniques will have a significant effect on defence, business, copyright protection, and other fields where information needs to be preserved at all cost from attackers. This book presents the recent advances in the theory and implementation of watermarking techniques. It brings together for the first time the successful applications of intelligent paradigms, including comparisons with conventional methods, in many areas, as listed in the table of contents. We believe that this book will be of great value to undergraduate and postgraduate students of all disciplines, including engineering and computer science. It is targeted at researchers, scientists, and practising engineers who wish to improve their productivity by developing successful information systems. We are grateful to the authors for their valuable contributions. We express our appreciation to the reviewers for their time and expert advice. Our thanks are due to the editorial staff of World Scientific Publishing Company for their assistance in the preparation of the manuscript.
Jeng-Shyang Pan
Hsiang-Cheh Huang
Lakhmi Jain
CONTENTS

Preface

Part I. Fundamentals of Watermarking and Intelligent Techniques

Chapter 1. An Introduction to Watermarking Techniques (Hsiang-Cheh Huang, Hsueh-Ming Hang, and Jeng-Shyang Pan)
Chapter 2. Neuro-Fuzzy Learning Theory (Yan Shi, Masaharu Mizumoto, and Peng Shi)
Chapter 3. Evolutionary Algorithms (Wei-Po Lee and Chao-Hsing Hsu)
Chapter 4. A Tutorial on Meta-Heuristics for Optimization (Shu-Chuan Chu, Chin-Shiuh Shieh, and John F. Roddick)

Part II. Watermarking Techniques

Chapter 5. Watermarking Based on Spatial Domain (Hsiang-Cheh Huang, Jeng-Shyang Pan, and Hsueh-Ming Hang)
Chapter 6. Watermarking Based on Transform Domain (Hsiang-Cheh Huang, Jeng-Shyang Pan, and Hsueh-Ming Hang)
Chapter 7. Watermarking Based on Vector Quantization (Chin-Shiuh Shieh, Hsiang-Cheh Huang, Zhe-Ming Lu, and Jeng-Shyang Pan)
Chapter 8. Audio Watermarking Techniques (Hyoung Joong Kim, Yong Hee Choi, Jongwon Seok, and Jinwoo Hong)
Chapter 9. Video Watermarking: Requirements, Problems and Solutions (Christoph Busch and Xiamu Niu)
Chapter 10. Digital Video Watermarking: Techniques, Technology and Trends (Deepa Kundur, Karen Su, and Dimitrios Hatzinakos)
Chapter 11. Benchmarking of Watermarking Algorithms (Nikolaos Nikolaidis and Ioannis Pitas)

Part III. Advanced Watermarking Techniques

Chapter 12. Genetic Watermarking on Transform Domain (Hsiang-Cheh Huang, Jeng-Shyang Pan, and Feng-Hsing Wang)
Chapter 13. Genetic Watermarking on Spatial Domain (Feng-Hsing Wang, Lakhmi C. Jain, and Jeng-Shyang Pan)
Chapter 14. Robust Image Watermarking Systems Using Neural Networks (Chin-Cheng Chang and Iuon-Chang Lin)
Chapter 15. A Perceptually Tuned Watermarking Scheme for Digital Images Using Support Vector Machines (Chin-Cheng Chang and Iuon-Chang Lin)
Chapter 16. Recent Development of Visual Cryptography (Kuo-Feng Hwang and Chin-Cheng Chang)
Chapter 17. Watermark Embedding System Based on Visual Cryptography (Feng-Hsing Wang, Lakhmi C. Jain, and Jeng-Shyang Pan)
Chapter 18. Spread Spectrum Video Data Hiding, Interleaving and Synchronization (Yun Q. Shi, Jiwu Huang, and Heung-Kyu Lee)

Part IV. Practical Issues in Watermarking and Copyright Protection

Chapter 19. Video Watermarking: Approaches, Applications, and Perspectives (Alessandro Piva, Roberto Caldelli, and Mauro Barni)
Chapter 20. Quantization Index Modulation Techniques: Theoretical Perspectives and A Recent Practical Application (Brian Chen)
Chapter 21. Digital Watermarking for Digital Rights Management (Sai Ho Kwok)
Chapter 22. Watermark for Industrial Application (Zheng Liu and Akira Inoue)

Appendix
Appendix A. VQ-Based Scheme I
Appendix B. VQ-Based Scheme II
Appendix C. Spatial-Based Scheme
Appendix D. GA Training Program for Spatial-Based Scheme
Appendix E. Visual Cryptography
Appendix F. Modified Visual Cryptography
Appendix G. VC-Based Scheme
Appendix H. Gain/Shape VQ-Based Watermarking System

Authors' Contact Information

Index
PART I
Fundamentals of Watermarking and Intelligent Techniques
Chapter 1
An Introduction to Watermarking Techniques
Hsiang-Cheh Huang, Hsueh-Ming Hang, and Jeng-Shyang Pan

A typical application of digital watermarking is to identify the ownership of a multimedia object or content by embedding the owner mark, or the watermark, into it. Most multimedia applications require imperceptible and robust watermarks. The purpose of this chapter is to provide an overview of current watermarking techniques together with some useful Internet resources. We will describe several representative concepts and examples in image watermarking. The ideas for video and audio watermarking can be derived from image watermarking, and there are several chapters in this book specializing in these applications. In addition, we will cover the attacks that tamper with watermarks and the theoretical aspects of watermarking, with some benchmarks. We hope that the readers, after going through this chapter, will learn the fundamentals of watermarking and its current status, and will be ready to explore this subject further with the aid of the rest of this book.
1 Introduction
Owing to the popularity of the Internet, the demand for securely embedding owner identity and other information into multimedia has become very urgent. The protection and enforcement of intellectual property rights for digital multimedia has become an important issue. Modern digital watermarking technology has a rather short history, dating from 1993 (Tirkel et al. 1993, Tirkel and Hall 2001). It is reported that there were only 21 publications in the public domain up to 1995 (Petitcolas et al. 1999). However, the field has been blooming since then: the number of publications in 1998 was 103 (Petitcolas et al. 1999), and at the time of editing this book in January 2003, the number is more than 1200. This trend points out that watermarking is a growing research field, and we anticipate its continuing progress in both academic research and industrial applications in the next few years. The purpose of this chapter is to provide an overview of current watermarking techniques together with some useful Internet resources. There are thousands of related documents available in technical journals, conference proceedings, and web pages, and several comprehensive survey papers on digital watermarking were published recently (Kutter and Jordan 2000, Petitcolas et al. 1999, Pitas 1998, Podilchuk and Delp 2001, Provos and Honeyman 2003, Swanson et al. 1998, Wolfgang et al. 1999); there is no need to duplicate all their effort here. As an introduction for readers new to this subject, we will describe several representative concepts and examples in image watermarking. There are many chapters in this book specializing in watermarking for other multimedia formats, and readers are suggested to refer to those chapters for more details. We will also describe some common attacks, together with several popular benchmarks used in robust watermarking research, cover some theoretical aspects of watermarking, and finally conclude with some remarks and references. We hope that the readers, after going through this chapter, will learn the basics of watermarking and its current status, and will be ready to explore this subject further.
2 Some Terminology
Watermarking has a more formal name, steganography. This word has its origin in Greek: "stegano" means "covered" and "graphos" means "to write" (Petitcolas et al. 1999, Swanson et al. 1998). Together, steganography literally means "covered writing." In the literature, the term steganography is not yet popular; most people use (digital) watermarking, data embedding, and information hiding. Among them, watermarking is the most recognized by the general public and is, by far, the term most used for commercial products. Some articles make a distinction among the various names. For instance, the authors in (De Vleeschouwer et al. 2002) propose the classification for watermarking depicted in Figure 1 according to specific usages. In this chapter, unless specifically mentioned, we will view steganography, watermarking, and data/information hiding as equivalent and use them interchangeably.

[Figure 1. The classification of watermarking as depicted in (De Vleeschouwer et al. 2002): information hiding falls under steganography, and watermarking divides into robust watermarking and fragile/semi-fragile watermarking.]
The basic concept of hiding a message in a document or a picture, such that the message is not detected or recognized by a third person, is old. One can go back in history and find ancient stories about steganographic techniques used a thousand years ago (Braudaway et al. 1996, Petitcolas et al. 1999, Swanson et al. 1998).
However, modern digital watermarking techniques were developed quite recently; the techniques cited in this chapter were designed in the past seven years or so. More specifically, digital watermarking, also called watermark insertion or watermark embedding, represents the scheme that inserts the hidden information into multimedia data, also called the original media or the cover-media. The hidden information, called the watermark, may be a serial number or random number sequence, copyright messages, ownership identifiers, control signals, transaction dates, creators of the work, text, a bi-level or grey-level image, or another digital format. After inserting or embedding the watermark by specific algorithms, the original media is slightly modified, and the modified media is called the watermarked media. There might be little or no perceptible difference between the original media content and the watermarked one. The main application of digital watermarking is copyright protection. After embedding the watermark, the watermarked media are sent to the receiver via the Internet or another transmission channel, for instance, a mobile channel. Whenever the copyright of the digital media is in question, the embedded information is decoded to identify the copyright owner. The decoding process can be twofold: one is the inverse operation of the embedding process to extract the embedded watermark, called watermark extraction; the other is to decide the existence of the embedded watermark, called watermark detection. The high-level diagram of a generic watermarking scheme is shown in Figure 2. Typically, in the watermark insertion process shown in Figure 2(a), we have the original media (X), an image for example, and the encoder inserts a watermark (W) into it. The result is the marked media X', for example, a marked image. In this encoding (embedding, inserting) process, a key (K), for instance, a random number sequence, may be involved to produce a more secure watermark.
The dashed lines in Figure 2 indicate elements that may be needed only in a particular design. At the other end, the watermark is either extracted by a decoder, illustrated in Figure 2(b), or detected by a detector, illustrated in Figure 2(c). In the former process, in addition to the test media (X''), the original media and/or a key may be needed. In the latter, the inserted watermark (W) is often necessary to check the mark identity. Different terms have been used in the literature. It is reported that in a panel session of the First Information Hiding Workshop (Pfitzmann 1996), the following terms were agreed: the original media is called the cover-media, the watermark is called the embedded message, and the marked media is the stego-media. However, these terms are not yet very popular, and thus in this chapter we still use the conventional terms, original and marked media, in most places. We also use mathematical notation to express the aforementioned processes in Figure 2. We can view the encoding or embedding process as a function or mapping that maps the inputs X, W and/or K to the output X'; that is,

X' = E(X, W, [K]),    (1)

where E(·) denotes the embedding process, and [K] indicates that K may not be included. Similarly, the decoding or extraction process, D(·), can be denoted by

W' = D(X'', [X], [K]),    (2)

and the detection process, d(·), is

{Yes or No} = d(X'', [X], W, [K]).    (3)

Again, [·] means that the element in the bracket may be optional.
[Figure 2. (a) Watermark insertion: the watermark encoder takes the original media X, the watermark W, and optionally a key K, and produces the marked media X'. (b) Watermark extraction: the watermark decoder takes the test media X'', optionally the original media X and the key K, and outputs the extracted mark W'. (c) Watermark detection: the watermark detector takes the test media X'', the watermark W, and optionally X and K, and outputs Yes or No.]
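To make equations (1)-(3) concrete, here is a minimal runnable sketch of the three interfaces. It is our own illustrative toy, not an algorithm from this book: we assume an additive, key-seeded spread-spectrum rule for E(·), a non-oblivious correlation rule for D(·), and a bit-agreement threshold for d(·); the names, strength ALPHA, and threshold are all assumptions for the example.

```python
import numpy as np

ALPHA = 2.0  # embedding strength (an assumed parameter)

def embed(X, W, key=None):
    """E(X, W, [K]) -> X'. Toy additive rule: each watermark bit
    flips the sign of a key-seeded pseudo-random pattern that is
    added to one block of the host signal."""
    rng = np.random.default_rng(key)
    Xp = X.astype(float).ravel()
    blocks = np.array_split(Xp, len(W))
    for blk, w in zip(blocks, W):
        pattern = rng.standard_normal(blk.shape)
        blk += ALPHA * (1.0 if w else -1.0) * pattern
    return np.concatenate(blocks).reshape(X.shape)

def extract(X2, X, n_bits, key=None):
    """D(X'', [X], [K]) -> W'. Non-oblivious: needs the original X;
    each bit is read off by correlating the difference signal with
    the same key-seeded pattern used at the encoder."""
    rng = np.random.default_rng(key)
    diff = (X2.astype(float) - X.astype(float)).ravel()
    bits = []
    for blk in np.array_split(diff, n_bits):
        pattern = rng.standard_normal(blk.shape)
        bits.append(1 if float(blk @ pattern) > 0 else 0)
    return np.array(bits)

def detect(X2, X, W, key=None, threshold=0.8):
    """d(X'', [X], W, [K]) -> Yes/No: declare the mark present when
    enough extracted bits agree with the inserted W (assumed rule)."""
    W_hat = extract(X2, X, len(W), key)
    return bool(np.mean(W_hat == np.asarray(W)) >= threshold)
```

An oblivious (public) scheme, discussed in Section 7.1, would simply drop the X argument from the decoder and detector signatures.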
3 Applications
In general, digital watermarking has two types of applications: (1) conveying ownership information, and (2) verifying object contents. The purpose of the first type of application is to identify the ownership of an object. Hence, one popular way is to embed perceptible watermarks into the media, for instance, embedding the company logo into one corner of a video clip. In order to prevent someone from removing the mark embedded in the object, robust watermarks are often required for this type of application. In contrast, the goal of the second type of application is to ensure the integrity or originality of the marked material. Hence, a fragile watermark is usually used to protect the data; ideally, a single-bit alteration of the marked material can be detected. Combining the advantages of both robust and fragile watermarks leads to the newly developed schemes called semi-fragile watermarks. A semi-fragile watermark, unlike the fragile watermark, can survive manipulations that attempt to remove the watermark up to some determined level, measured by the distortions caused by the manipulations. Consequently, semi-fragile watermarks attain the characteristics of robust and fragile watermarks within specific distortion levels; thus, algorithm designers need to be sure that the watermark cannot survive beyond that level. The concepts for watermarking digital images extend to audio and video watermarking applications (Hartung and Girod 1998, Hartung and Ramme 2000, Lemma et al. 2003, Yeo and Kim 2003). There is another type of application, called collaborative watermarking, described in the literature (Mintzer and Braudaway 1998, Petitjean et al. 2002). It conveys object-specific information to a community of recipients. The automatic royalty accounting system for broadcast audio is described in (Mintzer and Braudaway 1998) as an example of collaborative watermarking.
The audio signal is marked so that a monitoring device is able to extract the identity of each passage and automatically account for the royalties owed. Although this example is similar to the ownership applications, its requirements are different: the broadcast stations would not intentionally remove the marks, because the audio signal is broadcast over the air and can be recorded and checked manually. Hence, the key requirements for this example are that the mark should be inaudible and should survive the broadcasting distortion. As stated in the previous paragraphs, digital watermarking started from research on the design and effectiveness of the algorithms, and there is a trend to turn well-designed algorithms into practical products. For real implementations, the authors in (Cheung and Chiu 2003) proposed a watermark-based protocol for document management in large enterprises. The authors in (Garimella et al. 2003) described a VLSI implementation of watermarking techniques, one of the pioneering implementations for watermarking applications. Also, the authors in (Mathai et al. 2003) studied video watermarking algorithms through hardware implementations of a well-known algorithm called Just Another Watermarking System (JAWS) (Kalker et al. 1999); details about JAWS will be depicted in Section 3 of Chapter 10. In this category, the implementation cost, or hardware complexity, needs to be considered in addition to the effectiveness or robustness of the watermarking algorithms. Different applications pose different requirements on the watermark design. A universal watermark that can withstand all attacks and at the same time satisfy all the other desirable requirements does not seem to exist (Tirkel and Hall 2001). However, developing a watermark for a specific application should be feasible.
4 Requirements
There are several requirements in designing effective watermarking algorithms; we point out these requirements with the aid of existing schemes. The invisible/inaudible and robust watermarks may be the most difficult challenge among all types of watermarks. We present some practical requirements arising from industrial needs in the following examples. In 1997, the International Federation of the Phonographic Industry (IFPI) (IFPI Website 2003) issued a Request for Proposal for embedding signals (watermarks) into audio. Later on, in 2000 and 2001, the Japan Society for Rights of Authors, Composers and Publishers (JASRAC) completed two projects, called STEP2000 and STEP2001, with the goal of "technical evaluation for promoting practical utilization of digital watermark" (IBM Tokyo Research Laboratory 2003). The requirements can be summarized as follows, and can also serve as a reference for the requirements of watermarking algorithms.

1. Robustness. The embedded information is supposed to be extractable even after the following processing:
- D/A and A/D conversion;
- downmixing: stereo (2 channels) → mono;
- downsampling: 44.1 kHz → 16 kHz;
- dynamic range compression: 16 bits → 8 bits;
- pitch shifting: +10% and -10%;
- random stretching: +10% and -10%;
- lossy audio compression:
  - MPEG-1 Layer 3 (MP3) (128 kbps, 96 kbps, 64 kbps (mono)), where MPEG abbreviates Motion Picture Experts Group (MPEG Home Page 2003);
  - MPEG-2 AAC (128 kbps, 96 kbps);
  - Adaptive Transform Acoustic Coding for MiniDisc (ATRAC) (Version 4.5) (Tsutsui et al. 1992);
  - ATRAC3 (132 kbps, 105 kbps);
  - RealAudio (128 kbps, 64 kbps);
  - Windows Media Audio (128 kbps, 64 kbps);
- broadcasting:
  - FM (FM multiplex broadcast, terrestrial hertzian TV broadcast);
  - AM (AM broadcast);
  - PCM (satellite TV broadcast: communication satellite, broadcasting satellite);
- additive noise: white noise (signal-to-noise ratio, S/N: 40 dB).

2. Transparency. Four individuals each from recording engineers, mastering engineers, synthesizer manipulators, and audio critics are selected to test the transparency of the watermark.

3. Tamper resistance. It should not be possible to remove or alter the mark without degrading the sound quality enough to render it unusable.

4. Information capacity. The watermark technologies are expected to embed:
- 2 bits for Copy Control Information (CCI) in a timeframe of 15 seconds;
- 72 bits for Copyright Management Information (CMI) in a timeframe of 30 seconds.
5. Complexity. Implementations of these technologies should come at a reasonable cost.

Some additional discussions on various types of requirements versus applications can be found in (Decker 2001), (IBM Tokyo Research Laboratory 2003), (Koch and Zhao 1995), (Langelaar et al. 2000), and (Mintzer et al. 1997). The fundamental concepts about the requirements of digital watermarking are similar across different research groups, and the requirements for audio watermarking listed above can also serve as a reference for the requirements of watermarking with other multimedia formats. As with the audio watermarking requirements, there are mutual dependencies among the elements of the basic requirements for watermarking in other multimedia formats, as depicted in Figure 3 (Langelaar et al. 2000).
[Figure 3. The mutual dependencies among the elements of the basic requirements in image watermarking: transparency, robustness, capacity, and extraction with/without the originals (Langelaar et al. 2000).]
From the algorithm design viewpoint, the three most critical requirements are (a) transparency, or imperceptibility, (b) robustness to intentional or unintentional attacks, and (c) information capacity, or watermark payload (Wolfgang et al. 1999). Although these three requirements are all very desirable, as pointed out in the literature (Barni et al. 2000, Kirovski and Malvar 2001, Lin and Chang 2001, Wolfgang et al. 1999), they conflict with each other. The three requirements compose a three-dimensional tradeoff relationship: fixing one dimension, the remaining two conflict with each other, and some tradeoff must be accepted. For instance, keeping the number of embedded bits constant, the watermarked image quality may be better if the watermark is embedded in a less important part of the image, such as the least significant bits (LSB) or the high-frequency coefficients in the transform domain; but by doing so, the watermark becomes vulnerable to common image processing such as low-pass filtering. In contrast, if the watermark bits are embedded in the more important components of the image, the watermarked image quality becomes worse, and hence others may suspect the existence of the watermark. Moreover, invisible signals are generally small in magnitude or short in codeword size, and thus they are vulnerable to attacks. Spread spectrum techniques can be used to reliably hide low-rate information inside a high-rate original signal. However, a higher reliability generally demands a lower information rate, that is, a lower capacity. Given a fixed visual quality in watermarked images, there exists a tradeoff between the robustness and the number of embedded bits. Finally, there is one requirement that is not frequently discussed in the literature, called watermark performance (Decker 2001): the speed with which the watermark is embedded and extracted. If watermarking algorithms from academic research are to be extended to industrial applications, watermark performance is an important issue for system design. Therefore, from the arguments above, researchers and watermarking system designers
need to carefully determine the requirements based on the specific applications and purposes of the algorithms.
5 Classifications
Watermarking schemes can be classified into several categories according to their applications and characteristics as follows.
5.1 Perceptible and Imperceptible Watermarks
The watermarks can be classified as perceptible and imperceptible. For images and video, perceptible watermarks are visual patterns, like logos inserted into one corner of the images. In contrast, imperceptible watermarks, or perceptually invisible watermarks, apply techniques in the spatial or transform domains to imperceptibly embed the watermarks. The authors in (Braudaway et al. 1996) discussed the usability of perceptible watermarks. Another early example of this application is the IBM digital watermarking scheme for the Vatican Library project (IBM Digital Library 2002). An example differentiating perceptible and imperceptible watermarks is shown in Figure 4. Figure 4(a) is the well-known test image Lena, with image size 512 × 512. Figure 4(b) is the watermark, with size 128 × 128, which is the school emblem of National Chiao Tung University in Taiwan. Figure 4(c) illustrates perceptible watermarking; the reader can observe the watermark in the upper-left corner of the Lena image. In contrast, Figure 4(d) presents imperceptible watermarking: the school emblem is embedded in the transform domain coefficients with the algorithm to be presented in Chapter 13. The desired properties of the visible watermark in Figure 4(c) are:

1. it is visible but not obstructive, hence it directly confirms the owner of the watermark,
2. it is hard to remove, and
3. it is adjusted automatically to cope with different original image contents; for example, it varies the mark intensity to match the local textures.

For practical reasons, perceptible watermarks are not the mainstream in academic research, although they are easy to implement in practice. Hence, we will not elaborate on this type of watermark. Most of the algorithms described in this book focus on imperceptible watermarking.

[Figure 4. A sample of perceptible and imperceptible watermarks: (a) the original image, the well-known test image Lena; (b) the watermark; (c) the watermarked image with perceptible watermarking; (d) the watermarked image with imperceptible watermarking.]
5.2 Robust, Fragile, and Semi-Fragile Watermarks
5.2.1 Robust watermarks

Watermarks designed to survive legitimate and everyday usage of content, as well as intentional or unintentional attacks, are referred to as robust watermarks. Most watermarking algorithms described in this book emphasize robust watermarking, and they will be explained in detail in the upcoming chapters.

5.2.2 Fragile watermarks
A fragile watermark is simply a mark likely to become undetectable after the original is modified in any way. One fragile watermarking scheme will be described below; because fragile marks are not our focus, we do not intend to give a full coverage of this topic. The scheme to be discussed was proposed by Wong (Wong 1998). The basic idea is to create a picture-dependent mark that is embedded in the least significant bits (LSB) of an image in the spatial domain. The author borrowed the public-key encryption technique to produce a mark that is hard for an attacker to fabricate. The encoding or embedding block diagram of this scheme is depicted in Figure 5. Let the original image X be 8 bits per pixel with image size M × N. X is first partitioned into blocks, for example, blocks of size 8 × 8, and each block X_r is brought into the system. The LSBs of X_r are discarded. The seven most significant bits of X_r, after discarding the LSBs, together with the image size M × N, are mapped into a long bitstream P_r by a hash function; the Internet MD5 hash function (Rivest 1992) is adopted in (Wong 1998). The length of P_r should be larger than L, the number of pixels in an image block X_r; in case P_r is longer than needed, only the first L bits are used. An L-bit binary pattern B_r serves as the watermark. Next, the exclusive-or (XOR) operation is performed on the watermark to be embedded, B_r, and the hashed pattern, P_r. The result, W_r, is encrypted using the private key K' of a public-key cryptography system, for instance, the RSA system (Rivest et al. 1978), and finally the output, C_r (an L-bit string), replaces the LSBs of the original image block. The decoding or extraction procedure is essentially the inverse of the encoding procedure, step by step; the decoder diagram is shown in Figure 6. Note that the public-key decryption block uses the public key K. It is reported in (Wong 1998) that this fragile mark has been tested against cropping, image size scaling, and pixel alteration.
A n Introduction to Watermarking Techniques 19
[Figure 5. The encoder of the fragile watermarking scheme (Wong 1998). Each original image block X_r has its LSBs set to zero; the result, together with the image size M × N, is hashed to H(M, N, X_r), XORed with the watermark bitmap B_r, and encrypted with the private key K'; the output C_r is inserted into the LSBs of X_r to form the marked image block.]
[Figure 6. The decoder of the fragile watermarking scheme associated with Figure 5. The LSBs of each test image block are extracted and decrypted with the public key K, and combined with the hash of the LSB-zeroed block to recover the extracted watermark.]
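To make the data flow of Figures 5 and 6 concrete, here is a simplified sketch of the block-wise hash-and-XOR structure. It follows the description above (MD5 over the LSB-zeroed block plus the image size, XOR with the watermark bits, result placed in the LSBs), but to stay self-contained it omits the RSA encryption/decryption pair, so unlike (Wong 1998) it offers no public-key security. The block size and function names are our own assumptions.

```python
import hashlib
import numpy as np

BLOCK = 8  # 8x8 blocks, as in the example above (image size must divide by 8)

def _block_code(Xr, M, N, Br):
    """P_r XOR B_r: hash the LSB-zeroed block plus the image size with
    MD5, keep the first L bits, and XOR with the watermark bits B_r."""
    L = Xr.size
    data = bytes([M >> 8, M & 255, N >> 8, N & 255]) + (Xr & 0xFE).tobytes()
    Pr = np.unpackbits(np.frombuffer(hashlib.md5(data).digest(), np.uint8))
    Pr = np.resize(Pr, L)                 # truncate (or repeat) to L bits
    return Pr ^ Br.astype(np.uint8)

def embed_fragile(X, B):
    """Encoder of Figure 5, minus RSA: write hash-XOR-mark into the LSBs.
    X: uint8 grayscale image; B: 0/1 watermark bitmap of the same shape."""
    X = X.copy()
    M, N = X.shape
    for i in range(0, M, BLOCK):
        for j in range(0, N, BLOCK):
            Xr = X[i:i + BLOCK, j:j + BLOCK]
            Br = B[i:i + BLOCK, j:j + BLOCK].ravel()
            Xr[:] = (Xr & 0xFE) | _block_code(Xr, M, N, Br).reshape(Xr.shape)
    return X

def verify_fragile(X2, B):
    """Decoder of Figure 6, minus RSA: a block is authentic iff its LSBs
    still equal hash-XOR-mark; returns a per-block pass/fail map."""
    M, N = X2.shape
    ok = np.ones((M // BLOCK, N // BLOCK), bool)
    for i in range(0, M, BLOCK):
        for j in range(0, N, BLOCK):
            Xr = X2[i:i + BLOCK, j:j + BLOCK]
            Br = B[i:i + BLOCK, j:j + BLOCK].ravel()
            ok[i // BLOCK, j // BLOCK] = np.array_equal(
                Xr.ravel() & 1, _block_code(Xr, M, N, Br))
    return ok
```

Because the hash covers the whole block, flipping even one of a block's seven upper bit planes changes the expected LSB pattern and the block fails verification, which is precisely the fragility the scheme is designed for.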
In addition to authentication purposes, a new application of fragile watermarking is robust transmission of images and video. In (Hwang et al. 2002), besides the conventional schemes that protect the compressed bitstream by employing error control codes or unequal error protection (UEP) algorithms, the authors proposed an error detection technique using fragile watermarking in order to improve error detection. The fragile watermarking scheme can be incorporated with other error-resilient coding algorithms to provide robust transmission of multimedia effectively and efficiently.

5.2.3 Semi-fragile watermarks
A semi-fragile watermark is a watermark that is unaffected by legitimate distortions but destroyed by illegitimate distortions (Fridrich and Goljan 1999, Yin and Yu 2002). Semi-fragile watermarking schemes are marginally robust and are less sensitive to intentional or unintentional attacks. Semi-fragile watermarks are mainly for authentication purposes (Sun et al. 2002), and this kind of watermarking scheme allows acceptable manipulations of watermarked images while verifying their authenticity.
5.3 Algorithm Design Issues
The requirements of imperceptibility, robustness, and capacity described in the previous sections conflict with each other. Therefore, one of the aims of this book is to introduce soft computing techniques for finding trade-offs among these conflicting requirements. For instance, in Chapter 12 and Chapter 13 we search for an optimized balance between imperceptibility and robustness while fixing the watermark capacity. The readers are suggested to refer to these chapters for more details.
6 Watermark Embedding Schemes
6.1 Spatial Domain Watermarking
Embedding the watermark into the spatial-domain components, or the least significant bits (LSB), of the original is the most straightforward method for digital watermarking. It has the advantages of low complexity and easy implementation. However, spatial-domain watermarking algorithms are generally not robust to intentional or unintentional attacks. Details of spatial-domain watermarking will be described in Chapter 5.
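As a flavor of the idea (a toy sketch of ours, not one of the specific algorithms of Chapter 5), the code below hides a bitstream in the LSB plane of a grayscale image, using a key-seeded permutation of pixel positions as a simple stand-in for the dispersion techniques mentioned later in Section 8.2.

```python
import numpy as np

def lsb_embed(X, bits, key=0):
    """Replace the LSBs of key-selected pixel positions with the bits."""
    Xp = X.ravel().copy()
    pos = np.random.default_rng(key).permutation(Xp.size)[:len(bits)]
    Xp[pos] = (Xp[pos] & 0xFE) | np.asarray(bits, dtype=np.uint8)
    return Xp.reshape(X.shape)

def lsb_extract(X2, n_bits, key=0):
    """Read the LSBs back from the same key-selected positions."""
    pos = np.random.default_rng(key).permutation(X2.size)[:n_bits]
    return X2.ravel()[pos] & 1
```

Only the lowest bit plane changes, so the marked image is visually indistinguishable from the original; but a single low-pass filtering or JPEG pass destroys the mark, illustrating the weak robustness noted above.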
6.2 Transform Domain Watermarking
The fundamental concepts for transform domain watermarking, including watermarking in the discrete cosine transform (DCT) (Hsu and Wu 1999), discrete Fourier transform (DFT) (Barni et al. 2003), and discrete wavelet transform (DWT) (Serdean et al. 2003) domains, will be depicted in Chapter 6. Watermarking based on vector quantization (VQ) (Huang et al. 2001) will be depicted in Chapter 7. The readers are suggested to refer to these chapters for more details.
6.3 QIM Watermarking
In addition to the conventional schemes that embed the watermark in the spatial or transform domains, Chen and Wornell proposed an embedding scheme called "quantization index modulation (QIM)," which is provably good against arbitrary bounded and fully informed attacks, and achieves provably better rate-distortion-robustness tradeoffs than spread spectrum methods (Chen and Wornell 2001). Details about QIM can be found in Chapter 21. In addition, knowledge about spread spectrum methods is described in Chapter 19.
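The full treatment of QIM is left to the later chapter, but the simplest scalar instance, often called dither modulation, fits in a few lines. The sketch below is our own illustration with an assumed step size DELTA: each bit selects one of two interleaved uniform quantizers, and the decoder returns the bit of the nearer quantizer.

```python
import numpy as np

DELTA = 8.0  # quantizer step (assumed; larger = more robust, more distortion)

def qim_embed(x, bits):
    """Quantize each host sample with the quantizer its bit selects:
    bit 0 -> multiples of DELTA, bit 1 -> multiples offset by DELTA/2."""
    d = np.where(np.asarray(bits) == 0, 0.0, DELTA / 2)
    return np.round((np.asarray(x, float) - d) / DELTA) * DELTA + d

def qim_decode(y):
    """Return, per sample, the bit of the nearer of the two quantizers."""
    y = np.asarray(y, float)
    e0 = np.abs(y - np.round(y / DELTA) * DELTA)
    e1 = np.abs(y - (np.round((y - DELTA / 2) / DELTA) * DELTA + DELTA / 2))
    return (e1 < e0).astype(int)
```

Any perturbation smaller than DELTA/4 per sample cannot flip a decoded bit, which is the sense in which the step size trades embedding distortion (growing with DELTA) against robustness.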
7 Watermark Extraction Categories
7.1 Extraction with/without the Original Multimedia Contents
At the beginning of watermarking research (Cox et al. 1996, Podilchuk and Zeng 1998, Swanson et al. 1996), the proposed schemes required the original image for watermark extraction. These are called non-oblivious, or private, watermarking techniques. In contrast, watermarking algorithms that do not require the original image during the extraction process are called oblivious, or public, watermarking techniques (Holliman and Memon 2000, Lin et al. 2001, Zeng and Liu 1999). Conceptually speaking, it requires a great deal of storage, bandwidth, and computing power to extract the watermark if the watermarking algorithm needs the original image. There are billions of images on the Internet, and consequently, it would be difficult and time-consuming to find the proper original image before extracting the watermark from a possibly attacked image. Moreover, the owners of the original works are compelled to insecurely share their works with anybody who wants to check the existence of the watermark. On the other hand, if the watermark can be extracted from a suspect image without requiring the original image, the constraints imposed on the non-oblivious techniques are not problematic. From the arguments above, extracting watermarks without the original images is more attractive for both research and practical implementations.
7.2 The Public/Secret Keys
Watermarking, like cryptography, needs public and secret keys to identify legal owners. Both digital watermarking techniques and cryptographic mechanisms are considered security components of multimedia systems (Dittmann et al. 2001, Hernández et al. 2000), and the keys in watermarking algorithms can apply cryptographic mechanisms to provide more secure services for copyright protection. As shown in the generic watermarking structure in Figure 2, public or secret keys can be incorporated into the watermark embedding and extraction structures. Consequently, many watermarking systems are designed to use keys in analogy to their counterparts in cryptographic systems. The readers are suggested to refer to (Cox et al. 2002, Chap. 2, Appendix A.2) and (Stinson 2002) for details about the definitions of public and secret keys in watermarking and cryptography.
8 Attacking Schemes
Watermarking attacks can be classified into two broad categories:
- destruction attacks: including image compression, image cropping, spatial filtering, etc.; and
- synchronization attacks: including image rotation, image shifting, and pixel/line deletion.

We describe some of these conventional attacks in the following sections. Other commonly employed attacks on watermarking systems are discussed in (Voloshynovskiy et al. 2001).
8.1 Image Compression
Compression is a popular scheme for attacking watermarked images or audio. Two common compression schemes for images are VQ compression and JPEG compression.
8.1.1 VQ compression

To remove the hidden watermarks, attackers may compress the watermarked images with some other VQ codebooks and decode the VQ indices to get the reconstruction. VQ compression schemes are effective in attacking some existing algorithms. The readers are suggested to refer to Chapter 7 for more details about the concepts of VQ and VQ-related watermarking algorithms.
8.1.2 JPEG compression

Attackers can modify watermarked images by varying the quality factor (QF) in the JPEG compression system (Pennebaker and Mitchell 1993). Each QF corresponds to a different quantization table in the JPEG compression system. Choosing a larger QF yields better image quality after the attack. Consequently, under the same watermark extraction scheme, the watermark extracted after an attack with a larger QF will generally be more recognizable than one extracted after an attack with a smaller QF. In addition, by varying the QF values, we can tell whether the embedded watermarks can survive JPEG compression or not.
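Such a robustness test is easy to script. The sketch below, our own example assuming the Pillow imaging library and an extraction function like those sketched earlier, round-trips a marked image through JPEG at several quality factors and reports the fraction of watermark bits that survive.

```python
import io
import numpy as np
from PIL import Image

def jpeg_attack(X, qf):
    """Round-trip a grayscale uint8 image through JPEG at quality qf."""
    buf = io.BytesIO()
    Image.fromarray(X, mode="L").save(buf, format="JPEG", quality=qf)
    return np.asarray(Image.open(io.BytesIO(buf.getvalue())))

def jpeg_survival(X_marked, W, extract_fn):
    """Report the fraction of watermark bits recovered after each QF."""
    for qf in (90, 70, 50, 30, 10):
        W_hat = extract_fn(jpeg_attack(X_marked, qf))
        print(f"QF={qf:3d}: {np.mean(W_hat == W):.1%} of bits correct")
```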
8.2 Image Cropping
One popular scheme for attackers is to alter the watermarked images by cropping the border or some part of the watermarked images, in the hope of removing the watermarks. However, under some circumstances, the watermarked image after cropping may lose its value for practical use. Two common solutions for combating the image cropping attack are: applying concepts from spread spectrum communications (Cox et al. 1997), and making use of linear feedback shift registers to disperse the spatial-domain relationships in the original images (Proakis 1995).
8.3 Spatial Filtering
The use of spatial masks for image processing is usually called spatial filtering, and the masks themselves are called spatial filters (Gonzalez and Woods 1992). The basic approach of spatial filtering is to sum the products between the mask coefficients and the luminance of the pixels under the mask at a specific location in the image. Three spatial filters, namely low-pass filtering, high-pass filtering, and median filtering, are applied as attacks. The effects of spatial filtering on the watermarked images need to be checked, and the robustness of the proposed algorithms can thereby be evaluated.
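The three filtering attacks can be reproduced with standard 3×3 masks; the sketch below is an illustration of ours using SciPy (an assumed dependency), with a box blur as the low-pass filter, a sharpening (Laplacian-based) mask as the high-pass-style filter, and a 3×3 median filter.

```python
import numpy as np
from scipy.ndimage import convolve, median_filter

def filtering_attacks(X):
    """Return low-pass, high-pass, and median filtered copies of X."""
    Xf = X.astype(float)
    lp_mask = np.full((3, 3), 1.0 / 9)             # box blur (low-pass)
    hp_mask = np.array([[ 0, -1,  0],
                        [-1,  5, -1],
                        [ 0, -1,  0]], float)      # sharpening via Laplacian
    lp = np.clip(convolve(Xf, lp_mask, mode="nearest"), 0, 255)
    hp = np.clip(convolve(Xf, hp_mask, mode="nearest"), 0, 255)
    med = median_filter(X, size=3)
    return lp.astype(np.uint8), hp.astype(np.uint8), med
```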
8.4 Image Rotation
Instead of modifying the watermarked image content, attackers may rotate the image in the hope that the watermark will vanish. In practical implementations, after rotation some pixel positions of the attacked image have no source value for their luminance. On the one hand, if such a missing pixel lies inside the watermarked image, the luminance of the attacked image is calculated by interpolating the neighboring pixels of the watermarked image. On the other hand, if the missing pixel is outside the range of the watermarked image, we set the luminance of that region to zero for simplicity.
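This rotation convention (interpolate inside the image support, zero luminance outside it) is available directly in SciPy's ndimage module; the wrapper below is an assumed illustration, not code from this chapter.

```python
from scipy.ndimage import rotate

def rotation_attack(X, degrees):
    """Rotate about the image center, keeping the original frame size:
    interior pixels are bilinearly interpolated, exterior ones set to 0."""
    return rotate(X.astype(float), angle=degrees, reshape=False,
                  order=1, mode="constant", cval=0.0)
```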
8.5 Image Shifting and Line Deletion
Attackers might shift the watermarked image horizontally and vertically, or delete a whole line of pixels, to destroy the watermark information conveyed. For watermarks embedded in the DCT or VQ domains, image shifting might cause the watermark extraction algorithm to lose synchronization with the watermarked image. How to maintain an acceptable quality in the watermarked image while preserving the capability of recovering the embedded watermark under the image shifting attack is another topic in robust watermarking.
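Both attacks are a few lines of NumPy; the sketch below is our own illustration, zero-filling vacated pixels after a shift and duplicating the last row after a line deletion so the image size is preserved (conventions we assume for the example).

```python
import numpy as np

def shift_attack(X, dx, dy):
    """Translate by (dx, dy) pixels; vacated regions are zero-filled."""
    H, W = X.shape
    Y = np.zeros_like(X)
    Y[max(dy, 0):H + min(dy, 0), max(dx, 0):W + min(dx, 0)] = \
        X[max(-dy, 0):H + min(-dy, 0), max(-dx, 0):W + min(-dx, 0)]
    return Y

def line_deletion_attack(X, row):
    """Delete one row of pixels, duplicating the last row to keep size."""
    Y = np.delete(X, row, axis=0)
    return np.vstack([Y, Y[-1:]])
```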
9 Watermarking Benchmarks
There are at least four publicly recognized benchmarks for digital watermarking. Some commonly employed attack schemes, including filtering and rotation, are offered in all the benchmarks. In addition, each benchmark has its own special utilities that distinguish it from the others. Details about watermarking benchmarks will be described in Chapter 11.
9.1 Stirmark
Stirmark is a benchmark to test the robustness of image watermarking algorithms. The first version was published in November 1997, and the latest version is Stirmark benchmark 4.0 published in January 2003 (Petitcolas 2000, Petitcolas 2003, Petitcolas et al. 1998). The first version of Stirmark for audio was released in January 2001 (Lang 2003). It applies different filters on audio signals to serve as attacks.
9.2 Checkmark
The checkmark benchmark was initiated to better evaluate watermarking technologies. The original version was published in June 2001, and the latest version is checkmark benchmark 1.2 published in December 2001 (Pun 2001).
9.3 Optimark
Optimark is a benchmarking tool for still image watermarking algorithms (Argyriou et al. 2002). It is expected to be extended with video and audio features.
9.4 CERTIMARK
Certimark is "a European task force to disseminate watermarking techniques" (Rollin 2002). The project duration was from May 2000 to July 2002.
10 Activities in Watermarking Research

10.1 Internet Resources

There are numerous watermarking resources on the Internet. The following website can serve as a gateway for watermarking research and applications.
1. www.watermarkingworld.org: this page serves as a pointer linking to other watermark-related resources, including conferences, books, research, and companies.
10.2 Special Issues in International Journals

In the past few years, several international journals have published special issues relating to watermarking research. Interested readers are suggested to study the papers therein. Some of these issues are listed in chronological order as follows.

1. IEEE Journal on Selected Areas in Communications, Volume 16, Issue 4, May 1998.
2. Signal Processing, Volume 66, Issue 3, May 1998.
3. Optics Express, Volume 3, Issue 12, December 1998.
4. IEEE Computer Graphics and Applications, Volume 19, Issue 1, January/February 1999.
5. Proceedings of the IEEE, Volume 87, Issue 7, July 1999.
6. IEEE Signal Processing Magazine, Volume 17, Issue 5, September 2000.
7. Signal Processing, Volume 81, Issue 6, June 2001.
8. IEEE Communications Magazine, Volume 39, Issue 8, August 2001.
9. IEEE Multimedia, Volume 8, Issue 4, October-December 2001.
10. EURASIP Journal on Applied Signal Processing, Volume 2002, Issue 2, February 2002.
11. Communications of the ACM, Volume 46, Issue 4, April 2003.
12. IEEE Transactions on Signal Processing, Volume 51, Issue 4, April 2003.
13. IEEE Transactions on Signal Processing (Supplement on Secure Media), Volume 52, Issue 10, October 2004 (planned to be published).
10.3 Books

At the time of writing this chapter, there are thirteen books relating to watermarking research, and most of them were published after 2002. They are listed in chronological order as follows.

1. Information hiding: Techniques for steganography and digital watermarking. Edited by S. Katzenbeisser and F. A. P. Petitcolas. Published by Artech House Publishers in 1999.
2. Image and video databases: Restoration, watermarking and retrieval. Written by A. Hanjalic, G. C. Langelaar, P. M. B. van Roosmalen, and R. L. Lagendijk. Published by Elsevier Science in 2000.
A n Introduction to Watermarking Techniques 29
4. Digital watermarking. Written by I. J. Cox, M. L. Miller, and J. A. Bloom. Published by Morgan Kaufinann Publishers in 2001. 5. Digital data-hiding and watermarking with applications. Written by R. Chandramouli. Published by CRC Press in 2002.
6 . Disappearing crytography -Information hiding, steganography & watermarking. Written by P. Wayner. Published by Morgan Kaufmann Publishers in 2002. 7. Informed watermarking. Written by J. Eggers and B. Girod. Published by Kluwer Academic Publishers in 2002.
8. Multimedia data hiding. Written by M. Wu and B. Liu. Published by Springer-Verlag in 2002. 9. Data privacy: Encryption and information hiding. Written by D. Salomon and W. J. Ewens. Published by Springer-Verlag in 2003.
10. Digital watermarking (LNCS vol. 26 13). Edited by F.A.P. Petitcolas and H. J. Kim. Published by Springer-Verlag in 2003. 11. Hiding in plain sight: Steganography and the art of covert communication. Written by E. Cole and R. D. Krutz. Published by John Wiley & Sons in 2003.
12. Investigator’s guide to steganography. Written by G. Kipper. Published by Auerbach Publications in 2003.
30
H.-C. Huang, H.-M. Hang, and J.-S. P a n
13. Techniques and applications of digital watermarking and content protection. Written by M. Arnold, M. Schmucker, and S. D. Wolthusen. Published by Artech House in 2003.
10.4 Related Sessions in International Conferences

There are also many sessions in international conferences related to digital watermarking, and some conferences aim specifically at watermarking research. Because many such conferences are held around the world every year, we choose some of the publicly well-known ones for the list below. Readers who are interested in such conference events can refer to the following list, shown in alphabetical order with URLs.

1. International Conference on Acoustics, Speech, and Signal Processing. http://www.icassp2004.org/
2. International Conference on Image Processing. http://www.icip2003.org/
3. International Conference on Knowledge-Based Intelligent Information & Engineering Systems. http://www.bton.ac.uk/kes/kes.html
4. International Workshop on Digital Watermarking. http://www.iwdw.org/
5. International Workshop on Information Hiding. http://research.microsoft.com/ih2002/
6. IS&T/SPIE Symposium on Electronic Image Science and Technology: Security and Watermarking of Multimedia Contents. http://electronicimaging.org/call/03/
7. IS&T/SPIE Symposium on Electronic Image Science and Technology: Security, Steganography, and Watermarking of Multimedia Content. http://electronicimaging.org/call/04/
8. Pacific Rim Workshop on Digital Steganography. http://www.know.comp.kyutech.ac.jp/STEG/
10.5 Companies and Products Relating to Watermarking

Watermarking specifically for industrial applications is depicted in Chapter 22, and the authors of Chapters 10 and 20 also mention applications of digital watermarking. The readers are suggested to refer to these chapters for more details about watermarking in industry.
11 Organization of This Book

This book is divided into four parts. Part one covers the fundamental concepts and background of the watermarking field in Chapter 1 and of the soft computing field in Chapters 2 to 4. In Part two, the authors of each chapter describe watermarking algorithms and their applications to images, video, and audio, from Chapter 5 to Chapter 10, and Chapter 11 covers the benchmarking of watermarking algorithms. Applications combining soft computing schemes with watermarking algorithms, together with some advanced topics in watermarking, are presented in Part three, from Chapter 12 to Chapter 18. Finally, in Part four, the authors address practical issues in watermarking and copyright protection, from Chapter 19 to Chapter 22.
12 Summary

A general framework for watermark embedding and extraction has been presented in this chapter, along with a review of some algorithms for different media types described in the literature. Some Internet resources are also offered. In the subsequent chapters, we will discuss the fundamental concepts and applications of image, audio, and video watermarking in detail.
References

Argyriou, V., Nikolaidis, N., Solachidis, V., Tefas, A., Nikolaidis, A., Tsekeridou, S., and Pitas, I. (2002), "Optimark benchmark," http://poseidon.csd.auth.gr/optimark/

Barni, M., Bartolini, F., De Rosa, A., and Piva, A. (2000), "Capacity of full frame DCT image watermarks," IEEE Trans. Image Processing, vol. 9, pp. 1450-1455.

Barni, M., Bartolini, F., De Rosa, A., and Piva, A. (2003), "Optimum decoding and detection of multiplicative watermarks," IEEE Trans. Signal Processing, vol. 51, pp. 1118-1123.

Bender, W., Gruhl, D., Morimoto, N., and Lu, A. (1996), "Techniques for data hiding," IBM Systems Journal, vol. 35, pp. 313-336.

Braudaway, G., Magerlein, K., and Mintzer, F. (1996), "Protecting publicly available images with a visible image watermark," Proc. SPIE: Optical Security and Counterfeit Deterrence Techniques, vol. 2659, pp. 126-133.

Chen, B. and Wornell, G.W. (2001), "Quantization index modulation: A class of provably good methods for digital watermarking and information embedding," IEEE Trans. Information Theory, vol. 47, pp. 1423-1443.
Cheung, S.C. and Chiu, D.K.W. (2003), "A watermarking infrastructure for enterprise document management," 36th Annual Hawaii Int'l Conf. on System Sciences, pp. 105-114.

Cox, I.J., Kilian, J., Leighton, T., and Shamoon, T. (1996), "Secure spread spectrum watermarking for images, audio and video," IEEE Int'l Conf. Image Processing, pp. 243-246.

Cox, I.J., Kilian, J., Leighton, F.T., and Shamoon, T. (1997), "Secure spread spectrum watermarking for multimedia," IEEE Trans. Image Processing, vol. 6, pp. 1673-1687.

Cox, I.J., Miller, M.L., and Bloom, J.A. (2002), Digital watermarking, Morgan Kaufmann Publishers, San Francisco: CA.

De Vleeschouwer, C., Delaigle, J.-F., and Macq, B. (2002), "Invisibility and application functionalities in perceptual watermarking - An overview," Proceedings of the IEEE, vol. 90, pp. 64-77.

Decker, S. (2001), "Engineering considerations in commercial watermarking," IEEE Communications Magazine, vol. 39, pp. 128-133.

Dittmann, J., Wohlmacher, P., and Nahrstedt, K. (2001), "Using cryptographic and watermarking algorithms," IEEE Multimedia, vol. 8, pp. 54-65.

Fridrich, J. and Goljan, M. (1999), "Images with self-correcting capabilities," IEEE Int'l Conf. Image Processing, pp. 792-796.

Garimella, A., Satyanarayana, M.V.V., Kumar, R.S., Murugesh, P.S., and Niranjan, U.C. (2003), "VLSI implementation of online digital watermarking technique with difference encoding for 8-bit gray scale images," 16th Int'l Conf. VLSI Design, pp. 283-288.

Gonzalez, R.C. and Woods, R.E. (1992), Digital image processing, Addison-Wesley, Reading: MA.
Hartung, F. and Girod, B. (1998), "Watermarking of uncompressed and compressed video," Signal Processing, vol. 66, pp. 283-301.

Hartung, F. and Ramme, F. (2000), "Digital rights management and watermarking of multimedia content for m-commerce applications," IEEE Communications Magazine, vol. 38, pp. 78-84.

Hernández, J., Amado, M., and Pérez-González, F. (2000), "DCT-domain watermarking techniques for still images: Detector performance analysis and a new structure," IEEE Trans. Image Processing, vol. 9, pp. 55-68.

Holliman, M. and Memon, N. (2000), "Counterfeiting attacks on oblivious block-wise independent invisible watermarking schemes," IEEE Trans. Image Processing, vol. 9, pp. 432-441.

Hsu, C.-T. and Wu, J.-L. (1999), "Hidden digital watermarks in images," IEEE Trans. Image Processing, vol. 8, pp. 58-68.

Huang, H.-C., Wang, F.H., and Pan, J.S. (2001), "Efficient and robust watermarking algorithm with vector quantisation," IEE Electronics Letters, vol. 37, pp. 826-828.

Hwang, Y., Jeon, B., and Chung, T.M. (2002), "Improved error detection method for real-time video communication using fragile watermarking," IEEE Pacific Rim Conference on Multimedia, pp. 50-57.

IBM Digital Library (2002), http://www.software.ibm.com/is/dig-lib

IBM Tokyo Research Laboratory (2003), http://www.trl.ibm.com/projects/RightsManagement/datahiding/index-e.htm

International Federation of the Phonographic Industry (2003), http://www.ifpi.org/
Kalker, T., Depovere, G., Haitsma, J., and Maes, M. (1999), "A video watermarking system for broadcast monitoring," IS&T/SPIE Electronic Imaging '99, Security and Watermarking of Multimedia Contents, pp. 103-112.

Kalker, T. and Haitsma, J. (2000), "Efficient detection of a spatial spread-spectrum watermark in MPEG video streams," IEEE Int'l Conf. Image Processing, pp. 434-437.

Kirovski, D. and Malvar, H. (2001), "Spread-spectrum audio watermarking: requirements, applications, and limitations," IEEE Fourth Workshop on Multimedia Signal Processing, pp. 219-224.

Koch, E. and Zhao, J. (1995), "Towards robust and hidden image copyright labeling," IEEE Workshop on Nonlinear Signal and Image Processing, pp. 452-455.

Kutter, M. and Jordan, F. (2000), "Digital watermarking technology," http://www.alpvision.com/watermarking.html

Kutter, M. and Petitcolas, F.A.P. (1999), "A fair benchmark for image watermarking systems," Electronic Imaging '99, Security and Watermarking of Multimedia Contents, pp. 226-239.

Lang, A. (2003), "StirMark Benchmark - Evaluation of watermarking schemes," http://ms-smb.darmstadt.gmd.de/stirmark/stirmarkbench.html

Langelaar, G.C., Setyawan, I., and Lagendijk, R.L. (2000), "Watermarking digital image and video data: A state-of-the-art overview," IEEE Signal Processing Magazine, vol. 17, pp. 20-46.

Lemma, A.N., Aprea, J., Oomen, W., and van de Kerkhof, L. (2003), "A temporal domain audio watermarking technique," IEEE Trans. Signal Processing, vol. 51, pp. 1088-1097.
Lin, C.Y. and Chang, S.F. (2001), "Watermarking capacity of digital images based on domain-specific masking effects," Int'l Conf. Information Technology: Coding and Computing, pp. 90-94.

Lin, C.Y., Wu, M., Bloom, J.A., Cox, I.J., Miller, M.L., and Lui, Y.M. (2001), "Rotation, scale, and translation resilient watermarking for images," IEEE Trans. Image Processing, vol. 10, pp. 767-782.

Mathai, N.J., Kundur, D., and Sheikholeslami, A. (2003), "Hardware implementation perspectives of digital video watermarking algorithms," IEEE Trans. Signal Processing, vol. 51, pp. 925-938.

Mintzer, F., Braudaway, G., and Yeung, M. (1997), "Effective and ineffective digital watermarks," IEEE Int'l Conf. Image Processing, pp. 9-12.

Mintzer, F. and Braudaway, G.W. (1998), "Opportunities for watermarking standards," Communications of the ACM, vol. 41, pp. 57-64.

MPEG Home Page (2003), http://www.chiariglione.org/mpeg/index.htm

Petitjean, G., Dugelay, J.L., Gabriele, S., Rey, C., and Nicolai, J. (2002), "Towards real-time video watermarking for system-on-chip," IEEE Int'l Conf. Multimedia and Expo, pp. 597-600.

Pennebaker, W.B. and Mitchell, J.L. (1993), JPEG: still image data compression standard, Van Nostrand Reinhold, New York: NY.

Petitcolas, F.A.P. (2000), "Watermarking schemes evaluation," IEEE Signal Processing Magazine, vol. 17, pp. 58-64.

Petitcolas, F.A.P. (2003), "Stirmark benchmark 4.0," http://www.cl.cam.ac.uk/~fapp2/watermarking/stirmark/
Petitcolas, F.A.P., Anderson, R.J., and Kuhn, M.G. (1998), "Attacks on copyright marking systems," 2nd Workshop on Information Hiding, pp. 219-239.

Petitcolas, F.A.P., Anderson, R.J., and Kuhn, M.G. (1999), "Information hiding - A survey," Proceedings of the IEEE, vol. 87, pp. 1062-1078.

Pfitzmann, B. (1996), "Information hiding terminology," 1st Workshop on Information Hiding, pp. 347-350.

Pitas, I. (1998), "A method for watermark casting on digital images," IEEE Trans. Circuits and Systems for Video Technology, vol. 8, pp. 775-780.

Podilchuk, C.I. and Delp, E.J. (2001), "Digital watermarking: Algorithms and applications," IEEE Signal Processing Magazine, vol. 18, pp. 33-46.

Podilchuk, C.I. and Zeng, W. (1998), "Image-adaptive watermarking using visual models," IEEE Journal on Selected Areas in Communications, vol. 16, pp. 525-539.

Proakis, J.G. (1995), Digital communications, 3rd ed., McGraw-Hill, New York: NY.

Provos, N. and Honeyman, P. (2003), "Hide and seek: An introduction to steganography," IEEE Security & Privacy Magazine, vol. 1, pp. 32-44.

Pun, T. (2001), "Checkmark benchmark 1.2," http://watermarking.unige.ch/Checkmark/index.html

Rivest, R.L., Shamir, A., and Adleman, L. (1978), "A method for obtaining digital signatures and public-key cryptosystems," Communications of the ACM, vol. 21, pp. 120-126.

Rivest, R.L. (1992), RFC 1321: The MD5 Message-Digest Algorithm, Internet Activities Board.
38 H.-C. Huang, H.-M. Hang, and J . 4 Pan
Rollin, C. (2002), “Certimark benchmark,” vision.unige.ch/certimark/
h t tp : / /
Serdean, C.V., Ambroze, M.A., Tomlinson, M., and Wade, J.G. (2003), “DWT-based high-capacity blind video watermarking, invariant to geometrical attacks,” IEE Proceedings Esion, Image and Signal Processing, vol. 150, pp. 51-58. Stinson, D.R. (2002), Cryptography : Theory and practice, CRC Press, Boca Raton:FL. Sun, Q., Chang, S.F., Maeno, K., and Suto, M. (2002), “A new semifragile image authentication framework combining ECC and PKI infrastructures,” IEEE Int ’I Symp. Circuits and Systems, pp. 440443. Swanson, M.D., Kobayashi, M., and Tewfik, A.H. (1998), “Multimedia data-embedding and watermarking technologies,” Proceedings ofIEEE, vol. 86, pp. 1064-1087. Swanson, M., Zhu, B., and Tewfik, A. (1996), “Transparent robust image watermarking,” Int’Z Con$ Image Processing, pp. 21 1-214. Tirkel, A.Z., Rankin, G.A., van Schyndel, R.M., Ho, W.J., Mee, N.R.A., and Osborne, C.F. (1993), “Electronic water mark,” Digital Image Computing Techniques and Applications ’93,pp. 666672. Tirkel, A.Z. and Hall, T.E. (2001), “A unique watermark for every image,” IEEE Multimedia, vol. 8, pp. 30-37. Tsutsui, K., Suzuki, H., Shimoyoshi, O., Sonohara, M., Akagiri, K., and Heddle, R.M. (1992), “ATRAC: Adaptive Transform Acoustic Coding for MiniDisc,” 93rd Audio Engineering Society Convention.
An Introduction to Watermarking Techniques 39
Voloshynovskiy, S., Pereira, S., Pun, T., Eggers, J.J., and Su, J.K. (200 l), “Attacks on digital watermarks: classification, estimation based attacks, and benchmarks,” IEEE Communications Magazine, vol. 39, pp. 1 18-126. Wong, P. (1998), “A public key watermark for image verification and authentication,” IEEE Int ’I Con$ Image Processing, pp. 455-459. Wolfgang, R.B., Podilchuk, C.I., and Delp, E.J. (1999), “Perceptual watermarks for digital images and video,” Proceedings of IEEE, V O ~ .87, pp. 40-5 1. Yeo, I.K. and Kim, H.J. (2003), “Modified patchwork algorithm: a novel audio watermarking scheme,” IEEE Trans. Speech and Audio Processing, vol. 11, pp. 381-386. Yin, P. and Yu, H.H. (2002), “Semi-fiagile watermarlung system for MPEG video authentication,” IEEE Int ’I Con$ Acoustics, Speech, and Signal Processing, pp. 3461-3464. Zeng, W. and Liu, B. (1999), “A statistical watermark detection technique without using original images for resolving rightful ownerships of digital images,” IEEE Trans. Image Processing, vol. 8, pp. 1534-1548.
This page intentionally left blank
Chapter 2
Neuro-Fuzzy Learning Theory
Yan Shi, Masaharu Mizumoto, and Peng Shi
In this chapter, we introduce an improved neuro-fuzzy learning algorithm for tuning fuzzy inference rules. In this learning approach, before learning the fuzzy rules we extract typical data from the training data by using the fuzzy c-means clustering algorithm. This is done in order to remove redundant data, resolve conflicts in the data, and produce practical training data. By the use of these typical data, the fuzzy rules can be tuned with the neuro-fuzzy learning algorithm proposed by the authors. Here the tuning parameters in the fuzzy rules can be learned without changing the form of the fuzzy rule table used in normal fuzzy applications. The learning time can be reduced, and reasonable and suitable fuzzy rules can be generated by using this learning approach. We shall also show the efficiency of the neuro-fuzzy learning algorithm by identifying nonlinear functions.
1 Introduction
In recent fuzzy applications, it is becoming more important to consider how to design optimal fuzzy rules from training data, in order to construct a reasonable and suitable fuzzy system model for identifying the corresponding practical systems (Cho and Wang 1996, Hayashi et al. 1990, Horikawa et al. 1992, Ichihashi et al. 1990, 1991, 1993, Kosko 1992, Kishida et al. 1995, Lee and Kil 1991). It is natural and necessary to generate or tune fuzzy rules using learning techniques. Based on the back-propagation algorithm for neural networks (Rumelhart et al. 1986), so-called neuro-fuzzy learning algorithms, which are widely used in recent fuzzy applications for generating or tuning optimal fuzzy system models, were proposed independently by Ichihashi et al. (1991, 1993), Nomura et al. (1991, 1992), Wang and Mendel (1992, 1994), and Shi and Mizumoto (1996, 1999, 2000). By using one of these neuro-fuzzy learning algorithms, the fuzzy rules can be generated or tuned to construct an optimal fuzzy system model, which can then be used to identify a practical system (Cho and Wang 1996, Kishida et al. 1995, Masuda and Yaku 1993, Tanaka et al. 1994, Yager and Filev 1994).

There are still important remaining problems in the area of neuro-fuzzy learning algorithms. When dealing with fuzzy rule generation using one of the above-mentioned neuro-fuzzy learning algorithms, the learning time, the convergence, and the generated fuzzy rules vary according to the different training data. That is, for a given set of training data, the learning iteration process may be long, or the fuzzy rules generated may not be suitable. This is due to the reasons illustrated in Figure 1. Firstly, there exist conflicts in the training data, which may lead to a long learning time for convergence because of the need to fit all the data. Secondly, there exist a few redundant items of data among the numerous items of training data, which can be regarded as an inconsistency in the trends of the identified system model. Here the fuzzy rules generated by the learning process may not work well with unknown data, although they may fit well with the training data.
Figure 1. Fictitious training data for the neuro-fuzzy learning process.
In this chapter, we introduce an improved neuro-fuzzy learning algorithm for tuning the fuzzy inference rules (Shi and Mizumoto 2001). In this learning approach, before learning the fuzzy rules we extract typical data from the training data by using the fuzzy c-means clustering algorithm (Bezdek et al. 1981, 1986). This is done in order to remove redundant data, resolve conflicts in the data, and produce practical training data, which alleviates the above problems; it serves as a preprocessing step for the training data before the fuzzy rules are learned. By the use of these typical data, the fuzzy rules can be tuned by the neuro-fuzzy learning algorithm proposed by the authors. The tuning parameters in the fuzzy rules can be learned without changing the form of the fuzzy rule table used in normal fuzzy applications. The learning time can be reduced, and the fuzzy rules generated by the learning approach are likely to be reasonable and suitable for the identified system model. We shall also show the efficiency of the neuro-fuzzy learning algorithm by identifying nonlinear functions.
2 A Neuro-Fuzzy Learning Algorithm
Firstly, we describe the neuro-fuzzy learning algorithm proposed by the authors, which can tune fuzzy inference rules without changing the form of the fuzzy rule table (Shi and Mizumoto 1996, 2000). Without loss of generality, we shall derive the neuro-fuzzy learning algorithm for a fuzzy system model that has only two input linguistic variables $x_1$, $x_2$ and one output variable $y$; it is not difficult to extend the idea to the case of multiple input linguistic variables by the same method. Let a fuzzy rule base be defined based on all of the combinations of $A_{1i}$ and $A_{2j}$ ($i=1,\dots,r$; $j=1,\dots,k$) as follows:
Rule $(i-1)k+j$: $A_{1i}$, $A_{2j}$ $\Rightarrow$ $y_{(i-1)k+j}$ \qquad (1)

where $A_{1i}$, $A_{2j}$ ($i=1,\dots,r$; $j=1,\dots,k$) are fuzzy sets on $X_1$ and $X_2$, respectively, and $y_{(i-1)k+j}$ is a real number on $Y$. Clearly, we can express the fuzzy rule base given in Equation (1) in the form of the fuzzy rule table shown in Table 1, and this form is often used in fuzzy applications.

Table 1. Fuzzy rule table for $A_{1i}$ and $A_{2j}$ (fuzzy partition for $x_2$ across the columns, fuzzy partition for $x_1$ down the rows).

          A_21       A_22       ...   A_2j           ...   A_2k
  A_11    y_1        y_2        ...   y_j            ...   y_k
  A_12    y_{k+1}    y_{k+2}    ...   y_{k+j}        ...   y_{2k}
  ...     ...        ...        ...   ...            ...   ...
  A_1i    ...        ...        ...   y_{(i-1)k+j}   ...   ...
  ...     ...        ...        ...   ...            ...   ...
  A_1r    ...        ...        ...   ...            ...   y_{rk}
If an observation $(x_1, x_2)$ is given, then a fuzzy inference consequence $y$ can be obtained by using the simplified fuzzy reasoning method (Maeda and Murakami 1987) as follows:

$$y = \frac{\sum_{i=1}^{r}\sum_{j=1}^{k} h_{(i-1)k+j}\, y_{(i-1)k+j}}{\sum_{i=1}^{r}\sum_{j=1}^{k} h_{(i-1)k+j}} \qquad (2)$$

where $h_{(i-1)k+j} = A_{1i}(x_1)\,A_{2j}(x_2)$ ($i=1,2,\dots,r$; $j=1,2,\dots,k$) is the agreement of $A_{1i}$ and $A_{2j}$ at $(x_1, x_2)$ in the antecedent part of the rules.
Figure 2. Gaussian-type membership functions for the linguistic variable $x_i$.
Suppose that Gaussian-type membership functions $A_{1i}$, $A_{2j}$ ($i=1,2,\dots,r$; $j=1,2,\dots,k$), shown in Figure 2, for the input variables $x_1$ and $x_2$ are defined as

$$A_{1i}(x_1) = \exp\!\left(-\frac{(x_1 - a_{1i})^2}{\sigma_{1i}^2}\right) \qquad (3)$$

$$A_{2j}(x_2) = \exp\!\left(-\frac{(x_2 - a_{2j})^2}{\sigma_{2j}^2}\right) \qquad (4)$$

The so-called Gaussian-type neuro-fuzzy learning algorithm for updating the parameters $a_{1i}$, $\sigma_{1i}$, $a_{2j}$, $\sigma_{2j}$, and $y_{(i-1)k+j}$ ($i=1,2,\dots,r$; $j=1,2,\dots,k$) is based on the gradient descent method (Rumelhart et al. 1986, Shi and Mizumoto 2000):

$$a_{1i}(t+1) = a_{1i}(t) - \alpha\,\frac{\partial E}{\partial a_{1i}(t)} = a_{1i}(t) + \alpha\,(y^* - y)\,\frac{2\,(x_1 - a_{1i}(t))}{\sigma_{1i}^2(t)}\cdot\frac{\sum_{j=1}^{k} h_{(i-1)k+j}\,(y_{(i-1)k+j} - y)}{\sum_{i=1}^{r}\sum_{j=1}^{k} h_{(i-1)k+j}} \qquad (5)$$

$$\sigma_{1i}(t+1) = \sigma_{1i}(t) - \beta\,\frac{\partial E}{\partial \sigma_{1i}(t)} = \sigma_{1i}(t) + \beta\,(y^* - y)\,\frac{2\,(x_1 - a_{1i}(t))^2}{\sigma_{1i}^3(t)}\cdot\frac{\sum_{j=1}^{k} h_{(i-1)k+j}\,(y_{(i-1)k+j} - y)}{\sum_{i=1}^{r}\sum_{j=1}^{k} h_{(i-1)k+j}} \qquad (6)$$

where $a_{1i}$ and $\sigma_{1i}$ are the center and width of $A_{1i}$, respectively.

$$a_{2j}(t+1) = a_{2j}(t) - \alpha\,\frac{\partial E}{\partial a_{2j}(t)} = a_{2j}(t) + \alpha\,(y^* - y)\,\frac{2\,(x_2 - a_{2j}(t))}{\sigma_{2j}^2(t)}\cdot\frac{\sum_{i=1}^{r} h_{(i-1)k+j}\,(y_{(i-1)k+j} - y)}{\sum_{i=1}^{r}\sum_{j=1}^{k} h_{(i-1)k+j}} \qquad (7)$$

$$\sigma_{2j}(t+1) = \sigma_{2j}(t) - \beta\,\frac{\partial E}{\partial \sigma_{2j}(t)} = \sigma_{2j}(t) + \beta\,(y^* - y)\,\frac{2\,(x_2 - a_{2j}(t))^2}{\sigma_{2j}^3(t)}\cdot\frac{\sum_{i=1}^{r} h_{(i-1)k+j}\,(y_{(i-1)k+j} - y)}{\sum_{i=1}^{r}\sum_{j=1}^{k} h_{(i-1)k+j}} \qquad (8)$$

where $a_{2j}$ and $\sigma_{2j}$ are the center and width of $A_{2j}$, respectively.

$$y_{(i-1)k+j}(t+1) = y_{(i-1)k+j}(t) - \gamma\,\frac{\partial E}{\partial y_{(i-1)k+j}(t)} = y_{(i-1)k+j}(t) + \gamma\,(y^* - y)\,\frac{h_{(i-1)k+j}}{\sum_{i=1}^{r}\sum_{j=1}^{k} h_{(i-1)k+j}} \qquad (9)$$

where $\alpha$, $\beta$ and $\gamma$ are the learning rates, and $t$ is the learning iteration. $E$ is an objective function used for evaluating the error between $y^*$ and $y$, defined as

$$E = \frac{1}{2}\,(y^* - y)^2 \qquad (10)$$
Here $y^*$ is the desired output value, and $y$ is the corresponding fuzzy inference result. In the case of three input variables, the neural network corresponding to the neuro-fuzzy learning algorithm is shown in Figure 3; there, two membership functions are used for $x_1$ and $x_2$, respectively, and three membership functions for $x_3$. The neuro-fuzzy learning algorithm has the following main characteristics, which distinguish it from conventional approaches (Shi and Mizumoto 2000):
1. The membership functions are not independent of each other.
2. The fuzzy partitions are independent of each other.
3. The representation in the fuzzy rule table does not change.
4. Non-firing states or weak-firing states can be avoided.
5. The setting of the initial fuzzy rules is simple.
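To make Equations (2)-(10) concrete, the following sketch implements one inference-and-update step in Python with NumPy. All names (gaussian, infer, update, a1, s1, a2, s2, yr) are our own; the chapter itself gives no code, and the default learning rates simply reuse the values from the examples in Section 4.

```python
import numpy as np

def gaussian(x, a, s):
    """Gaussian-type membership value, Equations (3)-(4)."""
    return np.exp(-((x - a) ** 2) / s ** 2)

def infer(x1, x2, a1, s1, a2, s2, yr):
    """Simplified fuzzy reasoning, Equation (2).
    a1, s1 have length r; a2, s2 have length k; yr has shape (r, k)."""
    h = np.outer(gaussian(x1, a1, s1), gaussian(x2, a2, s2))  # agreements
    return h, (h * yr).sum() / h.sum()

def update(x1, x2, y_star, a1, s1, a2, s2, yr,
           alpha=0.01, beta=0.05, gamma=0.65):
    """One gradient-descent step, Equations (5)-(9); arrays change in place."""
    h, y = infer(x1, x2, a1, s1, a2, s2, yr)
    err, hsum = y_star - y, h.sum()
    g1 = (h * (yr - y)).sum(axis=1) / hsum    # row sums over j, for a1/s1
    g2 = (h * (yr - y)).sum(axis=0) / hsum    # column sums over i, for a2/s2
    da1 = alpha * err * 2 * (x1 - a1) / s1 ** 2 * g1        # Eq. (5)
    ds1 = beta  * err * 2 * (x1 - a1) ** 2 / s1 ** 3 * g1   # Eq. (6)
    da2 = alpha * err * 2 * (x2 - a2) / s2 ** 2 * g2        # Eq. (7)
    ds2 = beta  * err * 2 * (x2 - a2) ** 2 / s2 ** 3 * g2   # Eq. (8)
    a1 += da1; s1 += ds1; a2 += da2; s2 += ds2
    yr += gamma * err * h / hsum                            # Eq. (9)
    return 0.5 * err ** 2                                   # Eq. (10)
```

In each training epoch, update would be called for every sample $(x_1, x_2, y^*)$ until the objective function $E$ falls below the chosen threshold.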
Figure 3. Neural network of the neuro-fuzzy learning algorithm.
3 Extraction of Typical Data from Training Data Based on FCM
We now discuss preprocessing of the training data. Learning fuzzy rules based on the fuzzy c-means clustering algorithm (FCM) is now described briefly (Bezdek et al. 1981, 1986).

Assume $x_1$, $x_2$ to be variables in the input space $X = X_1 \times X_2$, and $y$ to be a variable in the output space $Y$. $U \in R^{n \times c}$ is an $n \times c$ matrix and is a fuzzy partition for training data of the form $x^k = (x_1^k, x_2^k, y^{*k})$ ($k=1,2,\dots,n$), where $c$ is the number of clusters. Let $\mu_{ki} \in U$ be a membership function value of the $k$-th vector $x^k$ in the $i$-th cluster with center vector $v_i = (v_1^i, v_2^i, v_3^i) \in R^3$ ($i=1,2,\dots,c$; $2 \le c < n$), subject to the fuzzy partition conditions

$$\mu_{ki} \in [0, 1], \quad k=1,2,\dots,n; \; i=1,2,\dots,c \qquad (11)$$

$$\sum_{i=1}^{c} \mu_{ki} = 1, \quad k=1,2,\dots,n \qquad (12)$$

$$0 < \sum_{k=1}^{n} \mu_{ki} < n, \quad i=1,2,\dots,c \qquad (13)$$

An objective function $J_s$ of FCM used when solving for $v_i$ is defined as follows:

$$J_s(U, v) = \sum_{k=1}^{n} \sum_{i=1}^{c} (\mu_{ki})^s \, \| x^k - v_i \|^2 \qquad (14)$$

where $s$ ($1 < s < \infty$) is a smoothing weight, and $\|\cdot\|$ is an inner product norm.
For minimizing $J_s$, the FCM algorithm proceeds as follows:

Step 1. Give the parameters $c$ and $s$ such that $2 \le c < n$, $1 < s < \infty$, and an inner product norm $\|\cdot\|$. We can take a norm as

$$\| x \|_A^2 = x^T A x \qquad (15)$$

where $A$ is a positive definite matrix.

Step 2. Give an initial matrix $U(0)$ of fuzzy partitions, randomly selected and satisfying the conditions (11)-(13).

Step 3. For $t = 0, 1, 2, \dots$, calculate the cluster centers $v_i$ ($i=1,2,\dots,c$) by using $U(t)$ as follows:

$$v_i(t) = \frac{\sum_{k=1}^{n} (\mu_{ki}(t))^s \, x^k}{\sum_{k=1}^{n} (\mu_{ki}(t))^s} \qquad (16)$$

Step 4. Update $U(t)$ in the following way:

(1) Calculate the sets $I_k$ and $\bar{I}_k$ ($k=1,2,\dots,n$) as

$$I_k = \{\, i \mid 1 \le i \le c,\; \| x^k - v_i(t) \| = 0 \,\} \qquad (17)$$

$$\bar{I}_k = \{1, 2, \dots, c\} - I_k \qquad (18)$$

(2a) When $I_k = \emptyset$,

$$\mu_{ki}(t+1) = \left[\, \sum_{j=1}^{c} \left( \frac{\| x^k - v_i(t) \|^2}{\| x^k - v_j(t) \|^2} \right)^{1/(s-1)} \right]^{-1} \qquad (19)$$

(2b) When $I_k \neq \emptyset$,

$$\mu_{ki}(t+1) = 0 \quad \text{for } i \in \bar{I}_k \qquad (20)$$

$$\sum_{i \in I_k} \mu_{ki}(t+1) = 1 \qquad (21)$$

Step 5. Define a suitable matrix norm, and stop the FCM process if the following condition holds:

$$\| U(t+1) - U(t) \| \le \varepsilon \qquad (22)$$

Otherwise, set $t = t+1$ and return to Step 3. Here $\varepsilon$ is a positive and sufficiently small real number; this is the termination criterion. In this study, we use the matrix norm (Bezdek et al. 1981, 1986):

$$\| U(t+1) - U(t) \| = \max_{k,i} \, | \mu_{ki}(t+1) - \mu_{ki}(t) | \qquad (23)$$
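As an illustration of Steps 1-5, a compact FCM sketch might look as follows (our own Python with NumPy; the Euclidean norm is assumed, i.e., $A$ is taken as the identity matrix):

```python
import numpy as np

def fcm(X, c, s=2.0, eps=2e-5, max_iter=1000, rng=None):
    """Fuzzy c-means per Steps 1-5: X is (n, d) training data,
    c the number of clusters, s the smoothing weight."""
    rng = rng or np.random.default_rng()
    n = len(X)
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)            # conditions (11)-(13)
    for _ in range(max_iter):
        W = U ** s
        V = (W.T @ X) / W.sum(axis=0)[:, None]   # cluster centers, Eq. (16)
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        U_new = np.zeros_like(U)
        for k in range(n):                        # Step 4
            zero = d2[k] == 0.0                   # the set I_k, Eq. (17)
            if zero.any():                        # (2b): all mass on I_k
                U_new[k, zero] = 1.0 / zero.sum()
            else:                                 # (2a), Eq. (19)
                r = (d2[k][:, None] / d2[k][None, :]) ** (1.0 / (s - 1.0))
                U_new[k] = 1.0 / r.sum(axis=1)
        if np.abs(U_new - U).max() <= eps:        # Eqs. (22)-(23)
            return V, U_new
        U = U_new
    return V, U
```

For Example 1 below, the call would be roughly fcm(np.column_stack([x, y]), c=15) on the 30 samples of Table 2.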
For the problems described in Section 1, we can use the fuzzy c-means clustering algorithm to preprocess the training data; this can remove redundant data and resolve conflicts within the data in the following way. Firstly, for a given set of $n$ training data $x^k = (x_1^k, x_2^k, y^{*k})$ ($k=1,2,\dots,n$), using the FCM process we obtain $c$ ($< n$) cluster centers

$$v_i = (v_1^i, v_2^i, v_3^i), \quad i = 1, 2, \dots, c \qquad (24)$$

These can be regarded as typical data extracted from the original training data $x^k = (x_1^k, x_2^k, y^{*k})$ ($k=1,2,\dots,n$), as shown in Figure 4. The $c$ cluster centers are then used as training data for the neuro-fuzzy learning process to learn the fuzzy rules.
The typical data obtained from the original training data $x^k = (x_1^k, x_2^k, y^{*k})$ ($k=1,2,\dots,n$), namely the cluster centers $v_i = (v_1^i, v_2^i, v_3^i)$ ($i=1,2,\dots,c$) created by the FCM process, represent the basic characteristics of the system model. The typical data have at least three valuable characteristics when applied to the neuro-fuzzy learning process as training data. Firstly, conflicts in the original training data are resolved as a result of the distribution of the typical data. Secondly, redundant data are removed from the original data, as no cluster center arises around redundant data. Thirdly, the computational burden is reduced by the reduction in size of the training data set.
Figure 4. Imaginary training data (circles) and cluster centers (squares).
4 Numerical Examples
The improved method is applied to the following nonlinear function of one input variable and one output variable.
Example 1.

$$y = 0.3 + \frac{0.9x}{1.2x^3 + x + 0.3} + \eta \qquad (25)$$

where $x \in [0,1]$ is an input variable, and $\eta \in [0, 0.15]$ is noise.
In Step 1 we assume that $n = 30$, $c = 15$, $s = 2$, and $A$ is a unit matrix; in Step 5, $\varepsilon = 0.00002$. The 30 values of random input-output data are given in Table 2. Using the FCM process, 15 cluster centers are created after 29 iterations (Figure 5 and Table 3).
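For reference, training data of the kind listed in Table 2 could be produced along these lines (a sketch under our own assumptions; the chapter does not state how its random samples were drawn):

```python
import numpy as np

# Sample n random inputs on [0, 1] and evaluate Equation (25)
# with additive noise drawn uniformly from [0, 0.15].
rng = np.random.default_rng()
n = 30
x = rng.random(n)
y = 0.3 + 0.9 * x / (1.2 * x**3 + x + 0.3) + 0.15 * rng.random(n)
```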
Table 2. Original training data for Example 1.

No.   x       y        No.   x       y        No.   x       y
 1    0.646   0.7430   11    0.055   0.4334   21    0.658   0.7121
 2    0.770   0.6908   12    0.209   0.7143   22    0.995   0.7185
 3    0.336   0.7302   13    0.774   0.7078   23    0.025   0.3062
 4    0.569   0.8448   14    0.458   0.7960   24    0.898   0.7030
 5    0.586   0.7723   15    0.285   0.7714   25    0.200   0.6727
 6    0.435   0.7411   16    0.788   0.6499   26    0.955   0.6977
 7    0.750   0.7757   17    0.449   0.7907   27    0.118   0.5574
 8    0.322   0.7722   18    0.641   0.8309   28    0.014   0.3731
 9    0.243   0.6949   19    0.509   0.7451   29    0.652   0.8033
10    0.600   0.7988   20    0.185   0.6920   30    0.775   0.6625
Table 3. Cluster centers $v_i = (v_1^i, v_2^i)$, $i = 1, 2, \dots, 15$, extracted from Table 2 by FCM.
Figure 5. Training data and cluster centers for Example 1.
We then set the initial fuzzy rules as given in Table 4, and adjust them by using the original training data given in Table 2 and the 15 typical training data given in Table 3, respectively, based on the neuro-fuzzy learning algorithm of Equations (5), (6), and (9), where the learning rates are taken as $\alpha = 0.01$, $\beta = 0.05$, $\gamma = 0.65$. A threshold value $\theta$ is used to stop the neuro-fuzzy learning process; its value for this example is $\theta = 0.001$.

Table 4. Initial fuzzy rules for Example 1.

No.   a_{1i}    sigma_{1i}   y_i
1     0.0000    0.1062       0.5000
2     0.2500    0.1062       0.5000
3     0.5000    0.1062       0.5000
4     0.7500    0.1062       0.5000
5     1.0000    0.1062       0.5000
Table 5. Fuzzy rules generated by the typical training data given in Table 3.

No.   a_{1i}    sigma_{1i}   y_i
1     -0.0310   0.0951       0.1329
2      0.2429   0.1855       0.6687
3      0.5019   0.1719       0.8420
4      0.7584   0.1463       0.6658
5      0.9962   0.1362       0.7118
Table 6. Fuzzy rules generated by the original training data given in Table 2.

No.   a_{1i}    sigma_{1i}   y_i
1     -0.0467   0.1309       0.3481
2      0.3376   0.0538       0.6203
3      0.4695   0.1697       1.0084
4      0.6834   0.4756       0.5658
5      0.9169   0.0629       0.8259
In the case where the improved method is used, the objective function $E$ converges after 225 iterations; in the case of the conventional method (Shi and Mizumoto 2000), where the fuzzy rules are tuned without using the FCM process, $E$ converges after 3422 iterations. Thus, two kinds of fuzzy rules are generated, as shown in Tables 5 and 6, respectively. We shall investigate the fuzzy inference results by using these two kinds of fuzzy rules for identifying Example 1. Table 7 shows the comparison between the conventional method (A) and the improved method (B) in terms of the error of evaluation (mean square error) and the maximum absolute error. Here 30 sets of data, chosen in random order, are used for the evaluation.

Table 7. Comparison between the conventional method (A) and the proposed method (B) for identifying Example 1.

Case   Error of evaluation (A)   (B)        Max. absolute error (A)   (B)
 1     0.003291                  0.002499   0.121532                  0.098892
 2     0.003008                  0.002714   0.096834                  0.102163
 3     0.003297                  0.002641   0.121413                  0.110326
 4     0.003119                  0.001912   0.098054                  0.092502
 5     0.002178                  0.002166   0.082210                  0.084828
 6     0.002743                  0.002204   0.120275                  0.108349
 7     0.004881                  0.002094   0.142082                  0.089424
 8     0.003802                  0.002219   0.145719                  0.092419
 9     0.002969                  0.001908   0.112820                  0.079956
10     0.002137                  0.002092   0.108612                  0.073173
Figure 6(a) shows the fuzzy inference results obtained by using the fuzzy rules from the conventional method (Case 7 in Table 7) together with the desired output values. Figure 6(b) shows the fuzzy inference results obtained by using the fuzzy rules from the improved method (Case 7 in Table 7) together with the desired output values.
From the above results we can see that, by preprocessing the training data with the fuzzy c-means clustering algorithm, the learning time is improved and better fuzzy rules are generated by the neuro-fuzzy learning process.
Figure 6. Desired model and fuzzy model for Example 1: (a) desired model and fuzzy model using the conventional method; (b) desired model and fuzzy model using the improved method.
We next compare the improved method with the conventional method using the following nonlinear function, which has two input variables and one output variable:
Example 2.

$$y = \frac{4\sin(\pi x_1) + 2\cos(\pi x_2)}{12} + 0.45 + \eta \qquad (26)$$

where $x_1, x_2 \in [-1, 1]$ are input variables, and $\eta \in [0, 0.05]$ is the noise.

Table 8. Original training data for Example 2.

No.   x1       x2       y        No.   x1       x2       y
 1    -0.146   -0.398   0.6531   26     0.062   -0.580   0.4966
 2    -0.042   -0.918   0.2905   27     0.104    0.534   0.5797
 3     0.740   -0.224   0.8496   28     0.666    0.208   0.8899
 4     0.890   -0.732   0.4855   29     0.020   -0.102   0.6301
 5    -0.160    0.802   0.1950   30    -0.668   -0.938   0.0095
 6     0.334    0.736   0.6630   31     0.406    0.706   0.7179
 7    -0.380    0.598   0.1316   32    -0.990   -0.724   0.3592
 8    -0.642   -0.346   0.2628   33    -0.856    0.072   0.4852
 9     0.888   -0.500   0.6039   34    -0.128    0.122   0.5161
10     0.884   -0.426   0.6182   35    -0.996   -0.210   0.5925
11    -0.846    0.838   0.1754   36    -0.710   -0.954   0.0702
12     0.000   -0.784   0.3643   37     0.982    0.436   0.5506
13     0.738    0.690   0.6317   38    -0.752   -0.794   0.0898
14    -0.500   -0.392   0.1936   39    -0.990    0.836   0.2995
15    -0.064   -0.136   0.5396   40     0.052    0.506   0.5421
16     0.942   -0.474   0.5640   41     0.584   -0.560   0.7861
17    -0.444   -0.314   0.2363   42    -0.890   -0.642   0.2927
18    -0.630    0.006   0.3262   43    -0.010   -0.842   0.2960
19     0.972    0.414   0.5683   44    -0.802   -0.212   0.4288
20    -0.842   -0.216   0.4225   45     0.680    0.992   0.5713
21     0.054    0.424   0.5852   46     0.614    0.032   0.9340
22    -0.906    0.774   0.2586   47    -0.972   -0.886   0.2871
23     0.104    0.478   0.5850   48    -0.918   -0.154   0.5191
24    -0.222    0.268   0.3594   49    -0.682    0.514   0.1914
25    -0.254   -0.164   0.3739   50    -0.904   -0.524   0.3519
As in Example 1, we first assume that the number of training data sets is $n = 50$ and the number of clusters is $c = 40$; the smoothing weight is $s = 2$, and $A$ is a unit matrix at Step 1. The stopping constant is $\varepsilon = 0.000002$ at Step 5. Fifty sets of training data are randomly selected, as shown in Table 8. Using the FCM process, 40 cluster centers of typical data are created after 17 iterations; these are shown in Table 9.

Table 9. Cluster centers extracted from Table 8 by FCM.
No.   v1        v2        v3        No.   v1        v2        v3
 1    -0.8563    0.0710   0.4854    21   -0.1600    0.8019   0.1950
 2    -0.2219    0.2680   0.3595    22   -0.9584    0.8127   0.2841
 3     0.0203   -0.1019   0.6303    23    0.1042    0.4901   0.5840
 4    -0.9727   -0.8793   0.2901    24   -0.0321   -0.8945   0.2922
 5    -0.2540   -0.1640   0.3739    25   -0.1279    0.1220   0.5162
 6     0.5842   -0.5595   0.7861    26    0.0561    0.5080   0.5451
 7    -0.6681   -0.9379   0.0097    27   -0.3880    0.5980   0.1316
 8     0.6555    0.1725   0.8987    28    0.3705    0.7206   0.6909
 9     0.7382   -0.2202   0.8507    29   -0.0008   -0.7882   0.3593
10    -0.6423   -0.3461   0.2630    30   -0.9054   -0.5272   0.3521
11    -0.6821    0.5141   0.1914    31   -0.9734   -0.1940   0.5712
12    -0.5001   -0.3920   0.1937    32    0.7378    0.6897   0.6319
13     0.1228    0.5143   0.5933    33   -0.9038   -0.6532   0.3019
14     0.9769    0.4249   0.5596    34    0.1462   -0.3978   0.6562
15    -0.8490    0.8352   0.1794    35   -0.7102   -0.9538   0.0704
16     0.9044   -0.4678   0.5954    36   -0.4441   -0.3140   0.2363
17    -0.6302    0.0058   0.3264    37   -0.0638   -0.1360   0.5397
18     0.6799    0.9918   0.5714    38    0.0546    0.4246   0.5853
19    -0.7525   -0.7938   0.0904    39    0.8960   -0.7314   0.4857
20    -0.8238   -0.2132   0.4272    40    0.0620   -0.5800   0.4966
Table 10. Initial fuzzy rules for identifying Example 2. Each row label gives the center and width of the fuzzy partition of $A_{1i}$ for $x_1$, and each column label gives the center and width of the fuzzy partition of $A_{2j}$ for $x_2$.

A_{1i} \ A_{2j}   (-1, 0.4247)   (0, 0.4247)   (1, 0.4247)
(-1, 0.2123)      0.5            0.5           0.5
(-0.5, 0.2123)    0.5            0.5           0.5
(0, 0.2123)       0.5            0.5           0.5
(0.5, 0.2123)     0.5            0.5           0.5
(1, 0.2123)       0.5            0.5           0.5
The initial fuzzy rules for identifying Example 2 are defined in Table 10. We then tune the initial fuzzy rules of Table 10 using the 50 sets of training data given in Table 8 and the 40 cluster centers in Table 9, respectively, based on the neuro-fuzzy learning algorithm of Equations (5)-(9). Here, the learning rates are taken as $\alpha = 0.01$, $\beta = 0.05$, and $\gamma = 0.65$; the threshold is set at $\theta = 0.0005$. When using the improved method, the objective function $E$ converges after 17 iterations; in the case of the conventional method, $E$ converges after 35 iterations. Thus, two kinds of fuzzy rule bases have been generated, shown in Tables 11 and 12, respectively.

Table 11. Fuzzy rules generated by the original training data given in Table 8. Row and column labels give the center and width of the fuzzy partitions for $A_{1i}$ and $A_{2j}$.

A_{1i} \ A_{2j}     (-1.0026, 0.4005)   (0.0024, 0.3598)   (0.9983, 0.4242)
(-1.0074, 0.1475)   0.2357              0.3087             0.7128
(-0.4960, 0.2612)   0.0050              0.3163             0.0156
(-0.0032, 0.1303)   0.5940              0.2698             0.3185
( 0.4942, 0.2731)   0.6785              1.0046             0.6225
( 1.0050, 0.1968)   0.8070              0.5050             0.2208
Table 12. Fuzzy rules generated by the typical training data given in Table 9.
Table 13 shows the comparison between the conventional method (A) and the improved method (B) in terms of the error of evaluation and the maximum absolute error; here 100 sets of checking data are employed randomly. We can see from Table 13 that the approximation results obtained by the improved method are better than those obtained by the conventional method, while their maximum absolute errors are not markedly different.

Table 13. Comparison between the conventional method (A) and the improved method (B) for identifying Example 2.

Case   Error of evaluation (A)   (B)        Max. absolute error (A)   (B)
 1     0.001092                  0.001026   0.099568                  0.088656
 2     0.001267                  0.000764   0.142893                  0.072053
 3     0.001936                  0.001311   0.188676                  0.122666
 4     0.001980                  0.001108   0.178458                  0.115130
 5     0.001866                  0.001579   0.171658                  0.096452
 6     0.001106                  0.000954   0.084740                  0.131111
 7     0.001801                  0.000851   0.164983                  0.082657
 8     0.001718                  0.001345   0.225489                  0.135972
 9     0.001569                  0.001327   0.127717                  0.116813
10     0.002501                  0.001532   0.196738                  0.104869
5 Conclusions

In this chapter, we have introduced a neuro-fuzzy learning algorithm with which the tuning parameters in the fuzzy rules can be learned without changing the form of the fuzzy rule table used in conventional fuzzy applications. The neuro-fuzzy learning algorithm is improved by the use of the fuzzy c-means clustering algorithm. The greatest advantage of the improved method is that the original training data can be preprocessed before the fuzzy rules are learned: redundant data are removed and conflicts in the original training data are resolved, making the computation faster, and the fuzzy rules generated are more suitable for the identified system model. Some numerical examples have also been given to illustrate the improved efficiency of this method. In the future, we expect to be able to determine the optimum number of cluster centers corresponding to the training data, probably based upon the characteristics or properties of the identified system model.
References

Bezdek, J.C. (1981), Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York.

Cannon, R.L., Dave, J.V., and Bezdek, J.C. (1986), "Efficient implementation of the fuzzy c-means clustering algorithms," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, pp. 248-255.

Cho, K.B. and Wang, B.H. (1996), "Radial basis function based adaptive fuzzy systems and their applications to system identification and prediction," Fuzzy Sets and Systems, vol. 83, pp. 325-339.

Hayashi, I., Nomura, H., and Wakami, N. (1990), "Acquisition of inference rules by neural network driven fuzzy reasoning," Journal of Japan Society for Fuzzy Theory and Systems, vol. 2, pp. 585-597 (in Japanese).

Horikawa, S., Furuhashi, T., and Uchikawa, Y. (1992), "On fuzzy modeling using fuzzy neural networks with the back-propagation algorithm," IEEE Transactions on Neural Networks, vol. 3, pp. 801-806.

Ichihashi, H. (1991), "Iterative fuzzy modeling and a hierarchical network," Proceedings of the 4th IFSA World Congress, vol. Eng., pp. 49-52.

Ichihashi, H. and Turksen, I.B. (1993), "A neuro-fuzzy approach to data analysis of pairwise comparisons," International Journal of Approximate Reasoning, vol. 9, pp. 227-248.

Ichihashi, H. and Watanabe, T. (1990), "Learning control by a simplified fuzzy reasoning," Journal of Japan Society for Fuzzy Theory and Systems, vol. 2, pp. 429-437 (in Japanese).

Kishida, K., Miyajima, H., Fukumoto, S., and Murashima, S. (1995), "Construction methods of fuzzy modeling based on learning algorithms," Journal of Japan Society for Fuzzy Theory and Systems, vol. 7, pp. 585-593 (in Japanese).
Kosko, B. (1992), Neural Networks and Fuzzy Systems, PrenticeHall, New Jersey.
Lee, S. and Kil, R.M. (1991), "A Gaussian potential function network with hierarchically self-organizing learning," IEEE Transactions on Neural Networks, vol. 4, pp. 207-224.

Maeda, M. and Murakami, S. (1987), "An automobile tracking control with a fuzzy logic," Proceedings of the 3rd Fuzzy System Symposium, Osaka, pp. 61-66 (in Japanese).

Masuda, T. and Yaku, S. (1993), "An acquisition method of fuzzy reasoning rules by neural network with dynamic creative function of hidden units," Journal of Japan Society for Fuzzy Theory and Systems, vol. 5, pp. 348-358 (in Japanese).

Nomura, H., Hayashi, I., and Wakami, N. (1991), "A self-tuning method of fuzzy control by descent method," Proceedings of the 4th IFSA World Congress, Brussels, vol. Eng., pp. 155-158.

Nomura, H., Hayashi, I., and Wakami, N. (1992), "A self-tuning method of fuzzy control by descent method," Proceedings of the IEEE International Conference on Fuzzy Systems, San Diego, pp. 203-210.

Rumelhart, D.E., McClelland, J.L., and the PDP Research Group (1986), Parallel Distributed Processing, MIT Press, Cambridge, MA.

Shi, Y. and Mizumoto, M. (1999), "Self-tuning for fuzzy rule generation based upon fuzzy singleton-type reasoning method," Journal of Advanced Computational Intelligence, vol. 3, pp. 200-206.
Shi, Y. and Mizumoto, M. (2000), "Some considerations on conventional neuro-fuzzy learning algorithms by gradient descent method," Fuzzy Sets and Systems, vol. 112, pp. 51-63.
Shi, Y. and Mizumoto, M. (2000), "A new approach of neuro-fuzzy learning algorithm for tuning fuzzy rules," Fuzzy Sets and Systems, vol. 112, pp. 99-116.
Shi, Y. and Mizumoto, M. (2001), "An improvement of neuro-fuzzy learning algorithm for tuning fuzzy rules," Fuzzy Sets and Systems, vol. 118, pp. 339-350.
Shi, Y., Mizumoto, M., Yubazaki, N., and Otani, M. (1996), "A method of fuzzy rules generation based on neuro-fuzzy learning algorithm," Journal of Japan Society for Fuzzy Theory and Systems, vol. 8, pp. 695-705 (in Japanese).

Shi, Y., Mizumoto, M., Yubazaki, N., and Otani, M. (1996), "A self-tuning method of fuzzy rules based on the gradient descent method," Journal of Japan Society for Fuzzy Theory and Systems, vol. 8, pp. 757-765 (in Japanese).

Shi, Y., Mizumoto, M., Yubazaki, N., and Otani, M. (1996), "A learning algorithm for tuning fuzzy rules based on the gradient descent method," Proceedings of the 5th IEEE International Conference on Fuzzy Systems, New Orleans, vol. 1, pp. 55-61.

Tanaka, K., Kashiwagi, N., and Nakajima, H. (1994), "Identification of a pulse rate prediction model for activity sensing pacer via a fuzzy modeling method," Journal of Japan Society for Fuzzy Theory and Systems, vol. 6, pp. 756-764 (in Japanese).

Wang, L.X. (1994), Adaptive Fuzzy Systems and Control, Prentice-Hall, New Jersey.

Wang, L.X. and Mendel, J.M. (1992), "Back-propagation fuzzy system as nonlinear dynamic system identifiers," Proceedings of the IEEE International Conference on Fuzzy Systems, San Diego, pp. 1409-1416.
Yager, R.R. and Filev, D.P. (1994), Essentials of Fuzzy Modeling and Control, John Wiley & Sons, Canada.

Yager, R.R. and Filev, D.P. (1994), "Generation of fuzzy rules by mountain clustering," Journal of Intelligent & Fuzzy Systems, vol. 2, pp. 209-219.
Chapter 3
Evolutionary Algorithms
Wei-Po Lee and Chao-Hsing Hsu
Evolutionary algorithms are computer algorithms that simulate a natural evolution process to evolve solutions to problems. It has become increasingly popular to employ this kind of algorithm to solve problems in different domains. In this chapter, we will introduce some evolutionary algorithms and the relevant techniques. The first section explains the general concept of evolutionary algorithms; the second and third sections describe in detail various forms of this kind of mechanism and the specific genetic techniques; and the final section characterizes different ways to parallelize the mechanisms of simulated evolution, because they can dramatically reduce execution time and enhance performance.
1 Evolutionary Computation
The theory of natural selection presented by Darwin in his most famous book, The Origin of Species, explains the cause of evolution: in any environment, members of a species struggle and compete for available resources, and it is those most adapted to their surroundings that survive (Darwin 1859). This theory profoundly influenced the early biologists and is currently playing an important role in the study of artificial intelligence. As more and more evidence shows, the key matter in AI research, intelligent computation, often requires adaptive search mechanisms which can dynamically adjust the searching direction according to the features of the problem itself. Evolutionary computation (evolutionary algorithm) techniques were thus developed to construct more dynamic models of computational intelligence.

Evolutionary algorithms are the kind of algorithms which simulate the process of natural evolution to search for the fittest through selection and re-creation over the generations. Based on different philosophies, three main streams of research are currently used in evolutionary computation: evolution strategies, evolutionary programming, and genetic algorithms. Although sharing the same concept of simulated evolution, they were developed independently and emphasize adaptive changes at different levels. Evolution strategies (ES) stress behavioral changes at the level of the individual; evolutionary programming (EP) emphasizes behavioral changes at the level of the species; and genetic algorithms (GAs), including traditional genetic algorithms and the more recent genetic programming, emphasize genetic operations at the chromosome level. We will first explain the concept shared among these approaches, and then describe their individual concerns.

Evolutionary algorithms are population-based optimization processes; they are based on the collective learning process within a population of individuals/species, each of which represents a search point in the space of potential solutions for a given problem. In general, the process of an evolutionary algorithm mainly includes initialization, reproduction, re-creation, and selection. The initialization usually randomly selects a set of initial points in the search space. The number of points varies from a small number (e.g., less than 10) to a large one (e.g., several thousands), depending on the difficulty of the specific problem. By manipulating a population of potential solutions, an evolutionary algorithm can search various regions of the solution space simultaneously. Reproduction is an obvious way to propagate individuals/species; it is accomplished through transferring the genetic features of the members in the current generation to the next generation.
Based on the fitness brought about by the environment, the members with better fitness are favored to reproduce more often than the worse ones. Re-creation is a method to introduce variety into the individuals/species; it adjusts the genetic features which are inherited from the parent population. Different evolutionary algorithms involve different types of re-creation. For example, in evolutionary programming, mutation is strictly the only way to change the members' components, while in genetic algorithms, re-creation usually includes point mutation and crossover (the latter is specifically referred to as recombination in GAs). Selection is the process in which different individuals/species compete for finite resources; in an evolutionary algorithm, this process can be simulated in an entirely random or stochastic way based on the fitness.

Thus, the outline of an evolutionary algorithm can be described as follows. An initial population of candidate solutions is generated at random, and each member is evaluated by a problem-dependent criterion which measures how good this member is for the specific problem. The result is quantified by a fitness function and is used as this member's survival fitness. The fitness function here is similar to a cost function used in other search-based techniques; it normally gives a real value to indicate how fit a candidate solution is for the problem. Essentially, an evolutionary algorithm is trying to maximize/minimize the value given by the fitness function. Once all the candidates in a population are evaluated, an evolutionary algorithm employs a certain selection scheme to select a subset of the current population to act as parents to generate a new population. There are various selection schemes used to choose parents. In general, the members with the best fitness are preferred, but the relatively worse ones are not excluded, in order to maintain the diversity of the population. Therefore, the fitness values corresponding to different members are regarded as probabilities of survival, and certain probability-based methods are used to choose parents. After that, an evolutionary algorithm applies a set of operators on the chosen parents to alter their features to form a subsequent population. As described above, these operations can be recombination or random regeneration, depending on what kind of algorithm (ES, EP, GA) is employed. The above cycle is an iteration (or generation) of an evolutionary mechanism; it is repeated until a certain termination criterion is met (e.g., a solution with a fitness better than the expected value appears, or the algorithm has been executed for a pre-defined number of generations). Figure 1 illustrates the general flow of an evolutionary algorithm.
Figure 1. The general flow of a simulated evolution mechanism.
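As a rough illustration of this flow, a generic evolutionary loop might be organized as in the following Python sketch. The operator names (evaluate, select_parents, recombine, mutate) are placeholders of our own, standing for whichever concrete operators a particular ES, EP, or GA variant supplies.

```python
def evolve(init_population, evaluate, select_parents, recombine, mutate,
           max_generations=100, target_fitness=None):
    """Generic evolutionary loop following Figure 1."""
    population = init_population()
    best, best_fitness = None, float("-inf")
    for generation in range(max_generations):
        # Evaluation: measure the performance of each individual.
        fitnesses = [evaluate(ind) for ind in population]
        # Keep track of the fittest individual seen so far.
        for ind, fit in zip(population, fitnesses):
            if fit > best_fitness:
                best, best_fitness = ind, fit
        # Termination criterion: a good-enough solution has appeared.
        if target_fitness is not None and best_fitness >= target_fitness:
            break
        # Selection: probabilistically choose parents based on fitness.
        parents = select_parents(population, fitnesses)
        # Re-creation: recombine and/or mutate parents to form offspring.
        population = [mutate(child) for child in recombine(parents)]
    return best, best_fitness
```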
The above passage has explained the common concept of an evolutionary algorithm. As mentioned earlier, different evolution-based algorithms emphasize different evolution philosophies, which lead to differences in their implementations. The individual characteristics of the different evolutionary algorithms are briefly described below. GAs are introduced first, in Section 2, because they are the most popular evolutionary algorithms in use and have been successfully applied to many practical problems in different domains, such as communication (Ngo and Li 1998, Sandalidis et al. 1998), circuit design (Drechsler 1998, Mange 2000), information filtering and retrieval (Grossman 1995, Horng and Yeh 2000, Lee et al. 2002), robot design (Nolfi and Floreano 2000, Leger 2000), and molecular design (Clark 2000). Other forms of evolutionary algorithms are then described in Section 3.
2 Genetic Algorithms
Genetic algorithms as they are known today were first developed by John Holland (1975). They are currently the most popular form of modeling evolution processes for optimization problems (Deb 2001, Osyczka 2002). This model emphasizes adaptive changes at the gene level; it treats a member of the population as the chromosome of an individual. A chromosome is constituted by a string of genes, each of which is regarded as carrying a genetic feature of the individual. An individual with better fitness is said to have some genetic features capable of solving the problem, and these features are expected to be propagated to the subsequent population by means of duplicating or recombining the gene sequences of the parent population.

In genetic algorithms, an individual is normally a solution of the specific problem to be solved; it is represented as a fixed-length string, often, but not necessarily, in the form of a bit string. For example, if the problem is to find the maximum value of the function $(15x - x^2)$ in which the parameter $x$ is an integer between 0 and 15, we can then define a 4-bit string representation for the parameter $x$. In other words, in this case a potential solution $x$ is encoded to a binary representation so that further genetic operations can be performed. A gene can also be a floating-point number instead of a binary bit; this is especially useful in the case of evolving neural networks, in which the weights to be evolved are usually floating-point numbers (Yao 1999, Yen and Lu 2002). As can be seen, the most important thing in designing a representation is to ensure that a potential solution in the search space for the problem to be solved can be expressed as a string of the developed representation; this often requires considerable knowledge of and insight into the problem. For simplicity, in this chapter we only take the binary form to explain the relevant techniques in GAs; more details on operating floating-point number forms can be found in (Michalewicz 1996).

After the genetic representation is defined, the initialization phase starts. As mentioned in the previous section, the initialization involves randomly generating a set of individuals, each of which is a possible solution. The set of individuals forms a population, and the number of individuals included is called the population size. The population size is normally pre-defined by the user, and how to pick an appropriate number depends on the user's experience. For example, in the above example, we can generate an initial population with population size 4 that includes the four 4-bit strings (0 0 0 1), (0 1 0 1), (1 0 1 1), (1 0 0 1).
Like other evolutionary algorithms, a GA operates as an iterative, cyclic mechanism which includes a sequence of selecting parents and creating children. Selection involves probabilistically choosing individuals from the current population as parents to generate offspring to form a new population. Selection is generally based on the fitness value of each individual: the fitter, the better. In the above example, the objective function $(15x - x^2)$ can be directly used as the fitness function. To evaluate the four initial individuals listed, we can decode them from binary to decimal form and then calculate their fitness values as 14, 50, 44 and 54, respectively. Therefore, the fourth individual (1 0 0 1) will have a higher chance of being selected. Different selection schemes have been proposed; they are described in a later section. For the creation of a new population, in GAs crossover is the major operator used to create children among the different genetic operators; the operations of reproduction and mutation produce relatively small numbers of offspring and are secondary operators. Different operating techniques for the re-creation of children individuals have been used in the study of GAs, and the most popular ones are described in Section 2.3.
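To illustrate the encoding and evaluation just described, the following sketch (our own Python, not from the chapter) decodes each 4-bit string and computes its fitness under f(x) = 15x - x^2:

```python
def decode(bits):
    """Interpret a bit string such as '1001' as an unsigned integer."""
    return int(bits, 2)

def fitness(bits):
    """Objective function f(x) = 15x - x^2, used directly as fitness."""
    x = decode(bits)
    return 15 * x - x * x

population = ["0001", "0101", "1011", "1001"]
print([fitness(ind) for ind in population])  # -> [14, 50, 44, 54]
```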
2.1 Selection Methods
Evolutionary algorithms simulate natural evolution processes in which the fitter members of the population have higher probabilities of producing offspring genetically. Selection criteria are thus implemented in evolutionary algorithms to play the role of choosing the fitter members for the creation of offspring. There are many different selection methods based on the fitness; they mainly include fitness-proportional selection, rank-based selection, tournament selection, and local selection.

Fitness-proportional selection (also known as roulette wheel selection (Goldberg 1989, Davis 1991)) is the original selection method proposed for genetic algorithms by John Holland (1975). In this method, the probability of an individual being selected as a parent is directly proportional to its fitness value. Figure 2 illustrates the roulette wheel for the example shown in the last section. As can be seen, each individual is given a slice of a circular roulette wheel, and the area of the slice an individual occupies is equal to its fitness ratio. For example, the individual (0 0 0 1) occupies a slice of 9% (= 14/(14+50+54+44)) of the overall area. In the same way, the probabilities of all individuals in the population can be calculated and used in the selection scheme. Though this selection method is easy to use, it causes the selection probabilities to depend strongly on the scaling of the fitness values in the population. For instance, if the worst and the best fitness values in a population are 1 and 10 respectively, the probability of the best individual being selected is ten times that of the worst one; however, if the worst fitness is 1000 and the best fitness is 1010, the probabilities of the best and the worst individuals being selected are almost identical. This undesirable property is due to the fact that the fitness-proportional selection method is not translation invariant. Some scaling methods have been proposed to overcome this property (e.g., (Goldberg 1989, Michalewicz 1996)).
Figure 2. Roulette wheel selection for the individuals id1 (0001), id2 (0101), id3 (1011), and id4 (1001); each occupies a slice proportional to its fitness (9%, 31%, 27%, and 33%, respectively).
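A fitness-proportional (roulette wheel) selection step can be sketched as follows; this is our own illustrative Python rather than code from the chapter:

```python
import random

def roulette_select(population, fitnesses):
    """Pick one parent with probability proportional to fitness."""
    total = sum(fitnesses)
    spin = random.uniform(0.0, total)      # where the wheel stops
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if running >= spin:
            return individual
    return population[-1]                  # guard against rounding error

population = ["0001", "0101", "1011", "1001"]
fitnesses = [14, 50, 44, 54]               # slice sizes on the wheel
parent = roulette_select(population, fitnesses)
```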
Rank-based selection was first suggested by Baker as a way to eliminate the disadvantage of the fitness-proportional selection method (Baker 1985). In a rank-based selection scheme, the individuals in the population are first sorted according to their fitness values; this requirement for global information makes implementations of this selection method slower than those without sorting. The probability of an individual being selected as a parent is now based on its relative rank in the population (rather than on the numerical fitness value). Different types of rank-based selection methods have been developed; the most popular form is the one proposed in (Whitley 1989), in which a bias factor is involved to control the selection pressure. Blickle (1996) and Miller (1996) provide some details about the comparison of the above selection schemes.

In tournament selection, a group of individuals is chosen randomly from the whole population, and the individual with the best fitness value is selected as a parent; a code sketch is given after this passage. The number of individuals in a group is the so-called tournament size. The tournament selection method has become more and more popular because its selection is based on rank and it only uses local information; no extra sorting of the fitness values of the whole population is required as in the rank-based selection method.

Local selection methods are typically used in massively parallel genetic algorithms (or cellular GAs, described in a later section) (e.g., (Collins and Jefferson 1991, Gorges-Schleuter 1992)). In general, an algorithm using this type of selection arranges the individuals on a toroidal, two-dimensional grid, with one individual at each grid position. Selection occurs locally at each grid position: the competition is among a small number of neighboring individuals with this individual as center. A typical implementation is that the parents chosen to produce a new individual at a certain grid position are the ones with the best fitness found during a random walk starting from that position.
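The tournament selection sketch referred to above (plain Python, names our own) is:

```python
import random

def tournament_select(population, fitnesses, tournament_size=2):
    """Choose a random group and return its fittest member as a parent."""
    group = random.sample(range(len(population)), tournament_size)
    winner = max(group, key=lambda i: fitnesses[i])
    return population[winner]
```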
2.2 Creating Offspring

The operation of crossover typically recombines two parent individuals, selected independently, to create two children individuals. In GAs, there is a crossover probability $p_c$ that determines whether this operation should be performed after the two parents are selected for mating. That is, if a randomly generated number for the two selected parents is less than $p_c$, crossover is performed; otherwise the parents are simply reproduced (copied) to the new population unchanged. The recombination is generally implemented as alternately copying gene sequences, separated by randomly chosen crossover points, from the selected parents. Different types of crossover have been invented, among which one-point, two-point, and uniform crossover are the methods most often employed in GAs. Figures 3(a) and (b) illustrate examples of one-point and two-point crossover. As for uniform crossover, the commonly used strategy is that for each pair of genes at a certain position, a random number between 0 and 1 is first generated; if the value is less than 0.5, the two genes of the parents are exchanged, otherwise the genes remain the same.
(a) One-point crossover:
    parent1: 0 0 1 1        child1: 0 0 0 0
    parent2: 1 0 0 0        child2: 1 0 1 1
    (crossover point p1 after the first gene)

(b) Two-point crossover:
    parent1: 0 0 1 1        child1: 0 0 0 1
    parent2: 1 0 0 0        child2: 1 0 1 0
    (crossover points p1 after the first gene and p2 after the third)

Figure 3. (a) and (b), one-point and two-point crossover, in which each $p_i$ is a crossover point.
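The three crossover variants can be sketched in Python as follows (our own illustrative code; each function returns two children from two equal-length parent strings):

```python
import random

def one_point(p1, p2):
    """Swap the tails after a random crossover point, as in Figure 3(a)."""
    cut = random.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def two_point(p1, p2):
    """Swap the middle segment between two points, as in Figure 3(b)."""
    a, b = sorted(random.sample(range(1, len(p1)), 2))
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]

def uniform(p1, p2):
    """Exchange each gene pair independently with probability 0.5."""
    c1, c2 = [], []
    for g1, g2 in zip(p1, p2):
        if random.random() < 0.5:
            g1, g2 = g2, g1
        c1.append(g1)
        c2.append(g2)
    return "".join(c1), "".join(c2)
```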
Selection and crossover operations can effectively search and recombine the chromosomes, but they may generate a population of very similar individuals and make the algorithm get trapped at a local optimum. To maintain the population diversity, mutation is used. As the mutation operator is equivalent to random search, it plays only a minor role in creating offspring in GAs. This operation is usually implemented by introducing a mutation probability $p_m$, typically a small number in the range 0.001 to 0.01, and checking each gene of the children individuals. A random number is generated for each gene, and if the random number is less than $p_m$, the gene is mutated by randomly creating a new value within the corresponding range to replace the original one.
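For a bit-string representation, per-gene mutation reduces to flipping bits; a minimal sketch (ours, with p_m = 0.01 assumed as a typical value) is:

```python
import random

def mutate(bits, p_m=0.01):
    """Flip each bit independently with small probability p_m."""
    return "".join(
        ("1" if b == "0" else "0") if random.random() < p_m else b
        for b in bits
    )
```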
3 Other Evolutionary Algorithms

In addition to genetic algorithms, there are other widely used evolutionary computation methods. In this section, three popular evolutionary algorithms, evolution strategies, evolutionary programming, and genetic programming, are introduced in turn.
3.1 Evolution Strategies and Evolutionary Programming

In the study of simulating evolution, the first kinds of algorithms were independently developed; they include evolution strategies, by Schwefel (1981) and Rechenberg (1973) in Germany, and evolutionary programming, by Lawrence Fogel (1999) in the United States. Unlike GAs, these kinds of evolutionary algorithm emphasize the behavioral link between parents and offspring.

In the ES model, a member of the population is typically considered as an individual which is constituted by a set of parameters (components) to be optimized, and is thus a fixed-length, real-valued string. Unlike genetic algorithms, the ES model emphasizes the behavioral linkage between parents and offspring; each component of a trial solution is thus regarded as a behavioral trait rather than a gene. It is assumed that whatever genetic transformations happen, the resulting change in each behavioral trait will follow a Gaussian distribution with zero mean difference and some standard deviation. In ES, mutation is the primary operator for creating offspring. It is applied to all components simultaneously, and is generally implemented by adding normally distributed random numbers to all components of an individual. The key concept of ES is that it maintains one global step size (i.e., the standard deviation $\sigma$ in the equations below) for each individual. In the ES model, the step size is self-adaptive: each offspring inherits its step size from its parent, and the step size is modified by the logarithm of normal random numbers. With this characteristic, ES are allowed to self-adapt to different fitness landscapes. Therefore, except for the population size, there are no system parameters to be tuned by the designer (Back 1996). It has been shown that, with the way of re-creation described earlier, ES are better alternatives to genetic algorithms for some problems in which epistasis exists among the parameters to be optimized (Saloman 1996). Epistasis describes the interaction (or dependency) of parameters with respect to the fitness of an individual. If it appears, all parameters involved have to be adapted simultaneously so that the overall fitness of an individual can be improved. Hence, with the genetic operators used, GAs are very time consuming for this kind of problem (epistasis drastically slows down the convergence of GAs). Another advantage of using a mutation-based ES is that it can reduce the negative impact of the permutation problem, and the evolutionary process can thus become more efficient (Back 1996).

The multi-member ES model, the $(\mu+\lambda)$-ES, is the ES variant most widely used (e.g., (Lee and Tsai 2003, Yao 1999)); it incorporates the idea of population and self-adaptation of strategy parameters. In this model, $\mu$ is the number of individuals in each population and $\lambda$ is the
Evolutionary Algorithms 79
number of offspring created from the parents, and the best ,u individuals selected from the parents and offspring form the next population. In the traditional ES model, an individual is represented as a vector I = (XI,x2, . . ., x, 01, 0 2 , .. ., on)consisting of n components xi ( I 5 i 5 n) and their corresponding n standard deviations oi( I I i 5 n) for individual Gaussian mutation. During the evolution, in order to create an offspring each individual I is mutated to I ' = (xif, XZ', . . ., x,', oifyo2', .. ., onn') by the following operations: oi'= oi x exp (z'N(0, 1) + t Ni(O, 1)) and
Here, N(0, 1) is a normally distributed one-dimension random number with mean zero and variance one; Ni(O, 1) is a random number for component xi ; z and t f are set to the common used values
(a)-' (a)-', and
respectively.
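The self-adaptive mutation just described can be sketched in a few lines of Python (our own code; tau and tau_prime follow the formulas given above):

```python
import math
import random

def es_mutate(x, sigma):
    """Self-adaptive Gaussian mutation of an ES individual (x, sigma)."""
    n = len(x)
    tau_prime = 1.0 / math.sqrt(2.0 * n)
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(n))
    common = random.gauss(0.0, 1.0)          # one N(0,1) shared by all i
    sigma_new = [s * math.exp(tau_prime * common + tau * random.gauss(0.0, 1.0))
                 for s in sigma]
    x_new = [xi + si * random.gauss(0.0, 1.0)
             for xi, si in zip(x, sigma_new)]
    return x_new, sigma_new
```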
The second operator in ES for creating offspring involves different parents (two or more); it recombines the parameter sets of different parents to create new individuals in several ways. This operation is much like the crossover in standard GAs, but with more options because of the real-valued representation. For example, it can randomly choose values from multiple parents to generate multiple children, or average individual components from parents by a specific weighting strategy. Some implementations also explore whether the selected parents should participate in the competition for survival. More details can be found in (Back 1996).

Similar to evolution strategies, evolutionary programming emphasizes the behavioral linkage between parents and offspring, but EP models evolution at the level of the species rather than at the level of individuals as ES does. In EP, a population is regarded as composed of different species which compete with each other (a member of an EP population is regarded as a kind of species). An offspring is generally similar to its parent at the behavior level, with only slight variation. Thus, different species (members) in the population are considered to have independent behavioral features and are not allowed to mix with each other. In other words, the creation of a new offspring involves only one parent, and the offspring is derived from the selected parent through different forms of mutation.

EP often uses finite state machines as its representation because its original inventor, Lawrence Fogel, holds that this kind of machine involves intelligent behavior, which requires the ability to predict environmental conditions and to generate suitable responses for the given goal (Fogel 1999). As mentioned earlier, no recombination is involved in EP; offspring machines are created by randomly mutating parent machines. This is an attempt to preserve behavioral similarity with the parent, and the motivation behind it is adopted directly from biology, in which an offspring is generally similar to its parent at the behavioral level, with only slight variations (Fogel 1995). Given that a finite state machine has a number of states, an initial state, a collection of transitions between the states, and an output associated with each transition, five possible modes of random mutation are used: adding a state, deleting a state, changing the initial state, changing a state transition, and changing an output symbol on a transition. The selection of the mutation operation is based on the principle that the distribution of child structures should approximate a normal distribution around the parent. Details can be found in (Back 1996, Fogel 1995).
3.2 Genetic Programming

A variant of genetic algorithms, named genetic programming, was more recently invented by John Koza, and its popularity is currently increasing in the community of evolutionary algorithm research (Koza 1992, Koza 1994, Koza 1999). Genetic programming is similar to traditional genetic algorithms in the concept that change mostly happens at the gene level, but differs from them in the representation and the implementation of the corresponding genetic operators. Hence, GP can be regarded as an extension of GA: it applies techniques used in GA to evolve dynamic-length tree structures rather than the fixed-length strings of standard GA. In GP, an individual is represented as a tree; this is inspired by the fact that a program in any computer programming language can be expressed as a parse tree with respect to the syntax of the language. GP aims to evolve dynamic and executable structures, often interpreted as computer programs, to solve problems without explicit programming.
As in computer programming, a tree structure in GP is constituted by a set of non-terminals which are the internal nodes of the trees, and a set of terminals which are the external nodes (leaves) of the trees. The construction of a tree is based on the syntactical rules whch extend a root node to appropriate symbols (non-terminal and/or terminals) and each non-terminal is extended again by suitable rules accordingly, until all the branches in a tree end up with terminals. The search space in genetic programming is the space of all possible tree structures which are composed of non-termjnals and terminals. Therefore, one of the most important step in developing a GP system is to define the sets of non-terminals (functions) and terminals that will comprise the evolving tree. As in a rewriting system in which a non-terminal will be substituted by a set of symbols, in GP, each function included in the function set must take a specified number of arguments for the corresponding branches in a tree. Terminals take no argument by definition. In this way, functions and terminals occupy the internal and external nodes of the trees, respectively; and the overall structure of a tree is determined by the number of arguments to the functions. For example, if a function set is defined as {+, -,*> in which each of the finctions has two argu-
ments and performs the usual arithmetic operation, and if a terminal set is defined as {X, Y, R}, in which X, Y are numerical variables and R represents the set of real numbers, typical tree individuals look like the ones in Figure 4. A tree is equivalent to a parse tree of the kind most compilers construct internally to represent a given computer program. Thus, evolving individuals in this form is equivalent to manipulating computer programs genetically.
Figure 4. Examples of randomly generated tree individuals. The function set is {+, -, *} and the terminal set is {X, Y, R}. It also provides an example of crossover, in which new trees are created by swapping sub-trees of the parents.
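Before turning to the design issues, it may help to see how such tree individuals can be represented in code. The following is a minimal C sketch of a tree node for the function set {+, -, *} and terminal set {X, Y, R}; the type and field names are illustrative assumptions, not a prescribed representation.

/* Node kinds for the function set {+, -, *} and terminal set {X, Y, R} */
typedef enum { ADD, SUB, MUL, VAR_X, VAR_Y, CONST_R } NodeKind;

typedef struct Node {
    NodeKind kind;
    float value;                /* used only when kind == CONST_R */
    struct Node *left, *right;  /* children, used only for ADD, SUB, MUL */
} Node;

/* Recursively evaluate a tree for given values of X and Y;
   this mirrors how a parse tree is interpreted. */
float eval(const Node *n, float x, float y)
{
    switch (n->kind) {
    case ADD:     return eval(n->left, x, y) + eval(n->right, x, y);
    case SUB:     return eval(n->left, x, y) - eval(n->right, x, y);
    case MUL:     return eval(n->left, x, y) * eval(n->right, x, y);
    case VAR_X:   return x;
    case VAR_Y:   return y;
    case CONST_R: return n->value;
    }
    return 0.0f;  /* not reached for well-formed trees */
}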
Considering the development of a tree representation, there are two desirable properties in defining functions and terminals to form the tree individuals in GP (Koza 1992). One is sufficiency, which is to ensure that the defined functions and terminals are capable of expressing a solution to the problem. The other is the closure property, which guarantees the consistency of the data value or type returned by a
function or a terminal. That is, each function in the function set should be well defined so as to accept any combination of components (functions or terminals) it may encounter. Depending on the problem, defining a function set and terminal set which satisfy the property of sufficiency may be obvious or may require considerable insight. In some domains, the requirements for sufficiency are well known. For example, for the task of evolving combinational logic to achieve some specific goal, a function set including the logic operators and, or, and not is known to be sufficient for realizing any Boolean function, and the terminals can be defined to include the related input variables. However, sometimes it is not clear how to define these sets; some knowledge and understanding of the problem is necessary. This knowledge and understanding is not related to any specific theory, but to the problem itself. As in traditional computer programming, each function and terminal
in GP has an associated type, such as an integer or Boolean. When a function or a terminal is called by others, it returns a value to the calling function, and the types of the values passed through the tree must be consistent to guarantee that the tree is executable. Koza thus defines the closure property such that each of the defined functions is able to accept as its arguments any value or data type that may possibly be returned by any function or terminal. With this property, new offspring can be created by using the crossover operator to swap sub-trees at arbitrary points, and the resulting trees will still be syntactically correct. In general, a straightforward way to achieve the property of closure is to use a single return type for all functions and terminals, and to carefully handle certain special situations, such as calculating the result of a numerical variable divided by zero or the square root of a negative number. Typically, GP users have to pre-define some special operations, such as returning a constant in the former case or
giving the square root of the absolute value of a negative variable in the latter, to deal with cases like these.
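Such special-case handling is often packaged as "protected" versions of the risky operators. The following is a minimal C sketch, assuming the two conventions just mentioned; the particular constant returned on division by zero is an arbitrary choice for illustration.

#include <math.h>

/* Protected division: returns a constant (here 1.0) when the divisor
   is zero, so that closure is preserved for any argument values. */
float pdiv(float a, float b)
{
    return (b == 0.0f) ? 1.0f : a / b;
}

/* Protected square root: takes the square root of the absolute value,
   so that negative arguments do not break closure. */
float psqrt(float a)
{
    return (float)sqrt(fabs(a));
}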
As GP uses a dynamic tree structure as its representation, creating an initial population involves determining the length (depth) and the manner of growing a tree. In practice, it is necessary to restrict the size of a tree when using GP to solve problems: without a limitation on size, tree evaluation will soon saturate the computational resources. In general, the limitation on tree size can be imposed by specifying a maximal depth or a maximal number of nodes for a tree. As noted in (Koza 1992), any reasonable limitation on tree size will not be a factor affecting the development of trees, since the number of trees in the search space will still be extremely large. In fact, for most of the experiments in his books, Koza uses 6 and 17 as the default constraints on tree depth in initialization and after crossover, respectively, and he claims that these values are enough to evolve the expected solutions for most of the problems described there. Two ways are normally used to generate trees: one is the full method and the other is the grow method. The former creates a full tree in which the length of the path from the root node to any terminal is equal to the specified maximum depth. The latter allows a tree to grow arbitrarily, subject to the maximum depth constraint; in this case, the sizes and shapes of the trees generated by the grow method can be quite different. After the above steps, which are the kernel events in the application of genetic programming, the evolutionary mechanism breeds individual tree structures to solve problems by executing the steps described in Figure 1. As in conventional genetic algorithms, reproduction and crossover are the two primary operators in GP for creating offspring. When reproduction happens, it involves only one parent tree, which is selected from the parent population according to a cer-
tain selection criterion, and this parent tree is simply copied to become one of the new population members without any alteration. Crossover is the major operator for creating most of the offspring. It recombines two selected parent trees to generate two children. When this operation happens, two trees are chosen from the current population based on their fitness, then a randomly selected sub-tree is identified in each of them and the sub-trees are swapped. A typical crossover operation is illustrated in Figure 4. Here, mutation plays only a minor role in creating a new population. This operator acts on one tree; it picks a sub-tree from the selected parent, deletes it, and generates a new sub-tree at random to substitute for the removed one. This new evolutionary approach has demonstrated its potential by evolving programs in a wide range of applications, such as circuit design (Koza 1999, Fernandez et al. 2001), network design (Gruau 1995), computer animation and art (Sims 1994, Rooke 2002), and bioinformatics (Ando et al. 2002, Ross 2001).
4
Parallelizing Simulated Evolution
Although evolutionary algorithms have proven to be promising search approaches and have been applied successfully to different problem domains, two of their inherent features must be improved in order to solve more difficult problems. The first is premature convergence. As is well known, one of the attractive features of an evolutionary algorithm is that it can quickly concentrate its search on promising areas of the solution space. But this feature sometimes has the negative effect that the EA loses population diversity before the goal is met; in other words, the EA converges to local optima. Although the mutation operator can help maintain population diversity, it is a destructive operation: with a high mutation rate, an EA can increase diversity, but good solutions may also be lost. Thus, a method which
can maintain population diversity to explore new areas of the solution space without destroying the current results is desired. The second feature which has to be improved is the computation time. As we know, an EA is a population-based approach; it has to evaluate all the population members before it can select the fittest to survive. Basically, the population size must be reasonably large in order to allow an EA to search the space globally, and the population size typically increases with the difficulty of the problem. As a result, an inordinate amount of time may be required to perform all the evaluations for a hard problem. Parallelizing EAs has been proposed by different researchers and has proven to be a promising method to overcome both of the above problems. The parallelism divides the big population of a sequential EA into multiple smaller sub-populations which are distributed to separate processors and can then be evaluated simultaneously. According to the sub-population size, parallel EAs are categorized into two types: coarse-grain (e.g., (Tanese 1989, Koza and Andre 1995, Lee and Hallam 1999)) and fine-grain (e.g., (Spiessens and Manderick 1991, Gordon and Whitley 1993, Baluja 1993)). The characteristics of the two different types of parallel EAs are described individually below.
4.1
Coarse-Grain Models
A coarse-grain EA divides the whole population into a small number of sub-populations, each of which is evaluated by an independent EA on a separate processor. The sub-populations are kept relatively isolated from each other, so this kind of distributed EA is also called an island model EA. The island model was first proposed by Tanese (1989). In this model, each sub-population is manipulated by a sequential EA, and the selection and genetic operations are limited to the local sub-
populations. A communication phase is introduced in island model EAs. The idea is that the EA periodically selects some promising individuals from each sub-population and sends them to different sub-populations, according to certain criteria. This operation is called migration, and the selected individuals are called migrants. In this way, an EA has a higher possibility of maintaining population diversity and protecting good solutions found locally. The coarse-grain models allow us to run several populations at the same time on their own processors to speed up the process of evolution, and some research results have reported that parallel versions of genetic systems found better solutions than comparable serial ones, because their search in multiple directions maintains the diversity of the population. For the coarse-grain model, the most popular configuration arranges the sub-populations as a binary n-cube; Figure 5 shows some examples of different n-cubes. In such configurations, migration normally happens only between immediate neighbors, along different dimensions of the hypercube, and the communication phase sends a certain number of the best individuals of each sub-population to substitute for the same number of worst individuals of its immediate neighbors at a regular interval (e.g., every ten generations).
Figure 5. The n-cube models for a distributed EA. From left to right: n = 1, 2, 3, respectively.
As can be seen, running a coarse-grain EA involves the determination of some parameters: the topology of the distributed system, the migration rate, and the migration interval. Topology defines the
connections between different sub-populations; the migration rate determines the number of individuals to be migrated from one sub-population to others; and the migration interval controls how often the communication phase should happen. Most of the work in the study of coarse-grain EAs determines these parameters through empirical study; theoretical analysis is still needed.
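As an illustration of how these three parameters enter the code, the following is a minimal C sketch of one communication phase on a ring topology (a 1-cube). The population layout, the fitness arrays, and a migration rate of one individual are assumptions made only for this example.

#define NUM_ISLAND 4    /* number of sub-populations */
#define SUB_SIZE   25   /* individuals per sub-population */
#define CL         32   /* chromosome length */

int   pop[NUM_ISLAND][SUB_SIZE][CL];  /* sub-populations */
float fit[NUM_ISLAND][SUB_SIZE];      /* fitness values */

/* index of the best individual of an island */
static int arg_best(int isl)
{
    int i, b = 0;
    for (i = 1; i < SUB_SIZE; i++)
        if (fit[isl][i] > fit[isl][b]) b = i;
    return b;
}

/* index of the worst individual of an island */
static int arg_worst(int isl)
{
    int i, w = 0;
    for (i = 1; i < SUB_SIZE; i++)
        if (fit[isl][i] < fit[isl][w]) w = i;
    return w;
}

/* One communication phase: each island sends a copy of its best
   individual to its right neighbor, replacing that neighbor's worst.
   This would be called once every "migration interval" generations. */
void migrate(void)
{
    int isl, j;
    for (isl = 0; isl < NUM_ISLAND; isl++) {
        int dst = (isl + 1) % NUM_ISLAND;   /* ring topology */
        int src = arg_best(isl);
        int rep = arg_worst(dst);
        for (j = 0; j < CL; j++)
            pop[dst][rep][j] = pop[isl][src][j];
        fit[dst][rep] = fit[isl][src];
    }
}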
4.2
Fine-Grain Models
The other type of parallelism is the fine-grain model, in which an EA is implemented to be massively parallel: the originally large population is divided into a large number of sub-populations, each of which includes only a small number of individuals. This model is designed to take advantage of machines with a large number of processors (1000 or more). In the ideal case, each individual is evaluated on a different processor. Figure 6 shows the grid topology generally used for the fine-grain model. In the fine-grain model, the whole population is viewed as numerous small overlapping sub-populations in which each individual belongs to multiple sub-populations. Selection and mating are restricted to occur only between an individual and its localized neighborhood (i.e., those individuals within a certain range).
Figure 6. The grid topology of the fine-grain model.
The network topology in a fine-grain EA profoundly affects the performance of such a system. If the connectivity among the sub-populations is high, local optima will spread quickly to the entire population. This situation is more serious than in coarse-grain models or sequential ones, because the population size in this model is quite small, which makes it easy for local optima to dominate the sub-populations. How to constrain the interactions between sub-populations is yet to be investigated. Although there has been some research work trying to compare the performance of coarse-grain and fine-grain models of parallelism, the results have been inconclusive (Cantu-Paz 2000). For example, in (Baluja 1993), the author prefers the fine-grain model, while in (Gordon 1993), the results favor the coarse-grain algorithms. From their results, we can see that the choice of coarse-grain or fine-grain is application-dependent, and it also relies on the availability of parallel machines.
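To make the neighborhood restriction concrete, the sketch below picks a mate for a grid cell from its four immediate neighbors on a toroidal grid, with fitness-proportional probability. The grid size, the neighborhood shape, and the assumption of positive fitness values are all illustrative choices, not part of any specific fine-grain algorithm.

#include <stdlib.h>

#define GRID 8   /* an 8x8 toroidal grid, one individual per cell */

float fit[GRID][GRID];   /* fitness of the individual at each cell */

#define RAND ((float)rand()/((float)RAND_MAX+1.0))

/* Pick a mate for the individual at (r, c) from its four immediate
   neighbors, with probability proportional to (positive) fitness. */
void pick_mate(int r, int c, int *mr, int *mc)
{
    int nr[4] = { (r+GRID-1)%GRID, (r+1)%GRID, r, r };
    int nc[4] = { c, c, (c+GRID-1)%GRID, (c+1)%GRID };
    float sum = 0.0f, pick;
    int i;

    for (i = 0; i < 4; i++)
        sum += fit[nr[i]][nc[i]];
    pick = RAND * sum;
    for (i = 0; i < 3; i++) {          /* the 4th neighbor is the fallback */
        pick -= fit[nr[i]][nc[i]];
        if (pick <= 0.0f) break;
    }
    *mr = nr[i];
    *mc = nc[i];
}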
5
Summary
In this chapter, we have briefly introduced the concept of evolutionary computation and its most popular forms, including genetic algorithms, evolutionary strategies, evolutionary programming and genetic programming. The relevant operating techniques, such as selection and the creation of offspring, have also been explained. Different methods for parallelizing sequential evolutionary algorithms have then been characterized; they are essential for enhancing performance and reducing computation cost. The principles described in this chapter are relatively general for developing an evolutionary mechanism, and they have been widely used to solve problems in many domains. To tackle more complex tasks, researchers have been exploring more advanced techniques. For example, Tuson and Ross have proposed an approach to adjust the probabilities of the genetic operators during a GA run, according to a measure of operator performance (Tuson 1998). Harik and Lobo have considered some aspects of the theory of GAs, including population sizing, the schema theorem, building block mixing and genetic drift, and developed a parameter-less GA (Harik 1999). In addition, some work has explored the effect of co-evolution, in which multiple sub-populations evolve together to achieve an overall goal (Potter 2000, Rosin 1995). Unlike distributed GAs, in which individuals compete with each other, in the co-evolutionary paradigm each sub-population represents only part of the overall solution, and individuals from different sub-populations must cooperate to form a complete solution for the task. The co-evolutionary model is introduced to provide reasonable opportunities for solutions to evolve in the form of interacting co-adapted sub-components. It is expected to enhance the performance and effectiveness of traditional evolutionary algorithms.
References

Ando, S., Sakamoto, E., and Iba, H. (2002), "Evolutionary modeling and inference of gene network," Information Sciences, vol. 145, pp. 237-259.

Back, T. (1996), Evolutionary Algorithms in Theory and Practice, Oxford University Press, New York.

Baker, J.E. (1985), "Adaptive selection methods for genetic algorithms," Proceedings of the First International Conference on Genetic Algorithms, pp. 101-111.

Baluja, S. (1993), "Structure and performance of fine-grain parallelism in genetic search," Proceedings of the Fifth International Conference on Genetic Algorithms, pp. 155-162.

Blickle, T. and Thiele, L. (1996), "A comparison of selection schemes used in evolutionary algorithms," Evolutionary Computation, vol. 4, pp. 361-394.

Cantu-Paz, E. (2000), Efficient and Accurate Parallel Genetic Algorithms, Kluwer Academic Publishers, Boston, MA.

Clark, D.E. (ed.) (2000), Evolutionary Algorithms in Molecular Design, Wiley-VCH Verlag.

Collins, J.R. and Jefferson, D.R. (1991), "Selection in massively parallel genetic algorithms," Proceedings of the Fourth International Conference on Genetic Algorithms, pp. 249-256.

Darwin, C. (1859), The Origin of Species, John Murray.

Davis, L. (1991), Handbook on Genetic Algorithms, Van Nostrand Reinhold, New York.

Deb, K. (2001), Multi-Objective Optimization Using Evolutionary Algorithms, John Wiley and Sons.

Drechsler, R. (1998), Evolutionary Algorithms for VLSI CAD, Kluwer, Boston, MA.

Fernandez, F., Sanchez, J.M., and Tomassini, M. (2001), "Placing and routing circuits on FPGAs by means of parallel and distributed genetic programming," Proceedings of the Fourth International Conference on Evolvable Systems: From Biology to Hardware, pp. 204-215.

Fogel, D.B. (1995), Evolutionary Computation: Towards a New Philosophy of Machine Intelligence, IEEE Press.

Fogel, L.J. (1999), Intelligence through Simulated Evolution: Forty Years of Evolutionary Programming, John Wiley and Sons.

Goldberg, D.E. (1992), Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley.

Gordon, V. and Whitley, D. (1993), "Serial and parallel genetic algorithms as function optimizers," Proceedings of the Fifth International Conference on Genetic Algorithms, pp. 177-183.

Gorges-Schleuter, M. (1992), "Comparison of local mating strategies in massively parallel genetic algorithms," Proceedings of Parallel Problem Solving from Nature II, pp. 553-561.

Grossman, D. (1995), Integrating Structured Data and Text: A Relational Approach, PhD thesis, George Mason University.

Gruau, F. (1995), "Automatic definition of modular neural networks," Adaptive Behavior, vol. 3, pp. 151-183.

Harik, G.R. and Lobo, F.G. (1999), "A parameter-less genetic algorithm," Proceedings of the Genetic and Evolutionary Computation Conference, pp. 258-265.

Holland, J. (1975), Adaptation in Natural and Artificial Systems, The University of Michigan Press.

Horng, J.T. and Yeh, C.-C. (2000), "Applying genetic algorithms to query optimization in document retrieval," Information Processing and Management, vol. 36, pp. 737-759.

Koza, J.R. (1992), Genetic Programming: On the Programming of Computers by Means of Natural Selection, MIT Press.

Koza, J.R. (1994), Genetic Programming II: Automatic Discovery of Reusable Programs, MIT Press.

Koza, J.R. and Andre, D. (1995), "Parallel genetic programming on a network of transputers," Technical Report CS-TR-95-1542, Stanford University.

Koza, J.R., Bennett, F., Andre, D., Keane, M., and Brave, S. (1999), Genetic Programming III: Darwinian Invention and Problem Solving, Morgan Kaufmann Publishers.

Lee, W.-P. and Hallam, J. (1999), "Evolving reliable and robust controllers for real robots by genetic programming," Soft Computing, vol. 3, pp. 63-75.

Lee, W.-P., Liu, C.-H., and Lu, C.-C. (2002), "Intelligent agent-based systems for personalized recommendations in internet commerce," Expert Systems with Applications, vol. 22, pp. 275-284.

Lee, W.-P. and Tsai, T.-C. (2003), "An interactive agent-based system for concept-based web search," Expert Systems with Applications, vol. 24, pp. 365-373.

Leger, C. (2001), DARWIN2K: An Evolutionary Approach to Automated Design for Robotics, Kluwer Academic Publishers.

Mange, D. (2000), "Toward robust integrated circuits: the embryonics approach," Proceedings of the IEEE, vol. 88, pp. 516-543.

Michalewicz, Z. (1996), Genetic Algorithms + Data Structures = Evolution Programs, third edition, Springer.

Miller, B.L. and Goldberg, D.E. (1996), "Genetic algorithms, tournament selection, and the effects of noise," Complex Systems, vol. 9, pp. 193-212.

Ngo, C.Y. and Li, V.O.K. (1998), "Fixed channel assignment in cellular radio networks using a modified genetic algorithm," IEEE Transactions on Vehicular Technology, vol. 47, pp. 163-171.

Nolfi, S. and Floreano, D. (2000), Evolutionary Robotics: The Biology, Intelligence, and Technology of Self-Organizing Machines, MIT Press.

Osyczka, A. (2002), Evolutionary Algorithms for Single and Multicriteria Design Optimization, Physica-Verlag.

Potter, M.A. and DeJong, K.A. (2000), "Cooperative coevolution: an architecture for evolving coadapted subcomponents," Evolutionary Computation, vol. 8, pp. 1-29.

Rechenberg, I. (1973), Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution, Frommann-Holzboog Verlag.

Rooke, S. (2002), "Eons of genetically evolved algorithmic images," in P.J. Bentley and D.W. Corne (eds.), Creative Evolutionary Systems, pp. 339-365, Morgan Kaufmann Publishers.

Rosin, C.D. and Belew, R.K. (1995), "Methods for competitive co-evolution: finding opponents worth beating," Proceedings of the Sixth International Conference on Genetic Algorithms, pp. 373-380.

Ross, B.J. (2001), "The evaluation of a stochastic regular motif language for protein sequences," Proceedings of the Genetic and Evolutionary Computation Conference, pp. 120-128.

Salomon, R. (1996), "Reevaluating genetic algorithm performance under coordinate rotation of benchmark functions: a survey of some theoretical and practical aspects of genetic algorithms," BioSystems, vol. 39, pp. 263-278.

Sandalidis, H.G., Stavroulakis, P.P., and Rodriguez-Tellez, J.R. (1998), "An efficient evolutionary algorithm for channel resource management in cellular mobile systems," IEEE Transactions on Evolutionary Computation, vol. 2, pp. 125-137.

Schwefel, H.-P. (1981), Numerical Optimization of Computer Models, John Wiley and Sons.

Sims, K. (1994), "Evolving virtual creatures," Computer Graphics, Annual Conference Series (SIGGRAPH '94), pp. 15-22.

Spiessens, P. and Manderick, B. (1991), "A massively parallel genetic algorithm: implementation and first analysis," Proceedings of the Fourth International Conference on Genetic Algorithms, pp. 279-285.

Tuson, A. and Ross, P. (1998), "Adapting operator settings in genetic algorithms," Evolutionary Computation, vol. 6, pp. 161-184.

Whitley, D. (1989), "The GENITOR algorithm and selection pressure: why rank-based allocation of reproductive trials is best," Proceedings of the Third International Conference on Genetic Algorithms, pp. 116-121.

Yao, X. (1999), "Evolving artificial neural networks," Proceedings of the IEEE, vol. 87, pp. 1423-1447.

Yen, G. and Lu, H. (2002), "Hierarchical genetic algorithms for near-optimal feedforward neural network design," International Journal of Neural Systems, vol. 12, pp. 31-43.
Chapter 4
A Tutorial on Meta-Heuristics for Optimization
Shu-Chuan Chu, Chin-Shiuh Shieh, and John F. Roddick

Nature has inspired computing and engineering researchers in many different ways. Natural processes have been emulated through a variety of techniques including genetic algorithms, ant systems and particle swarm optimization, as computational models for optimization. In this chapter, we discuss these meta-heuristics from a practitioner's point of view, emphasizing the fundamental ideas and their implementations. After presenting the underlying philosophy and algorithms, detailed implementations are given, followed by some discussion of alternatives.
1
Introduction
Optimization problems arise in almost every field, ranging from academic research to industrial application. In addition to other (arguably more conventional) optimization techniques, meta-heuristics, such as genetic algorithms, particle swarm optimization and ant colony systems, have received increasing attention in recent years for their interesting characteristics and their success in solving problems in a number of realms. In this tutorial, we discuss these meta-heuristics from a practitioner's point of view. After a brief explanation of the underlying philosophy and a discussion of the algorithms, detailed implementations in C are given, followed by some implementation notes and possible alternatives.
The discussion is not intended to be a comprehensive review of related topics, but a compact guide for implementation, with which readers can put these meta-heuristics to work and experience their power in a timely fashion. The C programming language was adopted because of its portability and availability. The coding style is deliberately made as accessible as possible, so that interested readers can easily transfer the code into any other programming language as preferred.¹ In general, an optimization problem can be modeled as follows:
\max_{\bar{x} \in X} F(\bar{x}) \quad \text{or} \quad \min_{\bar{x} \in X} F(\bar{x}), \qquad \bar{x} = (x_1, x_2, \ldots, x_n)

where F(x̄) is the object function subject to optimization, and X is the domain of the independent variables x1, x2, ..., xn. We are asked to find a certain configuration of x̄ = (x1, x2, ..., xn) that maximizes or minimizes the object function F(x̄). The optimization task can be challenging for several reasons, such as the high dimensionality of x̄, constraints imposed on x̄, non-differentiability of F(x̄), and the existence of local optima.
2
Genetic Algorithms
Based on long-term observation, Darwin asserted his theory of natural evolution. In the natural world, creatures compete with each other for limited resources. Those individuals that survive the competition have the opportunity to reproduce and generate descendants. In so doing, any exchange of genes may result in superior or inferior descendants, with the process of natural selection eventually filtering out inferior individuals and retaining those best adapted to their environment.

¹ The code segments can be downloaded from http://kdm.first.flinders.edu.au/IDM/
Inspired by Darwin's theory of evolution, Holland (Holland 1975, Goldberg 1989) introduced the genetic algorithm as a powerful computational model for optimization. Genetic algorithms work on a population of potential solutions, in the form of chromosomes, and try to locate a best solution through the process of artificial evolution, which consists of repeated artificial genetic operations, namely evaluation, selection, crossover and mutation.
A multi-modal object function F1(x, y), as shown in Figure 1, is used to illustrate this. The global optimum is located at approximately F1(1.9931, 1.9896) = 4.2947.

F_1(x,y) = \frac{4}{(x-2)^2 + (y-2)^2 + 1} + \frac{3}{(x-2)^2 + (y+2)^2 + 1} + \frac{2}{(x+2)^2 + (y-2)^2 + 1}

Figure 1. Object function F1.
The first design issue in applying genetic algorithms is to select an adequate coding scheme to represent potential solutions in the search
space in the form of chromosomes. Among other alternatives, such as expression trees for genetic programming (Willis et al. 1997) and city index permutations for the travelling salesperson problem, binary string coding is widely used for numerical optimization. Figure 2 gives a typical binary string coding for the test function F1. Each genotype has 16 bits to encode an independent variable. A decoding function maps the 65536 possible combinations of b15 ... b0 onto the range [-5, 5) linearly. A chromosome is then formed by cascading the genotypes for each variable. With this coding scheme, any 32-bit binary string stands for a legal point in the problem domain.
Figure 2. A binary coding scheme for F1: 16 bits for x followed by 16 bits for y.
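For concreteness, the linear decoding map implied by this scheme can be written as

x = \frac{\sum_{i=0}^{15} b_i \, 2^i}{2^{16}} \times 10 - 5

so that the all-zero genotype decodes to -5 and the all-one genotype decodes to a value just below 5; the same map is applied to the 16 bits encoding y.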
A second issue is to decide the population size. The choice of population size, N, is a tradeoff between solution quality and computation cost. A larger population size will maintain higher genetic diversity and therefore a higher possibility of locating the global optimum, but at a higher computational cost. The operation of genetic algorithms is outlined as follows:
Step 1 Initialization Each bit of all N chromosomes in the population is randomly set to 0 or 1. This operation in effect spreads the chromosomes randomly across the problem domain. Whenever possible, it is suggested to incorporate any a priori knowledge of the search space into the initialization process to endow the genetic algorithm with a better starting point.
Step 2 Evaluation Each chromosome is decoded and evaluated according to the given object function. The fitness value, f_i, reflects the degree of success chromosome c_i can achieve in its environment.
Step 3 Selection Chromosomes are stochastically picked to form the population for the next generation based on their fitness values. The selection is done by roulette wheel selection with replacement, as follows:
\Pr(c_i \text{ is selected}) = \frac{f_i^{SF}}{\sum_{j=1}^{N} f_j^{SF}} \qquad (5)
The selection factor, SF, controls the discrimination between superior and inferior chromosomes by reshaping the landscape of the fitness function. As a result, better chromosomes will have more copies in the new population, mimicking the process of natural selection. In some applications, the best chromosome found is always retained in the next generation to ensure that its genetic material remains in the gene pool. Step 4 Crossover Pairs of chromosomes in the newly generated population are subject to a crossover (or swap) operation with probability P_C, called the Crossover Rate. The crossover operator generates new chromosomes by exchanging the genetic material of a pair of chromosomes across randomly selected sites, as depicted in Figure 3. Similar to the process of natural breeding, the newly generated chromosomes can be better or worse than their parents. They will be tested in the subsequent selection process, and only those which are an improvement will thrive.
Figure 3. Crossover operation: 1-site crossover and 2-site crossover.
Step 5 Mutation After the crossover operation, each bit of all chromosomes is subjected to mutation with probability P_M, called the Mutation Rate. Mutation flips bit values and introduces new genetic material into the gene pool. This operation is essential to avoid the entire population converging to a single instance of a chromosome, since crossover becomes ineffective in such situations. In most applications, the mutation rate should be kept low; it acts as a background operator to prevent genetic algorithms from random walking. Step 6 Termination Checking Genetic algorithms repeat Step 2 to Step 5 until a given termination criterion is met, such as a pre-defined number of generations being reached or the quality of the solution failing to improve for a given number of generations. Once terminated, the algorithm reports the best chromosome it found. Program 1 is an implementation of a genetic algorithm. Note that, for the sake of program readability, a variable of int type is used to store a single bit. A more compact representation is possible with slightly trickier genetic operators. The result of applying Program 1 to the object function F1 is reported
in Figure 4. With a population of size 10, after 20 generations, the genetic algorithm was capable of locating a near-optimal solution at F1(1.9853, 1.9810) = 4.2942. Readers should be aware that, due to the stochastic nature of genetic algorithms, the same program may produce different results on different machines.
Figure 4. The result of applying Program 1 to the object function F1.
Although the operation of genetic algorithms is quite simple, they do have some important traits: they search from a population of points rather than a single point; they use the object function directly, not its derivative; and they use probabilistic rules to guide the search toward promising regions. In effect, genetic algorithms maintain a population of candidate solutions and conduct stochastic searches via information selection and
exchange. It is well recognized that, with genetic algorithms, near-optimal solutions can be obtained at justified computation cost. However, it is difficult for genetic algorithms to pinpoint the global optimum. In practice, a hybrid approach is recommended, incorporating gradient-based or local greedy optimization techniques. In such an integration, genetic algorithms act as coarse-grain optimizers and the gradient-based methods as fine-grain ones. The power of genetic algorithms originates from the chromosome coding and the associated genetic operators. It is worth paying attention to these issues so that genetic algorithms can explore the search space more efficiently. The selection factor controls the discrimination between superior and inferior chromosomes. In some applications, more sophisticated reshaping of the fitness landscape may be required. Other selection schemes (Whitley 1993), such as rank-based selection or tournament selection, are possible alternatives for controlling discrimination. Numerous variants with different application profiles have been developed following the standard genetic algorithm. Island-model genetic algorithms, or parallel genetic algorithms (Abramson and Abela 1991), attempt to maintain genetic diversity by splitting a population into several sub-populations, each of which evolves independently and occasionally exchanges information with the others. Multiple-objective genetic algorithms (Gao et al. 2000, Fonseca and Fleming 1993, Fonseca and Fleming 1998) attempt to locate all near-optimal solutions by carefully controlling the number of copies of superior chromosomes, such that the population will not be dominated by the single best chromosome (Sareni and Krahenbuhl 1998). Co-evolutionary systems (Handa et al. 2002, Bull 2001) have two or more independently evolved populations, where the object function for each population is not static but a dynamic function of the current states of the other populations. This architecture vividly models interacting systems, such as prey and predator, or virus and immune system.
3
Ant Systems
Inspired by the food-seeking behavior of real ants, ant systems, attributable to Dorigo et al. (Dorigo et al. 1996), have demonstrated themselves to be an efficient and effective tool for combinatorial optimization problems. In nature, a real ant wandering in its surrounding environment will leave a biological trace, called pheromone, on its path. The intensity of the deposited pheromone biases the path-taking decisions of subsequent ants. A shorter path will possess a higher pheromone concentration and therefore encourage subsequent ants to follow it. As a result, an initially irregular path from nest to food will eventually contract to a shorter path. With appropriate abstraction and modification, this observation has led to a number of successful computational models for combinatorial optimization. The operation of ant systems can be illustrated by the classical Travelling Salesman Problem (see Figure 5 for an example), in which a travelling salesman looks for a route that covers all cities with minimal total distance. Suppose there are n cities and m ants. The entire algorithm starts with the initial pheromone intensity set to τ0 on all edges. In every subsequent ant system cycle, or episode, each ant begins its trip from a randomly selected starting city and is required to visit every city exactly once (a Hamiltonian circuit). The experience gained in this phase is then used to update the pheromone intensity on all edges. The operation of ant systems is given below: Step 1 Initialization Initial pheromone intensities on all edges are set to τ0. Step 2 Walking phase In this phase, each ant begins its trip from a randomly selected starting city and is required to visit every city exactly once. When an ant, the k-th ant for example, is located at city r and needs to
Figure 5. A traveling salesman problem with 12 cities.
decide the next city s, the path-taking decision is made stochastically based on the following probability function:

p_k(r,s) = \begin{cases} \dfrac{\tau(r,s)\,\eta(r,s)^{\beta}}{\sum_{u \in J_k(r)} \tau(r,u)\,\eta(r,u)^{\beta}}, & \text{if } s \in J_k(r); \\ 0, & \text{otherwise}. \end{cases} \qquad (6)

where τ(r, s) is the pheromone intensity on the edge between cities r and s; the visibility η(r, s) = 1/δ(r, s) is the reciprocal of the distance δ(r, s) between cities r and s; and J_k(r) is the set of unvisited cities for the k-th ant. According to Equation 6, an ant will favour a nearer city or a path with higher pheromone intensity; β is a parameter used to control the relative weighting of these two factors. During the circuit, the route made by each ant is recorded for the pheromone updating in Step 3. The best route found so far is also tracked.
Step 3 Updating phase
The experience accumulated in Step 2 is then used to modify the pheromone intensity through the following updating rule:

\tau(r,s) \leftarrow (1-\alpha)\,\tau(r,s) + \sum_{k=1}^{m} \Delta\tau_k(r,s)

\Delta\tau_k(r,s) = \begin{cases} 1/L_k, & \text{if } (r,s) \in \text{route made by ant } k; \\ 0, & \text{otherwise}. \end{cases}

where 0 < α < 1 is a parameter modelling the evaporation of pheromone, L_k is the length of the route made by the k-th ant, and Δτ_k(r, s) is the pheromone trace contributed by the k-th ant to edge (r, s). The updated pheromone intensities are then used to guide the path-taking decisions in the next ant system cycle. It can be expected that, as the ant system cycles proceed, the pheromone intensities on the edges will converge to values reflecting their potential for being components of the shortest route: the higher the intensity, the greater the chance of being a link in the shortest route, and vice versa. Step 4 Termination Checking Ant systems repeat Step 2 to Step 3 until certain termination criteria are met, such as a pre-defined number of episodes having been performed or the algorithm having failed to make improvements for a certain number of episodes. Once terminated, the ant system reports the shortest route found.
Program 2 at the end of this chapter is an implementation of an ant system. The results of applying Program 2 to the test problem in Figure 5 are given in Figures 6 and 7. Figure 6 reports a found shortest route of length 3.308, which is truly the shortest route, as validated by exhaustive search. Figure 7 gives a snapshot of the pheromone intensities after 20 episodes. A higher intensity is represented by a
wider edge. Notice that intensity alone cannot be used as a criterion for judging whether a link is a constituent part of the shortest route or not, since the shortest route relies on the cooperation of other links.
Figure 6. The shortest route found by the ant system.
A close inspection of the ant system reveals that the heavy computation required may make it prohibitive in certain applications. Ant colony systems were introduced by Dorigo et al. (Dorigo and Gambardella 1997) to remedy this difficulty. Ant colony systems differ from the simpler ant system in the following ways:
Figure 7. The snapshot of pheromone intensities after 20 episodes.

• There is explicit control on exploration and exploitation. When an ant is located at city r and needs to decide the next city s, there are two modes for the path-taking decision, namely exploitation and biased exploration. Which mode is used is governed by a random variable 0 < q < 1 compared against a threshold parameter q_0. In exploitation mode (q ≤ q_0), the ant deterministically takes the most attractive edge:

s = \arg\max_{u \in J_k(r)} \left\{ \tau(r,u)\,\eta(r,u)^{\beta} \right\}

In biased exploration mode (q > q_0), the next city is chosen stochastically according to the probability function of Equation 6.
• Local updating. A local updating rule is applied whenever an edge from city r to city s is taken:

\tau(r,s) \leftarrow (1-\rho)\,\tau(r,s) + \rho\,\Delta\tau(r,s) \qquad (10)

where Δτ(r, s) = τ0 = (n · L_nn)^{-1}, with L_nn a rough estimate of the circuit length calculated using the nearest neighbor heuristic, and 0 < ρ < 1 is a parameter modeling the evaporation of pheromone.
• Count only the shortest route in global updating. As all ants complete their circuits, the shortest route found in the current episode is used in the global updating rule:

\tau(r,s) \leftarrow (1-\alpha)\,\tau(r,s) + \alpha\,\Delta\tau(r,s) \qquad (11)

\Delta\tau(r,s) = \begin{cases} 1/L_{gb}, & \text{if } (r,s) \in \text{global best route}; \\ 0, & \text{otherwise}. \end{cases}

where L_gb is the length of the shortest route. In some respects, the ant system has implemented the idea of emergent computation: a global solution emerges as distributed agents perform local transactions, which is the working paradigm of real ants. The success of ant systems in combinatorial optimization makes them a promising tool for dealing with a large set of problems in the NP-complete class (Papadimitriou and Steiglitz 1982). In addition, the work of Wang and Wu (Wang and Wu 2001) has extended the applicability of ant systems further into continuous search spaces. Chu et al. (2003) have proposed a parallel ant colony system, in which groups of ant colonies explore the search space independently and exchange their experiences at certain time intervals.
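To make the two decision modes concrete, the following is a minimal C sketch of the ant colony system state-transition rule. The array names follow the notation above, while the value of the threshold q0 and the surrounding data structures are assumptions introduced only for this illustration.

#include <stdlib.h>
#include <math.h>

#define N_CITY 12
#define BETA   2.0
#define Q0     0.9   /* exploitation threshold (assumed value) */

float tau[N_CITY][N_CITY];   /* pheromone intensities */
float dist[N_CITY][N_CITY];  /* distances between cities */
int   visited[N_CITY];       /* visiting status of the current ant;
                                assumed to include the current city r */

#define RAND ((float)rand()/((float)RAND_MAX+1.0))

/* ACS state-transition rule: the next city for an ant at city r */
int next_city(int r)
{
    int s, pickdx = -1;
    float q = RAND, w, best_w = -1.0f, sum = 0.0f, pick;

    if (q <= Q0) {                    /* exploitation mode */
        for (s = 0; s < N_CITY; s++)
            if (!visited[s]) {
                w = tau[r][s] * (float)pow(1.0/dist[r][s], BETA);
                if (w > best_w) { best_w = w; pickdx = s; }
            }
        return pickdx;
    }
    /* biased exploration mode: roulette wheel over Equation 6 */
    for (s = 0; s < N_CITY; s++)
        if (!visited[s])
            sum += tau[r][s] * (float)pow(1.0/dist[r][s], BETA);
    pick = RAND * sum;
    for (s = 0; s < N_CITY; s++)
        if (!visited[s]) {
            pickdx = s;   /* remember the last feasible city as fallback */
            pick -= tau[r][s] * (float)pow(1.0/dist[r][s], BETA);
            if (pick <= 0.0f) break;
        }
    return pickdx;
}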
4
Particle Swarm Optimization
Some social systems of natural species, such as flocks of birds and schools of fish, possess interesting collective behavior. In these systems, globally sophisticated behavior emerges from local, indirect communication amongst simple agents with only limited capabilities. In an attempt to simulate this flocking behavior by computers, Kennedy and Eberhart (1995) realized that an optimization problem can be formulated as that of a flock of birds flying across an
area seeking a location with abundant food. This observation, together with some abstraction and modification techniques, led to the development of a novel optimization technique: particle swarm optimization. Particle swarm optimization optimizes an object function by conducting a population-based search. The population consists of potential solutions, called particles, which are a metaphor for birds in flocks. These particles are randomly initialized and freely fly across the multi-dimensional search space. During flight, each particle updates its velocity and position based on the best experience of its own and of the entire population. The updating policy drives the particle swarm toward regions with higher object values, and eventually all particles will gather around the point with the highest object value. The detailed operation of particle swarm optimization is given below:
Step 1 Initialization The velocities and positions of all particles are randomly set within pre-specified or legal ranges. Step 2 Velocity Updating At each iteration, the velocities of all particles are updated according to the following rule:

\vec{v}_i \leftarrow w\,\vec{v}_i + c_1 R_1 (\vec{p}_{i,best} - \vec{p}_i) + c_2 R_2 (\vec{p}_{gbest} - \vec{p}_i)

where p̄_i and v̄_i are the position and velocity of particle i, respectively; p̄_{i,best} and p̄_{gbest} are the positions with the best object value found so far by particle i and by the entire population, respectively; w is a parameter controlling the dynamics of flying; R_1 and R_2 are random variables drawn from the range [0,1]; and c_1 and c_2 are factors used to control the relative weighting of the corresponding terms. The inclusion of random variables endows particle swarm optimization with the ability of stochastic searching. The weight-
ing factors, c_1 and c_2, compromise the inevitable tradeoff between exploration and exploitation. After the updating, v̄_i should be checked and clamped to the pre-specified range to avoid violent random walking.
Step 3 Position Updating Assuming a unit time interval between successive iterations, the positions of all particles are updated according to the following rule:

\vec{p}_i \leftarrow \vec{p}_i + \vec{v}_i \qquad (13)

After updating, p̄_i should be checked and coerced to the legal range to ensure legal solutions.
Step 4 Memory Updating Update p̄_{i,best} and p̄_{gbest} when the corresponding condition is met:

\vec{p}_{i,best} \leftarrow \vec{p}_i \;\text{ if } f(\vec{p}_i) > f(\vec{p}_{i,best}); \qquad \vec{p}_{gbest} \leftarrow \vec{p}_i \;\text{ if } f(\vec{p}_i) > f(\vec{p}_{gbest}) \qquad (14)

where f(p̄) is the object function subject to maximization.
Step 5 Termination Checking The algorithm repeats Steps 2 to 4 until certain termination conditions are met, such as a pre-defined number of iterations being reached or a failure to make progress for a certain number of iterations. Once terminated, the algorithm reports p̄_{gbest} and f(p̄_{gbest}) as its solution. Program 3 at the end of this chapter is a straightforward implementation of the algorithm above. To experience the power of particle swarm optimization, Program 3 is applied to the following test function, visualized in Figure 8:

F_2(x,y) = -x \sin\left(\sqrt{|x|}\right) - y \sin\left(\sqrt{|y|}\right), \qquad -500 < x, y < 500 \qquad (15)
The global optimum is at F2(-420.97, -420.97) = 837.97.

Figure 8. Object function F2.
In the tests above, both learning factors, c_1 and c_2, are set to a value of 2, and a variable inertia weight w is used according to the suggestion of Shi and Eberhart (1999). Figure 9 reports the progress of particle swarm optimization on the test function F2(x, y) for the first 300 iterations. At the end of 1000 iterations, F2(-420.97, -420.96) = 837.97 is located, which is close to the global optimum. It is worthwhile to look into the dynamics of particle swarm optimization. Figure 10 presents the distribution of particles at different iterations. There is a clear trend that particles start from their initial positions and fly toward the global optimum. Numerous variants have been introduced since the first particle swarm
Figure 9. Progress of PSO on object function F2.
optimization. A discrete binary version of the particle swarm optimization algorithm was proposed by Kennedy and Eberhart (1997). Shi and Eberhart (2001) applied fuzzy theory to the particle swarm optimization algorithm, and the concept of co-evolution has been successfully incorporated in solving min-max problems (Shi and Krohling 2002). Chu et al. (2003) have proposed a parallel architecture with communication mechanisms for information exchange among independent particle groups, with which solution quality can be significantly improved.
5
Discussions and Conclusions
The nonstop process of evolution has successfully driven natural species to develop effective solutions to a wide range of problems.
Figure 10. The distribution of particles at different iterations: (a) 0th iteration; (b) 10th iteration; (c) 100th iteration; (d) 300th iteration.
Genetic algorithms, ant systems, and particle swarm optimization, all inspired by nature, have also proved themselves to be effective solutions to optimization problems. However, readers should remember that, despite the robustness these approaches claim, there is no panacea. As discussed in the previous sections, there are control parameters involved in these meta-heuristics, and an adequate setting of these parameters is a key to success. In general, some kind of trial-and-error tuning is necessary for each particular instance of an optimization problem. In addition, these meta-heuristics should not be considered in isolation. Prospective users should consider the possibility of hybrid approaches and the integration of gradient-based methods, which are promising directions deserving further study.
References

Abramson, D. and Abela, J. (1991), "A parallel genetic algorithm for solving the school timetabling problem," Technical Report, Division of Information Technology, CSIRO.

Bull, L. (2001), "On coevolutionary genetic algorithms," Soft Computing, vol. 5, no. 3, pp. 201-207.

Chu, S.C., Roddick, J.F., and Pan, J.S. (2003), "Parallel particle swarm optimization algorithm with communication strategies," personal communication.

Chu, S.C., Roddick, J.F., Pan, J.S., and Su, C.J. (2003), "Parallel ant colony systems," 14th International Symposium on Methodologies for Intelligent Systems, LNCS, Springer-Verlag (to appear).

Dorigo, M., Maniezzo, V., and Colorni, A. (1996), "The ant system: optimization by a colony of cooperating agents," IEEE Trans. on Systems, Man, and Cybernetics - Part B, vol. 26, no. 2, pp. 29-41.

Dorigo, M. and Gambardella, L.M. (1997), "Ant colony system: a cooperative learning approach to the traveling salesman problem," IEEE Trans. on Evolutionary Computation, vol. 1, no. 1, pp. 53-66.

Fonseca, C.M. and Fleming, P.J. (1993), "Multiobjective genetic algorithms," IEE Colloquium on Genetic Algorithms for Control Systems Engineering, no. 1993/130, pp. 6/1-6/5.

Fonseca, C.M. and Fleming, P.J. (1998), "Multiobjective optimization and multiple constraint handling with evolutionary algorithms I: a unified formulation," IEEE Trans. on Systems, Man and Cybernetics - Part A, vol. 28, no. 1, pp. 26-37.

Gao, Y., Shi, L., and Yao, P. (2000), "Study on multi-objective genetic algorithm," Proceedings of the Third World Congress on Intelligent Control and Automation, pp. 646-650.

Goldberg, D.E. (1989), Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Reading, MA.

Handa, H., Baba, M., Horiuchi, T., and Katai, O. (2002), "A novel hybrid framework of coevolutionary GA and machine learning," International Journal of Computational Intelligence and Applications, vol. 2, no. 1, pp. 33-52.

Holland, J. (1975), Adaptation in Natural and Artificial Systems, University of Michigan Press.

Kennedy, J. and Eberhart, R. (1995), "Particle swarm optimization," IEEE International Conference on Neural Networks, pp. 1942-1948.

Papadimitriou, C.H. and Steiglitz, K. (1982), Combinatorial Optimization - Algorithms and Complexity, Prentice Hall.

Sareni, B. and Krahenbuhl, L. (1998), "Fitness sharing and niching methods revisited," IEEE Trans. on Evolutionary Computation, vol. 2, no. 3, pp. 97-106.

Shi, Y. and Eberhart, R.C. (2001), "Fuzzy adaptive particle swarm optimization," Proceedings of the 2001 Congress on Evolutionary Computation (CEC 2001), pp. 101-106.

Shi, Y. and Krohling, R.A. (2002), "Co-evolutionary particle swarm optimization to solve min-max problems," Proceedings of the 2002 Congress on Evolutionary Computation (CEC 2002), vol. 2, pp. 1682-1687.

Wang, L. and Wu, Q. (2001), "Ant system algorithm for optimization in continuous space," IEEE International Conference on Control Applications (CCA 2001), pp. 395-400.

Whitley, D. (1993), A Genetic Algorithm Tutorial, Technical Report CS-93-103, Department of Computer Science, Colorado State University, Fort Collins, CO.

Willis, M.-J., Hiden, H.G., Marenbach, P., McKay, B., and Montague, G.A. (1997), "Genetic programming: an introduction and survey of applications," Proceedings of the Second International Conference on Genetic Algorithms in Engineering Systems: Innovations and Applications (GALESIA '97), pp. 314-319.
Program 1. An implementation of a genetic algorithm in C.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define MG 50     /* Maximal Number of Generations */
#define N  10     /* Population Size */
#define CL 32     /* Number of bits in each chromosome */
#define SF 2.0    /* Selection Factor */
#define CR 0.5    /* Crossover Rate */
#define MR 0.05   /* Mutation Rate */

/* Macro for random number between 0 and 1 */
#define RAND ((float)rand()/((float)RAND_MAX+1.0))

int   c[N][CL];    /* Population of Chromosomes */
float f[N];        /* Fitness Value of Chromosomes */
int   best_c[CL];  /* Best Chromosome */
float best_f;      /* Best Fitness Value */

/* Decode Chromosome */
void decode(int chromosome[CL], float *x, float *y)
{
  int j;
  /* Decode the first 16 bits for variable x */
  (*x)=0.0;
  for(j=0;j<CL/2;j++)
    (*x)=(*x)*2.0+chromosome[j];
  (*x)=(*x)/pow(2.0,16.0)*10.0-5.0;
  /* Decode the remaining 16 bits for variable y */
  (*y)=0.0;
  for(j=CL/2;j<CL;j++)
    (*y)=(*y)*2.0+chromosome[j];
  (*y)=(*y)/pow(2.0,16.0)*10.0-5.0;
}

/* Object Function */
float object(float x, float y)
{
  return(4.0/((x-2.0)*(x-2.0)+(y-2.0)*(y-2.0)+1.0)
        +3.0/((x-2.0)*(x-2.0)+(y+2.0)*(y+2.0)+1.0)
        +2.0/((x+2.0)*(x+2.0)+(y-2.0)*(y-2.0)+1.0));
}

void main(void)
{
  int   i,k;         /* Indices for Chromosomes */
  int   j;           /* Index for Genes */
  int   gen;         /* Index for Generations */
  float x,y;         /* Independent Variables */
  int   site;        /* Crossover Site */
  float tmpf;        /* Temporary Variable */
  int   tmpi;        /* Temporary Variable */
  int   tmpc[N][CL]; /* Temporary Population */
  float p[N];        /* Selection Probability */

  /* Set random seed */
  srand(4);

  /* Initialize Population */
  best_f=-1.0e99;
  for(i=0;i<N;i++)
    /* Randomly set each gene to '0' or '1' */
    for(j=0;j<CL;j++)
      c[i][j]=(RAND<0.5)?0:1;

  /* Repeat Genetic Algorithm cycle for MG times */
  for(gen=0;gen<MG;gen++)
  {
    /* Evaluation */
    for(i=0;i<N;i++)
    {
      decode(c[i],&x,&y);
      f[i]=object(x,y);
      /* Update best solution */
      if(f[i]>best_f)
      {
        best_f=f[i];
        for(j=0;j<CL;j++)
          best_c[j]=c[i][j];
      }
    }

    /* Selection */
    /* Evaluate Selection Probability */
    tmpf=0.0;
    for(i=0;i<N;i++)
    {
      p[i]=pow(f[i],SF);
      tmpf=tmpf+p[i];
    }
    for(i=0;i<N;i++)
      p[i]=p[i]/tmpf;
    /* Roulette wheel selection with replacement */
    for(i=0;i<N;i++)
    {
      tmpf=RAND;
      for(k=0;tmpf>p[k];k++)
        tmpf=tmpf-p[k];
      /* Chromosome k is selected */
      for(j=0;j<CL;j++)
        tmpc[i][j]=c[k][j];
    }
    /* Copy temporary population to population */
    for(i=0;i<N;i++)
      for(j=0;j<CL;j++)
        c[i][j]=tmpc[i][j];

    /* Crossover */
    for(i=0;i<N-1;i+=2)
      if(RAND<CR)
      {
        /* Swap the genes of a pair of chromosomes before a random site */
        site=RAND*CL;
        for(j=0;j<site;j++)
        {
          tmpi=c[i][j];
          c[i][j]=c[i+1][j];
          c[i+1][j]=tmpi;
        }
      }

    /* Mutation */
    for(i=0;i<N;i++)
      for(j=0;j<CL;j++)
        if(RAND<MR)
          c[i][j]=1-c[i][j];
  }

  /* Report Solution */
  decode(best_c,&x,&y);
  printf("F(%f,%f)=%f\n",x,y,object(x,y));
}
Program 2. An implementation of an ant system in C.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define Map          "map.txt"   /* file name of city map */
#define NumberOfCity 12          /* number of cities */
#define NumberOfAnt  10          /* number of ants */
#define alpha        0.2         /* pheromone decay factor */
#define beta         2.0         /* tradeoff factor between pheromone and distance */
#define tau0         0.01        /* initial intensity of pheromone */
#define EpisodeLimit 20          /* limit of episodes */
#define Route        "route.txt" /* file name for route map */

/* Macro for random number between 0 and 1 */
#define RAND ((float)rand()/((float)RAND_MAX+1.0))

typedef struct {
  float x; /* x coordinate */
  float y; /* y coordinate */
} CityType;

typedef struct {
  int route[NumberOfCity]; /* visiting sequence of cities */
  float length;            /* length of route */
} RouteType;

CityType  city[NumberOfCity];                    /* city array */
float     delta[NumberOfCity][NumberOfCity];     /* distance matrix */
float     eta[NumberOfCity][NumberOfCity];       /* weighted visibility matrix */
float     tau[NumberOfCity][NumberOfCity];       /* pheromone intensity matrix */
RouteType BestRoute;                             /* shortest route */
RouteType ant[NumberOfAnt];                      /* ant array */
float     p[NumberOfCity];                       /* path-taking probability array */
int       visited[NumberOfCity];                 /* array for visiting status */
float     delta_tau[NumberOfCity][NumberOfCity]; /* sum of change in tau */

void main(void)
{
  FILE *mapfpr;   /* file pointer for city map */
  int r,s;        /* indices for cities */
  int k;          /* index for ants */
  int episode;    /* index for ant system cycles */
  int step;       /* index for routing steps */
  float tmp;      /* temporary variable */
  FILE *routefpr; /* file pointer for route map */

  /* Set random seed */
  srand(1);

  /* Read city map */
  mapfpr=fopen(Map,"r");
  for(r=0;r<NumberOfCity;r++)
    fscanf(mapfpr,"%f %f",&city[r].x,&city[r].y);
  fclose(mapfpr);

  /* Evaluate distance matrix */
  for(r=0;r<NumberOfCity;r++)
    for(s=0;s<NumberOfCity;s++)
      delta[r][s]=sqrt((city[r].x-city[s].x)*(city[r].x-city[s].x)
                      +(city[r].y-city[s].y)*(city[r].y-city[s].y));

  /* Evaluate weighted visibility matrix */
  for(r=0;r<NumberOfCity;r++)
    for(s=0;s<NumberOfCity;s++)
      if(r!=s)
        eta[r][s]=pow(1.0/delta[r][s],beta);

  /* Initialize pheromone on edges */
  for(r=0;r<NumberOfCity;r++)
    for(s=0;s<NumberOfCity;s++)
      tau[r][s]=tau0;

  /* Initialize best route */
  BestRoute.route[0]=0;
  BestRoute.length=0.0;
  for(r=1;r<NumberOfCity;r++)
  {
    BestRoute.route[r]=r;
    BestRoute.length+=delta[r-1][r];
  }
  BestRoute.length+=delta[NumberOfCity-1][0];

  /* Repeat ant system cycle for EpisodeLimit times */
  for(episode=0;episode<EpisodeLimit;episode++)
  {
    /* Initialize ants' starting city */
    for(k=0;k<NumberOfAnt;k++)
      ant[k].route[0]=RAND*NumberOfCity;

    /* Let all ants proceed for NumberOfCity-1 steps */
    for(step=1;step<NumberOfCity;step++)
    {
      for(k=0;k<NumberOfAnt;k++)
      {
        /* Evaluate path-taking probability array for ant k
           at the current time step */
        r=ant[k].route[step-1];
        /* Clear visited list of ant k */
        for(s=0;s<NumberOfCity;s++)
          visited[s]=0;
        /* Mark visited cities of ant k */
        for(s=0;s<step;s++)
          visited[ant[k].route[s]]=1;
        tmp=0.0;
        for(s=0;s<NumberOfCity;s++)
        {
          if(visited[s]) p[s]=0.0;
          else p[s]=tau[r][s]*eta[r][s];
          tmp+=p[s];
        }
        /* Make the path-taking decision by roulette wheel */
        tmp=RAND*tmp;
        for(s=0;tmp>p[s];s++)
          tmp-=p[s];
        ant[k].route[step]=s;
      }
    }

    /* Update pheromone intensity */
    /* Reset matrix for sum of change in tau */
    for(r=0;r<NumberOfCity;r++)
      for(s=0;s<NumberOfCity;s++)
        delta_tau[r][s]=0.0;
    for(k=0;k<NumberOfAnt;k++)
    {
      /* Evaluate route length */
      ant[k].length=0.0;
      for(r=1;r<NumberOfCity;r++)
        ant[k].length+=delta[ant[k].route[r-1]][ant[k].route[r]];
      ant[k].length+=delta[ant[k].route[NumberOfCity-1]][ant[k].route[0]];
      /* Evaluate contributed delta_tau */
      for(r=1;r<NumberOfCity;r++)
        delta_tau[ant[k].route[r-1]][ant[k].route[r]]+=1.0/ant[k].length;
      delta_tau[ant[k].route[NumberOfCity-1]][ant[k].route[0]]+=1.0/ant[k].length;
      /* Update best route */
      if(ant[k].length<BestRoute.length)
      {
        BestRoute.length=ant[k].length;
        for(r=0;r<NumberOfCity;r++)
          BestRoute.route[r]=ant[k].route[r];
      }
    }
    /* Update pheromone matrix */
    for(r=0;r<NumberOfCity;r++)
      for(s=0;s<NumberOfCity;s++)
        tau[r][s]=(1.0-alpha)*tau[r][s]+delta_tau[r][s];
    printf("%d %f\n",episode,BestRoute.length);
  }

  /* Write route map */
  routefpr=fopen(Route,"w");
  for(r=0;r<NumberOfCity;r++)
    fprintf(routefpr,"%f %f\n",
            city[BestRoute.route[r]].x,city[BestRoute.route[r]].y);
  fclose(routefpr);
}
Sample data, map.txt, for Program 2.

0.997375 0.731415
0.078674 0.854553
0.199432 0.613770
0.287476 0.518707
0.465729 0.843231
0.091431 0.634186
0.666321 0.678162
0.856720 0.204895
0.599670 0.773651
0.430420 0.288818
0.555573 0.890839
0.479309 0.478363
Program 3. An implementation of particle swarm optimization in C language.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define IterationLimit 1000    /* maximal number of iterations */
#define PopulationSize 40      /* population size: number of particles */
#define Dimension      2       /* dimension of search space */
#define wU             0.9     /* upper bound of inertia weight */
#define wL             0.4     /* lower bound of inertia weight */
#define c1             2.0     /* acceleration factor 1 */
#define c2             2.0     /* acceleration factor 2 */
#define Vmax           1000.0  /* maximal velocity */
/* Macro for random number between 0 and 1 */
#define RAND ((float)rand()/((float)RAND_MAX+1.0))

typedef struct {
    float x[Dimension];        /* position */
    float v[Dimension];        /* velocity */
    float fitness;             /* fitness */
    float best_x[Dimension];   /* individual best solution */
    float best_fitness;        /* individual best fitness */
} ParticleType;

ParticleType p[PopulationSize];  /* particle array */
float gbest_x[Dimension];        /* global best solution */
float gbest_fitness;             /* global best fitness */

/* Schwefel function (to be maximized) */
float Schwefel(float x[Dimension])
{
    int i;
    float tmp;
    tmp = 0.0;
    for (i = 0; i < Dimension; i++)
        tmp += x[i] * sin(sqrt(fabs(x[i])));
    return tmp;
}

void main(void)
{
    int   i;     /* index for particle */
    int   d;     /* index for dimension */
    float w;     /* inertia weight */
    int   step;  /* index for PSO cycle */

    /* Set random seed */
    srand(1);

    /* Initialize particles */
    gbest_fitness = -1.0e30f;
    for (i = 0; i < PopulationSize; i++) {
        for (d = 0; d < Dimension; d++) {
            p[i].x[d] = p[i].best_x[d] = RAND * 1000.0 - 500.0;
            p[i].v[d] = RAND * 1000.0 - 500.0;
        }
        p[i].fitness = p[i].best_fitness = Schwefel(p[i].x);
        /* Update gbest */
        if (p[i].best_fitness > gbest_fitness) {
            for (d = 0; d < Dimension; d++)
                gbest_x[d] = p[i].best_x[d];
            gbest_fitness = p[i].best_fitness;
        }
    }

    /* Repeat PSO cycle for IterationLimit times */
    for (step = 0; step < IterationLimit; step++) {
        /* Linearly decreasing inertia weight */
        w = wU - (wU - wL) * ((float)step / (float)IterationLimit);
        for (i = 0; i < PopulationSize; i++) {
            for (d = 0; d < Dimension; d++) {
                p[i].v[d] = w * p[i].v[d]
                          + c1 * RAND * (p[i].best_x[d] - p[i].x[d])
                          + c2 * RAND * (gbest_x[d] - p[i].x[d]);
                if (p[i].v[d] > Vmax)  p[i].v[d] = Vmax;
                if (p[i].v[d] < -Vmax) p[i].v[d] = -Vmax;
                p[i].x[d] = p[i].x[d] + p[i].v[d];
                if (p[i].x[d] > 500.0)  p[i].x[d] = 500.0;
                if (p[i].x[d] < -500.0) p[i].x[d] = -500.0;
            }
            p[i].fitness = Schwefel(p[i].x);
            /* Update pbest */
            if (p[i].fitness > p[i].best_fitness) {
                for (d = 0; d < Dimension; d++)
                    p[i].best_x[d] = p[i].x[d];
                p[i].best_fitness = p[i].fitness;
                /* Update gbest */
                if (p[i].best_fitness > gbest_fitness) {
                    for (d = 0; d < Dimension; d++)
                        gbest_x[d] = p[i].best_x[d];
                    gbest_fitness = p[i].best_fitness;
                }
            }
        }
        printf("%f\n", gbest_fitness);
    }
}
PART II
Watermarking Techniques
Chapter 5 Watermarking Based on Spatial Domain Hsiang-Cheh Huang, Jeng-Shyang Pan, and Hsueh-Ming Hang In this chapter, we introduce the basics of watermarking in the spatial domain. Watermarking in the spatial domain, also called additive watermarking, is one of the fundamental schemes dating from the beginning of digital watermarking research in 1993. Although this kind of watermarking scheme is simple and easy to implement, it tends not to be inherently robust. Employing error control codes can increase the robustness of spatial-domain watermarking. We address this topic, and hope it can serve as a starting point to inspire readers to explore the field of digital watermarking further.
1
Introduction
The most straightforward and fundamental schemes in the field of digital watermarking are those operating in the spatial domain. From a historical perspective, watermarking in the spatial domain has existed for a long time; for instance, there are watermarks in the paper bills that we use every day. At the beginning of digital watermarking research, when designing the embedding and extraction algorithms, researchers tended to propose schemes that add a pseudo-random noise pattern, the watermark, to the original image by modifying the luminance values of the pixels in the spatial domain, as in some of the earliest papers with this approach (Tirkel et al. 1993, van Schyndel et al. 1994). We demonstrated one example of such an application in Section 5.2 of Chapter 1 (Wong 1998). Spatial watermarks are embedded easily and fast, but they are generally considered fragile (Langelaar et al. 2000, Voyatzis and Pitas 1998, Wolfgang et al. 1997). By incorporating channel coding schemes into spatial watermarking algorithms, the authors in (Hernández et al. 1998) obtained detector structures and analytical bounds for the bit error rate (BER) and the receiver operating characteristic (ROC). They also pointed out that the use of channel coding, such as BCH block codes, results in an improvement in BER for large watermark capacities, and a degradation for small watermark capacities. Therefore, the choice of the channel codes is important due to their minimum distances and redundancies. With the experience gained from watermarking in the spatial domain, researchers envisaged extending into other domains and applications for further investigation.
2
General Embedding Structures in The Spatial Domain
For spatial-domain watermarking, the watermark is embedded by altering the pixel values slightly in the spatial domain. A typical example was proposed by Pitas (Pitas 1996, Pitas 1998). Assume that the binary bit-pattern watermark W is embedded into the original media X. If the watermark is smaller than the original, it can be replicated until it covers the original. The embedding procedure can be represented by

X'_i = X_i + a_i W_i,     (1)
where i represents the positions to be embedded, and a_i denotes the strength factor (Bruce et al. 1997, Tefas and Pitas 2001). Watermarking with the operation in Eq. (1) is also called additive watermarking.
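To make the additive rule concrete, the following C fragment is a minimal sketch of Eq. (1) for an 8-bit gray-level image stored as a one-dimensional array. The function and array names, the bipolar form of the watermark samples, and the constant strength are illustrative assumptions of this sketch, not part of the original scheme.

/* A minimal sketch of additive spatial embedding, Eq. (1).
   image[], mark[] and strength are illustrative assumptions;
   mark[i] holds a bipolar watermark sample (+1 or -1). */
void embed_additive(unsigned char image[], const int mark[],
                    float strength, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        float v = image[i] + strength * mark[i];
        if (v > 255.0f) v = 255.0f;   /* clip to the 8-bit range */
        if (v < 0.0f)   v = 0.0f;
        image[i] = (unsigned char)(v + 0.5f);
    }
}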
Because the human eyes respond logarithmically to changes of luminance intensity, the watermark embedding algorithm needs to incorporate the human visual system (HVS) (De Vleeschouwer et al. 2002, Wandell 1995, Wolfgang et al. 1999). The logarithmic value of the variance among neighboring pixels within a square window, p_i, controls the strength factor a_i. Assume that there is a small, positive, real number ε ∈ R+, ε << 1, which keeps p_i within a practical range:

p_i = log( var( window(X_i, n) ) + ε ),     (2)
where n is the width of the square window, which affects the distortion level of edges and small objects. Consequently, the strength factor a_i in Eq. (1) is calculated from p_i, scaled by a user-defined constant C; typically, the value of C is set to 10. By employing the general methods of spatial watermarking, we give a simple example here: a binary watermark of size 128 x 128 is embedded into the original image lena of size 512 x 512, as shown in the upper part of Figure 1(b) and in Figure 1(a), respectively. On the one hand, the watermark is permuted according to a secret key, key_p, to disperse its spatial locations by employing a pseudo-random number traversing method (Proakis 1995). The permuted watermark, shown in the lower part of Figure 1(b), looks like a random pattern. On the other hand, one can employ the spread-spectrum scheme for watermark embedding (Hernández and Pérez-González 1999, Kalker and Haitsma 2000). We embed the permuted watermark into specifically selected pixels of the original image with the methods in Section 4.2. One commonly employed measure to evaluate the imperceptibility of the watermarked image is the peak signal-to-noise ratio (PSNR).
Assume that the original image X and the watermarked image X' both have size M x N. The mean square error (MSE) between the original and the watermarked images can be represented by

MSE = (1 / (M N)) * sum_{i=1}^{M} sum_{j=1}^{N} ( X(i,j) - X'(i,j) )^2.     (4)

Consequently, the PSNR can be calculated by

PSNR = 10 log10( 255^2 / MSE ) dB.     (5)
Intuitively, for imperceptible watermarking, the watermarked image should look as similar as possible to the original one; thus, the MSE between the two images in Eq. (4) should be as small as possible. Consequently, from Eq. (5), the higher the PSNR value, the less perceptible the embedded watermark. The watermarked image, shown in Figure 1(c), has a PSNR value of 51.90 dB. From a subjective point of view, Figure 1(a) and Figure 1(c) show no difference under human perception.
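As a sketch of how Eqs. (4) and (5) are evaluated in practice, the following C function computes the PSNR of two M x N 8-bit gray-level images; the function name and the flat row-major image layout are assumptions made for illustration.

#include <math.h>

/* PSNR of a watermarked image xw against the original x, Eqs. (4)-(5). */
double psnr(const unsigned char x[], const unsigned char xw[], int m, int n)
{
    double mse = 0.0;
    int i;
    for (i = 0; i < m * n; i++) {
        double d = (double)x[i] - (double)xw[i];
        mse += d * d;                 /* accumulate squared error */
    }
    mse /= (double)(m * n);           /* Eq. (4) */
    return 10.0 * log10(255.0 * 255.0 / mse);   /* Eq. (5), in dB */
}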
3
General Extraction Structures in The Spatial Domain
Watermark extraction in the spatial domain is a cross-correlation process. To detect a watermark in a possibly watermarked image X', we calculate the correlation between the received image X'', which is possibly corrupted or attacked, and the pseudo-random noise pattern W_i by (Barni et al. 1998, Swanson et al. 1998)

ρ = (1 / N) * sum_{i=1}^{N} X''_i W_i,     (6)
Figure 1. The depictions of spatial watermarking. (a) The original image lena with size 512 x 512. (b) The watermark rose with size 128 x 128, shown in the upper part, and the permuted version, shown in the lower part. (c) Embedding the watermark in the spatial domain of the original image. PSNR = 51.90 dB.
where ρ is the correlation coefficient. In extracting the watermark, we need to pre-determine a threshold value and compare it with the computed ρ. The extracted watermark is evaluated to determine the robustness of the watermarking algorithm. The authors in (Hernández and Pérez-González 1999) gave some mathematical derivations for spatial domain watermarking from a theoretical perspective.
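A minimal sketch of the correlation detector of Eq. (6) is given below, assuming the received pixels and a bipolar pseudo-random pattern are available as plain arrays; the names and the hard thresholding are illustrative choices.

/* Correlate the possibly attacked image with the pseudo-random
   pattern, Eq. (6), and compare against a pre-determined threshold. */
int detect_watermark(const unsigned char xr[], const int w[],
                     int n, double threshold)
{
    double rho = 0.0;
    int i;
    for (i = 0; i < n; i++)
        rho += (double)xr[i] * (double)w[i];
    rho /= (double)n;
    return rho >= threshold;   /* 1: watermark judged present */
}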
4
Issues for Practical Implementations
4.1
Bit Plane Manipulations of The LSB’s
The methods described here rely on the manipulation of the least-significant bits (LSBs) of images, in a manner which is undetectable and imperceptible to the human eye. The 8-bit grey-level original image is first compressed into a 7-bit representation, or the LSBs are discarded directly. Then, the LSBs of the original image are replaced by the permuted watermark bits with Eq. (1). By modifying the least significant bits of the input multimedia to carry the embedded watermark, we obtain an algorithm similar to those proposed in (van Schyndel et al. 1994). With this simple algorithm, the encoder is easy to implement, and the system is secure to some degree because of the random number permutation or the spread spectrum scheme applied to the watermark.
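A minimal sketch of such LSB replacement is shown below; the argument names are assumptions, and extraction is simply the mirror operation of reading the LSBs back.

/* Replace the least significant bit of each selected pixel with a
   permuted watermark bit. */
void embed_lsb(unsigned char image[], const unsigned char bits[], int n)
{
    int i;
    for (i = 0; i < n; i++)
        image[i] = (unsigned char)((image[i] & 0xFE) | (bits[i] & 1));
}

/* Extraction reads the LSBs back: bit = image[i] & 1. */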
4.2
Selecting The Embedding Positions
In Chapter 13 of this book, the authors propose an effective means for selecting the embedding positions with genetic algorithms (GA) by considering both the watermarked image quality, measured by the PSNR in Eq. (5), and the robustness of the watermarking algorithm, measured by the bit correct ratio (BCR). Assume that the embedded watermark W and the extracted one W' both have size M_W x N_W.
The BCR can be calculated with

BCR = ( 1 / (M_W N_W) ) * sum_{i,j} ( 1 - ( W(i,j) ⊕ W'(i,j) ) ),     (7)

where ⊕ denotes the exclusive-or (XOR) operation, so each correctly extracted bit contributes one to the sum. By checking the tradeoff between the above two measures, the embedding positions are determined. The embedding positions can serve as the secret key in the high-level generic watermarking structure in Figure 1(a) of Chapter 1. Readers are suggested to refer to Chapter 13 for more details.
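The BCR computation reduces to counting matching bits, as in the following sketch; the array names are assumptions, and the result can be multiplied by 100 to report a percentage.

/* Bit correct ratio, Eq. (7): the fraction of watermark bits that
   survive extraction.  w[] and wx[] hold the embedded and the
   extracted binary watermarks, flattened to length n. */
float bcr(const unsigned char w[], const unsigned char wx[], int n)
{
    int correct = 0, i;
    for (i = 0; i < n; i++)
        if (((w[i] ^ wx[i]) & 1) == 0)   /* XOR = 0 means a match */
            correct++;
    return (float)correct / (float)n;
}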
5
Applications
Spatial-domain watermarking algorithms still have their applications because of their simplicity, especially where implementation complexity needs to be considered. We point out some of these applications in the following sections.
5.1
Video Watermarking Applications
One of the important issues for watermarking is the implementation complexity. The authors in (Kalker and Haitsma 2000) proposed algorithms to upgrade the spatial “Just Another Watermarking System” (JAWS) (Kalker et al. 1999) at a minimum cost to detect the watermark in the MPEG domain. That is, the watermark is embedded in the spatial domain according to the schemes in JAWS, and the watermark is detected in the MPEG domain. It is shown that a large reduction in implementation complexity can be achieved with a small sacrifice in performance. The authors in (Lancini et al. 2002) propose to embed the watermark, which is protected by the error control code, into the original, raw video data. Simulation results indicate that by embedding the watermark in the spatial domain, it is still robust even under low bitrate
compression and resizing, or transcoding, attacks. By employing error control codes to add redundancy to the watermark, the watermark capacity is reduced, but the bit error probability in the extracted watermark is reduced as well.
5.2
Frame Reordering and Dropping
The authors in (Winne et al. 2002) extend spatial-domain watermarking schemes to MPEG-2 (MPEG-2 Website 2000). Instead of inserting the watermark into the raw video data, they embed the watermark at three distinct locations: before, in, and after the motion compensation loop. They propose algorithms to recover spatial watermarks from compressed video at different locations and different bitrates. From the simulation results, embedding the watermark before the motion compensation loop is the more feasible and more favorable choice.
5.3
Combining Spatial-Domain Watermarking with Watermarking in Different Domains
Generally speaking, except for video watermarking, spatial-domain watermarking is too simple to be employed in real applications. Hence, combining spatial-domain watermarking algorithms with watermarking schemes in other domains seems a practical approach. In (Tsai et al. 2000), the authors proposed algorithms that incorporate spatial watermarking with wavelet-domain watermarking, and both the image quality and the robustness of the algorithms are considered. Simplicity and application versatility can thus be achieved by combining watermarking in different domains.
6
Conclusions
In this chapter, we briefly introduced and discussed the general concepts and implementations for watermarking in the spatial domain.
We also pointed out the applications of this approach. The human visual system plays an important role in spatial-domain watermarking. The embedding schemes can also incorporate error-correcting coding schemes or cryptographic methods to strengthen their usefulness. Although watermarking in the spatial domain seems to have limited practicality, it served as the starting point for subsequent watermarking research.
References

Barni, M., Bartolini, F., Cappellini, V., and Piva, A. (1998), "Copyright protection of digital images by embedded unperceivable marks," Image and Vision Computing, vol. 16, pp. 897-906.

Bruce, V., Green, P.R., and Georgeson, M.A. (1997), "Visual perception, physiology, psychology, and ecology," Psychological Press, pp. 25-27.

De Vleeschouwer, C., Delaigle, J.-F., and Macq, B. (2002), "Invisibility and application functionalities in perceptual watermarking - an overview," Proceedings of the IEEE, vol. 90, pp. 64-77.

Hernández, J.R. and Pérez-González, F. (1999), "Statistical analysis of watermarking schemes for copyright protection of images," Proceedings of the IEEE, pp. 1142-1166.

Hernández, J.R., Pérez-González, F., and Rodríguez, J.M. (1998), "The impact of channel coding on the performance of spatial watermarking for copyright protection," Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, pp. 2973-2976.

Kalker, T., Depovere, G., Haitsma, J., and Maes, M. (1999), "A video watermarking system for broadcast monitoring," Proc. SPIE, Security and Watermarking of Multimedia Contents, pp. 103-112.
Kalker, T. and Haitsma, J. (2000), "Efficient detection of a spatial spread-spectrum watermark in MPEG video streams," Proc. IEEE Int'l Conf. Image Processing, pp. 434-437.

Lancini, R., Mapelli, F., and Tubaro, S. (2002), "A robust video watermarking technique for compression and transcoding processing," Proc. IEEE Int'l Conf. Multimedia and Expo, pp. 549-552.

Langelaar, G.C., Setyawan, I., and Lagendijk, R.L. (2000), "Watermarking digital image and video data: A state-of-the-art overview," IEEE Signal Processing Magazine, vol. 17, pp. 20-46.

MPEG-2 Website (2000), http://www.chiariglione.org/mpeg/standards/mpeg-2/mpeg-2.htm

Pitas, I. (1996), "A method for signature casting on digital images," IEEE Int'l Conf. Image Processing, pp. 215-218.

Pitas, I. (1998), "A method for watermark casting on digital images," IEEE Trans. Circuits and Systems for Video Technology, pp. 775-780.

Proakis, J.G. (1995), Digital Communications, 3rd ed., McGraw-Hill, New York, NY.

Swanson, M.D., Zhu, B., and Tewfik, A.H. (1998), "Multiresolution scene-based video watermarking using perceptual models," IEEE J. Selected Areas in Communications, vol. 16, pp. 540-550.

Tefas, A. and Pitas, I. (2001), "Robust spatial image watermarking using progressive detection," Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, pp. 1973-1976.

Tirkel, A.Z., Rankin, G.A., van Schyndel, R.M., Ho, W.J., Mee, N.R.A., and Osborne, C.F. (1993), "Electronic water mark," Digital Image Computing Techniques and Applications '93, pp. 666-672.
Tsai, M.J., Yu, K.Y., and Chen, Y.Z. (2000), "Joint wavelet and spatial transformation for digital watermarking," IEEE Trans. Consumer Electronics, vol. 46, pp. 241-245.

van Schyndel, R.G., Tirkel, A.Z., and Osborne, C.F. (1994), "A digital watermark," Proc. IEEE Int'l Conf. Image Processing, pp. 86-90.

Voyatzis, G. and Pitas, I. (1998), "Chaotic watermarks for embedding in the spatial digital image domain," Proc. IEEE Int'l Conf. Image Processing, pp. 432-436.

Wandell, B.A. (1995), Foundations of Vision, Sinauer, Sunderland, MA.

Winne, D.A., Knowles, H.D., Bull, D.R., and Canagarajah, C.N. (2002), "Spatial digital watermark for MPEG-2 video authentication and tamper detection," Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, pp. 3457-3460.

Wong, P. (1998), "A public key watermark for image verification and authentication," Proc. IEEE Int'l Conf. Image Processing, pp. 455-459.

Wolfgang, R.B. and Delp, E.J. (1997), "Overview of image security techniques with applications in multimedia systems," Proc. SPIE Conf. Multimedia Networks: Security, Displays, Terminals, and Gateways, pp. 297-308.

Wolfgang, R.B., Podilchuk, C.I., and Delp, E.J. (1999), "Perceptual watermarks for digital images and video," Proceedings of the IEEE, vol. 87, pp. 1108-1126.
Chapter 6 Watermarking Based on Transform Domain Hsiang-Cheh Huang, Jeng-Shyang Pan, and Hsueh-Ming Hang Watermarking based on the transform domain is the kind most frequently encountered in the literature. Transform-domain watermarking schemes, also called multiplicative watermarks, are generally considered to be robust against attacks. In this chapter, we introduce the fundamental methods for embedding and extracting watermarks in the transform domain. These schemes apply directly to discrete cosine transform (DCT)-domain watermarking, and can also be applied to other transformations, including the discrete Fourier transform (DFT) and the discrete wavelet transform (DWT), with some modifications to fit the characteristics of the different transforms. The readers are encouraged to approach the subsequent chapters of this book with the fundamental concepts of transform-domain watermarking described in this chapter.
1
Introduction
For image, audio, and video compression algorithms, transform coding is one of the most important building blocks for processing the input multimedia. Therefore, watermarking based on the transform domain is the kind most frequently encountered in the literature. Among the practical schemes, the discrete Fourier transform (DFT) (Oppenheim and Schafer 1999), the discrete cosine transform (DCT) (Rao and Yip 1990), and the discrete wavelet transform (DWT) (Akansu and Medley 1999, Rao and Bopardikar 1998) are the most popular transform coding schemes for academic research and practical implementations. Watermarking related to transform coding schemes is referred to as multiplicative watermarking. Multiplicative watermarks are automatically image-content dependent, and they are embedded mainly into the perceptually most significant components of the image when robustness is the main concern for protecting the image content (Langelaar et al. 2000). The perceptual models based on Weber's law and the just noticeable difference (JND) (Jameson and Hurvich 1972) can easily be exploited to alleviate the effects caused by the embedded watermark bits. In the following sections, we describe the concepts of multiplicative watermarks with several examples of practical implementations.
2
General Structures for Transform Domain Watermarking
For transform domain image watermarking, three main steps must be specified: image transformation, watermark casting, and watermark recovery (Barni et al. 1998). For different applications and approaches, image transformation can be applied to the whole image (Cox et al. 1996) or in a block-by-block manner (Hsu and Wu 1999). After that, the transform domain coefficients can be obtained. Watermark casting refers to the selection of the transform-domain coefficients in which to embed the watermark bits. Algorithms for transform domain watermarking modify the selected coefficients in the transform domain. The scheme for embedding the watermark in the transform domain is also called the multiplicative embedding rule, which can be denoted by

X'_i = X_i (1 + γ W_i),     (1)
where X' and X stand for the watermarked media and the original counterpart, respectively, W denotes the watermark, i represents the positions to be embedded, and γ is the gain factor. For a watermark of length L, i ∈ [0, L-1]. For detection or verification, the receiver needs to verify whether a specific watermarking pattern exists or not. A correlator is often used for the extraction of the watermark. The correlation R_{X''W} between the possibly attacked image X'' and the watermark W can be calculated by

R_{X''W} = (1 / L) * sum_{i=0}^{L-1} X''_i W_i.     (2)
Given a pre-determined threshold T, it can be compared with the correlation in Eq. (2) to decide the presence of the watermark. Therefore, the decision rule for the presence of the watermark can be expressed by

R_{X''W} >= T  =>  watermark is present;
R_{X''W} <  T  =>  watermark is not present.     (3)
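The following C fragment is a minimal sketch of the multiplicative rule of Eq. (1) together with the correlation decision of Eqs. (2) and (3); the coefficient array, the watermark array, and the function names are assumptions of this illustration.

/* Multiplicative embedding over the L selected transform-domain
   coefficients, Eq. (1). */
void embed_multiplicative(float coeff[], const float w[],
                          float gamma, int len)
{
    int i;
    for (i = 0; i < len; i++)
        coeff[i] = coeff[i] * (1.0f + gamma * w[i]);
}

/* Correlation detection, Eqs. (2) and (3): returns 1 when the
   watermark is judged present. */
int detect_multiplicative(const float coeff_attacked[], const float w[],
                          int len, float t)
{
    float r = 0.0f;
    int i;
    for (i = 0; i < len; i++)
        r += coeff_attacked[i] * w[i];
    r /= (float)len;           /* Eq. (2) */
    return r >= t;             /* Eq. (3) */
}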
3
Categories for Transformations
3.1
Discrete Fourier Transform (DFT)
For a length-M 1-D DFT, the relationship between the spatial/temporal domain signal f[n] and its corresponding transform in the frequency domain, F[k], is

F[k] = sum_{n=0}^{M-1} f[n] W_M^{nk},   k = 0, 1, ..., M-1,     (4)

where W_M = e^{-j 2π / M}.
We can extend the 1-D DFT to its 2-D counterpart, because an image can be regarded as a two-dimensional function X(i, j). Hence, the 2-D DFT can be defined as

Y(u, v) = sum_{i=0}^{M-1} sum_{j=0}^{N-1} X(i, j) e^{-j 2π (u i / M + v j / N)}.     (5)
The DFT of an image is generally complex-valued. This leads to the magnitude and phase representations of the image:

M(u, v) = |Y(u, v)|,     (6.a)
Φ(u, v) = ∠Y(u, v).     (6.b)
DFT-domain watermarking served as the pioneering research in transform-domain watermarking (Ó Ruanaidh et al. 1996). In (Barni et al. 2003) and (Solachidis and Pitas 2001), the authors embed the watermark into the magnitude of the original image, M(u, v), in Eq. (6.a), according to the multiplicative embedding rule in Eq. (1). Because the DFT is less frequently employed for image transformation, only a few papers focus on DFT-domain watermarking.
3.2
Discrete Cosine Transform (DCT)
For the DCT with block size M x N, the connection between the spatial domain image pixels X(i, j) and the transform domain coefficients Y(u, v) is

Y(u, v) = (2 / sqrt(M N)) c(u) c(v) sum_{i=0}^{M-1} sum_{j=0}^{N-1} X(i, j) cos( (2i+1) u π / (2M) ) cos( (2j+1) v π / (2N) ),     (7)

(Eq. (7) denotes the type-II DCT.)
where u = 0, 1, ..., M-1, v = 0, 1, ..., N-1, and

c(k) = 1/sqrt(2)  if k = 0;
c(k) = 1          otherwise.
One example explaining the 8 x 8 block DCT and the zigzag scan of the 64 DCT coefficients is illustrated in Figure 2 of Chapter 12. Research in DCT-domain watermarking started in 1996 (Hsu and Wu 1996). Watermarking in the DCT domain is popular because the DCT serves as the transform coding module in image and video coding standards, including JPEG, MPEG-1, MPEG-2, H.261, and H.263. Conventional schemes for embedding the watermark in the DCT domain modify selected coefficients in the blocks (Chu 2003, Hsu and Wu 1999, Lin and Chang 2001, Nikolaidis and Pitas 2003). Improvements and enhancements of DCT-domain watermarking can be expected in the near future.
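For reference, a direct (non-fast) C implementation of the type-II 2-D DCT of Eq. (7) for one 8 x 8 block is sketched below; it is written for clarity rather than speed, and the function and array names are assumptions.

#include <math.h>
#define B  8
#define PI 3.14159265358979

/* Type-II 2-D DCT of one B x B block, a literal transcription of
   Eq. (7) with M = N = B. */
void dct2_block(const float x[B][B], float y[B][B])
{
    int u, v, i, j;
    for (u = 0; u < B; u++)
        for (v = 0; v < B; v++) {
            double sum = 0.0;
            for (i = 0; i < B; i++)
                for (j = 0; j < B; j++)
                    sum += x[i][j]
                         * cos((2*i + 1) * u * PI / (2.0 * B))
                         * cos((2*j + 1) * v * PI / (2.0 * B));
            y[u][v] = (float)((2.0 / B)
                    * (u == 0 ? 1.0 / sqrt(2.0) : 1.0)
                    * (v == 0 ? 1.0 / sqrt(2.0) : 1.0) * sum);
        }
}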
3.3
Discrete Wavelet Transform (DWT)
Wavelet techniques provide excellent space and frequency energy compaction, in which energy tends to cluster spatially in each subband. For the DWT, the link between the spatial/temporal domain signal f(t) and the DWT of f(t), d(k, l), is

d(k, l) = 2^{-k/2} * integral of f(t) ψ(2^{-k} t - l) dt,     (8)

where ψ(·) denotes the mother wavelet. One example of the wavelet function ψ(t), for the D4 wavelet, is presented in Figure 1. Research in DWT-domain watermarking started in 1997 (Swanson et al. 1997). Research activities are still blooming in this field (Barni et al. 2001, Nikolaidis and Pitas 2003, Serdean et al. 2003, Wang et al. 2002, Wang et al. 1998).
4
Examples for Practical Implementations
4.1
Transform-Domain Image Watermarking
Many transform-domain techniques, as described in Sec. 3, have been proposed in the literature. A few survey papers (Langelaar et al. 2000, Swanson et al. 1998, Wolfgang et al. 1999), in addition to the papers cited in Sec. 3, have reviewed them quite extensively. We thus only describe three well-known schemes to illustrate transform-domain watermarking implementations.
A well-known scheme was proposed by Koch and Zhao in 1995 (Koch and Zhao 1995). The basic idea is to embed binary marks on the frequency components. However, one mark bit is not used to change the magnitude of a single coefficient. Rather, a pair of coefficients is used to encode a mark bit. An image is first partitioned into blocks, of size 8 x 8 for example. The discrete cosine transform (DCT) is applied to each individual 8 x 8 block, similar to JPEG. Next, proper DCT coefficient pairs are selected from each transformed block. Koch and Zhao suggest the use of mid-frequency components, with the following reasoning. When embedding into high-frequency components, the watermarked image is vulnerable to common image processing, such as low-pass filtering, even though the watermarked image quality is acceptable. Embedding into low-frequency components is robust against common image processing attacks such as low-pass filtering; however, it greatly degrades the watermarked image quality compared with the original image. This comes from the fact that the energies of most natural images are concentrated in the low-frequency components, and the human eyes are more sensitive to the noise caused by modifying the lower frequency components. Consequently, because the mid-frequency components often have moderate and comparable magnitudes, and their alteration is less visible, they serve as good choices for embedding the watermark bits in the transform domain. For a selected coefficient pair, X(m, n) and X(i, j), on the one hand, if the corresponding mark bit is "1", then X(m, n) is forced to be greater than X(i, j) by at least p, where p is a properly chosen positive value. On the other hand, if the corresponding mark bit is "0", then (X(m, n) - X(i, j)) must be smaller than -p. If the original coefficient pair does not match what is required, then X(m, n) and X(i, j) are properly adjusted so that their relationship follows the aforementioned rule. It was reported that this scheme could survive JPEG compression (Koch and Zhao 1995). A variation of the aforementioned scheme is that X(m, n) and X(i, j) are chosen from two neighboring blocks at the same frequency position (Hsu and Wu 1999). Under this choice, the original X(m, n) and X(i, j) values are often close, and thus their alteration has a smaller visual impact.
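The pair rule described above can be sketched in C as follows; the symmetric adjustment of the two coefficients is one reasonable choice among several, and all names are assumptions. Extraction only compares the two coefficients: the sign of their difference recovers the mark bit.

/* Koch-Zhao style pair embedding: force the relation between two
   selected mid-frequency DCT coefficients according to the mark bit,
   with separation margin p > 0. */
void embed_pair(float *a, float *b, int bit, float p)
{
    float diff = *a - *b;
    if (bit == 1 && diff < p) {         /* need *a - *b >= p  */
        float adj = (p - diff) / 2.0f;
        *a += adj;
        *b -= adj;
    } else if (bit == 0 && diff > -p) { /* need *a - *b <= -p */
        float adj = (diff + p) / 2.0f;
        *a -= adj;
        *b += adj;
    }
}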
4.2
Spread Spectrum Transform-Domain Watermarking
Another well-known transform-domain scheme is the spread spectrum method proposed by Cox, Kilian, Leighton, and Shamoon (Cox et al. 1996, Cox et al. 1997). The term "spread spectrum" is used to name this approach because, according to the authors, the watermark is spread throughout the spectrum of the image; however, the frequency hopping technique of digital communication is not explicitly used. Another feature of this watermark is that a white Gaussian random variable, i.e., an independent, continuous-amplitude variable, is used to generate the mark; that is, the mark is not binary. The authors claim that the Gaussian mark (compared with a binary mark) has a much better detection performance when multiple marks are embedded into the same image. Another interesting feature of this scheme is that the marks are added to the significant DCT components, not to the lower-magnitude components, which are less visible. The authors argue that because the insignificant components can be attacked with little loss in image quality, they are insecure. Figure 2 shows the basic steps of the embedding process. We first take the transform (the Fourier transform (DFT, or its fast algorithm, the FFT), the DCT, or the DWT) of the entire image. (The fast Fourier transform (FFT) is a discrete Fourier transform algorithm which reduces the number of computations needed for N points from 2N^2 to 2N log2 N.) Then, we select the significant transform components. In their paper, they picked the largest 1000 components (excluding the DC component). The zero-mean, unit-variance Gaussian mark W(i, j) is added to these significant components, X(i, j), by the following formula, according to the multiplicative embedding rule:
X'(i, j) = X(i, j) (1 + c · W(i, j)),     (9)
where c is a properly chosen value. In their example, the value of c is 0.1. It is clear that a large c would distort the marked image quality.
[Block diagram: input image X -> FFT/DCT -> Y -> determine perceptually significant regions -> insert watermark W -> inverse FFT/DCT -> watermarked image Y'.]
Figure 2. The encoding process of the Spread Spectrum algorithm (Cox et al. 1997).
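A minimal sketch of the insertion step of Eq. (9) is given below. The index array of selected significant coefficients and the Gaussian generator passed as a function pointer are assumptions of this illustration; the generated mark values must also be stored, since the detector correlates against them later.

/* Spread spectrum insertion, Eq. (9): add a zero-mean, unit-variance
   Gaussian mark to the selected significant coefficients, scaled by
   c (0.1 in the original paper). */
void embed_spread_spectrum(float coeff[], const int sig[], int nsig,
                           float c, float (*gauss)(void), float mark[])
{
    int k;
    for (k = 0; k < nsig; k++) {
        mark[k] = gauss();                      /* keep W for detection */
        coeff[sig[k]] *= 1.0f + c * mark[k];    /* Eq. (9) */
    }
}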
The watermark detection process is shown in Figure 3. The original image DCT components are needed to retrieve the mark. A correlator can be used to detect the existence of the embedded mark. Standard detection theory can be used to analyze its performance, including the error rate. It is reported that this scheme works well under image size scaling, cropping, dithering, and JPEG compression (Cox et al. 1996, Cox et al. 1997).
[Block diagram: the recovered (possibly attacked) image and the original image X are each transformed (FFT/DCT), giving Y'' and Y; comparing them yields the extracted watermark W.]
Figure 3. The decoding process of the Spread Spectrum algorithm (Cox et al. 1997).
4.3
Image Adaptive Transform-Domain Watermarking
None of the previous watermarking schemes explicitly employs a human visual model in the marking process, although they may be extended to include a perceptual model. The image adaptive scheme proposed by Podilchuk and Zeng (Podilchuk and Zeng 1997, Podilchuk and Zeng 1998) has a specific feature: it uses a human perceptual model and is thus picture-dependent. Its basic operation is similar to that of the Spread Spectrum scheme in Section 4.2. However, instead of taking the DCT of the entire picture, Podilchuk and Zeng take the block DCT transform and evaluate the just noticeable difference (JND) of each coefficient, J(i, j), using the Safranek-Johnston model (Safranek et al. 1990), which was originally developed for compression purposes. In fact, quite a significant portion of their paper describes the computation of J(i, j). It is obvious that J(i, j) is block dependent. Then, the zero-mean, unit-variance white Gaussian mark is added as follows:
X'(i, j) = X(i, j) + J(i, j) W(i, j),   if X(i, j) > J(i, j);
X'(i, j) = X(i, j),                     otherwise.     (10)
Because the mark is added only to the components that are greater than J(i, j), the number of marked components is picture-dependent. It ranges from 17,000 to 70,000 for pictures of size 512 x 512. A process similar to the decoding process of the Spread Spectrum scheme is used to extract and detect the mark. It is reported that this scheme can survive strong cropping (1/16 of the original size), scaling, and JPEG compression (Podilchuk and Zeng 1998).
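A sketch of the image-adaptive rule of Eq. (10) in C is shown below, assuming the block DCT coefficients and their JND values are stored in flat arrays; all names are illustrative.

/* Image-adaptive embedding, Eq. (10): a coefficient is marked only
   when it exceeds its just noticeable difference J(i,j), and the
   mark amplitude is scaled by that JND. */
void embed_jnd(float x[], const float jnd[], const float w[], int n)
{
    int i;
    for (i = 0; i < n; i++)
        if (x[i] > jnd[i])
            x[i] = x[i] + jnd[i] * w[i];
        /* otherwise the coefficient is left unmarked */
}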
4.4
Compressed-Domain Video Watermarking
Up to now, compared with the watermarking of other multimedia formats, only a small portion of research has been dedicated to watermarking video signals. Interestingly, if we consider real applications, there is a trend to turn from image to video watermarking in both research and implementation. We can apply a single-frame image watermark to each individual frame to constitute a mark on a video sequence. However, even though each marked image has nearly no visible distortion when examined individually, the motion sequence may show flickering artifacts if the mark changes the pixel values inconsistently in contiguous frames. What we describe in this sub-section is a scheme that inserts marks into the compressed bitstream. This scheme was proposed by Hartung and Girod (Hartung and Girod 1997, Hartung and Girod 1998). They adopt the time-domain spread spectrum technique of digital communication. The mark is viewed as a low-rate binary message. It is multiplied by a high-rate binary pseudo-random sequence, called the chip, to produce a coded sequence. The coded sequence is a high-rate random-like sequence carrying a low-rate message. If the correct chip sequence is provided, one can retrieve the message even when the coded sequence is contaminated by noise. This technique has been successfully used in digital communication systems; the wireless CDMA system adopts the same principle to create multiple channels.
The coded sequence can be viewed as a watermark, and it can be added to the original object to produce a marked object. The original object can be the image pixels (Hartung and Girod 1996, Hartung and Girod 1997) or the audio samples (Bender et al. 1996). Compressed data such as an MPEG bitstream can also be used as a "channel" that carries the coded sequence. This is the approach taken by Hartung and Girod (Hartung and Girod 1997, Hartung and Girod 1998). In their algorithm, a bit pattern b(i), which is to be embedded into the media, is first multiplied by a high-rate random sequence, the chip p(i). The bit rate ratio between chip and bit pattern can be around 100,000 : 1, which produces a watermark rate of 12.5 bytes/second for typical MPEG sequences. The resultant binary sequence w(i) is DCT transformed into a frequency domain sequence, W(j). This transformed sequence W(j) is added to the original video coefficient, X(j), to produce the marked coefficient X'(j). There are two features in this algorithm. First, if the number of bits of X'(j) after variable wordlength coding is greater than that of X(j), the coefficient is not marked. In other words, an original MPEG DCT coefficient is modified only when the modification does not increase the compressed data bit rate. Because of this constraint, typically only 10% to 20% of the coefficients are marked. The second feature solves the drifting problem in motion compensation: the previous frame at the MPEG decoder is the marked version, which is different from the one used for motion compensation at the MPEG encoder, and the differences propagate due to motion compensation and would degrade the image quality. We have briefly described the transform domain applications to video watermarking. The readers are suggested to refer to Chapters 9, 10, 18, and 19 for more details of specific schemes in video watermarking.
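The chip-spreading step can be sketched as follows; the chip-rate parameter and the array names are assumptions, and both the message bits and the chip samples are taken in bipolar form {-1, +1}.

/* Spread spectrum coding in the style of Hartung and Girod: each
   message bit b[i] is repeated cr times and multiplied by the chip
   sequence to produce the coded sequence w. */
void spread_message(const int b[], int nbits, const int chip[],
                    int cr, int w[])
{
    int i, j;
    for (i = 0; i < nbits; i++)
        for (j = 0; j < cr; j++)
            w[i * cr + j] = b[i] * chip[i * cr + j];
}

/* Despreading correlates the (possibly noisy) coded sequence with
   the same chip: the sign of the sum of w[i*cr+j]*chip[i*cr+j]
   over j recovers b[i]. */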
5
Summary
In this chapter, we introduced the concepts of multiplicative watermarks, which are widely employed for watermarking in the transform domain. In addition, we presented several examples of transform domain watermarking; the fundamental concepts for watermark embedding follow the basic principle of the multiplicative embedding rule, represented in Eq. (1).
The fundamental schemes described in this chapter apply directly to image watermarking. The readers are suggested to refer to the subsequent chapters for other issues relating to transform domain watermarking.
References

Akansu, A.N. and Medley, M.J. (1999), Wavelet, Subband, and Block Transforms in Communications and Multimedia, Kluwer Academic Publishers, Boston, MA.

Barni, M., Bartolini, F., Cappellini, V., and Piva, A. (1998), "A DCT-domain system for robust image watermarking," Signal Processing, vol. 66, pp. 357-372.

Barni, M., Bartolini, F., De Rosa, A., and Piva, A. (2003), "Optimum decoding and detection of multiplicative watermarks," IEEE Trans. Signal Processing, vol. 51, pp. 1118-1123.

Barni, M., Bartolini, F., and Piva, A. (2001), "Improved wavelet-based watermarking through pixel-wise masking," IEEE Trans. Image Processing, vol. 10, pp. 783-791.
Bender, W., Gruhl, D., Morimoto, N., and Lu, A. (1996), "Techniques for data hiding," IBM Systems Journal, vol. 35, pp. 313-336.

Cheng, Q. and Huang, T.S. (2003), "Robust optimum detection of transform domain multiplicative watermarks," IEEE Trans. Signal Processing, vol. 51, pp. 906-924.

Chu, W.C. (2003), "DCT-based image watermarking using subsampling," IEEE Trans. Multimedia, vol. 5, pp. 34-38.

Cox, I.J., Kilian, J., Leighton, T., and Shamoon, T. (1996), "Secure spread spectrum watermarking for images, audio and video," IEEE Int'l Conf. Image Processing, pp. 243-246.

Cox, I.J., Kilian, J., Leighton, F.T., and Shamoon, T. (1997), "Secure spread spectrum watermarking for multimedia," IEEE Trans. Image Processing, vol. 6, pp. 1673-1687.

Hartung, F. and Girod, B. (1996), "Digital watermarking of raw and compressed video," SPIE Digital Compression Technologies and Systems for Video Communications, pp. 205-213.

Hartung, F. and Girod, B. (1997), "Digital watermarking of MPEG-2 coded video in the bit-stream domain," IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, pp. 2621-2624.

Hartung, F. and Girod, B. (1998), "Watermarking of uncompressed and compressed video," Signal Processing, vol. 66, pp. 283-301.

Hernandez, J., Amado, M., and Perez-Gonzalez, F. (2000), "DCT-domain watermarking techniques for still images: Detector performance analysis and a new structure," IEEE Trans. Image Processing, vol. 9, pp. 55-68.
Hsu, C.-T. and Wu, J.-L. (1996), "Hidden signatures in images," IEEE Int'l Conf. Image Processing, pp. 743-746.
Hsu, C.-T. and Wu, J.-L. (1999), "Hidden digital watermarks in images," IEEE Trans. Image Processing, vol. 8, pp. 58-68.

Jameson, D. and Hurvich, L. (1972), Handbook of Sensory Physiology, Springer Verlag.

Koch, E. and Zhao, J. (1995), "Towards robust and hidden image copyright labeling," IEEE Workshop on Nonlinear Signal and Image Processing, pp. 452-455.

Langelaar, G.C., Setyawan, I., and Lagendijk, R.L. (2000), "Watermarking digital image and video data: A state-of-the-art overview," IEEE Signal Processing Magazine, vol. 17, pp. 20-46.

Lin, C.Y. and Chang, S.F. (2001), "A robust image authentication method distinguishing JPEG compression from malicious manipulation," IEEE Trans. Circuits and Systems for Video Technology, vol. 11, pp. 153-168.

Nikolaidis, A. and Pitas, I. (2003), "Asymptotically optimal detection for additive watermarking in the DCT and DWT domains," IEEE Trans. Image Processing, vol. 12, pp. 563-571.
Ó Ruanaidh, J.J.K., Dowling, W.J., and Boland, F.M. (1996), "Phase watermarking of digital images," Proc. IEEE Int'l Conf. Image Processing, vol. 3, pp. 239-242.

Oppenheim, A.V. and Schafer, R.W. (1999), Discrete-Time Signal Processing, 2nd ed., Prentice-Hall, Upper Saddle River, NJ.

Pitas, I. (1996), "A method for signature casting on digital images," IEEE Int'l Conf. Image Processing, pp. 215-218.

Pitas, I. (1998), "A method for watermark casting on digital images," IEEE Trans. Circuits and Systems for Video Technology, pp. 775-780.
Podilchuk, C.I. and Zeng, W. (1997), "Digital image watermarking using visual models," SPIE Human Vision & Electronic Imaging '97, pp. 100-111.

Podilchuk, C.I. and Zeng, W. (1998), "Image-adaptive watermarking using visual models," IEEE Journal on Selected Areas in Communications, vol. 16, pp. 525-539.

Rao, R.M. and Bopardikar, A.S. (1998), Wavelet Transforms: Introduction to Theory and Applications, Addison-Wesley, Reading, MA.

Rao, K.R. and Yip, P. (1990), Discrete Cosine Transform: Algorithms, Advantages, Applications, Academic Press, San Diego, CA.

Safranek, R.J., Johnston, J.D., Jayant, N.S., and Podilchuk, C. (1990), "Perceptual coding of image signals," Proc. Twenty-Fourth Asilomar Conference on Signals, Systems and Computers, pp. 346-350.

Serdean, C.V., Ambroze, M.A., Tomlinson, M., and Wade, J.G. (2003), "DWT-based high-capacity blind video watermarking, invariant to geometrical attacks," IEE Proceedings - Vision, Image and Signal Processing, vol. 150, pp. 51-58.

Solachidis, V. and Pitas, I. (2001), "Circularly symmetric watermark embedding in 2-D DFT domain," IEEE Trans. Image Processing, vol. 10, pp. 1741-1753.

Suthaharan, S., Kim, S.W., Lee, H.K., and Sathananthan, S. (2000), "Perceptually tuned robust watermarking scheme for digital images," Pattern Recognition Letters, vol. 21, pp. 145-149.

Swanson, M.D., Kobayashi, M., and Tewfik, A.H. (1998), "Multimedia data-embedding and watermarking technologies," Proceedings of the IEEE, vol. 86, pp. 1064-1087.
Swanson, M.D., Zhu, B., Chau, B., and Tewfik, A.H. (1997), "Multiresolution video watermarking using perceptual models and scene segmentation," Proc. IEEE Int'l Conf. Image Processing, pp. 558-561.

Wang, Y., Doherty, J.F., and Van Dyck, R.E. (2002), "A wavelet-based watermarking algorithm for ownership verification of digital images," IEEE Trans. Image Processing, vol. 11, pp. 77-88.

Wang, H.J., Su, P.C., and Kuo, C.C.J. (1998), "Wavelet-based digital image watermarking," Optics Express, vol. 3, pp. 491-496.

Wolfgang, R.B., Podilchuk, C.I., and Delp, E.J. (1999), "Perceptual watermarks for digital images and video," Proceedings of the IEEE, vol. 87, pp. 40-51.
Chapter 7 Watermarking Based on Vector Quantization Chin-Shiuh Shieh, Hsiang-Cheh Huang, Zhe-Ming Lu, and Jeng-Shyang Pan Vector quantization has been distinguished by its high compression rate in lossy data compression applications. To be of practical significance, a digital watermarking technique should take into account the effect of vector quantization compression. In this chapter, we review two important streams of watermarking techniques based on vector quantization. In one of them, the embedded information is implicitly carried in the codeword indices. In the other, a signature binding together the watermark and the original image is generated for certification purposes. Some discussions on possible extensions to the existing approaches are given at the end of this chapter.
1
Introduction
To reduce the space requirement for storage and the bandwidth requirement for communication, a wide variety of compression techniques has been developed (Sayood 2000). For multimedia applications, less significant information can be sacrificed for a higher compression rate, since the human sensory system is less sensitive to fine detail. In this kind of application, vector quantization (Gersho et al. 1992) has received considerable attention for its high compression rate and its essential role in various compression applications. As an extension of scalar quantization, vector quantization
works on vectors of raw data. A vector can be a fixed number of consecutive samples of audio data or a small block of image/video data; for example, the gray-level values of a 4 x 4 pixel image block form a 16-dimensional vector. Figure 1 gives an illustration of the operation of vector quantization compression.

Figure 1. A block diagram for vector quantization.

At the sender (compression) end, the codeword search process looks for the "nearest" codeword, C_i, in the codebook for the given input vector X_t. The Euclidean distance is used in the search process to measure the distance between two vectors, as indicated in Equation (1):
i = argmin_j D(X_t, C_j),   j = 0, ..., N-1,     (1)

where D(V_1, V_2) = sum_{k=1}^{K} (V_1^k - V_2^k)^2 is the (squared) Euclidean distance between two K-dimensional vectors V_1 and V_2, and V^k is the k-th component of vector V.
The index of the selected codeword is then transmitted to the receiver end. With the same codebook, the decompression process can easily reconstruct the vector X'_t by simple table look-up. Of course, there is distortion introduced by the compression-decompression process, since X'_t is only an approximate version of the original X_t. If we work on an 8-bit gray-level image, using a block size of 4 x 4 pixels and a codebook of 256 codewords, then the compression ratio is up to

(4 x 4 x 8) / log2(256) = 128 / 8 = 16.

The codebook plays an essential role in vector quantization. The codebook size, i.e., the number of codewords in a codebook, is a tradeoff between compression quality and compression rate. The codewords in the codebook decide the resultant compression distortion. A dedicated procedure is required for the generation of an appropriate codebook. One may regard the problem of codebook generation as the problem of finding the N most representative vectors, C_i, i = 0, ..., N-1, from M given training vectors, X_j, j = 0, ..., M-1. The located C_i's serve as codewords used to partition the M given vectors into N mutually exclusive clusters, S_i, i = 0, ..., N-1. A given vector X_j is considered to belong to cluster S_i if C_i is the nearest codeword to X_j. Among other alternatives, the LBG algorithm (Linde et al. 1980) is widely used in various applications. The following pseudo code illustrates the operation of the LBG algorithm:

; Pseudo code for the LBG algorithm
Randomly pick N vectors from the M training vectors as initial codewords
DO
    Conduct clustering using the current codewords
    Calculate a new center C_i for each cluster S_i
UNTIL improvement falls below threshold epsilon
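The codeword search of Equation (1) can be sketched in C as follows; the vector dimension, the codebook size, and the names are assumptions for illustration.

#define K 16    /* vector dimension, e.g. one 4 x 4 block */
#define N 256   /* codebook size */

/* Return the index of the codeword with minimal squared Euclidean
   distance to the input vector, Equation (1). */
int nearest_codeword(const float x[K], const float codebook[N][K])
{
    int i, k, best = 0;
    float bestdist = 1.0e30f;
    for (i = 0; i < N; i++) {
        float dist = 0.0f;
        for (k = 0; k < K; k++) {
            float d = x[k] - codebook[i][k];
            dist += d * d;
        }
        if (dist < bestdist) {
            bestdist = dist;
            best = i;
        }
    }
    return best;
}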
Figure 2 shows an example of the operation of vector quantization. Figure 2(a) is the 8-bit gray-level original image of size 256 x 256 pixels. Using a block size of 4 x 4 pixels, there are (256 x 256) / (4 x 4) = 4096 training vectors. These training vectors are applied to the LBG algorithm to generate a codebook of 256 codewords. The result of compression is given in Figure 2(b). An objective measure, the peak signal-to-noise ratio (PSNR), as defined in Equation (2), is used to judge the difference between two images:

PSNR = 10 log10( 255^2 / MSE ) dB,     (2)

where MSE is the mean square error of the gray-level values of corresponding pixels.
Figure 2. Sample images (a) before and (b) after vector quantization. PSNR: 29.03 dB.
In response to the increasing demand for distributing multimedia clips over the Internet, watermarking technology has received considerable attention in recent years. Aimed at copyright protection, arbitration, and authentication, watermarking is the process of embedding extra information into a media clip. There is a vast number of established methods (Barnett 1999, Katzenbeisser et al. 2000). However, it is still far from trivial to make the embedded watermark robust. Various criteria are addressed in judging a watermarking technique, such as perceptibility, security, embedding rate, whether the original clip is required for extraction, robustness to common signal processing or intentional attacks, and so on. According to the hiding domain, digital watermarking can be roughly classified into spatial-domain based methods (Van Schyndel et al. 1994) and transform-domain based methods. Transform-domain approaches have been intensively studied, such as the discrete cosine transform (DCT) (Cox et al. 1997), the discrete Fourier transform (DFT) (O'Ruanaidh et al. 1996), the discrete wavelet transform (DWT) (Pan et al. 2003), and the CW-Z transform (Pereira et al. 2000). Recently, several researchers have paid much attention to exploring hiding schemes based on vector quantization. The watermarking methods based on vector quantization include hiding the secret information in the codeword indices (Lu et al. 2000, Lu et al. 2000) and in secret keys (Huang et al. 2001, Huang et al. 2002). We will review these methods in subsequent sections. Before we proceed, an abstraction of the paradigms to be discussed is given in the block diagram in Figure 3. In the upper path, the carrier image is actually modified according to the watermark information to be embedded. In the lower path, a secret key (signature) binding the watermark and the essential characteristics of the carrier image is generated for certification. Both approaches contribute to the copyright protection of the original image from different perspectives.
Figure 3. An abstraction of two watermarking paradigms.
2
Watermarking Scheme Based on VQ Indices
To be robust against vector quantization compression, a watermarking scheme should conduct the embedding process in the vector quantization domain. Lu et al. (2000) pioneered the idea of carrying watermark information in the codeword indices. The basic idea can be illustrated with the simplified scenario shown in Figure 4. The original codebook is partitioned into clusters, such that similar codewords are grouped into the same cluster. For a given input block X_i whose nearest codeword is in a particular cluster, the actual index to be sent is not necessarily the index of the nearest codeword, but depends on the watermark information to be embedded. For example, within a cluster {C_j, C_{j+1}, C_{j+2}, C_{j+3}}, the index of C_{j+1} is sent if the bits "01" are to be embedded, and the index of C_{j+3} is sent if the bits "11" are to be embedded. In so doing, watermark information is implicitly carried in the codeword indices. With the same knowledge of the codebook and the partitioning, the receiver end is capable of extracting the embedded watermark.

Figure 4. Codebook partitioning for carrying watermark in indices.
There is extra distortion induced by the embedding process; however, the error can be well controlled if an adequate partitioning is used. We now proceed with a more detailed description of the index-based approach. Let D(C_i, C_j) be the Euclidean distortion between C_i and C_j, and let S = {S_1, S_2, ..., S_M} denote a partition of the codebook C for a given threshold D_th > 0, where the partition S satisfies the following four conditions:

1. ||S_i|| = 2^{n(i)}, where ||S_i|| denotes the number of codewords in S_i and n(i) is a natural number.
2. For all i, j with 1 <= i, j <= M, if i ≠ j, then S_i ∩ S_j = ∅.

3. For each cluster S_i, and for any two codewords C_p, C_q in S_i, D(C_p, C_q) <= D_th.

4. S = S_1 ∪ S_2 ∪ ... ∪ S_M.
s =us;. i=l
Assume the original image X is divided into T blocks, i.e., X = {X,, X,,.. X , } , where X, is a K-dimensional vector for t =1,2, ..., T and the watermark information is a bit sequence B=(b,,b, ,...,b,) , where b, E {0,1}, l l i I W . A partition S of
172
C.3.Shieh et al.
codebook C for a certain threshold Dfhis obtained before encoding and the partition S is used as the secret key not only in the embedding process but also in the watermark extraction. The embedding process can be performed block by block. For each input block X,, 1I tI T , the embedding process can be expressed as follows:
Step 1: Find the nearest codeword Ci for X,. Step 2: Find the corresponding index j o f Ciin S , to which Ci belongs. Step 3: Translate the n(p)-bit binary watermark information into an integer g . For example, if n(p)=4, i.e., the cluster S , includes 16 codewords, and the four watermark bits are ‘lOOl’,then g=9. Step 4: Obtain the watermarked index 1 whose corresponding P
codeword is also in S,. If j+gScIISqII-l, set Z = j + g ; otherwise, set 1 = j
+ g - IS, 1 .
q=l
Step 5: Find the codeword C_l whose index is l in S.
Step 6: Use C_l to build the watermarked image.
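Steps 3 and 4 amount to shifting the located index inside its cluster by the integer g, wrapping around the cluster boundary, as in the following C sketch; the parameters giving the position and size of the cluster in the ordered codebook are assumptions of this illustration.

/* Map the nearest-codeword index j to the watermarked index l by
   adding g within cluster S_p.  first is the index of the first
   codeword of S_p in the ordered codebook, size is ||S_p||, and
   0 <= g <= size-1. */
int watermarked_index(int j, int g, int first, int size)
{
    int offset = (j - first + g) % size;   /* wrap within the cluster */
    return first + offset;
}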
Obviously, the PSNR between the original image and the watermarked image depends on both the technique of codebook design and the quality of the partition S. The extraction process for each watermarked image block can be stated as follows:
Step 1: Find the nearest codeword C_i for the original image block and the nearest codeword C_l for the corresponding watermarked image block. If C_i and C_l are not in the same cluster, the watermarked image has been altered.
Step 2: Find the index j for C_i and the index l for C_l in S. If l - j >= 0, set g = l - j; otherwise, set g = l - j + ||S_p||.
Step 3: Translate the integer g into the n(p)-bit binary watermark information.

The above three steps are performed for all the watermarked image blocks, and then all extracted bits are pieced together into the extracted watermark.
Lu et al. (2000) reported that, for a 512 x 512 8-bit gray-level test image, 33199 bits could be embedded and then perfectly extracted, with only slight degradation of the image quality, from PSNR = 31.22 dB to PSNR = 30.96 dB. They used the LBG algorithm to generate a codebook of size 512, and used Tabu search (Glover et al. 1997) together with local heuristics to obtain M = 98 codeword clusters at a distortion threshold D_th = 135. In their work, clusters of different sizes are allowed in the partitioning; a cluster of size s can be used to embed log2(s) bits.

The original image is required for watermark extraction in the above approach. Lu, Pan, and Sun (2000) proposed a new scheme, called codebook expansion, to relieve this requirement. The basic idea is to expand the codebook from size N to 2N. The simplest way is to slightly perturb, or even directly copy, the N codewords to get 2N codewords. Codewords C_i and C_{i+N} are grouped into the same cluster for i from 0 to N-1. Obviously, the PSNR between the watermarked image and the original image for codebook size 2N will be close to or exactly the same as that for the original codebook of size N. With the proposed scheme, the extraction of the embedded watermark can be done without the need for the original image. Moreover, the distortion threshold D_th can be easily controlled, and the number of embedded watermark bits in each index is fixed. The codebook expansion is not limited to the doubling of the codebook size. With the same philosophy, we may expand a codebook of size N to size 2^s N if there are s bits to be embedded with each index.

The philosophy of carrying the watermark implicitly in codeword indices has also been successfully applied to other VQ-descendant compression schemes, such as VQ-BTC (Lu, Liu, and Sun, 2002). Another interesting watermarking scheme, called semi-fragile watermarking, based on the concept of index-constrained VQ, was introduced by Lu, Liu, Xu, and Sun (2003). The core idea of index-constrained VQ is as follows: suppose that each index has n bits; then we can select an embedding position from the n candidate positions. Assume that we select position m to embed the watermark bit, where 0 <= m <= n-1. Unlike the normal vector quantization encoder, the embedding process for each watermark bit is performed by searching for the best-match codeword C_p for each input vector X_i under the constraint that the m-th bit of index p is equal to the watermark bit to be embedded. As a result, the proposed watermarking scheme is robust to vector quantization compression and JPEG compression, but fragile to others.
3
Watermarking Scheme Based on Secret Keys
The digital signature is intensively used in modern security systems (Stallings 1999) to ensure data integrity. The document to be protected is fed to a secure one-way hashing function to generate a small digest closely related to the given document. The document, together with the associated digest, is then sent to the intended receiver, who conducts the same process to check for data integrity. In practice, a good hashing function, such as SHA-1 (Stallings 1999), makes it practically impossible for an opponent to modify a document without detection.
Huang et al. (2001, 2002) explored a similar architecture but with a different appeal in mind. The digital signature in security systems is intended to be fragile, so that any modification can be detected. The works of Huang et al. (2001, 2002) instead try to capture the essential characteristics of the image to be protected, and generate a secret key based on the captured features and the watermark to be embedded. If the features used are good enough, they will survive even severe attacks, and so will the embedded watermark. Figure 5 is a block diagram of the approach proposed by Huang et al. (2001). Assume that the binary-valued watermark to be embedded is W, of size M_w × N_w. In order to survive the picture-cropping attack, we permute the watermark to disperse its spatial relationships. With a pre-determined key, key_1, we have
Figure 5. Watermarking scheme based on secret keys.
W_p = permute(W, key_1).   (3)
Assume that block (m,n) of the original image X is x(m,n). After performing VQ, the index map Y and its elements y(m,n) can be represented by

Y = VQ(X),  y(m,n) = VQ(x(m,n)).   (4)
To capture the essential characteristics of the original image, we calculate the variance of y(m,n) and the indices of its surrounding blocks using

σ²(m,n) = (1/|N(m,n)|) Σ_{(i,j)∈N(m,n)} ( y(i,j) − ȳ(m,n) )²,   (5)

where N(m,n) denotes the set consisting of y(m,n) and its surrounding indices, and ȳ(m,n) is their mean.
The polarities based on the variances can be decided with a pre-determined threshold value by

P = ∪_{m=0}^{M−1} ∪_{n=0}^{N−1} P(m,n),   (6)

where

P(m,n) = 1, if σ²(m,n) ≥ threshold;  0, otherwise.   (7)
A threshold value of half of the codebook size is used. Then, we are able to bind W_p with P by the exclusive-or operation

key_2 = W_p ⊕ P.   (8)
After the inverse-VQ operation, the reconstructed image X′, along with the secret key key_2, works to protect the ownership of the original image. The image quality of X′ is good, and it
would not be influenced by the information conveyed in the watermarks, because the information is hidden in the secret key rather than the image. In extracting the watermarks, we calculate the estimated polarities P′ from X′ first, and then apply the exclusive-or operation with key_2 to get an estimate of the permuted watermark
W(, = P’Okey,.
(9)
Finally, we can perform the inverse permutation to get the extracted watermark
W’ = inverse-permute (WL, key,).
(10)
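The whole secret-key pipeline of Equations (3) and (8)-(10) fits in a short sketch; the keyed permutation, the 3×3 variance neighbourhood, and the function names are our assumptions for illustration.

import numpy as np

def permute(W, key1):
    # Keyed pseudo-random permutation of the binary watermark (Equation (3)).
    rng = np.random.default_rng(key1)
    idx = rng.permutation(W.size)
    return W.flatten()[idx].reshape(W.shape), idx

def inverse_permute(Wp, idx):
    flat = np.empty(Wp.size, dtype=Wp.dtype)
    flat[idx] = Wp.flatten()
    return flat.reshape(Wp.shape)

def polarities(y, threshold):
    # P(m,n) = 1 if the variance of y(m,n) and its neighbours >= threshold
    # (a 3x3 neighbourhood is assumed in this sketch).
    M, N = y.shape
    P = np.zeros((M, N), dtype=np.uint8)
    for m in range(M):
        for n in range(N):
            nb = y[max(m - 1, 0):m + 2, max(n - 1, 0):n + 2].astype(float)
            P[m, n] = 1 if nb.var() >= threshold else 0
    return P

# Registration:  Wp, idx = permute(W, key1)
#                key2 = Wp ^ polarities(Y, K / 2)          # Equation (8), K = codebook size
# Verification:  Wp_est = polarities(Y_est, K / 2) ^ key2  # Equation (9)
#                W_est  = inverse_permute(Wp_est, idx)     # Equation (10)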
Promising results were obtained in their experiments, as shown in Figure 6. The proposed approach is robust to JPEG compression, the cropping attack, low-pass filtering, median filtering (Gonzalez et al. 1992), and even VQ compression with codebooks trained on other images. An objective measure, the Normalized Cross-correlation (NC), is defined to judge the difference between the embedded watermark and the extracted watermark.
Figure 6. The extracted watermarks and the NC values of the proposed algorithm under various attacks: (a) no attack, NC = 1.0; (c) VQ, codebook 2, NC = 0.9707; (e) JPEG, QF = 60%, NC = 0.9888; (f) image cropping, NC = 0.8604; (g) low-pass filter, NC = 0.9745; (h) median filter, NC = 0.9848.
4
Discussions and Conclusions
Every kind of watermarking scheme has its strengths and weaknesses. To increase the robustness against intentional attacks or common image processing, one should consider the integration of different watermarking schemes and the embedding of multiple watermarks from different domains.
The works of Shieh et al. (2003) and Huang et al. (2002) are representative of this direction. The system block diagram for the work of Shieh et al. (2003) is given in Figure 7. The components in the dashed block embed watermark W_1 into the carrier image X using an approach similar to that discussed in Section 2; the difference is that they used a genetic algorithm (Holland 1975, Goldberg 1987) for the codebook partitioning. The remaining components, following the philosophy discussed in Section 3, work cooperatively to generate a secret key, key_3, which binds together the watermark W_2 and essential attributes of the carrier image in the DCT domain. With the proposed approach, either watermark can be extracted independently. Experimental results revealed that both, or at least one, of the watermarks survived the tested attacks; the robustness of the entire system is therefore increased. These results imply that the integration of different watermarking schemes and the embedding of multiple watermarks from different domains is a promising direction for higher robustness.

Figure 7. System block diagram for embedding multiple watermarks.

The codebook partitioning is critical for the scheme discussed in Section 2. The PSNR between the original image and the watermarked image depends on the quality of the partitioning. For the case where there are two codewords in each cluster, the partitioning problem can be formulated as the following minimization problem:

D = Σ_{m=0}^{N/2−1} D(C_m, C′_m),   (12)
where C_m and C′_m are the codewords belonging to the same cluster for watermarking. A higher PSNR is possible if we take the training vectors into account in the process of codebook partitioning. Assume that there are T training vectors; the objective function subject to minimization becomes:
D′ = Σ_{t=1}^{T} D(X_t, C′_m),  X_t ∈ P_m,   (13)
where P_m is the m-th partitioned set for the training data X_t, C_m is the codeword for the m-th partitioned set, and C_m and C′_m are the codewords belonging to the same cluster for watermarking. Conventionally, codebook generation and codebook partitioning are done separately, looking for the best configuration in each individual stage. However, individual optima cannot guarantee the best overall performance for the purpose of digital watermarking. The problem of codebook design and codebook partitioning can be jointly solved by minimizing the following objective function:
D″ = Σ_{t=1}^{T} [ D(X_t, C_m) + D(X_t, C′_m) ],  X_t ∈ P_m.   (14)

With Equation (14), the two sub-problems are solved simultaneously and jointly, and a watermarking scheme with a higher PSNR value can then be expected.
References

Barnett, R. (1999), "Digital watermarking applications, techniques, and challenges," IEE Electronics & Communication Engineering Journal, vol. 11, pp. 173-183.
Cox, I., Kilian, J., Leighton, F.T., and Shamoon, T. (1997), "Secure spread spectrum watermarking for multimedia," IEEE Transactions on Image Processing, vol. 6, pp. 1673-1687.

Gersho, A. and Gray, R.M. (1992), Vector Quantization and Signal Compression, Kluwer Academic Publishers.

Glover, F. and Laguna, M. (1997), Tabu Search, Kluwer Academic Publishers.

Goldberg, D. (1987), Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley.

Gonzalez, R.C. and Woods, R.E. (1992), Digital Image Processing, Addison-Wesley.

Holland, J. (1975), Adaptation in Natural and Artificial Systems, University of Michigan Press.

Huang, H.C., Wang, F.H., and Pan, J.S. (2001), "An efficient and robust watermarking algorithm with VQ," Electronics Letters, vol. 37, pp. 826-828.

Huang, H.C., Wang, F.H., and Pan, J.S. (2002), "An adaptive and robust watermarking method based on VQ," Chinese Journal of Electronics, vol. 11, pp. 158-162.

Huang, H.C., Wang, F.H., and Pan, J.S. (2002), "A VQ-based robust multi-watermarking algorithm," IEICE Transactions on
Fundamentals of Electronics, Communication and Computer Sciences, vol. E85-A, pp. 1719-1726.

Katzenbeisser, S. and Petitcolas, F. (2000), Information Hiding Techniques for Steganography and Digital Watermarking, Artech House.

Linde, Y., Buzo, A., and Gray, R.M. (1980), "An algorithm for vector quantizer design," IEEE Transactions on Communications, vol. 28, pp. 84-95.

Lu, Z.M., Liu, C.H., and Sun, S.H. (2002), "Digital image watermarking technique based on block truncation coding with vector quantization," Chinese Journal of Electronics, vol. 11, pp. 152-157.

Lu, Z.M., Liu, C.H., Xu, D.G., and Sun, S.H. (2003), "Semi-fragile image watermarking method based on index constrained vector quantisation," Electronics Letters, vol. 39, pp. 35-36.

Lu, Z.M., Pan, J.S., and Sun, S.H. (2000), "VQ-based digital image watermarking method," Electronics Letters, vol. 36, pp. 1201-1202.

Lu, Z.M. and Sun, S.H. (2000), "Digital image watermarking technique based on vector quantisation," Electronics Letters, vol. 36, pp. 303-305.

O'Ruanaidh, J.J.K., Dowling, W.J., and Boland, F.M. (1996), "Phase watermarking of digital images," Proceedings of the IEEE International Conference on Image Processing, vol. 3, pp. 239-242.

Pan, J.S., Wu, S.Y., Shieh, C.S., and Shi, Y. (2003), "Genetic zerotree selection for robust watermarking system," Proceedings of
the 7th World Multiconference on Systemics, Cybernetics and Informatics.

Pereira, S. and Pun, T. (2000), "An iterative template matching algorithm using the Chirp-Z transform for digital image watermarking," Pattern Recognition, vol. 33, pp. 173-175.

Sayood, K. (2000), Introduction to Data Compression, 2nd ed., Morgan Kaufmann.

Shieh, C.S., Huang, H.C., Wang, F.H., and Pan, J.S. (2003), "An embedding algorithm for multiple watermarks," Journal of Information Science and Engineering, vol. 19, pp. 381-395.

Stallings, W. (1999), Network Security Essentials: Applications and Standards, Prentice Hall.
Van Schyndel, R.G., Tirkel, A.Z., Mee, N., and Osborne, C.F. (1994), "A digital watermark," Proceedings of the IEEE International Conference on Image Processing, vol. 2, pp. 86-90.
Chapter 8

Audio Watermarking Techniques

Hyoung Joong Kim, Yong Hee Choi, Jongwon Seok, and Jinwoo Hong

Audio watermarking schemes play a very important role in digital copyright protection, authentication, indexing, synchronization of audio and video, and so on (Cox et al. 2002). The available studies on audio watermarking are far fewer than those on image or video watermarking, since audio watermarking is more difficult to develop. Watermarking is a process of adding noise, in one manner or another, to the host signal, and unfortunately the human ear is far more sensitive to noise than the other sensory organs. One of the difficulties of audio watermarking thus comes from the sensitivity to embedding noise. Nevertheless, during the last decade audio watermarking studies have increased considerably and have contributed much to the progress of audio watermarking technologies. This chapter surveys those studies and classifies them into four categories: the spread-spectrum scheme, the two-set scheme, the replica scheme, and the self-marking scheme. The first embeds a pseudo-random sequence and detects it by calculating correlation; the spread-spectrum scheme belongs to this class. The second exploits the differences between two or more sets; the patchwork scheme belongs to this category. The third uses a replica of the original audio clip in both the embedding and detection phases; replica modulation is an example of this class. The last is the self-marking scheme, which can be used especially for synchronization or for robust watermarking, for example against the time-scale modification attack. These four seminal lines of work have improved watermarking schemes remarkably.
However, more sophisticated technologies are required, and are expected to be achieved in the next decade. In all of these schemes, synchronization is a necessary preprocessing step before detection, and various synchronization mechanisms are surveyed. The conventional categorization of audio watermarking includes spread-spectrum, echo hiding, patchwork, and so on (Bender et al. 1996). However, tens of audio watermarking schemes are now available, and some of the newer ones do not fit into that taxonomy. Thus, this chapter classifies them into the four categories above.
1
Introduction
Audio watermarks are special signals embedded into digital audio. These signals are extracted by detection mechanisms and decoded. Audio watermarking schemes rely on the imperfection of the human auditory system. However, the human ear is much more sensitive than the other sensory organs, so good audio watermarking schemes are difficult to design. Even though the current watermarking techniques are far from perfect, audio watermarking schemes have been applied widely during the last decade. These schemes have become very sophisticated in terms of robustness and imperceptibility (Bender et al. 1996, Cox et al. 2002, Cox and Miller 2002). Such progress was in part due to the technical evaluations under the flag of SDMI (Secure Digital Music Initiative) in 2000, and those by the name of STEP2000 in 2000 and STEP2001 in 2001 by JASRAC (Japanese Society for Rights of Authors, Composers and Publishers). Many commercial watermarking techniques were submitted and competed under relatively fair technical evaluation criteria during 2000-2001. The most valuable contribution of those competitions was to make watermarking developers and clients, such as the RIAA (Recording Industry Association of America), understand the state of the art of audio watermarking techniques. Among the many attacks, the
time-scale and frequency-scale modulations were identified as the most difficult to defend against. In addition, those watermarking schemes were evaluated each round by a single attack, not by multiple attacks applied simultaneously. Thus, it is quite questionable whether the finally selected schemes can successfully detect watermarks under multiple attacks. Early results on attacks against various watermarking schemes (Petitcolas et al. 1998) warned that watermarking technologies had a long way to go. Unfortunately, this warning became a reality: an open challenge to the selected algorithms under the flag of "hackSDMI" showed that they were far from perfect. The SDMI posted one pair of unwatermarked and watermarked audio clips as a reference, and no more information was available about what scheme or what key was used. Even though one pair of audio clips was an insufficient clue for an attack, some reports (Boeuf and Stern 2001, Craver et al. 2001, Craver et al. 2002, Wu et al. 2001) claimed that the watermarking schemes were broken. This is a kind of known-plaintext cryptanalysis, where the opponent possesses an unwatermarked audio s(n) and the corresponding watermarked audio z(n). Many lessons were learned from the event; it was shown through the challenge that systematic analysis would be a serious threat to watermarking schemes. Non-blind watermarking schemes are theoretically interesting but not so useful in practice, since they require double the storage capacity and double the communication bandwidth for watermark detection. Of course, non-blind schemes may be useful as a copyright verification mechanism in a copyright dispute (and are even necessary; see (Craver et al. 1998) on inversion attacks). On the other hand, a blind watermarking scheme can detect and extract watermarks without use of the unwatermarked audio. Thus, it requires only half the storage capacity and half the bandwidth compared with a non-blind watermarking scheme. Hence, only blind audio watermarking schemes are considered in this chapter. Needless to say, blind watermarking methods need self-detection mechanisms for detecting watermarks without the
unwatermarked audio. This chapter presents four basic audio watermarking schemes. The first is the spread-spectrum method, based on the similarity between the watermarked audio and a pseudo-random sequence. The second is the two-set method, based on differences between two or more sets, which includes the patchwork scheme. The third is the replica method, which uses a close copy of the original audio and includes the replica modulation scheme. The last is the self-marking scheme. Of course, many more schemes and their variants are available. For example, time-base modulation (Foote and Adcock 2003) is theoretically interesting; however, since this mechanism is a non-blind watermarking scheme, we exclude it from this chapter. An audio watermarking scheme that encodes compressed audio data (Nahrstedt and Qiao 1998) is also excluded, since this scheme does not embed a real watermark signal into the raw audio. Furthermore, no psycho-acoustic model is available in the compressed domain to enable the adjustment of the watermark to ensure inaudibility.
2
Spread-Spectrum Method
The spread-spectrum watermarking scheme is an example of the correlation method, which embeds a pseudo-random sequence and detects the watermark by calculating the correlation between the pseudo-random noise sequence and the watermarked audio signal. The spread-spectrum scheme is the most popular scheme and has been studied extensively in the literature (Boney et al. 1996, Cox et al. 1996, Cvejic et al. 2001, Kirovski and Malvar 2001, Kim 2000, Lee and Ho 2000, Seok et al. 2002, Swanson et al. 1998). This method is easy to implement, but has some serious disadvantages: it requires time-consuming psycho-acoustic shaping to reduce audible noise, and it is susceptible to the time-scale modification attack. (Of course, the usage of psycho-acoustic models is not limited to spread-spectrum techniques.) The basic idea of this scheme and its implementation techniques are described below.
Figure 1. A typical embedder of the spread-spectrum watermarking scheme.
2.1
Basic Idea
This scheme spreads a pseudo-random sequence across the audio signal (see Figure 1). The wideband noise can be spread into either the time-domain signal or a transform-domain signal, no matter what transform is used. Frequently used transforms include the DCT (Discrete Cosine Transform), DFT (Discrete Fourier Transform), and DWT (Discrete Wavelet Transform). The binary watermark message w = {0,1}, or its equivalent bipolar variable b = {−1, +1}, is modulated by a pseudo-random sequence r(n) generated by means of a secret key. Then the modulated watermark w(n) = br(n) is scaled according to the energy of the audio signal s(n). The scaling factor α controls the trade-off between robustness and inaudibility of the watermark. The modulated watermark w(n) is equal to either r(n) or −r(n), depending on whether w = 1 or w = 0. The modulated signal is then added to the original audio to produce the watermarked audio
z(n) such as

z(n) = s(n) + αw(n).

The detection scheme uses linear correlation. Because the pseudo-random sequence r(n) is known and can be regenerated by means of a secret key, watermarks are detected by using the correlation between z(n) and r(n) such as
c = (1/N) Σ_{i=1}^{N} z(i) r(i),   (1)
where N denotes the length of the signal. Equation (1) yields the correlation as the sum of two components:

c = (1/N) Σ_{i=1}^{N} s(i) r(i) + (1/N) Σ_{i=1}^{N} α b r²(i).   (2)
The first term in Equation (2) is assumed to be almost certain to have a small magnitude. If the two signals s(n) and r(n) were independent, the first term would vanish; however, this is not the case in practice. Thus, the watermarked audio is preprocessed as shown in Figure 2 in order to make the assumption valid. One possible solution is to filter s(n) out of z(n). Preprocessing methods include high-pass filtering (Hartung and Girod 1998, Haitsma et al. 2000), linear predictive coding (Seok et al. 2002), and filtering by a whitening filter (Kim 2000). Such preprocessing allows the second term in Equation (2) to have a much larger magnitude and the first term to almost vanish. If the first term has a magnitude similar to or larger than the second term, the detection result will be erroneous. Based on a hypothesis test using the correlation value c and a predefined threshold T, the detector outputs 1 if c > T, and 0 otherwise.
Figure 2. A typical preprocessing block for the detector of the spread-spectrum watermarking scheme (inputs: the watermarked audio z(n) and the pseudo-random sequence r(n)).
A typical value of T is 0. The detection threshold has a direct effect on both the false positive and false negative probabilities. A false positive is a type of error in which the detector incorrectly determines that a watermark is present in unwatermarked audio. A false negative, on the other hand, is a type of error in which the detector fails to detect a watermark in watermarked audio.
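A minimal sketch of the embedder and detector of Equations (1) and (2); the bipolar keyed sequence, the first difference used as a crude stand-in for the high-pass/whitening preprocessing, and the parameter names are illustrative.

import numpy as np

def embed_ss(s, key, alpha, bit):
    # z(n) = s(n) + alpha * b * r(n), with b = +1 for bit 1 and -1 for bit 0.
    rng = np.random.default_rng(key)
    r = rng.choice([-1.0, 1.0], size=len(s))
    b = 1.0 if bit == 1 else -1.0
    return s + alpha * b * r

def detect_ss(z, key, T=0.0):
    rng = np.random.default_rng(key)
    r = rng.choice([-1.0, 1.0], size=len(z))
    zw = np.diff(z, prepend=z[0])     # first difference: a crude high-pass stand-in
    c = np.mean(zw * r)               # correlation sum of Equation (1)
    return 1 if c > T else 0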
2.2
Pseudo-Random Sequence
A pseudo-random sequence has statistical properties similar to those of a truly random signal, but it can be exactly regenerated with knowledge of privileged information (see Section 2.1). A good pseudo-random sequence has the correlation property that any two different sequences are almost mutually orthogonal. Thus, the cross-correlation value between them is very low, while the auto-correlation value is moderately large. The most popular pseudo-random sequence is the maximum length sequence (also known as the M-sequence). This is a binary sequence r(n) = {0,1} of length N = 2^m − 1, where m is the size of the linear feedback shift register. This sequence has very good auto-correlation and cross-correlation properties. If we map the binary sequence r(n) = {0,1} into the bipolar sequence r(n) = {−1, +1}, the auto-correlation of the M-sequence is given as follows:
(1/N) Σ_{i=0}^{N−1} r(i) r(i − k) = 1, if k = 0;  −1/N, otherwise.   (3)
The M-sequences have two disadvantages. First, the length of an M-sequence, which is called the chip rate, is strictly limited to 2^m − 1. Thus, it is impossible to get, for example, nine-chip sequences. The length of typical pseudo-random sequences is 1,023 (Cvejic et al. 2001) or 2,047. There is always the possibility of trading off the length of the pseudo-random sequence against robustness; very short sequences, such as length 7, are also used (Liu et al. 2002). Second, the number of different M-sequences is also limited once the size m is determined, and it has been shown that the M-sequence is not secure in terms of cryptography. Thus, not all pseudo-random sequences used are M-sequences. Sometimes a non-binary, and consequently real-valued, pseudo-random sequence r(n) ∈ R with Gaussian distribution (Cox et al. 1996) is used. Non-binary chaotic sequences (Bassia et al. 2001) are also used. As long as they are non-binary, their correlation characteristics are very good. However, since we have to use integer sequences (processed such as ⌊αr(n)⌋) due to finite precision, the correlation properties become less promising.
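An M-sequence is generated by a linear feedback shift register; a small sketch follows, where the taps [5, 3] correspond to the primitive polynomial x^5 + x^3 + 1, giving a 31-chip sequence.

def m_sequence(taps, m):
    # Fibonacci LFSR: output the last stage, feed back the XOR of the taps.
    state = [1] * m                   # any non-zero initial state
    out = []
    for _ in range(2 ** m - 1):       # one full period of length 2^m - 1
        out.append(state[-1])
        fb = 0
        for t in taps:
            fb ^= state[t - 1]
        state = [fb] + state[:-1]
    return out

seq = m_sequence([5, 3], 5)           # 31-chip binary sequence
bipolar = [2 * b - 1 for b in seq]    # map {0,1} -> {-1,+1}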
2.3
Watermark Shaping
A carelessly added pseudo-random sequence or noise can cause unpleasant audible sound, whatever watermarking scheme is used. Thus, just reducing the strength α of the pseudo-random sequence is not the final solution. Because human ears are very sensitive, especially when the sound energy is very low, even very little noise with a small value of α can be heard. Moreover, a small α makes the spread-spectrum scheme less robust. One solution to ensure inaudibility is watermark shaping based on the psycho-acoustic model (Arnold and Schilz 2002, Bassia et al. 2001, Boney et al. 1996, Cvejic et al. 2001, Cvejic and Seppanen 2002). Interestingly enough, watermark shaping can also enhance robustness, since we can increase the strength α as far as the noise stays below the masking margin.
Figure 3. A typical masking curve (horizontal axis: frequency in Hz, from 20 Hz to 20,000 Hz). Noise below the solid line or bold line is inaudible. The bold line is moved upward by taking masking effects into consideration.
Psycho-acoustic models for audio compression exploit frequency and temporal masking effects to ensure inaudibility by shaping the quantization noise according to the masking threshold. The psycho-acoustic model depicts the human auditory system as a frequency analyzer with a set of 25 bandpass filters (also known as critical bands). The intensity, expressed in decibels [dB], required for a single sound to be heard in the absence of another sound is known as the quiet curve (Cvejic et al. 2001) or the threshold of audibility (Rossing et al. 2002). Figure 3 shows the quiet curve. In this case, the threshold in quiet is equal to the so-called minimum masking threshold. However, the masking effect can raise the minimum masking threshold. A sound lying in the frequency or temporal neighborhood of another sound affects the characteristics of the neighboring sound; this phenomenon is known as masking. The sound that does the masking is called the masker, and the sound that is masked is called the maskee. The psycho-acoustic model analyzes the input signal s(n) in order to calculate the minimum masking threshold T. Figure 4 shows inaudible and audible watermark signals. The audible watermark signal can be transformed into an inaudible signal by applying watermark shaping based on the psycho-acoustic model.

Figure 4. An example of noise shaping. Audible noise (dotted line) is transformed into inaudible noise (broken line).
The frequency masking procedure is given as follows:
1. Calculate the power spectrum.
2. Locate the tonal (sinusoid-like) and non-tonal (noise-like) components.
3. Decimate the maskers to eliminate all irrelevant maskers.
4. Compute the individual masking thresholds.
5. Determine the minimum masking threshold in each subband.

This minimum masking threshold defines the frequency response of the shaping filter, which shapes the watermark. The filtered watermark signal is scaled in order to keep the embedded watermark noise below the masking threshold; the shaped signal below the masking threshold is hardly audible. In addition, the noise energy of the pseudo-random sequence can be increased as much as possible in order to maximize robustness, since the noise remains inaudible as long as its power stays below the masking threshold T. Temporal masking effects are also utilized for watermark shaping. Watermark shaping is a time-consuming task, especially when we try to exploit the masking effects frame by frame in real time, because the watermark shaping filter coefficients are computed from the psycho-acoustic model; in this case, we have to use the Fourier transform and inverse Fourier transform and follow the five steps described above. Needless to say, the detection rate then increases, since the robustness of the watermark increases. Because this is too time-consuming, a watermark shaping filter computed from the quiet curve can be used instead. Since this filter exploits the minimum noise level, it is not optimal in terms of the watermark strength α, which results in a strong reduction of robustness.
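A faithful implementation of the five steps requires a full psycho-acoustic model; the following deliberately crude sketch only conveys the shaping idea, using a smoothed host spectrum minus a fixed margin as a stand-in for the minimum masking threshold. The margin and smoothing width are arbitrary choices, not part of any standard model.

import numpy as np

def shape_watermark(s, w, margin_db=12.0):
    S = np.fft.rfft(s)
    W = np.fft.rfft(w)
    host_db = 20.0 * np.log10(np.abs(S) + 1e-12)
    # A smoothed host spectrum minus a fixed margin stands in for the
    # minimum masking threshold computed by a real psycho-acoustic model.
    thr_db = np.convolve(host_db, np.ones(9) / 9.0, mode="same") - margin_db
    cap = 10.0 ** (thr_db / 20.0)
    mag = np.minimum(np.abs(W), cap)          # clip the watermark below the threshold
    return np.fft.irfft(mag * np.exp(1j * np.angle(W)), n=len(s))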
Of course, instead of maximizing the masking threshold, we can increase the length of the pseudo-random sequence for robustness. However, this method reduces the embedding message capacity.
2.4
Sinusoidal Modulation
Another solution is sinusoidal modulation, based on the orthogonality between sinusoidal signals (Liu et al. 2002). Sinusoidal modulation utilizes the orthogonality between sinusoidal signals with different frequencies:

(1/N) Σ_{i=0}^{N−1} sin(2πim/N) sin(2πin/N) = 1, if m = n;  0, otherwise.
Based on this property, the sinusoidally modulated watermark can be generated by adding sinusoids with different frequencies, signed by the pseudo-random sequence (Liu et al. 2002). Note that a watermark signal modulated by the elements of a pseudo-random sequence keeps the same correlation characteristics as the pseudo-random sequence in Equation (3). This sinusoidal modulation method has the following advantages. First, watermark embedding and detection can be done simply in the time domain, so the embedding complexity is relatively low. Second, the pseudo-random sequence can be very short. Third, the embedded sinusoids always start from zero and end at zero, which minimizes the chance of block noise. Of course, this scheme also needs psycho-acoustic shaping for inaudibility.
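A small sketch of sinusoidal modulation: the watermark is a sum of block-aligned sinusoids signed by the bipolar pseudo-random chips, and each chip is read back by correlating with its sinusoid. The normalization and names are illustrative.

import numpy as np

def sinusoidal_watermark(pn, N, amp=1.0):
    # pn: bipolar chips {-1, +1}; chip k rides on frequency (k+1)/N cycles/sample.
    n = np.arange(N)
    w = np.zeros(N)
    for k, chip in enumerate(pn):
        w += chip * np.sin(2 * np.pi * (k + 1) * n / N)
    return amp * w                    # each sinusoid starts at zero, limiting block noise

def correlate_chip(z, k, N):
    # Recover the sign of chip k from a (possibly watermarked) block z.
    n = np.arange(N)
    return float(np.sum(z * np.sin(2 * np.pi * (k + 1) * n / N)) * 2.0 / N)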
3
Two-Set Method
A blind watermarking scheme can be devised by making two sets different: if the two sets are found to differ, then we can conclude that a watermark is present. Such decisions are made by hypothesis tests, typically based on the difference of the means of the two sets. Making two sets of audio blocks have different energies can also be a good solution for blind watermarking. Patchwork
(Arnold 2000, Bender et al. 1996, Yeo and Kim 2003) also belongs to this category. Of course, depending on the application, we can exploit the differences between two sets or more.
3.1
Patchwork Scheme
The original patchwork scheme embeds a special statistic into the original signal (Bender et al. 1996). The two major steps in the scheme are: (i) choose two patches pseudo-randomly, and (ii) add a small constant value d to the samples of one patch A and subtract the same value d from the samples of the other patch B. Mathematically speaking,

a*_i = a_i + d,   b*_i = b_i − d,

where a_i and b_i are samples of the patchwork sets A and B, respectively. Thus, the original sample values have to be slightly modified. The detection process starts with the subtraction of the sample values between the two patches. Then E[ā* − b̄*], the expected value of the difference of the sample means, is used to decide whether the samples contain watermark information or not, where ā* and b̄* are the sample means of the individual samples a*_i and b*_i, respectively. Since two patches are used rather than one, the scheme can detect the embedded watermarks without the original signal, which makes it a blind watermarking scheme. Patchwork has some inherent drawbacks. Note that
E[ā* − b̄*] = E[(ā + d) − (b̄ − d)] = E[ā − b̄] + 2d,
where ā and b̄ are the sample means of the individual samples a_i and b_i, respectively. The patchwork scheme assumes that E[ā* − b̄*] = 2d, due to the prior assumption that random sampling makes the expected values all the same, so that E[ā − b̄] = 0. However, the actual difference of the sample means, ā − b̄, is not always zero in practice. Although the distribution of the random variable ā* − b̄* is shifted to the right, as shown in Figure 5, the probability of a
wrong detection still remains (see the area below 0 in the watermarked distribution). The performance of the patchwork scheme depends on the distance between the two sample means and on d, which affects inaudibility. Furthermore, the patchwork scheme was originally designed for images.
Figure 5. A comparison of the unwatermarked and watermarked distributions of the mean difference (the watermarked distribution is centered at 2d).
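The original patchwork embedding and its mean-difference test can be sketched in a few lines; patch size, threshold, and names are illustrative choices.

import numpy as np

def patchwork_embed(s, key, d=1.0, patch=1000):
    rng = np.random.default_rng(key)
    idx = rng.choice(len(s), size=2 * patch, replace=False)
    A, B = idx[:patch], idx[patch:]
    z = s.astype(float).copy()
    z[A] += d                         # one patch is raised by d ...
    z[B] -= d                         # ... the other lowered by d
    return z

def patchwork_detect(z, key, patch=1000, tau=0.5):
    rng = np.random.default_rng(key)
    idx = rng.choice(len(z), size=2 * patch, replace=False)
    A, B = idx[:patch], idx[patch:]
    return (z[A].mean() - z[B].mean()) > tau   # expected difference is about 2d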
The original patchwork scheme was applied to spatial-domain image data (Bender et al. 1996) (or, equivalently, time-domain data in audio). However, time-domain embedding is vulnerable even to weak attacks and modifications. Thus, the patchwork scheme can be implemented in the transform domain (Arnold 2000, Bassia et al. 2001, Yeo and Kim 2003). These implementations have enhanced the original patchwork algorithm. First, the mean and variance of the sample values are computed in order to detect the watermarks. Second, the new algorithms assume that the distribution of the sample values is
normal. Third, they try to decide the value d adaptively. The Modified Patchwork Algorithm (MPA) (Yeo and Kim 2003) is described below:

1. Generate two sets A = {a_i} and B = {b_i} randomly. Calculate the sample means ā = N⁻¹ Σ_{i=1}^{N} a_i and b̄ = N⁻¹ Σ_{i=1}^{N} b_i, respectively, and the pooled sample standard error

S = √( [ Σ_{i=1}^{N} (a_i − ā)² + Σ_{i=1}^{N} (b_i − b̄)² ] / (N(N − 1)) ).
2. The embedding function presented below introduces an adaptive value change,

a*_i = a_i + sign(ā − b̄) · √C · S/2,
b*_i = b_i − sign(ā − b̄) · √C · S/2,   (4)

where C is a constant and "sign" is the sign function. This function makes the larger-valued set larger and the smaller-valued set smaller, so that the distance between the two sample means is always bigger than d = √C · S, as shown in Figure 6.

Figure 6. A comparison of the unwatermarked and watermarked distributions of the mean difference under the modified patchwork algorithm.

3. Finally, replace the selected elements a_i and b_i by a*_i and b*_i.

Since the embedding function (4) introduces relative distance changes between the two sets, a natural test statistic for deciding whether or not the watermark is embedded should concern the distance between the means of A and B. The decoding process is as follows:

1. Calculate the test statistic

T² = (ā* − b̄*)² / S².
2. Compare T² with the threshold τ, and decide that the watermark is embedded if T² > τ and that no watermark is embedded otherwise.
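A compact sketch of MPA embedding (Equation (4)) and detection; the exact form of the T² statistic below follows the distance-over-S reasoning above and should be checked against Yeo and Kim (2003).

import numpy as np

def pooled_error(a, b):
    N = len(a)
    return np.sqrt((np.sum((a - a.mean()) ** 2) + np.sum((b - b.mean()) ** 2))
                   / (N * (N - 1)))

def mpa_embed(a, b, C=4.0):
    # Equation (4): push the larger-mean set up and the smaller-mean set down.
    shift = np.sign(a.mean() - b.mean()) * np.sqrt(C) * pooled_error(a, b) / 2.0
    return a + shift, b - shift

def mpa_detect(a, b, tau):
    T2 = (a.mean() - b.mean()) ** 2 / pooled_error(a, b) ** 2
    return T2 > tau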
3.2
Amplitude Modification
This method embeds the watermark by changing the energies of two or three blocks. The energy of a block of length N is defined and calculated as

E = Σ_{i=1}^{N} |s(i)|.
The energy is high when the amplitude of the signal is large. Assume that two consecutive blocks are used to embed the watermark. We can make the two blocks A and B have the same or different energies by modifying the amplitude of each block. Let E_A and E_B denote the
energies of blocks A and B, respectively. If E_A ≥ E_B + r, then, for example, we conclude that watermark message m = 0 is embedded. If E_A ≤ E_B − r, then we conclude that watermark message m = 1 is embedded. Otherwise, no watermark is embedded. However, this method has a serious problem. Assume that block A has much more energy than block B: if the watermark message to be embedded is 0, then there is no problem at all; otherwise, we have to make E_A smaller than E_B by the margin r. As long as the energy difference gap is wide, the resulting artifact becomes obvious and unnaturally noticeable. This scheme can unfortunately turn a "forte" passage into a "piano" passage, or vice versa. The problem can be moderated by using three blocks (Lie and Chang 2001) or more; with multiple blocks, such artifacts can be reduced somewhat by distributing the burden across the other blocks.
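A sketch of the two-block energy scheme; rescaling a whole block is one simple way to reach the target energy (and is exactly the kind of adjustment that produces the audible artifact noted above). It assumes E_B > r when embedding bit 1.

import numpy as np

def block_energy(x):
    return float(np.sum(np.abs(x)))   # E = sum of |s(i)|

def embed_energy_bit(A, B, bit, r):
    # bit 0 -> force E_A >= E_B + r; bit 1 -> force E_A <= E_B - r.
    EA, EB = block_energy(A), block_energy(B)
    target = EB + r if bit == 0 else EB - r
    if (bit == 0 and EA < target) or (bit == 1 and EA > target):
        A = A * (target / max(EA, 1e-12))   # rescale block A to the target energy
    return A, B

def extract_energy_bit(A, B, r):
    EA, EB = block_energy(A), block_energy(B)
    if EA >= EB + r:
        return 0
    if EA <= EB - r:
        return 1
    return None                       # no watermark detected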
4
Replica Method
The original signal itself can be used as an audio watermark; echo hiding is a good example. Replica modulation likewise embeds part of the original signal, modulated in the frequency domain, as a watermark; that is, it embeds a replica, a properly modulated version of the original signal. The detector can also generate the replica from the watermarked audio and calculate the correlation. The most significant advantage of this method is its high immunity to synchronization attacks.
4.1
Echo Hiding
Echo hiding embeds data into an original audio signal by introducing an echo in the time domain such that

z(n) = s(n) + αs(n − d).   (5)
For simplicity, a single echo is added above; however, multiple echoes can be added (Bender et al. 1996). Binary messages are
embedded by echoing the original signal with one of two delays, either a d₀-sample delay or a d₁-sample delay. Extraction of the embedded message involves detection of the delay d. The autocepstrum or cepstrum detects the delay d: cepstrum analysis duplicates the cepstral impulses every d samples. The magnitudes of the impulses representing the echoes are small relative to the original audio; a solution to this problem is to take the auto-correlation of the cepstrum (Gruhl et al. 1996). A double echo (Oh et al. 2001) such as
z(n) = s(n) + αs(n − d) − αs(n − d − Δ)
can reduce the perceptual signal distortion and enhance robustness. A typical value of Δ is less than three or four samples. Echo hiding is usually imperceptible and sometimes makes the sound richer. Synchronization methods frequently adopt echo hiding for coarse synchronization. One disadvantage of echo hiding is its high complexity, due to the cepstrum or autocepstrum computation during detection. Another disadvantage is that anybody can detect the echo without any prior knowledge; in other words, it provides a clue for malicious attack. Blind echo removal is partially successful (Petitcolas et al. 1998). Time-spread echo (Ko et al. 2002) can reduce this possibility of attack. Another way of evading a blind attack is auto-correlation modulation (Petrovic et al. 1999), which obtains the watermark signal w(n) from the echoed signal z(n) in Equation (5). This method is more sophisticated and is elaborated in replica modulation.
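A sketch of single-echo embedding (Equation (5)) together with a simple real-cepstrum delay detector; picking the larger of two cepstral peaks stands in for the full auto-cepstrum analysis.

import numpy as np

def embed_echo(s, alpha, d):
    # z(n) = s(n) + alpha * s(n - d), Equation (5)
    z = s.astype(float).copy()
    z[d:] += alpha * s[:-d]
    return z

def detect_delay(z, candidates):
    # Real cepstrum: an echo at delay d shows up as a peak at quefrency d.
    cep = np.fft.ifft(np.log(np.abs(np.fft.fft(z)) + 1e-12)).real
    return max(candidates, key=lambda d: cep[d])

# Reading one message bit: 0 if the d0 peak dominates, 1 if the d1 peak does:
# bit = 0 if detect_delay(z, [d0, d1]) == d0 else 1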
4.2
Replica Modulation
Replica modulation (Petrovic 2001) is a novel watermarking scheme that embeds a replica, i.e., a modified version of the original signal. The three replica modulation methods are the frequency-shift, phase-shift, and amplitude-shift schemes. The frequency-shift method transforms s(n) into the frequency domain, copies a fraction of the low-frequency components in a certain range (for example, from 1 kHz to 4 kHz), modulates them (by moving them 20 Hz, for example, with a proper scaling factor), inserts them back onto the original components (to cover the range from 1020 Hz to 4020 Hz), and transforms back to the time domain to generate the watermark signal w(n). Since the frequency components are shifted and added in the frequency domain, we call this a "frequency-domain echo," to contrast it with the "time-domain echo," where the replica is obtained by a time-shift of the original (or a portion of it). Such a modulated signal w(n) is a replica. This replica can be used as a carrier in much the same manner as the PN sequence in spread-spectrum techniques. Thus, the watermarked signal has the following form:
z(n) = s(n) + αw(n).

As long as the components are invariant against modifications, the replica in the frequency domain can be generated from the watermarked signal. The watermark signal ŵ(n) can be regenerated from the watermarked signal z(n) by processing it according to the embedding process. Then the correlation between z(n) and ŵ(n) is computed as

c = (1/N) Σ_{i=1}^{N} s(i) ŵ(i) + (1/N) Σ_{i=1}^{N} α w(i) ŵ(i)   (6)
to detect the watermark. As long as we use a frequency band whose lower cut-off is much larger than the frequency shift, and the correlation is computed over an integer number of frequency-shift periods, the correlation between s(n) and ŵ(n) in Equation (6) is very small. On the other hand, the spectrum of the product w(n)ŵ(n) has a strong DC component, and thus c contains the mean value of w(n)ŵ(n), i.e., the scaled auxiliary signal in the last term of Equation (6). Note that the frequency shift is just one way to generate a replica. A combination of frequency-shift, phase-shift, and amplitude-shift makes it more difficult for a malicious attacker to derive a clue from the replica modulation, and makes the correlation value between s(n) and
ŵ(n) even smaller. The main advantage in comparison to PN-sequence methods is that chip synchronization is not needed during detection, which makes replica modulation immune to synchronization attack. When an attacker mounts a jitter attack (e.g., cuts out a small portion of audio and splices the signal) against PN-sequence techniques, synchronization is a must. By contrast, replica modulation is free from synchronization, since the replica and the original give the same correlation before and after cutting and splicing. Of course, time-scaling attacks can affect bit and packet synchronization, but this is a much smaller problem than chip synchronization. Pitch-scaling (Shin et al. 2002) is a variant of replica modulation which keeps the length of the audio unchanged while the harmonics are either expanded or contracted accordingly.
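A sketch of the frequency-shift replica described above; the 1-4 kHz band and 20 Hz shift are the example values from the text, while the FFT-bin arithmetic and names are our simplification (the shifted band must stay below the Nyquist frequency).

import numpy as np

def frequency_shift_replica(s, fs, f_lo=1000.0, f_hi=4000.0, shift=20.0):
    # Copy the [f_lo, f_hi] band of s, move it up by `shift` Hz, and return
    # the time-domain replica w(n).
    S = np.fft.rfft(s)
    freqs = np.fft.rfftfreq(len(s), d=1.0 / fs)
    band = np.nonzero((freqs >= f_lo) & (freqs <= f_hi))[0]
    k = int(round(shift * len(s) / fs))       # shift expressed in FFT bins
    W = np.zeros_like(S)
    W[band + k] = S[band]                     # shifted band must stay below Nyquist
    return np.fft.irfft(W, n=len(s))

# Embedding: z = s + a * frequency_shift_replica(s, fs)
# The detector regenerates the replica from z itself and correlates it with z.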
5
Self-Marking Method
The self-marking method embeds the watermark by leaving self-evident marks in the signal. This method embeds a special signal into the audio, or changes the signal shape in the time domain or frequency domain. The time-scale modification method (Mansour and Tewfik 2001) and many schemes based on salient features (Wu et al. 2000) belong to this category. A clumsy self-marking method, for example embedding a peak into the frequency domain, is prone to attack since it is easily noticeable.
5.1
Time-Scale Modification
Time-scale modification is a challenging attack, and it can also be used for watermarking (Mansour and Tewfik 2001). Time-scale modification refers to the process of either compressing or expanding the time-scale of audio. The basic idea of time-scale modification watermarking is to change the time-scale between two extrema (a successive maximum and minimum pair) of the audio signal (see Figure 7). The interval between two extrema is partitioned into N segments of equal amplitude.
Figure 7. The concept of the time-scale modification watermarking scheme: (a) original signal; (b) time-scale modified signal, where bit "1" is embedded with a gentle slope and bit "0" with a steep slope. Messages can be embedded by changing the slopes between two successive extrema.
We can change the slope of the signal in certain amplitude interval(s) according to the bits we want to embed, which changes the time-scale. For example, the steep slope and the gentle slope stand for bits "0" and "1", respectively, or vice versa. The advanced time-scale modification watermarking scheme (Mansour and Tewfik 2001) can survive the time-scale modification attack.
5.2
Salient Features
Salient features are signals that are special and noticeable to the embedder, but look like ordinary signal to the attacker. They may be either natural or artificial; in either case, they must be robust against attacks. So far, such features have been extracted or constructed empirically. The salient
features can be used especially for synchronization (see Section 6.3) or for robust watermarking, for example against the time-scale modification attack.
6
Synchronization
Watermark detection starts with the alignment of the watermarked block with the detector. Losing synchronization causes false detection. Time-scale or frequency-scale modification makes the detector lose synchronization; thus, the most serious and malicious attack is probably desynchronization. All watermarking algorithms assume that the detector is synchronized before detection. Brute-force search is computationally infeasible; thus, we need fast and exact synchronization algorithms. Some watermarking schemes, such as replica modulation or echo hiding, are rather robust against certain types of desynchronization attacks. Such schemes can be used as a baseline method for coarse synchronization. A synchronization code can be used to locate the onset of the watermarked block. However, designing a refined synchronization scheme is not simple: clever attackers also try to devise sophisticated methods for desynchronization, so a synchronization scheme should itself be robust against attacks, and fast. There are two synchronization problems. The first is to align the starting point of a watermarked block. This approach addresses attacks such as cropping or inserting redundancy. For example, a garbage clip can be added to the beginning of audio, intentionally or unintentionally; some MP3 encoders unintentionally add around 1,000 samples, which makes an innocent decoder fail to detect the exact watermarks. The second is time-scale and frequency-scale modification, done intentionally by malicious attackers or unintentionally by audio systems (Petrovic et al. 1999), which in either case is very difficult to cope with. Time-scale modification is a time-domain attack that periodically adds fake samples to the target audio or periodically deletes samples (Petitcolas et
al. 1998), or uses sophisticated time-scaling schemes (Arfib 2002, Dutilleux 2002) to preserve pitch; thus, the audio length may be increased or decreased. On the other hand, frequency-scale modification (or pitch-scaling) adjusts frequencies and then applies time-scale modification to keep the size unchanged. This attack can be implemented by sophisticated audio signal processing techniques (Arfib 2002, Dutilleux 2002). Aperiodic modification is even more difficult to manage. There are many audio features, such as brightness, zero-crossing rate, pitch, beat, frequency centroid, and so on. Some of them can be used for synchronization, as long as such features are invariant under attacks. Feature analysis has been studied extensively in the speech processing literature, while very few studies are available for general audio.
6.1
Coarse Alignment
Fine alignment is the final goal of synchronization. However, such alignment is not simple; thus, coarse synchronization is needed to locate possible positions quickly and effectively. Once such positions are identified, fine synchronization mechanisms are used for exact synchronization. A coarse alignment scheme should therefore be simple and fast. The combination of energy and zero-crossings is a good example of a coarse alignment scheme. In this method, the energy and the number of zero-crossings of each block are calculated. A sliding window is used to confine a block. If the two measures meet predefined criteria, then we can conclude that the block is close to the target block for synchronization. Such a conclusion is drawn from the assumption that the energy and the number of zero-crossings are invariant. For example, a block with low energy and a large number of zero-crossings may be a good clue. The number of zero-crossings is closely related to frequency: a large number of zero-crossings implies that the audio contains high-frequency components. Energy computation is simple to
implement: just taking the absolute value of each sample and summing them all gives the energy of the block. Counting the number of sign changes from positive to negative, and vice versa, gives the number of zero-crossings. Echo hiding can also be used for coarse synchronization: if evidence of an echo is identified, it shows that the block is near the synchronization point. Unfortunately, echo detection is considerably costly in terms of computing complexity. Replica modulation is rather robust against desynchronization attacks.
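A sketch of the energy/zero-crossing screen for coarse alignment; window length, hop, and both thresholds are illustrative.

import numpy as np

def energy(x):
    return float(np.sum(np.abs(x)))

def zero_crossings(x):
    return int(np.sum(np.signbit(x[1:]) != np.signbit(x[:-1])))

def coarse_candidates(audio, win, e_max, z_min, hop=256):
    # Flag blocks with low energy and many zero-crossings as candidate
    # synchronization positions for the finer search that follows.
    hits = []
    for start in range(0, len(audio) - win + 1, hop):
        blk = audio[start:start + win]
        if energy(blk) < e_max and zero_crossings(blk) > z_min:
            hits.append(start)
    return hits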
6.2
Synchronization Code
The synchronization code in the time domain based on the Barker code (Huang et al. 2002) is a notable idea. The Barker code (with bit length 12, for example, given as "111110011010") can be used for synchronization, since this code has a special auto-correlation function. To embed the code, this method sets the lowest 13 bits of a sample to "1100000000000" when the embedded message bit is "1", and to "0100000000000" otherwise, regardless of the sample value. For example, the 16-bit sample value "1000000011111111" is changed forcibly into "1001100000000000" to embed message bit "1" in the time domain. This method is claimed to achieve the best performance in resisting additive noise while keeping sufficient inaudibility.
6.3
Salient Point Extraction
Salient point extraction without changing the original signal (Wu et al. 2000) is also a good scheme. The basic idea is to extract salient points at locations where the audio signal energy is climbing fast to a peak value. This approach works well for simple audio clips played by a few instruments. However, the scheme has two disadvantages with more complex audio clips. First, the overall energy variation becomes ambiguous for complex audio where many musical instruments play together; then the stability of the salient
points decreases. Second, it is difficult to define appropriate thresholds for all pieces of music: a high threshold value is suitable for audio with sharp energy variations, but the same value applied to complex audio would yield very few salient points. Thus, audio content analysis (Wu et al. 2000) parses complex audio into several simpler pieces so that the stability of the salient points can be improved and the same threshold can be applied to all audio clips. In order to avoid such complex operations, special shaping of the audio signal is also useful for coarse synchronization. This approach intentionally modifies the signal shape to create salient points that are sufficiently invariant under malicious modifications: for example, choosing a fast-climbing signal portion and marking it with a special sawtooth shape. Such artificial marking may generate audible high-frequency noise; careful shaping can reduce the noise to a hardly audible level.
Figure 8. The concept of redundant-chip coding: (a) exact match (15 matches); (b) one chip off (3 matches); (c) one chip off with the chip rate extended by 3 (15 matches). The right figure is an extended version of the center figure with a chip rate of 3; correlation is calculated at the areas with dotted lines only.
6.4
Redundant-Chip Coding
A pseudo-random sequence is a good tool for watermarking. As mentioned, correlation is effective for detecting the watermark as long as perfect synchronization is achieved. When the pseudo-random
sequence is exactly aligned, its correlation approaches Equation (3). Figure 8(a) depicts perfect synchronization of a 15-chip pseudo-random sequence (not an M-sequence in this example); its normalized auto-correlation is 1. However, if the sequences are misaligned by one chip, as shown in Figure 8(b), the auto-correlation falls to -3/15. This problem can be solved by redundant-chip coding (Kirovski and Malvar 2001). Figure 8(c) shows an expanded chip rate of 3; now a misalignment of one chip does not matter. During the detection phase, only the central sample of each expanded chip is used for computing the correlation; the central chips are marked by broken lines in Figure 8(c). With redundant-chip encoding with an expansion of R chips, correct detection is possible up to a misalignment of ⌊R/2⌋ chips. Of course, this method enhances robustness at the cost of embedding capacity.
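A sketch of redundant-chip expansion and central-sample correlation; R is the expansion factor, and z is assumed long enough to cover all expanded chips.

import numpy as np

def expand_chips(pn, R):
    # Repeat every chip R times (redundant-chip coding).
    return np.repeat(np.asarray(pn, dtype=float), R)

def central_correlation(z, pn, R):
    # Use only the central sample of each expanded chip, so a misalignment
    # of up to floor(R/2) chips leaves the correlation intact.
    centers = np.arange(len(pn)) * R + R // 2
    return float(np.mean(z[centers] * np.asarray(pn, dtype=float)))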
6.5
Beat-Scaling Transform
The beat, the salient periodicity of a music signal, is one of the fundamental characteristics of audio. A serious beat change can spoil the music; thus, the beat must be almost invariant under attacks. In this context, the beat can be a very important marker for synchronization. The beat-scaling transform (Kirovski and Attias 2002) can be used to establish synchronicity between the watermark detector and the location of the watermark in an audio clip. The beat-scaling transform method calculates the average beat period in the clip and identifies the location of each beat as accurately as possible. Next, the audio clip is scaled (i.e., stretched or shortened) such that the length of each beat period is constant and equal to the average beat period, rounded to the nearest multiple of a certain block of samples. The scaled clip is watermarked and then scaled back to its original tempo. As long as the beat remains unchanged, watermarks can be detected from the rescaled beat periods. Beat detection algorithms are presented in (Goto and Muraoka 1999, Scheirer 1998). Of course, in this case the synchronization relies on the accuracy of the beat
detection algorithms.

Figure 9. Taxonomy for audio watermarking research: general watermarking; spread-spectrum basics with psycho-acoustic shaping, blind spread-spectrum, and sinusoidal modulation; the two-set schemes, modified patchwork (Yeo and Kim, 2003) and amplitude modification (Lie and Chang, 2001); the replica schemes, echo hiding basics (Gruhl et al., 1996), advanced echo hiding (Oh et al., 2001), echo modulation (Petrovic et al., 1999), and replica modulation (Petrovic, 2001); self-marking, including time-scale modification (Mansour et al., 2001); and attacks (Petitcolas et al., 1998; Wu et al., 2001; Boeuf and Stern, 2001; Craver et al., 2001; Craver et al., 2002).
7
Conclusions
The available studies on audio watermarking are far fewer than those on image or video watermarking. However, during the last decade audio watermarking studies have increased considerably, and they have contributed much to the progress of audio watermarking technologies. This chapter surveyed those papers and classified them into four categories: the spread-spectrum scheme, the two-set scheme, the replica scheme, and the self-marking scheme. The spread-spectrum scheme has serious drawbacks: it requires psycho-acoustic adaptation for inaudible noise embedding, and this adaptation is rather time-consuming. (Of course, most audio watermarking schemes need psycho-acoustic modelling for inaudibility.) Another
disadvantage of the spread-spectrum scheme is its difficulty with synchronization. On the other hand, the replica method is effective for synchronization; however, echo hiding is vulnerable to attack, and replica modulation (Petrovic 2001) is more secure than echo hiding. Among the two-set schemes, the modified patchwork algorithm (Yeo and Kim 2003) is also very well elaborated. The self-marking method can be used especially for synchronization or for robust watermarking, for example against the time-scale modification attack. These four seminal lines of work have improved watermarking schemes remarkably. However, more sophisticated technologies are required, and are expected to be achieved in the next decade. For beginners, we provide the taxonomy of watermarking research in Figure 9. Due to the page limit of this chapter, we cannot cover all the papers cited here in the taxonomy; the contributions of those papers will be evaluated in other papers. Fragile watermarking, which is not included in this chapter, should be studied further, since it is important for authentication.
Acknowledgments

This work was supported in part by the AITRC (Advanced Information Technology Research Center), KAIST, and the Brain Korea 21 Project, Kangwon National University. The authors appreciate Prof. V. Mani of the Indian Institute of Science for the comments. The authors also appreciate Dr. Rade Petrovic of Verance Inc., Mr. Michael Arnold of Fraunhofer Gesellschaft, and Dr. Fabien A. P. Petitcolas of Microsoft for their kind personal communications and reviews. The authors also appreciate Taehoon Kim, Kangwon National University, for implementing various schemes and providing useful information.
References

Arfib, D., Keiler, F., and Zölzer, U. (2002), "Time-frequency processing," in DAFX: Digital Audio Effects, edited by U. Zölzer, John Wiley and Sons, pp. 237-297.

Arnold, M. (2000), "Audio watermarking: features, applications and algorithms," IEEE International Conference on Multimedia and Expo, vol. 2, pp. 1013-1016.

Arnold, M. (2001), "Audio watermarking: burying information in the data," Dr. Dobb's Journal, vol. 11, pp. 21-28.

Arnold, M. and Schilz, K. (2002), "Quality evaluation of watermarked audio tracks," SPIE Electronic Imaging, vol. 4675, pp. 91-101.

Bassia, P., Pitas, I., and Nikolaidis, N. (2001), "Robust audio watermarking in the time domain," IEEE Transactions on Multimedia, vol. 3, pp. 232-241.

Bender, W., Gruhl, D., Morimoto, N., and Lu, A. (1996), "Techniques for data hiding," IBM Systems Journal, vol. 35, pp. 313-336.

Boeuf, J. and Stern, J.P. (2001), "An analysis of one of the SDMI audio watermarks," Proceedings: Information Hiding, pp. 407-423.

Boney, L., Tewfik, A.H., and Hamdy, K.N. (1996), "Digital watermarks for audio signals," International Conference on Multimedia Computing and Systems, pp. 473-480.

Cox, I.J., Kilian, J., Leighton, F.T., and Shamoon, T. (1996), "Secure spread spectrum watermarking for multimedia," IEEE Transactions on Image Processing, vol. 6, pp. 1673-1687.

Cox, I.J., Miller, M.L., and Bloom, J.A. (2002), Digital Watermarking, Morgan Kaufmann Publishers, San Francisco, CA.
Cox, I.J. and Miller, M.L. (2002), "The first 50 years of electronic watermarking," Journal of Applied Signal Processing, vol. 2, pp. 126-132.

Craver, S.A., Memon, N., Yeo, B.-L., and Yeung, M.M. (1998), "Resolving rightful ownerships with invisible watermarking techniques: limitations, attacks, and implications," IEEE Journal on Selected Areas in Communications, vol. 16, pp. 573-586.

Craver, S.A., Wu, M., Liu, B., Stubblefield, A., Swartzlander, B., Wallach, D.S., Dean, D., and Felten, E.W. (2001), "Reading between the lines: lessons from the SDMI challenge," USENIX Security Symposium.

Craver, S.A., Liu, B., and Wolf, W. (2002), "Detectors for echo hiding systems," Information Hiding, Lecture Notes in Computer Science, vol. 2578, pp. 247-257.

Cvejic, N., Keskinarkaus, A., and Seppanen, T. (2001), "Audio watermarking using m-sequences and temporal masking," IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 227-230.

Cvejic, N. and Seppanen, T. (2002), "Improving audio watermarking scheme using psychoacoustic watermark filtering," IEEE International Conference on Signal Processing and Information Technology, pp. 169-172.

Dutilleux, P., De Poli, G., and Zölzer, U. (2002), "Time-segment processing," in DAFX: Digital Audio Effects, edited by U. Zölzer, John Wiley and Sons, pp. 201-236.

Foote, J. and Adcock, J. (2003), "Time base modulation: a new approach to watermarking audio and images," IEEE International Conference on Multimedia and Expo, pp. 221-224.
Goto, M. and Muraoka, Y. (1999), "Real-time beat tracking for drumless audio signals," Speech Communication, vol. 27, pp. 331-335.

Gruhl, D., Lu, A., and Bender, W. (1996), "Echo Hiding," Pre-Proceedings: Information Hiding, Cambridge, UK, pp. 295-316.

Haitsma, J., van der Veen, M., Kalker, T., and Bruekers, F. (2000), "Audio watermarking for monitoring and copy protection," ACM Multimedia Workshop, pp. 119-122.

Hartung, F. and Girod, B. (1998), "Watermarking of uncompressed and compressed video," Signal Processing, vol. 66, pp. 283-301.

Hsieh, C.-T. and Tsou, P.-Y. (2002), "Blind cepstrum domain audio watermarking based on time energy features," IEEE International Conference on Digital Signal Processing, vol. 2, pp. 705-708.

Huang, J., Wang, Y., and Shi, Y.Q. (2002), "A blind audio watermarking algorithm with self-synchronization," IEEE International Conference on Circuits and Systems, vol. 3, pp. 627-630.

Kim, H. (2000), "Stochastic model based audio watermark and whitening filter for improved detection," IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 4, pp. 1971-1974.

Kirovski, D. and Malvar, H. (2001), "Robust spread-spectrum audio watermarking," IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 1345-1348.

Kirovski, D. and Attias, H. (2002), "Audio watermark robustness to desynchronization via beat detection," Information Hiding, Lecture Notes in Computer Science, vol. 2578, pp. 160-175.

Ko, B.-S., Nishimura, R., and Suzuki, Y. (2002), "Time-spread echo method for digital audio watermarking using PN sequences," IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 2001-2004.

Lee, S.K. and Ho, Y.S. (2000), "Digital audio watermarking in the cepstrum domain," IEEE Transactions on Consumer Electronics, vol. 46, pp. 744-750.

Lie, W.-N. and Chang, L.-C. (2001), "Robust and high-quality time-domain audio watermarking subject to psychoacoustic masking," IEEE International Symposium on Circuits and Systems, vol. 2, pp. 45-48.

Liu, Z., Kobayashi, Y., Sawato, S., and Inoue, A. (2002), "A robust audio watermarking method using sine function patterns based on pseudo-random sequences," Proceedings of Pacific Rim Workshop on Digital Steganography 2002, pp. 167-173.

Mansour, M.F. and Tewfik, A.H. (2001), "Audio watermarking by time-scale modification," International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. 1353-1356.

Nahrstedt, K. and Qiao, L. (1998), "Non-invertible watermarking methods for MPEG video and audio," ACM Multimedia and Security Workshop, pp. 93-98.

Oh, H.O., Seok, J.W., Hong, J.W., and Youn, D.H. (2001), "New echo embedding technique for robust and imperceptible audio watermarking," IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. 1341-1344.

Petitcolas, F.A.P., Anderson, R.J., and Kuhn, M.G. (1998), "Attacks on copyright marking systems," Information Hiding, Lecture Notes in Computer Science, vol. 1525, pp. 218-238.

Petrovic, R., Winograd, J.M., Jemili, K., and Metois, E. (1999), "Data hiding within audio signals," International Conference on Telecommunications in Modern Satellite, Cable, and Broadcasting Services, vol. 1, pp. 88-95.
Petrovic, R. (2001), "Audio signal watermarking based on replica modulation," International Conference on Telecommunications in Modern Satellite, Cable, and Broadcasting Services, vol. 1, pp. 227-234.

Rossing, T.D., Moore, F.R., and Wheeler, P.A. (2002), The Science of Sound, 3rd ed., Addison-Wesley, San Francisco, CA.

Seok, J., Hong, J., and Kim, J. (2002), "A novel audio watermarking algorithm for copyright protection of digital audio," ETRI Journal, vol. 24, pp. 181-189.

Scheirer, E. (1998), "Tempo and beat analysis of acoustic musical signals," Journal of the Acoustical Society of America, vol. 103, pp. 588-601.

Shin, S., Kim, O., Kim, J., and Choi, J. (2002), "A robust audio watermarking algorithm using pitch scaling," IEEE International Conference on Digital Signal Processing, pp. 701-704.

Swanson, M., Zhu, B., Tewfik, A., and Boney, L. (1998), "Robust audio watermarking using perceptual masking," Signal Processing, vol. 66, pp. 337-355.

Wu, C.-P., Su, P.-C., and Kuo, C.-C.J. (2000), "Robust and efficient digital audio watermarking using audio content analysis," Security and Watermarking of Multimedia Contents, SPIE, vol. 3971, pp. 382-392.

Wu, M., Craver, S.A., Felten, E.W., and Liu, B. (2001), "Analysis of attacks on SDMI audio watermarks," IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 1369-1372.

Yeo, I.-K. and Kim, H.J. (2003), "Modified patchwork algorithm: A novel audio watermarking scheme," IEEE Transactions on Speech and Audio Processing, vol. 11, pp. 381-386.
Chapter 9

Video Watermarking: Requirements, Problems and Solutions

Christoph Busch and Xiamu Niu
For many years, watermarking has been proposed to protect the copyright of multimedia data by robustly hiding secret information in digital products. Given the high value of video content, and considering the large amount of inherently redundant data between frames, the imbalance between motion and motionless regions, and the real-time requirement in the video-broadcasting case, watermarked video sequences are highly susceptible to pirate attacks such as frame averaging, frame swapping, statistical analysis, digital-analog (D/A-A/D) conversion, and lossy compression (MPEG). One of the greatest technical difficulties is resisting geometrical distortions and random frame dropping, since the resulting loss of spatial and temporal synchronization makes it difficult to extract the watermark content. Hence, video watermarking poses some unique requirements not applicable to still-image watermarking, and presenting a robust video watermarking method is a technical challenge. This chapter discusses the requirements and problems of video watermarking in the video-broadcasting case, following the European Broadcasting Union (EBU) technical requirements. We review one of the first European video watermarking systems, TALISMAN (Tracing Authors' rights by Labeling Image Service and Monitoring Access Network), which was capable of embedding and
extracting invisible watermarks in real time. While on the one hand the TALISMAN system proved to have satisfactory visual quality and robustness with regard to compression, on the other hand improvements were needed to meet the requirements of the EBU. We show results from our recent research and outline how our extended solution can cope with geometric distortions.

1 Introduction

1.1 Generic Models of Video Watermarking
Video sequences consist of a series of consecutive and equally time-spaced still images. While object-oriented encoding of video signals will emerge with the advent of MPEG-4, in today's production and distribution scenario we can still find the following two existing formats that represent the video signal: raw video sequences at the studio quality level and compressed video streams at the broadcasting level. In the following we will review various generic models of video watermarking (Chung et al. 1998, Hartung et al. 1996).
1.1.1 Raw Video Sequences Watermarking

Raw video sequence watermarking approaches embed information directly in the raw data prior to the MPEG video encoding for streamed broadcasting and communication. Every single frame of the video is watermarked in this approach and can be subjected to various embedding algorithms; for example, spatial-domain or frequency-domain watermarking algorithms may be applied. Figure 1 shows the framework of the raw video watermarking model.
Figure 1. Framework of the raw video watermarking model.
Since the watermarking processing operates on raw video sequences, it has to take into consideration both the subsequent lossy compression (MPEG) and the quality of the raw video sequences. Figure 2 gives an example of raw video watermarking: JAWS (Just Another Watermarking System) from Philips Research (Kalker 1999). The payload K is tiled into a matrix W(K) of normally distributed pseudorandom values; X denotes each frame, and s and λ(X) are the global and the local scaling factor, respectively. The watermark is embedded into the raw frames directly.

Figure 2. An overview of the JAWS embedding algorithm.
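The following Python sketch illustrates this kind of additive spatial-domain embedding, X' = X + s · λ(X) · W(K). It is a minimal sketch rather than the JAWS implementation itself: the tile size, the activity measure used for λ(X), and all function names are our own illustrative assumptions.

    import numpy as np
    from scipy.ndimage import uniform_filter

    def additive_embed(frame, payload_key, s=2.0, tile=128):
        # Tile a pseudorandom basic pattern W(K) over the frame
        # (JAWS tiles a fixed-size basic pattern; 128 is illustrative).
        rng = np.random.default_rng(payload_key)
        basic = rng.standard_normal((tile, tile))
        h, w = frame.shape
        W = np.tile(basic, (h // tile + 1, w // tile + 1))[:h, :w]
        # Local scaling lambda(X): hide the watermark in textured regions
        # by using the local standard deviation as an activity measure.
        f = frame.astype(float)
        mean = uniform_filter(f, 8)
        var = np.maximum(uniform_filter(f * f, 8) - mean * mean, 0.0)
        lam = np.sqrt(var) / (np.sqrt(var).max() + 1e-9)
        # Additive embedding: X' = X + s * lambda(X) * W(K)
        return np.clip(f + s * lam * W, 0, 255).astype(np.uint8)

In a raw-video pipeline this would be applied to the luminance component of every frame before the MPEG encoder.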
1.1.2 MPEG Compression with Watermarking

During the MPEG compression for video broadcasting and communication, the watermarking algorithm can be integrated with the compression procedure. This is an efficient way of video watermarking, since compression and watermarking are combined in one algorithm, which benefits the real-time requirements. Figure 3 shows the model of MPEG compression with concurrent watermarking. However, the framework given by the MPEG algorithm limits the watermarking algorithm: in the case of MPEG-1 and MPEG-2, for example, the watermark may only be constructed in the DCT domain. Chung, T.Y. et al. have developed their MPEG-compression-integrated watermarking model as shown in Figure 4 (Chung et al. 1998). The watermark embedding process is performed on all 8x8 DCT blocks of the I (intra) pictures of the MPEG-2 coding scheme, and the output is a watermarked compressed bit-stream, where Q stands for quantization and VLC for variable-length coding.
Figure 3. Framework of the MPEG compression with watermarking model.
Figure 4. An example of the MPEG compression with watermarking model.
1.1.3 Watermarking the MPEG Compression Stream

This method directly embeds the watermark into the MPEG-compressed video stream without subjecting it to full encoding and decoding processes, which degrades the video quality less. The framework is shown in Figure 5. However, the bit-rate constraint for compressed video limits the amount of data that can be embedded by watermarking, since there is not enough redundant data available.
As an example of the model based on watermarking the MPEG compression stream, we may consider the watermarking system of the University of Erlangen-Nuremberg (Germany), shown in Figure 6 (Hartung, F. et al. 1997). The MPEG bit stream is split into header and side information, motion vectors, and DCT-encoded signal blocks. After the signal blocks are decoded, the watermark is embedded into some coefficients of the DCT block. Then the modified DCT block is recoded with Huffman coding. Only the I-frames are watermarked. The bit-rate of the watermarked stream is the same as that of the original one.
Figure 5. Framework of the watermarking MPEG compression stream model.
Figure 6. An example of the watermarking MPEG compression stream model.
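As a rough illustration of where the modification happens in such compressed-domain schemes, the sketch below alters quantized DCT coefficients of entropy-decoded I-frame blocks. It is not the Erlangen algorithm itself: the parity-based bit encoding, the coefficient positions, and all names are stand-ins, and the entropy re-coding and drift handling are omitted.

    import itertools
    import numpy as np

    def watermark_iframe_blocks(qdct_blocks, wm_bits, key, strength=1):
        # qdct_blocks: 8x8 arrays of quantized DCT coefficients taken from
        # entropy-decoded I-frame blocks (Huffman decode/re-encode and the
        # header/motion-vector pass-through are not shown).
        rng = np.random.default_rng(key)
        for blk, bit in zip(qdct_blocks, itertools.cycle(wm_bits)):
            u, v = rng.integers(1, 5, size=2)   # a mid-frequency position
            c = int(blk[u, v])
            # nudge the coefficient so its parity encodes the bit
            if abs(c) % 2 != bit:
                blk[u, v] = c + (strength if c >= 0 else -strength)
        return qdct_blocks

Because only coefficients that are already present are nudged by one quantization step, the bit-rate of the re-encoded stream stays close to the original, in line with the constraint discussed above.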
1.2 Requirements for Video Watermarking
As shown in the previous section, there are different approaches to video watermarking. In the following section we discuss the requirements that watermarking algorithms must meet in real-life application scenarios. This discussion considers the characteristics of video production and a concrete application scenario, the video-broadcasting environment. It has been influenced by the European Broadcasting Union (EBU), which established a taskforce on video watermarking for the definition of requirements based on the usage scenario of the EBU (Cheveau et al. 2001).
1.2.1 Broadcasting Environments for Video Watermarking

Figure 7 sketches the broadcasting environment which watermarking algorithms need to serve: the original video data stems from a digital camera or a digital video archive and is distributed by the copyright owner as compressed video streams. The relevant information that identifies the copyright holder is embedded during the production procedure. Pirates may receive and record the video streams and re-broadcast them without permission, or modify the original sequences and then re-broadcast them. The copyright owner's intention is to monitor a large number of broadcast video streams automatically and in real time, and eventually to detect copyright infringements. The EBU taskforce on watermarking (N/WTM, which was recently renamed N/DRM-T) has performed video and audio watermarking system tests. In the system test specification (EBU Document 2000) the taskforce specified three different levels of the production-distribution environment, shown in Figure 8: the production level with very high quality of video sequences, the contribution level for the EBU Eurovision Network with medium quality of video sequences, and the broadcasting level with poor quality of video sequences.
Figure 7. The broadcasting environment for the watermarking algorithm.
Figure 8. The broadcasting environment according to EBU Group N/WTM (production level: creation and post-production at very high quality; contribution level: primary/contribution network; broadcasting level: secondary/broadcasting).
1.2.2 Application Scenario

In order to establish a direct link between the video sequence and the copyright information, there are two watermarks (W1 and W2) in the generic video watermarking model of the EBU, which is in line with the generic watermarking model developed in the OCTALIS project (OCTALIS 2000). In the model, W1 contains ownership information for identification of the source of the protected subject matter, and W2 is invasive fingerprint information identifying the ID and transmission reference for the contribution network (Cheveau et al. 2001). Figure 9 shows the generic model, and Figure 10 gives an example of the model.
Figure 9. The generic video watermarking model (W1 "ownership": identification of the source of the protected subject matter; W2 "fingerprint": information for the contribution network).

Figure 10. An example of the EBU network topology.
1.2.3 Requirements and Problems

The application scenario outlined above imposes some specific requirements on video watermarking algorithms that differ from those of image watermarking. According to the EBU Technical Review, the user requirements for the production watermark (W1) and the contribution watermark (W2) are as follows (Cheveau et al. 2001).
Visibility of the watermarks: Watermarks must be of "transparent quality", that is, they must be invisible in the high-quality original digital material used in TV production.
Payload of the watermark: Considering the video sequence's processing, an important aspect regarding the watermark payload for video is the watermark granularity. The watermark granularity describes how much data is needed to embed one unit of the watermark information. The EBU defined this unit as the Watermark Minimum Segment (WMS) (Cheveau et al. 2001). In other words, one unit (one WMS) is the group of successive frames required to embed the complete watermark information once. A large WMS is not desirable, as the watermark should be extractable from a small fraction of the video sequence: when someone cuts a number of frames out of the watermarked WMS, the watermark information is no longer retrievable. On the contrary, if a WMS is too small, there may not be enough space to embed the complete watermark information. According to the EBU, the length of one WMS is one second for W1 and five seconds for W2, respectively. The data capacity for both W1 and W2 is 64 bits per WMS. The detection probability per WMS must be at least 95%, and the false positive probability per WMS must be smaller than a prescribed bound.

Security - secret watermarking key: The security assessment of watermarking techniques can be approached in the same way as the security assessment of encryption techniques: an algorithm is not reliable if its security relies on the secrecy of the watermarking algorithm; it should rather rely on the secrecy of the watermarking key (Langelaar et al. 2000, Swanson et al. 1998). The secret watermarking key must be difficult to predict and cryptographically strong, by means of a large key space (a large number of available watermarking keys) and efficient watermarking-key management.

Robustness problems: For watermarking techniques, watermark robustness is typically one of the main issues, since robustness against data distortions introduced through standard data processing and attacks is a major requirement. Also, video watermarking has its own
specific requirements and problems that are different from those of other signals. Robustness of the watermark against lossy data compression - such as M-JPEG (20 Mbit/s), ISO/MPEG-1 (4 Mbit/s), ISO/MPEG-2 (2 to 6 Mbit/s), and MPEG-2 4:2:2 (50 Mbit/s for recorders) - is the basic requirement for video watermarking algorithms. Some of the early watermarking solutions were already focusing on this requirement and were therefore designed to have strong robustness against lossy data compression. Those 8x8 DCT-block-based watermarking techniques try to share some features with the JPEG or MPEG lossy compression algorithms and combine the JPEG or MPEG algorithms with watermarking algorithms (Koch et al. 1995, Busch et al. 1999).

Robustness of watermarks against signal conversions is also a general requirement that applies to video watermarking, and it is one of the difficult issues to solve. For watermarked video material, such "attacks" in the ordinary production chain include re-sampling or D/A-A/D conversion, sampling-rate conversion, frame aspect-ratio conversion (4:3 ↔ 16:9), frame-rate conversion (24 Hz ↔ 25 Hz ↔ 30 Hz), line-scan conversion (progressive ↔ interlaced), motion-compensating noise reduction, added white noise (at -30 dB), color-space conversion, and slow motion (3:1). Some of the video conversion attacks above may happen along the temporal axis of watermarked video sequences and are therefore characterized as the temporal synchronization problems of video watermarking.

The resistance of watermark algorithms to geometric attacks is one of the greatest technical challenges. For many algorithms it is difficult to extract the watermark content once the spatial synchronization for watermarked video sequences is lost due to geometric transforms. The transforms to be considered include frame cropping, scaling, rotation, translation, bending, shearing and others.
One solution to the spatial and temporal synchronization problems of video watermarking systems will be discussed in detail in Section 3 of this chapter.
Real-time embedding and detection: Watermark embedding and retrieval must be performed in real time, so as not to slow down either the TV production (recording, viewing, archiving, and so on) or the monitoring process. According to the EBU, the embedding processing delay must be less than 80 ms.
2 The TALISMAN Project

2.1 Overview

Digital video streams in MPEG-2 format (Rao et al. 1996) form the basis of emerging television standards like the European DVB project (Reimers 1994), the DVD video standard, and video stock archives for instant random access to digital video streams. Providers of digital content and services formulated in the mid-nineties the need for copyright protection mechanisms, including digital watermarking, to track the dissemination of their digital products and to detect large-scale commercial piracy and illegal copying of their data. The TALISMAN (Tracing Authors' rights by Labeling Image Service and Monitoring Access Network) project (TALISMAN 1998, Busch et al. 1999) was partially funded by the European "Advanced Communications Technologies & Services" program. TALISMAN commenced in September 1995 and was completed by August 1998. The system embeds watermarks into video streams and monitors for labels in the MPEG-2 bit stream, all in real time. This chapter restricts the discussion of TALISMAN to the core watermarking algorithm.
2.2 Algorithm Description

The algorithm took a frame-based approach to video watermarking. During watermark embedding and retrieval, it separately processes each frame of the uncompressed video stream. The original video data stream may stem from a digital camera or a digital video player. The standard digital equipment in professional TV studios implements the International Telecommunication Union (ITU) 601 sampling standard, and digital equipment is connected via the ITU-R 656 serial digital interface. The data is stored in Digital Betacam format, which compresses the video streams at a ratio of 1.77:1. The interface between the video system and the watermarking algorithm receives the ITU-R 601 digital component signal as input and redirects the luminance component of each original frame to the watermarking system. The watermark is embedded and the modified data returned to the interface, which inserts the modified luminance component into the ITU-R 601 output stream. When running in monitoring mode, the system receives the decoded MPEG-2 stream from the interface and performs the watermark retrieval process on the luminance component of each frame of the decoded stream. The interface provides the algorithm with consecutive frames of the digital video stream and directs the watermarked data to the output device. Interfacing and watermark processing must not take longer than 40 ms per frame to achieve the PAL frame rate of 25 frames per second.

The core algorithm in the system is a modified version of the early approach of Koch and Zhao (Koch et al. 1995). The most important modifications to the Koch-Zhao approach introduced here are new discrete cosine transform (DCT) and inverse DCT routines, optimized for real-time implementation with hardware support, and additional checks for edges and textures prior to watermark embedding
and retrieval, to avoid artifacts in the watermarked video sequences. Here we restrict our discussion to the visibility problem.

The algorithm is block-based and shares some features with the JPEG standard for still-image compression (Rao et al. 1996). The image's luminance component is divided into 8x8-pixel blocks. The algorithm selects a sequence of blocks and applies the DCT to each of the selected blocks. The transformed blocks are quantized with the luminance quantization table proposed in the JPEG standard. The quantization step divides a DCT coefficient's value by an integer number that depends on the coefficient's position within the block matrix. Generally speaking, high-frequency coefficients are divided by higher quantization values than low-frequency coefficients. The integer values forming the quantization table can be multiplied or divided by a constant value to allow scaling of the quantization's impact on the coefficients.

The algorithm consists of two basic components, a position generator and a watermark embedder:

1. The position for the watermark embedding in the frame (one 8x8-pixel block) must be generated. The algorithm therefore uses a key to initialize a pseudo-random-number generator that determines the order of block processing and the coefficients to be modified within each block. The key may be public or secret, leading to a public or a secret watermark.

2. The watermark embedder itself needs to evaluate the block's characteristics prior to embedding the information. Therefore a method must be chosen that analyzes and then modifies the coefficients in the block selected by the position generator.

Even though the number of digital-watermark bits (payload) is not restricted by the algorithm, the TALISMAN system was configured to a fixed length of 64 information bits, as required by the EBU. If the algorithm finds a higher number of suitable blocks in a frame, the payload is embedded multiple times in each frame. Additional robustness is given through redundant embedding of the watermark along the time axis, which can be exploited together with time-integration of the extracted information in the watermark retrieval component, in order to survive the MPEG-2 compression scheme.

The algorithm can embed up to four different, non-interfering watermarks in each frame. This is accomplished by dividing the frequency range for watermarking into four sub-bands, as Figure 11 shows. The sub-bands denoted as levels 1 through 3 are used for secret watermarking; the fourth band is used for a public watermark. One of these sub-bands is chosen prior to embedding or extraction of the watermark and used throughout the video stream processing.
Figure 11. Sub-bands used for watermarking (levels 1-3: secret watermarks; fourth band: public watermark).
For each selected block, one bit is encoded as follows (t_i denotes the absolute value of a quantized DCT coefficient; the subscript i identifies the coefficient position within the sub-band):
1. The signal (pixel values) within the block area is transformed using the DCT.

2. A mechanism for detecting edges is applied that takes advantage of the block's DCT representation by analyzing the low-frequency coefficients. Blocks that contain edges are dropped.

3. A pair of DCT coefficients is selected from the appropriate sub-band of the transformed block. Each sub-band consists of three coefficients, leading to six possible coefficient pairs.

4. The selected coefficient set is quantized.

5. The set's t_i values are used to determine whether the block is well textured and suited for watermark embedding.

6. Depending on the bit value to be embedded, t_k and t_l must hold a predefined relationship. The condition for encoding a "1" bit is t_k >= t_l + d; the relationship for encoding "0" is t_k + d <= t_l. If the required relationship does not already exist naturally, the coefficients are changed accordingly. This is equivalent to overlaying a 2D cosine pattern on the original block data. The impact of the pattern on the block's visual quality in the pixel domain can be scaled by adjusting the noise level d (the difference).

7. The changed coefficients are multiplied by the quantization value at the corresponding position of the quantization matrix and embedded into the DCT-transformed block. The block data is inverse-DCT-transformed to the pixel domain, and the altered block is put back in its original position inside the image matrix.

To increase the watermark's robustness against the lossy MPEG-2 compression, the watermark is embedded with maximum redundancy: all blocks available in the video frame are subjected to the watermarking procedure.

The retrieval process is symmetrical to the embedding procedure. The secret key is mandatory to find the correct sequence of blocks and the coefficients within them. These coefficients are evaluated, and if a pattern from the predefined set is found, the corresponding bit value is recorded. If no valid pattern is found (t_k = t_l), the bit is marked as not readable. Sixty-four "slots" are set up (one for each code bit), and each retrieved bit is put into the appropriate slot. After processing all blocks, the corresponding bits are used to reconstruct the original bits of the code word; the decision in each slot is based on a majority vote over its entries.
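A compact sketch of this coefficient-pair encoding follows. It is a simplified reading of the scheme just described, assuming orthonormal DCTs from SciPy; the quantization table q, the coefficient positions, and the function names are illustrative, and the edge/texture checks of steps 2 and 5 are omitted here.

    import numpy as np
    from scipy.fftpack import dctn, idctn

    def embed_bit(block, bit, pos_k, pos_l, q, d=2.0):
        # Encode one bit in an 8x8 pixel block by forcing a relationship
        # between two quantized DCT coefficients t_k and t_l.
        c = dctn(block.astype(float), norm='ortho')
        tk = abs(c[pos_k]) / q[pos_k]
        tl = abs(c[pos_l]) / q[pos_l]
        # enforce t_k >= t_l + d for "1" and t_k + d <= t_l for "0"
        if bit == 1 and tk < tl + d:
            tk = tl + d
        elif bit == 0 and tl < tk + d:
            tl = tk + d
        # re-scale by the quantization values, keeping the original signs
        c[pos_k] = np.copysign(tk * q[pos_k], c[pos_k])
        c[pos_l] = np.copysign(tl * q[pos_l], c[pos_l])
        return idctn(c, norm='ortho')

    def read_bit(block, pos_k, pos_l, q, d=2.0):
        c = dctn(block.astype(float), norm='ortho')
        tk, tl = abs(c[pos_k]) / q[pos_k], abs(c[pos_l]) / q[pos_l]
        if tk >= tl + d:
            return 1
        if tl >= tk + d:
            return 0
        return None   # no valid pattern: bit not readable

Majority voting over the 64 slots then reconstructs each code-word bit from the redundant copies spread over the frame.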
2.3 Transparent Quality

The transition from watermarking still images to video sequences revealed properties of the original Koch-Zhao algorithm that had escaped notice before. The most prominent feature of video sequences is the increased sensitivity to changes introduced by the watermarking process. Even for parameter settings that minimize the watermark's strength, watermarking artifacts are visible in high-quality Digital Betacam video sequences.

Figure 12 illustrates the visibility problem. The block position is no longer restricted to a single image (x and y axes) but extends to the time axis t. The modification of blocks that are close to each other in x and y as well as in t can result in flickering effects. To avoid such degradation, video streams must be processed more carefully than still images. Homogeneous areas within frames are particularly sensitive to this type of degradation, as are regions containing sharp edges. Two criteria for checking blocks before actually embedding the watermark information have been introduced: an edge detection and a plain-area detection mechanism. Figure 13 shows DCT artifacts in edge and homogeneous areas; the effects are exaggerated to make the problem visible using a still image. Figure 13(a) shows the original and Figure 13(b) the clipped image; Figure 13(c) shows the clip marked without the check algorithms. Here, artifacts in both homogeneous and edge regions are clearly visible. Figure 13(d) shows the same part marked and checked for edge and homogeneous blocks.
Figure 12. The flickering problem in consecutively watermarked frames.
Figure 13. Visibility of edge blocks: the original image (a), the unmarked clipped image (b), and the image marked without the check algorithms (c), where artifacts are visible in both edge and homogeneous regions. Once the check algorithm is applied, the artifacts are no longer visible (d).
2.3.1 Edge Detection

Numerous edge-detection algorithms are available, ranging from simple ones like the Sobel operator to more sophisticated ones such as wavelet-based schemes (Mallat et al. 1992). As the entire watermarking system has to provide real-time capabilities, edge detection must work fast; even simple schemes like the Sobel operator have a computational complexity that is too high to be implemented in real time. To minimize the overhead for edge detection, results already computed within the basic algorithm cycle are exploited. The lowest-frequency DCT coefficients of a transformed block can be analyzed to decide whether the block contains an edge. We can illustrate this by looking at the DCT transform of a step function, where the lowest-frequency terms have very high amplitudes compared to the transform of smoother functions. To separate edge blocks from textured blocks, the absolute value of one of the three lowest AC coefficients is compared to a defined threshold, and consequently a decision is taken whether this block should be classified as an "edge block".
2.3.2 Smooth Blocks

The detection of plain areas is equivalent to an analysis of a block's texture smoothness. In order to minimize the computational costs, the previously computed results, namely the block's quantized DCT coefficients, are analyzed again. A plain-area characteristic of the block can be assumed if the coefficients in the sub-band that serve as the information carrier are close to zero. This leads to the criterion: if one of the sub-band's quantized coefficients equals zero, the block is classified as a "plain block".
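Both checks reuse the DCT coefficients that the embedding cycle has already computed, so a classifier can be sketched in a few lines. The threshold value, the AC positions, and the three-way return labels below are our illustrative assumptions, not the TALISMAN constants.

    def classify_block(coeffs, q, subband,
                       ac_positions=((0, 1), (1, 0), (1, 1)), edge_thresh=40.0):
        # coeffs: 8x8 array of DCT coefficients; q: quantization table.
        # Edge check: a step edge concentrates energy in the lowest AC terms,
        # so one large low-frequency coefficient marks an "edge block".
        if any(abs(coeffs[p]) > edge_thresh for p in ac_positions):
            return 'edge'       # skip: embedding here gives visible artifacts
        # Plain check: if a carrier coefficient quantizes to zero, the block
        # is smooth and tolerates only a weak watermark.
        if any(round(coeffs[p] / q[p]) == 0 for p in subband):
            return 'plain'      # embed with reduced strength
        return 'textured'       # safe to embed at full strength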
2.3.3 Block Assessment and Processing

The detection of edges and smooth areas as described in the previous sections provides important steps in the assessment of a block and in the decision whether or not a block should be further processed and eventually modified. Once a block is classified as an edge block, it is skipped, resulting in one lost code bit per skipped block. In order to minimize the number of skipped blocks, the treatment of smooth blocks is slightly different: smooth blocks are watermarked, but with reduced strength. The strength of the embedded information is adjusted by limiting the modifications of the coefficients to low values. This concept for processing critical blocks has proven to minimize the impact on the block's visual quality. It is straightforward that the watermark retrieval process has to apply the same mechanism, which is to ignore edge blocks and to use a higher reading sensitivity where plain areas are detected.
2.3.4 Resistance against Geometric Distortions

Like other block-based watermarking methods, the TALISMAN system is not resistant to geometrical transformations unless these transformations are inverted before watermark detection. Shifts can occur in some video installations, so a mechanism for synchronization has to be applied before monitoring the watermark. While the TALISMAN system implementation that was tested in the EBU benchmarks compensated only limited distortions through the detection of the frame's origin, advanced mechanisms for synchronization will be presented in Section 3 of this chapter.
2.3.5 Robustness against Compression and Practical Evaluation

Despite the transparency demands on watermarked video streams, the watermark must be robust against digital TV's MPEG-2 encoding. The watermark information must survive MPEG-2 encoding, which is applied immediately before transmission via the EBU contribution network.
Figure 14. Practical evaluations: watermarking testbed at RTBF.

The testbed shown in Figure 14 assembles all the components that impact the production and distribution chain in the EBU application scenario: the original program was retrieved via a CCIR-656 regenerator and displayed in Digital Betacam quality on the control monitor (right-hand side of Figure 14). The second control monitor displayed the watermarked but uncompressed stream and gave studio engineers the possibility of direct comparison of the signal quality. Subsequently, the stream was compressed with an MPEG-2 encoder (MP@ML), decoded with an MPEG-2 decoder, and finally automatically analyzed by the monitoring station (left-hand side of Figure 14). The field trial showed, with a large number of video streams, that the algorithm for watermarking and monitoring video streams in a TV-broadcasting environment survives MPEG-2 compression of high-quality, real-world video sequences without degrading their quality.
As the MPEG-2 standard is designed to remove spatial and temporal redundancy from the video stream, it poses a significant challenge to any watermarking algorithm: the remaining space in the stream that is not affected by the encoder is extremely limited.
The algorithm was tested with benchmarking material (also used in the EBU tests) as well as in a field trial. The benchmark assessed detection performance and quality of the watermarked video streams with 12 digitized sequences of Digital Betacam quality (ITU-R 601 format); details on the results are given in (Busch et al. 1999). Furthermore, a field trial was performed during the 1998 soccer world championship in France. At that time the live signal was continuously fed into the watermarking and monitoring test bed, which was installed at the premises of the Belgian TV broadcaster RTBF.
3 The Spatial-Temporal Synchronization Issues in Video Watermarking

3.1 Motivations

Synchronization is the process of identifying the correspondence between the spatial-temporal coordinates of the watermarked video sequence and those of the embedded watermark. If the coordinates of the embedded watermark are changed (for instance by the video signal conversions and geometrical frame distortions mentioned in Section 1.2.3), the detector must identify the coordinates of the watermark prior to detection. These desynchronizing attacks occur frequently during video signal processing (Cheveau et al. 2001). Hence, one of the greatest technical challenges for video watermarking algorithms is how to resist spatial-temporal desynchronizing attacks. Lin and Delp have proposed a temporal synchronization algorithm for video watermarking (Lin et al. 2002), but the algorithm is based on the assumption of no spatial desynchronizing attacks; in other words, any geometrical distortion of a frame could render the method ineffective.
A possible solution to the spatial desynchronizing attack is watermark processing in a transformation-invariant domain based on the Fourier-Mellin transformation (Ó Ruanaidh et al. 1997). This approach is only effective if the image/frame is not cropped and is only uniformly scaled, owing to the log-polar coordinate mapping in the algorithm; furthermore, the algorithm is extremely time consuming, and the log-polar mapping can cause a loss of image quality. An alternative approach (Ó Ruanaidh et al. 1997, Csurka et al. 1999, Deguillaume et al. 1999) is to embed a template into the frame before watermarking. The template is a sparse set of points embedded symmetrically to preserve the symmetry property of the Fourier transform. Since an exhaustive search in the DFT domain would be very costly when the transformation is unknown, a log-type (log-log or log-polar) mapping is used to turn rotation or scaling into a simple shift operation. Unfortunately, it is hard to meet both requirements, aspect-ratio changes and rotation of frames. By embedding the same watermark multiple times in the spatial domain, the watermark can be recovered after rotation and scaling by applying autocorrelation detection on the watermarked frame (Kutter 1998), but the detection algorithm is very time consuming.

Niu, Schmucker and Busch have proposed a novel blind video watermarking approach that is resistant to spatial-temporal desynchronizing attacks (Niu et al. 2002). Since the approach exploits the temporal axis of video sequences, it copes with frame rotations, aspect-ratio conversions, scaling, translation, shearing, bending, and also randomly dropped frames. The fundamentals of the approach are outlined in the following sections.
3.2 The Time-Axis Template

3.2.1 The Affine Transformation

An affine transform can describe general geometrical changes (rotation, scaling and translation) of an image (or video frame):
x’= ax + by + e y’ = cx + dy + f
(1)
where (x,y) is an original frame pixel position, and ( x ’ , y ’ ) is a new frame pixel position after geometrical changes. The geometrical transform is described by six parameters a, b, c, d, e, f. Hence, at least three positions of corresponding points before and after the transformation are needed to determine the six parameters.
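With exactly three point correspondences the six parameters can be solved directly; with many points (as used later in Section 3.2.4) a least-squares fit is the natural choice. The following sketch assumes NumPy; the function name is illustrative.

    import numpy as np

    def fit_affine(src_pts, dst_pts):
        # Estimate (a, b, c, d, e, f) of Eq. (1) from N >= 3 corresponding
        # points. src_pts/dst_pts: (N, 2) arrays of (x, y) and (x', y').
        src = np.asarray(src_pts, float)
        dst = np.asarray(dst_pts, float)
        A = np.hstack([src, np.ones((len(src), 1))])   # rows: [x, y, 1]
        # Solve A @ [a, b, e]^T = x' and A @ [c, d, f]^T = y' jointly;
        # with more than 3 points this gives the least-squares solution.
        params, *_ = np.linalg.lstsq(A, dst, rcond=None)
        (a, c), (b, d), (e, f) = params                # params is 3x2
        return a, b, c, d, e, f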
3.2.2 The Assumption

We assume that the geometrical transformation is the same for every frame along the time axis of a video sequence within a very short time interval. For example, each group of pictures (GOP) contains about 12-15 frames within a time interval of 0.5 seconds, while one Watermark Minimum Segment (WMS) spans 1 second, about 24-30 frames (Cheveau et al. 2001). Under this assumption, the position of each pixel in a GOP/WMS is changed in the same way along the time axis. Before watermarking, a time-axis template, which carries the information of at least three original pixel positions, is embedded into the video sequence along the time axis within each WMS. The transformed positions can then be found by template detection after geometrical distortions.
3.2.3 The Time-Axis Template

A random sequence v_i(k) of length N is defined as a time-axis template and has the orthogonality property

<v_i, v_j> = ||v_i||² · δ_ij        (2)

where <·,·> is the inner product operator, δ_ij is the Kronecker delta function, ||·||² represents the energy of the sequence, and k = 0, 1, 2, ..., N-1. The length N is based on the length of a WMS (N_w) along the time axis; in other words, the recommended configuration is N = N_w. The embedding of the time-axis template (as shown in Figure 15) is defined by

F'_k(x, y) = F_k(x, y) + λ · σ_x,y(k) · v_i(k)        (3)

where F_k(x, y) is the original pixel value of the frame, F'_k(x, y) is the modified pixel value of the frame, (x, y) is the pixel position at which the time-axis template is embedded, λ is a global scaling factor (Kutter 1998), σ_x,y(k) is the standard deviation of the pixel values along the time axis in each WMS of length N_w (= N), and k = 0, 1, 2, ..., N-1.
Figure 15. Definition of the time-axis template: templates are embedded within each WMS and detected by computing the correlation between frame pixels and v1, v2 and v3 along the time axis.
In the extraction process, the time-axis template can easily be detected through the correlation

d(l) = Σ_{k=0..N-1} (F'_{k+l}(x', y') * h) · v_i(k)        (4)

where h is a prediction filter (Kutter 1998, Depovere et al. 1998), * is the convolution operator, l = 0, 1, 2, ..., N-1, and N is the length of the time-axis template along the time axis. The variable l is used to find the start point of the time-axis template embedded in the video frames. If d > threshold (Kalker et al. 1999, Depovere et al. 1998), the time-axis template is detected, and the new pixel positions (x', y') can be obtained.
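A sketch of this detection at one candidate pixel position, assuming the prediction filter is given as a short tap vector; the search over the start offset l and the threshold handling are illustrative.

    import numpy as np

    def detect_template(pixels, template, h, threshold):
        # pixels: the series F'_k(x', y') at one candidate position, over a
        # window at least as long as the template v_i(k).
        N = len(template)
        # The prediction filter suppresses the host-signal component (Eq. 4).
        filtered = np.convolve(np.asarray(pixels, float), h, mode='same')
        best_d, best_l = -np.inf, None
        for l in range(len(pixels) - N + 1):
            d = float(np.dot(filtered[l:l + N], template)) / N
            if d > best_d:
                best_d, best_l = d, l
        # report the start offset only if the correlation clears the threshold
        return (best_l, best_d) if best_d > threshold else (None, best_d)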
3.2.4 Video Watermarking Based on Time-Axis Templates

The watermarking algorithm is shown in Figures 15 and 16. An original video sequence is first segmented into its WMSs for the sake of time-axis template embedding. After the template pixel positions are defined pseudo-randomly by a secret key and spread symmetrically over the frame in each WMS, the time-axis template is embedded in the spatial domain of each frame as described in Equation (3).
Figure 16. Watermarking algorithm based on the time-axis template.
The number of templates depends on the desired quality of the recovered video frames that were subjected to geometrical distortions. Generally, three time-axis templates are enough to recover a frame's geometrical distortion according to Equation (1). In practice, many more points are measured, and a least-squares method is used to find the best-fitting parameters for the transform. The spatial watermark embedding area should avoid the defined positions of the time-axis templates. In the procedure of watermark retrieval, the time-axis template is first detected by means of Equation (4) to recover the potential geometrical distortion, and then the watermark information is extracted according to a certain algorithm.
3.3 Watermarking along the Time-Axis

The method of embedding the time-axis template needs the information of the original template positions during the procedure of recovering geometrical distortions. If the original positions of the template are always the same, they could easily be attacked; if the positions are changed in every WMS, the detection algorithm becomes inconvenient. Moreover, the watermark embedding areas have to avoid the predefined positions of the time-axis template, which makes the embedding and detection algorithms more complex. As the number of time-axis templates increases in order to achieve good quality of the recovered frame, the algorithm gets more complex and consequently more computationally expensive. Furthermore, the affine transform can only approximate certain transformations and therefore cannot recover geometrical distortions such as frame bending or warping.

Considering the assumption above, the embedded time-axis template is detected under any kind of geometrical distortion by the correlation algorithm. We therefore propose an alternative watermarking method in which the watermark information is embedded in the same way as the time-axis template. Let a random sequence v_i(k) of length N denote one bit of watermark information. The watermark embedding algorithm is very similar to Equation (3), and the detection algorithm is similar to Equation (4). Therefore the watermark will be inherently robust against all geometrical distortions.
3.3.1 Watermark Generation

The length B of the message bits b_i ∈ {0,1} is, according to the EBU, 64 bits both for the first watermark (W1), whose WMS time spans N_w1 frames (about 1 second), and for the second watermark (W2), whose WMS time spans N_w2 frames (about 5 seconds); W1 carries ownership information for identification of the source of the protected subject matter, and W2 carries fingerprint information for the contribution network, as mentioned in Section 1.2.2 (Cheveau et al. 2001). After the mapping 1 → -1 and 0 → +1, each binary bit b_i is transformed into the corresponding polarity bit b'_i ∈ {-1, 1} for the sake of watermarking. The message bits can be coded with a (15,11) Hamming error-correction code for improved robustness. If the message bits are expanded to 66 bits in our algorithm, the length B_c of the Hamming-coded message bits b' will be 90 bits for W1 and W2, respectively.
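The 66 → 90 bit expansion follows from six (15,11) blocks. A minimal encoder sketch using the classic parity-bit placement at positions 1, 2, 4 and 8; the helper name and the block layout are our assumptions, and the variable msg66 below is hypothetical.

    def hamming_15_11_encode(bits11):
        # Place the 11 data bits at non-power-of-two positions (1-indexed),
        # then fill parity bits 1, 2, 4, 8 so that each covers the positions
        # whose index has that bit set (even parity).
        code = [0] * 15
        data = iter(bits11)
        for pos in range(1, 16):
            if pos not in (1, 2, 4, 8):
                code[pos - 1] = next(data)
        for p in (1, 2, 4, 8):
            code[p - 1] = sum(code[i - 1] for i in range(1, 16) if i & p) % 2
        return code

    # 66 message bits -> 6 blocks of 11 -> 90 coded bits:
    # coded = [b for i in range(6)
    #          for b in hamming_15_11_encode(msg66[11 * i:11 * (i + 1)])]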
If, based on spread-spectrum watermarking, we define a set of random sequences v_i of length N in which each sequence corresponds to a polarity bit b'_i, then w_i = b'_i · v_i is the generated watermark, where w_i is an N-dimensional vector and i = 0, 1, 2, ..., B_c - 1. In other words, we would have to generate 90 random sequences to represent the 90 bits. Considering the quality of digital products, the fewer pixels of each frame are modified, the better the quality of the watermarked video. The technique of Direct-Sequence Code Division Multiple Access (DS-CDMA) is therefore introduced to generate the watermark information (Lee et al. 1998, Mobasseri 1999). Let v be a pattern from the set of random sequences v_i of length N, let the bit length of the i-th message bit b'_i(t) ∈ {-1,+1} be T_b, and let the chip length of the random sequence v be T_c; then M (= T_b/T_c) is a bandwidth expansion factor or processing gain, where T_b >> T_c. The bit b'_i is projected onto the random sequence of pulses with amplitudes v(k) ∈ {-1,+1}, where k = i·M, ..., (i+1)·M, i = 0, 1, 2, ..., [N/M]-1, and [·] is the integer (floor) operator. The principle of the watermark generation is shown in Figure 17.
Figure 17. Principle of the watermark generation.
The watermark generation can also be denoted by a mathematical formula:

v_i(t) = Σ_{k=i·M..(i+1)·M} v(k) · g(t - k·T_c)        (5)

where g(t) is the chip pulse,

g(t) = 1 for 0 <= t < T_c, and 0 otherwise.        (6)

The DS-CDMA watermark signal is

w_i(t) = b'_i(t) · v_i(t)        (7)
Hence, there are M chips in one bit, and only a few pixels of each frame need to be modified. As an example of our algorithm, we use one PN sequence of length N = 31 to represent 3 message bits, where M = 10 and [N/M] = 3; then 30 PN sequences can represent the 90 message bits of W1 in a WMS time with frame number N_w1 (= N = 31), about 1 second. In other words, only 30 pixels along the time axis have to be modified in each WMS. As for W2, we select a PN sequence length of 127, which approximates a WMS time with frame number N_w2 (= 127), about 5 seconds, and one PN sequence represents 10 bits by the DS-CDMA technique. Hence, only 9 pixels have to be modified in each WMS of W2. The watermark generation is shown in Figure 18.

Figure 18. Watermark generation (30 pixels modified per frame for the 90 bits of W1; 9 pixels modified per frame for W2).
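A sketch of this DS-CDMA spreading: consecutive polarity bits modulate disjoint M-chip segments of one PN sequence, so a single time-axis sequence (one pixel position) carries several message bits. The function name and segment layout are illustrative.

    import numpy as np

    def dscdma_spread(bits, pn, M):
        # bits: polarity bits (+1/-1); pn: one PN sequence of length N;
        # bit i modulates chips i*M .. (i+1)*M - 1 (needs len(bits) <= N // M).
        w = np.array(pn, float)
        for i, b in enumerate(bits):
            w[i * M:(i + 1) * M] *= b
        return w   # one time-axis watermark sequence carrying len(bits) bits

With N = 31 and M = 10 this packs 3 bits per sequence, matching the W1 example above.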
3.3.2 Invisibility of Watermarks

Since the proposed algorithm embeds the PN sequence along the time axis, some characteristics must be considered to meet the general requirement of watermark invisibility. On the one hand, if the information is embedded in some texture areas of a single frame, as in image watermarking algorithms, the embedded points might become visible when the texture areas change from frame to frame; pixels that change fast along the time axis should therefore be selected for embedding. On the other hand, if the video sequence contains a lot of motionless regions (especially in plain areas of the frame) or is even stable in all frames, there may be no pixels that change fast along the time axis. Hence, we have to consider the different situations of video sequences and select suitable points to meet the requirement of watermark invisibility. Figure 19 demonstrates the algorithm for defining suitable pixels for embedding: Figure 19(a) shows selected pixels that change fast along the time axis, and Figure 19(b) shows selected pixels in border or texture areas of each frame when the video sequence contains a lot of motionless regions. According to Figure 19, the method for selecting suitable watermarking points follows two principles: the rate-of-change-of-pixel principle along the time axis, and the border/texture-area principle on each frame.

Figure 19. Definition of the suitable pixels for embedding in video: (a) pixels changing fast along the time axis; (b) pixels in border areas of motionless regions.
Further constraints on suitable watermarking points, which influence visibility, need to be respected in addition to Figure 19. If one WMS includes more than one scene and the watermarking points are located in plain areas in some scenes, there may be a visibility problem, as shown in Figure 20(a): though the selection follows the fast rate-of-change principle, the selected watermarking points end up in a plain area of the frames in the last scene. If one WMS includes only one scene, a fast-moving object may cause visibility problems if the watermarking points are located in plain areas on some frames of the WMS, as shown in Figure 20(b). More severely, if there are no significant edge and texture characteristics on each frame and, at the same time, the moving object is very small, the watermarking points have to be very dense along the x and y axes of the frames. Figure 20(c) shows this situation with a very small flying airplane against a large blue sky, in which it is hard to select suitable areas for watermarking.

Figure 20. The visibility problem: (a) watermarking points falling into a plain area of a later scene; (b) a fast-moving object passing over watermarking points; (c) a small moving object with hardly any possible watermarking area.

To solve these problems, we apply a scene segmentation mechanism and a plain-area detection mechanism (Nam et al. 1997, Busch et al. 1999). The first step in our solution is to divide the video sequence into scenes if there is more than one scene in a WMS; the embedding factor λ is changed with each scene to adapt itself to the content of the video sequence. Then the areas in the WMS that belong to plain areas are treated as invalid and not suitable for watermarking. The common areas located in texture areas in every frame are selected, within which the suitable watermarking points are those that also change fast along the time axis. With the intention to save processing time, we choose four frames distributed over a WMS as samples. Taking the WMS for W1 as an example, frame 0, frame 10, frame 20 and frame 30 serve as samples for selecting the four frames' most textured areas located in the common area of the whole WMS. Also, we spread the watermarking points over all selected common texture areas of each frame to decrease the points' density; the distance between the selected points is measured to keep every point at a certain distance from the others. The definition of watermarking points in a WMS is shown in Figure 21, and a sketch of the selection procedure is given below.

The algorithm is similar to reference (Busch et al. 1999). First, the four sample frames are segmented into 8x8 blocks and transformed by the DCT, and the blocks are classified as plain areas or texture areas. At least 30 common texture-area blocks are selected among the four frames; if there are not enough blocks, the threshold for texture-area blocks is adjusted, but points are never defined in a plain block (a block whose quantized DCT coefficients equal zero). At the same time, the selected watermarking points have to be spread over all common texture areas of each frame to avoid denseness of the embedded points. The distance between the watermarking points in our algorithm is 8; if there are not enough points to be selected, the distance may be reduced to 4, but never below 4. If there are a lot of motion regions in the video sequence, the watermarking points in the common areas are selected based on the fast rate-of-change-of-pixel principle along the time axis; otherwise, if there are a lot of motionless regions, the watermarking points are defined in the most textured areas and border areas of the common areas in the WMS.

Figure 21. Definition of watermarking points: points are spread in common texture areas of a WMS and defined by the fast rate-of-change principle as well.
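The following sketch condenses the selection heuristic. It substitutes a per-block standard deviation for the DCT-based plain/texture classification, and the thresholds, frame indices, spacing rule, and names are our illustrative assumptions.

    import numpy as np

    def select_watermark_points(frames, samples=(0, 10, 20, 30),
                                texture_thresh=8.0, min_dist=8, n_points=30):
        seq = np.stack([np.asarray(f, float) for f in frames])
        t_std = seq.std(axis=0)              # rate of change along the time axis
        h, w = t_std.shape
        bh, bw = h // 8, w // 8
        # common texture area: the 8x8 block must be textured in every sample
        common = np.ones((bh, bw), bool)
        for i in samples:
            blocks = seq[i][:bh * 8, :bw * 8].reshape(bh, 8, bw, 8)
            common &= blocks.std(axis=(1, 3)) > texture_thresh
        mask = np.kron(common.astype(int), np.ones((8, 8), int)).astype(bool)
        ys, xs = np.where(mask)
        order = np.argsort(-t_std[ys, xs])   # fastest-changing pixels first
        chosen = []
        for y, x in zip(ys[order], xs[order]):
            # enforce the minimum spacing between selected points
            if all(max(abs(y - cy), abs(x - cx)) >= min_dist for cy, cx in chosen):
                chosen.append((int(y), int(x)))
                if len(chosen) == n_points:
                    break
        return chosen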
3.3.3 Watermark Embedding and Detection

3.3.3.1 Adaptive Embedding Process
The embedding algorithm for the generated watermark (analogous to Figure 15) is
F''_k(x, y) = F_k(x, y) + λ(s) · (σ_x,y(k) + σ(x, y) + 1) · w_i(k)        (8)
where F_k(x, y) is the original pixel value of a frame, F''_k(x, y) is the modified pixel value of the frame, (x, y) is the pixel position of the embedded watermark, λ(s) is a global scaling factor that is also a function of the scene s, σ_x,y(k) is the standard deviation of the pixel values along the time axis in each WMS of length N_w (= N), σ(x, y) is the standard deviation of the pixel values along the x-y axes in the common texture areas of the sample frames, and k = 0, 1, 2, ..., N-1.
The embedding factor λ(s) in Equation (8) depends on the scene changes s in a WMS. If there is more than one scene in a WMS, we first have to divide the video into the different scenes; the factor λ has to differ from scene to scene. The embedding energy in Equation (8) also depends on the values of both the rate of change σ_x,y(k) along the time axis and the rate of change σ(x, y) along the x-y axes in the common texture areas of the sample frames. The two standard deviations are computed during the process of selecting the watermarking points. If σ_x,y(k) increases, the embedding energy should increase, and otherwise decrease; likewise, if σ(x, y) is large, the embedding energy should also be large, and otherwise small, because both rates of change have a direct relationship to the invisibility of the embedded watermark.

3.3.3.2 Adaptive Detection Process

The detection of the watermark information is slightly more complex than Equation (4), since w_i(k) is coded by CDMA before embedding. Making use of the partial-correlation characteristics of the PN sequence, the detection algorithm is as follows:
d_x,y(i) = Σ_{k=i·M..(i+1)·M-1} (F''_{k+l}(x, y) * h) · v(k)        (9)

and

d_x,y = Σ_{i=0..[N/M]-1} |d_x,y(i)|        (10)
where h is a prediction filter, * is the convolution operator, [·] is the integer operator, l = 0, 1, 2, ..., N-1, N is the length of the embedded PN sequence, i.e. the length N_w (= N) of the WMS, and M (= T_b/T_c) is the bandwidth expansion factor. The variable l is used to find the start
point of the embedded sequence in the video frames. If d_x,y > threshold, the watermark is detected, and

b'_i = +1 if d_x,y(i) > 0, and -1 if d_x,y(i) < 0.        (11)
Since the embedding factor λ changes with the content of the video sequence, it is hard to define a fixed detection threshold. We therefore use the differential of the d_x,y values as the detection threshold: if we denote the maximum d_x,y value as d_xy0, the 2nd maximum as d_xy1, the 3rd maximum as d_xy2, and so on, the differentials of these values are O1 = (d_xy0 - d_xy3)/d_xy3 and O2 = (d_xy4 - d_xy7)/d_xy7. If O1/O2 exceeds a certain threshold, the watermark is detected.
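A sketch of this adaptive detection at one candidate pixel series, combining the per-bit partial correlations of Eqs. (9)-(11) with one plausible reading of the differential-of-maxima test; the ratio threshold and all names are illustrative assumptions.

    import numpy as np

    def detect_bits(pixels, pn, h, M, nbits, ratio_thresh=4.0):
        # requires nbits >= 8 so that d_xy0 .. d_xy7 all exist
        filtered = np.convolve(np.asarray(pixels, float), h, mode='same')
        # partial correlation per bit segment, Eq. (9)
        d = np.array([np.dot(filtered[i * M:(i + 1) * M], pn[i * M:(i + 1) * M])
                      for i in range(nbits)])
        # differential-of-maxima test in place of a fixed threshold
        mags = np.sort(np.abs(d))[::-1]        # d_xy0 >= d_xy1 >= ...
        o1 = (mags[0] - mags[3]) / (mags[3] + 1e-9)
        o2 = (mags[4] - mags[7]) / (mags[7] + 1e-9)
        detected = (o1 / (o2 + 1e-9)) > ratio_thresh
        bits = np.where(d > 0, 1, -1)          # Eq. (11)
        return detected, bits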
3.3.4 Security of Watermark

Considering the security of the watermark, the length N of the PN sequence for W1/W2 should be long enough. By making use of partial correlation, each PN sequence is divided into several segments, and the length M of each segment is the same as the length N_w of the WMS. Thus, each PN sequence has to be embedded into each WMS several times. For example, if we use one PN sequence of length N = 127 to represent 5 message bits, where M = 25 and [N/M] = 5, then 18 PN sequences can represent 90 message bits of W1 in a WMS with a length of 25 frames (N_W1 = M = 25), about 1 second. But each PN sequence has to be embedded into a WMS for W1 five times; in other words, 90 pixels along the time axis have to be modified in one WMS for W1 (as shown in Figure 22). This method can also be applied to the generation of watermark W2, increasing the security and reliability of the partial correlation detection.
Figure 22. Security of watermark generation (90 pixels modified in one WMS for W1; N_W1 = M = 25).
3.4 Temporal Synchronization

If some of the frames in a WMS are dropped randomly, the algorithms proposed in Sections 3.2 and 3.3 will lose synchronization along the time axis during the detection process. To solve this problem, special orthogonal reference sequences r_i(k), with the same length as the WMS, are embedded multiple times at different points along the time axis before watermarking. During detection, autocorrelation is used to determine the positions of the dropped frames. Figure 23 shows two instances of orthogonal reference sequences: r_1(k) is for determining whether one frame has been dropped randomly, and r_2(k) is for monitoring whether two successive frames have been dropped randomly. r_i(k) (i = 1, 2) starts to be embedded at the same point as w_i(k). The selection of embedding positions follows the principle outlined in Section 3.3, and the embedding algorithm is similar to Equation (8), except that the sequences are embedded twice or more. The embedding algorithm is as follows
where k = 0, 1, 2, ..., N_w−1, N_w is the length of the WMS, q denotes the number of times the reference sequences are embedded (q = 0, 1 in our algorithm), and i = 1, 2.
Figure 23. The orthogonal reference sequences r_1(k) and r_2(k).
The detection of the reference sequences based on autocorrelation is
where h is a prediction filter and * is the convolution operator.
If R_{ri,ri}(u,v) > threshold, then (x_i, y_i) is one of the positions in which the reference sequence r_i(k) is embedded, where i = 1, 2. To determine whether any frame has been dropped, we simply compare F′_k(x_i, y_i) * h with r_i(k), where i = 1, 2. The values of F′_k(x_i, y_i) * h are first mapped into M_{xi,yi}(k) ∈ {−1, +1}. For i = 1, the calculations of M_{x1,y1}(k) ⊕ M_{x1,y1}(k+1), where ⊕ represents the exclusive-or operator and k = 0, 1, ..., N_w−2, will yield the sequence {1, 1, ..., 1} of length N_w−1 if no frame has been dropped. Likewise, for i = 2, the calculations of M_{x2,y2}(k) ⊕ M_{x2,y2}(k+1) will yield the sequence {0, 1, 0, 1, ...} of length N_w−1 if no two successive frames have been dropped. If one frame is dropped within a WMS, the calculated sequence for M_{x1,y1}(k) ⊕ M_{x1,y1}(k+1) will have a "0" element at the position where the frame drop occurred. If two successive frames are dropped within a WMS, the calculated sequence for M_{x2,y2}(k) ⊕ M_{x2,y2}(k+1) will have successive "0,0,0" or "1,1,1" elements at
the position where the two successive frame drops occurred. By this method we can determine the positions of one dropped frame, two successive dropped frames, and even three successive random frame drops. Since the autocorrelation calculations above are very costly, pre-processing has to be done before the calculation. Pixel values that remain unchanged along the time axis should be eliminated when the video contains many stable areas, and a certain set of fast-changing pixels along the time axis is selected, since the fast-changing pixels may contain the embedded reference sequences. After this pre-processing, the complexity of the calculation is reduced sharply. For detection of the embedded watermark sequence, which may have been subjected to a frame-dropping attack along the time axis, we simply have to find the starting point that is suitable for the cross-correlation calculations. Taking Equation (4) as an example, the detection is
where F′_k(x,y) is a watermarked pixel value of a frame subjected to the frame-dropping attack, h is the prediction filter, * is the convolution operator, w′_i(k) is the modified version of the original PN sequence w_i(k) according to the lost positions determined above, l = 0, 1, 2, ..., N_w−1, and N_w is the length of the WMS. The variable l is used to find the start point that is suitable for the cross-correlation calculations in the video frames.
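The frame-drop localization step for r_1(k) can be sketched as follows, assuming the filtered pixel values at a detected reference position have already been mapped to {−1, +1}; the agreement convention (1 = samples agree, 0 = drop position), the constant-sign construction of r_1, and the helper name are assumptions made for illustration.

```python
import numpy as np

def locate_single_drop(m1):
    """Sketch of frame-drop localization from reference sequence r1.

    m1 : length-N array of +/-1 values M_{x1,y1}(k), i.e. the filtered
         pixel values at an r1 embedding position mapped to {-1, +1}.
    Assuming r1 is built so that consecutive samples agree when no
    frame is dropped, the first disagreement marks the drop position.
    Returns the index of the dropped frame, or None.
    """
    m1 = np.asarray(m1, int)
    # comparison in {-1,+1} arithmetic: samples agree -> 1, differ -> 0,
    # matching the {1,1,...,1} no-drop sequence described in the text
    agree = (m1[:-1] * m1[1:] + 1) // 2
    drops = np.where(agree == 0)[0]
    return int(drops[0]) if drops.size else None
```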
4 Conclusions

This chapter has presented scenarios, concepts and solutions for video watermarking. While the generic models have outlined inter-
action points in the processing chain for adding watermark information to the carrier signal, two instances of system implementations (JAWS and TALISMAN) were introduced. The demand for this technology is obvious: monitoring of potential copyright infringements is crucial for the business case of content providers. The demand for video watermarking was underlined when the EBU established its taskforce on watermarking and performed the system test. As the test results have shown, the requirements have not yet been met by the available systems. For the problem of geometric transformation we have presented a solution in this chapter. While numerous research topics remain to be worked on and technical challenges remain to be solved, there are in fact application scenarios that can be served with the technology available today - the generation of play lists being one example.
References

Benham, D., Memon, N., Yeo, B.-L., and Yeung, M.M. (1997), "Fast watermarking of DCT-based compressed images," Proceedings of the International Conference on Imaging Science, Systems, and Applications, pp. 243-252.

Busch, C., Funk, W., and Wolthusen, S. (1999), "Digital watermarking: from concepts to real-time video applications," IEEE Computer Graphics and Applications, vol. 19, pp. 25-35.

Cheveau, L., Goray, E., and Salmon, R. (2001), "Watermarking - Summary results of EBU tests," EBU Technical Review, pp. 8-9.

Chung, T.Y., Hong, M.S., Oh, Y.N., Shin, D.H., and Park, S.H. (1998), "Digital watermarking for copyright protection of
MPEG-2 compressed video," IEEE Transactions on Consumer Electronics, vol. 44, pp. 895-901.
Csurka, G., Deguillaume, F., Ó Ruanaidh, J.J.K., and Pun, T. (1999), "A Bayesian approach to affine transformation resistant image and video watermarking," Proceedings of the 3rd International Information Hiding Workshop, pp. 315-330.

Deguillaume, F., Csurka, G., Ó Ruanaidh, J.J.K., and Pun, T. (1999), "Robust 3D DFT video watermarking," IS&T/SPIE Electronic Imaging '99, Session: Security and Watermarking of Multimedia Contents, pp. 113-124.

Depovere, G., Kalker, T., and Linnartz, J.P. (1998), "Improved watermark detection reliability using filtering before correlation," Proceedings of the 1998 IEEE Conference on Image Processing, pp. 430-434.

EBU document (2000), "Watermarking, call for systems," EBU Technical Document, No. N/WTM 044, NMC 188.
Hartung, F. and Girod, B. (1996), "Digital watermarking of raw and compressed video," Proceedings of the European EOS/SPIE Symposium on Advanced Imaging and Network Technologies, Digital Compression Technologies and Systems for Video Communication, pp. 205-213.

Hartung, F. and Girod, B. (1997), "Digital watermarking of MPEG-2 coded video in the bitstream domain," Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 97), vol. 4, pp. 2621-2624.

Kalker, T. (1999), "System issues in digital image and video watermarking for copy protection," Proceedings of the IEEE International Conference on Multimedia Computing and Systems, vol. 1, pp. 562-567.
Kalker, T., Depovere, G., Haitsma, J., and Maes, M. (1999), "A video watermarking system for broadcast monitoring," Proceedings of IS&T/SPIE/EI25, Security and Watermarking of Multimedia Content, vol. 3657, pp. 103-112.

Koch, E. and Zhao, J. (1995), "Towards robust and hidden image copyright labeling," Proceedings of the IEEE Workshop on Nonlinear Signal and Image Processing, pp. 452-455.

Kutter, M. (1998), "Watermarking resisting to translation, rotation and scaling," Proceedings of SPIE, Multimedia Systems and Applications, vol. 3528, pp. 523-531.

Langelaar, G.C., Setyawan, I., and Lagendijk, R.L. (2000), "Watermarking digital image and video data - A state-of-the-art overview," IEEE Signal Processing Magazine, vol. 17, pp. 20-46.

Lee, J.S. and Miller, L.E. (1998), CDMA Systems Engineering Handbook, J.S. Lee Associates, Inc., Artech House.

Lin, E.T. and Delp, E.J. (2002), "Temporal synchronization in video watermarking," Proceedings of the SPIE International Conference on Security and Watermarking of Multimedia Contents IV, vol. 4675, pp. 493-504.

Mallat, S. and Hwang, W. (1992), "Singularity detection and processing with wavelets," IEEE Transactions on Information Theory, vol. 38, pp. 617-643.

Mobasseri, B.G. (1999), "Exploring CDMA for watermarking of digital video," IS&T/SPIE Electronic Imaging '99, Session: Security and Watermarking of Multimedia Contents, pp. 96-102.

Nam, J. and Tewfik, A.H. (1997), "Combined audio and visual stream analysis for video sequence segmentation," Proceedings
of the 1997 International Conference on Acoustics, Speech and Signal Processing, pp. 2665-2668.
Niu, X., Schmucker, M., and Busch, C. (2002), "Video watermarking resisting to rotation, scaling, and translation," Proceedings of the SPIE International Conference on Security and Watermarking of Multimedia Contents IV, vol. 4675, pp. 512-519.
Ó Ruanaidh, J.J.K. and Pun, T. (1997), "Rotation, scale and translation invariant digital image watermarking," Proceedings of the IEEE International Conference on Image Processing (ICIP 97), Santa Barbara, CA, vol. 1, pp. 536-539.

OCTALIS project (2000), http://www.igd.fhg.de/igd-a8/projects/octalis

Rao, K. and Hwang, J.J. (1996), "Techniques and standards for image, video and audio coding," Chapter 11, Prentice Hall, Upper Saddle River, N.J.

Reimers, U. (1994), "Concept of an European system for the transmission of digitized television signal via satellite," SMPTE Journal, vol. 103, pp. 741-747.

Pereira, S., Ó Ruanaidh, J.J.K., Deguillaume, F., Csurka, G., and Pun, T. (1999), "Template based recovery of Fourier-based watermarks using log-polar and log-log maps," IEEE International Conference on Multimedia Computing and Systems, Special Session on Multimedia Data Security and Watermarking, pp. 870-874.

Swanson, M.D., Kobayashi, M., and Matsui, K. (1998), "Multimedia data-embedding and watermarking technologies," Proceedings of the IEEE, vol. 86, pp. 1064-1087.
TALISMAN project (1998), http://www.tele.ucl.ac.be/TALISMAN
Chapter 10

Digital Video Watermarking: Techniques, Technology and Trends

Deepa Kundur, Karen Su and Dimitrios Hatzinakos

In this chapter, we present the problem of digital video watermarking. Video watermarking is first distinguished from traditional still image watermarking by introducing the distinct challenges facing video watermarkers and attackers. Common paradigms used for algorithm design, such as communication theory, steganography, perceptual coding and estimation theory, are introduced to provide a multidisciplinary flavor to the area. State-of-the-art algorithmic research and emerging applications are presented, focusing on influential developments in the area. This work is balanced with a discussion of hardware implementation issues and perspectives for digital video watermarking. Trends in video watermarking technology and future directions of research conclude the chapter.
1 Introduction
A general digital video watermarking system is illustrated in Figure 1. There are two main processes: watermark embedding and watermark detection. In watermark embedding, a watermark signal, usually containing a "payload" data bit stream U, is embedded into an original host digital video sequence X by imposing invisible changes, producing a watermarked video signal X_w that is perceptually identical to X. The embedding may be performed with the use of a secret key K, which is a sequence of bits embodying any secret
parameters and components of the watermarking system. Detection involves producing an estimate Û of the watermark payload U from a possibly tampered version X̂_w of the watermarked video signal X_w. We distinguish the signal used for watermark estimation X̂_w from X_w because of the presence of unavoidable incidental distortions or intentional attacks on the marked video in the distribution channel between the embedder and the detector. Depending on the application, the watermark is embedded in a robust, semi-fragile or fragile manner. The majority of video watermarking research involves robust embedding, which is the primary focus of this chapter.
Figure 1. A general digital video watermarking system: the covert data U and the key K drive the watermark embedding procedure, which produces the watermarked multimedia object X_w; the watermark detection/extraction procedure takes the (possibly tampered) watermarked multimedia object X̂_w and the key K and produces the data estimate Û.

1.1 Video vs. Still Image Watermarking
Much of the initial work in the area of digital watermarking was applied to still image data. These approaches can be readily extended to raw video sequences by viewing each video frame as an isolated
image. However, there are additional considerations that must be accounted for in the design of a video watermarking system:

1. Video information has significantly higher volume than other common types of media such as still images, which has implications for the watermarking system design and implementation.
2. There is a higher degree of perceptual irrelevancy and redundancy in video that can be exploited by both the watermarker and the attacker.

3. Most video applications that employ watermarking include other processing stages, such as video compression or encryption, with which the watermarking must be compatible and work effectively.

These distinctions have several implications for the video watermark designer. Complexity is a significant implementation issue for watermarking, especially for real-time system design. Video watermarking schemes are much more likely to be built in hardware than their counterparts for other media, so effects of finite precision are of greater overall concern. Furthermore, integration of watermarking with other processes such as compression and encryption may allow for the re-use of modules to reduce overall system expense. These implementation restrictions are balanced by greater flexibility in algorithm design. For instance, more sophisticated perceptual models based on temporal masking are available for use. The low watermark payload to host signal volume ratio for many video applications provides greater flexibility and an increased size of the watermark design solution space. That is, since there is much more volume of the host signal in which to embed the watermark, the number of possible ways in which redundancy can be exploited for watermark robustness increases. The task of the attacker is, in part, made more interesting by the presence of temporal redundancy that is not fully exploited by the
watermark designer. Correlations in video frames can be exploited to reduce or estimate the presence of the watermark in the sequence through an attack known as collusion (Su et al. 2002). Moreover, frame dropping or re-ordering can be applied to de-synchronize the watermark and make detection unreliable. The importance of protecting against each of these attacks is an application-dependent factor. As we will see, in many emerging applications intentional watermark removal attacks are not a significant threat.
1.2 Applications
Video watermarking applications can be classified into two main groups: security-related and value-added. Security-related applications include watermarking for copy control, fingerprinting, ownership identification, authentication and tamper assessment. In copy control, one of the first popular applications for video watermarking (Bloom et al. 1999), a fixed-length payload is embedded into the host video sequence so that a security-compliant digital video player that houses a watermark detector can extract the watermark payload. Based on the payload data, security-related actions such as refusal to play or to duplicate the video sequence are activated by the player. Fingerprinting is the process by which a distinct watermark is embedded into each digital video sequence to trace the source of unwanted processing of the content at a later time. This problem has application in video-on-demand scenarios, to distinguish and trace the different distributed copies of the video. Ownership identification entails a watermark that reflects information about the originator; this requires the watermark to be embedded robustly, which may or may not be possible under all conditions, such as blind watermark detection in which the host video sequence is not available at the detector (Cox and Miller 2002). Authentication and tamper assessment are envisioned to be successful applications for watermarking. Here, the payload is often a digital signature used for authentication that cannot be easily separated from
the content. The watermark is extracted at the receiver and a decision is made on whether or not the content is authentic. Further processing can also be applied to the extracted watermark to characterize the type of tampering that was applied (Kundur and Hatzinakos 1999).

We term applications that use watermarking as a means to provide additional non-security services value-added. As watermarking was assessed and characterized within the research community, it became clear that the technology is fundamentally "weak" for high-security applications such as ownership identification. However, it has the following attractive characteristics. Watermarking:

• leaves the perceptual quality of the content and its format relatively unchanged, so that the presence of the payload does not significantly affect the operation of video processing modules related to transmission and distribution, such as encryption and transcoding.

• provides "free" additional bandwidth by making use of redundancy and irrelevancy not normally exploited by traditional systems.

• ties payload information directly to the content, so that it is not easily separated. The watermark undergoes the same manipulations as the digital media.
As a result, many commercial applications such as legacy system enhancement and database linking (Cox and Miller 2002) make use of watermarking. For instance, companies such as Chinook Communications (Chinook Communications 2003) are embedding digital signals within standard NTSC analog cable channels. This allows cable channels to provide digital services to consumers without significantly affecting their traditional analog channels. The technology is capable of providing a digital bandwidth of up to 6 Mbps. Other
applications for video watermarking include video tagging to allow for hypervideo, in which URLs are embedded within video to allow for added services. A system by Philips Electronics called WaterCast is employed for automatic monitoring of digital video broadcasts (Kalker et al. 1999). Digimarc has a product called MediaBridge that is able to link traditional print media to associated web sites (Digimarc 2003). In addition, current research is focused on digital video watermarking to enhance error-correction capabilities (Robie and Mersereau 2002) and color compression efficiency (Campisi et al. 2002).
2 Models and Measures

2.1 Figures of Merit
In addition to the standard figures of merit for most watermarking systems discussed in other chapters of this book, including robustness, reliability, imperceptibility, and practicality (i.e., non-blind detection capability), there are additional considerations that set video watermarking apart. For instance, designers must address:
• Localized detection, which is the ability of the receiver to detect the watermark within a small number of frames, ideally even from a single frame in isolation. The maximum number of frames allowed for detection is often an application-specific measure (Kalker 1999).

• Real-time algorithm complexity, which refers to the capacity of the detection and/or the embedding algorithms to execute within specified real-time deadlines. The reader should note that for most commercial video watermarking applications such as copy protection, the watermark embedding algorithm, which can occur
off-line, is not as critical as that of the detector. This can be achieved through the design of asymmetric schemes that improve detection cost at the expense of embedding complexity.
• Synchronization recovery, which is the facility of the detection algorithm to compensate for distortions of the watermarked signal that "misalign" the embedded watermark. This attack is especially prevalent in video watermarking, in which there exists an additional temporal dimension along which desynchronization can be applied. Some techniques to compensate for this attack work in shift- and scale-invariant domains (Ó Ruanaidh and Pun 1997). Others embed references or templates (Lin and Delp 2002).

• Area and time complexity, which refer to the hardware implementation measures of physical chip size and processing speed, respectively. This measure is a function of both the algorithmic details and the aggressiveness of the hardware realization (Mathai et al. 2003).

• Effects of floating-point representation, which involves the performance deviation of watermark embedding and detection due to the necessarily imprecise representation of input, output and intermediate signal values. Such an analysis for JAWS can be found in (Mathai et al. 2003).

• Power dissipation, which is the amount of power consumed for watermark processing and is a significant factor in portable, battery-powered devices. This factor depends on both algorithmic and hardware implementation design strategies, which must work together for ambitious gains.
2.2 Paradigms
The area of watermark design borrows tool-sets and principles from a number of well-established areas. We highlight a number of important archetypes in this section.
Steganography is the art and science of hiding data (Wayner 2002). This general area involves the problem of masking the existence of information within some host message. It generally spans a broader class of applications than watermarking and may include information embedding in text and non-digital media. The notion of information hiding is common to both steganography and digital watermarking, but the problems differ in that the latter is also concerned with issues of robustness or fragility of the embedded data.

Perceptual coding is analogous to watermarking because it involves the imperceptible modification of signal content. In the case of perceptual coding, the irrelevant and redundant components of the video signal are "shaped" to reduce overall storage requirements; for watermarking they are modified to embed supplementary information. Thus, given this interesting relationship, many algorithms have attempted to design video watermarking schemes in the same flavor as successful perceptual coders, making use of related transforms and perceptual models, as we will see in Section 3.

Communication theory is a popular analogy for watermarking that takes into account issues of robustness not completely addressed in the steganographic or perceptual coding paradigms. The watermark embedding procedure is likened to channel coding, and watermark detection is equivalent to the process of communication recovery and decoding. The effective watermark channel is characterized by the distortions and attacks applied to the watermarked video in the distribution channel. Some very popular approaches based on spread spectrum (SS) communications were proposed early in the watermarking literature and are applicable to video embedding (Cox et al. 1997).

Signal estimation theory has been used more recently to devise effective watermark channel estimation and detector structures (Kundur
and Hatzinakos 2001) as well as successful attacks against digital video watermarking algorithms (Su et al. 2002). The tool-sets in this area allow for more sophisticated and well-defined models that shed light on novel strategies for improved digital watermarking not readily evident from other perspectives.
Information theory and coding has received recent popularity within the watermarking community. Results from these areas have been applied theoretically to the digital watermarking problem to develop fundamental bounds on watermarking performance. An entire class of algorithms classified as informed embedding has been developed (Eggers and Girod 2002). These ideas have not yet been applied to video watermarking, as the bridge between theory and practice has not been fully established at this time. However, we believe that it is now fruitful to integrate theoretical work in informed embedding with practical video watermarking frameworks.
3 Video Watermarking Algorithms
Robust invisible video watermarking techniques differ in terms of the domain in which the watermark is embedded or detected, their capacity, real-time performance, the degree to which all three axes are incorporated, and their resistance to particular types of attacks. In this section we present an organizational framework to classify some existing algorithms. Our goals are to identify trends and survey the area through the exposition of popular methods. We will describe each class of algorithms, present the important ideas introduced by various representative schemes, and discuss their strengths and limitations.
3.1 Classification of Video Watermarking Techniques
One possible taxonomy for existing video watermarking techniques is presented in Figure 2. The methods can be divided into two main groups based on the domain in which the watermark is embedded. The transform domain techniques can then be further sub-divided depending on the nature and dimensionality of the transform domain. For video watermarking, the domain used for embedding affects the complexity, portability, and robustness of the watermarking algorithm (Fei et al. 2003). Thus, it is a commonly used characteristic for classification.

Figure 2. Classification map of existing digital video watermarking techniques: invisible robust techniques divide into pixel domain methods and transform domain methods, the latter sub-divided into MPEG-2-based, frame-based, and group-of-frame-based methods.
3.2 Pixel Domain Methods
We begin our exploration by discussing video watermarking techniques in the pixel domain. The watermark pattern is typically generated by applying spread spectrum modulation to the covert data sequence. Insertion of the watermark within the host signal is based on simple operations, such as addition or bit replacement of selected pixels. Detection proceeds by correlating the expected watermark pattern with the received signal.
The main strengths of pixel domain methods are that they are conceptually simple and have low computational complexity. As a result they have proven attractive for video watermarking applications in which real-time performance is a primary concern. These advantages come at a price: optimizing the watermark for robustness and imperceptibility is often difficult when limited to spatial analysis techniques alone. Major research challenges for this class of methods include finding methods that are robust to, or can recover from, desynchronization attacks such as geometric distortions and frame swapping, and considering the evolution of the watermark along the temporal axis for robustness to multiple frame collusion. The four methods that fall into this class can be distinguished by the dimensionality of the watermark pattern. Techniques based on 1D and 2D spread spectrum modulation, and on 3D CDMA modulation, have been proposed.
3.2.1 1D spread spectrum modulation
Hartung et al. propose an early spread spectrum watermarking approach (Hartung and Girod 1998). Spread spectrum techniques are attractive tools for watermarking applications since they facilitate the robust covert transmission of low-energy narrow-band signals through a wide-band channel. In particular, spread spectrum signaling supports the transmission of multiple hidden signals over the same channel, each with high resistance to narrow-band interference and eavesdropping. Video media are especially suitable for such watermarking schemes because of the large bandwidth supplied by the channel. In the basic algorithm, each user data bit is spread over a large number (cr) of chips and modulated by a PN sequence (p) to form the watermark.¹ The video sequence and watermark are represented as

¹ The key from our general digital watermarking system model may be a direct representation of this sequence or a seed used to generate it.
1D vectors and combined directly by addition. Thus the technique is analogous to direct sequence spread spectrum. Considering only a single user data bit for simplicity gives the following expression for the watermarked sequence:
X_{w,i} = X_i + p_i · a_i · U,    i = 0, ..., cr−1    (1)
where i is the spatial index for the signal of interest and a_i is a local scaling factor used to match the amplitude of the watermark to the region (X_i) into which it is embedded. When a watermarked object is received, the synchronization module has the task of recovering and properly aligning these regions so that detection of each bit can proceed by demodulation with the same PN sequence. The correlation sum S, the expected value of S denoted by E(S), and the estimated watermark bit Û are given by

S = Σ_{i=0}^{cr−1} p_i · X̂_{w,i} = Σ_{i=0}^{cr−1} (p_i · X̂_i + a_i · U)

E(S) = cr · ā · U, assuming that p and X are uncorrelated

Û = sign(S)    (2)

where X̂_i is the component of the watermarked signal containing the host and any processing or attack "noise," and ā is the average scaling factor.
To improve the performance of the detection algorithm, a high-pass whitening filter is applied to the signal before demodulation, further reducing the correlation between p and X. Also observe from Equation (2) that the general robustness of the watermark can be improved by increasing cr or the scaling factors a_i. However, increasing cr reduces the data rate of the scheme, since larger regions of the multimedia object are required to convey each information bit. The drawback of increasing the scaling factors is reduced imperceptibility, since the amplitude of the modifications made to the original pixel intensities is increased. Thus there is a performance compromise for schemes based on direct sequence spread spectrum concepts, characterized in terms of a tradeoff between robustness, data rate, and imperceptibility.
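A minimal sketch of Equations (1) and (2) in Python, treating the video as a flattened 1D signal; the constant amplitude, the NumPy generator standing in for the key K, and the function names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def ss_embed(x, u, cr, amp, seed=0):
    """Sketch of 1D spread spectrum embedding, Equation (1).

    x   : 1D host signal (flattened video), length >= len(u) * cr
    u   : data bits in {-1, +1}
    cr  : chip rate (chips per data bit)
    amp : watermark amplitude a_i, taken constant here for simplicity
    """
    rng = np.random.default_rng(seed)               # seed plays the role of key K
    p = rng.choice([-1.0, 1.0], size=len(u) * cr)   # PN sequence p
    spread = np.repeat(np.asarray(u, float), cr)    # each bit over cr chips
    xw = np.asarray(x, float).copy()
    xw[:len(spread)] += p * amp * spread
    return xw, p

def ss_detect(xw, p, nbits, cr):
    """Sketch of correlation detection, Equation (2): U_hat = sign(S)."""
    chips = np.asarray(xw, float)[:nbits * cr] * p[:nbits * cr]
    s = chips.reshape(nbits, cr).sum(axis=1)        # correlation sum per bit
    return np.sign(s)
```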
3.2.2 Just another watermarking system (2D spread spectrum)
JAWS was proposed by Kalker et al. to enable monitoring sites to verify and track video data transmitted over broadcast links (Kalker et al. 1999). It was specifically designed to be robust to video coding, D/A and A/D conversion, limited editing (e.g. subtitle or logo insertion), format conversions, changes of aspect ratio, and transmission errors. Since real-time watermark detection is critical, the pixel domain was chosen for its simplicity. The scheme is a 2D spread spectrum method since the watermark pattern is a two-dimensional spread signal; its most distinctive features are its shift invariance and enhanced payload capabilities. Figure 3 illustrates how the watermark is designed to achieve these goals. The basic pattern is an M x M block of independent identically distributed coefficients drawn from a standard Gaussian distribution. Therefore, its power spectrum is white and its autocorrelation is a 2D impulse. The lag for which the maximum correlation sum is achieved is indicated by a black square. If the watermark pattern is circularly shifted, the cross-correlation of the shifted and basic patterns will attain its maximum at a correspondingly shifted lag.
A shift-invariant watermark symbol is created by combining offset copies of the basic pattern that are opposite in sign. Since the resolution of the correlator is not fine enough to detect single-pixel offsets, a coarse grid of (M/8) × (M/8) is used. In Figure 3 the 8 × 8 grid is shown with blocks shaded in black and gray to indicate the relative positions of the positive (i.e. maximum cross-correlation) and negative (i.e. minimum cross-correlation) patterns, respectively. Their 2D offset encodes one of (M/8)² − 1 characters in the covert data alphabet. To mark an arbitrarily sized video frame, the M × M watermark symbol is extended by tiling, possibly with truncation at the edges. Note that the watermark symbol can be detected independently in each of the tiles; the tiling operation is therefore like applying a repetition code.
Figure 3. Watermark encoding in JAWS. Embedding: the basic M × M watermark pattern is differentially encoded with the covert data into an M × M watermark symbol block, which is tiled to form the watermark frame. Detection/extraction: the received (possibly translated) watermark frame is folded back to an M × M watermark symbol, from which the data estimate is recovered.
Detection begins by folding the received frame into an M × M structure. This is achieved by segmentation into M × M blocks and averaging. The picture components are expected to cancel each other out, while the watermark symbol is emphasized. Any shifts arising from transmission or attacks will appear in the folded block as cyclic shifts. By computing the cross-correlation of the basic watermark pattern and the folded block, the lags at which the maximum and minimum values are attained can be determined. Since the covert data is encoded by the offset between these indices, it can still be recovered correctly; thus the scheme is shift-invariant. Unlike the computationally complex sliding correlators that have been proposed in other pixel domain techniques, shift-invariance is achieved in JAWS by generating a 2D periodic watermark and using a fixed-size 2D correlator, which can be implemented using 2D FFTs. The block size M controls the balance between robustness and data
rate; a smaller value results in more redundancy and hence robustness, but also decreases the size of the symbol alphabet. It can be shown that JAWS, like other video watermarking schemes, is susceptible to multiple frame or block collusion. An attacker may fold an image, or even average a sequence of dissimilar images, to obtain an approximation of the basic watermark pattern block.
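The folding and FFT-based correlation at the heart of JAWS detection can be sketched as follows; frame dimensions divisible by M and the helper name are assumptions, and the whitening filter used in practice before correlation is omitted for brevity.

```python
import numpy as np

def jaws_fold_and_correlate(frame, basic, M):
    """Sketch of JAWS detection: fold the frame to M x M, then
    circularly cross-correlate with the basic pattern via 2D FFTs.

    frame : 2D array whose dimensions are assumed multiples of M
    basic : M x M basic watermark pattern
    Returns the lags of the maximum and minimum correlation values,
    whose 2D offset encodes the covert data symbol.
    """
    h, w = frame.shape
    # fold: average all M x M tiles; picture content tends to cancel
    folded = frame.reshape(h // M, M, w // M, M).mean(axis=(0, 2))
    # circular cross-correlation computed with 2D FFTs
    corr = np.fft.ifft2(np.fft.fft2(folded) * np.conj(np.fft.fft2(basic))).real
    pos = np.unravel_index(np.argmax(corr), corr.shape)
    neg = np.unravel_index(np.argmin(corr), corr.shape)
    return pos, neg
```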
3.2.3 Spatially localized image dependent watermarking

Given the collusion susceptibility of many 1D and 2D image watermarking schemes (Su et al. 2002), a novel algorithm that attempts to provide 3D watermarking protection while processing the video signal in a frame-by-frame manner was presented by Su et al., entitled Spatially Localized Image DEpendent (SLIDE) watermarking. This work attempts to employ watermark design strategies that guarantee resistance to collusion (Su et al. 2002) in a practical frame-by-frame video watermarking algorithm. A basic s × s watermark pattern is generated or established and is repeatedly embedded such that it is centered around a fixed number of selected points, known as anchors, in every video frame. The watermark frame can be considered as the convolution of the basic s × s watermark pattern with the selected anchor points in each frame. Thus, only part of the video frame, where the basic watermark pattern lies, called the footprint, is used to embed the watermark (Su et al. 2002). Feature extraction to produce the anchor points makes use of an algorithm based on interpolation distortions presented in (Su 2001). It can be shown that as the content of the frames varies, so do the selected feature points. Thus, the watermark footprint evolves with the host video. Once the watermark frame is formed for embedding in a given host frame, spatial masking is applied to the watermark frame to modulate its strength locally according to the properties of the video frame itself. The spatial masking is established with the use of local image-dependent scaling factors, derived from the noise visibility function (NVF) proposed by Voloshynovskiy et al.
(Voloshynovskiy et al. 2000), that optimizes robustness while maintaining imperceptibility. Finally, the scaled watermark is embedded by addition to the host. The main steps of the proposed embedding algorithm are illustrated in Figure 4. The reader should note that the algorithm exhibits diversity, which is exploited at the detector, since the same watermark pattern is available around every anchor point.
Figure 4. Block diagram of the proposed watermark embedder: the private key K drives basic pattern generation and footprint generation from the host video frames; after spatial masking, the scaled watermark is added to the host to produce the watermarked frames.
The first step in the detection process is to estimate the anchor points in order to identify the footprint, which reveals the location of the watermark signal. These features are computed in the same way as at the embedder. Then, the NVF is estimated in order to attempt to "unscale" the watermark pattern and facilitate detection. From a communications perspective, the local scaling factors act as multiplicative noise, and the unscaling operation corresponds to a deconvolution in "frequency." After generating the basic watermark pattern, the detection process is applied. The main steps of the proposed detector algorithm are illustrated in Figure 5. To reduce the power of the host image component, a 3 × 3 Laplacian filter is applied before any subsequent processing. Then, given the estimated local scaling factors from the NVF, a maximal ratio combining (MRC) detector is implemented to take full advantage of the spatial diversity inherent in the watermark (Su 2001).
Figure 5. Block diagram of the proposed watermark detector: basic pattern generation from the private key K feeds a maximal ratio combining stage, which outputs the data estimate together with a measure of certainty.
Basically, the SNR of each watermark pattern repetition embedded around each anchor point is estimated (details can be found in (Su et al. 2002)). Then, for each watermark pattern repetition, the SNR is used to weigh the "accuracy" of the watermark information at that location, and the repetitions are optimally combined using standard MRC detector theory. To improve performance, watermark repetitions with very low SNRs can be rejected. The authors provide theoretical results to demonstrate the improved performance of the scheme under collusion, and also report practical improvements in simulations and testing (Su et al. 2002). Some limitations of the algorithm include increased complexity in comparison to schemes such as JAWS, sensitivity to the implementation of the feature extraction phase, and some reduction in robustness to standard non-collusive image processing operations (such as global filtering), due to the fact that the watermark is embedded only in the footprint rather than the entire frame.
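A minimal sketch of the SNR-weighted combining step, assuming per-anchor correlation values and SNR estimates are already available; the SNR floor used for rejecting weak repetitions and the function name are illustrative assumptions, not values from the chapter.

```python
import numpy as np

def mrc_combine(correlations, snrs, snr_floor=0.1):
    """Sketch of maximal ratio combining over watermark repetitions.

    correlations : per-anchor correlation values for one data bit
    snrs         : estimated watermark SNR at each anchor footprint
    snr_floor    : repetitions below this SNR are rejected
    Each repetition is weighted by its estimated SNR (the standard
    MRC rule) and the combined statistic decides the bit.
    """
    c = np.asarray(correlations, float)
    w = np.asarray(snrs, float).copy()
    w[w < snr_floor] = 0.0           # drop unreliable repetitions
    return np.sign(np.dot(w, c))     # SNR-weighted decision
```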
3.2.4 CDMA modulation
In (Mobasseri 1999), Mobasseri proposes a fundamentally different scheme based on replacement rather than addition. Each video frame is decomposed into bitplanes, e.g. for an 8-bit gray-scale video there are 8 bitplanes per frame. As illustrated in Figure 6, the video is
marked by replacing one of the four least significant bitplanes of each frame with a watermark plane. The bitplane to be replaced is selected according to a random periodic quaternary sequence. The watermark planes are generated as in the 1D spread spectrum method: the data bits are spread and modulated by an m-sequence. The CDMA modulation technique presents a truly three-dimensional approach and illustrates some of the advantages and difficulties of incorporating the time domain into video watermarking.
Figure 6. Bitplane selection and replacement in CDMA modulation method. The gray bitplanes represent those for possible replacement in watermarking. The dark gray ones are those selected at random for actual watermark insertion.
Aside from the specific algorithmic details of the scheme, there are a few general issues that must be considered when incorporating bitplane replacement for watermarking:
• Given the video sequence to be marked, the number and positions (i.e. significance) of the bitplanes eligible for replacement must be determined experimentally to ensure imperceptibility. There is no general result that holds for arbitrary video sequences.

• As the significance of the replaced bitplanes increases, it can no longer be assumed that they are uncorrelated from frame to frame. For instance, the bitplane sequence in position 3 (see Figure 6) is expected to vary slowly in time; therefore it is possible that a replaced bitplane can be detected by considering the temporal characteristics of that bitplane sequence.

• For the more insignificant bitplanes, standard LSB watermark attacks such as lossy compression can potentially defeat the watermark.
Both spatial and temporal synchronization are critical issues for detection of the CDMA watermark. The authors propose a two-level hierarchical correlation: Given a sequence of test frames, first they are aligned with the quaternary sequence, then the indicated bitplanes are extracted and their correlation with the m-sequence is computed. The second level involves shifting the test sequence, relative to the quaternary sequence, and repeating the inner correlation until the maximum value is attained. The complexity of the outer temporal sliding correlator is bounded by the period of the quaternary sequence. This strategy is similar to the spatial sliding correlator introduced in JAWS, whose complexity was bounded by the period of the tiled blocks. However, in the case of JAWS the correlation sums are more efficiently computed in a convolution. The authors report that the robustness of the technique is increased by the two-level correlation. When the test sequence coincides both temporally and spatially with the quaternary and m-sequences, the decision value is more distinct than that obtained with only a single correlation. Because of the temporal component of the algorithm,
detection reliability improves as the length of the test sequence increases. On the other hand, in the case of a single frame, there is no added benefit to the two-level structure; however, the computational complexity is still greater than that of lower-dimensional pixel-based methods. Finally, it is the video processing attacks that present the greatest challenge for the CDMA modulation technique. Random frame swapping and dropping de-synchronizes the quaternary sequence and complicates watermark detection.
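The two-level hierarchical correlation can be sketched as follows, assuming each test frame has already been decomposed into bitplanes; the data layout and helper names are assumptions for illustration, not Mobasseri's implementation.

```python
import numpy as np

def two_level_detect(bitplane_seq, quaternary, m_seq):
    """Sketch of the two-level hierarchical correlation for the
    CDMA bitplane scheme.

    bitplane_seq : list of per-frame bitplane stacks, each a dict
                   mapping bitplane index -> flattened NumPy bitplane
                   in {0, 1}
    quaternary   : periodic sequence of bitplane indices (0..3)
    m_seq        : m-sequence in {-1, +1} for the inner correlation
    The outer loop slides the test frames against the quaternary
    sequence; the inner correlation is computed on the indicated
    bitplanes.
    """
    period = len(quaternary)
    best, best_shift = -np.inf, 0
    for shift in range(period):                      # outer sliding correlator
        chips = []
        for k, planes in enumerate(bitplane_seq):
            idx = quaternary[(k + shift) % period]   # which bitplane to read
            chips.append(2.0 * planes[idx] - 1.0)    # map {0,1} -> {-1,+1}
        chips = np.concatenate(chips)[:len(m_seq)]
        corr = float(np.dot(chips, m_seq[:len(chips)]))  # inner correlation
        if corr > best:
            best, best_shift = corr, shift
    return best_shift, best
```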
3.3 Transform Domain Methods
These techniques transform the host signal into an alternate domain and embed the watermark in the corresponding coefficients. Commonly used transforms are the DCT and DWT. Their energy compaction properties and frequency localization allow for more accurate and sophisticated modeling of the masking properties of the HVS and watermark attacks. This permits a greater ability to optimize the watermark for a given performance goal. The simplest watermarks are also additive random sequences, but are combined with the host signal in the transform domain. Other schemes embed the watermark by modifying invariant attributes of the host signal. Detection typically proceeds by transforming the received signal into the appropriate domain and searching for the watermark pattern or attributes. The main strength offered by these techniques is that they take advantage of special transform domain properties to address the challenges of pixel-based methods and to support additional features. For instance, designing a watermarking scheme in the 8 x 8 DCT domain leads to better implementation compatibility with popular video coding algorithms such as MPEG-2, and using shift- and rotationinvariant Fourier domains facilitates the design of watermarks that inherit these attractive properties. Finally, analysis of the host signal in a frequency domain is a pre-requisite for applying more advanced masking properties of the HVS to enhance watermark robustness and
imperceptibility. Generally the main drawback of transform domain methods is their higher computational requirement. The transform domain methods examined here are grouped into three categories: those based on MPEG-2 coding structures, single video frames, and groups of frames (GOFs).
3.3.1 MPEG-2-based techniques

Video watermarking techniques that use MPEG-2 coding structures as primitive components are primarily motivated by the goal of integrating watermarking and compression to reduce overall real-time video processing complexity. Compression in block-based schemes like MPEG-2 is achieved by using forward and bi-directional motion prediction to remove temporal redundancy, and statistical methods to remove spatial redundancy. The reader is referred to (Le Gall 1991) for an exposition of MPEG-2 and related video coding standards. One of the major challenges for schemes based on MPEG coding structures is that they can be highly susceptible to re-compression with different parameters, as well as to conversion to formats other than MPEG. A number of MPEG-2-based techniques have been proposed, including approaches based on GOP modification (Linnartz and Talstra 1998), high-frequency DCT coefficient manipulation (Langelaar et al. 1998), (Kiya et al. 1999), DCT block classification (Chung et al. 1998), (Holliman et al. 1997), and three more robust and general algorithms that will be discussed in detail in this section. The two MPEG-2 watermarking methods considered here embed hidden data by swapping level-adjacent variable-length code (VLC) codewords and by manipulating mean luminances over regions of pixels. The last approach is more of a general framework than a specific method: Hartung et al. explain how any picture-independent additive technique, specifically the 1D spread spectrum modulation method by the same authors, can be applied
directly to compressed video streams without full decoding to the pixel domain.

3.3.1.1 VLC swapping

In (Langelaar et al. 1998), Langelaar et al. propose a method based on the observation that in the MPEG-2 VLC tables there are pairs of codewords (r, l) ↔ c_0 and (r, l ± 1) ↔ c_1 such that length(c_0) = length(c_1), lsb(c_0) = 0, and lsb(c_1) = 1. The set of VLCs that are elements of such level-adjacent pairs are called label-carrying VLC (lc-VLC) codewords. A covert data sequence is embedded into a frame by extracting eligible lc-VLCs, c_i ∈ {c_0} ∪ {c_1}, and swapping a codeword with its pair, if necessary, to ensure that the sequence of codeword LSBs corresponds to the data sequence, i.e. lsb(c_i) = U_i. One of the main strengths of this technique is its data rate; the authors report very high rates of up to 29 kbps imperceptibly hidden in MPEG-2 coded video at 8 Mbps. The algorithm is summarized in Table 1.
The covert data sequence is embedded directly by making modifications to the compressed domain representation of the video stream. Taking a closer look at the algorithm, we can see that it modifies the quantized values of mid-high range frequency coefficients in each 8 x 8 DCT block. We can therefore model the underlying watermark frame as a mid-high frequency signal whose properties change across 8 x 8 block boundaries. Since the watermark is not necessarily matched to the perceptually significant components of the frame, it will be relatively easy to remove using signal processing operations (Cox et al. 1997). The method is also particularly susceptible to re-compression at coarser quantizations or lower bit-rates. In addition, because there is no random key-based component, a clever attacker could easily destroy the message by making random modifications to the
Table 1. Summary of the VLC swapping algorithm.
I) Define the set of lc-VLCs as {c_0} ∪ {c_1}, where c_0 and c_1 are MPEG-2 VLCs such that (r, l) ↔ c_0, (r, l ± 1) ↔ c_1, length(c_0) = length(c_1), and lsb(c_0) ≠ lsb(c_1). In other words, a codeword is label-carrying if there is another codeword of the same length whose run level is adjacent and whose LSB is different.

II) For each 8 × 8 block, the T_c highest-frequency lc-VLCs are extracted. Experimental trials indicate that a value of T_c = 2 provides a good tradeoff between the number of eligible lc-VLCs (i.e. data rate) and imperceptibility (i.e. minimal visual artifacts).

III) Each eligible lc-VLC c_i encodes a bit by swapping it with its level-adjacent pair, if necessary, so that the sequence of codeword LSBs corresponds to the data sequence, i.e. lsb(c_i) = U_i.

IV) The estimated data message is extracted by concatenating the LSBs of the eligible lc-VLCs.
readily identifiable lc-VLCs; the approach is more like data hiding than watermarking. Its main features are its very low computational complexity and its ready applicability to compressed MPEG-2 streams. Because of its high data rate, a potential direction for future enhancement is to improve robustness by applying an error-correcting code to the transmitted data sequence before embedding. However, the data rate is constrained by the number of lc-VLCs, which varies non-deterministically from frame to frame, so some care must be taken in selecting the code.
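A sketch of the swapping rule on an abstract codeword stream; the codeword model is hypothetical, with pair_of and lsb_of standing in for lookups that real MPEG-2 VLC tables would supply.

```python
def vlc_swap_embed(codewords, bits, pair_of, lsb_of):
    """Sketch of lc-VLC swapping (Table 1, steps II-III).

    codewords : sequence of VLC codewords from one coded frame
    bits      : covert data bits U_i in {0, 1}
    pair_of   : dict mapping each lc-VLC to its level-adjacent,
                equal-length partner (hypothetical stand-in)
    lsb_of    : dict mapping each lc-VLC to the LSB of its level
    """
    out, i = [], 0
    for c in codewords:
        if c in pair_of and i < len(bits):
            # swap with the partner codeword if the LSB disagrees
            out.append(c if lsb_of[c] == bits[i] else pair_of[c])
            i += 1
        else:
            out.append(c)            # not label-carrying: pass through
    return out

def vlc_swap_extract(codewords, pair_of, lsb_of):
    """Extraction: concatenate the LSBs of the eligible lc-VLCs."""
    return [lsb_of[c] for c in codewords if c in pair_of]
```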
3.3.1.2 Region-based energy manipulation

Darmstaedter et al. propose a method that embeds hidden data by manipulating average energies, i.e. luminance intensities, in subregions of each frame (Darmstaedter et al. 1998). As in the previous
technique, the data sequence U is embedded directly, without explicitly generating a watermark pattern. The technique also achieves a high capacity by embedding one bit into each 8 × 8 block, and error control coding for added robustness is possible. The most important concept introduced by the method is block classification. By categorizing blocks, the scheme can take advantage of local spatial characteristics and adjust its embedding strategy accordingly, thereby improving imperceptibility and robustness. The embedding procedure comprises three operations: block classification and separation of the pixels into zones; further subdivision of each zone into categories defined by a secret key-based grid; and embedding of data bits by manipulating the mean energy of the pixels in each category according to some difference thresholds. The algorithm is summarized in Table 2. To minimize visible distortions, all of the pixels in a category are adjusted uniformly, and the overall mean of each zone is conserved. The detection algorithm requires knowledge of the key to regenerate the secret grid pattern, but not the original. It categorizes the pixels as above, and by computing the mean energy of each category the most likely transmitted bit is determined. The main shortcoming of the method is its sensitivity to the embedding thresholds, which must be experimentally optimized for each video sequence. It was also found that embedding the watermark into blocks of the hard and progressive contrast types resulted in perceptible degradations; therefore the most recently reported tests use only noise contrast blocks. To improve watermark robustness, basic HVS masking properties can be applied by using the variance of the noise contrast blocks to modulate the watermark strength (l, see Table 2), i.e. noise blocks with higher variances can tolerate larger energy manipulations. However, these procedures may add to the complexity of the algorithm and can potentially reduce the achievable data rate.
Table 2. Summary of the 8 × 8 block energy manipulation algorithm.
I) In each 8 × 8 block, one bit is embedded as follows:

A) Rearrange the pixels in order of increasing luminance magnitude to form a monotonically non-decreasing function F. The properties of F characterize the block as having noise, hard, or progressive contrast, and separate the pixels into two zones (1, 2). (The accompanying plots of F illustrate the noise contrast, hard contrast, and progressive contrast cases, with the zone boundaries marked.)
B) Subdivide the pixels into categories by overlaying a secret 8 × 8 key-based grid. The zone and grid define four categories: 1A, 1B, 2A, and 2B.

C) Data bits U_i are embedded by adjusting the average intensities of the pixels in each category:

• to embed a 0, set mean(1B) − mean(1A) ≥ l and mean(2B) − mean(2A) ≥ l

• to embed a 1, set mean(1A) − mean(1B) ≥ l and mean(2A) − mean(2B) ≥ l
where l is an embedding threshold level that can be increased to improve robustness or decreased to improve imperceptibility.

II) Extraction proceeds in the same manner, i.e. block classification, division into zones, and sub-division into categories. Then the means of the pixels in each category are computed to produce an estimate Û_i. The magnitude of the differences between the means indicates the degree of certainty that the watermark is present.
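Step C of Table 2 can be sketched as follows for a single noise-contrast block; the exact-threshold adjustment and the equal-category-size condition for conserving the zone mean are simplifications for illustration, and all names are assumptions.

```python
import numpy as np

def embed_bit_in_block(block, grid, zone_mask, bit, level):
    """Sketch of step C of Table 2 for one 8 x 8 noise-contrast block.

    block     : 8x8 float array of luminances
    grid      : 8x8 boolean key-based grid (True = category B)
    zone_mask : 8x8 boolean zone map (True = zone 2), from step A
    bit       : data bit in {0, 1}
    level     : embedding threshold l
    Pixels in each of the four categories (1A, 1B, 2A, 2B) are
    shifted uniformly so that the required mean differences hold;
    the opposite shifts conserve the zone mean when the two
    categories have equal pixel counts (assumed non-empty).
    """
    out = block.astype(float).copy()
    for zone in (False, True):                 # zone 1, then zone 2
        a = (~grid) & (zone_mask == zone)      # category A pixels
        b = grid & (zone_mask == zone)         # category B pixels
        diff = out[b].mean() - out[a].mean()
        # target: mean(B) - mean(A) = +l for bit 0, -l for bit 1
        # (set exactly to the threshold, a simplification)
        target = level if bit == 0 else -level
        adjust = (target - diff) / 2.0
        out[b] += adjust
        out[a] -= adjust
    return out
```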
Although, like the pixel domain methods, the detector requires good spatial synchronization to properly extract the watermark, robustness to geometric distortions can be improved to some extent by decreasing the resolution of the key-based grid. The authors identify further robustness-imperceptibility optimization as an ongoing research goal. Finally, because of its consideration of local picture composition, reasonable resistance to multiple frame collusion is expected. Assuming that the same grid pattern is used to transmit the same message in each frame, visually similar blocks would be watermarked in the same manner, and dissimilar blocks would be watermarked differently, since the zone definitions would vary.

3.3.1.3 Spread spectrum modulation (compressed domain)

In (Hartung and Girod 1998), Hartung et al. propose an extension of their 1D spread spectrum modulation method that supports computational compatibility with MPEG-2 coded video streams. Three key concepts are introduced:

1. Because the DCT is a linear transform, the watermark is picture-independent, and embedding is done by addition, the watermark can be embedded either in the pixel or in the 8 × 8 DCT domain. By arranging the 1D watermark vector into a frame-sized structure and transforming this frame to the 8 × 8 DCT domain, the watermark can be added directly to a partially decoded MPEG-2 video stream.
2. Since it is desirable that the watermarked video be no larger in storage size and no slower in transfer rate than the original, DCT coefficients in the watermark and video frames are combined only if the resulting VLC codeword is not longer than the original. In addition, zero coefficients are not affected, which means that embedding can in fact take place in the VLC domain, by looking exclusively at run levels and codeword lengths.
3. Drift compensation is required to cancel out the watermark components that will be added into P-frames and B-frames by the MPEG-2 decoder, due to motion compensated predictions or interpolations from other frames. The difference between the predictions made at the MPEG-2 encoder and decoder is exactly the frame that is needed for drift compensation; its contents are therefore transformed and combined with the watermark frame prior to embedding. Except for the rate control condition imposed by Concept 2 above, the watermark embedded into the 8 × 8 DCT domain is identical to the 1D spread spectrum watermark. It thus exhibits the same strengths and weaknesses. However, since the pixel domain representation of the frame is not explicitly determined, the local scaling factors α_i in the original algorithm cannot be computed using the spatial characteristics of the picture.
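Concept 1 is easy to verify numerically. The sketch below, with illustrative sizes and a key-seeded generator of our own choosing, shows that adding the watermark in the pixel domain and then transforming coincides exactly with adding the separately transformed watermark in the DCT domain, which is what permits embedding into a partially decoded MPEG-2 stream.

```python
import numpy as np
from scipy.fft import dctn

rng = np.random.default_rng(42)              # stand-in for a key-seeded generator
frame = rng.integers(0, 256, (8, 8)).astype(float)
wmark = rng.standard_normal((8, 8))          # one 8x8 tile of the frame-sized watermark

# Route 1: add in the pixel domain, then transform.
a = dctn(frame + wmark, norm='ortho')

# Route 2: transform both, then add in the 8x8 DCT domain.
b = dctn(frame, norm='ortho') + dctn(wmark, norm='ortho')

assert np.allclose(a, b)                     # linearity makes the two routes identical
```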
3.3.2 Frame-based techniques

Since MPEG-2 is a frame-based video coding standard, many of the frame-based techniques could be readily applied in an MPEG-2 environment. We distinguish the frame-based techniques by their focus on design issues other than MPEG-2 constraints and their more general approach to the watermarking problem. One of the main features inherently supported by such methods is that the watermark can always be detected from each frame in isolation. Because of the dimensionality of the transform, their complexities also tend to be lower than those of methods based on groups of frames. Finally, although it would seem that any image watermarking technique could be applied to video in a straightforward frame-by-frame manner, this is not generally an effective approach because of multiple frame collusion and temporal imperceptibility considerations. A finer distinction can be made between frame-based techniques that use experimentally determined parameter values and those that
truly optimize performance by applying perceptual models of the HVS. Generally, the perceptual techniques exhibit higher robustness as a result of their optimality. However, they also tend to have higher computational complexities. One drawback of the experimental techniques is that they may require re-tuning for specific video sequences. There is another emerging class of techniques that adaptively adjusts parameter values based on local picture characteristics². Such techniques do not explicitly use perceptual models; however, they are an attractive alternative, offering a good balance between robustness and complexity. A discussion on building watermark optimization masks is presented in (Bartolini et al. 1998). We examine the details of two frame-based methods here: the DCT-based spread spectrum and perceptual DCT-based approaches.

3.3.2.1 DCT-based spread spectrum
One of the first transform domain methods, upon which many variations have been based (Qiao and Nahrstedt 1998), (Brisbane et al. 1999), (Zhu et al. 1999), is presented by Cox et al. (Cox et al. 1997). It is considered a spread spectrum technique, even though it does not use a spreading code, because the watermark energy is spread over a large bandwidth, thus protecting it from narrow-band interference. The authors discovered and stress in their method the importance of embedding the watermark into perceptually significant components to increase robustness to signal processing and lossy compression attacks. The watermark itself is a sequence of length n, where each value is independently drawn from a standard normal distribution. Drawing watermark coefficients from continuous distributions is thought to offer superior robustness properties (e.g. compared to discrete distributions like binary PN sequences) because exact cancellation is not statistically possible.

² The idea is that the variance of a small block of pixels can be used as a measure of the entropy or activity within that block. A watermark can be embedded more strongly in regions of higher entropy, therefore the variance can be used to adapt the embedding strength.
The algorithm is based on the N × M DCT taken over each N × M frame. For video frames the perceptually significant components, i.e. those whose magnitudes are the greatest, typically correspond to the lower frequency coefficients. The watermark is therefore embedded into these coefficients. Contrast masking properties of the HVS (discussed in more detail in the next section) also dictate that large coefficients are less sensitive to changes of a given magnitude than small ones. This property is used to increase the strength of the watermark; however, because of the global nature of the transform, local features cannot be taken into consideration. The major steps in the embedding and detection procedures are summarized in Table 3. The technique does not support blind detection; both the original frame and the watermark are required. The original frame is used to characterize and reverse any distortions that the frame may have been subjected to; for instance if X_w has been cropped, parts of the original X can be used to patch it up. Detection proceeds by transforming both the original and the test frames into the DCT domain, and correlating the difference vector with the expected watermark pattern. Thus the method is still dependent on absolute synchronization, except that instead of pixels in the spatial domain, it is coefficients in the transform domain that must be properly synchronized. The conjecture is that it is more challenging for an attacker to disrupt transform coefficients without damaging the video. The method is reported to be particularly resistant to multiple document collusion attacks, compression, spatial scaling, dithering, and cropping. An attractive design feature is that multiple watermarks embedded sequentially using the same method are found to be independently reliably detectable. The technique offers some protection against geometric distortions; they must first be inverted to enable successful detection.
Table 3. Summary of the DCT-based spread spectrum algorithm.
I) Generate a watermark sequence W_i of length n drawn from the standard normal distribution. The key K may be used as a seed for the random number generator.
II) Apply the N × M DCT to each video frame X: X' = ZigZagRead(DCT(X)). Extract the first n AC coefficients x'_i = X'_{i+1}, i = 1, .., n, where the DCT coefficients are zig-zag ordered from low to high frequencies starting at DC.
III) Scale the watermark by a global scaling factor α and the magnitude of the corresponding coefficients: W'_i = W_i · α · x'_i, i = 1, .., n.
IV) Add the watermark and video frame in the DCT domain to embed: X_w = DCT^{-1}(ZigZagWrite(X'_1, x'_i + W'_i, X'_j)), i = 1, .., n, j = (n + 2), .., NM.
V) Detection begins by re-generating the expected watermark pattern using K.
VI) X_w is compared to the original X to characterize and reverse any obvious distortions. Then both frames are transformed and their difference vector d_i = ZigZagRead(DCT(X_w − X))_{i+1}, i = 1, .., n is projected onto the watermark to determine its degree of similarity:

|proj_W(d)| = (W · d) / sqrt(W · W)
VII) If the similarity measure is above a pre-defined threshold, then the watermark is deemed to be present in X_w.
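A compact sketch of the embedding and similarity computation of Table 3 for a single frame follows. The scaling factor α, the watermark length n, and the simplified zig-zag ordering are illustrative choices of ours, not the authors' exact values.

```python
import numpy as np
from scipy.fft import dctn, idctn

def zigzag_indices(N, M):
    """Indices of an N x M array ordered roughly from low to high frequency."""
    return sorted(((r, c) for r in range(N) for c in range(M)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[1] if (rc[0] + rc[1]) % 2 else rc[0]))

def embed(X, key, n=1000, alpha=0.1):
    W = np.random.default_rng(key).standard_normal(n)    # step I, key-seeded
    Xd = dctn(X.astype(float), norm='ortho')
    order = zigzag_indices(*X.shape)
    for i, (r, c) in enumerate(order[1:n + 1]):          # first n AC coefficients
        Xd[r, c] *= 1.0 + alpha * W[i]                   # x'_i + alpha * W_i * x'_i
    return idctn(Xd, norm='ortho'), W

def similarity(Xw, X, W, n=1000):
    d = dctn(Xw.astype(float) - X.astype(float), norm='ortho')  # non-blind: original needed
    order = zigzag_indices(*X.shape)
    dv = np.array([d[r, c] for (r, c) in order[1:n + 1]])
    return W @ dv / np.sqrt(W @ W)                       # projection onto the watermark
```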
Although the watermark is embedded into perceptually significant components, one can take greater advantage of HVS masking properties to make it more adaptable to local image characteristics, as discussed in the next section.

3.3.2.2 Perceptual DCT-based

All invisible video watermarking techniques can be said to be perceptual, in the sense that imperceptibility is an essential feature and parameters such as the embedding strength can be adapted to achieve
this goal. However, we distinguish perceptual watermarking methods as those that explicitly model masking properties of the HVS, and then apply these models to analyze the frame or video sequence to embed the watermark in an optimal way. There are five main properties of the HVS that can be exploited by video watermarking techniques:
Frequency sensitivity refers to the fact that the human eye’s ability to detect changes in light intensity at different frequencies is non-uniform. Assuming a fixed minimum viewing distance, there is a JND threshold for each frequency band such that modifications to the associated frequency-domain transform coefficients of magnitudes less than the JND cannot be detected by the human eye (Wolfgang et al. 1999).

Luminance sensitivity refers to the human eye’s ability to detect a low amplitude noise signal superimposed on a uniform background. Assuming that the overall effect must fall below some threshold of detectability, the maximum tolerable noise luminance will be a non-linear increasing function of the average background luminance (Wolfgang et al. 1999). In other words, for any given region, the higher the average luminance, the brighter the background, and the more strongly a watermark can be embedded into this region.

Contrast masking refers to the ability of the human eye to detect one signal in the presence of another, i.e. if the watermark signal is well masked, then it should not be detectable in the presence of the host signal; note that this is a picture-dependent property. The HVS is less sensitive to the addition of noise components that are of the same spatial frequency, orientation, and location as the components of the original picture (Wolfgang et al. 1999).

Edge masking is related to the fact that the sensitivity of the HVS is reduced in regions near features of high luminance intensity variation (Reid et al. 1997), e.g. the edges or contours of objects.
Therefore a watermark can be embedded more strongly into pixels near edges or in regions of high variance.
Temporal masking models have not yet been applied in any of the published video watermarking schemes. They are based on the fact that the human eye is less sensitive to distortions in regions that are temporally near to features of high luminance intensity (Reid et al. 1997). Therefore the contour of a moving object enables stronger embedding into pixels that are temporally near to the contour, i.e. in the same spatial location of adjacent frames.

In (Wolfgang et al. 1999), Wolfgang et al. propose a method for embedding watermarks into compressed video streams by marking all I-frames and applying linear interpolation to mark the P-frames and B-frames between successive I-frames. The technique employs a perceptual model composed of a picture-dependent component based on luminance sensitivity and contrast masking and a picture-independent component based on frequency sensitivity. Like the DCT-based spread spectrum approach, the watermark is a sequence of random numbers drawn from a standard normal distribution. However it is embedded in the 8 × 8 DCT domain to support compatibility with MPEG video coding algorithms and local watermark strength adaptation. Each 8 × 8 block of DCT coefficients is analyzed according to the perceptual model and a block of JND coefficients is produced. The entries in this block indicate the maximum amount by which the corresponding DCT coefficient can be modified without significantly affecting visual quality. The idea is that all blocks of the form DCT^{-1}(X' + a · J) are perceptually equivalent to X, where X' = DCT(X), a ∈ [−1, 1], and J is the block of JND coefficients. However, if the absolute magnitude of a DCT coefficient is smaller than its JND, visible distortions will be introduced. Therefore only
those coefficients that are larger in absolute magnitude than their corresponding JNDs are marked. The procedure is summarized in Table 4.

Table 4. Summary of the Perceptual DCT-based algorithm.
I) For each 8 × 8 block of each video frame:
A) Generate an 8 × 8 watermark block W_{i,j} of random numbers drawn from the standard normal distribution.
B) Apply perceptual models to obtain an 8 × 8 block of JND coefficients J_{i,j}.
C) Embed the watermark into the transformed video block X' = DCT(X) as follows: if |X'_{i,j}| > J_{i,j}, set X'_{w,i,j} = X'_{i,j} + J_{i,j} · W_{i,j}; otherwise set X'_{w,i,j} = X'_{i,j}. Finally invert the transform to get X_w = DCT^{-1}(X'_w).
II) Detection begins by re-generating the expected watermark pattern using K, and the JND coefficient matrix using X.
III) X_w is compared to the original X to characterize and reverse any obvious distortions. Then both frames are transformed and their difference d = DCT(X_w − X) is computed.
IV) The difference frame is then normalized with respect to J, and the resulting normalized frame d_{n,i,j} = d_{i,j} / J_{i,j} is projected onto the watermark to determine its degree of similarity: |proj_W(d_n)| = (W · d_n) / sqrt(W · W).
V) If the similarity measure is above a pre-defined threshold, then the watermark is deemed to be present in X_w.
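Steps I) and IV) of Table 4 might be sketched as follows for a single block; the JND matrix J is assumed to be supplied by a separate perceptual model, which is not derived here.

```python
import numpy as np
from scipy.fft import dctn, idctn

def embed_block(X, J, key):
    """Mark one 8x8 pixel block X, given its 8x8 JND matrix J (assumed to
    come from a perceptual model computed elsewhere)."""
    W = np.random.default_rng(key).standard_normal((8, 8))
    Xd = dctn(X.astype(float), norm='ortho')
    mask = np.abs(Xd) > J              # only coefficients exceeding their JND are marked
    Xd[mask] += J[mask] * W[mask]
    return idctn(Xd, norm='ortho'), W

def similarity(Xw, X, J, W):
    """Normalize the DCT difference by the JNDs and project it onto W
    (assumes all entries of J are strictly positive)."""
    d = dctn(Xw.astype(float) - X.astype(float), norm='ortho') / J
    return (W * d).sum() / np.sqrt((W * W).sum())
```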
Observe that not only will perceptually significant frequency components with large coefficients be marked, but also some not so significant components, as long as the DCT coefficients are greater in absolute magnitude than the corresponding JNDs. Thus the watermark remains imperceptible and yet can be embedded more strongly compared to methods that select an arbitrary subset of coefficients
and modify them by an experimentally confirmed value. A characteristic of the technique is that the number of watermarkable coefficients per block depends on the results of the perceptual analysis. The watermark data rate is therefore not predictable and varies from frame to frame. Unfortunately the method does not support blind detection; the original video frame is required to assist in reversing any malicious distortions and it is also needed to re-generate the JND coefficient matrix. Since this matrix controls both the DCT coefficients into which the watermark was embedded, as well as the local scaling factor which was applied, it is essential for detection. The authors note that a reasonable estimate of the JND coefficients may be determined from the marked frame and therefore the original may not be absolutely necessary for reliable detection; however, results from blind detection were not presented at the time. The main advantage offered by the perceptual DCT-based technique is that the watermark is embedded at an optimal strength given the underlying perceptual model, thus maximizing robustness in some sense. Improvements can be made, at the further expense of computational complexity, by considering other masking properties of the HVS. In a multiple sequential watermarking application, there is a subtle complication that may arise due to the maximum strength of the watermark: after a few such watermarks are embedded, the video may become visibly distorted, since at each iteration the JNDs are computed relative to an already modified copy. Finally, because of the locally image-adaptive capabilities of the watermark, the approach is particularly effective for frames containing highly non-uniform content.
3.3.3 Group-of-frame-based techniques

Techniques based on GOFs offer a few important benefits over those that apply image-based ideas to video sequences. First of all they
can take advantage of the temporal properties of the video. This is an important consideration from the perspective of maintaining temporal imperceptibility, i.e. smooth watermark transitions from frame to frame. The time dimension also introduces another degree of freedom that can be used to increase robustness. Secondly, since GOF-based methods consider a number of frames in sequence, they provide a natural framework for exploiting temporal masking properties of the HVS. The main design challenges for GOF-based techniques involve computational complexity, compatibility with video codecs, and the ability to recover the watermark from a single frame, although it is embedded into an entire group.
3.3.3.1 3D DFT

In (Deguillaume et al. 1999), Deguillaume et al. propose a video watermarking technique based on the 3D DFT. The uncompressed video sequence is segmented into fixed-length blocks of l frames and each block is transformed to the 3D DFT domain. A multilevel spread spectrum watermark signal is embedded into each block by modifying selected mid-range spatio-temporal frequency coefficients. Two important new concepts are introduced by the 3D DFT method: first, to assist in automatically reversing any distortions seen at the detector, a secondary watermark or template is embedded along with the primary watermark. Second, to search effectively for this template after spatial or temporal scaling attacks, the log-log-log transformation is used to map scaling operations to simple shifts and an efficient search algorithm is developed.
A summary of the embedding and extraction procedures is outlined in Table 5. The approach does not consider perceptual properties of the HVS in embedding or optimizing the watermark. The mid-range frequency components are chosen as a general tradeoff between modifications at higher frequencies, which tend to be more susceptible to signal processing attacks, and those at lower frequencies, which tend to be more perceptible. Note that due to the
shift-invariance and wrap-around properties of the DFT, if the same watermark signal is embedded into each block, absolute temporal synchronization is not essential for detection. Two other strengths of the scheme are its support for blind detection and ability to recover from scaling attacks in an unsupervised manner. However, detection of the watermark without a sufficiently long sequence of frames may not be possible and the effects of frame dropping have yet to be investigated.
The proposed method is reported to resist spatial shifts, frame cropping, padding, re-scaling, frame rate changes, and MPEG compression at standard quality. A side effect of the location of the watermark, i.e. in the mid-range temporal frequency components, is that the 3D DFT watermark may have some resistance to multiple frame collusion since it contains both static and dynamic components. The main limitation of the technique is its computational complexity, which is high for both the embedding and detection procedures. The method is also not compatible, either in implementation form or in real-time performance, with compressed video sequences. The authors note that it would be most suitable for applications where robustness is more important than speed.
3.3.3.2 Perceptual scene-based

In (Swanson et al. 1998), Swanson et al. propose a multi-resolution video watermarking approach that uses a perceptual HVS model to embed a highly robust watermark. The algorithm is scene-based and partitions the video into logical instead of arbitrary temporal segments. It achieves robustness to multiple frame collusion by constructing a watermark that is similar in visually similar parts of the video, and dissimilar in visually dissimilar parts. The method also proposes a dual-key solution to the protocol or deadlock attack using a host-dependent watermark.
Table 5. Summary of the 3D DFT algorithm.
I) Segment the video sequence into fixed-length blocks of l frames (typically l = 16 or 32). For each video block, the embedding procedure is as follows:
A) Generate the watermark by applying an (N, M) Gold code G to the data sequence U: W = G · U. Generally it is assumed that length(U) = M << length(W) = N; also U_i, G_{i,j} ∈ {−1, 1}, and so W_i ∈ [−M, M].
B) Apply the 3D DFT to obtain a spatio-temporal frequency decomposition of the block, X'. For imperceptibility and robustness, the mid-range spatial and temporal frequency coefficients are defined as the allowable region for watermark modifications.
C) Randomly select N pairs of coefficients from the allowable region, (X'_{i,1}, X'_{i,2}), i = 1, .., N. By making the selection pattern dependent on the key K, it can be easily reproduced at the detector.
D) For each pair of coefficients, if W_i > 0, set X'_{w,i,1} = X'_{i,1} + W_i; and if W_i < 0, set X'_{w,i,2} = X'_{i,2} + |W_i|.
E) The template is based on a sparse grid generated by K. The magnitudes of coefficients near the grid are adjusted to ensure that points lying on the grid correspond to local maxima of the 3D DFT decomposition.
F) Finally invert the transform to get X_w = DFT^{-1}(X'_w).
II) Detection begins by transforming a block of X_w into the 3D DFT domain and extracting all local maxima. The template is re-generated from K and a fast linear search algorithm is applied in the log-log-log domain to re-scale X'_w.
III) The primary watermark signal is estimated by locating the N coefficient pairs used for embedding and computing the difference vector Ŵ_i = X'_{w,i,1} − X'_{w,i,2}.
IV) The transmitted data is determined by correlating Ŵ and the Gold code G: G · Ŵ = N · U − (M − 1) + error ≈ N · U. Finally, since the message is binary and bipolar, Û = sign(G · Ŵ).
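A toy version of the embedding and decoding steps is sketched below. A random ±1 matrix stands in for the (N, M) Gold code, and the template (step E), the mid-range coefficient selection, and the conjugate-symmetry bookkeeping of a real-valued 3D DFT are all glossed over, so the sketch illustrates the structure rather than the robustness of the published method.

```python
import numpy as np

def embed_gof(frames, msg_bits, key, N=2000):
    rng = np.random.default_rng(key)
    M = len(msg_bits)
    G = rng.choice([-1, 1], size=(N, M))          # stand-in for an (N, M) Gold code
    u = 2 * np.asarray(msg_bits) - 1              # {0,1} -> {-1,+1}
    W = G @ u                                     # W_i in [-M, M]
    Xf = np.fft.fftn(frames.astype(float))
    flat = Xf.reshape(-1)
    pairs = rng.choice(flat.size, size=(N, 2), replace=False)  # key-dependent pairs
    for i, (p, q) in enumerate(pairs):
        t = p if W[i] > 0 else q                  # raise |X'_{i,1}| or |X'_{i,2}|
        phase = flat[t] / (abs(flat[t]) + 1e-12)
        flat[t] += abs(W[i]) * phase              # increase the magnitude, keep the phase
    return np.fft.ifftn(flat.reshape(Xf.shape)).real, G, pairs

def decode_gof(frames_w, G, pairs):
    d = np.fft.fftn(frames_w.astype(float)).reshape(-1)
    W_hat = np.array([abs(d[p]) - abs(d[q]) for p, q in pairs])  # blind estimate of W
    return ((np.sign(G.T @ W_hat) + 1) // 2).astype(int)         # back to {0,1}
```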
It takes advantage of two transform domains, the temporal wavelet transform (TWT) and the DCT, to exploit temporal features as well as pixel- and frequency-domain-based masking properties of the video. However, the double transformation adds to the complexity of the scheme and this is its main weakness for real-time applications. The embedding procedure begins by segmenting the video into scenes of L_k frames. A block diagram illustrating how each scene is watermarked is presented in Figure 7. First the TWT is applied to separate the scene into L_k/2 low- and L_k/2 high-pass wavelet coefficient frames. These are essentially like pixel domain frames, except that the low-pass frames contain only static components of the scene, while the high-pass frames encapsulate the dynamic components. Therefore a spatial masking model can be used to analyze each wavelet frame and generate a spatial optimization mask S_{i,j}. A frequency masking model is also applied in the 8 × 8 DCT domain to generate a frequency optimization mask M_{i,j}. These masks specify the maximum tolerable deviation for each wavelet coefficient and the maximum scale factor by which each DCT coefficient can be modified, respectively, while still maintaining watermark imperceptibility. To protect the video from protocol attacks, a dual-key approach is proposed: the first key is the owner's key K and a second video-dependent key is computed by passing the original video sequence through a one-way hash function. Both keys are used to produce a cryptographically secure binary watermark signature Y. Finally, each watermark frame is constructed by assembling 8 × 8 blocks W_{i,j} = S_{i,j} × DCT^{-1}(Y'_{i,j} × M_{i,j}), where Y'_{i,j} is obtained by rearranging Y into a picture frame and applying the 8 × 8 DCT. After all L_k wavelet coefficient frames have been marked by addition with the corresponding watermark frame, the inverse TWT is taken to produce a marked pixel domain video scene. Detection of the watermark from a single video frame requires knowledge of the scene from which the frame was taken, i.e. blind
[Figure: block diagram of scene-based embedding. The pixel domain scene is separated by the TWT into L_k wavelet coefficient frames F_0, .., F_{L_k−1}, which are processed as 8 × 8 DCT coefficient blocks, masked, watermarked, and inverse transformed.]
Figure 7. Watermark embedding in scene-based perceptual method.
detection is not supported. Given a test frame R from scene S, the detection statistic is computed as the projection of R − F_0 onto Ŵ_0, where F_0 is the first low-pass temporal wavelet frame of S, Ŵ_0 is the watermark derived from F_0, and Y is the owner’s signature. Since the first low-pass wavelet and watermark frames are associated with static components of the scene, they can be used to detect the watermark in any frame of the scene. The temporally layered nature of the watermark provides protection from multiple frame collusion, as well as added resistance to video
processing attacks. The method is reported to be robust to most common signal processing and geometric distortion attacks including cropping, rotation, re-scaling, MPEG-2 compression, printing, and scanning. One other attractive feature is its suitability for distribution chain tagging applications, since it is demonstrated that a number of watermarks embedded sequentially can all be reliably detected in the final video sequence. The authors identify adding temporal masking properties to their perceptual model as a future direction of research.
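To make the TWT step concrete, the following is a minimal one-level temporal Haar split, assuming an even number of frames per scene; the published method's exact wavelet and decomposition depth are not specified here, so this is an illustrative stand-in.

```python
import numpy as np

def twt_haar(scene):
    """One-level temporal Haar wavelet transform of a scene (frames stacked
    along axis 0, even count): returns L/2 low-pass (static) and L/2
    high-pass (dynamic) coefficient frames."""
    a, b = scene[0::2].astype(float), scene[1::2].astype(float)
    lo = (a + b) / np.sqrt(2.0)     # static components of the scene
    hi = (a - b) / np.sqrt(2.0)     # dynamic components of the scene
    return lo, hi

def itwt_haar(lo, hi):
    """Invert the one-level temporal Haar transform."""
    a, b = (lo + hi) / np.sqrt(2.0), (lo - hi) / np.sqrt(2.0)
    scene = np.empty((2 * lo.shape[0],) + lo.shape[1:])
    scene[0::2], scene[1::2] = a, b
    return scene
```

Each low- and high-pass frame returned by twt_haar can then be masked and marked exactly as a pixel domain frame would be, before itwt_haar reassembles the watermarked scene.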
4 Hardware Implementation Issues of Digital Video Watermarking Techniques
As discussed in the previous section, the initial focus of digital watermarking research was on the development of algorithmic and performance-enhancing approaches for specific applications. More recently, several theories of watermarking have emerged based on the tool-sets outlined in Section 2.2. The current focus in algorithm development has involved improving robustness primarily through the use of sophisticated perceptual models, interference and attack modeling for advanced detector design, appropriate transform domains for superior modulation, and powerful error-correction codes. Performance evaluation has primarily entailed investigation into the trade-off between robustness and imperceptibility.
In this section, we focus on another, often overlooked, dimension of this measure of performance: hardware implementation cost. In particular, we discuss hardware complexity issues.
4.1 Why Hardware?
A watermarking system can be implemented with either software or hardware. In a software implementation, the algorithm’s
operations are performed as code running on a microprocessor. For example, high-level scripts written for a symbolic interpreter running on a workstation or machine code software running on an embedded processor are both classified as software implementations. Software-based watermarking also provides:

• abstraction of the implementation from any hardware details. Thus, instead of being concerned with elements such as flip-flops, RAMs, and gates, the designer focuses on implementation of the algorithm at a much “higher” level;
• availability of software tools to aid in realizing various data operations. For instance, software designers have libraries of common processing functions so that they may borrow, to a large extent, from past implementations;
• limited means of improving area and time complexity (speed) of the implementation. The software designer does not have direct control over the way RAM and processor interact, posing a limit on speed. To reduce area, (s)he must try to limit the total amount of RAM required. This is in contrast to hardware, where there is full control over the timing of operations into the RAM, and direct control over the usage of expensive hardware resources.
Conversely, a hardware-based implementation is one where the algorithm’s operations are fully implemented in custom-designed circuitry. The overall advantage is that hardware consumes less area and less power. Although it might be faster to implement an algorithm in software, there are a few compelling reasons for a move towards hardware
implementation. In consumer electronics devices, a hardware watermarking solution is often more economical because adding the watermarking component takes up a small dedicated area of silicon. In software, implementation requires the addition of a dedicated processor such as a DSP core which occupies considerably more area, consumes significantly more power, and may still not perform adequately fast! Therefore, hardware-level designs offer many more options to reduce area and improve speed than software-level design.
4.2 Hardware Complexity Constraints
4.2.1 Application constraints

In this section we give the reader a feel for the hardware constraints of digital video watermarking algorithms as related to some specific applications. Two commonly proposed applications of digital video watermarking are broadcast monitoring and copy control; a practical implementation of a watermarking system must conform to the constraints on time (performance) and space (area) demanded by these applications.
4.2.1.1 Embedder complexity

In broadcast monitoring, the watermark can either be embedded into the media long before transmission, or at the time of transmission (e.g. in a live video feed). It is in the latter case that a strict real-time requirement must be met: the watermark embedder must function as fast as the video frame rate. Since the high cost of broadcast equipment can absorb costs associated with the embedder hardware, no strict area constraints are posed by broadcast monitoring. In copy control, embedding is usually done in the recording device prior to storage onto the media (Linnartz et al. 2000) in a process called remarking. There may also be applications where remarking occurs in the playback device. Since video data at these points will
likely be streaming at the real-time frame rate, there is a performance requirement that the embedder operate as fast as the frame rate. As copy control schemes will be used in consumer electronics where low cost is essential, minimizing space complexity is also very important. As with the embedder, the applications of broadcast monitoring and copy control are used to obtain constraints for architecting the detector.
4.2.1.2 Detector complexity

In broadcast monitoring, several channels of video data are monitored simultaneously and checked for the presence of watermarks. Due to the large amount of real-time video data that must be processed (continuous streams over many channels), detection at the frame rate of video is required to ensure thorough monitoring; at a slower rate, portions of the video streams would have to be dropped. For copy control, as with the embedder, detection can be done at the frame rate of the video as video data is being streamed. Hence both applications pose the requirement for detection at the frame rate of video. As with the embedder, the detector space complexity is of no consequence in broadcast monitoring; however, with copy control, due to the need to minimize economic cost, a small implementation must be targeted. Although actual implementation details are beyond the scope of this chapter, the reader is referred to (Mathai et al. 2003) and (Petitjean et al. 2002) for case studies in implementing popular watermarking algorithms for video applications. In (Mathai et al. 2003), Mathai et al. have designed a 0.18 μm CMOS chip for the JAWS watermarking algorithm. They constrain both the embedder and detector
architectures to satisfy the aggressive requirements that the embedder handle real-time video frame rates, and minimize implementation area. In (Petitjean et al. 2002), Petitjean et al. design a hardware accelerator that can be attached to the main processor, in, perhaps, a system-on-chip MPEG encoder-decoder with watermarking capabilities.
5 Predicted Trends in Video Watermarking Research
Digital video watermarking is currently a very active area of research. We predict the following research directions for short-term future investigation:

• New applications in the area of video watermarking are currently the focus of several research thrusts and special issues. It is predicted that one avenue of investigation will be in matching digital video technology with emerging applications. Some promising applications include the value-enhancement of legacy systems, on-line error-correction of video, and signal tagging for hypervideo, among others.
• The bridge between theory and practice is another direction of work that will be fruitful. Digital video watermarking will only become a well-established and influential tool if we overcome many practical implementation issues and better characterize the trade-offs between performance and hardware cost.
• The integration of digital video watermarking within digital rights management (DRM) systems is the next logical step to applying these principles in practical security and data management applications. It is envisioned that combining watermarking with encryption and perceptual coding will bring fruitful gains in terms of performance-enhancement without additional cost.
• Watermarking of video at other layers of a communication network, such as the transport or network layer, is also a burgeoning area of research. The availability of supplementary information at other levels of a communication infrastructure makes it possible to use the technology for other applications. Recently, work on data hiding in TCP/IP has been proposed in which video packet streams are used to pass on supplementary information in order to enhance standard network security processing (Ahsan and Kundur 2002).
The multidisciplinary flavor of digital video watermarking makes it an interesting topic of investigation. As highlighted in this chapter, video watermarking represents a family of solutions to a set of very diverse tasks. The design challenge is in selecting the appropriate methodology that exhibits the best overall compromise for a given problem.
References

Ahsan, K. and Kundur, D. (2002), “Practical data hiding in TCP/IP,” Proceedings of the ACM Workshop on Multimedia Security, pp. 311-318.

Bartolini, F., Barni, M., Cappellini, V., and Piva, A. (1998), “Mask building for perceptually hiding frequency embedded watermarks,” Proceedings 5th International Conference on Image Processing - ICIP '98, pp. 450-454.

Bloom, J.A., Cox, I.J., Kalker, T., Linnartz, J.-P.M.G., Miller, M.L., and Traw, C.B.S. (1999), “Copy protection for DVD video,” Proceedings of the IEEE, vol. 87, pp. 1267-1276.

Brisbane, G., Safavi-Naini, R., and Ogunbona, P. (1999), “Region-based watermarking for images,” Lecture Notes in Computer Science, vol. 1729, pp. 425-435.
Campisi, P., Kundur, D., Hatzinakos, D., and Neri, A. (2002), “Compressive data hiding: An unconventional approach for improved color image coding,” Eurasip Journal on Applied Signal Processing, vol. 2002, pp. 152-163.

Chinook Communications Company Website: http://www.chinook.com

Chung, T.-Y., Hong, M.-S., Oh, Y.-N., Shin, D.-H., and Park, S.-H. (1998), “Digital watermarking for copyright protection of MPEG-2 compressed video,” IEEE Transactions on Consumer Electronics, vol. 44, pp. 895-901.

Cox, I.J., Kilian, J., Leighton, F.T., and Shamoon, T. (1997), “Secure spread spectrum watermarking for multimedia,” IEEE Transactions on Image Processing, vol. 6, pp. 1673-1687.

Cox, I.J. and Miller, M.L. (2002), “The first 50 years of electronic watermarking,” Eurasip Journal on Applied Signal Processing, vol. 2002, pp. 126-132.

Darmstaedter, V., Delaigle, J.-F., Nicholson, D., and Macq, B. (1998), “A block based watermarking technique for MPEG-2 signals: Optimization and validation on real digital TV distribution links,” Proceedings 3rd European Conference on Multimedia Applications, Services and Techniques, pp. 190-206.

Deguillaume, F., Csurka, G., O'Ruanaidh, J., and Pun, T. (1999), “Robust 3D DFT video watermarking,” Proceedings of the SPIE, vol. 3657, pp. 113-124.

Digimarc Company Website: http://www.digimarc.com

Eggers, J. and Girod, B. (2002), Informed Watermarking, Kluwer Academic Publishers, New York, NY.
Fei, C., Kundur, D., and Kwong, R. (2003), “Analysis and design of watermarking algorithms for improved resistance to collusion,” IEEE Transactions on Image Processing (under review).

Hartung, F. and Girod, B. (1998), “Watermarking of uncompressed and compressed video,” Signal Processing, vol. 66, pp. 283-301.

Holliman, M., Memon, N., Yeo, B.-L., and Yeung, M.M. (1997), “Adaptive public watermarking of DCT-based compressed images,” Proceedings of the SPIE, vol. 3312, pp. 284-295.

Kalker, T. (1999), “System issues in digital image and video watermarking for copy protection,” Proceedings IEEE International Conference on Multimedia Computing and Systems, pp. 562-567.

Kalker, T., Depovere, G., Haitsma, J., and Maes, M. (1999), “A video watermarking system for broadcast monitoring,” Proceedings of the SPIE, vol. 3657, pp. 103-112.

Kiya, H., Noguchi, Y., Takagi, A., and Kobayashi, H. (1999), “A method of inserting binary data into MPEG video in the compressed domain,” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. E82-A, pp. 1485-1492.

Kundur, D. and Hatzinakos, D. (1999), “Digital watermarking for telltale tamper-proofing and authentication,” Proceedings of the IEEE, vol. 87, pp. 1167-1180.

Kundur, D. and Hatzinakos, D. (2001), “Diversity and attack characterization for improved robust watermarking,” IEEE Transactions on Signal Processing, vol. 29, pp. 2383-2396.

Langelaar, G.C., Lagendijk, R.L., and Biemond, J. (1998), “Real-time labeling of MPEG-2 compressed video,” Journal of Visual Communication and Image Representation, vol. 9, pp. 256-270.
Le Gall, D.J. (1991), “MPEG: A video compression standard for multimedia applications,” Communications of the ACM, vol. 34, pp. 46-58.

Lin, E.T. and Delp, E.J. (2002), “Temporal synchronization in video watermarking,” Proceedings of the SPIE, vol. 4675, pp. 493-504.

Linnartz, J.-P.M.G. and Talstra, J.C. (1998), “MPEG PTY-Marks: Cheap detection of embedded copyright data in DVD video,” Proceedings 5th European Symposium on Research in Computer Security, pp. 221-240.

Linnartz, J.-P.M.G., Talstra, J.C., Kalker, T., and Maes, M. (2000), “System aspects for copy management for digital video,” Proceedings of the IEEE International Conference on Multimedia and Expo, pp. 203-206.

Mathai, N.J., Kundur, D., and Sheikholeslami, A. (2003), “Hardware implementation perspectives of digital video watermarking algorithms,” IEEE Transactions on Signal Processing, vol. 51, pp. 925-938.

Mobasseri, B.G. (1999), “Exploring CDMA for watermarking of digital video,” Proceedings of the SPIE, vol. 3657, pp. 96-102.

Ó Ruanaidh, J.J.K. and Pun, T. (1997), “Rotation, scale and translation invariant digital image watermarking,” Proceedings of the IEEE International Conference on Image Processing, pp. 536-539.

Petitjean, G., Dugelay, J.-L., Gabriele, S., Rey, C., and Nicolai, J. (2002), “Towards real-time video watermarking for system-on-chip,” Proceedings of the IEEE International Conference on Multimedia and Expo, pp. 26-29.

Qiao, L. and Nahrstedt, K. (1998), “Watermarking methods for MPEG encoded video: Towards resolving rightful ownership,”
Proceedings IEEE International Conference on Multimedia Computing and Systems, pp. 276-285.

Reid, M.M., Millar, R.J., and Black, N.D. (1997), “Second-generation image coding: An overview,” ACM Computing Surveys, vol. 29, pp. 3-29.

Robie, D.L. and Mersereau, R.L. (2002), “Video error correction using steganography,” Eurasip Journal on Applied Signal Processing, vol. 2002, pp. 164-173.

Su, K. (2001), Digital video watermarking principles for robustness to collusion and interpolation attacks, Master's thesis, University of Toronto.

Su, K., Kundur, D., and Hatzinakos, D. (2002), “A novel approach to collusion-resistant video watermarking,” Proceedings of the SPIE, vol. 4675, pp. 491-502.

Swanson, M.D., Zhu, B., and Tewfik, A.T. (1998), “Multiresolution scene-based video watermarking using perceptual models,” IEEE Journal on Selected Areas in Communications, vol. 16, pp. 540-550.

Voloshynovskiy, S., Herrigel, A., Baumgartner, N., and Pun, T. (2000), “A stochastic approach to content adaptive digital image watermarking,” Lecture Notes in Computer Science, vol. 1768, pp. 212-236.

Wayner, P. (2002), Disappearing Cryptography, Second Edition, Morgan Kaufmann, New York, NY.

Wolfgang, R.B., Podilchuk, C.I., and Delp, E.J. (1999), “Perceptual watermarks for digital images and video,” Proceedings of the IEEE, vol. 87, pp. 1108-1126.
Zhu, W., Xiong, Z., and Zhang, Y.-Q. (1999), “Multiresolution watermarking for images and video,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, pp. 545-550.
Chapter 11

Benchmarking of Watermarking Algorithms

Nikolaos Nikolaidis and Ioannis Pitas

Benchmarking of watermarking algorithms is a complicated task that requires examination of a set of mutually dependent performance indices (algorithmic complexity, decoding/detection performance, perceptual quality). This chapter will try to summarize the basic benchmarking principles and provide a methodology for deriving the corresponding performance metrics. A review of four benchmarking platforms will also be provided.
1 Introduction

The development of digital media and services created an urgent need for multimedia security and copyright protection techniques. Watermarking has emerged recently as an important copyright protection tool. Watermarking research evolved with tremendous speed in the last few years (Macq 1999, Cox et al. 2002), leading to numerous publications in the scientific and patent literature (Cox et al. 2002, Katzenbeisser and Petitcolas 2000). Up to now, watermarking performance evaluation and method comparison has been carried out in a non-standardized way, with no concrete supporting evidence. This lack of efficient and reliable performance evaluation procedures can be mainly attributed to the fact that judging the performance of a watermarking algorithm is a complex task that requires taking into account a number of mutually dependent and competing
performance indices. With the watermarking technology entering a more mature era, backed up by concrete mathematical foundations, it is time for a systematic and globally accepted benchmarking methodology to be devised. This development would benefit both the watermarking technology suppliers, who could use the benchmark to fine-tune their algorithms and obtain indications of their method's ranking in the watermarking arena, and the technology users, who would have a systematic way to compare existing solutions and pick the one that satisfies their needs in the best possible way. Overall, the establishment of concrete benchmarking foundations would give the watermarking community the credibility that is largely lacking. A number of efforts in this direction are already underway and have resulted in the introduction of the basic benchmarking principles and the development of a number of benchmarking platforms (Cox et al. 2002, Petitcolas 2000, Petitcolas et al. 2001, Solachidis et al. 2001, Pereira et al. 2001, CERTIMARK 2002). This chapter will try to address the major considerations that arise when designing a benchmarking system for image watermarking methods and summarize the basic benchmarking principles. A review of four benchmarking platforms, pointing out the pros and cons of each system, will also be provided. Discussion will be limited to the so-called robust watermarks, leaving aside benchmarking considerations for fragile or semi-fragile algorithms. We will also assume that the system security stems from the use of a set of secret keys, used to generate the watermarks. Despite the fact that the chapter concentrates on still images, the procedures and metrics discussed in it can also be applied to watermarking techniques for other digital media (audio, video, 3-D models). Furthermore, the performance metrics and methodology that will be presented can be used for benchmarking both blind and non-blind methods.
2 Characteristics of a Watermarking System

A watermarking system consists of two distinct modules: a module that inserts the information in the host image and a module that checks if a given image hosts a watermark and retrieves the conveyed information. The performance metrics that one can derive for a watermarking system vary according to the type of information that a watermark can convey and the type of detection used. Therefore, before proceeding to describe the benchmarking methodology, a short description of the classes of watermarking algorithms will be provided. With respect to the information conveyed by the watermark, watermarking systems can be classified into one of the following two classes:

• Zero bit systems. Watermarking systems of this type can only test whether an image I hosts a certain watermark W(K) generated by a key K, i.e. verify whether the image is watermarked or not. The term watermark detection is used in this chapter to denote the procedure used to declare the presence of a watermark when one is indeed present in an image and come up with a “no watermark present” answer when applied on images hosting no watermark or hosting a different watermark than the one under investigation.
• Multiple bit systems. These systems are capable of encoding a multiple bit message in the image. For systems of this type we make the distinction between watermark detection and message decoding. The image under investigation is first tested to verify whether it hosts a watermark or not. This procedure is identical to the detection procedure described above for zero bit watermarks. As soon as the algorithm declares that the image is indeed watermarked, the embedded message should be decoded. Thus, for multiple bit systems, watermark detection and message decoding should be considered as two distinct steps that are performed in
cascade, the message decoding step taking place only if a watermark has been found to reside in the image. With respect to the output of the watermark detection procedure, systems are categorized as follows:

• Hard decision detectors generate a binary output (watermark detected, watermark not detected) which usually results by comparing internally the test statistic of the corresponding hypothesis test (see Section 3.3) against a decision threshold.
• Soft decision detectors provide as output the test statistic itself, i.e., a real number that is related to detection reliability. In this case, thresholding in order to reach a binary decision is done in a separate, subsequent step.
In a real application, detectors will most probably be of the hard decision type although the soft decision output (detection test statistic) can accompany the binary output in order to provide an indication of the decision reliability. However, during the development stage of an algorithm, one should look at the detection as a soft decision procedure because, as will be described in the sequel, this perspective allows judging the performance of the algorithm under all possible operating conditions and facilitates final threshold selection.
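The hard/soft distinction can be captured by a thin wrapper. The sketch below, with an assumed soft-detector interface introduced for illustration, simply thresholds the soft statistic to produce the binary decision while retaining the statistic as a reliability indication.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class HardDecisionDetector:
    """Wraps a soft detector (which returns the real-valued test statistic)
    with a decision threshold to obtain a binary answer."""
    soft_detector: Callable   # (image, key) -> real-valued test statistic
    threshold: float

    def detect(self, image, key):
        stat = self.soft_detector(image, key)
        # binary decision plus the statistic as a reliability indication
        return stat > self.threshold, stat
```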
3 Benchmarking Principles
Ideally, a benchmarking tool should have the ability to highlight the advantages and the weaknesses of the watermarking method under test and allow for easy and efficient method comparison. However, this is not an easy task because it involves examining a set of mutually dependent performance factors. Thus, one cannot come up with a single figure of merit but rather with a set of performance indices.
An efficient benchmarking method should quantify and present the interactions among the various performance aspects, e.g., the relation between watermark robustness and perceptual quality.
An important fact that one should bear in mind when dealing with watermarking benchmarking is that watermarking performance depends on the keys that will be used for embedding and detection and the messages that will be embedded. As a consequence, the performance of an algorithm cannot be judged on results obtained from trials with a single key or message. Multiple trials with a sufficiently large number of different keys K = {K_i / i = 1...N_K} and messages M = {M_i / i = 1...N_M} should be conducted to allow us to derive statistical performance metrics, such as the probabilities of false alarm and false detection described in the following sections. Two other topics also require thorough consideration: the images that will be used for the tests and the attacks that will be applied to the watermarking system.
• Image set. The set of images I = {I_i / i = 1...N_I} that will be fed as input to the benchmarking system should contain images that vary in size and frequency content, since these two factors affect the system performance. Moreover, the type of images in the image set (indoor/outdoor scenes, black and white or color images, synthetic/natural images, etc.) should be representative of the images encountered in the target application. Finally, experiments involving different watermarking systems should be based on the same set of images to ensure that the obtained results are comparable. The performance evaluation methodology that will be presented in Sections 3.1-3.5 leads to single-image metrics. Methods for summarizing the results obtained on different images will be discussed in Section 4.
• Attack set. Since watermarks are expected to be robust to host image manipulations, tests for judging the performance of a
watermarking method when applied on distorted images constitute an important part of a benchmarking system. The set of attacks available in a benchmarking system should include all operations that the average user or an intelligent pirate can use in order to make the watermark undetectable. It should also include signal processing operations and distortions that occur during normal image usage, transmission, storage, etc. Unfortunately, modelling the effect of certain operations on the image quality is not a trivial task. The printing and scanning procedure is a typical example of such an operation. Furthermore, as new sophisticated attacks are developed, the benchmark should allow for hassle-free insertion of user-defined attacks. The benchmarking framework that will be presented in the sequel can handle attacks that aim to render the watermark undetectable. Multiple watermarking, i.e., embedding of a number of different watermarks in the same image, can also be considered as an attack within this framework. Attacks that aim towards unauthorized embedding of a watermark in a host medium (e.g. the copy attack), or attacks for unauthorized watermark detection and message decoding, do not fit within the proposed framework. The same holds for attacks that are not dealing with the watermarking method per se but with other aspects of a watermarking-based digital rights management system (e.g. attempts to reverse-engineer the watermarking software to gain access to the secret keys). Among the complete list of attacks available in a certain benchmark, the user should select an appropriate subset A = {A_i / i = 1...N_A} to use in her tests. The list of selected attacks heavily depends on the target application. Thus, a system targeting copyright protection of medical images used for diagnostic purposes need not be tested against high compression ratios, since the quality of such images should be extremely high. The metrics presented in Sections 3.1-3.5 are evaluated on a single watermarked image that is either distortion-free or affected by a certain attack with fixed attack parameters (e.g. cropping by 25%). Obviously,
[Figure: block diagram showing the embedding module, the attack stage, and the detection/decoding module producing the raw results, with the quality specifications Q as an additional input.]
Figure 1. Generation of raw results.
testing the system against many different attacks will yield a set of N_I × N_A performance results. Methods for averaging the results obtained on images distorted by different attacks will be discussed in Section 4. The structure of a benchmarking system is presented in Figure 1. The mutually dependent parameters that should be taken into account when judging the performance of a watermarking algorithm are described in the following sections. It should be noted, however, that apart from the performance aspects described below, a benchmark might include additional, application-specific tests. For example, a benchmark can include tests to verify whether a watermark is removable or not, i.e., whether the original image can be obtained from a watermarked copy.
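The loop structure implied by Figure 1 might be sketched as follows; the algorithm interface (embed, detect_statistic, decode) is an assumed one introduced for illustration, not a standard API.

```python
def generate_raw_results(algorithm, images, attacks, keys, messages):
    """Produce the raw results of Figure 1: each (image, attack) cell is
    evaluated over all keys and messages so that statistical metrics can
    later be derived from the recorded outcomes."""
    results = []
    for ii, img in enumerate(images):
        for ai, attack in enumerate(attacks):
            for key in keys:
                for msg in messages:
                    marked = algorithm.embed(img, key, msg)
                    attacked = attack(marked)  # fixed-parameter attack, e.g. 25% crop
                    stat = algorithm.detect_statistic(attacked, key)  # soft output
                    ok = algorithm.decode(attacked, key) == msg
                    results.append((ii, ai, key, msg, stat, ok))
    return results
```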
3.1 Algorithmic Complexity

For each watermarking algorithm one should evaluate the algorithmic complexity of the two modules, namely the watermark embedding module and the watermark detection / message decoding module. The easiest but not the most appropriate way to measure complexity is by recording execution time on a fixed hardware/software suite. Mean, maximum and minimum execution times for the two modules might be evaluated over the set of all keys and messages
for each host image (distortion-free or affected by a certain attack). It should be emphasized here that for certain algorithms (especially those relying on search procedures to recover the watermark in case of distortions) detection/decoding complexity depends on the type and severity of attacks imposed on the image under test and, thus, complexity evaluation should be performed on each distorted image separately. Furthermore, embedding and detection/decoding complexity might depend on the image size and type (watermark embedding and detection might be more time-consuming on a 24-bit color image than on an 8-bit black and white image). Other complexity aspects that might be of interest in certain situations include the algorithm's memory requirements (for both software and hardware implementations of the algorithm) and the number of gates required for hardware implementation of the algorithm. Obviously, the complexity of a watermarking algorithm can also be evaluated by analyzing the algorithm, provided of course that the algorithm is available to the one who conducts the benchmarks. This approach would result in the theoretical evaluation of the number of required operations (multiplications, additions) and is far more appropriate, but impossible to carry out within the context of an automated benchmark.
3.2 Visual Quality

Since watermarks are supposed to be “invisible”, a benchmarking system should be able to quantify the degree of distortion introduced to an image due to watermarking and, if possible, indicate whether this distortion is visible or not. Thus, a benchmarking system should be able to measure the perceptual quality of watermarked images. The most effective way to conduct such measurements is by subjective quality evaluation procedures. Subjects are asked to rate the quality of watermarked images without reference to the originals, or the fidelity of the watermarked images, i.e., the relative quality of the watermarked images with respect to the originals. Normally, viewers
of watermarked images do not have access to the original image. Quality is thus more important than fidelity. Tests are carried out using standard viewing conditions and procedures, with a suitable rating scale. Naturally, the test setup description should always accompany the test results. The most important parameters involved in a subjective test are the following:

• number and skill level of subjects (average viewers, imaging professionals, 'golden eyes', etc.)
• viewing distance
• background luminance, screen luminance, type of display equipment
• viewing time, viewing sequence.
Many different subjective testing methodologies exist. In Two Alternative, Forced Choice (2AFC) tests (Cox et al. 2002), subjects are presented with the original and distorted (in our case, watermarked) image without being informed which one is the original. They are then asked to point out the image with the highest quality. Random choices (50% correct answers) indicate no visible distortion. In a different version of this test, subjects are presented with the original image I_1. They are also presented with the original I_2 and distorted (watermarked) image I_3 without being informed which one is the original, and are asked to point out which of I_2, I_3 is identical to I_1. Again, random choices (50% correct answers) indicate no visible distortion. In the Double Stimulus Continuous Quality Scale tests (ITU 2002), which are based on ITU-R Recommendation 500, subjects are presented with the original and distorted image without being informed which one is the original. They rate each image separately on a continuous 1-5 quality scale. Differences in ratings given to the original and the distorted (watermarked) image are used
to judge the quality of the watermarked image. In ITU-R Recommendation 500 Double Stimulus Impairment Scale tests (ITU 2002), subjects are presented with the original and the distorted (watermarked) image, and they are informed which one is the original. They rate the distortions in the watermarked image on a discrete, five-level impairment scale. The mean value of the ratings (Mean Opinion Score) is used as a distortion indicator. Unfortunately, subjective quality tests cannot be integrated in an automated benchmarking system. In such a system, the perceptual quality of the watermarked images should be measured in a quantitative way that correlates well with the way human observers perceive image quality. The easiest way to evaluate the quality of watermarked images is by using metrics like the Signal to Noise Ratio (SNR) or the Peak Signal to Noise Ratio (PSNR), considering the watermark as noise and the host image as signal. However, these metrics exhibit poor correlation with the quality perceived by humans. For example, local intensity alterations (e.g. setting the pixel values in a 3x3 area to white) might be highly visible (and annoying) to humans while having only a minor effect on the SNR/PSNR values. On the other hand, minor geometric distortions (e.g. a rotation by 1°) heavily affect SNR/PSNR values but are practically invisible to humans, especially when the original is not available. Obviously, quantitative measures that correlate better with perceptual image quality than the widely used SNR and PSNR metrics should be devised. Weighted PSNR (Netravali and Haskell 1988, Voloshynovskiy et al. 2001), which equals PSNR weighted at each image position by the local noise visibility function NVF (local signal activity), could be such a measure. In uniform regions, NVF obtains values close to 1, whereas in edges and textured regions NVF values are close to zero. Other objective metrics based on human visual perception theories have also been proposed, mainly in the compression literature. However, no globally accepted visual quality metric currently exists.
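As an illustration, a minimal Python sketch of PSNR and an NVF-weighted PSNR is given below. The local-variance NVF and the theta_scale constant are simplifying assumptions made here; published weighted-PSNR definitions differ in their details.

```python
# A sketch of PSNR and an NVF-weighted PSNR: noise in flat regions
# (NVF close to 1) is penalized more than noise in textured regions
# (NVF close to 0).
import numpy as np
from scipy.ndimage import uniform_filter

def psnr(original, distorted, peak=255.0):
    mse = np.mean((original.astype(float) - distorted.astype(float)) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

def nvf(image, window=7, theta_scale=75.0):
    img = image.astype(float)
    local_mean = uniform_filter(img, window)
    local_var = np.maximum(uniform_filter(img ** 2, window) - local_mean ** 2, 0.0)
    theta = theta_scale / max(local_var.max(), 1e-12)   # tuning constant (assumed)
    return 1.0 / (1.0 + theta * local_var)              # ~1 in flat areas, ~0 on texture

def weighted_psnr(original, distorted, peak=255.0):
    noise = original.astype(float) - distorted.astype(float)
    weighted_mse = np.mean((nvf(original) * noise) ** 2)
    return 10 * np.log10(peak ** 2 / weighted_mse)
```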
Apart from the quality of the watermarked images, users of a watermarking system might also be interested in the perceptual quality of the attacked images, since watermark removal attacks are meaningful only if the decrease of the host image quality stays within acceptable levels, that is, if they do not render the image useless. This is particularly true for copyright protection and copy control applications, where a distorted image with the watermark removed will not be of much value to the attacker. Thus, there is a need for objective perceptual quality metrics for the attacked images or, equivalently, for the perceptual severity of an attack. Objective metrics of this type would correctly characterize a small rotation as an attack with a low impact on the image quality. Unfortunately, all quality metrics currently available fail to quantify correctly the perceptual impact of attacks that change the image geometry (rotation, shearing, cropping, etc.).
3.3 Watermark Detection Performance
Watermark detection can be considered as a hypothesis testing problem, the two hypotheses being:

- H0: the image under test hosts the watermark under investigation.
- H1: the image under test does not host the watermark under investigation.

Hypothesis H1 can be further divided into two sub-hypotheses:

- H1a: the image under test is not watermarked.
- H1b: the image under test hosts a watermark different than the one under investigation.
Thus, detection performance can be characterized by the false alarm (or false positive) error and its corresponding probability Pfa, i.e.,
the probability to detect a watermark in an image that is not watermarked or is watermarked with a different watermark than the one under investigation, and by the false rejection (or false negative) error, described by the false rejection probability Pfr, i.e., the probability of not detecting a watermark in an image that is indeed watermarked with the watermark under investigation. Depending on the application, these two types of errors might have different significance. However, one should never neglect the importance of false alarms when benchmarking a watermarking algorithm. To understand this fact, one can imagine a detection module constructed so as to always report "watermark detected". Such a detection function would have Pfr = 0. However, its false alarm probability would be 1 and, obviously, the system would be useless. Pfa can be evaluated using detection trials with erroneous watermarks (hypothesis H1b) or detection trials on non-watermarked images (hypothesis H1a). The former might sometimes be preferable since it corresponds to the worst case scenario. Furthermore, the false alarm probability evaluated on images watermarked with a different key than the one used for detection provides an indication on whether the keys in the algorithm keyspace are able to generate distinct, "non-overlapping" watermarks, and thus leads to estimates of the "effective" keyspace. One can distinguish between three types of false alarms and false rejections (Cox et al. 2002): those evaluated on a single image using multiple keys, those evaluated on multiple images using a single key, and those evaluated on multiple images using multiple keys. In the following we will deal with ways of measuring Pfa, Pfr for the multiple keys - single image case. The combination of results from different images in order to come up with metrics for the multiple keys - multiple images case will be studied in Section 4. In order to estimate Pfr, Pfa, one should conduct experiments involving a set of images I, a set of keys K and a set of messages M (to be used later on for the message decoding evaluation). Each image Ii is watermarked in turn with all keys Kj and all messages Mk of the sets M and
K. The procedure is repeated for all elements of the set I and a set I^w of watermarked images is generated. The cardinality of I^w equals NK x NI x NM. Subsequently, the images in I^w are distorted using the attack under study and the set I^a of attacked images, comprising NK x NI x NM elements, is generated. Finally, watermark detection is performed on all images of I^a. Trials with the watermark Ki that has indeed been embedded in the image Ii and with an erroneous watermark Kj (i ≠ j) are conducted. Alternatively, one can conduct experiments involving detection of watermark Ki in the original, un-watermarked version of the image under study. Thus, for each image two sets of detector outputs D^c, D^e, for the correct and the erroneous watermark (or the no-watermark) case respectively, are extracted. Message decoding is also conducted along with watermark detection, but this procedure will be described in Section 3.4. In the following sections we will describe the detection performance metrics that one can derive using the sets D^c, D^e for a single image in I^a.
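A compact sketch of this trial-generation procedure follows; embed(), attack() and detect() are hypothetical placeholders for the system under test, and every image is assumed to be processed with every key and message.

```python
# A sketch of the experiment grid: correct-key and wrong-key detector
# outputs are collected into the sets D^c and D^e described above.
import random

def run_trials(images, keys, messages, embed, attack, detect):
    D_correct, D_erroneous = [], []
    for img in images:
        for key in keys:
            for msg in messages:
                attacked = attack(embed(img, key, msg))
                D_correct.append(detect(attacked, key))
                wrong_key = random.choice([k for k in keys if k != key])
                D_erroneous.append(detect(attacked, wrong_key))
    return D_correct, D_erroneous
```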
3.3.1 Hard Decision Detectors
In this case, one can use the number N_fa of the erroneously detected watermarks and the number N_fr of the missed watermarks from the sets D^e and D^c to evaluate a pair of Pfa, Pfr values:
$$P_{fa} = \frac{N_{fa}}{|D^e|}, \qquad P_{fr} = \frac{N_{fr}}{|D^c|}$$

where |D| denotes the number of elements in D (cardinality). Since a single performance index can facilitate method comparison, one can also evaluate the weighted sum P_err = p1 P_fa + p2 P_fr. The constants p1, p2 should be selected so as to reflect the relative importance of Pfa, Pfr in a certain application scenario.
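A sketch of these counting estimates follows; the weights p1, p2 are user-supplied assumptions, not prescribed values.

```python
# A sketch of the hard-decision error rates: the detector returns
# True/False, so Pfa and Pfr are simple counting estimates, and the
# weighted sum gives a single comparison index.
def hard_decision_errors(D_correct, D_erroneous, p1=0.5, p2=0.5):
    N_fa = sum(1 for d in D_erroneous if d)      # detections that should not occur
    N_fr = sum(1 for d in D_correct if not d)    # missed watermarks
    P_fa = N_fa / len(D_erroneous)
    P_fr = N_fr / len(D_correct)
    return P_fa, P_fr, p1 * P_fa + p2 * P_fr
```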
3.3.2 Soft Decision Detectors

In the case of soft decision detectors, one can use the sets D^c and D^e (now containing real-valued numbers instead of binary values) to derive
the empirical probability distribution functions (histograms) of the detection test statistic for both hypotheses H0 and H1b (or H1a). By utilizing these empirical distributions, the probabilities of false alarm and false rejection as a function of the detection threshold T can be extracted. Let T1 and T2 be the minimum and the maximum values within D^c, D^e:

$$T_1 = \min\{D^c, D^e\}, \qquad T_2 = \max\{D^c, D^e\}$$

Then, for a (sufficiently large) set of discrete threshold values T_k in the interval [T1, T2], Pfa(T_k) and Pfr(T_k) can be calculated:

$$P_{fr}(T_k) = \frac{|D^c_k|}{|D^c|}, \quad D^c_k = \{x_i < T_k \mid x_i \in D^c\}, \qquad P_{fa}(T_k) = \frac{|D^e_k|}{|D^e|}, \quad D^e_k = \{x_i \geq T_k \mid x_i \in D^e\}$$

Using Pfa(T_k), Pfr(T_k) we can evaluate the Receiver Operating Characteristic (ROC), i.e., the plot of the probability of false alarm Pfa versus the probability of false rejection Pfr. The ROC curve (Figure 2) is the most complete way to describe the detection performance of a soft decision detector, since it provides an overall view of the algorithm performance in various operating conditions. Using the ROC curve, one can select the threshold value that gives the desired Pfa, Pfr pair. Having evaluated the ROC, one can also evaluate the following performance measures (Figure 2):
- Pfa for a fixed, user-defined Pfr.
- Pfr for a fixed, user-defined Pfa.
- Equal error rate (EER), i.e., the point on the ROC where Pfa = Pfr.
The advantage of these performance indices is that they are single-valued (scalar) and thus allow easy comparison between algorithms.
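The empirical ROC computation described above can be sketched as follows; the size of the threshold grid is an arbitrary choice.

```python
# A sketch of empirical ROC evaluation: sweep the threshold over the
# observed range of detector outputs and count errors on each side.
# The equal error rate is read off where the two curves cross.
import numpy as np

def empirical_roc(D_correct, D_erroneous, num_thresholds=1000):
    Dc, De = np.asarray(D_correct), np.asarray(D_erroneous)
    thresholds = np.linspace(min(Dc.min(), De.min()),
                             max(Dc.max(), De.max()), num_thresholds)
    P_fr = np.array([(Dc < T).mean() for T in thresholds])
    P_fa = np.array([(De >= T).mean() for T in thresholds])
    eer_index = np.argmin(np.abs(P_fa - P_fr))
    eer = (P_fa[eer_index] + P_fr[eer_index]) / 2
    return thresholds, P_fa, P_fr, eer
```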
Figure 2. Detection performance metrics.
Furthermore, they allow checking the appropriateness of an algorithm for a certain application scenario, through the comparison of the metric against a performance threshold (see Section 5). If, for example, a certain application requires a specific Pfa value, one can fix this value and compare two algorithms with respect to the corresponding Pfr values. Despite its simplicity, the ROC curve evaluation approach presented above has a major drawback; in order to obtain accurate estimates of Pfa(T) and Pfr(T), one has to conduct experiments involving an extremely large number of different keys. This is particularly true for the operating points of most interest for a watermarking algorithm, i.e., the threshold values that correspond to the tails of the empirical distributions, where, for a well-behaved algorithm, the error probabilities might be extremely low and thus very difficult to measure.
Figure 3. False alarm and false rejection probabilities.
A solution to this problem is to fit appropriate distribution models f^c(x) and f^e(x) on the experimental data D^c and D^e and proceed to the ROC evaluation using these models. In this case Pfa(T), Pfr(T) can be calculated as follows:

$$P_{fr}(T) = \int_{-\infty}^{T} f^c(x)\, dx, \qquad P_{fa}(T) = \int_{T}^{+\infty} f^e(x)\, dx$$
In other words, Pfr is given by the area of f^c(x) left of the threshold, whereas Pfa is the area of f^e(x) right of the threshold, as illustrated in Figure 3. The success of this approach depends on how accurately the theoretical pdfs model the experimental data. For correlation-based detection schemes and due to the central limit theorem, the experimental data can be sufficiently well approximated by Gaussian pdfs. Other embedding / detection approaches might also allow for theoretical modelling of the detector output distribution. In the context of an automated benchmarking system, where the embedding/detection procedures are not known (black box case), the
following approach can be used: apply goodness-of-fit tests (e.g. the Chi-square test or the Kolmogorov-Smirnov test) on the data within the sets D^c and D^e to check whether they come from a certain distribution among a pre-selected set of distribution models (using the same significance level for all tests). According to the test outputs, select the model that best fits the data or, if more than one model fits the data, select the one with the highest value of the test statistic. The problem of evaluating the false alarm probability has also been treated in (Miller and Bloom 1999).
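Under the Gaussian assumption discussed above, a model-based sketch might look like this; the Kolmogorov-Smirnov check and the choice of SciPy routines are illustrative, not prescribed by the chapter.

```python
# A sketch of the model-based alternative: fit Gaussian pdfs f^c and f^e
# to the detector outputs, check the fit, and compute the tail
# probabilities analytically so that very small error rates can be
# estimated without enormous numbers of trials.
from scipy import stats

def model_based_roc(D_correct, D_erroneous, thresholds):
    mu_c, sigma_c = stats.norm.fit(D_correct)
    mu_e, sigma_e = stats.norm.fit(D_erroneous)
    # Goodness of fit; a large p-value supports the Gaussian model.
    ks_c = stats.kstest(D_correct, 'norm', args=(mu_c, sigma_c))
    ks_e = stats.kstest(D_erroneous, 'norm', args=(mu_e, sigma_e))
    P_fr = stats.norm.cdf(thresholds, mu_c, sigma_c)   # area of f^c left of T
    P_fa = stats.norm.sf(thresholds, mu_e, sigma_e)    # area of f^e right of T
    return P_fa, P_fr, (ks_c.pvalue, ks_e.pvalue)
```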
3.4 Message Decoding Performance
If the watermarking method supports message encoding, its decoding performance can be characterized by the Bit Error Rate (BER), i.e., the mean fraction of erroneously decoded bits. Since message decoding is assumed to take place only in case of successful detection, there is a close relation between the decoding and the detection performance. As a consequence, a BER value should only be referenced along with the corresponding detection error probabilities, i.e., the probabilities of false alarm and false rejection. In certain applications, the message might consist of various parts, each conveying information of different type and importance. In such a case, the BER should be evaluated separately for each part of the message. Moreover, if the method incorporates error correction techniques to increase the probability of survival for the encoded message, the decoding performance should be evaluated on the error-corrected messages. In order to evaluate the decoding performance, a message Mi is embedded in every image in addition to the watermark, as already described in Section 3.3. Then, watermark detection is performed on all images of I^a. As a result of the detection procedure, two sets of decoder outputs B^c, B^e are extracted from the detection trials with the correct and the erroneous watermark respectively. In the case of hard decision detectors, a single BER value is evaluated by comparing the message
Mi that has been embedded in the image with the decoded message M̂i, for all messages in B^c, B^e (and not only the messages in B^c, as one might initially assume) that are associated with successfully detected watermarks (either correct or erroneous). In the case of soft decision detectors, the BER should be evaluated as a function of the detection threshold T (or, equivalently, as a function of Pfa or Pfr). This can be done by evaluating for each threshold T the mean number of erroneously decoded bits (BER) for all messages associated with watermarks (either correct or erroneous) that have resulted in a detector output greater than T, i.e., for all successfully detected watermarks. The BER for fixed Pfa or fixed Pfr can be used as a scalar performance index in this case.
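A sketch of this threshold-dependent BER computation follows; the representation of each trial as a (detector_output, embedded_bits, decoded_bits) triple is an assumption chosen here for illustration.

```python
# A sketch of BER as a function of the detection threshold: for each T,
# the BER is averaged only over messages whose watermark (correct or
# erroneous) was actually detected, i.e. whose detector output exceeds T.
import numpy as np

def ber_versus_threshold(trials, thresholds):
    curve = []
    for T in thresholds:
        errors, total = 0, 0
        for output, embedded, decoded in trials:
            if output > T:                       # successfully detected
                errors += sum(b1 != b2 for b1, b2 in zip(embedded, decoded))
                total += len(embedded)
        curve.append(errors / total if total else 0.0)
    return np.array(curve)
```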
3.5 Payload
Another aspect of the decoding performance of a watermarking algorithm is its payload, which can be defined as the maximum number of bits that can be encoded in a fixed amount of data and decoded with a pre-specified BER or, alternatively, as the amount of data required to host a fixed number of bits so that they can be decoded with a pre-specified BER. Essentially, the payload expresses the number of information bits that can be embedded per host image pixel. Payload evaluation can be performed by embedding messages of increasing length in a fixed amount of data, or messages of fixed length in a decreasing amount of data, until the BER reaches the specified limit. For systems employing hard decision detectors, one can use such a procedure to construct a plot of the message length versus the achieved BER, assuming a fixed size for the host data. Such a plot can be used either to find the message length that achieves a BER below a certain threshold (payload evaluation), or the BER that is achieved for a certain message length (BER evaluation for a fixed message length). For soft decision detectors, the BER is a function of the detection threshold T. As a consequence, the payload of the method should also be evaluated as a function of the threshold T (or equivalently as a function of Pfa or Pfr). A way of comparing two soft decision methods with respect to payload is to compare their payloads for a fixed Pfa.
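A sketch of the increasing-message-length payload search for a hard-decision system follows; measure_ber() is a hypothetical routine wrapping embedding, attack, detection and decoding.

```python
# A sketch of payload evaluation: messages of increasing length are
# embedded in a fixed host until the measured BER exceeds the limit;
# the longest length still meeting the limit is the payload.
def payload(host_image, measure_ber, ber_limit, max_bits=4096, step=32):
    best = 0
    for length in range(step, max_bits + 1, step):
        if measure_ber(host_image, length) <= ber_limit:
            best = length
        else:
            break
    return best
```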
Table 1. Performance metrics that can be evaluated for each class of watermarking algorithms.
                 | zero-bit                                  | multiple-bit
soft decision    | execution time; visual quality; ROC       | execution time; visual quality; ROC; BER vs Pfa or Pfr (fixed message); payload vs Pfa or Pfr (fixed BER)
hard decision    | execution time; visual quality; Pfa, Pfr  | execution time; visual quality; Pfa, Pfr; BER (fixed message); payload (fixed BER)
4 Result Summarization
Obviously, the necessity to deal with multiple performance indices makes watermarking performance characterization, method comparison and result presentation a complicated task. Table 1 summarizes the performance metrics that one can evaluate for each class of watermarking systems. Using the methodology described in the previous section, a distinct set of values for the performance metrics detailed in this table is derived for each watermarked image. The requirement for performance evaluation over various attacks and attack parameters (e.g. for various compression factors) adds one more complexity factor to the situation. Thus, if tests involving a number of attacks are conducted, a different set of values for these metrics (with the exception
Figure 4. Result summarization and presentation.
of visual quality, which refers to the watermarked image and not to the attacked watermarked images) is evaluated for each watermarked image distorted by a certain attack. To state the situation in a more formal way, the methodology presented in Section 3 leads to decoding, detection and complexity metrics for the single attack - single image - multiple keys (or messages) case, i.e., to metrics that refer to a single image from the set I, affected by a single attack. Thus, at the end of the benchmarking procedure, a considerably large amount of "raw" result data might be available. Obviously, to reach meaningful conclusions, all these raw results need to be summarized and interpreted (Figure 4). A reasonable way to deal with the situation is to derive multiple plots depicting the relation of the various performance factors. Such plots could depict detection or decoding performance versus perceptual quality (for a certain image and attack), detection performance versus attack strength (for a certain image and fixed visual quality of its watermarked instances), etc. However, this approach does not introduce sufficient information compaction; different plots for each image would have to be constructed, for example. Among the performance aspects described above, visual quality is the only one that can be directly controlled by modifying the watermark embedding strength. Thus, fixing the visual quality to values typical for the application under study and measuring the system performance (and proceeding to comparisons) with respect to the remaining parameters can be a way to partially deal with the multidimensionality problem.
Another way to deal with the overwhelming amount of information that a benchmark can generate is to average (summarize) the results with respect to the two sets of inputs that contribute to the number of generated results, namely the images in the image set and the attacks employed in the benchmark. Using the "raw" results one can proceed from the single attack - single image - multiple keys (or messages) case to the single attack - multiple images - multiple keys (or messages) case. Such a derivation is meaningful only if all images in I are watermarked using embedding strength values that lead to watermarked images having the same perceptual quality. For hard decision detectors, average Pfa, Pfr values for multiple images can be obtained using a weighted averaging function:

$$P_{fa} = \sum_{i=1}^{N_I} w_i P_{fa_i}, \qquad P_{fr} = \sum_{i=1}^{N_I} w_i P_{fr_i}$$
For the above formula to be valid, the same number of keys should be used for obtaining Pfa_i, Pfr_i for all images. Weights w_i that reflect the probability of occurrence of an image in a certain application scenario can be used. For soft decision detectors, one can generate an average ROC curve for the multiple images - multiple watermarks case by first averaging Pfa_i(T_k), Pfr_i(T_k) over all images for each threshold value T_k:
$$P_{fa}(T_k) = \sum_{i=1}^{N_I} w_i P_{fa_i}(T_k), \qquad P_{fr}(T_k) = \sum_{i=1}^{N_I} w_i P_{fr_i}(T_k)$$
The above formula is valid only if the same set of discrete threshold values has been used for each image. Furthermore, the number of keys used for obtaining Pfa_i(T_k), Pfr_i(T_k) should be equal for all images. Using Pfa(T_k), Pfr(T_k) one can proceed to evaluate the average ROC curve. Summarization of the decoding performance metrics (BER) and of the execution times can be done in an analogous way.
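A sketch of this weighted combination across images, assuming per-image curves sampled on a common threshold grid:

```python
# A sketch of multiple-image ROC summarization: per-image Pfa/Pfr curves
# are combined with weights reflecting how likely each image type is in
# the target application (weights are assumed to sum to one).
import numpy as np

def average_roc(per_image_Pfa, per_image_Pfr, weights):
    w = np.asarray(weights)[:, None]             # shape (N_images, 1)
    P_fa = (w * np.asarray(per_image_Pfa)).sum(axis=0)
    P_fr = (w * np.asarray(per_image_Pfr)).sum(axis=0)
    return P_fa, P_fr
```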
A similar approach can be used in order to average results over the set of attacks A. Results obtained from different attacks can be summarized at many different levels, using weights that reflect the probability of occurrence of a certain attack in the application scenario under study. One can choose, for example, to average all results obtained from images distorted by the same attack using different attack parameters. Averaging results obtained from images cropped at different percentages is such a case. Alternatively, one could average all results corresponding to attacks from a certain attack category, e.g. geometric attacks, filtering attacks, compression attacks, etc. At the most abstract level, one could average the results obtained from all imposed attacks to get an idea of the overall robustness of the algorithm. This way one can obtain, for example, a single ROC curve or a single Pfa, Pfr pair for a set of attacks, and thus judge the overall performance of the algorithm with respect to these attacks. The above procedure can be seen as a progressive information compaction scheme that leads from multiple watermarks - single image - single attack results to multiple watermarks - multiple images - single attack results and further to multiple watermarks - multiple images - multiple attacks results. In certain situations that involve attacks whose impact on the host image varies monotonically with respect to a parameter, it might be sufficient for the user to know only the most severe attack of this type that the algorithm can withstand. For a chosen performance measure and a certain attack, the "breakdown limit" of the algorithm for this attack when applied on a certain image can be evaluated. To do so, the attack severity is increased in appropriately selected steps (e.g. the JPEG quality factor is decreased in steps of 10) until the detector output no longer satisfies the chosen performance criterion. The last (strongest) attack for which the algorithm performance is above the selected threshold is the algorithm limit for the selected attack. The breakdown limits with respect to a certain attack, for all images (assuming the same visual quality for all watermarked images), can be
combined together to obtain an average breakdown limit. The combination procedure described previously can also be used in this case.
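A sketch of the breakdown-limit search for the JPEG example above; evaluate() and criterion_ok() are hypothetical stand-ins for the metric computation and the pass/fail test.

```python
# A sketch of the breakdown-limit search: attack severity grows in fixed
# steps (JPEG quality dropping by 10) until the chosen performance
# criterion fails; the last passing severity is the algorithm's limit.
def jpeg_breakdown_limit(evaluate, criterion_ok, qualities=range(90, 0, -10)):
    limit = None
    for q in qualities:
        if criterion_ok(evaluate(q)):
            limit = q                    # strongest attack survived so far
        else:
            break
    return limit
```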
5 Method Comparison and Conformance to a Set of Specifications
The methodology presented so far is sufficient for judging the performance of a single watermarking method. However, one is frequently confronted with the question "Is method A better than method B?" Such is the case, for example, when a watermarking technology provider needs to rank his method against other existing methods, or when a watermarking technology user tries to find the best method for her application. Moreover, in many instances one needs to know whether a certain method is suitable for a specific application, that is, whether the method conforms with a set of application-specific performance criteria expressed through a set of performance thresholds. For example, one might need to choose a method that is capable of performing detection within a certain time limit and can achieve a BER smaller than a given threshold on watermarked images cropped up to a specific extent. In both method comparison and method conformance to a list of requirements, obtaining a clear, binary answer is not trivial at all. Especially when one tries to compare one method against another, it is rather rare to reach a binary decision, as one method might be better in a certain performance criterion but perform worse in another. The major difficulty in this case stems from the fact that the performance metrics presented above are interrelated and cannot be considered independently. Thus, two methods cannot be compared with respect to their detection performance unless the comparisons are done for the same visual quality of the watermarked images. However, even if one fixes the visual quality, the fact that the detection performance is expressed either by a pair of Pfa, Pfr values (hard decision detectors) or by a plot of Pfa versus Pfr (soft decision
detectors) complicates the comparison procedure. For hard decision detectors, a comparison is possible if the two methods have either the same Pfa or the same Pfr. In this case, the methods can be ranked with respect to the second error probability. If this is not the case, the methods can be compared using a weighted average of the two error probabilities, as described in Section 3.3.1. For soft decision detectors, if it happens that the two ROC curves do not intersect each other, a global ranking can be achieved. If this is not the case, a comparison is still possible by fixing the one error probability and comparing the methods with respect to the other. If, in addition, the methods under study need to be compared with respect to their decoding performance, things become even more complex. As mentioned above, a BER value should only be referenced along with the corresponding detection error probabilities, and thus the question "which is the best method?", especially if the detectors are of the soft decision type, has no clear answer. Regardless of the comparison or conformance procedure that will be chosen, all tests in such a context should be conducted with the same experimental settings, that is, with the same set of images, keys, messages, attacks, etc. Furthermore, if the comparisons are to be carried out not on the basis of the "raw" results but on results that have been averaged to a certain degree using the approaches described above, the same set of summarization weights W should be used for both methods. All the above-mentioned parameters should be chosen after careful consideration, according to the target application of the algorithms under study. Additionally, when testing a method for conformance with the requirements of a certain application, thresholds that reflect these requirements in the best possible way should be chosen. An efficient approach for performing such tests is to construct for each application scenario a profile that lists all benchmark parameters relevant to this scenario.
6 Existing Benchmarking Platforms
6.1 Stirmark
Stirmark (http://www.cl.cam.ac.uk/fapp2/watermarking/stirmark/index.html) is the first benchmarking software that was developed (Petitcolas 2000, Petitcolas et al. 2001). A new version (4.0) is currently under development. The source code of the benchmark is publicly available, giving users the opportunity to add their own attacks to those provided by the benchmark (sharpening, JPEG compression, noise addition, filtering, scaling, cropping, shearing, rotation, column and line removal, flipping, and the 'Stirmark' attack, i.e., a combination of slight geometric and intensity distortions). Embedding and detection routines should be provided as benchmark-compatible Dynamic Link Libraries (DLL). The user controls the tests or attacks that will be performed by providing appropriate command files in text format called evaluation profiles. For the moment, one can conduct tests for measuring the influence of the embedding strength parameter on the PSNR of the watermarked image, tests for the evaluation of the embedding time, and tests for measuring the influence of attacks on the detection and decoding performance. For the latter, the user specifies for each attack the range of attack parameters that will be used. For each attack parameter within this range, the benchmark performs embedding and detection with a random key and message and reports the detection certainty or the bit error rate. Raw results are stored in a text file. In the future, the benchmark will include tests for measuring the false alarm probability, a feature that is not currently available. A web-based client-server architecture will also be implemented. The user will submit his algorithm and evaluation profile to the benchmarking server, which will run the tests and present the results through a webpage managed by an SQL server. A first version of this service is now available (http://stirmark.kaist.ac.kr). Support for audio and video watermarking algorithms is among the future plans
of the developers (currently the benchmark focuses on still images).
6.2 Checkmark
Checkmark version 1.2 (http://watermarking.unige.ch/Checkmark/) (Pereira et al. 2001) can be considered as a successor of the previous Stirmark version (i.e., version 3.1). Apart from the Stirmark attacks, Checkmark incorporates a number of new attacks that include wavelet compression (JPEG 2000), projective transformations, modelling of video distortions, warping, the copy attack, the template removal attack, denoising, non-linear line removal, the collage attack, down/up sampling, dithering and thresholding. Being an open-source Matlab application, Checkmark allows for the inclusion of new attacks. Furthermore, Checkmark implements new objective quality metrics, namely the weighted PSNR and the so-called Watson metric, and provides a number of "application templates", i.e., lists of attacks related to a certain application. In the future, application templates will support application-specific weighted averaging of results. Despite the major improvements, the basic operating principles of Checkmark are very similar to those of Stirmark 3.1: the user should provide a number of watermarked images and a "detection" executable with a user-defined detection rule. The attacks described in the selected application template are applied to every watermarked image and the detection routine is called.
6.3 Optimark
Optimark (http://poseidon.csd.auth.gr/optimark/) (Solachidis et al. 2001) is a powerful benchmarking platform that features a graphical user interface and incorporates the same attacks as Stirmark 3.1. Cascades of attacks are also possible. The next version of Optimark, which is currently under testing, will feature user-supplied attacks in DLL format. The user should supply an embedding and a detection/decoding executable. Optimark supports both hard and soft decision detectors. The user selects the test images, the range of keys
and messages that will be used, the attacks that will be performed, and the set of PSNR values for the watermarked images, along with the embedding strengths that the embedding software will operate on to achieve these PSNR values. An option for the automatic calculation of the embedding strength that leads to the selected watermarked image quality is also provided. Then, Optimark automatically launches multiple trials using the selected images, embedding strengths, attacks, keys and messages. Detection using both correct and erroneous keys (necessary for evaluating the probability of false alarm under the worst case scenario presented in Section 3.3) is performed. Message decoding performance is evaluated separately from watermark detection. Raw results are automatically processed by the benchmark in order to provide a number of performance metrics and plots. Depending on the characteristics of the algorithm, Optimark evaluates the following metrics:

1. For zero-bit algorithms employing hard decision detectors:
   - Pfa and Pfr.

2. For multiple-bit algorithms employing hard decision detectors:
   - Pfa and Pfr.
   - Bit Error Rate and percentage of perfectly decoded messages, for a given message length.
   - Payload for a given Bit Error Rate.

3. For zero-bit algorithms employing soft decision detectors:
   - ROC.
   - Equal Error Rate.
   - Pfa for a user-defined Pfr.
   - Pfr for a user-defined Pfa.
4. For multiple-bit algorithms employing soft decision detectors:
   - All the detection metrics for zero-bit algorithms employing soft decision detectors, described above.
   - Bit Error Rate and percentage of perfectly decoded messages as a function of the detection threshold (for a given message length).
   - Payload as a function of the detection threshold (for a given Bit Error Rate).
The following complexity metrics are also provided:

- Average embedding time.
- Average detection and decoding time in case the images under test host the watermark under investigation.
- Average detection and decoding time in case the images under test host a different watermark than the one under investigation.
An option to evaluate the algorithm breakdown limit for a certain attack and a certain performance criterion (e.g. equal error rate, or probability of false alarm for a fixed probability of false rejection) is also provided. Results can be summarized at multiple levels. The available options include:

- Average results over a set of images for a certain attack.
- Average results over a set of attacks for a certain image.
- Average results over a set of images and a set of attacks.
Result summarization is performed by applying a set of user-defined weights on the results obtained for the selected attacks and images. Result averaging over multiple images is possible only if these images are watermarked with embedding strength values that lead to watermarked images having the same perceptual quality. The parameters of a benchmarking session can be saved for future use. This feature facilitates the comparison of algorithms under the same conditions. A number of "default" benchmarking templates are also provided. The user can use these templates to benchmark her algorithm with a fixed set of images, image quality specifications (PSNR values) and attacks, and obtain a predefined set of performance metrics. It should be noted that Optimark evaluates the performance of algorithms employing soft decision detectors under the assumption that the probability distribution of the detector output can be approximated by a Gaussian distribution. Thus, if this assumption is not sufficiently good for the watermarking algorithm under test, the performance results reported by Optimark might deviate from the actual algorithm performance.
6.4 Certimark
R&D project Certimark (http://www.certimark.org) funded by the European Union developed a client-server, web-based benchmarking platform (CERTIMARK 2002). Its main characteristic is its open architecture that allows for easy integration of new functionalities. The benchmark includes a flexible interface to plug-in watermark embedding and detection software. The user uploads to the benchmarking server an embedding/detection DLL and an Extended Markup Language (XML) file that lists the parameters (key values, embedding strength, etc) required by the embedding and detection routines. It also features an extensive list of attacks and allows users to plug-in (upload) new attacks using a DLL-XML file pair. Uploading of test images to the benchmarking server is also possible. Setup and storage of the parameters for the embedding and detection modules and
the attacks is done through a web-based graphic interface that is dynamically formatted according to the user-provided XML parameter description files. The user can set up, store and execute a sequence of tests that can include multiple images, multiple embedding and detection instances with different parameters, and a set of attacks. The benchmark automatically executes trials involving all possible combinations of these elements. The results for each instance include the original, watermarked and attacked images, the decoded message and the detection result. The benchmark is currently managed by the Competence Center for Applied Security Technology, which will oversee its future improvements and extensions.
6.5 Comparison of the Benchmarking Platforms
Each of the four benchmarking platforms presented above has certain advantages and disadvantages. The fact that the Stirmark source code is publicly available makes it ideal for users who want to include their own attacks or tests. However, if one wants to use the benchmark as is, she might find that a number of critical features, i.e., automatic execution of tests with multiple keys and messages, tests for measuring the false alarms, and processing of the raw results to derive performance plots, are missing. The recent introduction of the web-based interface is another step towards the improvement of this benchmark. The situation is more or less similar for Checkmark. Its strong points include its source code availability and the fact that it implements a great number of advanced attacks and improved image quality metrics. Unfortunately, Checkmark inherits many of the drawbacks of Stirmark 3.1 (no option for automatic execution of multiple embedding sessions using different keys and messages, no evaluation of the false alarm probability, failure to address watermark detection and message decoding separately, no complexity or execution time evaluation).
The major advantages of the Certimark benchmark are the detailed and intuitive presentation of the attack/detection results, its web-based operation, and the ease of incorporating new attacks. Despite its significant features, some important functionalities are missing: a mechanism for automated execution of trials with multiple keys and messages, processing of the raw results in order to obtain statistical performance metrics, and visual quality metrics. Optimark is perhaps the most complete benchmarking suite presented so far. It provides the user with a wealth of performance measures (that include false alarm probabilities) and plots derived on the basis of multiple, automatically launched trials. The fact that the current version of the benchmark does not support user-defined attacks, and the use of a simple image quality metric (PSNR), are the two major drawbacks of this benchmark.
7 Conclusions
The basic benchmarking principles for robust image watermarking algorithms, along with a methodology for deriving the corresponding performance metrics, have been presented in this chapter. Despite the progress that has been achieved in the area of watermarking benchmarking, there are still a number of open theoretical and practical issues that have to be solved. Such an issue is how one can measure very small probability values, like those related to false negatives, false positives and BER, without the need to conduct prohibitively large numbers of trials. Research on these issues will hopefully lead to efficient benchmarking tools in the near future.
References

CERTIMARK (2002), Certification for Watermarking Techniques (EU project IST-1999-10987), http://www.certimark.org.
Cox, I., Miller, M., and Bloom, J. (2002), Digital Watermarking, Morgan Kaufmann Publishers.
ITU (2002), Methodology for the subjective assessment of the quality of television pictures, ITU-R Recommendation BT.500-11.

Katzenbeisser, S. and Petitcolas, F. (2000), Information hiding techniques for steganography and digital watermarking, Artech House.

Macq, B. (ed.) (1999), "Identification & protection of multimedia information," special issue, Proceedings of the IEEE, vol. 87.

Miller, M.L. and Bloom, J.A. (1999), "Computing the probability of false watermark detection," Proceedings of the Third International Workshop on Information Hiding, pp. 146-158.

Netravali, A. and Haskell, B. (1988), Digital pictures, representation and compression, Plenum Press.

Pereira, S., Voloshynovskiy, S., Madueno, M., Marchand-Maillet, S., and Pun, T. (2001), "Second generation benchmarking and application oriented evaluation," Information Hiding Workshop III, pp. 340-353.

Petitcolas, F.A.P. (2000), "Watermarking schemes evaluation," IEEE Signal Processing Magazine, vol. 17, no. 5, pp. 58-64.

Petitcolas, F., Steinebach, M., Raynal, F., Dittmann, J., Fontaine, C., and Fates, N. (2001), "A public automated web-based evaluation service for watermarking schemes: Stirmark benchmark,"
SPIE Electronic Imaging 2001, Security and Watermarking of Multimedia Contents, vol. 4314, pp. 575-584.
Solachidis, V., Tefas, A., Nikolaidis, N., Tsekeridou, S., Nikolaidis, A., and Pitas, I. (2001), "A benchmarking protocol for watermarking methods," Proc. of Int'l Conf. on Image Processing '01, pp. 1023-1026.

Voloshynovskiy, S., Pereira, S., Iquise, V., and Pun, T. (2001), "Attack modelling: Towards a second generation benchmark," Signal Processing, vol. 81, no. 6, pp. 1177-1214.
PART III
Advanced Watermarking Techniques
Chapter 12
Genetic Watermarking on Transform Domain
Hsiang-Cheh Huang, Jeng-Shyang Pan, and Feng-Hsing Wang
An innovative watermarking scheme based on genetic algorithms (GA) in the transform domain is proposed. It is robust against the watermarking attacks commonly employed in the literature, and the watermarked image quality is also considered. In this chapter, we employ GA for optimizing both of these fundamentally conflicting requirements. Watermarking with GA is easy to implement. We examine the effectiveness of our scheme by checking the cost function in GA, which includes factors related to both robustness and imperceptibility. Simulation results show the robustness under attacks, the effectiveness for progressive image transmission, and the improvement in watermarked image quality with GA.
1 Introduction
With the widespread use of the Internet and the development of the computer industry, digital media, including images, audio, and video, suffer from copyright infringement owing to their digital nature, which allows unlimited copying, easy modification and quick transfer over the Internet. The readers are suggested to refer to Chapter 1 of this book for the fundamentals of digital watermarking. In this chapter, we concentrate our research topic on image watermarking for copyright protection
in the transform domain with the aid of genetic algorithms. Digital watermarking for images is one way to embed secret information, or the watermark, into the original image itself to protect the ownership of the original sources (Katzenbeisser and Petitcolas 2000). Typical schemes are based on transform-domain techniques with the discrete cosine transform (DCT) (Hsu and Wu 1999), spatial-domain methods (Podilchuk and Zeng 1998), and VQ-domain schemes (Lu and Sun 2000), which embed the watermark into certain coefficients in their respective domains. The above-mentioned schemes embed the watermark into some of the selected coefficients in their corresponding domains, which might be fixed in a pre-determined set of coefficients. One major disadvantage of these typical schemes is that once the attackers dissolve the relationships between the original multimedia and the pre-determined set used for watermark embedding, the watermarking capability for copyright protection no longer exists. Another disadvantage of typical schemes is how to decide and choose the pre-determined set. For watermark embedding in the DCT domain, if we embed the watermark in the higher frequency bands, even though the watermarked image quality is good, it is vulnerable to the low-pass filtering (LPF) attack. Thus, embedding in the higher frequency band coefficients is not robust, although the watermarked image quality is assured. In contrast, if we embed the watermark into the coefficients in the lower frequency bands, it should be robust against common image processing attacks such as the LPF attack. However, embedding in the lower frequency bands causes the quality of the resulting watermarked image to degrade greatly compared with the original image. This comes from the fact that the energies of most natural images are concentrated in the lower frequency bands, and the human eyes are more sensitive to the noise caused by modifying the lower frequency coefficients. Therefore, aside from the two observations above, some researchers propose to embed the watermarks into the "middle-frequency bands" to serve as a trade-off for watermark
embedding in the transform domain (Hsu and Wu 1999). Therefore, from the observations and explanations above, we make use of the genetic algorithm (GA) (Goldberg 1989) to find the optimal frequency bands for watermark embedding in our DCT-based watermarking system, which can simultaneously improve security, robustness, and the quality of the watermarked image. The readers are suggested to refer to Chapter 3 and Chapter 4 for details about GA. Because the scheme operates in the transform domain, it contains three main parts: image transformation, watermark embedding, and watermark extraction.
2 The Algorithm
2.1 Embedding the Watermark
Let the input image be X with size (M x N). Our goal is to embed a robust watermark into the DCT of X and obtain a watermarked reconstruction X' after optimization. Assume that the binary-valued watermark to be embedded is W with size (Mw x Nw). A pseudo-random number traversing method (Proakis 1995) is applied to permute the watermark to disperse its spatial relationships; with a pre-determined key, key_p, in the pseudo-random number generating system, we have the permuted watermark Wp by
Wp = permute(W, key_p).    (1)
We use Wp for embedding the watermark bits into the DCT coefficients. Figure 1 illustrates the relationship between the watermark and the permuted one.
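A sketch of such a key-driven permutation follows; the seeded NumPy generator is an illustrative stand-in for the chapter's pseudo-random traversing method, and key_p is assumed here to be an integer seed.

```python
# A sketch of Eq. (1): the watermark bits are traversed in a
# pseudo-random order derived from the secret key key_p, dispersing
# spatially adjacent watermark bits over the image.
import numpy as np

def permute(W, key_p):
    rng = np.random.default_rng(key_p)
    order = rng.permutation(W.size)
    return W.flatten()[order].reshape(W.shape), order

def inverse_permute(Wp, order):
    flat = np.empty(Wp.size, dtype=Wp.dtype)
    flat[order] = Wp.flatten()          # undo the forward traversal
    return flat.reshape(Wp.shape)
```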
Figure 1. The watermark rose and the permuted one Wp obtained by using Eq. (1).
Before the embedding procedure, we need to transform the spatial-domain pixels into DCT-domain frequency bands. We perform the (8 x 8) block DCT on X first and get the coefficients in the frequency bands, Y:
Y = DCT(X),    (2)
and for one non-overlapping (8 x 8) block (m, n),

$$Y = \bigcup_{m=1}^{M/8} \bigcup_{n=1}^{N/8} Y_{(m,n)}, \qquad Y_{(m,n)} = \mathrm{DCT}\left(X_{(m,n)}\right). \qquad (3)$$
The relationships between the spatial-domain pixels and the DCT coefficients in block (m, n) are depicted in Figure 2. The DCT coefficients are zigzag ordered in every block.
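A sketch of the (8 x 8) block DCT and the zigzag indexing of Figure 2 follows, using SciPy's type-II DCT; generating the zigzag order programmatically is a convenience chosen here rather than anything prescribed by the chapter.

```python
# A sketch of one block's DCT in zigzag order: Y[0] is the DC
# coefficient and Y[1]..Y[63] are AC1..AC63.
import numpy as np
from scipy.fftpack import dct

def block_dct_zigzag(block):                      # block: 8x8 pixel array
    coeffs = dct(dct(block.astype(float).T, norm='ortho').T, norm='ortho')
    s = lambda ij: ij[0] + ij[1]                  # anti-diagonal index
    zigzag = sorted(np.ndindex(8, 8),
                    key=lambda ij: (s(ij), ij[0] if s(ij) % 2 else -ij[0]))
    return np.array([coeffs[i, j] for i, j in zigzag])   # Y(0)..Y(63)
```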
To embed the binary watermark into the original source, we need to adopt some relationships, or polarities, P, between Y and Wp. Let the 64 DCT coefficients in block (m, n) of X be represented by Y_(m,n)(i), i = 0, 1, ..., 63. To clearly depict the relationships between the DCT coefficients and the conventional terms, Y_(m,n)(0) denotes the DC coefficient, Y_(m,n)(1) denotes AC1, Y_(m,n)(2) denotes AC2, and so on. Following this manner, Y_(m,n)(63) denotes AC63, for all m, n, in Figure 2. The
Figure 2. The illustration of the spatial-domain pixels X(m, n) and the DCT coefficients Y(0), ..., Y(63) in zigzag order within an (8 x 8) block. The subscripts (m, n) are omitted for convenience. These denotations also apply to Eqs. (5), (6) and (7.b).
frequency set, F, will be chosen randomly to embed Wp. For each (8 x 8) image block, only 64 · (Mw · Nw)/(M · N) coefficients will be included in the frequency set F, which can be expressed by

$$F = \bigcup_{m=1}^{M/8} \bigcup_{n=1}^{N/8} \left\{ F_{(m,n)}(i) = Y_{(m,n)}(k) \right\} \qquad (4)$$

where k ∈ {1, 2, ..., 63} and i = 0, 1, ..., 64 · (Mw · Nw)/(M · N) − 1 index the randomly designated 64 · (Mw · Nw)/(M · N) bands in the frequency set, out of the 63 AC DCT coefficients (the DC coefficient is excluded) in each block. However, randomly selecting the bands might degrade the watermarked image quality and its robustness. Therefore, we employ GA to choose the DCT coefficients in F for Wp under certain attacks in every training iteration. The block diagram illustrating watermark embedding with GA is shown in Figure 3. Once the bands in the frequency set are selected in the training process, we designate the mapping between i and k in Eq. (4) as the secret key, key_f. Next, we generate a reference table, R = R(i), i ∈ F, from the DCT coefficients Y by using the ratios between the DC and AC coefficients, as denoted in Eq. (5).
We use the DC value of every block, e.g., Y_(m,n)(0) in Y_(m,n) of block (m, n), as a reference value, and produce the relationships among the DC value of one block, the current AC coefficients, and the reference table for further operation with Wp. Consequently, we can calculate the polarities of the DCT coefficients in the frequency set, as given in Eqs. (6) and (7).
Figure 3. The block diagram for watermark embedding with GA.
In Eq. (8), i ∈ F, and ⊕ denotes the exclusive-or (XOR) operation. Y remains unchanged if i ∉ F. Thus, the DCT coefficients after embedding the watermark become Y'.
After embedding with the polarities in every GA iteration, we perform the inverse DCT on Y' and get the watermarked image of the current iteration,

X_temp = inverse-DCT(Y').    (10)
Next, we apply the attacking schemes on X_temp and extract the watermarks from the attacked images X'_temp together with their normalized cross-correlation (NC) values. With the building blocks of GA, we need to assign the cost function in the l-th iteration, f_l, with

$$f_l = \mathrm{PSNR}_l + \sum_{h=1}^{p} \lambda_{h,l}\, \mathrm{NC}_{h,l} \qquad (11)$$
where p and λ_{h,l} are the number of attacking schemes and the weighting factors for the NC values, respectively. In Eq. (11), PSNR_l plays the role of the imperceptibility measure, while NC_{h,l} plays the role of the robustness measure. Because the PSNR values are dozens of times larger than the associated NC values in the GA cost function, we need to magnify the NC values with the weighting factors λ_{h,l} in the cost function to balance the influences of the imperceptibility and robustness requirements (Gen and Cheng 1997). After completing the number of iterations, we output both the watermarked image, X', and the secret key, key_f, of Eq. (4), and transmit them to the receiver over the Internet or wireless channels, as depicted in Figure 3. For transmitting the secret key, key_f, we can integrate cryptography with our watermarking scheme to protect the copyright in an open network such as the Internet (Piva et al. 2002).
After completing the whole procedure in each iteration shown in Figure 3, we feed back the watermarked image of the current iteration, X_temp, with the largest cost value for further training with the mutation, crossover, and selection procedures in GA (Pan et al. 2001, Shieh et al. 2003).
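A simplified GA sketch of this training loop is given below. For brevity it searches one shared set of AC bands instead of per-block sets; embed_in_bands(), extract_nc(), the attack callables and psnr() are hypothetical stand-ins for the chapter's routines, and the population size, mutation rate and λ value are arbitrary choices.

```python
# A sketch of GA band selection with the cost of Eq. (11):
# f = PSNR + sum of weighted NC values after the chosen attacks.
import random

def ga_select_bands(X, Wp, attacks, embed_in_bands, extract_nc, psnr,
                    bands_per_block=4, pop_size=20, iterations=200, lam=10.0):
    def random_individual():
        return tuple(sorted(random.sample(range(1, 64), bands_per_block)))

    def cost(bands):                       # imperceptibility + robustness
        Xw = embed_in_bands(X, Wp, bands)
        ncs = [extract_nc(attack(Xw), bands) for attack in attacks]
        return psnr(X, Xw) + lam * sum(ncs)

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(iterations):
        parents = sorted(population, key=cost, reverse=True)[: pop_size // 2]
        population = list(parents)                      # selection
        while len(population) < pop_size:
            a, b = random.sample(parents, 2)            # crossover
            pool = sorted(set(a) | set(b))
            child = tuple(sorted(random.sample(pool, bands_per_block)))
            if random.random() < 0.1:                   # mutation
                child = random_individual()
            population.append(child)
    return max(population, key=cost)
```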
2.2 Extracting the Watermark
In extracting the embedded watermark from the watermarked image, we calculate the DCT, Y', of the (possibly attacked) optimized watermarked image X' and use the reference table of Eq. (5) to extract the embedded bits, as given in Eq. (12.a).
By gathering the extracted bits in Eq. (12.a), we have the extracted, permuted watermark W'_p in Eq. (12.b).
Finally, we use key_p to acquire the extracted watermark W' from W'_p:

W' = inverse-permute(W'_p, key_p).    (13)

3 Simulation Results
In our simulation, we take the well-known test image, lena, with size (512 x 512), as the original source X, which is presented in Figure 4(a). The embedded watermark W is rose, with size (128 x 128), shown in Figure 4(b). Hence, 64 · (Mw · Nw)/(M · N) = 4 is the number of bits to be embedded in one block. Next, we take the frequency set in (Hsu and Wu 1999), or AC14, AC15, AC16,
AC27, to be the initial F. That is, F = {Y_(m,n)(14), Y_(m,n)(15), Y_(m,n)(16), Y_(m,n)(27)}, for all m, n, in our simulations. We perform two sets of simulations, namely, the conventional scheme and the progressive transmission scheme, in the next two sub-sections.
3.1 The Conventional Scheme
We apply three attacking methods, namely, low-pass filtering, median filtering, and JPEG compression with quality factor 80%, for simulating the conventional watermarking scheme; hence p = 3. Afterwards, the resulting PSNR of the watermarked image and the three NC values after attacking work together to evaluate the cost function. We perform simulations with several different values of λ_{h,l}, for all h, l, in Eq. (11).
Figure 4. The original image and the watermark for the simulations in this chapter. (a) lena, with size 512 x 512. (b) rose, with size 128 x 128.
The simulation results with our algorithm are depicted in Figures 5, 6, and 7, and tabulated in Tables 1 and 2. Figure 5 represents the watermarked image at the 200th iteration of GA, with a PSNR of 34.79 dB. Figures 6(a), (b), and (c) show the extracted watermarks for the 0th iteration with the initial F, and Figures 7(a), (b), and (c) demonstrate their corresponding counterparts at the 200th iteration. We can observe the improvement in both the watermarked image quality and the NC values after certain attacks with the aid of GA. Comparing the results in Tables 1 and 2, the selection of λ in Eq. (11) influences the watermarked image quality and the robustness at the final output.
Figure 5. The watermarked image after 200 iterations in GA training. PSNR = 34.79 dB.
From the data in Table 1 and Table 2, we find that both the PSNR and NC values increase every 50 iterations. Generally speaking, all the parameters in the cost function in Eq. (11) have a tendency to increase. We also represent the relationships among
Figure 6. The extracted watermarks and the NC values of the proposed algorithm under various attacking methods with λ = 10 for the 0th iteration. (a) LPF attack, NC1 = 0.5300. (b) MF attack, NC2 = 0.5762. (c) JPEG attack, NC3 = 1.0.
Figure 7. The extracted watermarks and the NC values of the proposed algorithm under various attacking methods with λ = 10 for the 200th iteration. (a) LPF attack, NC1 = 0.7426. (b) MF attack, NC2 = 0.7947. (c) JPEG attack, NC3 = 1.0.

Table 1. The PSNR and NC values under different GA iterations with the weighting factor λ = 10 in Eq. (11).
Table 2. The PSNR and NC values under different GA iterations with the weighting factor λ = 30 in Eq. (11).
the watermarked image quality and the extracted NC values versus the number of iterations in Figure 8. We can observe in Figure 8(a) that the PSNR curve is a non-decreasing function of the number of iterations. One interesting phenomenon is that the NC curves in Figure 8(b) and (c) are not non-decreasing, i.e., the NC values decrease in some ranges of iterations. This might come from the fact that GA chooses the best individuals in one iteration according to the largest cost value of the cost function. As observed from Eq. (11), the cost function is composed of four parameters, including the watermarked PSNR and the three NC values after intentional attacks. With the fundamental knowledge of GA, we are sure that the cost value is a non-decreasing function of the number of iterations. Hence, if we acquire a larger increase in the watermarked PSNR, a smaller loss in the NC values will still produce a larger or the same cost in the next iteration.
{
Figure 8. GA watermarking on lena. (a) Watermarked image quality (in PSNR) vs. number of iterations. (b) Extracted watermark under LPF attack (in NC) vs. number of iterations. (c) Extracted watermark under MF attack (in NC) vs. number of iterations.
Therefore, from a statistical point of view, we record the number of occurrences of the 63 embedded frequency bands over all the blocks of the test image; the resulting histogram for lena is shown in Figure 9. From the simulation data with λ = 10, we observe that Y(6), Y(9), Y(11), and Y(12) are the four bands that acquire the best cost value. A simple application of the best bands we acquire is to replace the random selection of embedded bands in (Hsu and Wu 1999) with the best bands found by GA training. In Table 3, the PSNR and NC values with the method in (Hsu and Wu 1999) are inferior to our results with the best bands after GA training. If there are limitations in practical implementations of our algorithm, we can directly use the frequency bands with the largest numbers of occurrences for embedding the watermark into every block of the test image; that is, we embed the watermark into the same frequency bands of every 8 × 8 block. Although the simulation results are not as good as those after GA training, they are better than the results with random selection of frequency bands.
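The band statistics themselves are easy to collect; the following sketch (a hypothetical helper, not from the chapter) counts how often each AC band is chosen across all blocks and picks the top four.

```python
from collections import Counter

def band_histogram(selected_bands_per_block):
    # selected_bands_per_block: one tuple of chosen AC-band indices
    # (1..63, zigzag order) per 8x8 block of the image.
    counts = Counter()
    for bands in selected_bands_per_block:
        counts.update(bands)
    return counts

# The top-four bands, e.g. AC6, AC9, AC11, AC12 for lena:
# best = [band for band, _ in band_histogram(blocks).most_common(4)]
```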
Figure 9. The best band vs. number of occurrences plot with λ = 10 for the whole watermarked image. We observe that AC6, AC9, AC11, and AC12 are the top-four best bands with the best cost values for embedding the test image lena.
Table 3. The PSNR and NC value comparisons between the best bands after GA training with λ = 10 and the bands in the literature (Hsu and Wu 1999).

Bands                                       PSNR (dB)   NC1 (LPF)   NC2 (MF)   NC3 (JPEG)
In literature: Y(14), Y(15), Y(16), Y(27)   30.19       0.5261      0.5759     1.0
Our results:   Y(6), Y(9), Y(11), Y(12)     32.52       0.6624      0.7134     1.0
Making comparisons for the test image lena with different weighting factors λ in Eq. (11), we choose λ = 1, 15, 25, and 200 and acquire the band-selection histograms, in addition to the one with λ = 10 in Figure 9. The distributions in those histograms are quite similar; moreover, the top-four band selections for embedding are all AC6, AC9, AC11, and AC12. This means that after a number of iterations in the GA training process, no matter what the value of λ is, the best bands for embedding the watermark into the same test image are identical. From another point of view, we can employ other test images in our simulation. We use the two test images pepper and baboon to obtain the best bands for embedding the watermark and compare them with the results for lena. Figures 10 and 11 illustrate the band-selection histograms for pepper and baboon with λ = 10, respectively. We observe that the top-four best bands for both pepper and baboon are AC9, AC10, AC11, and AC12; however, the distributions in the two histograms differ considerably. In addition, because baboon contains more high-frequency components than the other two test images, the occurrences of AC6, AC7, AC8, and AC17 are also comparable to the top-four components for baboon. Hence, the bands to be embedded differ from one original image to another.
By considering both the choices of the weighting factors among the test images and the simulation results presented in this section, we can observe that the best bands found by GA training conform with the heuristic concept of embedding the watermark in the "middle-frequency bands".
Figure 10. Band selection histogram for pepper, λ = 10. We observe that AC9, AC10, AC11, and AC12 are the top-four best bands with the best cost values for embedding the test image pepper.
3.2 The Progressive Transmission Scheme
People nowadays retrieve digital multimedia items, especially digital images, through the World Wide Web. Users might want to view part of an image while its transmission is still in progress. To meet both the users' interest in progressive transmission and the requirements of intellectual property protection, we discuss a progressive watermarking technique with the aid of GA based on the methods described in Sec. 2. Going beyond the conventional scheme, by extending the concepts in Sec. 3.1, we can employ the JPEG progressive transmission scheme in our transform-domain genetic-based watermarking system, with some modifications, by sending the watermarked DCT coefficients to the receiver. We use the JPEG
Figure 11. Band selection histogram for baboon, λ = 10. We observe that AC9, AC10, AC11, and AC12 are the top-four best bands with the best cost values for embedding the test image baboon.
spectral selection mode for progressive transmission in our simulation (Huang et al. 2003). As shown in Sec. 3.1, the extracted watermarks survive well under the JPEG attack. Thus, in this section, we include only the LPF and MF attacks in the GA training process. The cost function in Eq. (11) remains unchanged; the only differences are that the number of attacking schemes changes to p = 2, and we set the weighting factor λ = 40 in the simulations. Using the spectral selection mode of JPEG progressive coding, we group the DC coefficients of every block and send them to the receiver as stage 0. Next, we send the first AC coefficient, AC1, of every 8 × 8 block as stage 1. We continue in this manner until stage 63. The test image employed is again lena, with size 512 × 512.
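A minimal sketch of this spectral-selection staging: stage 0 carries the DC coefficient of every block and stage k carries the k-th AC coefficient, assuming the coefficients of each 8 × 8 block are stored in zigzag order. Function names are illustrative.

```python
import numpy as np

def spectral_stages(dct_blocks):
    # dct_blocks: (num_blocks, 64) array; each row holds one 8x8 block's
    # DCT coefficients in zigzag order, index 0 being the DC term.
    # Stage k gathers coefficient k from every block.
    return [dct_blocks[:, k].copy() for k in range(64)]

def reassemble(stages, num_received, num_blocks):
    # Rebuild the coefficient array from the stages received so far;
    # coefficients of stages not yet received remain zero.
    blocks = np.zeros((num_blocks, 64))
    for k in range(num_received):
        blocks[:, k] = stages[k]
    return blocks
```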
To compare with the existing results in (Chen and Chen 2000), the watermark, shown in Figure 4(b), has size 128 × 128 in our simulations; its capacity is much larger than the 8 × 8 watermark in (Chen and Chen 2000). Moreover, because our watermark represents a flower, its meaning can be easily recognized as the number of received stages increases. We keep the DC coefficient of the DCT unchanged and embed the watermark into the other 63 AC coefficients. After training for 200 generations in GA, we obtain the coefficients to be embedded in the secret key and output the watermarked image X′. The key can be securely transmitted to the receiver with the methodologies provided in (Piva et al. 2002). By applying the spectral selection mode of JPEG, we transmit the watermarked image progressively. After receiving the stages and performing the inverse DCT to obtain the watermarked image, the image is subjected to the LPF and MF attacks. Simulation results show increases in both the PSNR and NC values while receiving the progressively transmitted image. In Figure 12, with no attack applied, the extracted watermarks become more and more recognizable as the number of received stages increases. Moreover, when all 64 stages are received, the extracted watermark is identical to the embedded one because NC = 1. In Figure 13, the PSNR value of the watermarked image increases monotonically with our scheme, and the resulting watermarked image quality is 34.97 dB after receiving all stages. With the conventional scheme in (Hsu and Wu 1999), which embeds the watermark into the AC coefficients {Y(14), Y(15), Y(16), Y(27)}, the PSNR value drops at these stages during progressive reception; the resulting watermarked PSNR is 30.35 dB after receiving all stages. In Figure 14, the NC value increases monotonically with our scheme; with the scheme in (Hsu and Wu 1999), because the watermark is embedded into {Y(14), Y(15), Y(16), Y(27)}, the NC value jumps suddenly at these stages when no attack is applied.
Figure 12. The extracted watermarks after receiving different numbers of coefficients of the watermarked image. (a) Stage 2 (DC + 2 AC coefficients), NC = 0.5315. (b) Stage 4 (DC + 4 AC coefficients), NC = 0.5724. (c) Stage 7 (DC + 7 AC coefficients), NC = 0.6823. (d) Stage 12 (DC + 12 AC coefficients), NC = 0.8990. (e) Stage 20 (DC + 20 AC coefficients), NC = 0.9899. (f) Stage 63 (DC + 63 AC coefficients), NC = 1.0.
We also simulated the LPF and MF attacks for progressive watermarking. Figures 15 and 16 illustrate the NC comparisons between our scheme and the scheme in (Hsu and Wu 1999). As shown in both figures, the NC values of our scheme outperform those in the literature under both types of attack. In addition, Figures 17 and 18 depict the extracted watermarks obtained with our scheme under the LPF and MF attacks. In our experience, the extracted watermarks become recognizable once NC > 0.7; with our scheme, this happens after receiving stage 12. In contrast, the extracted watermarks with the scheme in (Hsu and Wu 1999) cannot survive the LPF and MF attacks, as they remain unrecognizable even after receiving all 64 stages.
Figure 13. The watermarked PSNR values increase with the increase of received stages.
Figure 14. The extracted NC values increase with the increase of received stages.
Figure 15. The NC comparisons of GA-training and the schemes in (Hsu and Wu 1999) under LPF attack.
Figure 16. The NC comparisons of GA-training and the schemes in (Hsu and Wu 1999) under MF attack.
Figure 17. The extracted watermarks after receiving different numbers of coefficients of the watermarked image after the LPF attack. (a) Stage 2 (DC + 2 AC coefficients), NC = 0.5485. (b) Stage 4 (DC + 4 AC coefficients), NC = 0.5743. (c) Stage 7 (DC + 7 AC coefficients), NC = 0.6188. (d) Stage 12 (DC + 12 AC coefficients), NC = 0.7059. (e) Stage 20 (DC + 20 AC coefficients), NC = 0.7190. (f) Stage 63 (DC + 63 AC coefficients), NC = 0.7191.
4 Summary
A genetic, robust algorithm for DCT-based watermarking has been presented in this chapter. It is robust because we make use of GA to train the frequency set for embedding the watermark. In addition to the robustness of the proposed algorithm, we also improve the watermarked image quality with the aid of GA. Furthermore, the proposed algorithm extends to progressive watermarking applications. Simulation results reveal that if we merely borrow the concepts of existing algorithms, both the watermarked image quality and the NC values of the extracted watermarks after attacks will be poor.
Figure 18. The extracted watermarks after receiving different numbers of coefficients of the watermarked image after the MF attack. (a) Stage 2 (DC + 2 AC coefficients), NC = 0.5299. (b) Stage 4 (DC + 4 AC coefficients), NC = 0.5591. (c) Stage 7 (DC + 7 AC coefficients), NC = 0.6290. (d) Stage 12 (DC + 12 AC coefficients), NC = 0.7495. (e) Stage 20 (DC + 20 AC coefficients), NC = 0.7720. (f) Stage 63 (DC + 63 AC coefficients), NC = 0.7724.
We also simulated the progressive watermarking scheme to show the effectiveness of GA-based watermarking. GA offers a systematic way to improve the cost function. With the simulation results under a variety of attacking techniques, we are able to claim the robustness and superiority of the proposed techniques over the existing algorithm. In comparison with the existing methods, embedding with our scheme achieves better watermarked image quality and higher NC values in the extracted watermarks.
References

Chen, T.P. and Chen, T. (2000), "Progressive image watermarking," IEEE Int'l Conf. on Multimedia and Expo, pp. 1025-1028.

Gen, M. and Cheng, R. (1997), Genetic Algorithms and Engineering Design. John Wiley & Sons, New York, NY.

Goldberg, D.E. (1989), Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading, MA.

Hsu, C.-T. and Wu, J.-L. (1999), "Hidden digital watermarks in images," IEEE Trans. Image Proc., pp. 58-68.

Huang, H.-C., Pan, J.-S., Shieh, C.-S., and Wang, F.-H. (2003), "Progressive watermarking techniques with genetic algorithms," 37th Carnahan Int'l Conf. on Security.

Katzenbeisser, S. and Petitcolas, F. (2000), Information Hiding: Techniques for Steganography and Digital Watermarking. Artech House, Norwood, MA.

Lu, Z.M. and Sun, S.H. (2000), "Digital image watermarking technique based on vector quantisation," Electron. Lett., pp. 303-305.

Pan, J.-S., Huang, H.-C., and Wang, F.-H. (2001), "Genetic watermarking techniques," Fifth Int'l Conf. on Knowledge-Based Intelligent Information Engineering Systems & Allied Technologies, pp. 1032-1036.

Piva, A., Bartolini, F., and Barni, M. (2002), "Managing copyright in open networks," IEEE Internet Computing, pp. 18-26.

Podilchuk, C.I. and Zeng, W.J. (1998), "Image-adaptive watermarking using visual models," IEEE Journal on Selected Areas in Commun., pp. 525-539.

Proakis, J.G. (1995), Digital Communications, 3rd ed. McGraw-Hill, New York, NY.

Shieh, C.-S., Huang, H.-C., Wang, F.-H., and Pan, J.-S. (2003), "Genetic watermarking based on transform domain techniques," Pattern Recognition (accepted).
Chapter 13
Genetic Watermarking on Spatial Domain
Feng-Hsing Wang, Lakhmi C. Jain, and Jeng-Shyang Pan
A spatial domain based watermarking scheme combined with genetic algorithms (GA) is introduced in this chapter. With genetic training against the JPEG attack, both the quality of the watermarked image and the robustness under the JPEG attack can be improved.
1 Introduction
Our focus in this chapter is to employ an optimization technique in the spatial-based watermarking scheme to obtain better performance. Among optimization techniques, the genetic algorithm (Holland 1975, Goldberg 1992) is a popular and well-known one, which has been applied successfully to many subjects and has proved its performance. Hence, in this chapter, we introduce the genetic algorithm (GA) into the spatial domain based watermarking system and show the reader how the performance can be improved. With the employment of the genetic algorithm, we expect the spatial-based watermarking scheme to perform better, not only in watermarked image quality but also in robustness. In this chapter, we begin by illustrating a simple spatial-based watermarking scheme, then introduce the use of the genetic algorithm in the watermarking system. Some examples
and experimental results will be demonstrated, and we end with the discussion and conclusions section.
2 Watermarking Scheme without GA
To help the reader understand how to employ the genetic algorithm in a spatial-based watermarking system, we first describe a spatial-based watermarking system (Pan et al. 2001).
2.1 Embedding Procedure
In a spatial-based watermarking scheme, usually the first step is to pick up some pixels from the host image by referring to a user-selected seed. With the seed, a random number generator generates two coordinate streams X and Y:
X = {x_i | x_i ∈ {0, ..., M − 1}; 0 ≤ i < L_w},  (1)

Y = {y_i | y_i ∈ {0, ..., N − 1}; 0 ≤ i < L_w},  (2)
where M and N are the width and height of the host image, and L_w is the length of the watermark bit sequence. X and Y are treated as coordinates, and the pixels P at positions (X, Y) of the host image are used for embedding the watermark bits W within. The definitions of P and W are:

P = {p_i(x_i, y_i) | 0 ≤ i < L_w},  (3)

W = {w_i | w_i ∈ {0, 1}; 0 ≤ i < L_w}.  (4)
To embed the signal of W into P, the simplest method is to add W to P directly. We do not suggest this, however, because the watermarked signal is then easily affected by noise, which means the robustness is poor. Hence, we take the surrounding pixels of each pixel in P into account and introduce the method below. Let p_i(x_i, y_i) be the i-th pixel of P, and w_i be the watermark bit to be embedded within p_i(x_i, y_i). We calculate the mean value μ_i of the surrounding pixels of p_i(x_i, y_i) by:

μ_i = (1 / |S_i|) Σ_{(j,k) ∈ S_i} q(j, k),  (5)

where q(j, k) is a surrounding pixel of p_i(x_i, y_i) and S_i is the set of surrounding pixels. The definition of the surrounding pixels of a picked-up pixel is illustrated in Figure 1.
Figure 1. Definition of the surrounding pixels of the i-th picked-up pixel p_i(x_i, y_i).

Then, with a user-selected δ, we can embed w_i into p_i(x_i, y_i) and obtain the watermarked pixel p′_i(x_i, y_i) by:

p′_i(x_i, y_i) = μ_i + δ, if w_i = 1;
p′_i(x_i, y_i) = μ_i − δ, if w_i = 0.  (6)
Here we can choose to have better quality or better robustness by controlling the value of δ: a smaller δ means the quality of the watermarked image will be better, but the robustness will be weaker. After all the watermark bits have been embedded, we simply put all the watermarked pixels back to their original positions in the host image. Finally, a watermarked image is generated.
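A compact sketch of the embedding procedure, under the assumption that the "surrounding pixels" of Figure 1 are the eight immediate neighbours (the chapter's exact window may differ); position collisions and border handling are ignored for brevity.

```python
import numpy as np

def embed(host, watermark_bits, seed, delta):
    # Spatial embedding sketch: each selected pixel is replaced by the mean
    # of its eight neighbours plus delta (bit 1) or minus delta (bit 0),
    # following Eqs. (5) and (6). Border pixels are avoided for simplicity.
    rng = np.random.default_rng(seed)
    out = host.astype(float).copy()
    h, w = host.shape
    xs = rng.integers(1, h - 1, size=len(watermark_bits))
    ys = rng.integers(1, w - 1, size=len(watermark_bits))
    for x, y, bit in zip(xs, ys, watermark_bits):
        window = host[x - 1:x + 2, y - 1:y + 2].astype(float)
        mu = (window.sum() - float(host[x, y])) / 8.0
        out[x, y] = mu + delta if bit == 1 else mu - delta
    return np.clip(out, 0, 255).astype(np.uint8)
```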
2.2 Extraction Procedure
To extract the hidden watermark signal from the watermarked image, the original image is not required in this watermarking scheme. As in the embedding procedure, the first step of the extraction procedure is to pick up some pixels from the received image, which can be done by referring to the same seed used in the embedding procedure. With the picked-up pixels, we can calculate the mean value of the surrounding pixels for each picked-up pixel using Equation (5). Then, Equation (7) is used to determine whether the extracted watermark bit w′_i is bit 0 or bit 1:
w′_i = 0, if p′_i(x_i, y_i) < μ_i;
w′_i = 1, otherwise.  (7)
We repeat the procedure until all the picked-up pixels have been processed. Here we denote by W′ the extracted watermark bits, W′ = {w′_i | w′_i ∈ {0, 1}; 0 ≤ i < L_w}.
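Extraction mirrors embedding; a sketch under the same eight-neighbour assumption:

```python
import numpy as np

def extract(received, num_bits, seed):
    # Blind extraction (Eq. (7)): regenerate the same positions from the
    # seed, recompute each neighbourhood mean, and threshold the pixel.
    rng = np.random.default_rng(seed)
    h, w = received.shape
    xs = rng.integers(1, h - 1, size=num_bits)
    ys = rng.integers(1, w - 1, size=num_bits)
    bits = []
    for x, y in zip(xs, ys):
        window = received[x - 1:x + 2, y - 1:y + 2].astype(float)
        mu = (window.sum() - float(received[x, y])) / 8.0
        bits.append(0 if received[x, y] < mu else 1)
    return bits
```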
2.3 Performance

To demonstrate the performance of this scheme, we used the famous test images LENA and PEPPER (both gray-valued, 512 × 512 pixels) as the test images, and ROSE (128 × 128 pixels, binary-valued, see Figure 2) as the watermark.
Figure 2. The watermark used in experiments.
We applied the peak signal-to-noise ratio (PSNR) to evaluate the quality of the watermarked image, and the bit correct ratio (BCR) to evaluate the robustness. The definitions of PSNR and BCR are:
PSNR(O, O′) = 10 × log₁₀(255² / MSE(O, O′)),  (8)

BCR(W, W′) = (1 − (1 / L_w) Σ_{i=1}^{L_w} |w_i − w′_i|) × 100%,  (10)

where MSE is the mean square error, and O = {p(i, j) | 0 ≤ i < M, 0 ≤ j < N} and O′ = {p′(i, j) | 0 ≤ i < M, 0 ≤ j < N} are the original image and the watermarked image, respectively.
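A direct transcription of these two measures, using 255 as the peak value for 8-bit images:

```python
import numpy as np

def psnr(o, o_prime):
    # Eq. (8), with 255 as the peak value of an 8-bit image.
    mse = np.mean((o.astype(float) - o_prime.astype(float)) ** 2)
    return 10 * np.log10(255.0 ** 2 / mse)

def bcr(w, w_prime):
    # Eq. (10): percentage of watermark bits extracted correctly.
    w = np.asarray(w, dtype=float)
    w_prime = np.asarray(w_prime, dtype=float)
    return (1.0 - np.mean(np.abs(w - w_prime))) * 100.0
```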
To test the robustness, we employed JPEG compression with different quality factors (Q) to attack the watermarked images. The reason for applying JPEG compression as the attacking function is the popularity of transmitting JPEG images over the Internet. If we can extract a watermark with a higher BCR value from the attacked image, the robustness under that attacking function is better; otherwise, the system has poorer resistance to the attacking function. Figure 3 displays the extracted watermarks and Table 1 lists the BCR values when using LENA as the test image. Figure 4 and Table 2 show the extracted results when using PEPPER as the test image.
Figure 3. The extracted watermarks under the JPEG attacks using LENA as the test image and δ = 15: (a) Q = 70%, (b) Q = 80%, and (c) Q = 90%.
Figure 4. The extracted watermarks under the JPEG attacks using PEPPER as the test image and δ = 15: (a) Q = 70%, (b) Q = 80%, and (c) Q = 90%.
Table 1. PSNR values and BCR values with different δs under JPEG attacks using LENA as the test image.

Table 2. PSNR values and BCR values with different δs under JPEG attacks using PEPPER as the test image.
3 Watermarking Scheme with GA
In this section, we introduce how to employ genetic algorithms in the spatial-based watermarking scheme described above. We define an objective function and use GA to optimize it. The GA-trained result is then treated as a secret key and used in the embedding and extraction procedures. Here we assume the reader has a basic background in GA. (More details about GA can be obtained in Chapter 3.)
3.1 GA Training Procedure

3.1.1 General Steps
The steps for applying GA to the watermarking system can be summarized as below, and the block diagram is shown in Figure 5. The details of each step are given later.
Figure 5. Block diagram of the GA training procedure.
Step 1: Generate d keys randomly as the initial GA individuals.
Step 2: Execute the embedding procedure with the keys one by one. (d watermarked images will be generated after this step.)
Step 3: Calculate the PSNR value for each watermarked image.
Step 4: Apply the attack function to the watermarked images one by one. (d attacked images will be generated.)
Step 5: Use the attacked images as the input images for the extraction procedure and extract the watermarks. (d watermarks will be extracted.)
Step 6: Calculate the BCR value for each individual.
Step 7: Calculate the fitness score for each individual.
Step 8: Select the individual with the best fitness score and back up the information about this individual.
Step 9: Stop the training procedure if the GA iteration t = t_final.
Step 10: Execute the crossover procedure to generate the new individuals for the next generation.
Step 11: Execute the mutation procedure on each individual.
Step 12: Set t = t + 1 and repeat Step 2 to Step 12.
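The following sketch ties Steps 1-12 together. It assumes position-based variants embed_at and extract_at of the routines sketched earlier (hypothetical helpers), and the weighted fitness form discussed in Section 3.1.3; the selection, one-point crossover, and mutation here are deliberately simplistic.

```python
import numpy as np

def ga_train(host, watermark, attack, d=10, t_final=100, lam=1.0, delta=15):
    # An individual is an array of embedding positions.
    rng = np.random.default_rng(0)
    h, w = host.shape
    n = watermark.size
    pop = [np.column_stack([rng.integers(1, h - 1, n),
                            rng.integers(1, w - 1, n)]) for _ in range(d)]
    best, best_fit = None, -np.inf
    for _ in range(t_final):
        fits = []
        for ind in pop:                      # Steps 2-7: score each individual
            marked = embed_at(host, watermark, ind, delta)
            w_prime = extract_at(attack(marked), ind)
            fits.append(psnr(host, marked) + lam * bcr(watermark, w_prime))
        i_best = int(np.argmax(fits))        # Step 8: keep the best key so far
        if fits[i_best] > best_fit:
            best, best_fit = pop[i_best].copy(), fits[i_best]
        probs = np.array(fits) - min(fits) + 1e-9
        probs /= probs.sum()                 # fitness-proportional selection
        parents = [pop[i] for i in rng.choice(d, size=d, p=probs)]
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):   # Step 10: crossover
            cut = int(rng.integers(1, n))
            children.append(np.vstack([a[:cut], b[cut:]]))
            children.append(np.vstack([b[:cut], a[cut:]]))
        for c in children:                   # Step 11: mutate one position
            j = int(rng.integers(n))
            c[j] = (rng.integers(1, h - 1), rng.integers(1, w - 1))
        pop = children
    return best, best_fit
```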
3.1.2 Preprocessing

Before executing Step 1, we suggest decomposing the host image into several non-overlapping blocks. The first reason is to avoid embedding too many watermark bits in one location of the host image, which gives some image-cropping attacks a higher probability of destroying more of the watermark signal. For example, if some of the selected pixels are gathered too closely, as in Figure 6(a), the robustness under image-cropping attacks becomes weaker, as shown in Figure 6(c). If the selected pixels are spread over the whole image, as in Figure 6(b), the effect of image-cropping attacks becomes more stable, as shown in Figure 6(d). The second reason for dividing the host image into blocks is to shorten the GA training time: a bigger block usually means more search time is needed to obtain a better pixel set. After decomposing the host image into b blocks, we select n pixels from each block and embed n watermark bits within the selected pixels. For convenience, we set b × n = L_w.
Figure 6. The effect of the distribution of the selected pixels under the image-cropping attack. The dots indicate the selected pixels. (a) Watermarked image 1, (b) watermarked image 2, (c) cropped result of (a), and (d) cropped result of (b).
3.1.3 Details of the GA Training Steps

In Step 1, instead of using a seed to generate the X-Y data streams, we generate them directly. The generated data are treated as the content of one GA individual (or, if you prefer, a chromosome). For the GA training, d individuals have to be generated in this step. We define D as the individual set and ind_i as the i-th individual of D:
D = {ind_i | 0 ≤ i < d},  (11)

ind_i = {p_{i,j} | 0 ≤ j < L_w},  (12)
where p_{i,j} is the j-th selected pixel of ind_i from the host image O. In Step 2, with the generated individuals, we embed W into O by referring to ind_i and generate a watermarked image O′_i. Repeating the procedure, we finally obtain d watermarked images {O′_0, O′_1, ..., O′_{d−1}}. Then, we calculate the PSNR value using Equation (8) for each individual; the d PSNR values will be used in evaluating the fitness score. In Step 4, we apply the attack function to the d watermarked images one by one. After that, we obtain d attacked images O″ = {O″_i | 0 ≤ i < d}, where O″_i is the result of applying the attack function to the watermarked image O′_i. In Step 5, the attacked images are used as the input images of the extraction procedure; hence, d watermarks are extracted. We calculate the BCR value for each extracted watermark using Equation (10). In Step 7, we calculate the fitness score by taking the PSNR value and the BCR value of each individual into account. For the i-th individual, the fitness score f_i in this case is defined as:
f_i = PSNR(O, O′_i) + λ × BCR(W, W″_i),  (13)

where W″_i is the watermark extracted from the attacked image O″_i, and λ is a parameter for controlling the balance between quality and robustness. If the quality of the watermarked image is the main concern, the value of λ can be set smaller, so that the PSNR value has a stronger effect on the training results. Otherwise, we can increase the value of λ to give the BCR value a stronger effect. Then, in Step 8, we select the individual with the best fitness score and back up its content. If the GA iteration t has reached the desired number t_final, we stop the training procedure and treat the best
individual of the current iteration as the trained result; otherwise, we continue the training procedure by executing the crossover procedure and the mutation procedure. In the crossover procedure, an individual with a better fitness score has a higher probability of being assigned as a parent to generate the new individuals for the next generation. Readers interested in the details of the crossover and mutation procedures can find more information in Chapter 3. Finally, the best individual is treated as the key for the embedding procedure and the extraction procedure.
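The fitness-proportional parent assignment mentioned above is commonly realized as roulette-wheel selection; a minimal sketch:

```python
import numpy as np

def roulette_select(population, fitness_scores, rng):
    # Fitness-proportional ("roulette wheel") selection: a better fitness
    # score gives a proportionally higher chance of becoming a parent.
    scores = np.asarray(fitness_scores, dtype=float)
    probs = scores - scores.min() + 1e-9   # shift so all weights are positive
    probs = probs / probs.sum()
    return population[int(rng.choice(len(population), p=probs))]
```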
3.2 Performance
As in the experiments in Section 2, we used LENA and PEPPER as the test images and ROSE as the watermark. The sizes of the gray test images and the binary watermark are 512 × 512 pixels and 128 × 128 pixels, respectively. We again applied JPEG compression with different quality factors (Q) as the attacking function. The other settings were d = 10, t_final = 100, and λ = 100. Table 3 and Figure 7 show the extracted watermarks, PSNR values, and BCR values using LENA as the host image and δ = 15. Table 4 and Figure 8 display the results using PEPPER as the host image.

Table 3. PSNR values and BCR values with different δs under JPEG attacks using LENA as the test image after 100-iteration training.

δ    PSNR (dB)   BCR (%), Q=70%   BCR (%), Q=80%   BCR (%), Q=90%
5    51.90       78.91            80.08            88.49
10   46.81       88.43            91.70            98.21
15   42.96       93.95            96.41            99.73
Table 4. PSNR values and BCR values with different δs under JPEG attacks using PEPPER as the test image after 100-iteration training. (The PSNR values are 51.66 dB, 47.09 dB, and 43.44 dB for δ = 5, 10, and 15, respectively.)
Figure 7. Extracted watermarks under the JPEG attacks using LENA as the test image with the GA-trained key and δ = 15: (a) Q = 70%, (b) Q = 80%, and (c) Q = 90%.
Figure 8. Extracted watermarks under the JPEG attacks using PEPPER as the test image with the GA-trained key and δ = 15: (a) Q = 70%, (b) Q = 80%, and (c) Q = 90%.
3.3 Discussion about More Attacking Functions
In the GA training procedure described so far, we employ only one attacking function, which means the watermarked images have better robustness under that particular attacking function. To extend the robustness to more attacking functions, we can modify the attacking procedure to take more attacking functions into account, as shown in Figure 9. In Figure 9, after extracting a watermark from each attacked image, we calculate the BCR value for each extracted watermark. We then calculate the average BCR value, bcr, and use it to compute the fitness score for the current individual.
Figure 9. Taking more attacking functions into the GA training procedure.
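A sketch of the extended fitness evaluation of Figure 9, reusing the psnr, bcr, and extract_at helpers sketched earlier (names are illustrative):

```python
import numpy as np

def fitness_multi_attack(host, marked, watermark, positions, attacks, lam):
    # Average the BCR over all attacking functions (bcr-bar in the text)
    # before combining it with the PSNR term.
    bcrs = [bcr(watermark, extract_at(attack(marked), positions))
            for attack in attacks]
    return psnr(host, marked) + lam * float(np.mean(bcrs))
```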
As to whether we can employ every kind of attacking function in the GA training procedure together, the answer is obviously no. Interested readers may consider what would happen when employing a low-pass filter as the first attacking function and a high-pass filter as the second. Furthermore, what would happen if only random noise were employed as the attacking function? We leave these questions to the reader and encourage the reader to think them through.
4 Discussion and Conclusions
Usually, spatial-based watermarking schemes have poorer robustness under common image processing procedures or file-format transformations (i.e., saving the image as a JPEG file from a BMP file) than frequency-based or VQ-based schemes. By introducing the genetic algorithm into the spatial-based watermarking system, we obtain better performance not only in quality but also in robustness. Comparing Table 1 with Table 3, and Table 2 with Table 4, the results support the effectiveness of GA. Furthermore, although the system with GA needs more time to generate a secret key for embedding and extraction, this work can be done offline. In other words, if a system does not require online training, GA can be employed in the watermarking system to obtain better results. Moreover, the concept of using an optimization technique for better performance can also be applied to other watermarking systems based on the frequency domain or the VQ domain; more details about these schemes can be found in Chapter 6 and Chapter 7. Interested readers can also use other optimization techniques, such as tabu search, to shorten the training time or even to obtain better trained results. One problem associated with the watermarking scheme in this chapter is the key-delivery problem. Currently, this problem exists in most watermarking systems, not only in the scheme described in this chapter. Some researchers have proposed methods for solving it, such as registering the keys with certain organizations; readers who are interested can find articles to study, as this is not the focus here. Another problem with the scheme in this chapter is the key-reuse problem. Basically, the generated key is associated with the watermarked image.
Each key can only be used for embedding or extracting the watermark from the same watermarked image. The key cannot be reused to embed the watermark into another host image, unless the encoder does not care about the quality of the output results. In conclusion, a method for employing the genetic algorithm in the spatial-based watermarking scheme has been presented in this chapter, which provides better performance than the scheme without GA training. We define an objective function and use GA to optimize it. The trained result is used as a secret key in the embedding and extraction procedures. In addition, the objective function provides a way to control the embedding quality or the extraction quality by changing the value of λ. From the simulation results and the comparison with the existing method, it is clear that GA can be successfully employed in the watermarking scheme to obtain better performance.
References

Holland, J. (1975), Adaptation in Natural and Artificial Systems. The University of Michigan Press.

Goldberg, D.E. (1992), Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading, MA.

Katzenbeisser, S. and Petitcolas, F. (2000), Information Hiding: Techniques for Steganography and Digital Watermarking. Artech House, Norwood, MA.

Pan, J.-S., Huang, H.-C., and Wang, F.-H. (2001), "Genetic watermarking techniques," Proceedings of the Fifth International Conference on Knowledge-Based Intelligent Information Engineering Systems & Allied Technologies, pp. 1032-1036.
Chapter 14
Robust Image Watermarking Systems Using Neural Networks
Chin-Chen Chang and Iuon-Chang Lin

Digital image watermarking technology has provided good solutions for protecting intellectual property rights. Several influential watermarking schemes using neural networks have been proposed to provide stronger and more robust protection from attacks. Neural networks perform well on pattern mapping and classification. By using the characteristics of neural networks, these watermarking schemes can automatically select system parameters and recover the embedded watermark. In this chapter, we focus on digital image watermarking technology using neural networks. We also propose a novel wavelet-based watermarking scheme using neural networks, since the discrete wavelet transform has been widely adopted in compression standards such as JPEG 2000 and MPEG-4. According to our experiments, our proposed scheme provides excellent results in unobtrusiveness and robustness.
1 Introduction
In this chapter, we focus on digital watermarking problems for image data. Several watermarking methods for digital images have been proposed in recent years (Caronni 1995, Hsu and Wu 1999, Langelaar et al. 1997, Su et al. 1999). These methods can be classified into two types: embedding the watermark into the spatial domain, and
embedding the watermark into the frequency domain. The first type provides good computing performance but usually degraded robustness, while the second is more robust, especially when the watermarked image is compressed with JPEG compression methods (Charier et al. 1999, Haskell et al. 1998). These methods can also be classified based on whether the original image is required during the watermark extraction phase. Since a large image database would be required to store the original images, most watermarking research assumes the original image is not required. Recently, some researchers have utilized neural networks to design robust watermarking systems (Hwang et al. 2000, Yu et al. 2001). The advantages of neural networks include relatively easy algorithmic specification, and good pattern mapping and classification (Roth 1990). Given a network architecture, a set of training input data, and the expected output, the network can learn from the training set and then be used to classify or predict unseen data. Neural networks have been used efficiently to provide solutions for applications such as pattern recognition, classification, signal processing, and image analysis. They have also been successfully applied to digital watermarking technology. In these methods (Hwang et al. 2000, Yu et al. 2001), neural network techniques are applied to analyze the characteristics of digital images. The results are very successful because the constructed watermarking systems are perceptually invisible and robust to a number of different attacks. In this chapter, we propose a novel watermarking scheme for image data using neural networks. Different from previous schemes (Hwang et al. 2000, Yu et al. 2001), our scheme is based on the discrete wavelet transform (DWT) (Mallat 1989). DWT provides a hierarchical multiresolution: three parts of multiresolution representation (MRR) and one part of multiresolution approximation (MRA). For example, a level-n wavelet decomposition results in a pyramid structure as shown in Figure 1 (Shapiro 1993). The subbands labeled LH1, HL1, and HH1 of the MRR represent the high frequency information such as edges and textures of an image.
Figure 1. Illustration of the decomposition of an image signal into three resolution levels through DWT.
Generally, the human visual system is not sensitive to changes in such subbands. The subband LL1 of the MRA represents the low frequency information, which contains the important data. In the subsequent resolution levels, the subband LL1 is decomposed again into further hierarchical subbands. The process can be repeated until the final resolution level is reached. Furthermore, wavelet transformation has been widely used in several image and video compression standards such as JPEG
2000 and MPEG-4. To reduce the communication load, most multimedia data are compressed before transmission; moreover, multimedia data are usually stored in compressed formats. Owing to the characteristics of DWT, most watermarking schemes attempt to embed the watermark into the high frequency subbands. However, a stronger image watermarking scheme must also be robust to JPEG compression. Several wavelet-based image watermarking schemes (Barni et al. 2001, Hsieh et al. 2001, Wang et al. 2002) have been proposed to reduce the susceptibility to JPEG compression, but not to all attacks. The goal of our proposed method is to apply neural networks to enhance the robustness to different types of attacks. The rest of this chapter is organized as follows: Section 2 provides a brief introduction to neural networks; Section 3 briefly reviews some influential watermarking schemes using neural networks; Section 4 provides a detailed description of our wavelet-based watermarking scheme using neural networks; Sections 5 and 6 present and discuss the results of our experiments; Section 7 outlines the conclusions.
2 Neural Networks
Neural networks are powerful tools for pattern recognition and classification applications, providing more alternatives than conventional classification techniques. Generally speaking, a neural network is a mathematical method for generating a classification model that describes the relationships in the input data. The network is able to learn from a set of training patterns and attempts to minimize the error between its predictions and the expected output. After iterations of the training process, the error decreases. Subsequently, the network can be used to predict or classify unseen input data. The back propagation network (BPN) is one of the most popular architectures of neural networks. It is a supervised learning model
(Roth 1990, Soucek 1989). The BPN architecture is depicted in Figure 2. It includes three layers: the input layer, the hidden layer, and the output layer. The input layer corresponds to the features of the input data and the output layer corresponds to the pattern classes. In addition, the BPN architecture may contain one or more hidden layers to help with feature extraction. Each layer has one or more processing units, and each unit is fully connected to its adjacent layer. The connection between two units in adjacent layers is called a link, and each link has a weight value modified by the training patterns. The activation function is used to determine the new activation values of the units. Each layer can have a different number of units; the numbers of units in the input and output layers vary depending on practical necessity. Furthermore, according to the study in (Klimasauskas 1991), more units in the hidden layer can reduce the total training error, but fewer units in the hidden layer provide better performance in constructing the network.
Figure 2. The 4 × 3 × 4 BPN architecture.
The principle behind BPN involves using the steepest gradient descent method to reach a small approximation error. A BPN model with M layers can be represented by a tuple of the form (units of input layer × units of layer 1 × units of layer 2 × ... × units of layer M), where the Mth layer is the output layer (Hertz et al. 1991). All units in layer i are connected to all units in layer i + 1. For example, the connectivity graph in Figure 2 can be represented by (4 × 3 × 4). The input value of each unit is the sum of the previous layer's output values multiplied by a weight vector. Each unit i in layer m is computed in sequence by the following formulas:
h_i^m = Σ_j w_{ij}^m V_j^{m−1} + θ_i^m,  (1)

V_i^m = f_act(h_i^m),  (2)

where
h_i^m — the activation value of the i-th unit in the m-th layer,
V_i^m — the output of the i-th unit in the m-th layer,
f_act — the activation function,
w_{ij}^m — the weight of the link from unit j in layer m − 1 to unit i in layer m,
θ_i^m — the threshold or bias of unit i in layer m.

The function f_act is the activation function defined by

f_act(x) = 1 / (1 + e^{−x}).  (3)
This function is also called the sigmoid function and is used to determine the new activation values of the units. To minimize the error between the actual output vector of the last layer and the expected output vector, adaptive weight changes are required during the
training process. The details of the training algorithm, i.e., the back propagation algorithm, are described by the following steps (Hertz et al. 1991):

1. Assign all initial weights w_{ij}^m small random values.

2. Input a pattern from the training set.
3. Propagate the output values through the network with the propagation formula

V_i^m = f_act(h_i^m) = f_act(Σ_j w_{ij}^m V_j^{m−1} + θ_i^m).

4. Determine the error for each unit in the output layer by comparing the output V_i^M with the target output:

δ_i^M = f′_act(h_i^M)(t_i^M − V_i^M),

where t_i^M is the target output value of unit i in the output layer M and f′_act(x) is equal to f_act(x)(1 − f_act(x)).

5. Compute the required change values Δw_{ij}^m for all weights by the formula

Δw_{ij}^m = η × δ_i^m × V_j^{m−1},  (4)

where η is the learning factor and the value δ_i^m is computed using the rules

δ_i^m = f′_act(h_i^M)(t_i^M − V_i^M), if unit i is in the output layer M;
δ_i^m = f′_act(h_i^m) Σ_k δ_k^{m+1} w_{ki}^{m+1}, if unit i is in a hidden layer.  (5)

Then update the weights by w_{ij}^m ← w_{ij}^m + Δw_{ij}^m.
6. Repeat the process from Step 2 until the sum of squared errors (SSE) reaches its minimum or the error no longer changes.

Once the training is completed, the weights are stored in the neural network. We can then use the trained network to predict the corresponding output values.
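The whole training loop fits in a few dozen lines of code. The sketch below implements a one-hidden-layer BPN following Eqs. (1)-(5); the weight initialization and default learning rate are illustrative choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class BPN:
    # One-hidden-layer back-propagation network; each weight matrix
    # carries an extra bias column playing the role of theta.
    def __init__(self, n_in, n_hidden, n_out, eta=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (n_hidden, n_in + 1))
        self.w2 = rng.normal(0.0, 0.1, (n_out, n_hidden + 1))
        self.eta = eta

    def forward(self, x):
        self.v0 = np.append(x, 1.0)                  # input plus bias unit
        self.v1 = np.append(sigmoid(self.w1 @ self.v0), 1.0)
        self.v2 = sigmoid(self.w2 @ self.v1)
        return self.v2

    def train_step(self, x, target):
        out = self.forward(x)
        d2 = out * (1 - out) * (target - out)        # output delta, Eq. (5)
        h = self.v1[:-1]
        d1 = h * (1 - h) * (self.w2[:, :-1].T @ d2)  # hidden delta, Eq. (5)
        self.w2 += self.eta * np.outer(d2, self.v1)  # Eq. (4) updates
        self.w1 += self.eta * np.outer(d1, self.v0)
        return float(np.sum((target - out) ** 2))    # contribution to SSE
```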
3 Related Works
In this section, we briefly review two watermarking schemes using neural networks; these schemes embed the watermark into the frequency domain and the spatial domain, respectively. The neural network concepts described earlier are used throughout.
3.1 Image Watermarking System Based on Discrete Cosine Transform Using Neural Networks
In our previous work (Hwang et al. 2000), a back-propagation neural network (BPN) was used as a tool to design a watermarking scheme. BPN is a supervised learning neural network and one of the most popular neural network models. BPN is employed to embed the watermark into the frequency domain, which can simultaneously improve the security and robustness of the watermarked image. To the best of our knowledge, our previous work is the first scheme to use neural networks to design a watermarking system. Because the scheme operates in the frequency domain, it contains three main parts: image transformation, watermark embedding, and watermark extraction. First, the locations in which to embed the watermark pixels are selected using a location decision procedure. A one-way hash function is used to help decide the locations and enhance the security. Each embedding location is an 8 × 8 block. Then the discrete cosine
transform (DCT) (Gonzalez and Woods 1992) is applied to these locations, so that cosine values are used to represent the original data. The major feature of the DCT is that it concentrates an image's energy around the upper-left corner. Due to this desirable feature, DCT has been widely used in image compression techniques such as JPEG.
Figure 3. Indices of AC components.
To embed the watermark in the image, some coefficients in the transformed domain are modified according to a certain watermarking rule. In this scheme (Hwang et al. 2000), a BPN model is used to design the watermarking rule. Figure 3 shows the transformed domain of an embedding block. The first nine AC coefficients (AC1, AC2, ..., AC9) are the input vector and the twelfth AC coefficient, AC12, is the output vector of the BPN model. The coefficient to be modified corresponds to the output vector of the BPN. According
to the features of the DCT, if lower-index AC coefficients are used, the robustness of the watermark increases, but the quality of the watermarked image decreases. The BPN model is illustrated in Figure 4. The model consists of three layers: the input layer, the hidden layer, and the output layer. The processing units between the layers are fully connected. The input value of each unit is the sum of the output values from the previous layer multiplied by a weight vector. The weight values are modified by the training set according to the approximation errors. After the BPN model is trained, an output value AC12*_i can be calculated in linear time. The watermark pixel W_i ∈ {0, 1} is embedded by replacing the original AC12_i with AC12′_i, where AC12′_i is computed by the following rules:

AC12′_i = AC12*_i − δ, if W_i = 0;
AC12′_i = AC12*_i + δ, if W_i = 1.  (6)
Here δ is a system parameter. The value of δ can be determined by choosing between perceptual invisibility and robustness: a larger δ provides robustness, but the image quality is sacrificed. To extract the watermark from the watermarked image, the procedure is similar to the embedding procedure. When the correct secret keys and the trained BPN model are introduced, the corresponding AC12_i and AC12*_i can be obtained. The extraction rules are as follows:
W′_i = 0, if AC12_i < AC12*_i;
W′_i = 1, if AC12_i > AC12*_i.  (7)
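Once the BPN prediction AC12*_i is available, the embedding and extraction rules reduce to two one-liners; a sketch:

```python
def embed_ac12(ac12_star, bit, delta):
    # Eq. (6): shift the BPN prediction down for bit 0, up for bit 1.
    return ac12_star - delta if bit == 0 else ac12_star + delta

def extract_ac12(ac12_received, ac12_star):
    # Eq. (7): compare the received AC12 against the BPN prediction.
    return 0 if ac12_received < ac12_star else 1
```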
3.2 A Watermarking System Using Neural Networks in the Spatial Domain for Color Images

In this subsection, we briefly review Yu et al.'s watermarking scheme (Yu et al. 2001). This scheme embeds an invisible watermark into a color image.
Figure 4. A watermarking scheme using the BPN model.
The watermark embedding procedure directly modifies subsets of image pixels. According to the characteristics of the embedded watermark and the watermarked image, the image owner can train a neural network. The trained neural network can recover most of the watermark from the watermarked image even if the watermarked image has been attacked by some image processing operations. The procedures for watermark embedding and watermark extraction follow.
First, the image owner randomly selects a subset of pixels from the original image in which to embed the watermark data. Many pseudo-random number generators (PRNGs) can be applied to obtain a sequence of random positions P_t over the original image O. The watermark image W can then be embedded into the original image O by modifying the blue component B_{P_t} as

B_{P_t} = B_{P_t} + (2W_t − 1) Q L_{P_t},  (8)

where L_{P_t} is the luminance of O_{P_t}, given by L_{P_t} = 0.299 R_{P_t} + 0.587 G_{P_t} + 0.114 B_{P_t}, and Q is a positive constant. A larger Q offers better robustness but degrades the visual quality of the watermarked image. The embedding process is repeated until all bits in W are embedded.
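A minimal sketch of this embedding step (the channel layout and the clipping to the 8-bit range are our assumptions):

```python
import numpy as np

def embed_color(image, watermark_bits, positions, q):
    # Eq. (8): shift the blue channel at each selected position by
    # +/- q * luminance, the sign carrying the watermark bit.
    # Assumes an H x W x 3 array with channels ordered (R, G, B).
    out = image.astype(float).copy()
    for (i, j), bit in zip(positions, watermark_bits):
        r, g, b = out[i, j]
        lum = 0.299 * r + 0.587 * g + 0.114 * b
        out[i, j, 2] = b + (2 * bit - 1) * q * lum
    return np.clip(out, 0, 255).astype(np.uint8)
```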
Figure 5. The symmetric cross-shaped window for c = 2.
To extract the watermark W′ from the watermarked image O′, the watermark-embedded positions are generated using the same PRNG as in the embedding process. Next, the image owner collects a set of training patterns to train a neural network. Consider a symmetric cross-shaped window with c pixels along the vertical and horizontal directions, as shown in Figure 5. The training pattern is the
difference between the intensity of the blue component of the central pixel and the other pixels within the window. Thus, each training pattern τ contains 9 input values and 1 output value, and for c = 2 it can be defined by

τ = {δ_{i−2,j}, δ_{i−1,j}, δ_{i,j}, δ_{i+1,j}, δ_{i+2,j}, δ_{i,j−2}, δ_{i,j−1}, δ_{i,j+1}, δ_{i,j+2}, d_{i,j}},  (9)

where δ_{u,v} denotes the blue-component intensity difference between the pixel at position (u, v) and the central pixel (i, j) within the window, and d_{i,j} is the desired output of the neural network.
The legal image owner can use the trained neural network to extract the watermark from the watermarked image at any time. The structure of the trained neural network is illustrated in Figure 6. It is a 9 × 5 × 1 multilayer perceptron with 9 units in the input layer, 5 units in the hidden layer, and 1 unit in the output layer. According to the output value d′_t, the watermark bit W′_t can be calculated by the following rules:
W′_t = 1, if d′_t ≥ 0;
W′_t = 0, otherwise.  (12)
Once all the bits of the watermark are extracted, the extracted watermark W′ can be used to identify the copyright of the owner's intellectual property. Both schemes keep the location of the embedded watermark unknown to any illegal user, while allowing the legal user to extract the
Figure 6. The structure of the neural network used in Yu et al.'s watermarking scheme.
embedded watermark from an altered image. Furthermore, the security and robustness can be improved simultaneously by using neural network techniques.
4 Image Watermarking System Based on Wavelet Transform Using Neural Networks
Based on the characteristics of the discrete wavelet transform and neural networks, we develop a robust and unobtrusive watermarking system for image data. Different from conventional wavelet-based methods that embed the watermark in the high frequency subbands, our method allows more flexibility in selecting the DWT coefficients used in the embedding process. In this system, it is assumed that the original image O is gray scale with 8 bits per pixel and the watermark W is a binary image with one bit per pixel. The original image O contains O_H × O_W elements and is defined as:
O = {o_{i,j} | 0 ≤ o_{i,j} ≤ 2⁸ − 1},  (13)

where 0 ≤ i < O_H and 0 ≤ j < O_W. The watermark image W contains W_H × W_W elements and is defined as:

W = {w_{i,j} | w_{i,j} is 0 or 1},  (14)

where 0 ≤ i < W_H and 0 ≤ j < W_W. We can also convert the 2-D data into a one-dimensional array w_0, w_1, ..., w_{W_H × W_W − 1}. In this proposed system, we suppose that O_H, O_W, W_H, and W_W are integer powers of two. Generality is not restricted, because the original image can be padded with zeros. The system contains two phases: the embedding phase and the extracting phase. The details of the two phases are described in the following subsections.
4.1 Embedding Phase
The flow chart of the embedding phase is illustrated in Figure 7. The steps of the embedding phase are as follows.

1. We first decompose the original image into the frequency domain with several hierarchical subbands using the DWT, as shown in Figure 1.
Figure 7. The flow chart of the embedding phase.
LL_r, LH_r, HL_r, and HH_r represent the low-low, low-high, high-low, and high-high subbands at resolution level r. The process continues until the final resolution level n is reached. The details of the DWT algorithm can be found in (Mallat 1989, Shapiro 1993, Woods 1991).

2. The hierarchical subbands of the DWT decomposition can be represented by several corresponding quadtrees. Except for the approximation coefficient in LL_n, the coefficients in LL_n are the roots of the quadtrees. The quadtree structure is shown in Figure 8. Each tree node has four children, and each node is associated with a coefficient of the DWT decomposition as in Figure 1.
3. A coordinate set S is selected from the DWT decomposition using a pseudo-random number generator (PRNG). S is produced and defined as

S = {s_k = (i_k, j_k) | k = 0, 1, ..., (W_H × W_W)/4 − 1}.  (15)
Figure 8. Illustration of the quadtree structure corresponding to the DWT decomposition.
In addition, to prevent the embedded locations from being disordered, if the coordinate of a corresponding node in the quadtree has been selected, the coordinates of its siblings cannot be selected. For example, if the coordinate of node C3 has been selected, its siblings C1, C2, and C4 can no longer be selected. The coordinate (i_k, j_k) is determined by the PRNG with a seed k. Many useful PRNGs have been proposed (Blum et al. 1986, Hwang et al. 1999) and can be applied in this step. The seed k is part of the secret key used to detect the watermark. After generating the coordinate set, we sort the coordinates in the coordinate set according to the scan order in Figure 9.
Figure 9. The scanning order of the hierarchical subbands in the DWT decomposition.
4. To construct a neural network, we have to prepare a training set T; T contains one training pattern t_k for each selected coordinate s_k in S. Each training pattern t_k contains eight input values x_{k,0}, x_{k,1}, ..., x_{k,7} and four expected outputs y_{k,0}, y_{k,1}, y_{k,2}, y_{k,3}. Here x_{k,0}, x_{k,1}, x_{k,2}, x_{k,3} are the coefficients corresponding to the siblings of s_k, and x_{k,4}, x_{k,5}, x_{k,6}, x_{k,7} are the coefficients corresponding to the siblings of s_k's parent.
The expected outputs y_{k,0}, y_{k,1}, y_{k,2}, y_{k,3} are the coefficients corresponding to s_k's four children. For example, if the coordinate s_k of node C3 is selected, the input values are the coefficients of nodes C1, C2, C3, C4, B1, B2, B3, B4, and the expected outputs are the coefficients of nodes D1, D2, D3, D4. According to the training algorithm described in Section 2, we can construct a BPN model to embed and extract the watermark.

5. After training, the trained network N can be used to embed the watermark. For every selected coordinate in S, we input x_{k,0}, x_{k,1}, ..., x_{k,7} to the trained network N. The corresponding output values y′_{k,0}, y′_{k,1}, y′_{k,2}, y′_{k,3}, which approximate y_{k,0}, y_{k,1}, y_{k,2}, y_{k,3}, are calculated as

y′_{k,i} = N_i(x_{k,0}, x_{k,1}, ..., x_{k,7}), i = 0, 1, 2, 3.  (16)
The watermark sequence w_{4k}, w_{4k+1}, w_{4k+2}, w_{4k+3}, where k = 0, 1, ..., (W_H × W_W)/4 − 1, is embedded by replacing the original coefficients y_{k,0}, y_{k,1}, y_{k,2}, y_{k,3} with y″_{k,0}, y″_{k,1}, y″_{k,2}, y″_{k,3}, where y″_{k,i}, i = 0, 1, 2, 3, is computed by the rules

y″_{k,i} = y′_{k,i} − δ_r, if w_{4k+i} = 0;
y″_{k,i} = y′_{k,i} + δ_r, if w_{4k+i} = 1.  (17)

The value of the parameter δ_r depends on the resolution level r of y_{k,i}. Higher resolution levels contain more important coefficients, such as those in the upper-left corner of Figure 1. Thus, the parameter δ_r at a higher resolution level should be smaller to avoid serious distortion. On the other hand, the parameter δ_r at a lower resolution level can be larger to enhance the robustness. The values can be determined by the user requirements.

6. After embedding all watermark sequences, we obtain a watermarked DWT decomposition. Finally, the watermarked image O′ can be obtained by applying the inverse DWT.
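Assembling a training pattern from the quadtree structure can be sketched as follows; the subband container and the parent/child index arithmetic are illustrative assumptions about the data layout:

```python
def training_pattern(coeff, level, band, i, j):
    # Build one pattern t_k for the node at (i, j) of `band` at `level`:
    # inputs are the node's four siblings plus the parent's four siblings,
    # outputs are the node's four children. coeff[(level, band)] is assumed
    # to be that subband's 2-D coefficient array (level + 1 is coarser).
    def quad(arr, r, c):
        r0, c0 = (r // 2) * 2, (c // 2) * 2       # the 2x2 sibling group
        return [arr[r0, c0], arr[r0, c0 + 1],
                arr[r0 + 1, c0], arr[r0 + 1, c0 + 1]]
    inputs = quad(coeff[(level, band)], i, j)                 # x_{k,0..3}
    inputs += quad(coeff[(level + 1, band)], i // 2, j // 2)  # x_{k,4..7}
    child = coeff[(level - 1, band)]
    outputs = [child[2 * i, 2 * j], child[2 * i, 2 * j + 1],  # y_{k,0..3}
               child[2 * i + 1, 2 * j], child[2 * i + 1, 2 * j + 1]]
    return inputs, outputs
```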
After the neural network N is trained, the copyright owner can deliver the trained network N and the identity ID to a trusted third party (TTP) to prevent the multiple-claim problem (Craver et al. 1997). We assume that the copyright owner is the first person to register the copyright of the image O, at time TS. The TTP produces a signature for the copyright owner of image O by the following formula:

S_O = Sig_{K_TTP}(N, TS, ID),  (18)

where S_O is the generated signature for image O and Sig_{K_TTP}() is a signature generation function using the TTP's private key K_TTP. The signature S_O can then be used to avoid disputes arising from the redundant registration of images by different copyright owners.
4.2 Extracting Phase
When the secret key k and the trained neural network are correctly provided by the legal owner, the watermark can easily be extracted. The flow chart of our extracting phase is illustrated in Figure 10. The steps of the extracting phase are similar to those of the embedding phase, except for the training process and the inverse DWT process. The details of these steps are as follows.

1. The watermarked image O′ is first transformed into its DWT decomposition.
2. Convert the hierarchical subbands of the DWT decomposition into the quadtree representation.
3. Let the secret key k be the seed of the pseudo-random number generator. As mentioned in Step 3 of the embedding phase, if the right seed is introduced, the right coordinate set S can be obtained. Then we sort all the coordinates according to the scan order in Figure 9.
Figure 10. The flow chart of our extracting method
4. According to the selected coordinate set S = {s_k | k = 0, 1, ..., (w_H × w_W)/4 − 1}, we can input the corresponding vectors x_{k,0}, x_{k,1}, ..., x_{k,7} to the trained neural network N and compute the output vectors y'_{k,0}, y'_{k,1}, y'_{k,2}, y'_{k,3}. The embedded watermark sequences w_{4k+i} can be extracted by following the relationship between ŷ_{k,i} and y'_{k,i}:

w_{4k+i} = 1, if ŷ_{k,i} ≥ y'_{k,i},
w_{4k+i} = 0, if ŷ_{k,i} < y'_{k,i},   (19)

where k = 0, 1, ..., (w_H × w_W)/4 − 1 and i = 0, 1, 2, 3. Finally, the extracted information W' can be obtained to prove the ownership.
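A matching sketch of the extraction rule in Equation (19), under the same assumptions as the embedding sketch above (predict_children stands in for the trained network N):

def extract_node_bits(predict_children, inputs, marked_coeffs):
    """Recover four bits by comparing the marked coefficients against
    the network's predictions, per Equation (19)."""
    y_pred = predict_children(inputs)
    return [1 if y_hat >= y_p else 0
            for y_hat, y_p in zip(marked_coeffs, y_pred)]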
To verify whether the trained neural network was indeed generated at a certain time TS and has been registered with the TTP, we can use the public key of the TTP to verify the validity of the signature S_O.
5 Experiments
A back-propagation network was used in our experiments. In the BPN training process, the numbers of hidden layers and processing units are important for minimizing the margin of error in prediction. Generally, complicated problems require a larger number of processing units in the hidden layer, which usually yields a better rate of convergence. However, too many hidden layers will complicate the network and degrade the rate of convergence (Klimasauskas 1991). In this experiment, the network architecture used is illustrated in Figure 11. It has eight input units in the input layer, six processing units in one hidden layer, and four output units in the output layer. The system ran on a Pentium III 450 MHz processor with 128 MB memory and the Borland C++ Builder compiler. The training pattern of this experiment has 12 values: numbers 1-8 are the input vectors and numbers 9-12 are the expected output vectors. Since the input and output vectors in the activation function of BPN are always real numbers between 0 and 1, we have to normalize the training patterns into this range before training the network. Therefore, we have to design a linear translation function to translate DWT coefficients into the range from 0 to 1. Because the range of DWT coefficients is between -256 and 256 and the frequency distribution approximates the normal distribution, the linear translation function f(x) and its inverse function f^{-1}(y) are defined as follows:
f(x) = (x + 256) / 512,   (20)
f^{-1}(y) = 512y − 256.   (21)
After normalizing the training patterns, the training set is used to train the network. In the training process, the initial weights of the network model are randomly assigned, and the learning rate in this experiment is 0.5 (Jacobs 1988). The activation function is the sigmoid function. The training cycle is repeated until the sum of squared errors (SSE) reaches its minimum or the error no longer changes (convergence). Once the training is completed, the weights of the network model are stored.
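The following Python sketch mirrors this training setup: an 8-6-4 fully connected network with sigmoid activations, learning rate 0.5, and the linear translation of Equations (20) and (21). Bias terms and the stopping test are omitted for brevity, and the helper name train_step is ours, not from the original text.

import numpy as np

# Linear translation of DWT coefficients into [0, 1] and back,
# following Equations (20) and (21).
f     = lambda x: (x + 256.0) / 512.0
f_inv = lambda y: 512.0 * y - 256.0

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
W1 = rng.uniform(-0.5, 0.5, (8, 6))   # input -> hidden weights
W2 = rng.uniform(-0.5, 0.5, (6, 4))   # hidden -> output weights
eta = 0.5                             # learning rate used in the chapter

def train_step(x_coeffs, y_coeffs):
    """One backpropagation step on a single pattern (8 inputs, 4 targets)."""
    global W1, W2
    x, y = f(np.asarray(x_coeffs, float)), f(np.asarray(y_coeffs, float))
    h = sigmoid(x @ W1)
    o = sigmoid(h @ W2)
    # Gradients of the squared error propagated through the sigmoids.
    d_o = (o - y) * o * (1 - o)
    d_h = (d_o @ W2.T) * h * (1 - h)
    W2 -= eta * np.outer(h, d_o)
    W1 -= eta * np.outer(x, d_h)
    return np.sum((o - y) ** 2)       # this pattern's contribution to the SSE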
Figure 11. The BPN architecture used in our experiment (input layer, hidden layer, output layer)
We employ the peak signal-to-noise ratio (PSNR) to evaluate the distortion between the pre-processing image and the post-processing image. Theoretically, smaller distortion between the pre-processing image and the post-processing image results in a larger PSNR value. A larger PSNR value (greater than 30 dB) indicates that there is little difference between the original image and the processed image, and that the quality of the processed image is acceptable. The bit correct ratio (BCR) is usually used to estimate the correctness of the extracted
watermark, and is defined as follows:

BCR = (Σ_{i=0}^{w_H × w_W − 1} ¬(w_i ⊕ w'_i) / (w_H × w_W)) × 100%,

where w_i is the original watermark sequence, w'_i is the sequence from the extracted watermark, ⊕ denotes the exclusive-OR operator, and ¬ denotes bit complement. In order to test the robustness of the proposed scheme, we first attack the watermarked image and then extract the watermark. The possible attacks include JPEG lossy compression, blurring, sharpening, and scaling. Three 512 × 512 gray-scale images, "Lena", "Barbara", and "Plane", are used in the experiments. Figure 12(a) shows the original image of "Lena" and Figure 12(b) shows the logo of "National Chung Cheng University" (1 bit/pixel, 64 × 64). We performed our proposed scheme to embed Figure 12(b) into Figure 12(a), which produces the watermarked image in Figure 12(c) (PSNR=39.46 dB). The extracted watermark from Figure 12(c) is shown in Figure 12(d) (BCR=98.87%). JPEG lossy compression, a blurring algorithm, a sharpening algorithm, and a scaling process were used to alter the watermarked image "Lena". The JPEG lossy compression used the lowest quality parameter. The blurring algorithm performed a 5 × 5 neighborhood median. Adobe Photoshop was used to conduct these changes. The altered results are shown in Figures 13(a), 14(a), 15(a), and 16(a), with the corresponding PSNR values being 32.17 dB, 31.53 dB, 32.26 dB, and 28.17 dB, respectively. Figures 13(b), 14(b), 15(b), and 16(b) show the extracted watermarks from Figures 13(a), 14(a), 15(a), and 16(a). The bit correct ratios of these extracted watermarks are 88.43%, 89.25%, 95.12%, and 78.58%, respectively. According to the experimental results, all the extracted watermarks under the different attacks are recognizable.
Figure 12. (a) Original image of “Lena,” (b) logo of “National Chung Cheng University,” (c) watermarked image of “Lena,” (d) extracted watermark from watermarked image
The result of the experiment on different images is shown in Table 1. The extracted watermarks are recognizable. Our proposed scheme is robust to attacks, such as JPEG lossy compression, blurring, sharpening, and scaling.
6 Discussion
Similar to our previous work in (Hwang et al. 2000), the proposed scheme also uses a neural network to develop a watermark system. The difference between the two schemes is that the proposed scheme
Figure 13. (a) Reconstructed image from JPEG compressed image (PSNR=32.17 dB), (b) extracted watermark (BCR=88.43%)
Figure 14. (a) Blurred image with watermark (PSNR=31.53 dB), (b) extracted watermark (BCR=89.25%)
Figure 15. (a) Sharpened image with watermark (PSNR=32.26 dB), (b) extracted watermark (BCR=95.12%)
Figure 16. (a) Resized image from the shrunken image (PSNR=28.17 dB), (b) extracted watermark (BCR=78.58%)
Table 1. The bit correct ratios of the extracted watermarks under various attacks

Attacks    | Watermarked image | PSNR (dB) | BCR (%)
Embedded   | Barbara           | 38.25     | 98.03
           | Plane             | 37.64     | 97.41
JPEG       | Barbara           | 32.33     | 87.57
           | Plane             | 31.86     | 85.43
Blurring   | Barbara           | 29.84     | 90.73
           | Plane             | 31.58     | 89.64
Sharpening | Barbara           | 31.25     | 92.54
           | Plane             | 32.85     | 93.87
Scaling    | Barbara           | 25.13     | 78.84
           | Plane             | 26.37     | 79.26
embeds the watermark into the DWT decomposition, while the previous one embeds the watermark into the DCT decomposition. The result of the experiment demonstrates that the two schemes are both robust under attacks such as JPEG lossy compression, blurring, sharpening, and scaling. However, the proposed scheme is more practical than the previous scheme described in (Hwang et al. 2000). The proposed scheme differs from the previous scheme, which embeds the watermark by modifying the coefficients in particular positions of the DCT domain, such as AC12; the proposed scheme embeds the watermark into randomly selected positions in the whole DWT decomposition. The newly proposed scheme can thus avoid attacks that acutely distort particular coefficients. According to the DWT characteristics, the low frequency coefficients, such as those located in the upper-left corner of the decomposition, are important. Therefore, the changes suffered by these low frequency coefficients must be kept small. On the other hand, the high frequency coefficients may easily be distorted by several image processing operations. In order to enhance the robustness, the changes suffered by the high frequency coefficients must be large. In our proposed scheme, the parameters δ_r are used to control the range of changes. A larger δ_r provides greater robustness for the watermarked image but degrades the quality of the watermarked image. In our experiment, the parameters δ_r in the lower resolution levels have larger values. According to this selection rule for the parameters δ_r, our scheme can produce an unobtrusive watermarked image and keep its robustness. The multiple claims problem, also known as the ownership deadlock problem (Craver et al. 1997), can be solved in the proposed scheme. This problem is similar to the one in the cryptography of digital signatures. In order to resolve this problem, a trusted third party is required to participate in the ownership authentication. Therefore, an extra protocol is required to authenticate each owner's identity. In our scheme, the trained neural network together with the owner identity and a timestamp is sent to a trusted third party, which then produces a signature by
using the private key of the TTP. It is not necessary to modify the original scheme. Table 2 shows the comparison between our proposed scheme and some conventional schemes (Caronni 1995, Hsu and Wu 1999, Hwang et al. 2000, Langelaar et al. 1997, Su et al. 1999, Yu et al. 2001). Due to the flexibility and adaptability of the neural network, our proposed scheme appears to be more robust under several types of attacks. Furthermore, our proposed scheme does not need the original image in the watermark extracting procedure, and the multiple claims problem (Craver et al. 1997) can be solved. Furthermore, the memory space required by our proposed scheme is modest. The BPN model is used in our experiment. The memory requirement for the trained BPN depends on the number of weights. Since the BPN is a fully connected layered architecture, the number of weights is equal to the number of links, i.e. L_I × L_H + L_H × L_O, where L_I, L_H, and L_O are the numbers of processing units in the input layer, hidden layer, and output layer, respectively. Thus, the BPN architecture used requires 8 × 6 + 6 × 4 = 72 weights. Suppose each weight requires 2 bytes; then the total memory required is 72 × 2 = 144 bytes. Therefore, our scheme is practical in terms of memory requirements.
7 Conclusions
The techniques of neural networks have been successfully used in digital watermarking systems. Based on DWT, we design a perceptually invisible and robust watermarking system to satisfy the current requirements. Moreover, our scheme contains more outstanding features than conventional methods. The original image is not needed in the watermark extracting procedure. The multiple claims problem also can be solved efficiently. According to the results of the experiments, our method is truly robust under various types of attacks.
Table 2. Comparison between the proposed scheme and some conventional schemes

Methods                 | Frequency/spatial domain | Original image for watermark extracting | Robustness                                    | Multiple claims problem
(Hsu and Wu 1999)       | Freq. (DCT)              | Yes                                     | JPEG, Cropping                                | Undescribed
(Su et al. 1999)        | Freq. (DWT)              | No                                      | JPEG, SPIHT                                   | Undescribed
(Caronni 1995)          | Spatial                  | Yes                                     | JPEG                                          | Undescribed
(Langelaar et al. 1997) | Spatial                  | Yes                                     | JPEG                                          | Undescribed
(Yu et al. 2001)        | Spatial                  | No                                      | JPEG, Blurring, Sharpening, Scaling, Rotation | Undescribed
(Hwang et al. 2000)     | Freq. (DCT)              | No                                      | JPEG, Blurring, Sharpening, Scaling           | Undescribed
The proposed scheme     | Freq. (DWT)              | No                                      | JPEG, Blurring, Sharpening, Scaling           | Trained network with TTP
References

Barni, M., Bartolini, F., and Piva, A. (2001), "Improved wavelet-based watermarking through pixel-wise masking," IEEE Transactions on Image Processing, vol. 10, pp. 783-791.

Blum, L., Blum, M., and Shub, M. (1986), "A simple unpredictable pseudo-random number generator," SIAM Journal on Computing, vol. 15, pp. 364-383.

Caronni, G. (1995), "Assuring ownership rights for digital images," Proceedings of Reliable IT Systems VIS'95, H.H. Brueggemann and W. Gerhardt-Haeckl (Eds.), Vieweg Publishing Company, Germany, pp. 251-264.

Charrier, M., Cruz, D.S., and Larsson, M. (1999), "JPEG2000, the next millennium compression standard for still images," Proceedings of the IEEE International Conference on Multimedia Computing and Systems, pp. 131-132.

Craver, S., Memon, N., Yeo, B.L., and Yeung, M. (1997), "Can invisible watermarks resolve rightful ownership?" Proceedings of the SPIE International Conference on Storage and Retrieval for Image and Video Databases, vol. 3022, pp. 310-321.

Gonzalez, R. and Woods, R. (1992), Digital Image Processing, Addison-Wesley.

Haskell, B.G., Howard, P.G., LeCun, Y.A., Puri, A., Ostermann, J., Civanlar, M.R., Rabiner, L., Bottou, L., and Haffner, P. (1998), "Image and video coding - emerging standards and beyond," IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, pp. 814-837.

Hertz, J.A., Krogh, A.S., and Palmer, R.G. (1991), Introduction to the Theory of Neural Computation, vol. 1, Addison-Wesley.
Hsieh, M.S., Tseng, D.C., and Huang, Y.H. (2001), "Hiding digital watermarks using multiresolution wavelet transform," IEEE Transactions on Industrial Electronics, vol. 48, pp. 875-882.

Hsu, C.T. and Wu, J.L. (1999), "Hidden digital watermarks in images," IEEE Transactions on Image Processing, vol. 8, pp. 58-68.

Hwang, M.S., Chang, C.C., and Hwang, K.F. (1999), "A watermarking technique based on one-way hash functions," IEEE Transactions on Consumer Electronics, vol. 45, pp. 286-294.

Hwang, M.S., Chang, C.C., and Hwang, K.F. (2000), "Digital watermarking of images using neural networks," Journal of Electronic Imaging, vol. 9, pp. 548-555.

Jacobs, R.A. (1988), "Increased rates of convergence through learning rate adaptation," Neural Networks, vol. 1, pp. 295-307.

Klimasauskas, C.C. (1991), "Applying neural networks, part III: Training a neural network," PC AI.

Langelaar, G., Lubbe, J.C.A., and Lagendijk, R. (1997), "Robust labeling methods for copy protection of images," Proceedings of the SPIE International Conference on Storage and Retrieval for Image and Video Databases, vol. 3022, pp. 298-309.

Mallat, S.G. (1989), "A theory of multiresolution signal decomposition: The wavelet representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, pp. 674-693.

Roth, M. (1990), "Survey of neural network technology for automatic target recognition," IEEE Transactions on Neural Networks, vol. 1, pp. 28-43.

Shapiro, J.M. (1993), "Embedded image coding using zerotrees of wavelet coefficients," IEEE Transactions on Signal Processing, vol. 41, pp. 3445-3462.
Soucek, B. (1989), Neural and Concurrent Real-time Systems, New York: Wiley.

Su, P.C., Kuo, C.C.J., and Wang, H.J.M. (1999), "Blind digital watermarking for cartoon and map images," Proceedings of the SPIE International Conference on Security and Watermarking of Multimedia Contents, vol. 3657, pp. 296-306.

Wang, Y., Doherty, J.F., and Van Dyck, R.E. (2002), "A wavelet-based watermarking algorithm for ownership verification of digital images," IEEE Transactions on Image Processing, vol. 11, pp. 77-88.

Woods, J.W. (1991), Subband Image Coding, Boston, MA: Kluwer.

Yu, P.T., Tsai, H.H., and Lin, J.S. (2001), "Digital watermarking based on neural networks for color images," Signal Processing, vol. 81, pp. 663-671.
Chapter 15

A Perceptually Tuned Watermarking Scheme for Digital Images Using Support Vector Machines

Chin-Chen Chang and Iuon-Chang Lin

This chapter demonstrates the feasibility of exploiting support vector machines (SVM) to design a watermarking scheme for the copyright protection of digital images. The scheme takes advantage of the speed and generalization ability of SVM in classifying numerous models. The trained SVM can automatically determine where the perceptually significant blocks are and to what extent the intensities of the block pixels can be modified. If the SVM carefully selects intensities according to the characteristics of the blocks, we can expect the marked image to be visually indistinguishable from the original. Furthermore, the security has been enhanced, and an extension to color images is also provided. Our experimental results demonstrate that our scheme can achieve unobtrusiveness and robustness.
1 Introduction
The Internet has brought forward new kinds of business. For example, transactions can nowadays be conducted over the Internet to pay for pay-per-view video on demand, on-line consulting, on-line museums, etc. To satisfy the growing need of sharing artistic productions on-line, many schemes have been brought up in which the owner of a
creation can charge the users via some payment mechanism. However, one of the main bottlenecks of such a digital technology is that the data can usually be accessed and duplicated easily. As a result, data piracy has become a critical problem to overcome before any payment mechanism can actually be put to use. Therefore, intellectual property right protection, such as copyright protection, is always a tricky job. Digital watermarking is a useful concept for the protection of copyright. In order to strengthen the ownership of the creation, a trademark of the owner can be selected as a watermark. In general, the watermark can be embedded as either visible or invisible data (Hwang et al. 2000). The main advantage of the visible watermark is that the owner can be easily identified. However, the visible watermark is usually not robust against image processing techniques; the embedded watermark can be easily removed from the images. Compared to the visible watermark, the invisible watermark has two specific advantages (Zhao and Koch 1998): 1. Security: The embedding locations and the modified values are secret. Without the secret keys, no one gets to know where the watermark is embedded and how many pixels have been modified. Only an authorized user can extract the secret watermark. 2. Robustness: Under the premise that the image quality does not get seriously harmed, the embedded watermark is resistant to the most common signal processing techniques. According to the research reported in (Bas et al. 2002, Hwang et al. 1999), we can also classify digital watermarks into two different domains where the watermarks are embedded differently. The two domains are as follows:
1. Spatial domain: This solution is to embed a watermark directly into the spatial domain. It inserts or modifies some pixel values in the least significant bits. The main advantage of this method is its good computing performance.
2. Frequency domain: This solution is to transform the original image by using some standard transformation method such as the Fourier transform, the discrete cosine transform, or the wavelet transform. Then, the watermark is embedded into the frequency domain. Generally speaking, an efficient watermarking scheme must satisfy the following requirements (Barni et al. 1998):
1. Unobtrusiveness: The watermark should not affect the quality of the cover image. The watermarked image should not be significantly perceivable by the human visual system (HVS). Since the image quality is not seriously degraded, it will not draw any special attention from attackers. 2. Ready extraction: The embedded watermarks must be able to be easily and securely extracted by their owners.
3. Inclusion of no original image: Some watermarking schemes (Cox et al. 1997, Hsu and Wu 1999, Lu et al. 2000) extract the watermark by comparing the watermarked image to the original image. Those techniques have two problems: a large image database will be required to store the original images, and the watermarking system will become too complicated.
4. Robustness: The watermarking system must be resistant to lossy compression, filtering, and other types of image processing operations. In order to make watermarks robust and perceptually tuned, many schemes suggest embedding the watermark around the perceptually significant blocks of an image, e.g. edge blocks (Lin 2000, Ó Ruanaidh and Pun 1998). The reason is that if we embedded a watermark into a smooth block, it would result in an obtrusive marked image, and an attacker could spot and remove the watermark easily. In this chapter, we will focus on the type of watermarking scheme that embeds invisible watermarks into the spatial domain. The copyright information is embedded by modifying the intensity of a block of image pixels. The modified intensity can be dynamically selected according to the characteristics of the block. This feature is useful in making the marked image perceptually tuned. For example, for a smooth block, we can adjust the intensity of modification to be low or even zero. This feature can also be exploited to design a watermarking scheme with extremely high robustness (Nikolaidis and Pitas 1998).
So far, many techniques have been proposed to detect perceptually significant blocks (Tabatabai and Mitchell 1984, Yang and Tsai 1996), but they cannot build an empirical model to automatically determine the characteristics of the perceptually significant blocks. Fortunately, some techniques, e.g. those based on the human visual system, can be exploited to make the marked image perceptually better tuned (Delaigle et al. 1998). Besides these techniques, support vector machines can also be exploited. Support vector machines (SVM) are useful classification tools based on statistical learning theory (Vapnik 1998). Because of their good generalization ability, SVM have now been widely and successfully applied to a number of fields such as handwritten digit recognition, face detection, particle identification, and text categorization (Raudys 2000). In this work, we shall try to set up an empirical model by training an SVM to classify blocks of pixels from an image and determine the ranks of the blocks in accordance with their perceptual significance. The objective of our proposed scheme is to make the watermarked image visually indistinguishable from the original and to enhance the security. The proposed mechanism can be easily carried out in web-associated applications for the protection of ownership. Moreover,
our scheme can fully satisfy the requirements of unobtrusiveness, ready extraction, inclusion of no original image, and robustness. The rest of this chapter is organized as follows. In Section 2, we shall briefly review some existing watermarking schemes. Section 3 provides an overview of support vector machines with different goals. Section 4 provides a detailed description of our new watermarking scheme with SVM. Then, our experimental results and extensive discussions will be given in Sections 5 and 6. Finally, we shall summarize the benefits that our scheme provides in Section 7.
2 Related Works
In this section, we shall briefly review some watermarking schemes in the spatial domain and the frequency domain. First, we shall discuss the type of watermark embedding scheme that directly modifies subsets of image pixels. The simplest method is to replace the least significant bits (LSB) with the watermark bits (Schyndel et al. 1994). However, this method is vulnerable to common image processing operations. So far, much further research has been done to develop watermarking schemes in this domain (Kutter et al. 1998, Lin 2000, Nikolaidis and Pitas 1998, Voyatzis and Pitas 1998). In order to enhance the security, these schemes usually insert or modify some pixel values using a key. In Voyatzis' and Pitas' scheme (Voyatzis and Pitas 1998), a binary copyright image is transformed into a noise-like image by using a toral automorphism, which is then superimposed onto the original image. Due to the fact that the embedded locations are randomly decided by a strongly chaotic mixing system, the security of this scheme is enhanced. To extract the watermark, a statistical detection rule is applied without referring to the original image. Unfortunately, when the noise-like image is superimposed onto the original image, the
watermarked image sometimes becomes obtrusive. Besides, Kutter et al. have also proposed a watermarking scheme (Kutter et al. 1998) that allows a watermark to be embedded into a color image. Since the blue channel is relatively less sensitive in the color domain, the watermark data are embedded by modifying multiple bits in the blue channel of a color image. The pixel values are manipulated in proportion to the luminance. The authors claimed that the scheme is robust against translation, slight blurring, and JPEG attacks. Recently, Lin has proposed a block-oriented, modular-arithmetic-based watermarking scheme (Lin 2000, Lin 2001). It allows a watermark, which is a binary image containing a registered company logo and a unique licence number, to be embedded into a gray-level image. The embedding algorithm directly modifies the pixel values of some textured blocks of the image. The modifications depend on a secret parameter. Furthermore, the embedding locations are controlled by a secret key. Lin claimed that his scheme was robust against image processing operations such as lossy compression, noise, and filtering. Furthermore, Lin's scheme is much superior to Voyatzis and Pitas's (Voyatzis and Pitas 1998) and Cox et al.'s (Cox et al. 1997) with respect to robustness and security. However, Chan and Cheng have shown that Lin's scheme is insecure (Chan and Cheng 2002). Due to the fact that the embedding and extracting processes leak out some information, an attacker can easily obtain an inverted watermark by modifying the pixel intensities in the textured blocks. Therefore, the embedded watermark can be completely destroyed without any knowledge of the secret parameters. On the other hand, many methods are designed for embedding the watermark into the frequency domain (Barni et al. 1998, Hwang et al. 2000, Inoue et al. 1999). Such methods transform the original data into the frequency domain using the Fourier, discrete cosine, or wavelet transform.
Hwang et al. (Hwang et al. 2000) used a back-propagation neural network (BPN) as a tool to design a watermarking scheme. BPN is a supervised learning neural network, which is one of the most popular models in neural networks. The work employs BPN to embed the watermark into the frequency domain, which can simultaneously improve the security and robustness of the watermarked image. Neural networks are designed to minimize the empirical risk, i.e. to minimize the error on the training patterns. The techniques have been widely used in many image processing applications such as coding, pattern recognition, and texture segmentation, and they have already obtained desirable results (Srinivasan 1994). However, the main drawbacks of neural network techniques nowadays are that (1) the training of a neural network is time-consuming and (2) the training process is subtle. Recently, a new technique, called support vector machines (SVM), has provided a good generalization ability. Different from neural networks, SVM aims to minimize the structural risk, i.e. to minimize the generalization error on unknown test data. Furthermore, it provides a simpler, faster, and more efficient learning algorithm for estimation. Nowadays, SVM has been used successfully in many classification applications. Therefore, in this chapter, we will exploit the advantages of SVM to build a watermarking scheme. The embedding algorithm will directly operate in the spatial domain. Our scheme not only can produce a high quality watermarked image but also can enhance the security of Lin's scheme (Lin 2000, Lin 2001).
3 Support Vector Machines
The concept of support vector machines (SVM) is one of the most recent ideas in classification. The learning algorithm of SVM was invented by Cortes and Vapnik (Cortes and Vapnik 1995) and is based on statistical learning theory. It maps the input vector into a high dimensional feature space, and then an optimal separating hyperplane is constructed through some decision functions. This can be formulated as an optimization problem. Originally, SVM was designed for binary classification. Nowadays, many works focus on how to effectively extend it to multi-class classification. In the following subsections, we will introduce the evolution of SVM in three different steps (Gutschoven and Verlinde 2000, Hsu and Lin 2002, Shevade et al. 2000).
3.1 Linear SVM for Two-Class Separable Data
SVM (Cortes and Vapnik 1995) was originally designed for classifying linearly separable data by using a linear learning algorithm. The learning algorithm is performed to find an optimal separating hyperplane, which is determined by certain points, called support vectors, in the training set. Figure 1 illustrates the concept of the optimal separating hyperplane. The solid line is the optimal separating hyperplane. It lies halfway across the maximum margin between the two classes of data. The maximum margin is the sum of the distances from the hyperplane to the closest training points of the two classes. For example, the maximum margin in Figure 1 is d1 + d2.
Therefore, given a set of training vectors x_i ∈ R^n, i = 1, 2, ..., k, belonging to different classes y_i ∈ {1, −1}, we wish to separate this training set using a linear decision function. The optimization problem solved by SVM can be formulated as

min_{w,b,ξ} (1/2) w^T w + C Σ_{i=1}^{k} ξ_i,   (1)

with respect to

y_i (w^T φ(x_i) + b) ≥ 1 − ξ_i, ξ_i ≥ 0, i = 1, 2, ..., k.
Figure 1. SVM optimal separating hyperplane
Here, φ(x_i) is a transformation function that is used to map x_i into a high dimensional feature space. The constant C > 0 is an upper bound that is determined by the trade-off between the smoothness of the decision function and the generalization error. The maximum margin between the two classes of data can be found by minimizing the term (1/2) w^T w. Furthermore, if the training data are not linearly separable, the term C Σ_{i=1}^{k} ξ_i can reduce the number of training errors.
According to Wolfe duality theory, the optimization problem is equivalent to solving the following dual problem:

min_α (1/2) α^T Q α − e^T α,   (2)

with respect to

0 ≤ α_i ≤ C, i = 1, 2, ..., k, and y^T α = 0.

Here, Q is a k × k positive semidefinite matrix with Q_ij ≡ y_i y_j K(x_i, x_j), where K(x_i, x_j) = φ(x_i)^T φ(x_j) is the kernel and e is the vector of all ones. Finally, the optimal separating function is

f(x) = sign(Σ_{i=1}^{k} y_i α_i K(x_i, x) + b).   (3)
3.2 Linear SVM for Multi-Class Separable Data
The concept of linear SVM for two-class separable data can be easily extended to multi-class separable data. A simple method is to construct g SVM models, where g is the number of classes. Similar to solving the optimization problem in two-class SVM, the training set for the ith SVM can be divided into two subsets: one includes all the points in the ith class, and the other covers all other points in the training set. However, this method is inefficient (Hsu and Lin 2002).
In (Krebel 1999), an efficient method for the multi-class case using linear SVM is proposed. This method constructs g(g − 1)/2 classifiers, and each classifier trains on data from two different classes, e.g. the ith and jth classes. The classification problem can be formulated as

min_{w^{ij}, b^{ij}, ξ^{ij}} (1/2) (w^{ij})^T w^{ij} + C Σ_t ξ_t^{ij},   (4)

with respect to

(w^{ij})^T φ(x_t) + b^{ij} ≥ 1 − ξ_t^{ij}, if x_t is in the ith class,
(w^{ij})^T φ(x_t) + b^{ij} ≤ −1 + ξ_t^{ij}, if x_t is in the jth class, and
ξ_t^{ij} ≥ 0.
If the decision function f(x) = sign((w^{ij})^T φ(x) + b^{ij}) = 1, it means x is in the ith class; otherwise, x is in the jth class. A voting strategy is used to decide the classification. According to the voting result, we predict that x is in the class with the biggest number of votes.
3.3 Non-Linear SVM
In the above subsections, the kernels used in the decision functions are linear. However, some non-linear kernels can be used to deal with non-linear, multi-class training sets. In this case, the training data will be mapped into a high dimensional space through a non-linear transformation. The non-linear transformation is performed by calling some non-linear kernel function, such as the polynomial kernel of degree p, K(x_i, x_j) = (x_i^T x_j + 1)^p, or the radial basis function K(x_i, x_j) = e^{−||x_i − x_j||² / σ²}. After training, an optimal separating hyperplane can be constructed in the feature space. Figure 2 illustrates the concept of non-linear SVM (Gutschoven and Verlinde 2000). The capability of singling out the outliers is outstanding.
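To illustrate the three kinds of SVM just described, the sketch below trains linear, degree-3 polynomial, and RBF classifiers on toy multi-class data using scikit-learn, whose SVC class wraps LIBSVM and internally uses the one-against-one construction of g(g−1)/2 pairwise classifiers with voting. The random data here are purely illustrative.

from sklearn.svm import SVC
import numpy as np

# Toy multi-class data: 50 patterns of 9 features each, 4 classes.
rng = np.random.default_rng(0)
X = rng.uniform(0, 15, (50, 9))
y = rng.integers(0, 4, 50)

for kernel, params in [("linear", {}),
                       ("poly", {"degree": 3}),      # polynomial kernel of degree 3
                       ("rbf", {"gamma": "scale"})]: # radial basis function
    clf = SVC(kernel=kernel, C=1.0, **params).fit(X, y)
    print(kernel, "training accuracy:", clf.score(X, y))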
4 The Watermarking Scheme Using Support Vector Machines
Figure 2. The concept of the non-linear SVM

Because of the good generalization ability of support vector machines, we try to establish an SVM model that can classify image blocks according to some regulations. In the proposed scheme, the watermark is a binary image with one bit per pixel. Let W be the watermark image of size a × b. It can be represented by a two-dimensional (2D) array as follows:

W = [ W(0,0)    W(0,1)    ...  W(0,b−1)
      W(1,0)    W(1,1)    ...  W(1,b−1)
      ...       ...       ...  ...
      W(a−1,0)  W(a−1,1)  ...  W(a−1,b−1) ],
where W(i,j) ∈ {0,1}, 0 ≤ i < a, and 0 ≤ j < b. The watermark must be a meaningful proof for someone who owns the copyright of the original image. The watermark will be hidden in the original image. Let O be an original image with 8 bits per pixel; it can also be represented by a 2D array as

O = [ O(0,0)    O(0,1)    ...  O(0,w−1)
      O(1,0)    O(1,1)    ...  O(1,w−1)
      ...       ...       ...  ...
      O(h−1,0)  O(h−1,1)  ...  O(h−1,w−1) ],

where h and w are the original image's height and width, respectively. The range of O(i,j) is from 0 to 255, where 0 ≤ i < h and 0 ≤ j < w.
The procedure of the proposed scheme includes three phases: (1) the location decision phase, (2) the watermark embedding phase, and (3) the watermark extracting phase. The location decision phase is responsible for deciding where the watermark pixels will be hidden. Then, in the watermark embedding phase, we exploit SVM to embed the watermark pixels into these locations. The embedding algorithm is block-oriented and modular-arithmetic-based (Hwang et al. 2000, Lin 2001). In the watermark extracting phase, we can easily extract the watermark from the marked image without referring to the original image. The details of these phases are given in the following subsections.
4.1 Location Decision Phase
For security purposes, the watermark should not be directly embedded into the original image. We need a secure location decision algorithm to decide the locations where the watermark pixels will be hidden. So far, many algorithms (Hwang et al. 2000, Hwang et al. 1999, Lin 2000) have been proposed to decide the locations. In this work, we design a location decision algorithm more suitable for our proposed scheme. First, we divide the original image O into many 3 × 3 non-overlapping blocks. Suppose that B is the block set, where each block is defined as follows:

b_{i,j} = [ O(i×3, j×3)    O(i×3, j×3+1)    O(i×3, j×3+2)
            O(i×3+1, j×3)  O(i×3+1, j×3+1)  O(i×3+1, j×3+2)
            O(i×3+2, j×3)  O(i×3+2, j×3+1)  O(i×3+2, j×3+2) ].
A block b_{i,j} is selected for embedding the watermark by performing the following location decision procedure.
1. Inspired by Rabin's public key cryptosystem (Rabin 1978), we make the image owner choose two large primes s and t, and let n = s × t. Next, two secret keys k1 and k2 are chosen to decide the coordinates (x_r, y_r) of the blocks b_{x_r,y_r}. In total, there are a × b locations to be determined to embed the watermark of size a × b.
2. The initial location (x_0, y_0) can be computed as follows:

X_0 = k1² mod n,   (7)
Y_0 = k2² mod n,   (8)
x_0 = X_0 mod (w/3), and   (9)
y_0 = Y_0 mod (h/3).   (10)
3. The other a × b − 1 locations are computed as follows:

X_r = X_{r−1}² mod n,   (11)
Y_r = Y_{r−1}² mod n,   (12)
x_r = X_r mod (w/3), and   (13)
y_r = Y_r mod (h/3),   (14)

where r = 1, 2, ..., a × b − 1.
Besides, we must avoid using the same locations in this location decision procedure.
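A Python sketch of this location decision procedure, following Equations (7)-(14). The toy primes are for demonstration only (a real deployment uses large secret primes s and t), and for brevity the sketch simply skips repeated coordinates, assuming the squaring sequence yields enough distinct blocks.

def decide_locations(k1, k2, n, a, b, h, w):
    """Generate a*b distinct block coordinates by repeated squaring mod n."""
    X, Y = (k1 * k1) % n, (k2 * k2) % n    # Equations (7) and (8)
    coords, seen = [], set()
    while len(coords) < a * b:
        pos = (X % (w // 3), Y % (h // 3)) # Equations (9)/(13) and (10)/(14)
        if pos not in seen:                # avoid reusing the same block
            seen.add(pos)
            coords.append(pos)
        X, Y = (X * X) % n, (Y * Y) % n    # Equations (11) and (12)
    return coords

# Example with small, insecure demonstration primes (n = 101 * 103).
print(decide_locations(k1=1234, k2=5678, n=101 * 103, a=4, b=4, h=512, w=512)[:5])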
4.2 Watermark Embedding Phase
After finding the a x b location blocks, we can embed the watermark into these blocks. The watermark embedding procedure is as follows.
Figure 3. Watermark embedding procedure
1. In order to embed the watermark into these blocks, the stored 2D data must be converted into a binary bit string, called the mark sequence, which can be represented as W_0, W_1, ..., W_{a×b−1}, where W_k = W(i,j), k = j + (i × b), 0 ≤ i < a, and 0 ≤ j < b.
2. According to the features of the blocks, we can build some regulations to prepare a training set for SVM. Suppose the training set, {d_i, ρ_i}, contains m data points. Here, d_i is the ith input pattern, which consists of the 4 leftmost (most significant) bits of each pixel in a textured block. Since the block size is 3 × 3, the input pattern d_i is 9-dimensional. On the other hand, ρ_i is the corresponding target value, which is assigned by the image owner. In order to produce an unobtrusive watermarked image, the target value ρ_i has to be selected carefully by means of the human visual system (Delaigle et al. 1998). The target value ρ_i is very important because it decides the range within which each pixel in the block may be modified. The training algorithm of SVM has been described in Section 3. In this case, either the linear SVM or the non-linear SVM for multi-class data can be used in our scheme. After training, we can use the constructed SVM to extract the feature of a test block. The SVM and the classification regulations must be kept secret by the image owner.
3. For each selected block b_{x_r,y_r}, we can determine a flexible output value ρ_r by applying the trained SVM to the input pattern of the corresponding block, where x_r and y_r are the coordinates of the block. The mean intensity μ_r of the nine pixels in the block b_{x_r,y_r} can be computed by

μ_r = (1/9) Σ_{i=0}^{2} Σ_{j=0}^{2} O(3x_r + i, 3y_r + j).   (16)

Further, we can compute γ_r = μ_r mod 2ρ_r. The average variation ζ_r to be made in b_{x_r,y_r} can then be computed by the rules

ζ_r = ρ_r/2 − γ_r,    if W_r = 0 and 0 ≤ γ_r < 3ρ_r/2,
ζ_r = 5ρ_r/2 − γ_r,   if W_r = 0 and 3ρ_r/2 ≤ γ_r < 2ρ_r,
ζ_r = −ρ_r/2 − γ_r,   if W_r = 1 and 0 ≤ γ_r < ρ_r/2,
ζ_r = 3ρ_r/2 − γ_r,   if W_r = 1 and ρ_r/2 ≤ γ_r < 2ρ_r.   (17)

The mark sequence W_r is embedded by modifying each pixel in block b_{x_r,y_r}. Finally, the marked block b'_{x_r,y_r} becomes

b'_{x_r,y_r} = [ O(3x_r, 3y_r)+ζ_r    O(3x_r, 3y_r+1)+ζ_r    O(3x_r, 3y_r+2)+ζ_r
                 O(3x_r+1, 3y_r)+ζ_r  O(3x_r+1, 3y_r+1)+ζ_r  O(3x_r+1, 3y_r+2)+ζ_r
                 O(3x_r+2, 3y_r)+ζ_r  O(3x_r+2, 3y_r+1)+ζ_r  O(3x_r+2, 3y_r+2)+ζ_r ].
After embedding all mark sequences, we obtain a marked image O'. Figure 3 shows the watermark embedding procedure; a sketch of the block-level embedding rule follows.
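This Python sketch condenses the embedding step for a single 3×3 block, using the four-case rule as reconstructed in Equation (17); the function name embed_bit is ours, not from the original text.

import numpy as np

def embed_bit(block, bit, rho):
    """Embed one mark bit W_r into a 3x3 block by shifting its mean
    intensity, per the rules of Equation (17).

    block: 3x3 array of pixel values; bit: 0 or 1;
    rho:   the SVM-selected strength rho_r for this block (1..16).
    """
    mu = block.mean()                 # mean intensity of the nine pixels
    gamma = mu % (2 * rho)
    if bit == 0:                      # steer gamma toward rho/2
        zeta = (rho / 2 - gamma) if gamma < 1.5 * rho else (2.5 * rho - gamma)
    else:                             # steer gamma toward 3*rho/2
        zeta = (-rho / 2 - gamma) if gamma < 0.5 * rho else (1.5 * rho - gamma)
    # Every pixel is shifted by zeta; clipping near 0/255 can perturb the
    # mean slightly and is one source of extraction errors.
    return np.clip(block + zeta, 0, 255)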
4.3 Watermark Extracting Phase
The watermark extracting procedure is similar to the watermark embedding procedure. When the secret keys (k1, k2) and the trained SVM are correctly provided by the image owner, the marked blocks can be easily found by using the location decision procedure described earlier in Subsection 4.1. Then, the 4 leftmost (most significant) bits of each pixel in the designated block are taken as the input pattern. The output value ρ'_r can be easily obtained through SVM computation.
Figure 4. Watermark extracting procedure
Furthermore, the mean intensity μ'_r of the nine pixels in the marked block b'_{x_r,y_r} can be computed as

μ'_r = (1/9) Σ_{i=0}^{2} Σ_{j=0}^{2} O'(3x_r + i, 3y_r + j),

and thus γ'_r = μ'_r mod 2ρ'_r. Finally, the mark sequence can be extracted by following the watermark extracting rules

W'_r = 0, if 0 ≤ γ'_r < ρ'_r,
W'_r = 1, if ρ'_r ≤ γ'_r < 2ρ'_r.

After extracting all mark sequences, the watermark can be recovered. Figure 4 illustrates the watermark extracting procedure.
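A matching sketch of the extracting rules, paired with the embed_bit sketch given after the embedding phase:

import numpy as np

def extract_bit(marked_block, rho):
    """Recover the mark bit from a marked 3x3 block: gamma' below rho
    decodes as 0, otherwise as 1."""
    gamma = marked_block.mean() % (2 * rho)
    return 0 if gamma < rho else 1

# Round trip with the embed_bit sketch from Subsection 4.2:
block = np.full((3, 3), 100.0)
assert extract_bit(embed_bit(block, 1, rho=8), rho=8) == 1
assert extract_bit(embed_bit(block, 0, rho=8), rho=8) == 0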
5 Experimental Results
In the investigation phase, we design a training data set for SVM according to the HVS. We apply the Library of Support Vector Machines (LIBSVM) as the tool to simulate the required SVM in our proposed scheme. LIBSVM is freely available software which can be downloaded from (Chang and Lin 2002). It is a simple and useful tool that provides many models, such as linear or polynomial kernels, to let users obtain the best generalization. In the training process, model selection is very important for different training data sizes and multi-class data. LIBSVM provides a very useful interface to select models. In this experiment, we prepare 50 patterns in the training set, where each pattern contains 9 input values and 1 target value. The training set is then input to LIBSVM to construct the required SVM. Because the training data is multi-class, we use the same training set to test the learning ability and performance of linear and non-linear SVM. The selected kernels in our test are the linear kernel, the polynomial kernel of degree 3, and the radial basis function, described in Subsections 3.2 and 3.3, respectively. After constructing the three SVM types, we use the same training data to test all three. The SVM has good generalization ability, and the classification results come out as we expected. The test environment of the experiment is a PC with a Pentium III 450 MHz processor, 128 MB main memory, and the Borland C++ Builder compiler. The comparison among these three models is shown in Table 1. Good performance can be obtained by using the polynomial kernel and the radial basis function. No matter whether it is in terms of accuracy, mean squared error, or training time, the non-linear SVM outperforms the linear SVM in our experiment.
Table 1. Comparison among the three SVM models

Models        | Linear    | Polynomial | Radial Basis Function
Accuracy      | 48%       | 100%       | 100%
MSE           | 2.88      | 0          | 0
Training time | 312.7 sec | 3.1 sec    | 2.3 sec
In order to prevent the watermark from being destroyed, the quality of the watermarked image should not be degraded significantly. We employ the peak signal-to-noise ratio (PSNR) to evaluate the distortion between the pre-processing image and the post-processing image. The PSNR formula is defined as follows:

PSNR = 10 log_10 ((2^n − 1)² / MSE),

where n represents the number of bits per pixel, and the MSE (mean squared error) is defined as

MSE = (1/(u × v)) Σ_{i=1}^{u} Σ_{j=1}^{v} (x_{ij} − x'_{ij})².

The notations u and v stand for the image's height and width, respectively. The notations x_{ij} and x'_{ij} represent the pre-processing image pixel value in position (i,j) and the post-processing image pixel value in position (i,j), respectively. Theoretically, if the distortion between the pre-processing image and the post-processing image is small, the value of PSNR comes out large. Therefore, a larger PSNR value means there is little difference between the original image and the processed image. Usually, if the PSNR value is greater than or equal to 30 dB, the distortion between the original image and the processed image is not noticeable to the human eye.
The notations u and 21 stand for the image’s height and width, respectively. The notations xij and xLj represent the preprocessing image pixel value in position (i, j ) and the post-processing image pixel value in position (i,j ) , respectively. Theoretically, if the distortion between the preprocessing image and the postprocessing image is small, the value of PSNR comes out large. Therefore, a larger PSNR value means there is little difference between the original image and the processed image. Usually, if the PSNR value is greater than or equal to 30 db, the distortion between the original image and the processed image is not suspicious to the human eye. On the other hand, in order to estimate the correctness of the retrieved watermark, the Bit Correct Ratio (BCR) is used:
BCR =
i=o j = o
x 100%. (21) axb Here W ( i ,j ) stands for the original watermark element, W’(z,j) stands for the element from the retrieved watermark, and @ denotes the exclusive-OR operator.
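The two quality measures translate directly into code; this sketch assumes 8-bit gray-level images stored as NumPy arrays:

import numpy as np

def psnr(original, processed, n_bits=8):
    """Peak signal-to-noise ratio: 10*log10((2^n - 1)^2 / MSE)."""
    mse = np.mean((original.astype(float) - processed.astype(float)) ** 2)
    return 10.0 * np.log10((2 ** n_bits - 1) ** 2 / mse)

def bcr(watermark, retrieved):
    """Bit correct ratio per Equation (21): percentage of matching bits."""
    w, w2 = np.asarray(watermark), np.asarray(retrieved)
    return 100.0 * np.mean(1 - (w ^ w2))

img = np.random.default_rng(0).integers(0, 256, (512, 512))
print(round(psnr(img, np.clip(img + 1, 0, 255)), 2))  # ~48 dB: tiny distortion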
To test the robustness of the proposed scheme, it must go through various attacks. Simulations include blurring, sharpening, lossy compression, and scale processing attacks. Note that all the altering algorithms were performed by using "Photoshop", a creation of the Adobe Company. Figure 5(a) is the original image of "Lena" (8 bits/pixel, 512 × 512), and Figure 5(b) is the logo of "National Chung Cheng University" (1 bit/pixel, 64 × 64). We perform our proposed scheme to embed Figure 5(b) into Figure 5(a), which produces the watermarked image in Figure 5(c) (PSNR=40.51 dB). Then, the retrieved watermark from Figure 5(c) is shown in Figure 5(d) (BCR=99.63%).
Figure 5. (a) Original image of "Lena", (b) logo of "National Chung Cheng University", (c) watermarked image of "Lena" (PSNR=40.51 dB), (d) extracted watermark from the watermarked image (BCR=99.63%)
First, we use a blurring algorithm and a sharpening algorithm (Gonzalez and Woods 1992) to alter the watermarked image. The altered images are shown in Figure 6(a) and Figure 6(c), respectively. The corresponding PSNR values of the processed images are 34.13 dB and 32.83 dB, respectively. The extracted watermarks from Figures 6(a) and 6(c) are shown in Figures 6(b) and 6(d), with the associated BCR values being 93.92% and 94.87%, respectively.
Figure 6. (a) Blurred image with watermark (PSNR=34.13 dB), (b) extracted watermark from Figure 6(a) (BCR=93.92%), (c) sharpened image with watermark (PSNR=32.83 dB), (d) extracted watermark from Figure 6(c) (BCR=94.87%)
Next, a lossy compression algorithm (JPEG) is applied to the watermarked image in Figure 5(c). The compression ratio is 12:1. Figure 7(a) is the reconstructed image from the JPEG compressed image. The PSNR value of the reconstructed image is 32.47 dB. Figure 7(b) is the extracted watermark from Figure 7(a), and the associated BCR value is 86.72%. Figure 7(c) shows the watermarked image shrunken from 512 × 512 to 300 × 300. Before the watermark is extracted from the shrunken image, the image has to be recovered to 512 × 512 by using the nearest neighbor algorithm. The PSNR value of the recovered image is 29.04 dB. The extracted watermark is shown in Figure 7(d). The BCR value is 82.08%.
Figure 7. (a) Reconstructed image from JPEG compressed image (PSNR=32.47 dB), (b) extracted watermark from Figure 7(a) (BCR=86.72%), (c) resized image from the shrunken image (PSNR=29.04 dB), (d) extracted watermark from Figure 7(c) (BCR=82.08%)
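For readers who want to reproduce the attacks without Photoshop, the sketch below approximates them with the Pillow imaging library; the file name is hypothetical, and quality=20 merely stands in for the 12:1 JPEG compression ratio used in the experiment.

from PIL import Image, ImageFilter

marked = Image.open("lena_marked.png").convert("L")  # hypothetical file

blurred   = marked.filter(ImageFilter.GaussianBlur(radius=1))
sharpened = marked.filter(ImageFilter.SHARPEN)
# Shrink to 300x300 and restore with nearest-neighbor resampling.
rescaled  = marked.resize((300, 300)).resize(marked.size, Image.NEAREST)
# JPEG attack: write a lossy copy, then reload the reconstructed image.
marked.save("lena_jpeg.jpg", quality=20)
jpeg_attacked = Image.open("lena_jpeg.jpg")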
Finally, the extracted watermarks from different host images are shown in Table 2. The experimental results demonstrate that the extracted watermarks are recognizable after various attacks. Therefore, a legal user can extract the embedded watermark from an altered image to prove the copyright of the image.
6 Discussions
The proposed location decision procedure is based upon Rabin's public key cryptosystem (Rabin 1978). The security of Rabin's cryptosystem depends on the difficulty of finding square roots modulo a composite number. This problem is equivalent to factoring
Table 2. Extracted watermarks from different host images after various attacks

Host image: Barbara
Attack     | PSNR (dB) | BCR (%)
Embedded   | 41.49     | 99.71
Blurring   | 26.04     | 88.04
Sharpening | 25.80     | 88.87
JPEG       | 28.10     | 84.72
Scaling    | 23.05     | 77.12

Host image: Peppers
Attack     | PSNR (dB) | BCR (%)
Embedded   | -         | 99.61
Blurring   | 33.12     | 95.80
Sharpening | 32.59     | 88.50
JPEG       | 29.56     | 83.91
Scaling    | -         | 95.78
a large number. Therefore, even if the initial location (x_0, y_0) leaks out, it is still infeasible to derive the secret keys k1 and k2. SVM is superior to neural networks in several aspects. First, since the input and output values in BPN are real numbers ranging from 0 to 1, we need a linear translation function to translate all the input data so that they fall in the range 0 to 1. This step is also called normalization. However, the input values in SVM are arbitrary real numbers, and they come without the normalization step. Next, the architecture of BPN is fully connected. Thus, the required memory space is larger than that required by SVM. For example, if the numbers of units in the input layer, hidden layer, and output layer are U_I, U_H, and U_O, respectively, the BPN network requires U_I × U_H + U_H × U_O weights. Suppose each weight requires 2 bytes; that means the memory requirement equals 2 × (U_I × U_H + U_H × U_O) bytes in total. Therefore, if the number of processing units increases, the required memory space will grow rapidly. Finally, the training time required by SVM is obviously less than that of BPN. No matter whether it is linear SVM or non-linear SVM, it only needs a few minutes for training. The BPN, on
the other hand, usually requires several hours to train a network.
As the experimental results show, the quality of our watermarked image is high. The extracted watermarks from the altered images are recognizable. In our scheme, the modification range for each pixel in a designated block depends on the parameter ρ. A larger ρ results in greater robustness for the watermarked image. However, the distortion will be increased, too. The parameter ρ can be determined in accordance with the user's requirements and then kept secret. However, if the parameter ρ is fixed over the whole image, Chan and Cheng (Chan and Cheng 2002) have shown that the scheme is insecure, because an inverted watermark can be obtained by modifying the pixel intensities, and then the watermark can be completely destroyed. Therefore, we exploit SVM to select the parameter ρ dynamically. In order to produce a perceptually tuned watermarked image, the selection rules can be designed by using the human visual system (Delaigle et al. 1998). Since the parameter ρ is flexible, our scheme can effectively withstand Chan and Cheng's attack. Furthermore, according to the watermark embedding rules in Equation 17, the modification range is from −ρ to ρ. Thus, the parameter ρ can be selected from 1 to 16 in our scheme. Therefore, if the modifications in Equation 17 change the 4 leftmost (most significant) bits of certain pixels, the extracted mark sequence may be wrong. This will make the SVM output incorrect values. One simple solution to this problem is to modify only the 4 rightmost (least significant) bits. That way, errors can be reduced, but not eliminated entirely. This is also the reason why the bit correct ratio (BCR) of the extracted watermark in Figure 5(d) cannot achieve 100%. Besides, the extension to embedding multiple watermarks can also be realized. If many duplicated watermarks are embedded in an original image, a voting strategy can be employed in the watermark extracting stage. This method can enhance the robustness of the watermarked image, but the image quality will decrease. In addition,
our scheme can also be extended to color image watermarking. Only slight modifications are required in our scheme to process color images, because the processing is similar to that of gray-level images.
7 Conclusions
In this chapter, we have proposed a perceptually tuned watermarking scheme to enhance security and robustness. The mark sequences are embedded into an original image by modifying the intensity of each pixel in the textured blocks. The range of modification can be dynamically selected by a well-trained SVM according to the block feature. Due to the different modification values, the proposed scheme can withstand Chan and Cheng's attack. According to the experimental results, the quality of the watermarked image produced by our scheme is high. The original image is not needed in the watermark extracting procedure. In addition, our scheme is robust against blurring, sharpening, lossy compression, and scaling attacks. Consequently, our scheme can fully satisfy such watermarking requirements as unobtrusiveness, ready extraction, inclusion of no original image, and robustness. Although the copyright protection scheme proposed in this chapter does not survive all attacks, it points out an important direction of using support vector machines to enhance security and robustness.
References

Barni, M., Bartolini, F., Cappellini, V., and Piva, A. (1998), "A DCT-domain system for robust image watermarking," Signal Processing, vol. 66, pp. 357-372.

Bas, P., Chassery, J.M., and Macq, B. (2002), "Image watermarking: An evolution to content based approaches," Pattern Recognition, vol. 35, pp. 545-561.
Chan, C.K. and Cheng, L.M. (2002), "Security of Lin's image watermarking system," The Journal of Systems and Software, vol. 62, pp. 211-215.

Chang, C.C. and Lin, C.J. (2002), "LIBSVM: a library for support vector machines," Available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

Cortes, C. and Vapnik, V. (1995), "Support-vector networks," Machine Learning, vol. 20, pp. 273-297.
Cox, I.J., Kilian, J., Leighton, F.T., and Shamoon, T. (1997), "Secure spread spectrum watermarking for multimedia," IEEE Transactions on Image Processing, vol. 6, pp. 1673-1687.

Craver, S., Memon, N., Yeo, B.L., and Yeung, M. (1997), "Can invisible watermarks resolve rightful ownership?" Proceedings of SPIE, vol. 3022, pp. 310-321.

Delaigle, J.F., De Vleeschouwer, C., and Macq, B. (1998), "Watermarking algorithm based on a human visual model," Signal Processing, vol. 66, pp. 319-336.

Gonzalez, R.C. and Woods, R.E. (1992), Digital Image Processing, Addison-Wesley.

Gutschoven, B. and Verlinde, P. (2000), "Multi-modal identity verification using support vector machines (SVM)," Proceedings of the IEEE Third International Conference on Information Fusion, vol. 2, pp. 3-8.

Hsu, C.T. and Wu, J.L. (1999), "Hidden digital watermarks in images," IEEE Transactions on Image Processing, vol. 8, pp. 58-68.

Hsu, C.W. and Lin, C.J. (2002), "A comparison of methods for multi-class support vector machines," IEEE Transactions on Neural Networks, vol. 13, pp. 415-425.
Hwang, M.S., Chang, C.C., and Hwang, K.F. (1999), "A watermarking technique based on one-way hash functions," IEEE Transactions on Consumer Electronics, vol. 45, pp. 286-294.

Hwang, M.S., Chang, C.C., and Hwang, K.F. (2000), "Digital watermarking of images using neural networks," Journal of Electronic Imaging, vol. 9, pp. 548-555.

Inoue, H., Miyazaki, A., Yamamoto, A., and Katsura, T. (1999), "A digital watermark technique based on the wavelet transform and its robustness on image compression and transformation," IEICE Transactions on Fundamentals of Electronics, Communications, and Computer Sciences, vol. E82-A, pp. 2-10.

Krebel, U. (1999), "Pairwise classification and support vector machines," Advances in Kernel Methods - Support Vector Learning, pp. 255-268.

Kutter, M., Jordan, F., and Bossen, F. (1998), "Digital watermarking for color images using amplitude modulation," Journal of Electronic Imaging, vol. 7, pp. 326-332.

Lin, P.L. (2000), "Robust transparent image watermarking system with spatial mechanisms," The Journal of Systems and Software, vol. 50, pp. 107-116.

Lin, P.L. (2001), "Digital watermarking models for resolving rightful ownership and authenticating legitimate customer," The Journal of Systems and Software, vol. 55, pp. 261-271.

Lu, C.S., Huang, S.K., Sze, C.J., and Liao, H.Y.M. (2000), "Cocktail watermarking in images," IEEE Transactions on Multimedia, vol. 2, pp. 209-224.

Nikolaidis, N. and Pitas, I. (1998), "Robust image watermarking in spatial domain," Signal Processing, vol. 66, pp. 385-403.
Rabin, M.O. (1978), "Digital signatures," Foundations of Secure Communication, pp. 155-168.

Raudys, S. (2000), "How good are support vector machines?" Neural Networks, vol. 13, pp. 17-19.

Ó Ruanaidh, J.J.K. and Pun, T. (1998), "Rotation, scale and translation invariant spread spectrum digital image watermarking," Signal Processing, vol. 66, pp. 303-317.

Schyndel, V.R.G., Tirkel, A.Z., and Osborne, C.F. (1994), "A digital watermark," Proceedings of the International Conference on Image Processing, pp. 86-90.

Shevade, S.K., Keerthi, S.S., Bhattacharyya, C., and Murthy, K.R.K. (2000), "Improvements to the SMO algorithm for SVM regression," IEEE Transactions on Neural Networks, vol. 11, pp. 1188-1193.

Srinivasan, V. (1994), "Edge detection using a neural network," Pattern Recognition, vol. 27, pp. 1653-1662.

Tabatabai, A.J. and Mitchell, O.R. (1984), "Edge location to subpixel values in digital imagery," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, pp. 188-201.

Vapnik, V.N. (1998), Statistical Learning Theory, A Wiley-Interscience Publication.

Voyatzis, G. and Pitas, I. (1998), "Digital image watermarking using mixing systems," Computers & Graphics, vol. 22, pp. 405-416.

Yang, C.K. and Tsai, W.H. (1996), "Reduction of color space dimensionality by moment-preserving thresholding and its application for edge detection in color images," Pattern Recognition Letters, vol. 17, pp. 481-490.
A Perceptually Tuned Watermarking Scheme for Digital Images 457
Zhao, J. and Koch, E. (1998), “A generic digital watermarking model,” Comput. & Graphics, vol. 22, pp. 397-403.
This page intentionally left blank
Chapter 16 Recent Development of Visual Cryptography Kuo-Feng Hwang and Chin-Chen Chang Visual cryptography is a kind of secret sharing scheme that uses the human visual system to perform the decryption computations (stacking pre-shared transparencies). A secret sharing scheme allows confidential messages to be encrypted in t-out-of-n shared secret schemes. Whenever the number of participants from the group (n members) is larger than or equal to the predetermined threshold value t , the confidential message can be obtained by these participants. Visual cryptography has received wide interest in cryptography research. In this chapter, we shall present some important research results of visual cryptography. Also, the applications of visual cryptography will be introduced here.
1
Introduction
In 1979, Blakely (Blakley 1979) and Shamir (Shamir 1979) developed the concept of secret sharing independently. A t-out-of-n secret sharing scheme allows t or more users of the group (n members) to share a confidential message. Whenever the number of participants from the group is larger than or equal to the predetermined threshold value t , the confidential message can be obtained by these participants. On the other hand, when the number of participants is less than t , they cannot get any useful information about the confidential message. In practice, the above mentioned confidential message 459
460
K.-F.Hwang and C.-C. Chang
is used as a secret key or called a session key, which is used to encrypt and decrypt larger confidential messages using a symmetric cryptosystem. Next, we briefly review Shamir’s ( t ,n) secret sharing scheme. First of all, the system randomly creates a polynomial f(z)with degree t - 1 as follows:
f(z)= At-lzt-l
+ At-2zt-’ + + A l z + k mod P,
(1)
where P is a large prime number (IPI 2 512 bits), and k is the session key. In Equation (l), the coefficients Ai’s, for i = 1 , 2 , , t - 1, are selected arbitrarily. The values of (ui,f(ui))’s, for i=l to n, are shadows, which have to be individually distributed to the participants ui’s by a secure channel. The term “shadow” is used to refer to the distributed representations of the session key. When t or more participants want to obtain the session key k together, each participant submits his shadow (ui, f(ui)) to the central authority (CA). CA is responsible for regenerating the session key k . The secret polynomial f(z) can be thus reconstructed by applying Lagrange interpolation formula with t or more shadows. In 2001, Hwang et al. (Hwang et al. 2001) pointed out that there were two disadvantages of the conventional shared secret scheme. The first disadvantage is that the system must carefully distribute the shadows to the participants. And the other is that the scheme requires another cryptosystem to encrypt/decrypt the confidential messages. In the visual cryptography (will be introduced in the next paragraph), only the first disadvantage remains. Hwang et al. further proposed an efficient threshold scheme which has three advantages. The first advantage is that both the shadow distribution and generation are not required anymore. Secondly, their scheme can efficiently encrypt large secret files or messages. Finally, the third advantage is that other cryptosystems for enciphering/deciphering the confidential messages are not needed.
Recent Development of Visual Cryptography 461
Naor and Shamir (Naor and Shamir 1995) invented a new type of secret sharing scheme, called visual secret sharing (VSS). It has the capability of decoding the secret (represented by a binary image) without any computation. In a (t,n) VSS scheme, to decode the secret, t or more shares (represented by transparencies) are required. The secret will be revealed by stacking these transparencies together, if these transparencies are the correct shares. Since the visual cryptography was invented, it is definitely an interesting topic in cryptography research. To realize what the visual cryptography is, the reader can refer to Figure 1, which illustrates a (2’2) VSS scheme. This chapter is organized as follows. The original two-out-of-two visual secret sharing scheme will be described in Section 2. Furthermore, some improved visual secret sharing schemes will also be introduced in Section 2. By contrast to the traditional visual secret sharing schemes, there are many colored visual secret schemes that have been developed. We will review two colored visual secret sharing schemes in Section 3. In Section 4,we will introduce some applications of visual cryptography. Finally, Section 5 concludes this chapter.
2
Visual Secret Sharing Schemes
In this section, we will introduce the main idea behind visual cryptography, which was proposed by Naor and Shamir (Naor and Shamir 1995). In addition, we will briefly review some extended methods that are based on Naor and Shamir’s scheme.
2.1
Two-Out-Of-Two VSS
Noar and Shamir pointed out an interesting question: Is it possible to create a secret sharing scheme that can decode the secret image visually by stacking a subset of the shares? Here each share is a transparency made up by black and white pixels. The answer to this
462 K.-F. Hwang and C.-C. Chang
question is positive and the invented scheme is called visual secret sharing scheme. In a (t,n) VSS scheme, there are n shares, and whenever stacking t or more shares, the secret image will be magically revealed. On the other hand, no useful information will be obtained if the number of stacked transparencies is less than t. The ways of reconstructing secrets by a traditional secret sharing scheme and a VSS scheme are quite different. Normally a VSS scheme does not require any computation, while a traditional secret sharing scheme requires many computations in a finite field. It might seem impossible to develop a VSS scheme as we mentioned above. Let us consider the following simple example. Suppose that a particular pixel is black, no matter what color the other share is, the result will be always black. In other words, some useful information can be obtained from the shares. And this is not allowed in a secret sharing scheme. Fortunately, Naor and Shamir designed an elegant way to prevent the above mentioned problem. The way is to expand each pixel ofshares. This is the main idea of Naor and Shamir’s scheme. For a (2,2) VSS scheme, each pixel is expanded into 1 x 2 subpixels as shown in Table 1. Two shares are constructed pixel by pixel. For a pixel s with white color in the secret image, the shares are randomly chosen one row from the first two rows of Table 1. Similarly, if the pixel s is black, then the shares are randomly chosen one row from the last two rows of Table 1. Let’s analyze the security of the above mentioned method. For any subpixel in Share 1, it is obvious that there are two possibilities to recover the secret pixel s by stacking Share 2’s subpixel, and vice versa. Consequently, we cannot obtain any information from only one share.
To obtain shares with the same height-to-width proportions as the
Recent Development of Visual Cryptography 463
Table 1. Pixel expansion in a (2,2) VSS scheme
I
secret pixel
I Share 1
d m m m m
(white)
(black)
Share 2
m m m m
I
stacked
m m m m
secret image, Naor and Shamir suggested to expand each pixel into 2 x 2 subpixels for a (2,2) VSS scheme. Table 2 illustrates the possible combinations of applying 2 x 2 subpixels. Figure 1 illustrates the secret message “DMD NTIT” (Figure l(a)), which is encrypted into two images as shown in Figures l(b) and l(c). Figure l(d) is the reconstructed message using the scheme described in Table 2. From the result of the reconstructed message, we observe that Figure l(d) has lost 50% of its contrast. If the secret image is very complicated, it may be difficult to recognize the content of the reconstructed image. Furthermore, there are two disadvantages in the scheme. The first disadvantage is the shares are meaningless images. If a user holds many shares at the same time, it will be hard to find the right share for encrypting the corresponding secret. The other disadvantage is a realistic problem, Le., hard to align the shares precisely because the transparencies are made by film. In other words, they move around easily. Finally, regarding the details of the generalized VSS scheme (t-outof-n VSS scheme), the readers are suggested to refer to (Ateniese et al. 1996, Naor and Shamir 1995).
464 K.-F. Hwang and C.-C. Chang
2.2
Encrypting Secret into Meaningful Shares
/DMD NTlT (a) Secret message
(b) Share 1
(c) Share2
(d) Superimposed image Figure 1. An example of (2 x 2) VSS scheme
Recent Development of Visual Cryptography 465
Table 2. A (2 x 2) VSS scheme using 2 x 2 subpixels
secret pixel
white
Share 1 Share 2
R R O R m
stacked
B 8 E H R R m
L L I U m rr-
m w
W
black
As mentioned in the previous subsection, “meaningless shares” is one of the VSS scheme’s disadvantages. In 200 1, Hwang and Chang proposed an improved method that can encrypt a secret image into two meaningful images (Hwang and Chang 2001). In particular, these two camouflaged images (shares) can be chosen arbitrarily. Hwang and Chang’s method is similar to Naor and Shamir’s scheme, but the subpixels should be chosen according to the color of the camouflage images rather than chosen arbitrarily. Table 3 shows one of the possible combinations. Note that the definitions of subpixels for
466
K.-F. Hwang and C.-C. Chang
secret images are different from that for camouflage images. For example, the definition of a white pixel in the secret images consisted of two white subpixels and seven black subpixels (Column 4 of Table 3). The same condition is defined for a black pixel in the camouflage images (Columns 2 and 3 of Table 3). Table 3. Hwang and Chang’s (2,2) VSS scheme using meaningful shares
;ecret pixel
white white
white black
black
a
Share 1
a
Share 2 white black
m
stacked white white
white
white
black
black
white
white
white
black
white
black
black
white
black
black
m
A El
black black black
Figure 2 shows an example of Hwang and Chang’s method, in which two original shares and a secret image are used. Figures 3(a) and 3(b) are the produced transparencies of the original shares. And Figure 3(c) is the reconstructed message by superimposing Figures 3(a) and 3(b). The main advantage of Hwang and Chang’s method is that
Recent Development of Visual Cryptography 467
the share can be a meaningful image. However, there is a minor disadvantage, i.e., the lost contrast of the reconstructed secret image is larger than that of traditional VSS scheme.
(a) Share 1
(b) Share2
(c) Secret
Figure 2. Example of Hwang and Chang's method
(a) Share 1
(b) Share2
(c) Reconstructed secret image Figure 3. Results of Figure 2
468
2.3
K.-F. Hwang and C.-C. Chang
Other Research Results
Having reviewed two VSS schemes in the previous two subsections, we observe that losing contrast of the reconstructed secret image is a basic problem. There are many researchers focused on this problem of VSS scheme. In 1996, Naor and Shamir proposed an alternative VSS model improving the lost contrast (Naor and Shamir 1996). However, they noticed that their (t,n)VSS method will not work well when the condition n 2 t 2 3 holds. In 1999, Blundo et al. analyzed the contrast of the reconstructed image in t-out-of-n VSS schemes (Blundo et al. 1999). Blundo et al. gave a complete characterization of (2, n) VSS schemes having optimal contrast and minimum pixel expansion in terms of certain balanced incomplete block designs. Furthermore, in case of (t,n) VSS schemes with t 2 3, they gave its upper and lower bounds on the optimal contrast. Blundo et al. further analyzed the contrast of the reconstructed image (Blundo et al. 2003). They defined a canonical form for ( t , n ) VSS scheme and provided a characterization of (t,n) VSS scheme, in which the contrast optimal (n - 1,n) VSS scheme in canonical form is established. Moreover, for n 2 4,Blundo et al. developed a contrast optimal (3, n ) VSS scheme in canonical form. We do believe that Blundo et al.’s research results are valuable for the researchers who are interested in the topic of visual cryptography.
3
Color Visual Cryptography
In 1997, Verheul and Van Tilborg (Verheul and Tilborg 1997) proposed a colored visual secret sharing scheme. The idea behind Verheul and Van Tilborg’s scheme is to transform the pixel into b subpixels of colors 0, 1,. . , , c - 1. In 2000, Yang and Laih (Yang and Laih 2000) proposed a different construction mechanism for the colored visual secret sharing scheme. They argued that their method can be easily implemented and can get much better block length
Recent Development of Visual Cryptography
469
than Verheul and Van Tilborg’s scheme. We briefly review those two methods in the following two subsections, respectively. In addition, we will briefly review some modified colored visual secret sharing schemes at the end of this section.
3.1
Verheul and Van Tilborg’s Scheme
In Verheul and Van Tilborg’s scheme, a pixel is transformed into b subpixels of colors 0,1, . . . , c - 1.The color of each subpixel can be as shown in Figure represented by a circle with a sector of angle 4(a). Figure 4(b) shows the definition of stacking (“OR’ operation) subpixels. In short, if all subpixels are color i, the result of “OR” operation equals color i, otherwise it appears to be black. The formal required conditions of a colored (t, n)scheme have been defined in (Verheul and Tilborg 1997) (the Definition 6.1) as follows:
A (t,n) visual secret sharing scheme S = (Co,C,, . . . , C,-,), consists of c collections of n x b q-ary matrices, in which the c colors are elements of the Galoisfield GF(q). To share apixel of color i, the The chosen madealer randomly chooses one of the matrices in Ci. trix defines the color of the b subpixels in each one of the transparencies. The solution is considered valid if thefollowing three conditions are met for all 0 5 i 5 c - 1: 1. For any S E Ci, the “OR” 37 of any t of the n rows satisfies zj(37)2 h, where 77 is a vector with coordinates in c colors and black color; and zi(T’) denotes the number of coordinates in T’ equal to color i. 2. For any S E
Ci,the “OR” 77 of any t of the n rows satisfies
z j ( T )5 1, f o r j # i. 3. For any il < i2 < . . . < is in {1,2, . . . , n ) with s < t, the collections of s x b matrices Dj, for j E (0, 1,. . . , c- 1) obtained
470 K.-F. Hwang and C.-C. Chang
Color 1
Color 0
Black
Color c-1
(a) Representation of c colors and black color
@ @=@ “OR”
Color i
Color i
Color i
Color i
Color j , for i # j
Black
Color i
Black
Black
(b) The definition of “OR” operation Figure 4. The infrastructure of Verheul and Van Tilborg’s scheme
by restricting each n x b matrix in Cj to rows i l , iz, . , , , is are indistinguishable in the sense that they contain the same matrices with the same frequencies. In the above conditions, h and 1 are used to determine the quality of the revealed secret image, where h > 1. Additionally, b is the block length, and it is used to determine the resolution of the original picture. Consequently, the value of the parameter b should be as small as possible. In short, the first two conditions ensure that the original
R e c e n t D e v e l o p m e n t of Visual Cryptography
471
pixel’s color will be revealed by stacking t transparencies. And the thrd condition ensures that t - 1 or fewer transparencies have no information about the pixel’s color. Although the color of subpixels can be represented by a circle with a sector of angle (Figure 4(a)), it is hard to implement in modern computer systems. Therefore, rectangle representation (matrix) is used for Verheul and Van Tilborg’s infrastructure. And that is why the value of the parameter b must be as small as possible. For instance, in a (2,2) colored visual secret sharing scheme with c colors, the pixel expansion (b) in Verheul and Van Tilborg’s scheme is c x 3. In 2000, Yang and Laih (Yang and Laih 2000) proposed a different infrastructure that makes the value of b reduced to c x 2. We will briefly review Yang and Laih’s in the next subsection.
3.2
Yang and Laih’s Scheme
By contrast, Yang and Laih’s scheme (Yang and Laih 2000) is simpler than Verheul and Van Tilborg’s scheme. Their infrastructure (refer to Figure 5(a)) for colored visual secret sharing is extended from the conventional blacwwhite ( t ,n ) mechanism, which has been reviewed in the previous section. Yang and Laih defined a different “OR’ operation as shown in Figure 5(b). Please note that there is no definition of “OR’ between color i and color j. This is because Yang and Laih’s scheme excludes this circumstance. Next, the formal construction procedure of a colored ( t , n )VSS scheme is reviewed as follows: (the Construction 1 in (Yang and Laih 2000))
Let Bo and B1 be the two n x m Boolean white and black matrices, respectively, as defined in conventional black and white ( t ,n) VSS scheme. Theparameters are the share size m, the Hamming weight of any t of the n rows in black share matrix h’, the Hamming weight of any t of the n rows in white share matrix l’, and h’ > 1’. Then, a col-
472
K.-F. Hwang and C.-C. Chang
Color 1
Color 0
Color c- 1
Black
(a) Representation of c colors and black color
Color i
Color i
Color i
Color i
Black
Black
(b) The definition of “OR’ operation Figure 5. The infrastructure of Yang and Laih’s scheme
ored ( t ,n ) VSS scheme with c colors has the n x ( c x m ) matrices Ci, i E {0,1, . . . , c } and Ci ={all the matrices obtained by permuting O+jz;l-+* O+jc-1;1-+* the columns of [B;+i;l+* o-’jl;l-** IBl I-IB1 IBl 11. where j1 jc-l E {{0,1,. . . , c - l} - {i}}. The superscript (0 + i ; 1 -+ *) means that the elements 0 and 1 in Bo or B1 are replaced by i and *, respectively. The * denotes the black color: N
Here, m denotes the share size of the used blacwwhite ( t , n )VSS scheme. The block length, i.e., the value of the parameter b, equals c x m, where c is the number of colors used in the system. Yang and Laih (Yang and Laih 2000) have shown that the value of b used in their scheme is much smaller than that of Verheul and Van Tilborg’s scheme.
3.3
The Variations of Colored VSS Scheme
A major common disadvantage of the above reviewed colored VSS schemes is the number of colors and the number of subpixels domi-
Recent Development of Visual Cryptography 473
nate the resolution of the revealed secret image. If a great many colors are used, the subpixels require a large matrix to represent it. In other words, the contrast of the revealed secret image will go down drastically. Consequently, how to correctly stack these shared transparencies and precisely recognize the revealed secret image is a more difficult problem (refer to the explanation in the previous section). In 2000, Chang et al. proposed a new secret color image sharing scheme (Chang et al. 2000), which is based on the modified visual cryptography. In their scheme, a predefined Color Index Table (CIT) and a few computations are needed for precisely recovering the secret image. Chang et al.’~scheme has the capability of recovering the secret image with the same resolution as its original. But, the number of subpixels in their scheme is also proportional to the number of colors used in the secret image. The other disadvantage is that additional space is required to store the CIT. In 2002, Chang and Yu proposed another modified secret color image sharing scheme (Chang and Yu 2002), i.e., a few computations are also needed. They argued that their modified scheme can share a secret gray image (256-color) in n-color images and has the capability of recovering the secret image clearly. Furthermore, their scheme does not require any predefined Color Index Table, and the sizes of shared color images are fixed no matter how many colors are used in the secret image. In particular, the pixel expansion in their scheme is exactly 9. Compared to the previously introduced methods, the pixel expansion of Chang and Yu’s method is less.
4
Applications of Visual Cryptography
Although the original purpose of visual cryptography is sharing secrets, many applications that are based on visual cryptography have
474
K.-F. Hwang and C.-C. Chang
been developed. Here, we will introduce two main applications of visual cryptography. One is visual authentication and identification while the other is intellectual copyright protection.
4.1
Visual Authentication
In 1997, Naor and Pinkas used visual cryptography to create a solution for the problems of user authentication and identification (Naor and Pinkas 1997). Authentication and identification are very important problems in cryptographic research. Naor and Pinkas argued that there has been no satisfactory solution for the problem of authentication without using any trusted computational device. Naor and Pinkas’ methods are very natural and easy to use, in particular can be implemented using very common “low tech” technology. A two-out-of-two VSS scheme is used in their methods. Here we summarize the main idea behind Naor and Pinkas’ schemes as follows: The user H receives a transparency (say Share 1) from the server S via a secure channel. During the authentication stage, S sends another share to the user H , which implies a secret message M . The server will accept the user H , if and only if H returns the right message M to S. Based on the concept described above, Naor and Pinkas proposed three methods for the problem of authentication. The first method is called “areas and black areas”. The second method “position on the screen” has greater security than the first method. The third method “black and grey” has the greatest security among these methods. However, the third method reduces the contrast of the reconstructed image. Furthermore, these three methods can be used only for a single secure authentication. Therefore, Naor and Pinkas further proposed a more secure method which can be used many times. Moreover, their method can be used for several authentications. Finally, Naor and Pinkas pointed out many open problems relating to
Recent Development of Visual Cryptography 475
authentication and identification. For example, they asked where to find an authentication method which does not reduce the contrast and whose security is exponential in the hamming difference between the messages. Another question they posed was how does one design a method that enables the message receiver to authenticate the received message without requiring two-way interaction.
4.2
Intellectual Copyright Protection
Digital watermarking techniques were developed for solving the problem of intellectual copyright protection (Hwang et al. 2000, Petitcolas et. al. 1999). However, many problems still exist in the conventional digital watermarking techniques (Chang et al. 2002a). Therefore, many other techniques have been proposed for protecting the intellectual copyright. In 2002, Chang and Chuang proposed a copyright protection scheme for gray-level images (Chang and Chuang 2002), which is based on VSS scheme. They used a two-out-of-two VSS scheme to construct two shares, in which one share is constructed from the host image that will be protected, and the other is generated by the image owner. To identify the copyright owner of a protected image, just superimpose these two shares to reveal the pre-registered information. The right ownership can be verified if the revealed information is correct. The main difference between Chang and Chuang’s method and the conventional digital watermarking is that their method does not embed any datum into the host image. However, Chang and Chuang argued that their method satisfied the requirements of digital watermarlung techniques. Based on the above mentioned concept, Chang et al. proposed another VSS based copyright protection scheme for color images (Chang et al. 2002b).
476
5
K.-F. Hwang and C.-C. Chang
Conclusions
In this chapter, we have reviewed some research results proposed recently on visual cryptography including the Naor and Shamir’s VSS scheme and two colored VSS schemes. In addition, we have introduced some improved VSS schemes that focus on the issues of meaningful shares as well as in improving the contrast of reconstructed secret picture. In Section 4 of t h s chapter, we introduced some visual cryptography’s applications including visual authentication and identification as well as intellectual copyright protection. We believe that visual cryptography is applicable to other subjects. Consequently, discovering new applications of visual cryptography will be an interesting research direction. Similar to the traditional secret sharing schemes, VSS schemes also inherit the problems, i.e., the shares only can be used once and the shares must be carefully distributed to the participants. We believe that solving these problems will be another interesting research issue. Finally, we have to indicate that a practical problem of the ( t ,n)VSS scheme still not be solved, i.e., it is hard to precisely align the shares especially when t is large. Certainly, this problem is another valuable issue for investigation.
References Ateniese, G., Blundo, C., Santis, A.D., and Stinson, D.R. (1996), “Visual cryptography for general access structures,” Information and Computation, vol. 129, pp. 86-106. Blakley, G. (1 979), “Safeguarding cryptographic keys,” in: Menvin, R.E., Zanca, J.T., Smith, M. (Eds.), Proceedings of National Corn-
Recent Development of Visual Cryptography 477
puter Conference, 48, AFIPS Press, New York. Blundo, C., Santis, A.D., and Stinson D.R. (1999), “On the contrast in visual cryptography schemes,” Journal of Cryptology, vol. 12, pp. 261-289. Blundo, C., S’Arco, P., Santis, A.D., and Stinson, D.R., “Contrast optimal threshold visual cryptography schemes,” the SIAM Journal on Discrete Mathematics, vol. 16, pp. 224-261. Chang, C.C., Chen, T.S., and Tsai, C.S. (2000), “A new scheme for sharing secret color images in computer network,” Proceedings of International Conference on Parallel and Distributed Systems, pp. 21-27. Chang, C.C. and Chuang, L.Z. (2002), “An image intellectual property protection scheme for gray-level images using visual secret sharing strategy,” Pattern Recognition Letters, vol. 23, pp. 93 1941. Chang, C.C., Hwang, K.F., and Hwang, M.S. (2002a), “Robust authentication scheme for protecting copyrights of images and graphics,” IEE Proceedings - Vision, Image and Signal Processing, vol. 149, pp. 43-50. Chang, C.C., Yeh, J.C., and Hsiao, J.Y. (2002b), “A color image copyright protection scheme based on visual cryptography and discrete cosine transform,” Image Science Journal, vol. 50, pp. 133-140. Chang, C.C. and Yu, T.X. (2002), “Sharing a secret gray image in multiple images,” Proceedings of International Symposium on Cyber Worlds: Theories and Practice, pp. 230-240. Hwang, M.S., Chang, C.C., and Hwang, K.F. (2000), “Digital watermarking of images using neural networks,” Journal of Electronic Imaging, vol. 9, pp. 548-555.
478
K.-F. Hwang and C.-C. Chang
Hwang, M.S., Chang, C.C., and Hwang, K.F. (2001), “A threshold decryption scheme without session keys,” Computers & Electrical Engineering, vol. 27, pp. 29-35. Hwang, R.J. and Chang, C.C. (2001), “Hiding a picture in two pictures,” Optical Engineering, vol. 40, pp. 342-35 1. Naor, M. and Pinkas, B. (1997), “Visual authentication and identification,” in: Advances in Cryptologv - CRYPT’97, Lecture Notes in Computer Science, Springer-Verlag, Berlin. Naor, M. and Shamir, A. (1995), “Visual cryptography,” in: Santis, A.D. (Ed.), Advances in Cryptology - EUROCRYPT’94, Lecture Notes in Computer Science, Springer-Verlag, Berlin. Naor, M. and Shamir, A. (1996), “Visual cryptography 11: Improving the contrast via the cover base,” Security in Communication Networks, pp. 197-202. Petitcolas, F.A.P., Anderson, R.J., Kuhn, M.G.; Hartung, F., Kutter, M.; Wolfgang, R.B., Podilchuk, C.I., Delp, E.J.; Cox, I.J., Miller, M.L., McKellips, A.L.; Hernandez, J.R., Perez-Gonzalez, F.; Kundur, D., Hatzinakos, D.; Brassil, J.T., Low, S., Maxemchuk, N.F.; Voyatzis, G., Pitas, I.; Paskin, N.; Hill, K.; Schneck, P.B.; Augot, D., Boucqueau, J.-M., Delaigle, J.-F., Fontaine, C., Goray, E.; Bloom, J.A., Cox, I.J., Kalker, T., Linnartz, J.-P.M.G., Miller, M.L., and Traw, C.B.S. (1999), “Special issue on identification and protection of multimedia information,” Proceedings of the IEEE, vol. 87, pp.1062-1276. Shamir, A. (1979), “How to share a secret,” Communications of the ACM, V O ~ 22, . pp. 612-613. Verheul, E.R. and Tilborg, H.C.A.V. (1997), “Constructions and properties of k out of n visual secret sharing schemes,” Designs, Codes and Cryptography, vol. 11, pp. 179-196.
Recent Development of Visual Cryptography 479
Yang, C.N. and Laih, C.S. (2000), “New colored visual secret sharing schemes,” Designs, Codes and Cryptography, vol. 20, pp. 325335.
This page intentionally left blank
Chapter 17 Watermark Embedding System Based on Visual Cryptography Feng-Hsing Wang, Lakhmi C. Jain, and Jeng-Shyang Pan The content of this chapter is focused on the watermarking system with visual cryptography for still digital images. Generally speaking, the visual cryptography is applied upon the original watermark to split it into several shares. And deferring from the general watermarlung schemes to embed the original one into the cover image, the generated shares are served as the input watermarks and are embedded into the cover image. Some watermarking systems based on the content of the previous chapters will be mentioned and described in t h s chapter. We expect the watermarlung schemes with the visual cryptography will have better security and can be implemented to solve some problems such as ownership verification.
1
Introduction
Visual cryptography (Noar and Shamir 1994) is one kind of procedure for splitting one image into several shares. Based on the ability of it, some watermarlung schemes were proposed for solving some problems, i.e. ownership of digital data, or providing better security. In spatial domain based watermarking schemes, Hou and Chen (2000) proposed a modified visual secret sharing scheme to split the original binary watermark into two unrecognizable shares in grayvalued. The first share is used as the input watermark and the second one is served as a secret key for verification. To reveal the wa48 1
482
F.-H. Wang, L. C. Jain, and J.-S. P a n
termark fiom the watermarked image, the secret share has to be stacked upon the watermarked image. If the original watermark can be recognized from the stacked result, the ownership of it can be claimed then. Wang et al. (2000) proposed a method to process the repeating watermarks into two shares, one is used for embedding and one is used for verification. Based on vector quantisation (Gray 1984), Pan et al. (2002) proposed a gain-shape vector quantisation based scheme. They suggested encrypting the split shares with the VQ indices. In 2002, Pan et al. proposed another VQ based scheme, which embeds fake watermarks. In this chapter, some watermarking schemes based on visual cryptography will be described and examined. The construction of this chapter will begin with the spatial domain based watermarking scheme in Section 2, and follow by the vector quantisation (VQ) based schemes. Examples and experimental results will be presented in each section. The conclusion section will be presented in the end to sum up this chapter. We assume the readers have had read the earlier chapters and have had the basic ideas about the mentioned schemes or techniques. The computer programs about the mentioned schemes can be found in the attached CD-ROM. The instruction of using the programs will be illustrated in the appendix section. Readers who have the interests are encouraged to try and study these programs.
2
Spatial Domain Based Scheme
Visual cryptography is a spatial-based procedure for images, so the direct way to add the visual cryptography feature into the watermarking schemes is to adopt the spatial based ones. In this section, the scheme modified from Want et al. (2000) will be introduced. We will start with the procedure of generating a public
Watermark Embedding System Based on Visual Cryptography 483
watermark and a secret one from the original watermark. The public one will be embedded into the cover image and the secret one will be served as the user-key. In the extraction procedure, after extracting out a public watermark from the watermarked image, the original watermark can be obtained by stacking the secret one and the extracted one. Finally, a reduction procedure is carried out to recover the original watermark. The ownershp can be verified if the recovered result can be recognized. Figure 1 illustrates the block diagrams of the watermarking scheme. In Figure l(a), X , B , W , W , , and W, are the cover image, the defined block set (see Table 1 and Table 2), the original watermark, the public watermark, and the secret watermark, respectively. In Figure l(b), X’ is the watermarked image. In Figure l(c), X , e p , W T , and @ are the attacked image, the rebuilt public watermark, the stacked result, and the recovered original watermark, respectively.
484
F.-H. Wang, L. C. Jain, and J . 3 . P a n
W
X +
Search
-
Table Lookup
B
ws
B
(a) Generating watermarks
X
&kxt Embedding
(b) Embedding
x+T&&3(-bfi Search
Reduction
B (c) Stacking and reduction
Figure 1. The block diagrams of the watermarking scheme.
Watermark Embedding System Based on Visual Cryptography 485
2.1
Public And Secret Watermarks
In t h s section, the method for splitting the given watermark into two shares, one is called the public watermark and the other one is called the secret watermark, will be described.
2.1.1 Classification of Block Type To generate the public watermark and the secret watermark, the original one has to be divided into non-overlapping blocks firstly. Then, the concept of block truncation coding (Mitchell 1978) is employed to determine the block type for each block. With the block types and referring to the original watermark, the public codes and the secret codes can be obtained from Table 1 or Table 2, and finally the public watermark and the secret watermark will be generated. All the blocks can be classify into two classes: smooth block and edge block. The smooth block and the edge block can be further classified into four types and six types respectively. Table 1 and Table 2 demonstrate the characteristic of each block type, and the procedure below illustrates how to determine the block type for a given block. For a given block x, let p , a, b, and A be the mean value of the block, the h g h level of reconstruction, the low value of reconstruction, and a user-defined threshold, respectively. The definitions of p, a, and b are:
486
F.-H. Wang, L. C. Jain, and J.-S. Pan
Table 1. Block types of smooth blocks. (p is the mean value of the block.) Characteristic
Type O
I
Public
OIyI63
Secret Code
1
B
I H o r M o r N
6 4 5 p I 127 128 I ,u I 191
3
I
I
I
I
192IpI255
Table 2. Block types of edge blocks. Block Type
Characteristic
4
Lower horizontal orientation
5
Right vertical orientation
6
45-degree orientation
7
135-degree orientation
8
Left vertical orientation
9
Upper horizontal orientation
Public Code
Secret Code
IF0
MF1
m m i m m m B
i
B
w
8 8
m
W a t e r m a r k Embedding S y s t e m Based on Visual Cryptography 487
where n is the number of pixels in x, x(i) is the i-th pixel of x, and
q is the total number of x(i) 2 p in the given block. If la - b)2 A , then the block is a smooth one; otherwise, it is an edge one. Figure 2 shows some examples. The values in each block are the gray values of the pixels.
Figure 2. Examples of block type classification.
For the first block in Figure 2(a), ,u =110, a=130, and b=90. If la - bl c A , then this block is a smooth one. From Table 1, we can find the block type of it is 1 because 64 1 p 1127. If la - b( 2 A, this block is an edge one, which can be classified hrther more. To accomplish ths, we transform the block into a binary one by: YG>=
1, 0,
if x ( i ) 2 p; otherwise,
where y ( i ) is the transformed result of the i-th pixel.
(4)
488
F.-H. Wang, L. C. Jain, and J . 3 . P a n
Figure 3(a) shows the transformed result of Figure 2(a). It is obvious that t h s block has the characteristic of lower horizontal orientation, and from Table 2 the block type of it is 4. Following the same rules, the other blocks in Figure 2 can be classified to obtain the block types, and which are displayed in Figure 3.
(a) Block type = 4
(b) Block type = 6
(c) Block type = 7
Figure 3. Block types of the blocks in Figure 2 while la-bl21..
2.1.2 Generating Watermarks To generate the public watermark and the secret watermark, the steps are summarized below. Step 1: Divide the cover image X into non-overlapping blocks of size 2x2. Step 2: For the i -th block x i use the classification procedure to obtain the block type. Step 3: Use Table 1 or Table 2, and the obtained block type to determine the corresponding public code. Step 4: Determine the secret code from Table 1 or Table 2 by referring to the block type and the corresponding watermark bit w i. Step 5: Set i = i + 1 and repeat Step 2 to Step 5 until all the blocks are handled.
Watermark Embedding System Based on Visual Cryptography 489
Step 6: Collect all the obtained public codes and the secret codes to generate the public watermark and the secret watermark respectively. Figure 4 shows the example of applying the above procedure upon a given watermark image.
(b) Cover image
490 F.-H. Wang, L. C. Jain, and J.-S. P a n
(c) Public watermark
(d) Secret watermark
W a t e r m a r k Embedding S y s t e m Based o n Visual Cryptography 491
(e) Stacked result
Figure 4. An example of the mentioned watermarks generating procedure.
F.-H. Wang, L. C. Jain, and J.-S. Pan
492
2.1.3 Stacking Procedure and Reduction Procedure To obtain the hidden watermark fkom the public watermark and the secret watermark, the XOR operator is applied:
W,=W,OW,,
)(5)
where W, is the stacked result whch has the same size as W , or Ws
*
Furthermore, to remove the redundant information from W , , the procedure below is used: Step 1: Decompose W, into non-overlapping blocks of size 2x2. Step 2: For the i -th block of W , , count the number of black pixels. Let count be the counted result. Step 3: Use the criterion below to determine the reduced pixel w’(i) : w’(i) =
0, 1,
if count > 2; otherwise.
(6)
Step 4: Let i = i + 1 and repeat Step 2 to Step 4 until all the blocks have been processed. Step 5: Collect all the reduced pixels to recover the original watermark W’ . Figure 4 (e) demonstrates the stacked result and Figure 4 (Q reveals the reduced result.
W a t e r m a r k Embedding S y s t e m Based on Visual Cryptography 493
2.2
Watermarking Algorithm
After generate the public watermark and the secret watermark, the procedure below illustrates how to embed the public watermark into the cover image and how to extract it fiom a watermarked image.
2.2.1 Embedding Procedure The steps for the embedding procedure are: Step 1: Divide the cover image X into m non-overlapping blocks of size 2x2. Let x i be the i-th block of X , where
Step 2: Divide the public watermark Wp into m non-overlapping 2x2 blocks. Let w p i be the i-th block of Wp , where
Step 3: For the i -th block x i of X , use the block classification procedure to obtain the block type. Step 4:If x i is a smooth block, then embed no bit into this block
(XI= x i ); otherwise, embed wpi by: xi( j , k ) =
xi ( j ,k) + 6,
xi ( j ,k ) - 6,
if wpi( j ,k) = 1; otherwise.
(7)
494
F.-H. Wang, L. C. Jain, and J . 3 . P a n
Here 6 is another user-defined threshold, 0 5 j , k I 1 , and
Step 5 : Set i = i + 1, and repeat Step 3 to Step 5 until all the blocks have been processed. Step 6: Collect all the modified blocks to generate the watermarked image. 2.2.2
Extraction and Verification
To recover the original watermark, the public watermark Wp' has been extracted from the received watermarked image firstly. The steps in section 2.1.2 are applied for doing so. Then, with the secret watermark, the stacking procedure and the reduction procedure in section 2.1.3 are performed to obtain the original watermark W' . If a watermark can be recognized from the reduced result, the ownership of it can then be claimed.
2.3
Performance
To demonstrate the performance of the above watermarking scheme, some experiments were done with the materials: Lena (512x512, gray-valued) as the cover image and the image in Figure 4(a) (256x256, binary-valued) as the original watermark. To test the robustness of the method, some attacks were applied. In the experiments, the peak signal to noise rate (PSNR) is used to evaluate the quality of the watermarked images. Figure 5 to Figure 8 show the stacked results and the reduced results under the attacks while A=10 and &8. Figure 9 displays the PSNR results while using different values of A and S:
Watermark Embedding System Based on Visual Cryptography 495
(a) Stacked result
(b) Reduced result Figure 5. Stacked result and reduced result under the P E G 2000 attack with quality factor =6O%.
496
F.-H. Wang, L. C.Jain, and J.-S. Pan
(a) Stacked result
(b) Reduced result Figure 6. Stacked result and reduced result under the low-pass filtering with window size=3.
W a t e r m a r k Embedding S y s t e m Based o n Visual Cryptography 497
(a) Stacked result . .
(b) Reduced result
Figure 7. Stacked result and reduced result under the change-contrast attack with 20% increase.
498 F.-H. Wang, L. C. Jain, and J.-S. Pan
(a) Stacked result
(b) Reduced result
Figure 8. Stacked result and reduced result under the shifting attack with downward one pixel.
Watermark Embedding System Based on Visual Cryptography 499
48
-e 6=4
-
+6 = 6
-+- 6 = 8 6=10
34 32 30
+6 = 1 2
~
I
8
10
I
I
12
14
16
18
20
Threshold A Figure 9. The relationship between quality (PSNR) and different values of threshold A and 6.
500
3
F.-H. W a n g , L. C. Jain, and J . 3 P a n
Vector Quantisation Domain Schemes
The watermarlung methods that illustrated in t h s section are based on vector quantisation (Gersho 1992). The content of this section will focus on the applications of VQ-based watermarking schemes with visual cryptography. Readers who have the interests about the related techniques can find more information in Chapter 7. After introducing the basic concept of watermarking scheme with visual cryptography and the VQ based watermarking schemes in the earlier chapters, it should be not difficult for the readers to adopt a VQ-based watermarking scheme into a visual cryptography based one. A simple example is given in Figure 10 to show the readers how to design a VQ-based watermarking scheme with visual cryptography. In t h s example, X i s the input image, C is a VQ codebook, I is the set of the VQ indices, W is the original watermark, W , is the public watermark, W , is the secret watermark, I’ is the modified VQ index set, and X’ is the watermarked image. The methods that proposed by Lu et al. (2000) or Jo and Kim (2002) etc for embedding the watermark bits by modifjmg the VQ indices can be employed in the given example. Despite the mentioned system above, the advanced VQ-based scheme will be introduced instead in this section.
Watermark Embedding System Based on Visual Cryptography 501
W
ws
b VSSScheme
C
WP X+
VQ Nearest Codeword Search
XI-
VQ Nearest Codeword Search
I ,
-
Index Modlfication
I I,
Extraction
I’,
VQTable Lookup
Stacking
__+
X‘
W’
(b) Extraction
Figure 10. The VQ-based watermarking scheme with visual cryptography.
3.1
Gain-Shape VQ
Comparing with the traditional VQ system, the gain-shape VQ (Gersho 1992) owns some advantages such as faster encoding time and smaller codebook storage space. Figure 11 shows the structure of a gain-shape VQ system.
502
F.-H. Wang, L. C. Jain, and J.-S. Pan
Cs,e
VQ Search
'shape 'shape
VQ Search
t Cgh (a) Encoding
'shape
'shape
1
I,&
Table Lookup
Table Lookup
t (b) Decoding Figure 11. The block diagrams of the gain-shape VQ system.
W a t e r m a r k Embedding S y s t e m Based o n V i s u a l Cryptography
503
The steps of the encoding procedure are: Step 1: The same as the traditional VQ system, the input image X is decomposed into many non-overlapping vectors firstly. Step 2: For the i-th vector x i , the nearest codeword search is carried out to obtain a nearest shape codeword from the shape codebook Cshape . Step 3: Refer to the obtained shape codeword, the nearest search is applied again to obtain a nearest gain value from the gain codebook Cein. Step4:Repeat Step 2 and Step 3 until all the vectors have been handled. and the gain Step 5 : Collect all the obtained shape indices as Ishape indices as Igin . Transmit them to the receiver. and Iein, the procedure below is employed After receiving Ishape with the same shape codebook and gain codebook to reconstruct the VQ image: Step 1: For the i-th gain index and shape index, execute the table lookup procedure to obtain a gain value g from Celn and a shape codeword s fiom Cshape respectively. Here g E Cgln and ‘shape
.
Step 2: Use the equation below to reconstruct the i-th output vector
504 F.-H. W a n g , L. C. Jain, and J.-S. P a n
Step 3: Repeat Step 1 and Step 2 until all the indices have been processed. Step 4: Piecing together all the reconstructed vectors, and X’ can be obtained.
3.2
Modified Splitting Procedure and Stacking Procedure
In the splitting procedure of the traditional visual cryptography, if the size of a given input image is m , then the sizes of each share will be 4xm. To avoid embedding the redundant information of the shares into the cover image, the modified scheme is introduced. Let W be a binary input image and Table 4 be the reference table. The steps below illustrates how to split Winto two shares. Step 1: For the i-th pixel wi of W, if wi is white, one pixel pair of No.1 or No.2 will be selected fiom Table 4 randomly. If w iis a black one, the pixel pair of No. 3 or No. 4 will be selected randomly. Step 2: Assign the obtained pixels as the i-th pixel of share 1 and share 2 respectively. Step 3: Repeat the above two steps until all the pixels of W are processed. Step 4: Piece together all the pixels of share 1 and all the pixels of share 2 to generate W , and W, respectively.
Watermark Embedding S y s t e m Based o n Visual Cryptography 505
Table 4. The pixel pairs that are used in the modified (2,2)-VSS scheme.
I
Original Pixel
1
No
1 - Share 1 l Share 2 I I Stackedresult (XOR) I
0 1
3
2
4
o l . l o l . 0
I
0
I
I
Figure 12. An example of the modified secret sharing scheme. (a) The original watermark, (b) share 1, (c) share 2, and (d) the stacked result.
506 F.-H. Wang, L. C. Jain, and J.-S. P a n
In the stacking procedure, the XOR operator is applied to recover the original watermark:
w = w ,ow,.
(9)
An example of employing the illustrated scheme upon a given image
is demonstrated in Figure 12.
3.3
Watermarking Algorithm
For a given watermark, the modified secret sharing scheme in Section 3.2 is applied to split it into two unrecognizable shares firstly. Then, the two shares will be seen as the watermarks and will be embedded into the gain indices and shape indices respectively. The system generates two output keys, which will be used for extracting the embedded information and v e r i m g the ownership. The block diagrams for the embedding procedure and extraction procedure are displayed in Figure 13. The steps for the embedding procedure are summarized below. Step 1: For a given watermark W, split it into two unrecognizable shares W, and W, by applying the procedure in Section 3.2. Step 2: For a given cover image X , perform the gain-shape VQ procedure to obtain the shape indices I , and gain indices I , . Step 3: Calculate the mean values for the surrounding indices of I , and I , respectively. Step 4:Generate the polarity data streams 4 and P, by comparing I , with the mean value of I , , and I , with the mean value of I * , respectively.
Watermark Embedding System Based on Visual Cryptography 507
W
b
Split
Polarity
Wl
4
Embedding
__*
key,
Cshape
I x
Search
Polarity
a
Figure 13. The block diagrams of the gain-shape VQ based (a) embedding and (b) extraction procedures.
508
F.-H. Wang, L. C. Jain, and J . 3 Pan
Step 5:Encode the watermark bits W, with 4 and W, with P2 to generate the output keys Key, and Key, , respectively. To generate the polarity data stream, let i n d ( j , k ) be the index of the ( j , k )-th block in X , and p ( j , k ) be the mean value of its surrounding indices. The definitions of the surrounding indices and p ( j ,k ) are shown in Figure 14 and Equation ( 1 0) respectively.
U
I
I
Figure 14. The definition of the surrounding indices. 1
1
1
The equation below is used for determining the polarity of current index: p ( j y k )=
1,
if i n d ( j , k ) 2 p ( j , k ) ;
0,
otherwise.
In Step 5, to generate the output key, let wl ( j ,k) be the ( j ,k) -th bit of W ,, and p , ( j ,k ) be the polarity of the ( j ,k ) -th shape index. The XOR operator is applied in the below equation to generate the output key:
Watermark Embedding System Based on Visual Cryptography 509
Finally, we collect all the bits and the first output key can be obtained:
Following the same rules, the second output key can be generated fiom W2 and P2 too.
As to the extraction procedure, Figure 13 (b) illustrates the block diagram. Here X is the input watermarked image. Following the same steps as described in the embedding procedure, two polarity data streams and k2 can be established. Then, the equation below is used to recover the hidden watermark:
k,= Key, 0 4, ( i E [O,l]).
(14)
6’]and W 2 ,the recovery procedure in Section 3.2 is performed and the original watermark 6‘ can be recovered. After extracting
3.4
Performance
In the experiment, the well-known test image “Lena” was used as the input image, and “Rose”, see Figure 12(a), was used as the original watermark image. The size of them are 512x512 and 128x128 respectively. The size of the gain codebook and the size of the shape codebook are 16 and 16 respectively. The watermark was split into two unrecognizable shares, which are displayed in Figure 12 (b) and (c). The PSNR value between the original image and the output image is 28.58 dB.
510 F.-H. Wang, L. C. Jain, and J.-S. Pan
For testing the robustness of our system, the JPEG compression attack with different quality factors, low-pass filter, median filter, and rotation were applied. Table 5 lists the normalize correlation (NC) values between the input watermarks and the extracted watermarks, and the original watermark and the stacked watermark under the mentioned attacking functions respectively. Figure 15 demonstrates the final results after extracting the shares and stacking them under the mentioned attacking functions.
.. ., . I
.'.: . . .
NC=0.9762
NC=0.9891
NC=0.9552
(a) JPEG, Q=60%
(b) JPEG, Q=8O%
(c) Low-pass Filter
NC=0.9578
NC=0.8832
NC=O.8 637
(d) Median Filter
(e) Rotation, 1"
(f) Rotation, 2"
Figure 15. The recovered watermarks under different attacking functions.
Watermark Embedding System Based on Visual Cryptography 511
Table 5. The NC values under different attacking functions.
I
AttackFunctions
I
NC(Wl,~l)
JPEG (QF=60%) JPEG (QF=80%)
Low-pass Filtering Median Filtering Rotation (1 ) Rotation (2" )
4
0.9765 0.9893 0.9648 0.9614 0.9355 0.9304
I
NC(W,,Fk2)
0.9997 0.9999 0.9892 0.9958 0.9387 0.9227
I
NC(W,@)
1
0.9762 0.9891 0.9552 0.9578 0.8832
0.8637
Conclusions
In t h s chapter, some watermarking Schemes with visual cryptography have been introduced and illustrated. In Section 2 of this chapter, we introduced the spatial domain based scheme, which splits the original watermark into one public watermark and one secret watermark. T h s scheme can be implemented for ownership verification or identification. In Section 3, the concept of adopting the VQ based watermarking scheme with visual cryptography was given, and the scheme based on gain-shape VQ was introduced too. Furthermore, the mentioned scheme can be adopted upon the multistage VQ system or other systems to enhance them with the secret sharing ability. The concept of the introduced schemes and methods in this chapter should be easy to understand and to be implemented or adopted. And with the hnctions of visual cryptography, we believe the safety of the watermarking systems can be improved.
512
F.-H. Wang, L. C. Jain, and J . 3 . Pan
References Gersho, A. and Gray, R. M. (1992), Vector Quantization and Signal Compression, Kluwer Academic Publisher, London. Gray, R. M. (1984), “Vector quantization,” IEEE ASSP Magazine, pp. 4-29. Hou, Y.C. and Chen, P.M. (2000), “An asymmetric watermarking scheme based on visual cryptography,” IEEE Proceedings of Fifth ICSP, pp. 992-995. Huang, H.C., Wang, F.H., and Pan, J.S. (2001), “Efficient and robust watermarlung algorithm with vector quantisation,” IEE Electronics Letters, vol. 37, pp. 826-828.
Jo, M. and Kim, H. (2002), “A digital image watermarking scheme based on vector quantisation,” IEICE Transactions on Iformation and System, vol. E85-D, pp. 1054-1056. Katzenbeisser, S. and Petitcolas, F. (2000), Information Hiding Techniques for Steganography and Digital Watermarking, Artech House Press, Nonvood.
Lu, Z.M. and Sun, S.H. (2000), “Digital image watermarking technique based on vector quantisation,” IEE Electronics Letters, vol. 36, pp. 303-305. Mitchell, O.R., Delp, E.J., and Carlton, S.G. (1978), “Block truncation coding: a new approach to image compression,” Proceeding of ICC.
Watermark Embedding System Based on Visual Cryptography 513
Noar, M. and Shamir, A. (1994), “Visual cryptography,” Euroclypt ’ 94, Lecture Notes in Computer Science, Springer-Verlag, Perugia, Italy, pp. 1-12. Pan, J.S., Wang, F.H., Jain, L.C., and Ichalkaranje, N. (2002), “A multistage VQ based watermarlung t e c h q u e with fake watermarks,” Proceedings of the First International Workshop on Digital Watermarking, pp. 402-406. Pan, J.S., Wang, F.H., Yang, T.C., and Jain, L.C. (2002), “A gainshape VQ based watermarking t e c h q u e with modified visual secret sharing scheme,” Proceedings of the Sixth International Conference on Knowledge-Based Intelligent Information & Engineering Systems, pp. 402-406. Wang, C.C., Tai, S.C., and Yu, C.S. (2000), “Repeating image watermarking technique by the visual cryptography,” IEICE Special Section on Digital Signal Processing, pp. 1589-1598.
This page intentionally left blank
Chapter 18 Spread Spectrum Video Data Hiding, Interleaving and Synchronization Yun Q. Shi, Jiwu Huang, and Heung-Kyu Lee
In this chapter, some typical spread spectrum based video data hiding algorithms are presented, and two related research issues are addressed. One is to correct both random and bursts of errors using 3-D interleaving together with random error correction codes. Another is fiame synchronization in hidden data detection.
1
Introduction
It is noted that copyright protection, in particular, the digital versatile disk (DVD) video copyright protection has been the driving force for the tremendous attention paid and efforts made to digital watermarking since 1990s. This does not come with surprise because digital video of high visual quality can be easily copied without distortion and quickly distributed to any place in the world through network andor DVD distribution. In addition to DVD, video broadcasting and video on demanding (VOD) face the similar copyright protection issue. Consequently, owners of digital videos are hesitated to release their digital videos. It has been realized that encryption alone cannot resolve copyright protection and video watermarking becomes an effective measure to protect copyright after digital video has been decrypted.
515
516
Y. Q. Shi, J. Huang, and H.-K Lee
There are some special requirements for video watermarking, such as: (1) Imperceptibility, or difficult to notice the distortion introduced by watermarking. Obviously, this is necessary for watermarked video to have commercial values. (2) Robustness, or difficult to remove. It is expected that even though the watermarking algorithm may be known to public unauthorized removal of watermark is impossible. In addition, common video manipulation such as video compression, noise addition, format conversions and geometric shiR should not lead to failure of watermark detection. (3) Due to the fact that a video is a sequence of video fkames presented at a certain frame rate, watermark detection should be fast and inexpensive. (4)The payload of video watermarking should satisfy certain requirement (say, equal to or larger than eight bits per detection interval for DVD application). (5) The probability of a false positive alarm (a watermark is detected while there was no watermark actually embedded) should be extremely small (say, less that per detection). Though video protection is the driving force of recent emerging research of digital multimedia watermarking, it appears that compared with image watermarking, video watermarlung has been reported much less in the literature. In this chapter, we focus on spread spectrum video data hiding and some research issues that need to be resolved. We first present two algorithms that apply spread spectrum technique to video watermarking in uncompressed and compressed domains, respectively (Hartung and Girod 1998). Next, an algorithm in uncompressed spatial domain developed in what is known as Millennium system is introduced (Maes et al. 2000). It is not our goal to cover in t h s chapter all video watermarking algorithms, e.g., another algorithm in the compressed domain by Langelaar et al. (1998). It is noted that the scope of video data hiding is wider than that of video watermarking. Namely, the data hidden in a video sequence may be used for purposes other than copyright protection. Therefore, in the second part of this chapter, we address two issues
related to video data hiding. One is the correction of both random errors and bursts of errors occurring in stego-video. There, a newly developed multi-dimensional (M-D) interleaving technique and investigations of applying this interleaving technique to enhance the robustness of hidden data against bursts of errors (Shi and Zhang 2002, Shi et al. 2003) are presented. The other is the resynchronization issue related to spread spectrum techniques. There, frame synchronization newly developed for video data hiding (Liu et al. 2002) is reported.
2 Spread Spectrum Video Watermarking in Uncompressed and Compressed Domains
Spread spectrum (SS) techniques, which emerged in the 1950s, have found wide applications in military communication systems due to their secrecy and robustness against interception by unauthorized parties. SS was first applied to image watermarking by Cox et al. (1997) and soon became the most popular among numerous watermarking methods. SS was applied to video watermarking by Hartung and Girod (1998).
2.1 Spread-Spectrum Technology
Because of its importance in image and video watermarking, in this section we introduce the SS technology. A block diagram of an SS digital communication system is shown in Figure 1. It is noted that an identical pseudorandom or pseudonoise (PN) binary sequence is generated and available at both the transmitter and the receiver. At the transmitter it is used to spread the transmitted signal in spectrum, while at the receiver it is used to despread the received signal. The PN sequence, which is independent of the information
sequence and is used to spread and despread the information sequence at the transmitter and receiver, respectively, is the key feature of the SS technology.
[Figure: data → modulator → channel → demodulator → data, with a PN generator feeding both the modulator and the demodulator.]
Figure 1. Block diagram of spread spectrum digital communication system.
Through the popularly used direct-sequence SS technology, we explain how SS methods work. Consider an information sequence is(t),

is(t) = Σ_n a_n · p_{T_b}(t − n·T_b)    (1)

where a_n = ±1, p_{T_b}(t) is a rectangular pulse of duration T_b, and T_b is the reciprocal of the data rate R_b of the baseband signal is(t). The binary PN code sequence pn(t) can be expressed as

pn(t) = Σ_n b_n · p_{T_c}(t − n·T_c)    (2)

where b_n = ±1, p_{T_c}(t) is a rectangular pulse of duration T_c, and T_c is the reciprocal of the data rate R_c of the code sequence pn(t). The rectangular pulse p_{T_c}(t) is often referred to as a chip, and the rate
of the code sequence pn(t), R_c, is often called the chip rate. The data rates R_b and R_c satisfy the following relation:

R_c ≫ R_b    (3)

In spreading, depicted in Figure 1, we consider the product of is(t) and pn(t). Note that is(t)·pn(t) = ±1 for any given time moment t. In Figure 2, an example of is(t), pn(t), and their product is illustrated. Assume that double-sideband suppressed carrier (DSB-SC) modulation is used and a sinusoid A_c·cos(2π·f_c·t) is used as the carrier. The DSB-SC modulated signal is then

mod(t) = A_c · is(t) · pn(t) · cos(2π·f_c·t)    (4)

Because is(t)·pn(t) = ±1 for all t, the modulated signal is in fact a binary PSK signal. According to Figure 1, at the receiver side we have the following despread signal:

mod(t) · pn(t) = A_c · is(t) · pn²(t) · cos(2π·f_c·t) = A_c · is(t) · cos(2π·f_c·t)    (5)

since pn²(t) = 1 for all t.
The conventional demodulation technique can now be applied to demodulate this despread signal and extract the information sequence. A good summary of SS technology can be found, e.g., in (Proakis and Salehi 1994). The PN code is used at the transmitter to spread the information sequence into a wide bandwidth for transmission, and at the receiver to despread the signal back to a narrow bandwidth. Since channel interference normally occupies a wide bandwidth, the SS technology can reduce the interference power by a factor of R_c / R_b, which is referred to as the SS processing gain r_G. That is,

r_G = R_c / R_b    (6)
Often, the processing gain r_G is an integer. In addition to channel interference reduction, secrecy is another major advantage brought by the SS technology: since the PN code is known only to the authorized receiver, unauthorized receivers cannot demodulate the transmitted signal. The price paid for achieving the processing gain and secrecy using SS technology is a wide channel bandwidth and added computational complexity.
Figure 2. Information signal (the first row), PN sequence (the middle row) and their product (the bottom row).
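To make the spreading and despreading mechanics concrete, the following minimal NumPy sketch (an illustration under arbitrary parameter choices, not code from the chapter) spreads a short ±1 symbol sequence by a chip factor z, passes it through additive noise, and recovers the symbols by despreading with the same PN sequence and summing over each bit interval.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
z = 8                                  # chips per information bit (processing gain)
bits = np.array([1, -1, 1, 1, -1])     # information symbols a_n in {-1, +1}

pn = rng.choice([-1, 1], size=bits.size * z)               # PN chip sequence

spread = np.repeat(bits, z) * pn                           # spread signal, as in Figure 1
received = spread + 0.5 * rng.standard_normal(spread.size) # noisy wideband channel

# Despread with the identical PN sequence, then sum over each bit interval;
# the sign of each partial sum recovers the transmitted symbol.
corr = (received * pn).reshape(-1, z).sum(axis=1)
recovered = np.where(corr > 0, 1, -1)
print(recovered)   # matches bits
```

Note how the per-bit sum concentrates the signal energy (±z per bit) while the wideband noise only grows as √z, which is exactly the processing-gain effect of (6).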
2.2 SS-based Video Watermarking in Uncompressed Domain
In this case, the watermark signal (a binary bit sequence) is the baseband information signal discussed in Section 2.1. It is denoted by

{a_j, a_j ∈ {−1, 1}, j ∈ J}    (7)
The spreading of this information signal with a binary PN sequence is essentially the same as discussed in Section 2.1. The procedure, however, is seemingly different. That is, the watermark signal is first expanded into another binary sequence {b_i} as follows:

b_i = a_j,  j·z ≤ i < (j+1)·z    (8)
where z is an integer, equivalent to the processing gain r_G or chip rate R_c when R_b = 1. Hence, we see that we in fact embed one information bit into z pixels of a video frame. A binary PN sequence, {c_i, c_i ∈ {−1, 1}, i ∈ I}, is then used to multiply the expanded {b_i} sequence. The SS watermark signal is now:
{w_i = b_i · c_i, i ∈ I}    (9)
By SS embedding in the uncompressed domain, it is meant that the spread spectrum watermark signal, multiplied by a scaling factor denoted by α (which is possibly adapted to some local properties of the original video frame), is added to the gray-level values of the pixels of the original video frame, x_i, where i represents a sequential index of the pixel. Putting these together, we have the following embedding formula:

x_i' = x_i + α_i · w_i, i ∈ I    (10)
where the scaling factor α_i may be related to some local property of the video frame. The whole embedding process is exactly like that introduced in Section 2.1. In watermark signal detection, the same binary PN sequence is used to despread the received signal in order to retrieve the watermark signal, as discussed in Section 2.1. One implementation can be:
s_m = Σ_{i=m·z}^{(m+1)·z−1} x_i' · c_i = Σ_{i=m·z}^{(m+1)·z−1} (x_i · c_i + α_i · b_i)    (11)
where x_i' denotes the pixel gray value at watermark detection. It can be shown that the sign of the above correlation sum determines the information bit a_m, i.e., a positive sign represents a binary 1 and a negative sign a binary 0. In (Hartung and Girod 1998), high-pass filtering of the watermarked video sequence is applied. It is observed that this technique does not need the original video frames for watermark signal retrieval.
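The embedding formula (10) and the detection sum (11) can be prototyped in a few lines. The sketch below is illustrative only: the smooth ramp host, window size, and strength α are our arbitrary assumptions. As in (Hartung and Girod 1998), the marked signal is high-pass filtered before correlating, so the original is not needed at detection.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
z, alpha = 512, 4.0                       # pixels per bit and embedding strength
host = np.linspace(0, 255, 8 * z)         # smooth stand-in for cascaded pixels x_i

a = rng.choice([-1, 1], size=8)           # watermark bits a_m
b = np.repeat(a, z)                       # expanded sequence b_i, as in (8)
c = rng.choice([-1, 1], size=host.size)   # PN sequence c_i

marked = host + alpha * b * c             # embedding, as in (10)

# Crude high-pass filter: subtract a moving average. The smooth host is
# largely removed while the chip-rate watermark survives (edges excepted).
window = np.ones(15) / 15
highpassed = marked - np.convolve(marked, window, mode='same')

s = (highpassed * c).reshape(-1, z).sum(axis=1)   # correlation sums, as in (11)
print(np.sign(s).astype(int))                     # recovers a
```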
2.3 Synchronization in SS-based Video Watermarking
It is noted that the pixels in each frame of a video are cascaded in a row-by-row or column-by-column manner. Then the whole video sequence is cascaded in a frame-by-frame manner. In this way, all pixels in a video sequence are cascaded into a 1-D sequence. Information bits are embedded into the pixels of the resultant 1-D sequence, as depicted in Figure 3. In watermark signal detection, we need to compute a correlation for each group of pixels corresponding to one watermark bit. That is, watermark signal detection heavily relies on accurate synchronization. This is true in general for all SS technologies. The synchronization of all pixels in the video sequence, which is 3-D in nature, becomes a critical issue, and is not easy to handle.
Figure 3. A 1-D pixel sequence of huge dimensionality formed from a 3-D video sequence.
2.3.1 Brute-force Search
Hartung and Girod proposed in their paper a technique called the sliding correlator. Namely, a window slides to find the position where the correlation achieves its maximum sum. In fact, this is an exhaustive search over all possible shifts. Considering the huge size of the 1-D pixel sequence formed from a 3-D video, this sliding correlator technique requires heavy computation.
2.3.2 Embedding Side-information for Synchronization
In a newly developed technique by Liu et al. (2002), frame synchronization was achieved by embedding frame numbers as side-information into each video frame. It will be presented in Section 5 as a research subject.
2.4 SS-based Video Watermarking in Compressed Domain
It is known that almost all current international video compression standards, including MPEG-1, MPEG-2, H.261, H.263, and the baseline mode of MPEG-4, are based on motion compensated predictive coding and block-based discrete cosine transform (DCT) coding (Shi and Sun 1999). Video watermarking in the compressed domain therefore makes sense. Hartung and Girod (1998) applied their technique, discussed in Section 2.2, to video watermarking in the compressed domain as well. That is, they embed the watermark signal in DCT coefficients using the spread spectrum method. Consequently, the SS based watermarking methods applied to the uncompressed and compressed domains are compatible. A block diagram of forward motion estimation and compensation is shown in Figure 4. Note that the DCT coefficients after quantization are zigzag scanned, run-length coded, and finally Huffman coded. As far as MPEG coded video is concerned, the SS based video watermarking algorithm can be described as follows.
a. Within the MPEG coded data, motion vectors and header information remain unchanged. Only the DCT coefficients (in either intraframe or interframe coding) are used to embed the watermark signal.
b. The watermark signal for each video frame is generated in the same way as in uncompressed-domain watermarking, described in Section 2.2.
c. The watermark signal thus generated is arranged in a manner compatible with the MPEG data structure, i.e., the hierarchical structure (frame, macroblock and 8x8 block; refer to (Shi and Sun 1999)).
Figure 4. Forward motion estimation and compensation. T: transformer, Q: quantizer, FB: frame buffer, MCP: motion compensated predictor, ME: motion estimator, e: prediction error, f: input video frame, f_p: predicted video frame, f_r: reconstructed video frame, q: quantized transform coefficients, v: motion vector.
d. The watermark signal corresponding to one 8x8 block is DCT transformed.
e. The DCT coefficients generated from the video sequence and the DCT-transformed watermark signal are then added in the way described in Section 2.2.
Indeed, we can see that the SS based video watermarking algorithm applied to the compressed domain is compatible with its counterpart applied to the uncompressed domain. There are, however, some differences between embedding in the uncompressed domain and embedding in the compressed domain.
2.4.1 Bit-rate Constraint of MPEG Coded Data
One difference is that in the compressed domain we need to take special measures to prevent the video bit-rate from increasing. That is, the SS based watermarking algorithm, when applied in the compressed domain, may result in a data increase, which is not desired and should be avoided. Let us consider a typical case: after motion compensated predictive coding, the residual errors are DCT transformed, and the DCT coefficients are then quantized with some carefully constructed and selected quantization table. As a result, most of the quantized DCT coefficients may be zero. A zigzag scan is carried out, and run-length coding is applied. Entropy coding, say Huffman coding, is finally utilized to code each pair consisting of a run-length of zeros and the following non-zero DCT coefficient. Thus, one Huffman codeword represents a non-zero DCT coefficient, both its magnitude and its position. In data embedding, each Huffman codeword is decoded and then converted to a value via inverse quantization. On the other hand, the watermark signal is arranged into a 2-D array of the same dimensionality as the image, and split into 8 by 8 blocks. The DCT is applied to each block. The corresponding DCT coefficient thus generated is added to the above-mentioned value. The sum is then quantized and Huffman encoded, resulting in a new codeword. This
process may cause an increase in bit-rate. In order to prevent the bit-rate of the marked data from increasing, embedding in those coefficients that would increase the bit-rate should not be pursued. That is, we do not embed the watermark signal whenever it may cause a bit-rate increase. Because the DC coefficient of each block is encoded with a fixed-length codeword, all DC coefficients can be used to embed data. According to (Hartung and Girod 1998), typically about 10-20% of the DCT coefficients are altered for data embedding.
2.4.2 Drift Compensation
It is noted that the motion compensated hybrid coding used in video coding standards is recursive in nature. Namely, motion estimation is conducted among neighboring frames. Therefore, the "drift" caused by data embedding will propagate and can severely degrade video quality. This is a problem of directly modifying the DCT coefficients of an encoded video stream in the compressed domain. In order to combat this type of drift, Hartung and Girod (1998) proposed to add a signal to compensate for the drift due to data embedding. This signal is simply the difference of the DCT coefficients before and after the data embedding.
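Returning to the bit-rate constraint of Section 2.4.1, the decision logic can be summarized in a few lines. This is a hedged sketch only: `quantize` and `vlc_length` are hypothetical stand-ins for the codec's quantizer and variable-length-code table lookup, not real MPEG API calls.

```python
def embed_if_rate_safe(level, wm_value, quantize, vlc_length):
    """Embed into one quantized AC coefficient only when the new (run, level)
    codeword is no longer than the old one, so the marked stream's bit-rate
    never increases; otherwise skip this coefficient."""
    new_level = quantize(level + wm_value)
    if vlc_length(new_level) <= vlc_length(level):
        return new_level     # embedding fits in the old bit budget
    return level             # skip: embedding here would raise the bit-rate
```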
3 DVD Video Copy Protection
As a result of tremendous efforts, two contending proposals on DVD video copy protection emerged in the late 1990s. One is known as the "Galaxy" proposal, jointly supported by Hitachi, IBM, NEC, Pioneer and Sony; the other is the "Millennium" proposal, jointly supported by Philips, Macrovision, and Digimarc. The standardization between these two contenders is currently pending. It is noted that the Galaxy and Millennium groups were merged into one group, called the VWM (Video Watermarking) group, in April 2001. In this section, we describe the Millennium approach (Maes et al. 2000).
3.1 Fundamental Principles
As noted, encryption alone is not sufficient for video copy protection. Namely, after decryption, which is inevitable in reality for video to be displayed, video data are vulnerable to illegal copying. Therefore, digital video watermarking will play a role in video copy protection.
3.1.1 Targeted Requirements
In this specific context, however, the targeted requirements have to be determined carefully. They should be realistic, i.e., a trade-off between performance and cost. As discussed in Section 1, the Millennium proposal aimed at fast detection of the watermark, i.e., within the declared DVD detection interval of 10 seconds. Therefore, highly complicated algorithms are not feasible. Instead, they have to be simple and yet effective. As far as payload is concerned, it targeted equal to or larger than eight bits per detection interval. For robustness, it aimed at common image processing such as compression, noise addition, format conversion, shift, etc. By shift, it is meant that the position of a video frame is varied by a small amount. This may be due to normal image handling or malicious attack. As to the latter, it is noted that it is easy and inexpensive for a hacker to shift video frames spatially, even on a frame-by-frame basis. As a result, this shift becomes a problem that needs to be addressed. The probability of a false alarm was set as low as less than 10^-12 per detection. It is worth noting that these requirements were carefully determined. They are compromises at the current technological level.
3.1.2 Embedding and Detection in Spatial Domain
The design of the watermarking scheme followed the above-mentioned requirements. Because of the fast implementation requirement, the Millennium proposal did not choose embedding techniques in a transform domain. That is, it did not use the global DCT, the wave-
let transform (WT), or the Fourier transform (FT). It did not use block-by-block transform techniques either (which can be implemented faster than their global transform counterparts), because a possible spatial shift, discussed above, may easily make detection fail. Consequently, the proposed technique was implemented exclusively in the spatial domain. Specifically, a watermark signal, W = {w_i}, is added to a video frame, X = {x_i}, thus generating a marked video frame, X' = {x_i'}, where w_i, x_i, and x_i' may be as defined in Section 2.2. Note that in (Maes et al. 2000) W is drawn from a Gaussian distribution with zero mean and unit standard deviation. The embedding is written below:

x_i' = x_i + β_i · w_i    (12)
where the scaling factor β_i may be a product of two factors, one of which takes care of global scaling and the other of local scaling. Watermark detection is performed by spatial correlation. That is,

corr = (1/L) · Σ_i x_i' · w_i    (13)
where L is the total number of pixels involved in the correlation. A spatial correlation corr larger than a threshold indicates the presence of the watermark signal; otherwise, the watermark signal is absent.
3.1.3 Payload
In the Millennium proposal, a video sequence is treated as a sequence of still video frames. The pixels of each frame are used to embed the watermark signal, and the watermark signal is embedded repeatedly into each frame. This strategy is equivalent to first accumulating pixel gray-level values and then multiplying the accumulated values with the watermark signal, thus saving computation dramatically. It also
makes the algorithm more robust in watermark signal detection. However, this strategy leads to a low payload, since one video sequence then contains only one bit of information. Apparently, a one-bit payload is not enough to meet the payload requirement. To increase the payload, two ways are possible. One way is to embed more than one watermark sequence in each video frame. The other is to embed a slowly time-varying watermark sequence. These two ways are not mutually exclusive; a combination of both can also be used to increase the payload. It is reported in (Maes et al. 2000) that three or four may be the maximum number of watermark sequences that can be embedded into a video frame under the constraints of computational complexity and watermark imperceptibility.
3.1.4 Shift Invariance
As mentioned, the embedded watermark signal must be robust against spatial shift, because a spatial shift is likely to happen in practice and it will destroy the geometric synchronization that is a necessity for watermark detection. The simplest and most straightforward way to keep video watermarking shift invariant is brute-force search. That is, for each possible spatial shift, perform the spatial correlation; then find the maximum correlation to determine the shift the video frame has experienced. This approach is, however, computationally prohibitive due to the real-time watermarking requirement. To dramatically reduce the computational complexity, the Millennium approach introduced translational symmetry in the watermark signal W itself, expressed below:

w_{i+k} = w_i    (14)
where k is the amount of spatial shift, which only assumes multiples of a pre-selected integer N, with N chosen as 128. This is to say that
the watermark signal is two-dimensionally periodic. In other words, the watermark signal is a tiling of an N x N two-dimensional array of random numbers. It is clear that such a watermark signal, which possesses translational symmetry, makes the search for the maximum correlation much simpler in terms of computational complexity.
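One way to exploit this tiled structure is to fold the frame modulo N (the per-frame accumulation mentioned in Section 3.1.3) and then evaluate all N x N cyclic shifts at once with an FFT-based correlation. The sketch below is a hedged illustration of that idea with arbitrary sizes; it is not the Millennium implementation.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
N = 32                                     # tile size (128 in the Millennium system)
W = rng.standard_normal((N, N))            # zero-mean watermark tile
W -= W.mean()

frame = rng.integers(0, 256, size=(8 * N, 8 * N)).astype(float)
marked = frame + 3.0 * np.tile(W, (8, 8))              # periodic embedding, cf. (14)
shifted = np.roll(marked, shift=(5, 11), axis=(0, 1))  # attacker shifts the frame

# Fold modulo N: pixels carrying the same tile sample accumulate, so a single
# N x N correlation covers the entire frame.
folded = shifted.reshape(8, N, 8, N).sum(axis=(0, 2))
folded -= folded.mean()

# Circular cross-correlation over all N*N shifts via the FFT.
corr = np.fft.ifft2(np.fft.fft2(folded) * np.conj(np.fft.fft2(W))).real
print(np.unravel_index(corr.argmax(), corr.shape))     # recovers the shift (5, 11)
```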
3.1.5 Synchronization
The shift invariance just discussed takes care of geometric synchronization. In terms of temporal synchronization, the Millennium approach embeds the same watermark signal into a group of video frames. In addition, the watermark signal is formed from a Gaussian random sequence. These measures achieve a more robust synchronization.
3.2 Performance
In this subsection, we present the performance of the Millennium system as reported in (Maes et al. 2000).
3.2.1 Real-time Implementation
Real-time watermark embedding has been implemented on a TriMedia processor board and an FPGA-based board. Real-time watermark detection has been performed on three different platforms, i.e., a Silicon Graphics workstation, TriMedia, and FPGA. These successes have demonstrated the feasibility of real-time video watermarking, a requirement set by the Millennium system.
3.2.2 Robustness
Many experiments have shown that video watermarking implemented by the Millennium system is robust against MPEG-2 compression down to 2.5 Mbps, MPEG compression, D/A and A/D conversion, PAL conversion, noise addition, quantization, sub-
titling and logo insertion, cropping, frame erasure, speedups, and transmission errors. It is noted, however, that no concrete robustness test performance is publicly available.
3.3 Concluding Remarks
In conclusion, we would like to point out that the framework of the Millennium system, reported in (Maes et al. 2000), has provided us with a picture of the state of the art in digital watermarking for DVD video copy protection. Based on the targeted requirements, the design of the watermarking system was carefully carried out with various signal processing techniques. These requirements, in fact coming from compromised decisions, aim at fulfilling the most critical, practical, and yet feasible requirements with current technologies. Consequently, the system works well with respect to these requirements. For issues requiring further investigation, including copy control, interested readers are referred to (Maes et al. 2000).
4 Interleaving to Combat Random and Bursts of Errors in Video Data Hiding
So far, what we have discussed in this chapter concerns video watermarking, in particular for DVD video copy protection. However, video data hiding has a wider scope. That is, video data hiding may be used for other applications such as covert communication, annotation, control, authentication, data security, and other purposes. Therefore, it is necessary to discuss some issues that may or may not relate to video copy protection. In this and the next sections, two of these issues will be addressed. In this section, we present our initial investigation on interleaving to combat bursts (clusters) of errors occurring in video with hidden data. In
the next section, we present our initial work on synchronization, which is related to video data hiding both for copy protection and for applications other than copy protection. It is well known that robustness is one of the basic requirements for imperceptible data hiding in some applications. Error correction codes (ECC) have been adopted to improve the robustness of watermark signals (e.g., Huang et al. 1998, Huang and Shi 2002). As shown next, however, ECC is only suitable for correcting random errors, and is not efficient for correcting bursts of errors. When cropping or random row/column removal, also known as the jitter attack, takes place in a stego-image, bursts of errors do occur in watermarked images. Frame loss and 3-D error clusters are typical bursts of errors that can occur in watermarked video sequences. Transmission errors may be another source of bursts of errors. When bursts of errors occur, how to extract and detect the hidden data correctly becomes a challenge. While using ECC alone to correct these bursts of errors is not efficient, surprisingly, combating bursts of errors using interleaving, a common tool used in communication systems, has been neither recognized nor addressed so far in the data hiding community. In this section, we first introduce the philosophy of interleaving, followed by the t-interleaved array technology for multi-dimensional (M-D) interleaving. We further point out why this technology does not fit image and video data hiding well. Then, the novel successive packing (SP) approach to 2-D/3-D interleaving is discussed. Finally, we present our initial investigation on applying 2-D/3-D successive packing interleaving techniques to combat bursts of errors occurring in marked still image/video sequences. The experimental results demonstrate that the robustness of hidden data inside still image/video sequences against bursts of errors is significantly improved by using 2-D/3-D SP interleaving followed by ECC.
4.1 Introduction
In data manipulation and transmission, errors may be caused by a variety of factors, including noise corruption, limited channel bandwidth, and interference between channels and sources. It is well known that many ECCs have been developed to correct errors in order to ensure data fidelity. In doing so, redundancy is added by the ECC, resulting in what are known as random error correction codes. There are basically two different types of ECCs. One type is known as block codes, the other convolutional codes. The frequently used block codes are often denoted by a pair of two integers, i.e., (n, k), and one block code is completely defined by 2^k binary sequences, each an n-tuple of bits, known as codewords. Note that for simplicity only binary code symbols are considered in this section. Specifically, consider the commonly used BCH codes, which are one kind of block codes, named after the three inventors Bose, Ray-Chaudhuri, and Hocquenghem (Bose and Ray-Chaudhuri 1960, Hocquenghem 1959). The notation BCH (31,6) indicates that there are at most 2^6 distinct messages, each represented by six bits and encoded by a codeword consisting of 31 bits. Instead of six bits, 31 bits are used to represent an input symbol, implying added redundancy. According to channel coding theory, the minimum Hamming distance between any two different codewords in the BCH (31,6) code is 15, and the error correction capability of the code is seven. In other words, a 31-bit codeword in the BCH (31,6) code can be correctly decoded as long as there are no more than seven error bits, regardless of the bit error positions within the codeword. That is why random ECC refers to the ability to correct random bit errors within a codeword.
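This capability follows from the standard bound relating the correction capability t to the minimum Hamming distance d_min:

t = ⌊(d_min − 1) / 2⌋ = ⌊(15 − 1) / 2⌋ = 7.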
4.1.1 ECC Alone Cannot Correct Bursts of Errors Efficiently
Bursts of errors are defined as a group of consecutive error bits in the one-dimensional (1-D) case, or connected error bits in M-D cases. In this sense, we can see that the channel has memory. One
example is several consecutive error bits in a mobile communication system caused by a multipath fading channel. Another example is an area formed by many connected error bits in a 2-D barcode. In (Wicker 1995), a bursty channel is defined as a channel over which errors tend to occur in bunches, or "bursts," as opposed to the random patterns associated with a Bernoulli-distributed process. Therefore, a random error correction code, when applied, may not be powerful enough to correct the bursts of errors, and at the same time it may be wasteful on other occasions (1-D case) or in other regions (M-D case) where there are no bursts of errors. For instance, consider a case in which there is one burst of errors consisting of 60 consecutive bits. Obviously, the BCH (31,6) code is not able to correct this burst of errors. One may think of using a more powerful BCH code to combat this burst of errors. For example, BCH (255,9) seems to be a suitable candidate, since it can correct 63 errors in a 255-bit codeword. Indeed, this error burst consisting of 60 consecutive error bits can be corrected by this powerful BCH code. However, for the vast majority of the time, the error correction capability purchased at the expense of high redundancy (each codeword now consists of 255 bits) is wasted. This example demonstrates that using random ECC to combat bursts of errors is not efficient.
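The standard remedy, detailed in the rest of this section, is to interleave. As a 1-D warm-up, the sketch below uses a classic row/column block interleaver (shown only to convey the philosophy; it is not the successive packing scheme discussed later): eight 31-symbol codewords are written row-wise and transmitted column-wise, so a burst of eight consecutive channel errors touches each codeword at most once, within reach of a one-error-correcting code.

```python
import numpy as np

n, depth = 31, 8                  # codeword length and interleaving depth
table = np.arange(n * depth).reshape(depth, n)   # row i holds codeword i (labels)

tx = table.T.ravel()              # transmit column by column: the interleaver

hit = tx[100:100 + depth]         # a burst of `depth` consecutive channel errors

# After de-interleaving, count corrupted symbols per codeword: the burst has
# been spread so that every codeword contains at most one error.
per_codeword = [np.intersect1d(hit, row).size for row in table]
print(per_codeword)               # every entry is 0 or 1
```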
4.1.2 The t-Interleaved Array Approach to M-D Interleaving
Although some codes suitable for correcting bursts of errors, including Fire codes (e.g., Imai 1973), have been developed, they are not efficient for random error correction. In most practical systems, unfortunately, both types of errors may exist. By far, interleaving before applying random ECCs is the most frequently used and efficient way to combat both bursts of errors and random errors. In this subsection, M-D bursts of errors and the optimality of interleaving are first defined. Afterwards, the t-interleaved array M-D interleaving techniques (Blaum et al. 1998) are presented.
2-D and M-D Bursts of Errors
The scenarios where M-D error bursts may occur include magnetic and optical (say, holographic) data storage, charge-coupled devices (CCDs), 2-D barcodes, and information hiding in digital images and video sequences. In particular, it is worth mentioning that in holographic recording a laser beam illuminates a programmable spatial light modulator, thereby generating an object beam, which represents a 2-D page of data. An entire page of data can be retrieved all at once, thus achieving a very high data rate. Therefore, the reliability of M-D information has arisen as an important task, having both theoretical and practical significance.
Figure 5. A 2-D burst of errors of size 10.
Instead of defining a burst of errors as a rectangular or circular area, Blaum et al. defined a 2-D burst of errors as an arbitrarily-shaped, connected area (Blaum et al. 1998). Consider Figure 5, where all the code symbols (assigned to the elements of the 2-D array) marked with triangles form a 2-D error burst. Note that all of these symbols are connected to each other, and the connectivity here is constrained to the horizontal and vertical directions, referred to as 4-connectivity. This definition can be generalized to the M-D case. The size of a burst is defined as the total number of code symbols contained in the burst. Hence, the size of the error burst in Figure 5 is 10.
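Measuring the size of such a burst is a simple 4-connected flood fill; the following small sketch (an illustration, not code from the cited paper) counts the connected error cells around a seed position.

```python
from collections import deque

def burst_size(error_grid, start):
    """Size of the 4-connected burst containing the error cell `start`."""
    rows, cols = len(error_grid), len(error_grid[0])
    seen, queue = {start}, deque([start])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
                    and error_grid[nr][nc] and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen)

# A small grid whose marked cells form one 4-connected burst of size 5.
grid = [[0, 1, 0],
        [1, 1, 1],
        [0, 1, 0]]
print(burst_size(grid, (1, 1)))  # 5
```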
(a) Interleaving degree is 8. (b) Interleaving degree is 10.
Figure 6. Two 4-interleaved arrays with different interleaving degrees.
t-Interleaved Array, Interleaving Degree and Optimality
Blaum et al. (1998) introduced the concept of the t-interleaved array. Consider the two 4-interleaved arrays shown in Figure 6. By a 4-interleaved array, it is meant that no matter how we choose four 4-connected elements in the array, these four elements are always marked with distinct numbers. Assuming that in Figure 6 all elements in the 2-D array denoted by the same number form one codeword, we can then conclude that whenever a burst of errors of size four takes place within the 4-interleaved 2-D array, each codeword will encounter at most one error. If we further assume that the code has a one-random-error-correction capability, we can see that the error burst can be corrected. Through this discussion, it is observed that a t-interleaved array, together with a random error correction code having one-random-error-correction capability, can combat an error burst of size t. Without loss of generality, only one error burst is discussed in this subsection. A close look at Figure 6 (a) and (b) reveals that there are a total of eight and 10 distinct numbers, i.e., eight and 10 distinct codewords in (a) and (b), respectively. The total number of distinct codewords
is referred to as the interleaving degree. Optimality of interleaving is achieved if the interleaving degree reaches its lower bound. The lower bound for correcting arbitrarily-shaped error bursts has been proved to be t^2/2 if t is even and (t^2 + 1)/2 if t is odd (Blaum et al. 1998). Thus, for a 4-interleaved array, the lower bound of the interleaving degree is equal to eight. Therefore the code depicted in Figure 6 (a) is optimal, while that in Figure 6 (b) is not. It has been shown that optimality can be guaranteed for the 1-D and 2-D cases; however, this is not always true for the 3-D case (Blaum et al. 1998, Golomb and Welch 1970).
Basic Ideas and Algorithms
The key idea of the Blaum et al.
approach is based on Lee spheres and close tiling. Linking the Lee spheres to odd burst sizes and creating some spheres for the even burst sizes, Blaum et al. used these spheres as fundamental building blocks to construct interleaved arrays via close tiling. The Lee sphere of radius 1 is shown in Figure 7 (a), where one can see that the maximum 4-distance (considered only in the horizontal or vertical direction) from the central element to any other element in the sphere is equal to one. This Lee sphere can be used as a fundamental building block to construct (closely tile) a 3-interleaved array, as shown in Figure 7 (b). Note that, in Figure 7 (b), there is a 5x5 square array enclosed by solid lines and formed by several whole and partial Lee spheres of radius 1 (also bounded by solid lines). It is further noted that the 3-interleaved array shown in Figure 7 (b) can in turn be used as the building block to closely tile a 3-interleaved array of larger size. This can be verified by observing that the five central elements in the first row of Figure 7 (b), namely 3, 4, 5, 1, 2 (with dashed lines), repeat themselves in the last row within the solid-line square. The same is true for the five in the bottom row, the left-most column, and the right-most column of Figure 7 (b), respectively. By close tiling, it is meant that translated building blocks are used to construct a larger block, which is the union of the
translated building blocks, with no overlapping among the translated building blocks in the process. (For more information on these two concepts, i.e., the Lee sphere and close tiling, interested readers may refer to (Golomb and Welch 1970).) Blaum et al. have shown that if one labels each element in the fundamental building block with a distinct number and uses the building block to closely (meaning no uncovered elements) tile (meaning no overlapping between blocks) a large enough 2-D area, then one can produce a t-interleaved array. In this interleaved array, each element in any arbitrarily-shaped, connected subset consisting of t elements is labeled with a distinct number. All numbers of the same kind form a codeword. Consequently, an error burst of size t can be corrected by one-random-error-correction codes.
(a) Lee sphere of radius 1. (b) 3-interleaved array.
Figure 7. 3-interleaved array and its fundamental building block, Lee sphere of radius 1.
Comments and Discussions
Though it can effectively spread arbitrarily-shaped 2-D error bursts of size t, the above characterization of the technique (Blaum et al. 1998) also reveals some of its limitations. Firstly, the technique is based on the size of a burst of errors, t. For combating bursts of errors of size t equal to a specific t_0, one needs to implement the algorithm with a set of parameters to construct an interleaving code. When the size t in-
creases, i.e., t > t_0, one needs to implement the algorithm with a new set of parameters to construct another interleaving code. That is, the interleaved array constructed for a specific t_0 may not be able to correct a burst of errors of size t when t > t_0. Since in reality, e.g., in the application of 2-D barcodes, the size of error bursts may not be known exactly a priori, the implementation of the technique may become cumbersome and ineffective. Secondly, when the actual size of a burst, t, is less than the t_0 with which the interleaving algorithm is applied, the technique is no longer optimal. This can be justified as follows. As mentioned, optimality means that the interleaving degree reaches its lower bound, and in the 2-D case the interleaving degree associated with an interleaving scheme designed for some burst size t in (Blaum et al. 1998) is guaranteed to reach its lower bound. Furthermore, it is known that the lower bound of the interleaving degree is a monotonically increasing function of the burst size t. Specifically, the lower bound is t^2/2 for even t and (t^2 + 1)/2 for odd t. Therefore, with respect to the implementation of an interleaving scheme designed for a burst size t_0, when the actual size of an error burst, t, is smaller than t_0, the achieved interleaving degree with t_0 is larger than the lower bound that corresponds to t. That is, the interleaving scheme designed for a burst size t_0 is not optimal for a smaller burst size t. In many applications, the size of a given 2-D or M-D array is known. For instance, a digital (watermarked) image may be known to have a size of 512 by 512 pixels. Under these circumstances, one may wonder if it is possible to develop a 2-D interleaving technique which is optimal for all (if possible) or at least many of the possible error burst sizes, and which can therefore be implemented only once for a given 2-D array. Motivated by these observations, a novel 2-D interleaving technique, called the successive packing approach, has been proposed (Shi and Zhang 2002).
4.2 2-D/3-D Successive Packing Interleaving
Given that digital images, video frames, charge-coupled devices (CCDs), and 2-D barcodes are all in the form of 2-D arrays, without loss of generality, square arrays of 2^n x 2^n are considered here. The utilization of 2^n x 2^n arrays will be further justified later.
4.2.1 2-D Codewords and 1-D Sequence of Code Symbols
In general, codewords in the 2-D case are 2-D in nature. 1-D codewords, whether row-type, column-type, or other-type, can be considered special cases of 2-D codewords. The successive packing technique is able to handle 2-D codewords because all the code symbols in the 2-D codewords are first linked into a 1-D sequence of code symbols. Without loss of generality, the quartering indexing scheme is described below for illustrative purposes. That is, a square array of 2^n x 2^n is viewed as consisting of four quadrants, each quadrant itself consisting of its own four quadrants; the process repeats itself until it reaches a level where all four quadrants are of size 2 x 2. This is referred to as 2-D successive doubling. These 2 x 2 arrays are the fundamental structure. When the quartering indexing scheme is applied, each code symbol assigned to an element of the array has a pair of subscripts. The first subscript represents the index of the 2 x 2 array in which the code symbol is located, while the second subscript indicates the index of the code symbol within the 2 x 2 array. To convert the quartering index, s_ij, into the 1-D index, s_k, we apply the following operation: k = 4i + j.
(a) Square block. (b) Row block. (c) Broken line block. (d) Column block.
Figure 8. Four different types of 2-D codewords having four code symbols.
Quartering indexing is not the only choice for the proposed interleaving technique. Actually, codewords can be of any shape. Several shapes of codewords consisting of four code symbols are shown in Figure 8. Obviously, for any given shape of 2-D codeword, it is always possible to label the code symbols into a 1-D sequence with a possibly more complicated bookkeeping scheme.
4.2.2 The Successive Packing Algorithm
Now we present the successive packing interleaving technique in the 2-D case in a general and compact way, which allows straightforward generalization to the M-D case. 2-D interleaving using successive packing proceeds as follows. Consider a 2-D array of 2^n x 2^n for 2-D interleaving. When n = 0, i.e., when an array of 1 x 1 is considered for interleaving, the interleaved array is the original array itself. That is,

S_1 = (s_0)    (15)
where s_0 represents the element in the array, and S_1 the array. Note that the subscript in the notation S_m represents the total number of elements in the interleaved array. Hence, when n = 1, i.e., for a 2 x 2 array, the interleaved array is denoted by S_4; when n = 2, the interleaved array is S_16. In general, for a given n, the interleaved array is denoted by S_{2^2n}. The procedure is carried out successively. Given an interleaved array S_i, the interleaved array S_4i can be generated according to

S_4i = | 4 x S_i + 0   4 x S_i + 2 |
       | 4 x S_i + 3   4 x S_i + 1 |    (16)
where the notation 4 x S_i + k with k = 0, 1, 2, 3 represents a 2-D array generated from S_i. This indicates that 4 x S_i + k has the same dimensionality as S_i. Furthermore, each element in 4 x S_i + k is indexed in such a way that its subscript equals four times that of the corresponding element in S_i, plus k. By the corresponding element, we mean the element occupying the same position in the 2-D array. It appears that S_4i is derived from S_i by packing S_i four times. This explains why the term successive packing is used. According to the above rule, we have

S_4 = | 0  2 |
      | 3  1 |    (17)
Similarly, we have S_16 as follows:

S_16 = |  0   8   2  10 |
       | 12   4  14   6 |
       |  3  11   1   9 |
       | 15   7  13   5 |    (18)
The resemblance between successive packing interleaving and the fast Fourier transform (FFT) is observed. Firstly, the successive doubling mentioned before is also used in the FFT. Secondly, after the successive doubling, what is left here is a 2 x 2 basis array, which is expressed in (17) and depicted in Figure 9. This 2 x 2 basis array is the counterpart of the basic butterfly computation structure used in the FFT. Thirdly, both techniques work on a group of data whose dimensionality is an integer power of two, to facilitate utilization of digital computers.
Figure 9. The 2 x 2 basis array.
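The recursion (15)-(16) translates almost directly into code. The sketch below is a straightforward reading of the packing rule (the function and variable names are ours); it builds the 2^n x 2^n successively packed array and reproduces (17) and (18).

```python
import numpy as np

def successive_packing(n):
    """Build the 2^n x 2^n interleaved array by successive packing."""
    s = np.array([[0]])                          # S_1, as in (15)
    for _ in range(n):
        s = np.block([[4 * s + 0, 4 * s + 2],
                      [4 * s + 3, 4 * s + 1]])   # the packing rule (16)
    return s

print(successive_packing(1))   # [[0 2], [3 1]], i.e., S_4 of (17)
print(successive_packing(2))   # the 4 x 4 array S_16 of (18)
```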
4.2.3 Main Results
It has been proved in (Shi and Zhang 2002) that in a 2-D interleaved array of 2^n x 2^n generated with the successive packing technique, any square error burst of 2^k x 2^k with 1 ≤ k ≤ n−1, and any rectangular error burst of 2^k x 2^(k+1) or 2^(k+1) x 2^k with 0 ≤ k ≤ n−1, can be spread so that each element in the burst falls into a distinct block in the de-interleaved array, where the block size K is 2^(2n−2k) for a burst of 2^k x 2^k, and 2^(2n−2k−1) for a burst of 2^k x 2^(k+1) or 2^(k+1) x 2^k. This indicates that, if a distinct code symbol is assigned to each element in a block and all the code symbols associated with the block form a distinct codeword, then this technique guarantees that the error burst can be corrected with a one-random-error-
correction code, provided such a code is available. (Note that a code capable of correcting one code symbol error within a codeword of two code symbols does not exist in reality. Therefore, though error bursts of 2^(n−1) x 2^n or 2^n x 2^(n−1) can be effectively spread in the de-interleaved arrays as described above, they in fact cannot be corrected with a one-random-error-correction code.) Furthermore, the interleaving degree equals the size of the error burst, hence minimizing the number of codewords required for an interleaving scheme. In other words, the interleaving degree obtained by successive packing interleaving is indeed the lower bound (Shi and Zhang 2002). In this sense, the successive packing interleaving technique is optimal. If a coding technique has a strong random-error-correction capability, say, it can correct one error in every codeword of size eight, then any error burst of 2^(n−1) x 2^(n−2) or 2^(n−2) x 2^(n−1) can be corrected. If a code, on the other hand, has a weaker random-error-correction capability, say, it can only correct one random error within a codeword of size 64, then only smaller error bursts, i.e., any burst of 2^(n−3) x 2^(n−3) in the interleaved array, can be corrected by successive packing interleaving.
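These claims are easy to check numerically. The sketch below (illustrative; it repeats the construction from the previous sketch) labels each element of a 16 x 16 successively packed array with its codeword index (consecutive blocks of K = 2^(2n−2k) positions in the de-interleaved sequence) and confirms that every 2 x 2 square burst falls on distinct codewords.

```python
import numpy as np
from itertools import product

def successive_packing(n):                  # as in the previous sketch
    s = np.array([[0]])
    for _ in range(n):
        s = np.block([[4 * s + 0, 4 * s + 2], [4 * s + 3, 4 * s + 1]])
    return s

n, k = 4, 1
S = successive_packing(n)                   # 16 x 16 interleaved array
K = 2 ** (2 * n - 2 * k)                    # de-interleaved block size: 64
side = 2 ** k                               # burst side length: 2

codeword = S // K                           # codeword index of each element
for r, c in product(range(S.shape[0] - side + 1), repeat=2):
    patch = codeword[r:r + side, c:c + side]
    assert np.unique(patch).size == side * side
print("every 2 x 2 burst hits", side * side, "distinct codewords")
```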
4.3 Simulation Results
In this subsection, we report our initial investigation on applying 2-D/3-D successive packing interleaving techniques to combat bursts of errors occurring in still image/video sequence data hiding. The experimental results demonstrate that the robustness of hidden data inside still images/video sequences against bursts of errors is significantly improved by using 2-D/3-D SP interleaving followed by ECC.
4.3.1 Applying 2-D SP Interleaving to Enhance Robustness of Still Image Data Hiding
Consider the "Lena" image (256x256x8) shown in Figure 10 (a), an image widely used in image processing experiments. The data hiding is carried out in the block discrete cosine transform (DCT) do-
main. First, the image is split into non-overlapping blocks of 8x8 pixels each. Then, the DCT is applied to each block. The three largest AC DCT coefficients are used to embed bits. If the bit to be embedded is "1", the coefficient is increased by a quantity Δ (e.g., Δ = 6 is used empirically in our simulations). If the bit is "0", the coefficient is decreased by Δ. We use six bits to represent one symbol. The ECC used in our simulations is the BCH (31,6) code. Hence, 99 symbols are embedded. The image is scanned three times. Each scan embeds 1/3 of the total bits (about 1024 bits), one bit in an AC coefficient having the same position within each block. These 1024 bits are first interleaved using the 2-D SP interleaving technique, and are then embedded into the AC coefficients. In data extraction, parts damaged by the error burst are replaced with "0"s. Without 2-D interleaving, we simply embed bits block by block, say, from left to right and from top to bottom throughout the whole image. In each block, we embed three bits. The experimental results are shown in Figure 11.
(a) Original Lena image
(b) Marked Lena image
Figure 10. The original and marked Lena images.
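The per-block embedding step can be sketched as follows. This is a hedged illustration of the procedure just described: the selection of the three largest-magnitude AC coefficients and Δ = 6 follow the text, while the helper structure is ours and the extraction side is omitted.

```python
import numpy as np
from scipy.fft import dctn, idctn

def embed_block(block, bits, delta=6.0):
    """Embed three bits into the three largest-magnitude AC coefficients
    of one 8 x 8 block: +delta for a 1 bit, -delta for a 0 bit."""
    coeffs = dctn(block, norm='ortho')
    magnitude = np.abs(coeffs).ravel().copy()
    magnitude[0] = -np.inf                      # exclude the DC coefficient
    for bit, idx in zip(bits, np.argsort(magnitude)[-3:]):
        r, c = divmod(idx, 8)
        coeffs[r, c] += delta if bit else -delta
    return idctn(coeffs, norm='ortho')

rng = np.random.default_rng(seed=4)
block = rng.integers(0, 256, size=(8, 8)).astype(float)
marked = embed_block(block, bits=[1, 0, 1])
print(np.max(np.abs(marked - block)))           # small, near-imperceptible change
```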
Figure 11. Test results of square error bursts.
In Figure 11, the horizontal axis represents the size of the 2-D error burst. For example, size 2 means that the error burst is a square area of 2 x 2 blocks (each block being 8 x 8 pixels). That is, the square error burst has a size of 16 x 16 pixels. The vertical axis stands for the symbol error rate (SER). In the simulation, we consider all possible positions of the error burst. The arithmetic average of the SERs corresponding to bursts occupying all possible positions is reported in the figure. As shown in the figure, without interleaving, a non-zero SER emerges when the error burst size is larger than four. With 2-D SP interleaving, the SER is still zero even when the error burst size is 16. This indicates that even when a quadrant of the marked Lena image is in error, the SER is still zero, implying a significant improvement in robustness. Note that when the error burst size is larger than 22, the SER with interleaving becomes larger than that without interleaving. In this case, almost half of the image has been
damaged. The SER for both algorithms, with and without interleaving, is then almost 50%, practically rendering both algorithms useless in this case.
4.3.2 Applying 3-D SP Interleaving to Enhance Robustness of Video Sequence Data Hiding
A video sequence is a set of successive frames, each frame being a 2-D image. Here we consider two types of bursts of errors that may occur to the data embedded in a video sequence. The first type of error burst is frame loss. Frame loss may occur in video transmission, especially when the video is transmitted through a bursty and noisy channel. The second type is what is known as a 3-D error burst. Since there is a high correlation among successive frames, a 2-D error burst sometimes leads to errors in approximately the same location of the succeeding frames, thus causing a 3-D error burst. For these two types of errors, we again conducted our simulation in a way similar to that used for image data hiding. The test video sequence has 32 frames. Each frame is a gray-level image of 256x256 pixels. We split each frame into non-overlapping blocks of 8 x 8 pixels each. As a result, we have 32x32x32 blocks within the entire sequence. The DCT is applied to each block and the data bits are embedded into the three largest AC DCT coefficients of each block. Again, six bits are used to represent one symbol and the BCH (31,6) code is used. Data embedding is carried out in three scans. In each scan, we embed 32x32x32 = 32,768 bits (equivalent to 1,057 symbols), one bit in an AC coefficient having the same position in each block. These 32,768 bits are first interleaved using the 3-D SP interleaving technique, which is a straightforward extension of 2-D SP to the 3-D case (Zhang et al. 2002, Elmasry 1999), before being embedded into the AC coefficients. In data extraction, error parts are filled with "0"s. Without 3-D interleaving, we simply embed bits block by block, say, from left to right, from top to bottom, and from
front to rear. In each block, we embed three bits. Figures 12 and 13 show the simulation results, demonstrating that 3-D interleaving can greatly improve the robustness of data hiding.
Figure 12. Test results of video frame loss.
In Figure 12, the horizontal axis denotes the number of (consecutive) lost frames. From this figure, it is seen that when eight frames are lost, the SER with interleaving is almost zero, while that without interleaving is about 25 percent. Note that when 15 frames are lost, the error rate with interleaving becomes higher than that without interleaving; in that case almost half of the 32 frames are lost. The SER for both algorithms, with and without interleaving, is then almost 50%, implying that the hidden data have been severely damaged.
In Figure 13, the horizontal axis is the size of the 3-D error burst. For example, size 2 means that the error burst is a cubic volume of 2 x 2 x 2 blocks (each block being 8 x 8 pixels). That is, the cubic error burst covers 16 x 16 pixels in each of two consecutive frames. With interleaving, the SER is still zero even when the error burst size is 16 (one eighth of the blocks are lost), while without interleaving the SER is more than 12%. Clearly, a significant improvement in the robustness of hidden data against 3-D bursts of errors has been achieved.
Figure 13. Test results of 3-D error burst.
In summary, from our initial investigation of applying 2-D/3-D SP interleaving techniques to enhance the robustness of hidden data in still images/video sequences, it is shown that in both still image and video sequence data hiding, SP interleaving can greatly enhance the robustness of hidden data against bursts of errors. Therefore, 2-D/3-D successive packing interleaving can play a promising role in enhancing robustness in image/video data hiding. It is
expected that 2-D interleaving, which can make 2-D data more robust and reliable, can also find important applications in areas such as 2-D barcodes and holographic storage.
5 Frame Synchronization in Video Data Hiding
As discussed in Section 2.2, the pixels in each frame of a video are cascaded in, say, a row-by-row or column-by-column manner; then the whole video sequence is cascaded in a frame-by-frame manner, so that all pixels in a video sequence are cascaded into a huge 1-D sequence. In spread spectrum video data hiding, we need to compute a correlation for each group of pixels corresponding to one bit in data extraction. This indicates that data detection heavily relies on synchronization. This is true in general for all spread-spectrum technologies. The synchronization of all pixels in the video sequence, which is 3-D in nature, becomes a critical issue, and is not easy to handle. It was also discussed that brute-force search for the best match is not an efficient way to achieve synchronization.
5.1 Embedding Side-information for Synchronization
It is known that for coherent-modulation digital communication systems, a three-level synchronization is required: phase, symbol, and frame synchronization. For noncoherent modulation, a different three-level synchronization is necessary: frequency, symbol, and frame synchronization (Sklar 1988). Therefore, embedding some side-information is a way to achieve synchronization. In (Liu et al. 2002), frame synchronization was achieved by embedding frame numbers as side-information into each video frame.
5.1.1 Introduction to the Algorithm by Liu et al. (2002)
This algorithm embeds the watermark signal and frame number in the uncompressed domain, specifically in the discrete wavelet transform (DWT) domain, using a three-level DWT with the Daubechies 9/7 filters. A number of features of this video watermarking technique are listed below.
(1) Instead of embedding the watermark signal into high-frequency subbands, the watermark signal and side-information are embedded into the LL subbands in order to achieve stronger robustness.
(2) In order to combat possible bursts of errors and random errors that may occur in watermarked video sequences, the newly developed, efficient 3-D interleaving technique (Shi and Zhang 2002) together with an error correction code, BCH (61,5), are utilized.
(3) The security of the watermark signal is enhanced by modulating the watermark with a random sequence.
Figure 14. Block diagram of the proposed algorithm.
(4) The frame number is embedded into each frame as side-information to achieve frame synchronization. The watermark signal is embedded into the centered portion of the LL3 subband in the DWT domain of a video frame. The four corners can be used to embed the frame number. Considering that people pay more attention to the centered area of an image and less to the boundary, especially the four corners, the synchronization information is embedded with extra strength compared with the watermark signal.
A block diagram of the proposed algorithm is shown in Figure 14.
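A rough sketch of the embedding path follows. It is hedged: PyWavelets' 'bior4.4' filters stand in here for the Daubechies 9/7 pair, the strengths and the one-value stand-in for the frame number are our assumptions, and the ECC/interleaving stages are omitted.

```python
import numpy as np
import pywt  # PyWavelets

rng = np.random.default_rng(seed=5)
frame = rng.integers(0, 256, size=(288, 352)).astype(float)   # one CIF luma frame

# Three-level 2-D DWT; 'bior4.4' stands in for the Daubechies 9/7 filters.
coeffs = pywt.wavedec2(frame, 'bior4.4', level=3)
ll3 = coeffs[0]
h, w = ll3.shape

alpha, beta = 1.0, 2.0      # beta > alpha: sync data embedded with extra strength
pn = rng.choice([-1.0, 1.0], size=(h // 2, w // 2))

# Watermark payload goes into the centered portion of LL3 ...
ll3[h//4:h//4 + h//2, w//4:w//4 + w//2] += alpha * pn

# ... while the frame number (a hypothetical one-value stand-in here) is
# embedded more strongly in a corner region.
frame_number_signal = 1.0
ll3[:h//4, :w//4] += beta * frame_number_signal

marked = pywt.waverec2(coeffs, 'bior4.4')
print(marked.shape)          # (288, 352): the marked frame
```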
5.1.2 Experimental Results
It was reported that the algorithm was tested on various CIF video sequences, including "Salesman," "Mobile," and "Paris." A total of 96 frames in these sequences are used to embed data. An 1140-character string can be hidden; 128 bytes (including the redundant bits of the ECC) can be embedded into one CIF video frame. Note that this payload is much larger than that of the DVD video copy protection discussed in Section 3. The experimental results with "Salesman," including invisibility and robustness to attacks such as frame loss, MPEG-2 coding, and rescaling, are shown in Figures 15, 16, and 17, respectively. Similar results have been obtained with other video sequences. Figure 15 demonstrates the invisibility of the hidden data with the proposed algorithm, where 15 (a) is one of the original frames of the "Salesman" sequence and 15 (b) is the data-hidden frame. The embedded data are perceptually invisible when the two video frames are compared. It was reported that the experiments also showed that the dynamic invisibility of the watermark could be guaranteed when the marked frames are played. Tests of the robustness of the data-hidden video sequence against consecutive and random discrete
frame loss were carried out. A comparison between the performance with and without the 3-D interleaving was made. The frame loss rate is defined as

f_r = (number of lost frames) / (total number of frames)    (19)
Figures 16 (a) and (b) demonstrate the robustness of the embedded data against consecutive and random discrete frame loss, respectively. It is observed that the robustness with the 3-D interleaving is much stronger than that without interleaving when f_r < 0.5. The experiments also demonstrate that the algorithm achieves an almost zero byte error rate when the frame loss rate is about 0.4, no matter how the frames are lost, randomly or consecutively.
(a) Original video frame
(b) Marked frame (PSNR=49.23 dB)
Figure 15. Demonstration of invisibility.
Robustness against MPEG-2 compression at different compression ratios (CR) was tested. Figure 17 shows the performance curve of the robustness to MPEG-2 coding. The algorithm can correctly detect the embedded character string when CR = 11.6 and the data rate is 2.7 Mb/s, with the PSNR of the video frames equal to 41.12 dB. The byte error rate is less than 1% when the CR is within 13.81. In addition, the robustness to a rescaling (scale 2.0) attack was tested: the embedded information can be detected without error.
Figure 16. Robustness against frame loss: byte error rate versus frame loss rate, with and without the 3-D interleaving.
Figure 17. Robustness against MPEG-2 coding: byte error rate versus the PSNR (dB) of the coded video.
Acknowledgments

This work was supported in part by the New Jersey Commission of Science and Technology via NJCWT and NJWINS, the New Jersey Commission of Higher Education via NJ-I-TOWER, and NSF via IUCRC; by the NSF of China (69975011, 60172067, 60133020), the "863" Program (2002AA144060), the NSF of Guangdong (013164), and funding from the China National Education Ministry; and by the KOSEF through the AITrc in Korea.
References

Blaum, M., Bruck, J., and Vardy, A. (1998), "Interleaving schemes for multidimensional cluster errors," IEEE Transactions on Information Theory, vol. 44, pp. 730-743.

Bose, R.C. and Ray-Chaudhuri, D.K. (1960), "On a class of error correcting binary group codes," Information and Control, vol. 3, pp. 68-79.

Cox, I.J., Kilian, J., Leighton, T., and Shamoon, T. (1997), "Secure spread spectrum watermarking for multimedia," IEEE Transactions on Image Processing, vol. 6, pp. 1673-1687.

Elmasry, G.F. (1999), "Detection and robustness of digital image watermarking signals: a communication theory approach," Ph.D. dissertation, Department of Electrical and Computer Engineering, New Jersey Institute of Technology, August 1999.

Golomb, S.W. and Welch, L.R. (1970), "Perfect codes in the Lee metric and the packing of polyominoes," SIAM Journal on Applied Mathematics, vol. 18, pp. 302-317.

Hartung, F. and Girod, B. (1998), "Watermarking of uncompressed and compressed video," Signal Processing, vol. 66, pp. 283-301.

Hocquenghem, A. (1959), "Codes correcteurs d'erreurs," Chiffres, vol. 2, pp. 147-156.

Huang, J., Elmasry, G., and Shi, Y.Q. (1998), "Power constrained multiple signaling in digital image watermarking," Proceedings of 1998 IEEE Workshop on Multimedia Signal Processing, pp. 388-393, Los Angeles, CA.

Huang, J. and Shi, Y.Q. (2002), "Reliable information bit hiding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, pp. 916-920.

Imai, H. (1973), "Two-dimensional fire codes," IEEE Transactions on Information Theory, vol. 19, pp. 796-806.

Langelaar, G.C., Lagendijk, R.L., and Biemond, J. (1998), "Real-time labeling of MPEG-2 compressed video," Journal of Visual Communication and Image Representation, vol. 9, pp. 256-270.

Liu, H., Chen, N., Huang, J., Huang, X., and Shi, Y.Q. (2002), "A robust DWT-based video watermarking algorithm," Proceedings of IEEE International Symposium on Circuits and Systems, Phoenix, AZ.

Maes, M., Kalker, T., Linnartz, J.M.G., Talstra, J., Depovere, G.F.G., and Haitsma, J. (2000), "Digital watermarking for DVD video copy protection," IEEE Signal Processing Magazine, vol. 17, pp. 47-57.

Shi, Y.Q., Ni, Z.C., Huang, J., Ansari, N., and Su, W. (2003), "A successive 2-D/3-D interleaving to enhance robustness of image/video data hiding," IEEE International Symposium on Circuits and Systems, Bangkok, Thailand.

Shi, Y.Q. and Sun, H. (1999), Image and Video Compression for Multimedia Engineering, CRC Press, Boca Raton, FL.

Shi, Y.Q. and Zhang, X.M. (2002), "A new two-dimensional interleaving technique using successive packing," IEEE Transactions on Circuits and Systems, Part I: Fundamental Theory and Applications, vol. 49, pp. 779-789.

Sklar, B. (1988), Digital Communications, PTR Prentice Hall, Englewood Cliffs, New Jersey.

Wicker, S.B. (1995), Error Control Systems for Digital Communication and Storage, Prentice-Hall, Inc., Englewood Cliffs, NJ.

Zhang, X.M., Shi, Y.Q., and Basu, S. (2002), "On successive approach to multidimensional interleaving," Proceedings of the Fifteenth International Symposium on Mathematical Theory of Networks and Systems (MTNS02), University of Notre Dame.
PART IV
Practical Issues in Watermarking and Copyright Protection
Chapter 19

Video Watermarking: Approaches, Applications, and Perspectives

Alessandro Piva, Roberto Caldelli, and Mauro Barni

This chapter aims at giving an introduction to some general problems related to video watermarking. First of all, the importance of video watermarking is reviewed by making reference to possible application scenarios, including DVD protection, broadcast monitoring, indexing, annotation, and object-based video protection. The main differences and similarities with respect to still image watermarking are introduced and carefully discussed. Then we classify the possible approaches to video watermarking into broad categories. A first distinction is made between techniques operating in the raw and in the compressed domain. We review the main advantages and drawbacks of both approaches and present some case studies to better illustrate the concepts we introduce. Case studies consider watermarking in the spatial as well as in the frequency domain. As a second perspective, we consider systems operating frame by frame and systems watermarking the video sequence as a whole. Once again we review the main weaknesses and strengths of both approaches and illustrate them by means of a couple of examples.
In the last part of the chapter we consider watermarking of video objects for the protection of MPEG-4 video streams, or parts of them. By following the same approach used in the previous sections, we first
discuss the main problems and challenges set by object-based watermarking, then we describe two practical systems to better illustrate the concepts introduced before. The chapter ends with some conclusions summarizing the state of the art in the video watermarking field and pointing out directions for future research.
1 Introduction
From a very general perspective, video watermarking is not much different from watermarking of any other media, especially still images. This is particularly true when the possible applications of video watermarking are considered. For this reason, this chapter does not discuss video watermarking from scratch; on the contrary, it mainly focuses on the peculiarities of video watermarking with respect to the still image case. Such an approach dictates the content and organization of this chapter. After a brief review of the main applications of video watermarking, we classify video watermarking algorithms according to whether the watermark is inserted (retrieved) in the raw or in the compressed domain (section 2). Such a distinction plays a fundamental role in the video case, as the usefulness of raw video is limited to very specific applications, while in the large majority of cases a compressed format is used. Another important distinction, which is not necessary in the still image case, regards systems treating the video as a sequence of still images which are marked independently, and systems treating the video sequence as a whole. The pros and cons of the two different approaches are discussed in section 3. In section 4 we consider object-based watermarking, a concept gaining more and more importance due to the increasing diffusion of object-based video compression standards, such as MPEG-4. Let us start, then, with a brief description of the most important applications of video watermarking.
1.1 Video Watermarking Applications
As for any watermarking system, the interest in video watermarking was first triggered by its potential use for copyright protection applications. For this reason, early works were mainly focused on robust video watermarking, where the possible presence of an enemy interested in removing the watermark has to be taken into account. It has recently become clear, though, that possible applications of digital watermarking go far beyond copyright protection, to include, among others, data authentication, labelling, annotation, and error detection and concealment. In the following, the most remarkable applications of video watermarking are briefly reviewed (Langelaar et al. 2000, Petitcolas 2000, Podilchuk and Delp 2001). Note that with the notable exception of error detection/concealment, the applications listed below also apply to the watermarking of different kinds of media, e.g. still images or audio signals.
• Copyright protection: in copyright protection applications, the embedded signal, i.e. the watermark, conveys copyright-related information about the hosting video, i.e. the cover work. The exact content of the embedded information depends on the particular application and may include the identity of the creator or the distributor of the cover work, the identity of the particular customer to whom the work is sold, or the licensing terms between the seller and the purchaser. The cast information can be used later to demonstrate content ownership or misappropriation, or as a proof of purchase. Among the characteristics a watermarking system to be used in copyright protection applications must possess, robustness against intentional or unintentional processing of the video content is of primary importance.

• Authentication and integrity verification: in some applications, such as video news delivery, it is important to verify the identity of the content originator, and whether the content has been
modified or falsified since its distribution. In such a case, the watermark is embedded when the video is captured; after its delivery to the user, the watermark is extracted from the video sequence to reveal possible content tampering: if the extracted watermark matches the embedded one, then the content is assumed to be uncorrupted. On the contrary, if the watermark cannot be recovered, or the recovered and embedded watermarks do not match, this is taken as evidence that tampering occurred. This kind of watermark is defined to be fragile, since the watermark code is modified as soon as the cover work is modified.
• Error detection/concealment: a slightly modified version of the authentication perspective has recently been adopted to enable detection and, possibly, correction of transmission errors affecting a video signal transmitted over an error-prone channel, e.g. a wireless channel (Manetti et al. 2001, Robie and Mersereau 2002). As for any authentication scheme, if the watermark embedded within the video prior to transmission is not recovered safely, then an error is detected. By properly designing the watermark, useful information can also be obtained to localize the error and, possibly, to correct or conceal it. For example, the watermark may convey a low resolution version of the video, to be used to restore the parts of the video that were corrupted by errors.
• Broadcast monitoring: here the watermark is exploited for tracking the use of a piece of content in pay-per-use applications, such as accounting of royalties for video broadcasting: the video clips are imperceptibly watermarked by embedding a license number; a monitoring device listening to the broadcast transmissions can detect the code associated with a given clip, automatically count its passages, and compute the royalties to be paid.

• Copy-control marking: in such a case the watermark carries information regarding the copy generation system managing the number of copies allowed. An example of a copy-control application
regards the digital versatile disk (DVD) copy protection mechanisms. In this case the watermark carries one of the following indications: the video can be copied, it can be copied only once, or it can never be copied. Note that every time a copy is made, the hardware may need to modify the embedded code (Bloom et al. 1999).
• Identification of video content: in this application digital watermarking is used to embed a hidden label uniquely identifying the video content. The label can be used as a pointer to a record of a database. This record could store descriptive data concerning the content, the rights holder name, the license terms, and similar information that cannot be embedded directly into the content due to capacity constraints.

• Access level verification of video content: in this scenario watermarking is used to provide different access levels to the same video data, e.g. by controlling the level of detail of the displayed image sequence (Swanson et al. 1998): if the watermark reveals that the user has a high access level, then details are shown that a user having a lower access level would not see. Otherwise, extra information, such as multilingual tracks, can be embedded in a video file that is broadcast.
1.2 Video vs. Image Watermarking
A video sequence can be considered as a sequence of consecutive and equally time-spaced still images: the video watermarking problem thus seems very similar to the image watermarking one (Hartung and Kutter 1999). Indeed, there are a lot of papers where an image watermarking system is extended to work with video. However, there are also several differences between images and video, demanding specific system design. A first important difference is the size of the host data where the watermark has to be hidden: images are characterized by a limited amount of data (i.e. the number of pixels, or the number of transform domain coefficients), whereas this
amount is much higher for video assets. This allows a less rigid constraint in the trade-off between visibility and robustness: since more samples are available, the modifications necessary to embed the hidden information will be less visible, so that there is no need to apply complicated models of the Human Visual System to better hide the watermark. On the other hand, since a much larger amount of data has to be presented to the user, in the case of video sequences there are more demanding constraints on the real-time effectiveness of the system. This constraint is even tighter for some particular applications of watermarking, like video-on-demand, where it is not possible to use a watermarking system that works in the uncompressed domain, since the time required for decompression, watermark embedding, and recompression would be too long. Another difference between still images and video is that for the latter type of asset, as we will analyze in the following, a number of attacks exist that cannot be applied to the still image case, e.g. frame averaging or frame swapping.
2 Raw vs. Compressed Domain Watermarking
A first distinction can be made between techniques embedding the watermark in the compressed video stream and those embedding the watermark in the raw domain, prior to any compression algorithm possibly applied to the video.
2.1 Generalities
In a raw domain watermarking algorithm the watermark code is cast directly into the video sequence. Watermark embedding can be performed either in the spatio/temporal domain or in a transformed domain (e.g. the DCT or DFT domain). The choice of the most appropriate category of watermarking algorithm strongly depends on
the intended application and the requirements it sets on the watermarking system.
To give some general guidelines on the issues to be considered when choosing between raw and compressed domain techniques, we can start by noting that, in many cases, digital video is stored in a compressed format (e.g. MPEG or H.263). If a raw domain watermarking algorithm is used to embed the watermark code, then the system needs to decompress the video stream, embed the watermark into the uncompressed frame sequence, and compress the watermarked content again to obtain the final watermarked video stream. Watermark detection also needs to be preceded by video decoding. On the contrary, if a compressed domain watermarking algorithm is adopted, it is possible to embed and decode the watermark directly in the compressed bit stream without going through a full decoding, watermarking, and re-encoding process, thus significantly reducing complexity and additional time delays. Compressed domain watermarking systems thus allow for computationally efficient watermark casting and decoding, a characteristic which is of primary importance in many applications, e.g. real-time applications. Compressed domain watermarking presents some drawbacks as well. First of all, an additional constraint has to be considered: the bit rate of the watermarked compressed video stream must not significantly exceed the bit rate of the unwatermarked stream. Moreover, this kind of algorithm is sensitive to a number of attacks which can be neglected when a raw watermarking scheme is used. One of the most important such attacks is format conversion, e.g. NTSC/PAL conversion, MPEG/H.263 transcoding, compression rate conversion, and A/D and D/A conversion.
2.2 Examples
In this section, we describe some case studies to exemplify how actual watermarking algorithms implement the concepts discussed
above. We first describe two systems embedding the watermark in the raw domain, with the first algorithm operating in the spatial domain and the second one in the frequency domain. Then we describe an algorithm embedding the watermark directly in the MPEG domain. In all cases we assume that the watermark is embedded frame by frame, thus not exploiting the temporal nature of the video sequence. Of course it does not need to be so; anyway, we preferred to focus on frame-by-frame watermarking in this section and present two examples of sequence-based algorithms in the next section.
2.2.1 Raw video watermarking in the spatial domain
In (Kalker et al. 1999) and (Maes et al. 2000) a video watermarking algorithm for video broadcast monitoring applications called JAWS (Just Another Watermarking System) is presented. Basically, the method relies on a simple spatial addition of the watermark to the pixel luminance values (chrominance data is ignored) to embed a one-bit payload. Detection relies on the correlation between the watermark pattern and the watermarked frame. To be specific, the embedded watermark consists of a 128 x 128 block of samples, independently drawn from a normal distribution. The watermark block is tiled to obtain a pattern W = {w_i} with the same size as the to-be-marked video frame. The watermark is pixelwise added to the original frame X = {x_i} to obtain the watermarked frame Y = {y_i}. The embedding rule is thus described by the equation:

y_i = x_i + s λ_i w_i,   (1)
where s is a global scale factor, and λ_i is a local scale factor which takes into account the local activity, measured using a Laplacian high-pass filter, in order to avoid visible artifacts. It is well known, in fact, that the watermark is more easily hidden in areas characterized by a larger activity. In order to improve detection reliability, the same watermark is embedded into several consecutive video frames. In watermark detection, a spatial prefilter is applied to reduce
crosstalk between the video content and the watermark; then each frame is folded into a matrix B of size 128 x 128. Finally, detection is performed by computing the correlation between the video under inspection and the to-be-looked-for watermark pattern, where, as usual, the correlation d is computed as

d = (1/N) Σ_i b_i w_i,   (2)

where N is the number of pixels involved in the correlation, and b_i and w_i are the entries of the folded matrix B and of the watermark pattern, respectively. Detection is performed by comparing d against a threshold T_d. If d > T_d the detector decides for the watermark presence, whereas if d < T_d, a negative answer is given. The detection threshold is usually set by resorting to the Neyman-Pearson criterion, i.e. the probability of missing the watermark is minimized subject to a maximum allowable false detection probability (Piva et al. 1998, Di Franco and Rubin 1980). In order to improve detection reliability and to decrease computational complexity, the watermark is embedded repeatedly in every frame of the video, and folded frames are accumulated in time prior to detection.
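A minimal sketch of the additive rule (1) and the correlation (2) may help fix ideas; it is not the actual JAWS implementation. The Laplacian-derived local scale, the key, the global strength s, and the use of a plain-correlation detector (rather than the SPOMF shift search described next) are all simplifying assumptions.

```python
import numpy as np
from scipy.ndimage import laplace

def jaws_embed(frame, key=7, s=2.0):
    """Tile a keyed 128x128 pattern over the frame and add it, as in (1)."""
    rng = np.random.default_rng(key)
    w_tile = rng.standard_normal((128, 128))
    reps = (-(-frame.shape[0] // 128), -(-frame.shape[1] // 128))
    W = np.tile(w_tile, reps)[:frame.shape[0], :frame.shape[1]]
    lam = np.abs(laplace(frame.astype(float)))   # local activity ~ lambda_i
    lam /= lam.max() + 1e-9
    return frame + s * lam * W, w_tile

def jaws_detect(frame, w_tile):
    """Fold the frame into a 128x128 buffer and correlate, as in (2)."""
    h = (frame.shape[0] // 128) * 128
    w = (frame.shape[1] // 128) * 128
    folded = frame[:h, :w].reshape(h // 128, 128, w // 128, 128).sum((0, 2))
    return np.mean(folded * w_tile)   # compare against a threshold T_d
```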
To cope with possible spatial shifts, e.g. those due to cropping, the correlation over all possible 128 x 128 shifts has to be computed. However, to save computational time, the correlation is performed in the FFT domain. The authors found experimentally that the magnitude information can be ignored, so that only the phase information of the FFT is used: this method of detection is well known in the field of pattern recognition and is referred to as symmetrical phase-only matched filtering (SPOMF) (Brown 1992). The basic method described so far is only able to embed 1 bit of information, since the detector can only decide whether the cover video contains a given watermark pattern or not. In order to increase the payload, the watermark can be generated using a set of different watermark patterns, so that the information is encoded in the choice of the basic pattern. Additionally, two or more patterns can be
embedded in the same video sequence, with the to-be-hidden information encoded in the relative position between the patterns. Experimental tests demonstrated that, to avoid visible artifacts and computational complexity, it is not convenient to use more than three or four basic patterns; moreover, relative shifts between patterns are chosen only as multiples of a grid size of 4. In this way the algorithm can hide up to about 8 bits in the same frame. The method survives MPEG-2 compression down to 2.5 Mb/s, Motion-JPEG (MJPEG) compression, D/A and A/D conversion, PAL conversion, noise addition, quantization, subtitling and logo insertion, cropping, frame erasure, speedups, and transmission errors.

2.2.2 Raw video watermarking in the frequency domain

In (Caldelli et al. 2000) a raw frame-based video watermarking algorithm is proposed, which consists in applying a tested and well-performing watermarking technique (Piva et al. 1999), originally designed for still images, to raw video, by considering it as a set of consecutive still frames. Watermark casting is carried out by extracting the brightness of the to-be-marked frame and computing its full-frame Discrete Fourier Transform (DFT). The watermark is embedded by introducing a slight modification in the magnitude of some DFT coefficients belonging to a specific middle-frequency region of the transformed domain; subsequently the inverse DFT is performed to obtain the watermarked image. In contrast to the JAWS system, watermark embedding follows a multiplicative rule, whereby the magnitude of the watermarked DFT coefficients is calculated as follows:

y_i = x_i (1 + γ w_i),   (3)

where γ is a parameter controlling the watermark strength, y_i and x_i are the watermarked and original magnitudes of the DFT coefficients, and w_i represents the sequence of watermark samples, drawn from a uniform distribution taking values in [-1, 1].

To better preserve the original quality of the video sequence, a particular
masking operation, exploiting the knowledge of the characteristics of the Human Visual System, is performed. Masking is performed in the spatial domain according to the formula:

I_w = I + M · W,   (4)
where I and I_w are the original and watermarked frames (expressed in the spatial domain), M is a proper masking image giving a pixelwise measure of the sensitivity of the human eye to noise addition (Bartolini et al. 1998), and W is the watermarking signal expressed in the spatial domain. More specifically, W is calculated by watermarking the host frame according to equation (3), thus obtaining an intermediate watermarked frame I'_w, and by considering the difference between I and I'_w in the spatial domain. In the detection step, the luminance of the image to be checked for watermark presence is extracted and the magnitude of its DFT is considered again; an optimum criterion to verify whether the mark is present in the image is derived, based on statistical decision theory (De Rosa et al. 2001). In practice the response of a maximum likelihood function is compared to a threshold, whose value is set to impose a given probability of false alarm. This kind of watermark is unperceivable and presents good robustness to usual image processing such as linear/non-linear filtering, sharpening, JPEG compression and so on; furthermore, resistance to geometric transformations such as scaling, rotation, cropping, etc., is granted thanks to the insertion of a synchronization template during the coding step. The above image watermarking method has been successfully applied to non-coded video sequences, where each frame is processed in a distinct and different way (Caldelli et al. 2000). An advantage of using a multiplicative embedding rule is that the inserted code is frame-dependent: though the private key is always the same, the mark actually introduced in the image is diverse for every frame. Doing so we embed correlated watermarks between
correlated frames and uncorrelated watermarks between uncorrelated frames, allowing the changes due to code insertion to adapt gracefully to the video content. Obviously, dealing with raw video makes it possible to achieve video-coding format independence and, moreover, to choose how many and which frames are to be marked. In particular, for the application in (Caldelli et al. 2000) it was decided to watermark the first frame of each GOP (Group Of Pictures), which was composed of 12 frames, leaving the other ones uncorrupted. The frame rate being equal to 25 frames/sec, this approach grants that at least 2 frames per second are marked, which seems a considerable fraction with respect to the video length. Anyway, if superior protection is needed, a higher number of frames can be marked, at most the entire GOP. This choice also results in a good preservation of the marked video quality. Alternatively, one frame every six can be marked, that is, two frames for each GOP. During the detection phase the whole video is checked for the mark presence, not only the first frame of each GOP, in such a way that synchronization of the stream is not required and knowledge of the exact position of the watermark within the sequence is not needed, as it would seem to be for other algorithms (Hartung and Girod 1996). When the checked code is found, it means that the video contains at least one watermarked frame; the process could go on with its search into the remaining part of the video, but this is not necessary unless a more accurate validation is specifically required, thus resulting in a saving of computational time. The possibility of deciding to watermark one or more frames in a GOP (at most all of them), together with the choice of considering each frame as a still image, yields some important advantages from a robustness point of view. First, a trade-off between the time spent for marking and the degree of robustness needed for the sequence can be achieved. In other words, the lower the number of watermarked frames in the GOP, the faster the coding phase, but, conversely, just a minor part of the video stream will be watermarked, thus causing
a robustness decrease; obviously, if superior security has to be obtained, a higher number of frames may be considered. Moreover, if attacks like frame exchange or frame dropping/replacing, which do not result in a strong video quality degradation, are applied to the watermarked sequence, it will still be possible to reveal the watermark; experiments carried out in this direction have confirmed these assumptions. Furthermore, it has been verified that MPEG-2 coding/decoding operations, at various and lower bit-rates, do not prevent correct watermark detection. Thanks to the good robustness the watermarking algorithm had already shown in the case of still images against usual image processing, such as linear/non-linear filtering, noise addition, and JPEG compression, and against geometric transformations such as rotation, scaling, and cropping, these longed-for characteristics are retained for video applications as well.
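To illustrate the multiplicative rule (3), here is a hedged numpy sketch: the annular mid-frequency band, the strength γ, and the key are assumptions, and both the spatial masking of equation (4) and the synchronization template are omitted.

```python
import numpy as np

def dft_multiplicative_embed(frame, key=11, gamma=0.2, r_lo=0.15, r_hi=0.35):
    """Scale mid-frequency DFT magnitudes by (1 + gamma * w_i), as in (3)."""
    F = np.fft.fftshift(np.fft.fft2(frame.astype(float)))
    h, w = frame.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot((yy - h / 2) / h, (xx - w / 2) / w)   # normalised radius
    band = (r > r_lo) & (r < r_hi)                     # mid-frequency annulus
    wm = np.random.default_rng(key).uniform(-1.0, 1.0, int(band.sum()))
    mag, phase = np.abs(F), np.angle(F)
    mag[band] *= 1.0 + gamma * wm                      # multiplicative rule
    marked = mag * np.exp(1j * phase)
    # the real part restores a real-valued frame (symmetry is only approximate)
    return np.real(np.fft.ifft2(np.fft.ifftshift(marked)))
```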
2.2.3 Compressed domain

In (Langelaar et al. 1998), (Langelaar et al. 2000), and (Langelaar and Lagendijk 2001) two methods that embed a watermark directly in the MPEG compressed domain are proposed. The first method, named Bit Domain Labelling, embeds the watermark code directly in the compressed bitstream by properly replacing some variable length codes (VLCs) of DCT coefficients, so that decoding and re-encoding are not required. The second one, named Coefficient Domain Labelling, is based on setting some quantized DCT coefficients in the compressed stream to zero, so that complexity is higher with respect to the first method, since VLC decoding and run-length decoding are needed. On the other hand, however, a higher robustness is reached.

The Bit Domain Labelling method embeds a watermark consisting of L label bits B = {b_1, b_2, ..., b_L} in the MPEG stream by selecting suitable VLCs and forcing the LSB of their quantized level to the value of the bit b_j. To ensure that the modifications are perceptually invisible and that the video stream maintains its original size, only those VLCs for which another VLC exists with the same run length, a quantized level difference of one, and the same code word length are selected. In particular, the VLCs in the intra- and intercoded macro blocks are used and the DC coefficients are left unaltered. To add the label bit stream B to an MPEG video stream, the VLCs in each macro block are tested. If a candidate VLC is found, its LSB value is evaluated: if it is different from the watermark bit b_j, the VLC is replaced by another one whose LSB level represents the label bit; otherwise it is not changed. To extract the watermark, the VLCs in each macro block are tested: if a candidate VLC is found, the value represented by its LSB is assigned to the label bit b_j. The algorithm achieves a high payload (up to 29 kbit/s) without introducing visible artifacts. The drawback is that the watermark can easily be removed by decoding and re-encoding the watermarked video stream or by embedding a new watermark into the stream.

The Coefficient Domain Labelling method embeds a watermark in sets of DCT blocks from which high-frequency DCT coefficients are removed, i.e. by imposing defined energy differences between DCT blocks. The watermark, consisting of L label bits B = {b_1, b_2, ..., b_L}, is embedded bit-by-bit in sets of n = 16 DCT blocks of size 8 x 8 (called lc-regions), taken only from the I-frames of the MPEG video stream. Each watermark bit is encoded by introducing an energy difference between the high-frequency DCT coefficients of the top half A of the lc-region and the bottom half B, each one containing n/2 DCT blocks. The high-frequency DCT coefficients of each block are all the coefficients whose zig-zag scan index is higher than a predefined cut-off index c. If the lc-subregion A contains a high-frequency energy E_A higher than that of the lc-subregion B, E_B, this means that a bit 0 has been embedded; otherwise a bit 1 was cast. Before computing the energy of the high-frequency DCT coefficients, a prequantization with a predefined quality factor is applied to the DCT coefficients; this allows embedding the watermark in perceptually important image details that are not significantly affected by MPEG compression, so that the method
is robust to this process. The watermark embedding rule must therefore adapt E_A and E_B to manipulate their difference D. To embed a watermark bit 0, the energy E_B is forced to zero by setting the corresponding DCT coefficients to zero, yielding D = E_A. If, on the contrary, a watermark bit 1 must be embedded, all the DCT coefficients of lc-subregion A are forced to zero, so that E_A = 0, yielding D = -E_B. Since the embedding domain is the compressed bit stream, the DCT coefficients can easily be forced to zero without re-encoding the bit stream, by shifting the end-of-block (EOB) marker of the 8 x 8 DCT blocks in one of the two lc-subregions towards the DC coefficient, up to the selected cut-off index.
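The energy-difference idea can be sketched on raw 8 x 8 blocks as follows; this is only an illustration, not the published implementation, which edits the compressed stream at the VLC/EOB level. The anti-diagonal scan order, the cut-off index c = 27, and operating on pixel-domain blocks are assumptions.

```python
import numpy as np
from scipy.fft import dctn, idctn

# an anti-diagonal ordering standing in for the true zig-zag scan
ZIGZAG = sorted(((u, v) for u in range(8) for v in range(8)),
                key=lambda p: (p[0] + p[1], p[1] if (p[0] + p[1]) % 2 else p[0]))

def high_freq_energy(block, c=27):
    """Energy of the scan coefficients beyond the cut-off index c."""
    coef = dctn(block, norm='ortho')
    return sum(coef[u, v] ** 2 for u, v in ZIGZAG[c + 1:])

def embed_bit(blocks_a, blocks_b, bit, c=27):
    """Bit 0: zero the tail of half B (D = E_A); bit 1: zero half A."""
    for blk in (blocks_b if bit == 0 else blocks_a):
        coef = dctn(blk, norm='ortho')
        for u, v in ZIGZAG[c + 1:]:
            coef[u, v] = 0.0
        blk[:] = idctn(coef, norm='ortho')   # write the truncated block back
```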
3 Frame-by-Frame vs. Sequence Watermarking
3.1 Generalities
As already pointed out, a video sequence can be considered as a set of consecutive frames. In a frame-by-frame watermarking algorithm the watermark code is embedded into each frame, considered as a still image, regardless of the other frames composing the sequence. On the contrary, a sequence watermarking system embeds the code into the whole video stream, or at least a part of it, involving a set of consecutive frames. Of course, watermark embedding may still operate either in the raw or the compressed domain. Sequence-based algorithms operating in the raw domain may simply design the watermark signal added to the host video in such a way that it spans more than one frame. More sophisticated algorithms may exploit the correlation between subsequent frames, e.g. by embedding the watermark in a 3D transformed domain (for example the 3D-DCT or 3D wavelet domain), which takes into account both the
spatial and the temporal characteristics of the video sequence (a toy 3D-DCT example is sketched below). When operating in the compressed domain, it has to be noted that current video compression standards such as the MPEG and ITU H.26x standards are based on the same general structure, including block-based motion compensation, which takes into account the temporal correlation of the content, and block-based DCT coding, which takes into account the spatial correlation. A video stream thus contains three kinds of bits: header bits, motion vectors, and DCT coefficients. Sequence-based schemes may, then, operate on those parts of the video stream taking into account both the spatial and temporal characteristics of the video, e.g. motion vectors or some other information representing the whole video sequence, like the GOP structure. Alternatively, sequence-oriented watermarking may be achieved by embedding the watermark into a set of DCT coefficients spanning two or more frames of the video stream. With regard to possible attacks against sequence watermarking, some new kinds of manipulation have to be taken into account, for example frame averaging, frame dropping, and frame swapping. At usual frame rates these modifications would possibly not be perceived by the end user, but the watermark synchronization would be lost. Of course these attacks do not constitute a problem for frame-by-frame watermarking algorithms.
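As a toy illustration of sequence (rather than frame-by-frame) embedding, the sketch below adds a keyed spread-spectrum signal to a shell of mid-frequency 3D-DCT coefficients of a group of frames, so the mark spans both space and time. The band limits, strength, and key are assumptions and are not taken from any of the published schemes discussed here.

```python
import numpy as np
from scipy.fft import dctn, idctn

def embed_3d_dct(frames, key=3, strength=2.0, band=(4, 12)):
    """frames: (T, H, W) float array holding a group of consecutive frames."""
    C = dctn(frames, norm='ortho')                     # 3D-DCT of the group
    t, h, w = np.ogrid[0:C.shape[0], 0:C.shape[1], 0:C.shape[2]]
    mid = (t + h + w > band[0]) & (t + h + w < band[1])  # mid-frequency shell
    chips = np.random.default_rng(key).choice([-1.0, 1.0], int(mid.sum()))
    C[mid] += strength * chips                         # keyed +/-1 signal
    return idctn(C, norm='ortho')
```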
3.2 Examples
We now give two examples to clarify some of the concepts illustrated above. In particular, by considering that all the examples in the previous section regarded systems operating frame-by-frame, we will consider two sequence-oriented schemes, one operating in the raw domain and one embedding the watermark directly in the compressed domain.
3.2.1 Average luminance watermarking

In (Haitsma and Kalker 2001) a watermarking method is proposed which operates on the video sequence by taking the stream into account as a
whole. In this approach the temporal axis is exploited and the watermark is embedded by changing the mean luminance values of the frames belonging to the sequence according to the samples of the mark. Detection simply takes place by correlating the watermark with the average luminance values extracted from the to-be-checked video sequence. The watermark consists of a binary pseudo-random sequence of N samples taking values in {-1, +1}, which is embedded by increasing all the pixels of a frame by 1 if the mark sample is a +1, or by decreasing them by 1 if the value is a -1 (hence a single watermark sample is embedded in each frame). To take into consideration the HVS (Human Visual System) characteristics (sensitivity to modifications in flat non-moving areas), the temporal frequency of the mark insertion is reduced by introducing the same watermark sample in a group containing a fixed number N_f (e.g. N_f = 5) of consecutive frames; furthermore, an adaptive scheme is adopted to weigh the change in luminance of each pixel, by means of a scaling factor, according to pixel activity. This activity is computed both spatially, through a Laplacian filter, and temporally, through the absolute difference between the current frame and the previous one: the local scaling factor is the minimum of these two calculated scaling factors. Doing so, the watermark is weighted before adding it to the luminance of the original frame. The detection step is performed blindly by extracting the means of the luminance of a sequence of N · N_f frames. These means are distributed in N_f buffers, each one containing a possible replica of the watermark sequence (i.e. if N_f = 5 the first buffer will contain the means of frames 1, 6, 11, ..., the second will contain the means of frames 2, 7, 12, ... and so on). Each of these N_f sequences is correlated with the mark sequence to be checked. Moreover, to boost detection, a FIR filter is applied to each buffer and then a clipping operation is performed. If the result of the correlation is
larger than a threshold in at least one buffer, the content is declared as watermarked, otherwise it is not. Experimental results confirm the validity of the methodology, in particular with respect to robustness against geometric attacks like slight rotations and zoom changes.
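A minimal sketch of this temporal strategy is given below; it applies a uniform ±1 shift and thus omits the paper's Laplacian/temporal activity scaling, the FIR filtering, and the clipping step, and the group size N_f and buffer handling are simplified assumptions.

```python
import numpy as np

def embed_temporal(frames, wm, n_f=5):
    """frames: (T, H, W) array with T = len(wm) * n_f; wm: +/-1 sequence."""
    out = frames.astype(float)
    for k, s in enumerate(wm):
        out[k * n_f:(k + 1) * n_f] += s        # +1 or -1 on every pixel
    return out

def detect_temporal(frames, wm, n_f=5):
    """Correlate buffered per-frame luminance means with the candidate mark."""
    wm = np.asarray(wm, float)
    means = frames.reshape(len(wm), n_f, -1).mean(axis=2)
    # buffer b holds the means of frames b, b + n_f, b + 2*n_f, ...
    return max(float(np.dot(means[:, b], wm)) / len(wm) for b in range(n_f))
```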
3.2.2 Sequence coding in the compressed domain
In order to exemplify the sequence-embedding perspective in the compressed domain, we consider the system proposed in (Linnartz and Talstra 1998). In this system, the watermark is embedded by properly modifying the Group of Pictures structure of the video stream. In the MPEG-1 and MPEG-2 standards a given frame can be encoded according to three different Picture Types (PTY for short): an I-Frame, a P-Frame or a B-Frame. An I-Frame is encoded independently from other frames, i.e. as a still image, without taking into account temporal redundancy; a P-Frame is encoded according to its difference with a given previous I-Frame or P-Frame; finally, a B-Frame is encoded according to the difference with respect to a previous frame and the interpolation between a preceding and a following I-Frame or P-Frame. A sequence of these frames, starting with an I-Frame and lasting until the next I-Frame, is called a Group of Pictures (GOP). Since the encoder is free to choose how to encode each frame in the GOP, several different sequences of P- and B-Frames can encode the same slot of a video sequence. This degree of freedom can be used to embed the watermark by imposing a particular sequence of frame types on the GOP structure of a video stream: this system has been called by the authors PTY-Mark. To embed information into the GOP structure, some constraints have to be taken into account: first of all, the watermark payload has been estimated at 6 bits per GOP for DVD-video content, by considering that a watermark of 64 bits should be embedded every 10 seconds and that the maximal GOP length is about 0.6 seconds. The GOP structure for a consumer or professional DVD encoder has the following
composition: IB..BPB..BPB.. = IB^n(B^nP)^(m-1), with n = 1, 2 and m = 4 for professional encoders, and n = 0 for consumer ones; a modified GOP structure should have more or less the same number of P-Frames and B-Frames with respect to the conventional one, to maintain an acceptable level of encoding complexity and to obtain a similar bit rate. Occasionally there can be non-conventional GOP structures, with more than 4 consecutive P-Frames, or consecutive I-Frames. The PTY-Mark works by encoding a frame as a P-Frame to embed a bit "0" and as a B-Frame to embed a bit "1", so that every GOP represents a particular binary sequence. To satisfy the previous requirements, a PTY-alphabet of particular GOP structures has been built. In particular, to carry 6 bits, all watermarked GOPs should belong to one of 2^6 = 64 different groups. To obtain sequences similar to the standard ones, the GOP length has been fixed to 12, with 6 B-frames. Finally, all code-words must have a Hamming distance of 4. These constraints yielded an alphabet of 62 code-words. Experimental results demonstrated that the modifications introduced by the watermark do not deteriorate the image quality, neither from a statistical point of view, nor from a subjective observation of the sequences.
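The PTY-alphabet construction can be played with in a few lines: the sketch below enumerates GOP structures of length 12 starting with an I-frame and containing exactly 6 B-frames, and greedily keeps code-words at pairwise Hamming distance of at least 4. The greedy construction is an assumption and is not guaranteed to reproduce the 62-word alphabet reported by the authors.

```python
from itertools import combinations

def build_pty_alphabet(gop_len=12, n_b=6, min_dist=4):
    slots = gop_len - 1                  # frames after the leading I-frame
    alphabet = []
    for b_pos in combinations(range(slots), n_b):
        word = ''.join('B' if i in b_pos else 'P' for i in range(slots))
        # keep the word only if it is far enough from all accepted words
        if all(sum(a != b for a, b in zip(word, c)) >= min_dist
               for c in alphabet):
            alphabet.append(word)
    return ['I' + w for w in alphabet]

print(len(build_pty_alphabet()))         # size of the greedy PTY-alphabet
```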
4 Object-Based Watermarking
4.1 Generalities
In this section, attention will be paid to the analysis of a video sequence as composed of different video objects, and in particular to the description of some watermarking algorithms dealing with the object-based representation of video. The basic aim of this section, hence, is to investigate and understand how the knowledge about watermarking that has been acquired for still images and for video sequences can be transferred and consequently applied to video objects belonging to a specific scene. It is also important to
comprehend how a technology such as watermarking can cope with problems that are typical of multimedia and of the MPEG-4 world, like video editing, scalability, coding, the presence of different objects, and so on. Two main approaches have been identified to allow object watermarking: one working in the raw domain by extracting all the different video objects, and one operating directly in the coded domain by accessing the video streams; both these methodologies are discussed in the sequel.
4.2 Attacks and Challenges
The main attacks that a coded video sequence whose objects are individually watermarked can suffer, and that could inhibit watermark extraction, are basically those implying a format conversion, for example from MPEG-4 to MPEG-2. In fact this modification determines the loss of all the references related to the video objects, even though they continue to appear in the video scene, and algorithms which specifically work with the coded streams would be unable to understand where to look for the watermark. As analyzed before, another operation that can affect mark detection is bit-rate reduction, possibly applied object by object. Another important issue to be taken into account regards video editing. In fact MPEG-4 has been conceived to deal with objects so as to permit the final user to modify the original video sequence by moving the objects to diverse locations, scaling them, erasing them, and even adding some new ones. Digital watermarking algorithms must show robustness against this sort of changes without hindering MPEG-4 applications.
4.3 Examples
Only a few works addressing object-based watermarking of a video sequence have been proposed so far. In the sequel we present two of them, one operating in the raw domain and one in the compressed domain.
4.3.1 Watermarking in the wavelet/raw domain

In this section an object-based watermarking method that works on raw video is outlined (Piva et al. 2000). The algorithm is not fully integrated with MPEG-4 coding and decoding systems, but it permits direct interaction with the video objects and the embedding of a specific code in each of them; in this way it allows dealing separately and differently with each object contained in the sequence. The proposed algorithm operates frame by frame by casting a different watermark into each video object. Watermarking relies on an algorithm presented in (Barni et al. 2001), originally developed for still images, which embeds the code in the Discrete Wavelet Transform (DWT) domain.
Figure 1. Watermark casting process.
According to Figure 1, the objects contained in each frame are extracted, obtaining a different image for each one (video objects are
located on a black background), and in each image a different code is embedded by means of the system presented in (Barni et al. 2001) and summarized in the following. The procedure, described in Figure 1, is then applied to all the frames of the sequence. The image to be watermarked is first decomposed through the DWT into four levels: let us call I_j^θ the sub-band at resolution level j (where j = 0, 1, 2, 3) having orientation θ (where θ = LL, LH, HL, HH). The watermark w^θ(i, j), consisting of a pseudo-random binary image, is inserted by modifying the wavelet coefficients belonging to the three detail bands at level 0, i.e. I_0^LH, I_0^HL and I_0^HH. Before adding it to the DWT values, each binary value is multiplied by a weighting parameter which is obtained from a noise sensitivity function. In this way the maximum tolerable level of disturbance (i.e. watermark coefficient) is added to each DWT coefficient. The construction of the sensitivity function is mainly based on the analysis of the degree of image activity in the neighborhood of the pixel to be modified (see (Barni et al. 2001) for more details). The final embedding rule, hence, assumes the form:

I'_0^θ(i, j) = I_0^θ(i, j) + α m^θ(i, j) w^θ(i, j),   (5)
where α is a global parameter accounting for the watermark strength, and m^θ(i, j) is a band-dependent weighting function considering the local sensitivity of the image to noise. To properly adapt this method to video objects, which generally are small with respect to the rest of the image, the sensitivity function is first computed on the whole frame; then the visual mask related to each video object is extracted from it and used during watermark embedding. The inverse DWT is finally computed, obtaining the watermarked video objects. The watermarked video objects are then merged together in order to rebuild the frame containing the copyright information concerning each object present in the scene. When all the
frames have been marked, the sequence can be compressed, obtaining the watermarked MPEG-4 coded bit-stream. The detection phase takes place after decoding the watermarked MPEG-4 video and obtaining a sequence of frames. Once again, the objects present in the scene are extracted frame by frame, resulting in a different image for each object. The DWT of each image is then computed and the code corresponding to each object is detected by means of the correlation between the watermark w^θ(i, j) and the marked DWT coefficients I'_0^θ(i, j). The value of the correlation is compared to a threshold T_ρ, which depends on the variance σ_ρ² of the DWT coefficients of the watermarked image, to decide whether the watermark is present or not. When the correlation value is higher than the threshold, the tested code is considered to be present in the examined video object. As usual, the threshold value is set according to the Neyman-Pearson criterion, whereby the probability of missing the watermark is minimized subject to a target false detection rate. Since the detection process is performed frame by frame, the watermark embedded into a video object can be revealed even if the VO (video object) is transferred from one sequence to another. However, since the DWT is not invariant to translation, if the video object is placed in a different position in the new scene, the synchronization between the watermark and the VO is lost; to cope with this situation, the watermark detector would need to compute the correlation for all the possible shifts of the frame.
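A hedged sketch of the per-object correlation detector follows: the 'db2' wavelet, the constant k, and the keyed ±1 watermark stand in for the actual pseudo-random binary mark and the threshold derivation of (Barni et al. 2001).

```python
import numpy as np
import pywt  # PyWavelets

def detect_object_watermark(obj_image, wm_key, k=3.3):
    """Correlate the level-0 detail coefficients with a keyed binary mark."""
    lh, hl, hh = pywt.wavedec2(obj_image.astype(float), 'db2', level=4)[-1]
    coeffs = np.concatenate([lh.ravel(), hl.ravel(), hh.ravel()])
    wm = np.random.default_rng(wm_key).choice([-1.0, 1.0], coeffs.size)
    rho = np.mean(coeffs * wm)                        # correlation value
    # under the no-watermark hypothesis, rho has std ~ sigma / sqrt(N)
    threshold = k * np.std(coeffs) / np.sqrt(coeffs.size)
    return rho > threshold
```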
4.3.2 Object watermarking in the coded domain

In this subsection an algorithm (Barni et al. 2000) is described that embeds a watermark in each video object of an MPEG-4 coded video bit-stream by imposing specific relationships among quantized DCT coefficients. The algorithm deals directly with the quantized coefficients that are recovered from the MPEG-4 bit-stream by reversing run-level entropy coding, zig-zag scanning, and intra DC and AC DCT coefficient prediction; the quantized coefficients are modified to embed the watermark and then encoded again. The watermark is equally embedded into intra and inter MBs (macroblocks). A masking method is also adopted to limit visual artifacts in the watermarked VOPs (Video Object Planes) and to improve, at the same time, the robustness of the system. Watermark recovery does not require the original video and is achieved in the compressed domain. Going into detail on the embedding operation, this methodology hides one bit of the copyright code in every luminance block belonging to MBs selected on a pseudo-random basis. If the MB is skipped, the corresponding bit is also skipped, so as not to lose synchronization between the embedder and the decoder. The watermarking code is repeated over the whole VOP (i.e. after the last bit of the code has been embedded, the process considers the first bit again, and so on); on every VOP of the same VOL (Video Object Layer) the code is embedded again starting from the first bit. Every bit is thus embedded more than once during a sequence of VOPs, but, due to MB skipping in the coded bit-stream, some bits are embedded less frequently than others. In Figure 2 the watermark embedding scheme is summarized. At the start of each VOP, a pseudo-random binary sequence is generated, based on a secret key and on the characteristics (number of MBs) of the VOP itself, for choosing those MBs where the watermarking code has to be embedded and those coefficient pairs that should be modified to embed the watermark. If the chosen MB is not skipped, one bit of the watermarking code is embedded within it by imposing a particular relationship between the values of selected pairs of coefficients that belong to each luminance block; otherwise the bit is skipped too. To achieve a trade-off between the requirements of invisibility and robustness, the values of quantized DCT coefficients (QF(u, v)) belonging to a mid-frequency range are considered for watermark embedding.
Figure 2. Diagram of the watermark embedding phase (inputs include the author signature and the number of modified pairs per block).
The watermark is carried by a signal (the watermarking feature) which is the difference between the magnitudes of some selected pairs of the quantized DCT coefficients belonging to the mid-frequency region, i.e.:
W(u_1, v_1, u_2, v_2) = |QF(u_1, v_1)| - |QF(u_2, v_2)|,   (6)
where (u_1, v_1) and (u_2, v_2) are the coordinates of the two coefficients of one of these pairs. It is expected that W(u_1, v_1, u_2, v_2) is a non-stationary random process having zero mean and a moderate variance if the coefficients composing each pair are sufficiently near each other. If {QF(u_1, v_1), QF(u_2, v_2)} is a randomly selected pair, the corresponding watermarked pair is denoted as {QF'(u_1, v_1), QF'(u_2, v_2)}. Supposing the bit to be embedded is a 1, two cases can hold:
• both coefficients of the pair are non-zero;

• one or both coefficients of the pair are zero.
In the first case the watermark is inserted with maximum strength: the sign of the watermarked coefficients is not changed with respect to the original sign. Instead, when one or both coefficients of the pair are zero, it is more difficult to keep the watermark perceptually invisible, because the masking effect between the DCT frequency components is absent. In this case the coefficients of the pair are changed less heavily. For those pairs where a bit 1 has been inserted it results that W'(u_1, v_1, u_2, v_2) ≥ 0, where W'(u_1, v_1, u_2, v_2) = |QF'(u_1, v_1)| - |QF'(u_2, v_2)|. For embedding a bit 0 the algorithm is similar, but the roles of the coefficients in (u_1, v_1) and (u_2, v_2) are exchanged, and thus for the pairs where a bit 0 has been embedded it results that W'(u_1, v_1, u_2, v_2) ≤ 0.
Watermark retrieval is performed in two steps, as can be seen in Figure 3. The first step is analogous to the one used in the embedding process and requires the knowledge of the parameters used in the embedding phase (i.e. the secret key and the indexes of the coefficient pairs that were modified in each block) to correctly identify the MBs and coefficient pairs actually hosting the watermark. In the second step the relationships between the coefficients of the selected pairs are analyzed. The knowledge of the watermarking code length is needed to compute the repetition step of the watermarking code in each VOP (i.e. how many times the copyright code was embedded in the considered VOP). For reading the j-th bit of the watermarking code, an accumulator Acc_j is considered, where the values of W'(u_1, v_1, u_2, v_2) corresponding to all the pairs of coefficients where the bit itself was inserted are summed up. Let us call ψ_j the set of these pairs:
Acc_j = Σ_{pairs ∈ ψ_j} W'(u_1, v_1, u_2, v_2).   (7)
Figure 3. Diagram of the watermark recovering phase (inputs include the selected MBs, the selected pairs, the number of modified pairs per block, and the author signature length; the output is the watermarking code with a confidence measure).
Such a sum is then compared to a threshold value T_D to decide for the value of the embedded bit:

b_j = 1,   if Acc_j > T_D,
b_j undetermined,   if -T_D ≤ Acc_j ≤ T_D,   (8)
b_j = 0,   if Acc_j < -T_D.
The value of T_D is usually set to 0, since Acc_j is expected to be positive when the embedded bit is a 1 and negative in the opposite case. The system presents a good level of robustness and unobtrusiveness, also when bit-rate reduction is applied to the different video objects. This approach, obviously, does not allow extraction of the watermark after format conversion (e.g. from MPEG-4 to MPEG-2).
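A toy illustration of the pair-based feature (6) and the accumulator decision (7)-(8) is sketched below; the pair positions, the margin delta, and operating on plain numpy arrays of quantized coefficients are assumptions, since the real scheme works on re-entropy-decoded MPEG-4 data.

```python
import numpy as np

def embed_pair_bit(qf, p1, p2, bit, delta=2):
    """Force |qf[p1]| - |qf[p2]| >= 0 for bit 1 and <= 0 for bit 0."""
    hi, lo = (p1, p2) if bit == 1 else (p2, p1)
    if abs(qf[hi]) < abs(qf[lo]) + delta:
        sign = np.sign(qf[hi]) if qf[hi] != 0 else 1.0   # keep the sign
        qf[hi] = sign * (abs(qf[lo]) + delta)
    return qf

def read_bit(blocks, pairs, t_d=0):
    """Accumulate W' over all pairs carrying the same bit, then threshold."""
    acc = sum(abs(b[p1]) - abs(b[p2]) for b, (p1, p2) in zip(blocks, pairs))
    return 1 if acc > t_d else 0 if acc < -t_d else None  # None: undecided
```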
5 Conclusions
In this work we have tried to summarize the main challenges set by video watermarking. We first classified watermarking algorithms dealing with video by looking at the embedding domain, i.e. we distinguished between algorithms embedding the watermark in the raw domain and those operating directly in the compressed domain, the former approach being preferable from the point of view of robustness, whereas the latter is superior from a computational point of view for all the applications where the video signal is stored/transmitted in compressed format. In many cases watermarking of a video sequence is dealt with by resorting to techniques chosen among those developed for the case of still images. According to such an approach, each frame of the video is watermarked by itself. We have shown, by means of some examples, that this approach is capable of granting good results; however, at least in principle, watermarking the video sequence as a whole, i.e. without ignoring the temporal dimension, is a preferable approach. Finally, we considered the emerging need for object-based video watermarking, a need that stems from the increasing interest towards object-based video compression standards, such as MPEG-4. Even if only a few works have been presented in this area, we tried to highlight some of the challenges in this field. We also described two practical systems permitting to watermark a video sequence at the object level. Future research in this area should concentrate on the exploitation of the temporal axis of a video sequence, on object-based watermarking, and on a better understanding of the mechanisms underlying human perception of moving visual stimuli. As a matter of fact, the systems proposed so far fail to fully exploit the knowledge of the HVS (Human Visual System) to design a high-capacity, robust watermarking system without sacrificing the perceived quality of the watermarked
video.
Acknowledgments The authors would like to thank F. Bartolini and A. De Rosa for useful suggestions and discussion.
Chapter 20
Quantization Index Modulation Techniques: Theoretical Perspectives and a Recent Practical Application

Brian Chen

Quantization Index Modulation (QIM) data embedding methods (Chen 2000, Chen and Wornell 2001b, Chen and Wornell 2001c) are a class of methods that display attractive properties from both a practical engineering perspective and a more theoretical perspective. As we shall see in this chapter, from an engineering perspective, QIM arises quite naturally from first principles, and one can conveniently exploit the structure of these QIM systems to trade off data embedding rate, embedding-induced distortion, and robustness. In addition to their engineering attractiveness, however, we shall also see that QIM systems possess an information-theoretic justification as well. Specifically, these systems possess a host signal interference rejection property that is an essential characteristic of all good data embedding systems. As we discuss in this chapter, the host signal interference rejection capabilities give rise to very high achievable embedding rates, enabling new practical applications such as hybrid transmission systems for backwards-compatible upgrading of legacy communication networks.
1 Problem Model and Notation
We begin by presenting a familiar model of data embedding to establish the notation that will be used throughout this chapter. We consider the problem illustrated in Figure 1.

Figure 1. General information-embedding problem model. An integer message m is embedded in the host signal vector x using some embedding function s(x, m). A perturbation vector n corrupts the composite signal s, where n = y - s. The decoder extracts an estimate $\hat{m}$ of m from the noisy channel output y.

We wish to embed some digital information or watermark m in some host signal vector $x \in \mathbb{R}^N$. For example, this host signal could be a vector of pixel values or Discrete Cosine Transform coefficients from an image, Discrete Fourier Transform or linear prediction coding coefficients from an audio or speech signal, or samples from a video signal. In general, any signal representable with a set of real numbers is a suitable host signal. We wish to embed at a rate of $R_m$ bits per dimension (bits per host signal sample), so we can think of m as an integer, where
$$m \in \{1, 2, \ldots, 2^{N R_m}\}. \qquad (1)$$
An embedding function, denoted s(x, m) in Figure 1, maps the host signal x and embedded information m to a composite signal $s \in \mathbb{R}^N$. The embedding should not unacceptably degrade the host signal, so we have some distortion measure D(s, x) between the composite and host signals. In this chapter, we shall consider explicitly the square-error distortion measure
$$D(s, x) = \frac{1}{N}\|s - x\|^2 \qquad (2)$$
and the weighted squared-error distortion measure
$$D(s, x) = \frac{1}{N}(s - x)^T W (s - x), \qquad (3)$$
where W is some weighting matrix, along with the expectations $D_s = E[D(s, x)]$ of these. The composite signal s is subjected to various common signal processing manipulations such as lossy compression, addition of random noise, and resampling, as well as deliberate attempts to remove the embedded information. These manipulations occur in some channel, which produces an output signal $y \in \mathbb{R}^N$. For convenience, we define a perturbation vector $n \in \mathbb{R}^N$ to be the difference y - s.
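Since everything that follows is phrased in terms of these quantities, a minimal code sketch may help fix ideas. The following Python/NumPy functions are our own illustration (the function names are ours, not the chapter's) of the two distortion measures:

```python
import numpy as np

def squared_error_distortion(s, x):
    """Square-error distortion of Eq. (2): the average per-sample energy
    of the embedding-induced modification."""
    return np.mean((s - x) ** 2)

def weighted_distortion(s, x, W):
    """Weighted squared-error distortion of Eq. (3); W is an N-by-N
    weighting matrix (e.g., emphasizing perceptually sensitive samples)."""
    e = s - x
    return (e @ W @ e) / len(e)
```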
A decoder forms an estimate $\hat{m}$ of the embedded information m based on the channel output y. The robustness of the overall embedding-decoding method is characterized by the maximum noisiness of the channel, as measured, for example, by the variance $\sigma_n^2$ of the perturbations, for which one can reliably decode the watermark. Specific channels of interest in this chapter include (1) additive Gaussian noise channels and (2) arbitrary, square-error distortion-constrained attack channels, where the attacker can choose any channel law $p_{y|s}(y|s)$ subject to the constraint $E[(y - s)^2] \le \sigma_n^2$. One desires the embedding system to have high rate, low distortion, and high robustness, but in general these three goals tend to conflict. Thus, the performance of an information embedding system is characterized in terms of its achievable rate-distortion-robustness trade-offs.
2 QIM Basics
Although one can consider the embedding function s(x, m) to be a function of two variables, the host signal and the embedded information, one can equivalently view s(x, m) as a collection or ensemble of functions of x, indexed by m. We denote the functions in this ensemble as s(x; m) to emphasize this view. As one can see from (1), the rate $R_m$ determines the number of possible values for m, and
hence, the number of functions in the ensemble. If the embedding-induced distortion is to be small, then each function in the ensemble must be close to an identity function in some sense so that
$$s(x; m) \approx x, \quad \forall m. \qquad (4)$$
That the system needs to be robust to perturbations suggests that the points in the range of one function in the ensemble should be "far away" in some sense from the points in the range of any other function. For example, one might desire at the very least that the ranges be non-intersecting. Otherwise, even in the absence of any perturbations, there will be some values of s from which one will not be able to uniquely determine m, as illustrated in Figure 2. This non-intersection property, along with the approximate-identity property (4), which suggests that the ranges of each of the functions "cover" the space of possible (or at least highly probable) host signal values x, suggests that the functions be discontinuous. Quantizers are just such a class of discontinuous, approximate-identity functions. Thus, "quantization index modulation (QIM)" refers to embedding information by first modulating an index or sequence of indices with the embedded information and then quantizing the host signal with the associated quantizer or sequence of quantizers. Figure 3 illustrates this QIM information-embedding technique. In this example, one bit is to be embedded so that $m \in \{1, 2\}$. Thus, we require two quantizers, and their corresponding sets of reconstruction points in $\mathbb{R}^N$ are represented in Figure 3 with x's and o's. If m = 1, for example, the host signal is quantized with the x-quantizer, i.e., s is chosen to be the x point closest to x. If m = 2, x is quantized with the o-quantizer. Here, we see the non-intersecting nature of the ranges of the two quantizers, as no x point is the same as any o point. This non-intersection property leads to host-signal interference rejection. As x varies, the composite signal value s varies from one x point (m = 1) to another or from one o point (m = 2) to another, but it never varies between an x point and an o point.
Figure 2. Embedding functions with intersecting ranges. The point $s_0$ belongs to the ranges of both continuous embedding functions. Thus, even with no perturbations ($y = s_0$) the decoder cannot distinguish between m = 1 (and $x = x_1$) and m = 2 (and $x = x_2$). Using discontinuous functions allows one to make the ranges non-intersecting.

Thus, even with an
infinite energy host signal, one can determine m if channel perturbations are not too severe. We also see the discontinuous nature of the quantizers. The dashed polygon represents the quantization cell for the x point in its interior. As we move across the cell boundary from its interior to its exterior, the corresponding value of the quantization function jumps from the x point in the cell interior to an x point in the cell exterior. The x points and o points are both quantizer reconstruction points and signal constellation points,¹ and we may view the design of QIM systems as the simultaneous design of an ensemble of source codes (quantizers) and channel codes (signal constellations). The structure of QIM systems is convenient from an engineering perspective since properties of the quantizer ensemble can be connected to the performance parameters of rate, distortion, and robustness. For example, as noted above, the number of quantizers in the ensemble determines the information-embedding rate. The sizes and shapes of

¹One set of points, rather than one individual point, exists for each value of m.
Figure 3. Quantization index modulation for information embedding. The points marked with x's and o's belong to two different quantizers, each with its associated index. The minimum distance $d_{\min}$ measures the robustness to perturbations, and the sizes of the quantization cells, one of which is shown in the figure, determine the distortion. If m = 1, the host signal is quantized to the nearest x. If m = 2, the host signal is quantized to the nearest o.
the quantization cells determine the embedding-induced distortion, all of which arises from quantization error. Finally, for many classes of channels, the minimum distance $d_{\min}$ between the sets of reconstruction points of different quantizers in the ensemble determines the robustness of the embedding. We define the minimum distance to be
$$d_{\min} \triangleq \min_{(i,j):\, i \neq j} \; \min_{(x_1, x_2)} \|s(x_1; i) - s(x_2; j)\|. \qquad (5)$$
Alternatively, if the host signal is known at the decoder, as is the case in some applications of interest, then the relevant minimum distance may be more appropriately defined as either
$$d_{\min}(x) \triangleq \min_{(i,j):\, i \neq j} \|s(x; i) - s(x; j)\| \qquad (6)$$
or
$$d_{\min} \triangleq \min_x \; \min_{(i,j):\, i \neq j} \|s(x; i) - s(x; j)\|. \qquad (7)$$
The important distinction between the definition of (5) and the definitions of (6) and (7) is that in the case of (6) and (7) the decoder knows x and, thus, needs to decide only among the reconstruction points of the various quantizers in the ensemble corresponding to the particular value of x. In the case of (5), however, the decoder needs to choose from all reconstruction points of the quantizers. Intuitively, the minimum distance measures the size of perturbation vectors that can be tolerated by the system. For example, in the case of an additive white Gaussian noise channel with a noise variance of $\sigma_n^2$, at high signal-to-noise ratio the minimum distance characterizes the error probability of the minimum distance decoder (Lee and Messerschmitt 1994),
$$\Pr[\text{error}] \approx Q\!\left(\frac{d_{\min}}{2\sigma_n}\right), \qquad (8)$$
where $Q(\cdot)$ is the Gaussian Q-function,
$$Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2}\, dt.$$
The minimum distance decoder to which we refer simply chooses the reconstruction point closest to the received vector, i.e.,
$$\hat{m}(y) = \arg\min_m \; \min_x \|y - s(x; m)\|. \qquad (9)$$
If, as is often the case, the quantizers s(x; m) map x to the nearest reconstruction point, then (9) can be rewritten as
$$\hat{m}(y) = \arg\min_m \|y - s(y; m)\|. \qquad (10)$$
Alternatively, if the host signal x is known at the decoder,
$$\hat{m}(y) = \arg\min_m \|y - s(x; m)\|.$$
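To make the dither-modulation flavor of QIM concrete, the following is a minimal sketch in Python/NumPy (our own illustration, not code from the chapter) of binary QIM embedding with two uniform scalar quantizers whose lattices are offset by half the step size, together with the minimum distance decoder of (10):

```python
import numpy as np

def qim_embed(x, m, delta):
    """Embed one bit m (0 or 1) in every sample of x by quantizing with
    one of two uniform quantizers whose lattices are offset by delta/2."""
    d = (m - 0.5) * delta / 2           # dither: -delta/4 for m=0, +delta/4 for m=1
    return np.round((x - d) / delta) * delta + d

def qim_decode(y, delta):
    """Minimum distance decoder of Eq. (10): re-quantize y with each
    quantizer and pick the index whose reconstruction points lie closest."""
    err = [np.sum((y - qim_embed(y, m, delta)) ** 2) for m in (0, 1)]
    return int(np.argmin(err))

rng = np.random.default_rng(0)
x = rng.normal(0.0, 10.0, size=64)      # host signal block
s = qim_embed(x, 1, delta=1.0)          # composite signal
y = s + rng.normal(0.0, 0.1, size=64)   # mild channel perturbation
assert qim_decode(y, delta=1.0) == 1
```

Note how the decoder never needs the host x: the embedded bit is carried entirely by which lattice the composite signal sits on, which is exactly the host-interference rejection property discussed above.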
3 Distortion Compensation
Distortion compensation is a type of post-quantization processing that can improve the achievable rate-distortion-robustness trade-offs of QIM methods. Indeed, with distortion compensation one can achieve the information-theoretically best possible rate-distortion-robustness performance in many important cases, as discussed in Sec. 5 later in this chapter. We explain the basic principles behind distortion compensation in this section. As explained above, increasing the minimum distance between quantizers leads to greater robustness to channel perturbations. For a fixed rate and a given quantizer ensemble, scaling² all quantizers by $\alpha \le 1$ increases $d_{\min}^2$ by a factor of $1/\alpha^2$. However, the embedding-induced distortion also increases by a factor of $1/\alpha^2$. Adding back a fraction $1 - \alpha$ of the quantization error to the quantization value removes, or compensates for, this additional distortion. The resulting embedding function is
$$s(x, m) = q(x; m, \Delta/\alpha) + (1 - \alpha)\left[x - q(x; m, \Delta/\alpha)\right], \qquad (11)$$
where $q(x; m, \Delta/\alpha)$ is the m-th quantizer of an ensemble whose reconstruction points have been scaled by $\alpha$ so that two reconstruction points separated by a distance $\Delta$ before scaling are separated by a distance $\Delta/\alpha$ after scaling. The first term in (11) represents normal QIM embedding. We refer to the second term as the distortion-compensation term. Typically, the probability density functions of the quantization error for all quantizers in the QIM ensemble are similar. Therefore, the distortion-compensation term in (11) is statistically independent or nearly statistically independent of m and can be treated as noise or interference during decoding. Thus, decreasing $\alpha$ leads to greater minimum distance, but for a fixed embedding-induced distortion, the

²If a reconstruction point is at q, it is "scaled" by $\alpha$ by moving it to $q/\alpha$.
distortion-compensation interference at the decoder increases. One optimality criterion for choosing $\alpha$ is to maximize a "signal-to-noise ratio (SNR)" at the decision device,
$$\mathrm{SNR}(\alpha) = \frac{(d_1/\alpha)^2}{(1-\alpha)^2 D_s/\alpha^2 + \sigma_n^2} = \frac{d_1^2}{(1-\alpha)^2 D_s + \alpha^2 \sigma_n^2}, \qquad (12)$$
where this SNR is defined as the ratio between the squared minimum distance between quantizers and the total interference energy from both distortion-compensation interference and channel interference. Here, $d_1$ is the minimum distance when $\alpha = 1$ and is a characteristic of the particular quantizer ensemble. One can easily verify that the optimal scaling parameter $\alpha$ that maximizes this SNR is
$$\alpha_{\mathrm{SNR}} = \frac{\mathrm{DNR}}{\mathrm{DNR} + 1}, \qquad (13)$$
where DNR is the (embedding-induced) distortion-to-noise ratio $D_s/\sigma_n^2$. Such a choice of $\alpha$ also achieves the information theoretic capacity discussed in Sec. 5 in the case of an additive Gaussian noise channel and Gaussian host signal, and asymptotically achieves capacity in the high-fidelity limit of small embedding-induced distortion and small perturbation energy.
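Distortion compensation is then a small extension of the earlier QIM sketch. The following illustration of ours reuses qim_embed from the previous snippet; the DNR value is an arbitrary example:

```python
def dcqim_embed(x, m, delta, alpha):
    """Distortion-compensated QIM, Eq. (11): quantize with the coarser
    step delta/alpha, then add back a fraction (1 - alpha) of the
    quantization error (requires qim_embed from the earlier sketch)."""
    q = qim_embed(x, m, delta / alpha)
    return q + (1 - alpha) * (x - q)

# SNR-maximizing choice of alpha from Eq. (13), for an assumed example DNR:
dnr = 2.0
alpha = dnr / (dnr + 1.0)
```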
4 Signal-to-Noise Ratio Analysis
One form of QIM, called spread-transform dither modulation (STDM) (Chen 2000), lends itself quite nicely to analysis and is also convenient to implement. Furthermore, this particular type of QIM is closely related to a simple form of the popular spread spectrum watermarking method, which we refer to as amplitude-modulation spread-spectrum (AM-SS), allowing a direct comparison between the two that highlights the importance of the host signal interference rejection property mentioned in Sec. 2. We consider embedding one bit in a length-L block x using both STDM and AM-SS methods, allowing a fixed amount of embedding
distortion $D_s$. Thus, since the embedding rate and distortion are both equal in the two cases, we can fairly compare the systems' robustness. For both STDM and AM-SS one defines a pseudo-random vector $v \in \mathbb{R}^L$ called the spreading vector, which is of unit length. STDM in this case involves the following three-step process (a code sketch follows after the list):

1. Projection. The host signal vector x is projected onto v to get $\bar{x} = x^T v$.

2. Quantization. The projection value $\bar{x}$ is quantized with one of two uniform, scalar quantizers, one quantizer corresponding to m = 1 and the other corresponding to m = 2. The two quantizers are shifted versions of each other, where the shift is one-half the quantization step size $\Delta$:
$$\bar{s} = q(\bar{x} + d(m)) - d(m), \quad |d(1) - d(2)| = \Delta/2. \qquad (14)$$
Shifted quantizers of some base quantizer $q(\cdot)$ are known as dithered quantizers (Jayant and Noll 1984, Zamir and Feder 1996), and the shift d is called the dither value. Thus, we refer to choosing the amount of shift as a function of the embedded information m as dither modulation.

3. Composite Signal Vector Construction. We need to find the composite signal vector s(x, m) with the desired projection value $\bar{s}$. This vector is given by
$$s(x, m) = x + (\bar{s} - \bar{x})\, v.$$
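As an illustration of these three steps, here is a minimal Python/NumPy sketch of STDM embedding and decoding (our own example; the dither values ±Δ/4 are one valid choice satisfying |d(1) - d(2)| = Δ/2):

```python
import numpy as np

def stdm_embed(x, m, v, delta):
    """Spread-transform dither modulation of one bit m in {1, 2};
    v must be a unit-norm spreading vector."""
    xbar = x @ v                                      # 1. projection onto v
    d = -delta / 4 if m == 1 else delta / 4           # message-modulated dither
    sbar = np.round((xbar + d) / delta) * delta - d   # 2. dithered quantization, Eq. (14)
    return x + (sbar - xbar) * v                      # 3. composite vector construction

def stdm_decode(y, v, delta):
    """Decide which dithered quantizer lattice lies closest to y's projection."""
    ybar = y @ v
    err = [abs(ybar - (np.round((ybar + d) / delta) * delta - d))
           for d in (-delta / 4, delta / 4)]
    return 1 + int(np.argmin(err))
```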
One can view the projection and composite signal vector construction steps as a kind of transform and inverse transform, respectively, into and out of the "projection domain". We refer to this transform as the "spread transform" and, hence, the term spread-transform dither modulation. The embedding function for AM-SS has the form
$$s(x, m) = x + a(m)\, v,$$
i.e., the pseudo-random vector v is amplitude modulated by the embedded information m and the result is additively combined with the host signal vector x. This process is equivalent to projecting x onto v, adding a(m) to the resulting projection $\bar{x}$ to get $\bar{s}$, and constructing the corresponding composite signal vector. Thus, relative to the three-step STDM process described above, the projection and composite signal vector construction steps are the same, but for AM-SS the quantization step is replaced by an addition step:
$$\bar{s} = \bar{x} + a(m), \quad a(1) = -a(2). \qquad (15)$$
Because the embedding occurs entirely in the projections of x onto v, the problem is reduced to a one-dimensional problem with the embedding functions (15) and (14). For AM-SS (15), $a(m) = \pm\sqrt{L D_s}$ so that
$$|a(1) - a(2)|^2 = 4 L D_s. \qquad (16)$$
For STDM (14),
$$\min_{\bar{x}_1, \bar{x}_2} |\bar{s}(\bar{x}_1, 1) - \bar{s}(\bar{x}_2, 2)|^2 = \frac{\Delta^2}{4} = 3 L D_s, \qquad (17)$$
where $\Delta = \sqrt{12 L D_s}$ so that the expected distortion in both cases is the same, and where we have used the fact that d(1) and d(2) are chosen such that $|d(1) - d(2)| = \Delta/2$. Because all of the embedding-induced distortion occurs only in the direction of v, the distortion in both cases also has the same time or spatial distribution and frequency distribution. Thus, one would expect that any perceptual effects due to time/space masking or frequency masking are
the same in both cases. Therefore, square-error distortion may be a more meaningful measure of distortion when comparing STDM with AM-SS than it is in other, more general contexts, where squared-error distortion may fail to capture certain perceptual effects.
The decoder in both cases makes a decision based on the projection of the channel output y onto v. In the case of AM-SS,
$$\bar{y} = a(m) + \bar{x} + \tilde{n},$$
while in the case of STDM,
$$\bar{y} = \bar{s}(\bar{x}, m) + \tilde{n},$$
where $\tilde{n}$ is the projection of the perturbation vector n onto v. We let $P(\cdot)$ be some measure of energy. For example, $P(x) = x^2$ in the case of a deterministic variable x, or P(x) equals the variance of the random variable x. The energy of the interference or "noise" is $P(\bar{x} + \tilde{n})$ for AM-SS, but only $P(\tilde{n})$ for STDM, i.e., the host signal interference for STDM is zero. Thus, the signal-to-noise ratio at the decision device is
$$\mathrm{SNR}_{\text{AM-SS}} = \frac{4 L D_s}{P(\bar{x} + \tilde{n})}$$
for AM-SS and
$$\mathrm{SNR}_{\text{STDM}} = \frac{3 L D_s}{P(\tilde{n})}$$
for STDM, where the "signal" energies $P(a(1) - a(2))$ and $P\!\left(\min_{\bar{x}_1, \bar{x}_2} |\bar{s}(\bar{x}_1, 1) - \bar{s}(\bar{x}_2, 2)|\right)$ are given by (16) and (17). Thus, the advantage of STDM over AM-SS is
$$\frac{\mathrm{SNR}_{\text{STDM}}}{\mathrm{SNR}_{\text{AM-SS}}} = \frac{3}{4} \cdot \frac{P(\bar{x} + \tilde{n})}{P(\tilde{n})}, \qquad (18)$$
which is typically very large since the channel perturbations $\tilde{n}$ are usually much smaller than the host signal $\bar{x}$ if the channel output
$\bar{y}$ is to be of reasonable quality. For example, if the host signal-to-channel noise ratio is 30 dB and $\bar{x}$ and $\tilde{n}$ are uncorrelated, then the SNR advantage (18) of STDM over AM spread spectrum is 28.8 dB.
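A quick numeric check of this figure (a back-of-the-envelope script of ours):

```python
import math

snr_host = 10 ** (30 / 10)          # 30 dB host-signal-to-channel-noise ratio
# With xbar and ntilde uncorrelated, P(xbar + ntilde)/P(ntilde) = snr_host + 1
advantage = (3 / 4) * (snr_host + 1)
print(10 * math.log10(advantage))   # approximately 28.8 dB
```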
Furthermore, although the SNR gain in (18) is less than 0 dB (3/4 = -1.25 dB) when the host signal interference is zero ($\bar{x} = 0$), for example, such as would be the case if the host signal x had very little energy in the direction of v, STDM may not be worse than AM-SS even in this case, since (18) applies only when $\bar{x}$ is approximately uniformly distributed across the STDM quantization cell so that $D_s = \Delta^2/(12L)$. If $\bar{x} = 0$, however, and one chooses the dither signals to be $d(m) = \pm\Delta/4$, then the distortion is only $D_s = \Delta^2/(16L)$, so that STDM is just as good as AM-SS in this case. Finally, we recall from Sec. 3 that distortion compensation can provide an additional boost in SNR for STDM (but not for AM-SS, of course). By substituting the SNR-maximizing distortion compensation parameter $\alpha$ of (13) into (12), one can show that the boost in SNR from distortion compensation is
$$\frac{1 + \mathrm{DNR}}{\mathrm{DNR}}.$$

5 Information Theoretic Perspectives
The SNR-based analysis described above is quite convenient for performing the back-of-the-envelope type calculations that are so common in engineering practice. Complementing this analysis, a number of more formal information-theoretic results concerning watermarking have been developed in recent years (for example, see (Chen 2000, Chen and Wornell 2001b, Chen and Wornell 2001c, Moulin and O'Sullivan 2003, Cohen and Lapidoth 2002, Cohen 2001)). One of the key results that has emerged from these analyses is that in
many scenarios, the achievable rate-distortion-robustness trade-offs are the same regardless of whether or not the host signal is available at the decoder. A corollary to this result is that systems that do not reject host signal interference, e.g., additive spread spectrum systems, cannot be optimal in these scenarios. Clearly, if host signal interference degrades performance, then by definition one cannot achieve the same performance when one doesn't know the host signal at the decoder as that achievable when one does know the host signal at the decoder. Thus, these more formal information-theoretic results confirm the intuition developed earlier in this chapter that host signal interference rejection is an important property of good watermarking systems. More specific results include that in the case where the host signal is Gaussian and the channel is an additive Gaussian noise channel, the capacity (highest achievable rate) is (Chen 2000, Chen and Wornell 2001c)
$$C_{\text{Gauss}} = \frac{1}{2}\log_2(1 + \mathrm{DNR}) \text{ bits/sample}. \qquad (19)$$
This result applies not only in the white host and white channel noise case but also in cases where the host signal, the channel noise, or both are colored and a weighted square-error distortion measure (3) is used (Chen 2000, Chen and Wornell 2001c). Furthermore, in the case of square-error distortion constrained attacks, Eq. (19) also gives the capacity in the high-fidelity limit ($\sigma_x^2 \gg D_s, \sigma_n^2$). Non-asymptotic capacity expressions for this case of square-error distortion constrained attacks are more complicated but can still be obtained in closed form (Moulin and O'Sullivan 2003, Cohen and Lapidoth 2002).
A specific condition on the so-called capacity-achieving probability distribution has been derived under which distortion-compensated QIM can achieve capacity (Chen 2000, Eq. (4.2)). This condition is satisfied for (1) the Gaussian-host, additive Gaussian noise channel case (Chen 2000), (2) the Gaussian-host, square-error distortion constrained attack channel case (Moulin and O'Sullivan 2003), and (3) the non-Gaussian-host, square-error distortion constrained attack channel, high-fidelity limit case (Moulin and O'Sullivan 2003). In contrast, additive spread spectrum techniques such as AM-SS cannot achieve capacity in these cases due to the presence of host signal interference at the decoder. For example, in the white Gaussian host, additive white noise channel case, the capacity of additive spread spectrum is (Chen 2000)
$$C_{\text{SS}} = \frac{1}{2}\log_2\!\left(1 + \frac{\mathrm{DNR}}{\mathrm{SNR}_x + 1}\right),$$
where $\mathrm{SNR}_x$ is the host signal to channel noise ratio. Thus, the gap between capacity-achieving distortion-compensated QIM and additive spread spectrum is $\mathrm{SNR}_x + 1$, a result that is reminiscent of the result (18) derived in Sec. 4 using less formal analysis.
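The two capacity expressions are easy to compare numerically; a small sketch of ours, using Eq. (19) and the spread-spectrum expression above (the example parameter values are arbitrary):

```python
import math

def capacity_dc_qim(dnr):
    """Eq. (19): capacity of distortion-compensated QIM, bits/sample."""
    return 0.5 * math.log2(1 + dnr)

def capacity_additive_ss(dnr, snr_x):
    """Additive spread spectrum: host interference shrinks the effective DNR."""
    return 0.5 * math.log2(1 + dnr / (snr_x + 1))

print(capacity_dc_qim(0.1))               # e.g. DNR = -10 dB
print(capacity_additive_ss(0.1, 1000.0))  # same DNR, 30 dB host-to-noise ratio
```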
6 Hybrid Transmission: A Practical Application
Careful examination of the capacity expression (19) in Sec. 5 suggests an interesting new application of data hiding techniques, provided that one employs host-interference rejecting techniques capable of achieving these high rates. Specifically, as we discuss in this section, the achievable rates implied by Eq. (19) are sufficiently high to enable data hiding based hybrid transmission schemes to backwards-compatibly upgrade legacy communication networks. The host signal x is the signal that is currently transmitted by the legacy network. The "watermark" m is a new signal that one wishes to transmit over the same network. Because the embedding distortion $D_s$ is sufficiently low, legacy receivers can continue to receive x. New receivers containing watermark decoders can receive m. In this sense, the network upgrade to enable the transmission of the new signal m is backwards-compatible with the legacy receivers designed to receive x.
To understand the capabilities of such a hybrid transmission system, we now examine the implications of Eq. (19) in more detail. Eq. (19) gives the achievable embedding rate in bits per host signal sample. Suppose the legacy network transmits continuous-time signals of some bandwidth B Hz. For example, these signals could be either analog television or radio signals or modulated digital signals such as continuous-time waveforms corresponding to QAM or QPSK signals. Since there are 2 independent host signal samples per second for every Hertz of host signal bandwidth (Lee and Messerschmitt 1994), the capacity (19) in bits per second per Hertz is
$$C = \log_2(1 + \mathrm{DNR}) \text{ b/s/Hz}. \qquad (20)$$
With no embedding, the received SNR of the legacy receivers is $\mathrm{SNR}_x = \sigma_x^2/\sigma_n^2$, since the only source of interference to the host signal x is the channel noise n. Embedding the new signal m creates an additional interference source DNR times as strong as the channel noise. Thus, when the network is upgraded to carry m, the legacy receivers' received SNR becomes
$$\frac{\sigma_x^2}{D_s + \sigma_n^2} = \frac{\mathrm{SNR}_x}{1 + \mathrm{DNR}},$$
a decrease by a factor of 1 + DNR. In dB terms, the decrease in SNR is
$$10\log_{10}(1 + \mathrm{DNR}) \text{ dB}. \qquad (21)$$
The ratio between (20) and (21) gives the gain in data rate for the new signal m for each dB drop in quality of the legacy signal x. This ratio is
$$\frac{C}{10\log_{10}(1 + \mathrm{DNR})} = \frac{\log_2(1 + \mathrm{DNR})}{10\log_{10}(1 + \mathrm{DNR})} = \frac{1}{10}\log_2 10 \approx 0.3322 \text{ b/s/Hz/dB}. \qquad (22)$$
Thus, the available embedded digital rate in bits per second depends only on the bandwidth of the host signal and the tolerable degradation in received host signal quality. Achievable rates for several types of host signals are shown in Table 1.
Host Signal        Bandwidth   Capacity
NTSC video         6 MHz       2.0 Mb/s/dB
Analog FM          200 kHz     66.4 kb/s/dB
Analog AM          30 kHz      10.0 kb/s/dB
Audio              20 kHz      6.6 kb/s/dB
Telephone voice    3 kHz       1.0 kb/s/dB

Table 1. Achievable rates for hybrid transmission over additive Gaussian noise channels for various types of host signals. Capacities are in terms of achievable embedded rate per dB drop in received host signal quality.
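These table entries follow directly from (22); a small script of ours reproduces them:

```python
def embedded_rate_per_db(bandwidth_hz):
    """Achievable embedded rate per dB of host-quality drop, per Eq. (22)."""
    return 0.3322 * bandwidth_hz   # b/s per dB

for name, bw in [("NTSC video", 6e6), ("Analog FM", 200e3),
                 ("Analog AM", 30e3), ("Audio", 20e3),
                 ("Telephone voice", 3e3)]:
    print(f"{name}: {embedded_rate_per_db(bw) / 1e3:.1f} kb/s/dB")
```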
A prototype implementation of such a hybrid transmission system was developed recently to backwards-compatibly upgrade cable television networks (Chen 2001a). In that case, the host signals were NTSC television signals. These signals were divided into time and frequency-domain regions, and each region exhibited different sensitivity to embedding distortion. For example, inactive video regions in the time domain, such as the so-called vertical blanking interval and horizontal sync pulses, were less sensitive to distortion than active video regions. Similarly, in the frequency domain, some frequency subbands were less sensitive to distortion than others. Thus, one cannot apply the top line of Table 1 directly to this particular implementation, since the derivation of that line assumes that embedding distortion is distributed uniformly across time and frequency. Within each time and frequency region, however, one can apply (22) to determine the achievable data rate within each region given the region's bandwidth and maximum tolerable drop in SNR. Leaving some implementation margin to account for non-capacity-achieving error correction codes, uncertain channel noise conditions, imperfect timing and synchronization, and other practical issues, one can achieve a data rate of up to 6 Mbps in a single 6-MHz television channel. This data rate is sufficient to carry two new 3-Mbps digital video signals in addition to the existing analog NTSC signal, for example.
References

Chen, B. (2000), Design and analysis of digital watermarking, information embedding, and data hiding systems, PhD thesis, Massachusetts Institute of Technology, Cambridge, MA.

Chen, B. (2001a), "The key to unlocking network assets: Increasing capacity by embedding content in current video transmissions," CED Magazine, vol. 12, pp. 102-104.

Chen, B. and Wornell, G.W. (2001b), "Quantization index modulation: A class of provably good methods for digital watermarking and information embedding," IEEE Transactions on Information Theory, vol. 47, pp. 1423-1443.

Chen, B. and Wornell, G.W. (2001c), "Quantization index modulation methods for digital watermarking and information embedding of multimedia," Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology, Special Issue on Multimedia Signal Processing, vol. 27, pp. 7-33.

Cohen, A.S. (2001), Information theoretic analysis of watermarking systems, PhD thesis, Massachusetts Institute of Technology, Cambridge, MA.

Cohen, A.S. and Lapidoth, A. (2002), "The Gaussian watermarking game," IEEE Transactions on Information Theory, vol. 48, pp. 1639-1667.

Jayant, N.S. and Noll, P. (1984), Digital coding of waveforms: Principles and applications to speech and video, Prentice-Hall, Englewood Cliffs, NJ.

Lee, E.A. and Messerschmitt, D.G. (1994), Digital communication, second ed., Kluwer Academic Publishers, Boston, MA.

Moulin, P. and O'Sullivan, J.A. (2003), "Information-theoretic analysis of information hiding," IEEE Transactions on Information Theory, vol. 49, pp. 563-593.

Zamir, R. and Feder, M. (1996), "On lattice quantization noise," IEEE Transactions on Information Theory, vol. 42, pp. 1152-1159.
Chapter 21
Digital Watermarking for Digital Rights Management

Sai Ho Kwok
A key function of digital watermarking is the insertion of digital information into digital media content so that the embedded information can be used for performing digital rights management (DRM) in the context of electronic commerce. DRM enables the secure distribution, promotion, and sale of digital media content over the Internet. In general, DRM relates to copyright and copy protection, ownership assertion and verification, rights and payment enforcement, and so on. Some commercial DRM solutions are based on watermarking technologies, and they utilize digital watermarks in many different ways and for different purposes. Other solutions are not based on watermarking technologies but on cryptographic techniques. This chapter first considers some commercial DRM solutions and standards by discussing their definitions of DRM, their core technologies, and their architectures. They include Digimarc, InterTrust Rights Systems, Microsoft Windows Media Rights Manager (WMRM), and the Secure Digital Music Initiative (SDMI). This discussion leads to a summary of the features required for a practical DRM solution. The next section is devoted to the rights information needed in DRM and the integration of this rights information into a digital watermark. An SDMI-based DRM system is then presented to demonstrate how digital watermarking relates to DRM and, more importantly, to illustrate the design of digital watermarks for DRM. Finally, the chapter concludes with a comparison of watermarking-based and non watermarking-based DRM systems.
1 Introduction
In the last ten years, we have witnessed an explosion in the distribution of electronic digital material via media such as compact discs (CD-ROMs), digital video discs (DVDs), and digital television. Digital technology has also made duplicating and distributing illegal copies of copyright-protected material fast, easy, and cheap. In order to combat illegal copying and the consequent revenue losses, the owners and distributors of copyrighted electronic media, such as image libraries, the music and film industries, and online publishers, have sought ways to detect and prevent copyright infringement. Digital rights management (DRM) systems can be used to protect property rights and the business of distributing copyrighted material, specifically in the domains of electronic commerce and mobile commerce (Hartung and Ramme 2000). In these domains, DRM enables the secure distribution, promotion, and sale of digital media content over the Internet (Windows Media DRM 2002). Rights management has become a very important concern of content providers for rights verification, ownership identification, copyright protection, and control of access rights. Rights management, in general, refers to the management of intellectual property rights, including copyright protection, and also to assuring that, in a commercial setting, payment is made for use of content and that the actual use does not exceed the authorized use (Treese and Stewart 1998). DRM systems typically incorporate encryption, conditional access, copy control mechanisms, and media identification and tracing mechanisms (Digimarc Corporation 2002b). Digital watermarking is regarded as a tool for rights management. Page
(Page 1998) defined digital watermarking as a form of rights management for the protection and authentication of digital images, audio, and video. Digital watermarking basically adds information that is normally invisible or barely visible to the human eye, or inaudible or barely audible to the human ear, but that can be detected in order to identify the authenticity or copyright owner of the media. Watermarking is an essential component of DRM systems. Until now, there has been no international standard to govern rights management or intellectual property protection in electronic commerce using digital watermarking. Solutions described in the literature usually suggest applications and services with their own rules, regulations, and approaches. The Digital Property Rights Infrastructure (DPRI) is a research project of the Hong Kong University of Science and Technology that specifically addresses copyright issues in music distribution. The project is funded by the Hong Kong SAR Government under the Industrial Support Fund (ISF) scheme. The DPRI project has been completed, and our empirical results show that digital watermarking can improve not only copyright management but also the broader rights management in electronic commerce systems, especially online music distribution. The prototype of the SDMI-based DRM system presented in this chapter is part of the DPRI project.
2 Commercial DRM Solutions
This section reviews three popular commercial DRM solutions, namely Digimarc, InterTrust, and Windows Media Rights Manager (WMRM).

Digimarc
Digimarc Corporation (Digimarc Corporation 2002b) is a provider of digital watermarking components and technologies for security,
identification and brand protection applications. Digimarc places an inherent digital identity into all media content, allowing media owners or issuers to verify content, authenticate content, link media to related databases, manage and track digital assets, and prevent unauthorized copying. The Digimarc DRM solution was initially designed for digital images. Its core technology is called Digimarc ImageBridge™. Digimarc ImageBridge™ watermarking allows owners of digital images to communicate their copyrights, track their images as they travel the web, and provide image commerce opportunities. All distributed images carry unique watermarks. A Digimarc ImageBridge reader allows users to read the watermarks saved on a hard disk directly from Windows Explorer or any web page as they browse the web with Internet Explorer.

InterTrust

InterTrust Technologies (InterTrust Technologies 2002a) covers many aspects of the basic infrastructure necessary for protecting and managing digital media, enterprise-trusted computing, and next-generation distributed computing platforms. InterTrust DRM technology is suitable for a wide variety of business models, ranging from Business-to-Consumer (B2C) to Business-to-Business (B2B). Watermarked digital products at all stages of a distribution chain are protected and managed by the InterTrust Rights|System™ technology. Concerned parties include the publisher, the customer, and the payment and usage clearinghouses. InterTrust offers a single server architecture for music service providers. The InterTrust Rights|System™ technology has been optimized for content, service, and technology providers seeking to deliver high-volume, retail-based and subscription services. The InterTrust DRM can also support superdistribution, in which the authorized user can pass the protected content to other potential
users while the distributed content remains protected and the rules defined for the protected content are always enforced.

Windows Media Rights Manager (WMRM)
In general, WMRM provides rights protection and management in the Windows environment. The second-generation technology, WMRM version 7 (Windows Media DRM 2002), provides both server and client software development kits (SDKs) that enable applications to protect and play back distributed media files. The Windows Media Encoder 7 is a powerful production tool for converting both live and pre-recorded audio and video into Windows Media Format. Both live content delivery to client computers and delivery to a file for later usage are supported. Real-time signal sources from audio or video cards, including a CD player, microphone, VCR, or video camera, can be easily captured as source materials.
2.1 Comparisons among DRM solutions
Tables 1 to 3 present the details of the three commercial DRM solutions.

Table 1: Details of the Digimarc DRM solution.

Digimarc

1. DRM definition: Digimarc watermarking solutions (Digimarc 2002a) provide the power to control images online. Brand managers, web masters, photographers, marketing managers, illustrators, and advertising managers are using Digimarc watermarking to:
- communicate image copyrights
- track images as they travel the web
- enable image licensing and commerce opportunities
- enhance asset management applications

2. Media Type: Document, Image (TIFF, PICT, JPEG, GIF, PNG, PSD, BMP)

3. Core Technology: Watermarking-based DRM: Digimarc provides an effective defense in security applications. Basic properties include:
- Image Types: Types of images that can be watermarked, including Bitmap images versus Vector images
- File Format: A Digimarc watermark can reside in any file format supported by the Digimarc-aware host application
- Coloring System: A watermark can be placed in RGB, CMYK, LAB or grayscale images, and will survive when an image is converted from one color space to another
Digimarc watermarking is based on a patented technique (Licensing Digimarc Technology 2002) called perceptual adaptation. It is imperceptible and robust. Basic features of the watermarking include:
- Data carrying
- Resistance to alteration or duplication
- Recognizability under automatic inspection

4. Architecture: Digimarc's digital watermarking products and services form a complete copyright communication system that image creators and distributors can use to communicate not only copyright, but complete contact details, to the consumers of their works. This system provides the tools and capabilities to embed watermarks in images, to detect and read watermarks, to link to complete contact details or web sites for the image creator or distributor (for inquiring about usage rights, licensing, etc.), and to track use of watermarked images on the Web.
Table 2: Details of the InterTrust DRM solution.

InterTrust Rights|System

1. DRM definition: Digital Rights Management (DRM) is the umbrella term for new business trust assurance processes designed to unleash the tremendous capabilities of the Internet. DRM technology provides tools to enable these new processes (InterTrust Technologies - DRM 2002).

2. Media Type: Any digital content: document, image, audio, video, game, software, other portable media

3. Core Technology: Non watermarking-based DRM: the InterTrust DRM solution is based on a cryptographic protocol (also called Rights|System) that:
- aims to combine security, scalability, easy implementation, and viable deployment (Abadi, et al. 2002)
- provides incentives to participants in a peer-to-peer distribution system to redistribute a file to many others (Golle, et al. 2002)
- can considerably simplify the design of a secure system when the keys are reused in certain situations (Haber and Pinkas 2001).
There are four components in the Rights|System family:
- Packager: enables content owners, distributors, and service providers to create digital products from content and package them for distribution. This is accomplished by first encrypting the content, and then creating a Rights|Pack™ that contains the usage rules for that content.
- Server: establishes and maintains the secure infrastructure for the system; authorizes and delivers rights to users of the system.
- Clients: enable consumers to transparently access and use protected content on a variety of devices, including PCs, mobile phones, set-top boxes, and music players. The clients are also responsible for enforcing the rules on the protected content.
- Toolkits: enable independent software vendors (ISVs), application and solution providers, and media player developers to quickly integrate InterTrust's digital rights management (DRM) technology into their products.

4. Architecture: The content owner first selects the content file for distribution. The content can be music, video, text, or any combination of media. The content is then encrypted using the Packager. The Packager is used to specify usage rules that define the digital product to be sold. Usage rules are delivered in a secure file called a Rights|Pack. The protected content is sent to the retailer's or distributor's content distribution system. At the same time, the Rights|Pack containing the usage rules is also sent to the Content Rights Server. A consumer selects one of the digital products and pays for it. After the transaction has been approved, the Authorization Generator issues an authorization to the Rights|Client software on the consumer's device. When the authorization is received, the Rights|Client software automatically retrieves the content from the content distribution system and retrieves the Rights|Pack containing the usage rules from the Content Rights Server. When both the protected content and its associated Rights|Pack have been retrieved, the user accesses the content according to the rules defined in the Rights|Pack.
Table 3: Details of the WMRM DRM solution.

Windows Media Rights Manager

1. DRM definition: DRM is a technology that enables the secure distribution, promotion, and sale of digital media content over the Internet (Definition of DRM 2002). DRM is a set of technologies content owners can use to protect their copyrights and stay in closer contact with their customers. In most instances, DRM is a system that encrypts digital media content and limits access to only those people who have acquired a proper license to play the content (Windows Media DRM 2002).

2. Media Type: Audio, Video, and portable media

3. Core Technology: Non watermarking-based DRM: WMRM is also an encryption-based DRM solution. The WMRM features include the following:
- Secure Distribution of Digital Media
- Flexible Business Models
- Highly Scalable Platform
- Supports SDMI devices, non-SDMI devices, and portable media
- Business rules: expiration date, unlimited play, transfer to SDMI/non-SDMI, burn to CD, start time, end time, duration, counted operations (plays, transfers)
- Encryption of content and licenses

4. Architecture: The WMRM processes include packaging, distributing, establishing a license server, license acquisition, and paying for the media file (Architecture of WMRM 2002). WMRM helps protect digital media (such as songs and videos) by packaging digital media files. A packaged media file contains a version of a media file that has been encrypted and locked with a "key". This packaged file is also bundled with additional information from the content provider. The result is a packaged media file that can only be played by a person who has obtained a license from a license server. When a consumer acquires an encrypted media file from a Web site, he/she must also acquire a license that contains a key to "unlock" the file before the content can be played. Content owners can easily create these licenses and keys to protect their content files with WMRM and then distribute the content to consumers.

2.2 Summary
The above three DRM solutions differ in many aspects, including their core technologies, architectures, and so on. This is primarily due to their DRM definitions and missions. The watermarking-based DRM solution (Digimarc) focuses on the use of watermarks. It focuses on the copyright issues of the distributed images, the location of the embedded watermarks, and asset management. The non watermarking-based DRM solutions (both InterTrust and WMRM) focus on the integration of DRM into business processes. In the next section, we introduce a DRM standard that addresses both business and watermarking issues: the Secure Digital Music Initiative (SDMI).
3 SDMI
The Secure Digital Music Initiative (SDMI) is a forum that has brought together more than 200 companies and organizations representing information technology firms, consumer electronics manufacturers, security technology developers, the worldwide recording industry, and Internet service providers. The objectives of SDMI are to develop open technology specifications that protect the playing, storing, and distributing of digital music so that a new market for digital music can emerge. The open technology specifications will ultimately (SDMI Portable Device Specification - Part 1 1999): (1) provide consumers with convenient access to music both online and through new emerging digital distribution systems; (2) enable copyright protection for artists’ works; and (3) promote the development of new music-related businesses and technologies. SDMI is focused on two tracks. The first track has already produced a standard, or specification, for portable devices. The longer-term effort is to work toward completion of an overall architecture for delivery of digital music in all forms. Following the proposed SDMI specifications (SDMI Portable Device Specification - Part 1 1999), a DRM system for the online music distribution business can be designed. SDMI DRM is composed of a number of functional components, including a portable device (PD), a licensed compliant module (LCM) and a portable media (PM). These components provide a secure environment for music distribution, rendering, and storing. The functional reference model of SDMI is depicted in Figure 1. Different components are operated at different layers in the reference model.
Figure 1: Reference Model Functional Layers of SDMI (Version 1.0).
Application Layer

The application layer hosts all SDMI-compliant electronic music distribution applications, software players, home library software applications, CD extractors, and other applications. Rights management and screening take place in the application layer.

LCM
The licensed compliant module (LCM) allows content in various formats to be transferred from SDMI-compliant applications to PDs and PMs. The LCM may serve as a trusted translator in the case where there is a PD format that the application cannot interpret, so that SDMI applications are not required to communicate directly with all PD formats. As depicted in Figure 1, it is expected that an application may communicate with multiple LCMs while a single LCM may also communicate with multiple applications. One important function of an LCM is to provide an abstracted device interface to SDMI applications for PDs/PMs.
Portable Device

Only SDMI-protected content is allowed in communication. The portable device (PD) layer receives SDMI-protected content from the LCM-PD interface. The PD layer constitutes the playback component of the PD reference model, which allows for multiple PD formats as depicted in Figure 2.
Figure 2: Functional Reference Model of Portable Device (Version 1.0).
3.1 Digital Watermarking and Rights Management in SDMI
A digital audio watermark is basically a series of numbers for digital music, while digital image and video watermarks can be a pattern or a logo for visual digital content. They all have something in
common. That is, the watermark in these applications carries meaningful information for later processing. In the context of rights management, this information may serve for owner verification, usage control, access control, and other applications. The SDMI portable device specification (SDMI Portable Device Specification - Part 1 1999) states that digital watermarking is required for both local and distributed SDMI-protected content, and electronic music can also be watermarked. Digital watermarking is used to identify the owner of the property. SDMI highlights the importance of digital watermarking and includes it in its specifications. However, SDMI establishes neither the design of the digital watermark nor how to apply digital watermarking efficiently in rights management. In addition, other potential benefits and advantages of using digital watermarking in SDMI, such as marketing and control functions, are also not provided. In SDMI, rights management includes usage rules that are expressed by content providers to govern the content's use in the SDMI domain. For example, usage rules include rules governing copying (including the number of copies/generations of copies permitted), moves, check-in/check-out (including the number of usable copies), export from the SDMI domain, and combinations thereof. Usage rules are embedded, attached, and/or associated with the content in a protected manner.
4 The SDMI-based RMS
The SDMI architecture provides the basis for a wide range of real-world rights and obligations, including intellectual property rights, and real-world rules related to those rights and obligations, to be securely enforced. SDMI was initially designed for digital music. However, the architecture is extendable to other electronic media, because the major functional components in SDMI's architecture
are mentioned but not defined. It is feasible to integrate more desirable functions into these components to support other media, while the system structure is maintained as in the specification. We present an SDMI-based rights management system (RMS) for various media contents using digital watermarking. The system basically integrates both audio and video services. The SDMI-based RMS is a three-tier client/server system, as depicted in Figure 3. The diagram demonstrates that the RMS server is accessed by a client through the multimedia player. The RMS server can support multi-client access. There are three major functional components in this system: the RMS server, the rights management database (RMDB), and the multimedia player (MP). Moreover, digital watermarking is used to carry messages between these components. Communications between two or more SDMI-compliant components are protected and authenticated by a secure authenticated channel (SAC) in SDMI. SDMI-protected media are used throughout the system.
Figure 3: Major components of the SDMI-based rights management system. The system flow shown in the figure is as follows: the MP extracts the rights information from the watermarked media and connects to the designated RMS server, which is specified in the URL extracted from the watermark; the Product ID, User ID, and other information are sent to the RMS server through the Internet using the SAC. The RMS server searches for the Product ID and other information in the RMDB. If all information matches and is correct, an approval is returned to the MP by the RMS server; in addition, the User ID and other useful information are logged with the corresponding entry of the RMDB. The access or usage rights are then approved, and the MP can start to play the media according to the permission granted.
4.1 RMS Server, RMDB, and MP
To manage rights over distributed electronic media, it is necessary for clients to be in communication with a server. With the client/server setup, the RMS server is responsible for "global" rights management, including intellectual property assets. "Global" rights management means that the RMS server is in control of the overall rights management of the electronic media. The RMS server can initiate, maintain, and terminate the usage rights of the electronic media when the MP has access to the Internet. When the MP is disconnected from the Internet, the "local" rights management module within the MP will be invoked and will handle all rights issues of the media locally under certain constraints. These constraints include the following.
- The MP must be registered with the RMS server before it is actually used.
- The user must register with the RMS server.
- Authentication is enforced on the MP.
- The local rights management must be initiated by the RMS server when the electronic media is loaded by the MP for the first time.
- The usage rights for a digital media item have a time limitation, so the MP is required to connect to the RMS server for an update when the usage rights have expired.
The RMS server works with the server-side database, the RMDB, to determine the rights for a particular electronic media item. Upon an access request from the MP, the RMS server may learn the Internet Protocol address of the MP; therefore, the RMS server may infer the geographical location of the MP and keep it in the RMDB for further processing and other services. The rights management database (RMDB) holds all rights-related information as well as RMS server-generated information. In the RMDB, a data entry may refer to a customer's, company's, or product's rights information. This can be represented by the product ID, user ID, company ID, access rights, company web address, and so on. This can be an LCM according to the SDMI specifications. The multimedia player (MP) in the rights management system is an SDMI-compliant and video-enabled player. It is a PD and may contain an LCM. Apart from rendering the video and audio media, the major function of the multimedia player is to integrate the rights management module. The rights management module is an
Digital Watermarking f o r Digital Rights Management
631
additional module to communicate with the RMS server and manage rights locally.
5 Digital Watermarking and Rights Information
Watermarking is used for copy control and for media identification and tracing. Most proposed watermarking methods use a so-called spread-spectrum approach (Cox et al. 1997, Kirovski and Malvar 2001): a pseudo-noise signal of small amplitude is added to the host signal and later detected using correlation methods. A secret key ensures that the watermark can be detected and removed only by authorized parties. Digital watermarks carry information that is valuable to content providers and distributors. This information comprises both DRM-related and DRM-unrelated information. The DRM-unrelated information concerns business interests and business goals. In the context of the electronic media distribution business, business goals may include marketing and promotion. To meet these goals, information about the user (ID, credit, rights information, etc.), the user's affiliations (interests, nationality, gender, etc.), the product (ID, serial number, model number, type, name, etc.), and the product's affiliations (fan club, related products, accessories, etc.) is usually needed (Benedens 1999). These items can easily be related to each other using a database management system, and the indexing information can be used in digital watermarks for rights management. The DRM-related information usually refers to copyright and licensing information, such as the identifier of the copyright holder, the creator of the material, the authorized usage of the digital media, or a link (URL) through which to gather more related information. This information responds to the DRM requirements.
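As a concrete illustration of the spread-spectrum idea, the sketch below embeds a single watermark bit by adding a small-amplitude pseudo-noise sequence, generated from a secret key, to a host signal, and detects it by correlation. This is a minimal one-bit example under assumed parameters (amplitude, threshold); it is not the specific method of Cox et al. (1997) or Kirovski and Malvar (2001).

import numpy as np

def pn_sequence(key: int, length: int) -> np.ndarray:
    """A +/-1 pseudo-noise sequence; only the key holder can regenerate it."""
    rng = np.random.default_rng(key)
    return rng.choice([-1.0, 1.0], size=length)

def embed(host: np.ndarray, key: int, bit: int, alpha: float = 0.9):
    """Add a low-amplitude PN signal; the sign of the PN carries the bit."""
    w = pn_sequence(key, host.size)
    return host + alpha * (1 if bit else -1) * w

def detect(signal: np.ndarray, key: int) -> int:
    """Correlate with the regenerated PN sequence; the sign gives the bit."""
    w = pn_sequence(key, signal.size)
    return int(np.dot(signal, w) > 0)

host = np.random.default_rng(0).standard_normal(4096)
assert detect(embed(host, key=1234, bit=1), key=1234) == 1

The correlation succeeds because the host signal is nearly uncorrelated with the keyed PN sequence, while the embedded component correlates strongly with it.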
In the electronic media distribution business, authors and publishers who distribute their digital media or contents online expect to be able to specify the permitted use of the distributed contents in a standard format. Moreover, property owners also demand that DRM systems detect and track duplicate copies of their properties. Technically, a digital watermark contains a serial number that uniquely identifies material to registered entities. When digital watermarking is in use, the rights information, together with product information, customer profile, and company information, is represented by a key (Kwok 2002, Kwok and Yang 2002). The key is converted into a digital watermark using a hashing function or a pseudo-random generator for data embedding. The digital watermark is then embedded into a host media, which becomes a watermarked media with rights-management-enabling features.
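A hedged sketch of the key-to-watermark conversion just described: the rights-related fields are concatenated into a key, the key is hashed, and the hash seeds a pseudo-random generator that yields the embedding payload. The field names and lengths are illustrative assumptions, not taken from Kwok (2002).

import hashlib
import numpy as np

def rights_key(product_id: str, user_id: str, company_id: str) -> bytes:
    """Concatenate rights-management fields into a single key."""
    return f"{product_id}|{user_id}|{company_id}".encode()

def watermark_bits(key: bytes, n_bits: int = 64) -> np.ndarray:
    """Hash the key and expand it into a pseudo-random payload."""
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    return rng.integers(0, 2, size=n_bits)

payload = watermark_bits(rights_key("P-001", "U-042", "C-007"))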
A key function of watermarking in a DRM system is to provide evidence for prosecution if someone attempts to break the usage rules of the media. Watermarking can be used to identify the source of infringing material. A company that licenses videos online and enables customers to download licensed videos can insert a separate watermark into each download, with information identifying the licensee. If an unauthorized use of a video is discovered, the watermark can be checked to determine from which licensee it came. The pirated copy may have been copied from a legitimate copy, but this still helps a rights holder to prevent unauthorized dissemination. Digital watermarking technology thus enables rights holders to enforce their intellectual property rights by seeking out and prosecuting copyright pirates. As with much Internet law, there are jurisdictional problems, with infringement being rife in the countries with the least protection, but the provisions presented here will at least ensure that rights holders are able to fully protect their works in a large part of the world market.
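On the server side, the per-download forensic watermark described above reduces to keeping a transaction register: each download receives a fresh identifier, which is embedded as the watermark and later used as the lookup key. A minimal sketch, with all names hypothetical:

import itertools

class TransactionRegister:
    """Maps embedded transaction IDs back to licensees (illustrative only)."""
    def __init__(self):
        self._next_id = itertools.count(1)
        self._licensees = {}

    def new_download(self, licensee: str) -> int:
        tid = next(self._next_id)
        self._licensees[tid] = licensee   # embed `tid` as the watermark
        return tid

    def trace(self, extracted_tid: int) -> str:
        """Given a watermark recovered from a pirated copy, name the source."""
        return self._licensees.get(extracted_tid, "unknown")

reg = TransactionRegister()
tid = reg.new_download("licensee-A")
assert reg.trace(tid) == "licensee-A"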
Digital Watermarking for Digital Rights Management
633
6 Conclusions
In concluding this chapter, we first present some required features of a desirable DRM system. The following features are drawn from our study of the commercial DRM solutions and the DRM standard, SDMI.

- Enable intellectual property rights, including copy protection
- Support online ownership verification
- Integrate and persistently protect the artistic integrity of media
- Support various electronic media and formats
- Enable pass-along distribution between consumers, know who they are, and get paid by new listeners
- Create merchandising, promotion, direct response marketing, and affinity groups
- Receive timely financial payments and usage information for the media
- Enable the user's multimedia player for rights management
- Ensure that all distributed media are SDMI-protected content
- Track the usage of the media
- Locate where the distributed media is and who owns the media content
- Support and respect media label policies (usage rules)
- Use watermarking for embedding terms and usage rules
We presented an SDMI-based DRM system for the music distribution business to demonstrate these features, in particular the role of watermarking in DRM. SDMI was chosen because it is likely to be adopted by the recording industry for music distribution, and SDMI-compliant products are already available in the market. The SDMI-based DRM system addresses both business and DRM concerns. It covers the interests of the involved parties - artists,
publishers, consumers, etc. - by providing a secure and protected environment in which to conduct business. The SDMI-based solution differs from the WMRM and InterTrust solutions because it uses watermarking in its implementation; WMRM and InterTrust use a cryptographic DRM solution instead of watermarking. We do not agree that an encrypted license is equivalent to a digital watermark. Cryptographic-based and watermark-based copyright protection schemes are the two commonly used technologies for rights protection. In cryptographic-based rights protection, digital contents are always distributed in their encrypted forms. Given proper permission from the content provider or owner, clients are allowed to access the encrypted contents. However, once a piece of encrypted digital content is decrypted, it becomes ordinary digital content that is no longer protected and carries no rights information. As a result, when the decrypted digital content is distributed illegally to unauthorized consumers, it is almost impossible for a cryptographic-based DRM system to trace the person who distributed the illegal copy or to discover where it actually came from. To address this problem, many systems achieve rights protection by attaching a code or tag, in the form of a digital watermark, that uniquely identifies both the creator and the consumer of the digital content. In this respect, a watermarking-based DRM solution is better than a non-watermarking-based (cryptographic-based) solution. However, one major problem of watermarking-based DRM is that watermarking technology is still in its infancy, and a "truly" robust watermarking technique is not yet available (Craver and Stern 2001, Min et al. 2001).
References

Abadi, M., Glew, N., Horne, B., and Pinkas, B. (2002), "Certified email with a light on-line trusted third party: design and implementation," Proceedings of the Eleventh International World Wide Web Conference, pp. 387-395.

Architecture of Windows Media Rights Manager, http://www.microsoft.com/windows/windowsmedia/wm7/drm/architecture.asp

Benedens, O. (1999), "Geometry-based watermarking of 3D models," IEEE Computer Graphics & Applications, vol. 19, pp. 46-55.

Cox, I.J., Kilian, J., Leighton, F.T., and Shamoon, T. (1997), "Secure spread spectrum watermarking for multimedia," IEEE Transactions on Image Processing, vol. 6, pp. 1673-1687.

Craver, S. and Stern, J.P. (2001), "Lessons learned from SDMI," Proceedings of the IEEE Fourth Workshop on Multimedia Signal Processing, pp. 213-218.

Definition of DRM, http://www.microsoft.com/windows/windowsmedia/wm7/drm/definition.asp

Digimarc Corporation: The Leading Digital Watermarking Developer, http://www.digimarc.com

Digimarc | Digimarc Media Commerce Home Page, http://www.digimarc.com/imaging/default.asp

Golle, P., Jarecki, S., and Mironov, I. (2002), "Message-aware cryptographic primitives," Proceedings of Financial Cryptography, pp. 11-1-11-16.

Haber, S. and Pinkas, B. (2001), "Securely combining public-key cryptosystems," Proceedings of the 8th ACM Conference on Computer and Communications Security, Philadelphia, PA, USA, pp. 215-224.

Hartung, F. and Ramme, F. (2000), "Digital rights management and watermarking of multimedia content for m-commerce applications," IEEE Communications Magazine, vol. 38, pp. 78-84.

InterTrust Technologies, http://www.intertrust.com

InterTrust Technologies - About DRM, http://www.intertrust.com/main/overview/drm.html

Kirovski, D. and Malvar, H. (2001), "Robust spread-spectrum audio watermarking," Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing 2001, pp. 1345-1348.

Kwok, S.H. (2002), "Digital rights management for online music business," ACM SIGecom Exchanges, vol. 3, pp. 17-24.

Kwok, S.H. and Yang, C.C. (2002), "Watermarking in online media e-business," Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC), pp. 158-163.

Licensing Digimarc Technology, http://www.digimarc.com/licensing/index.htm

Min, W., Craver, S., Felten, E.W., and Liu, B. (2001), "Analysis of attacks on SDMI audio watermarks," Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing 2001, pp. 1369-1372.

Page, T. (1998), "Rights management: digital watermarking as a form of copyright protection," Computer Law & Security Report, pp. 390-392.

SDMI Portable Device Specification - Part 1, Version 1.0, http://www.sdmi.org/download/port_device_spec_part1.pdf

Treese, G.W. and Stewart, L.C. (1998), Designing Systems for Internet Commerce, Addison Wesley.

Windows Media DRM, http://www.microsoft.com/windows/windowsmedia/drm.asp
Chapter 22
Watermark for Industrial Application
Zheng Liu and Akira Inoue
With the development of digital watermarking technology and the rapid growth of network distribution of digital contents, more and more digital watermarking products have entered the market for various applications. In this chapter, we introduce some representative applications of watermarking technology in copyright protection, authentication, and data hiding, respectively. At present, many of the examples presented in this chapter have commercial products in the market, while some are still proposals under consideration. As digital watermarking technology and its application environments mature in the near future, it will be possible for all of them to have products in the market.
1 Introduction
In recent years, digital watermarking technology has become a very active research field. With the development of digital watermarking technology and the rapid growth of network distribution of digital contents, more and more commercial products using watermarking technology have entered the market for various applications such as copyright protection, authentication, and data hiding. Compared with the research work of its primary stage, the application of watermarking technology becomes a more important and more challenging topic. This is not only because application is the ultimate purpose of digital watermarking research, but also because it is a genuinely interesting and challenging field in which developed watermarking technologies are used to address various practical issues, with different system requirements for different media types such as digital image, video, and audio. Generally, therefore, the digital watermarking technologies used in practical products have the following characteristics.
Multi-technology. An effective digital watermarking approach for a practical solution usually requires a comprehensive technique, which may combine multiple watermarking techniques and cross-disciplinary methods such as cryptography, pattern recognition, and signal processing, in order to meet a wide range of practical requirements.

Multi-capability. Besides the general requirements of imperceptibility and robustness, an effective watermarking scheme should take into account further practical issues at the same time, such as the time consumed for watermark embedding and detection, the capacity of the watermark information, multi-user capability, the ability to manage watermarked contents, and a detector with low cost and high reliability.

Multi-utility. Although the original motivation for digital watermarking was to protect the copyrights of digital contents, digital watermarking has come to involve more and more other kinds of "media," such as printed materials, texture images, designs, copy machines and scanners, and anything for which copyright protection is required or in which digital watermarking can play a role.

Limited space for watermarking techniques. Because, in practice, the space within digital contents in which watermarking technologies can operate is very limited, an effective approach for a specific task may be a method using only an image-processing trick, rather than a complex algorithm of greater theoretical value.
The first two characteristics imply that an effective watermarking approach must be multi-functional in order to meet the various application requirements. However, it is impossible to achieve this multi-functionality by combining various watermarking techniques without restriction, because content quality limits how much the signal of the digital contents can be changed. Therefore, some techniques may be considered practically worthless, in spite of their theoretical value, if they cannot meet the application requirements by themselves or in combination with other techniques. The last characteristic implies that it is impossible to renovate watermarking technology repeatedly, as happens in some other fields where the major techniques can be renewed again and again with the development of high technologies such as computing. The reason is the same as mentioned above, i.e., there is a limit on how far the quality of digital contents may be altered. Therefore, the major topics of research in digital watermarking technology will shift from its primary stage, where most work was theoretical and methodological, to an application stage, where the major topics are how to apply the developed watermarking technologies to the variety of issues in practical applications. For classification, the applications can be divided into several types according to the properties of the digital contents or the objectives of the applications. According to content properties, the applications of watermarking technology can be roughly divided into three types: applications for still image, applications for digital video, and applications for digital audio. Generally, still image includes full-color image, grayscale image, palette image, text image (binary image), and compressed image such as JPEG and JPEG2000. Digital video includes uncompressed files such as AVI files and BMP stream files, and compressed files such as QuickTime, Windows Media, Real Player, and MPEG-1, 2, and 4. And digital audio
includes uncompressed files such as WAVE files, and compressed files such as MP3, AAC, Real Audio, Windows Media, ATRAC, and ATRAC3. According to the objectives of the applications, the applications of watermarking technology can likewise be roughly divided into three types: applications for copyright protection, applications for authentication, and applications for data hiding. Copyright protection attempts to protect the rights of content creators, distributors, and users, and is generally accomplished in two stages, one for watermark embedding and the other for content monitoring (detection). In the watermark-embedding stage, the content owner embeds a watermark representing copyright information into the original contents, and distributors distribute the watermarked contents to users via the network; a unique description of the original contents, registered with a neutral authority, is necessary here. In the content-monitoring stage, a detection agent of the distributors or a neutral authority supervises copyright violations by extracting the embedded watermark from the contents and reports the results to owners, distributors, or users. Since the embedded information is the evidence of copyright, and watermarked contents may face intelligent and intentional attempts to destroy or remove the embedded information, the major challenge for this kind of watermarking application is high robustness to a wide range of unintended and intended attacks. Authentication applications include the management of watermarked contents for commerce, the identification of digital contents, authentication of printed material, copy control for digital contents, and copy control for replica machines. Since, in many cases, the results of authentication relate to the genuineness of distributed matter, such as stocks, bonds, bills, and coupons, it is required
that the watermark detection have as high a reliability as possible. Note that, in the case of copyright protection, a certain success probability in watermark detection, such as a success probability over 70% across a set of distributed contents, would be considered sufficient evidence when some of those contents were redistributed illegally; at worst, if some contents failed in watermark detection, the loss to the owner would be limited to the value of those contents only. In the case of authentication, however, if the same thing happened to a bond owner, i.e., he/she failed in the watermark detection of the bond, the bond owner would be condemned as a wrongdoer and might even face a suit for his/her "guilt". Data hiding using watermarking technology includes mainly two applications. One is to embed information into digital contents that can be the address of a website where users can find more information about the contents or make a registration online. The other is to embed information into digital contents that is used for content indexing and retrieval, by which users can classify and retrieve the contents effectively. In this chapter, we introduce some representative applications of digital watermarking technology according to the objectives of the applications, i.e., applications for copyright protection, applications for authentication, and applications for data hiding. Although steganography also belongs to the topic of data hiding, its major task being to transmit information secretly by hiding it in media such as still images and audio signals, we will not give examples of steganography in this chapter. The reason is that there is a substantial difference between these two kinds of data hiding: in steganography, the media is used as an envelope and so is usually called the "cover image,"
while in digital watermarking, the media is used as the major data and so is usually called the "host image." Meanwhile, visible watermark technology is also an important means of copyright protection, but we will not introduce its applications in this chapter either, because the two watermarking technologies differ in both techniques and uses. The rest of this chapter is organized as follows. In Section 2, we introduce some representative applications for copyright protection, which include watermark embedding and distribution, network monitoring, and copyright administration. In Section 3, we introduce some representative applications for authentication, which include the administration of digital contents for commerce; authentication for ID cards, passports, bonds, etc.; and authentication for copy control of copy machines, scanners, etc. In Section 4, we introduce some representative applications for data hiding, which include applications for information linking and applications for information indexing and retrieval. Lastly, we conclude this chapter in Section 5.
2 Applications for Copyright Protection
Copyright protection is the most familiar application of watermarking technology; it attempts to protect the rights of content creators, distributors, and users. As mentioned in the Introduction, copyright protection for digital contents is usually accomplished in two stages, one for watermark embedding and the other for content monitoring. In the watermark-embedding stage, the content owner embeds watermark information representing the copyright of the contents into the original contents and distributes the watermarked contents to users via
the network through distributors. In the content-monitoring stage, a detection agent of the distributor or a neutral authority supervises copyright violations and reports the results to owners, distributors, or users. A general framework for the copyright protection of digital contents can therefore be described as follows.

1. Allocating a unique registration number, issued by a registration authority, to the content owner.
2. Embedding a digital watermark derived from the registered number into the contents secretly, to claim ownership rights.
3. Distributing the watermarked contents to users via the network, with labels declaring that the contents are covered by copyright protection.
4. Monitoring copyright infringement and reporting the results of this supervision to the owners.
In this section, we first discuss the major issues concerning digital watermarking for copyright protection. Second, we introduce the framework of content monitoring. Last, as an example of an application for copyright administration, we introduce a copyright administration system for digital music content.
2.1 Watermark Embedding for Copyright Protection
For applications in copyright protection, there are currently many kinds of watermarking products, depending on application requirements, media types, and watermarking approaches. Each has certain properties or attributes that make it more suitable for some digital contents or applications than others. In terms of the process of watermark embedding, however, there is not much difference between these applications: all of them are accomplished by adding watermark information, with or without pre-processing, into the digital contents pixel by pixel in the spatial (time) domain or the frequency domain. We will not give examples of them in this section. As technical factors when designing a scheme for watermark embedding, we should first consider three major issues: the watermark format, the watermark method, and the watermark robustness. The watermark method has been introduced at length elsewhere in this book, and we will not repeat it here. As for the watermark format, the watermarks used in various applications can be roughly classified into two categories: mark-type watermarks and literal-type watermarks. In a mark-type watermark, copyright information is represented by an owner's mark such as a company mark or a business logo. In a literal-type watermark, copyright information is represented by the owner's information, such as a company name or a serial number of the contents. Watermark robustness is a major criterion for evaluating the performance of watermarking techniques. There are many kinds of requirements for watermark robustness, depending on the application and the media; it is therefore essential to have standards of watermark robustness for the various applications. In the following parts, we introduce some standardization efforts for watermark robustness proposed by various associations and organizations.
Figure 1. An example of embedding a mark-type watermark, represented by an owner's mark, into the Lena image, where (a) is the original image, (b) is the owner's mark, (c) is the watermarked image, and (d) is the detected owner's mark.
2.1.1 Mark Type Watermark

Figure 1 shows an example of embedding a mark-type watermark, represented by an owner's mark, into the Lena image. In Figure 1, (a) is the original image, (b) is the owner's mark representing the copyright or ownership of image (a), (c) is the watermarked image, and (d) is the detected owner's mark. The mark (b) could be an image of a signature, a brand mark, a company mark, a business logo, etc. For watermark embedding, the owner's mark (b) may be transformed into an encrypted random pattern, or into a bit-stream of pseudo-random sequences using a secret key, and then embedded into the original image (a). The mark-type watermark is generally used for still images, but it can also be used for video and audio watermarking.
A major advantage of using a mark-type watermark is that, since the existence of the owner's mark is identified by human eyes or by pattern-recognition technology, a high detection reliability can be achieved even if some pixels of the detected mark are noisy. Figure 2 shows four samples of a detected owner's mark, where (a) to (d) are the owner's mark with pixel noise probabilities of 0%, 10%, 20%, and 30%, respectively. As shown in Figure 2 (d), even when the pixel noise probability rises to 30%, we can still judge the existence of the copyright mark with confidence by eye.
Figure 2. Four samples of a detected owner's mark, where (a) to (d) are the detected owner's mark with pixel noise probabilities of 0%, 10%, 20%, and 30%, respectively.
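The noise tolerance illustrated in Figure 2 can be quantified with a simple pixel-agreement measure between the reference mark and the detected mark. The sketch below simulates 30% pixel noise; the decision threshold and mark size are assumptions for illustration.

import numpy as np

def mark_similarity(reference: np.ndarray, detected: np.ndarray) -> float:
    """Fraction of matching pixels between two binary (0/1) marks."""
    return float(np.mean(reference == detected))

rng = np.random.default_rng(0)
mark = rng.integers(0, 2, size=(64, 64))      # a binary owner's mark
noise = rng.random(mark.shape) < 0.30         # 30% pixel noise
noisy = np.where(noise, 1 - mark, mark)       # flip the noisy pixels
# ~0.70 similarity: still far above the ~0.50 expected for an unrelated mark.
print(mark_similarity(mark, noisy))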
However, there are two drawbacks to the mark-type watermark. First, judging the existence of the owner's mark by human eyes or pattern-recognition technology is very time consuming, especially when delivering a content-monitoring service to a large number of customers. Second, it is difficult, even impossible, to register a large number of owners' marks in advance, which also greatly limits the content-monitoring function.
2.1.2 Literal Type Watermark

Figure 3 shows an example of embedding a literal-type watermark, represented by the owner's information, into the Lena image. In Figure 3, (a) is the original image, (b) is the owner's information, (c) is the watermarked image, and (d) is the detected owner's information. In Figure 3 (b), the owner's information can be identification information such as a company name, a serial number of the image, etc.
For watermark embedding, the owner's information can be transformed into a bit-stream of pseudo-random sequences using a secret key and then embedded into the original image (a).
Figure 3. An example of embedding a literal-type watermark represented by the owner's information, where (a) is the original image, (b) is the owner's information, (c) is the watermarked image, and (d) is the detected owner's information.
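A hedged sketch of the transformation just described: the owner's information is serialized to bits and scrambled with a keyed pseudo-random stream, so that only the key holder can recover the text. The XOR keystream and the encoding are assumptions standing in for the "pseudo-random sequences by using a secret key" mentioned above.

import numpy as np

def to_bitstream(text: str, key: int) -> np.ndarray:
    """Serialize owner's information and scramble it with a keyed PN stream."""
    bits = np.unpackbits(np.frombuffer(text.encode(), dtype=np.uint8))
    pn = np.random.default_rng(key).integers(0, 2, size=bits.size,
                                             dtype=np.uint8)
    return bits ^ pn

def from_bitstream(bits: np.ndarray, key: int) -> str:
    """Invert the scrambling; only the key holder recovers the text."""
    pn = np.random.default_rng(key).integers(0, 2, size=bits.size,
                                             dtype=np.uint8)
    return np.packbits(bits ^ pn).tobytes().decode()

assert from_bitstream(to_bitstream("Example Co. serial 0042", 99), 99) \
        == "Example Co. serial 0042"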
In contrast with the mark-type watermark, the literal-type watermark has two advantages. First, the owner's information extracted from the watermarked image can easily be identified by computer. Second, a literal-type watermark can be defined uniquely for each of a large number of individual contents. The literal-type watermark is therefore widely used in various watermarking applications.
2.1.3 Standardization for Watermark Robustness

As mentioned in the Introduction, since an embedded watermark for copyright protection is the evidence of copyright, and the watermarked contents may face intelligent and intentional attempts to destroy or remove the embedded information, the major challenge for this kind of watermarking application is high robustness to a wide range of unintended and intended attacks. Meanwhile, watermark robustness must be formulated against the requirement of content quality, i.e., the watermark scheme should embed the data at a strength that does not affect the perceptual quality of the digital content. Without this requirement, anyone could easily embed a watermark with robustness as high as he/she wants. Furthermore, blind watermarking is also a basic requirement for copyright protection, i.e., watermark detection must be accomplished without using the original signal, because in typical cases, such as content monitoring, it is difficult or even impossible to access the original signal. Blind watermarking makes highly robust watermarks more difficult to achieve. Since the robustness requirements vary with the copyright-protection application, it is difficult to have a single standard for evaluating robustness; for different applications, there should be different standards of robustness. In order to develop digital watermarking standards for these applications, many associations and organizations, such as the Copy Protection Technical Working Group (CPTWG), the Digital Audio-Visual Council (DAVIC), the Secure Digital Music Initiative (SDMI), and the Japanese Society for Rights of Authors, Composers and Publishers (JASRAC), have formulated various standards for robustness testing. For robustness testing of digital audio watermarking, SDMI planned to implement its digital rights architecture in two phases, Phase I and Phase II, to examine technology that provides security features for digital music and copyright protection for next-generation portable digital music devices. Completed in June 1999, Phase I screening looked for a watermark in the content, but SDMI-compliant portable devices could still accept music in all current formats, whether protected or unprotected. The ARIS audio watermarking technology (Verance USA) was chosen as the Phase I screening technology. Completed in March 2000, Phase II incorporated watermark detection that would allow new releases
to play while filtering out pirated copies of music. In Phase II, 12 companies submitted proposals, but only 4 companies remained in the end. The details of the robustness requirements for Phase II can be found in the document Call for Proposals for Phase II Screening Technology.

Table 1. The firms certified as capable of achieving the feasible level of technology by STEP 2001
1. Firms which attained the feasible level of technology: IBM of Japan (IBM Japan); Verance (Verance USA)
2. Firms anticipated to attain the feasible level of technology: M.Ken (M.Ken Japan); MarkAny (MarkAny Korea)
In 2000 and 2001, JASRAC sponsored technical evaluations for audio watermarking called STEP 2000 and STEP 2001, respectively (STEP 2000/2001). The objective of STEP 2000 and STEP 2001 was to determine the feasible level of technology that would serve as an international guideline in the utilization of digital audio watermarks, and to select technologies (companies) capable of achieving this level. Table 1 and Table 2 describe the test results of STEP 2001 and the robustness test items used in STEP 2001, respectively.
Table 2. The robustness test items of STEP 2001
13. MP3 - 64 kbps (mono)
14. AAC - 128 kbps
15. AAC - 96 kbps
16. ATRAC (MD) Ver. 4.5
17. ATRAC3 - 132 kbps
18. ATRAC3 - 105 kbps
19. Real Audio - 128 kbps
20. Real Audio - 64 kbps
21. Windows Media - 128 kbps
22. Windows Media - 64 kbps
23. AM
24. FM
25. Non-linear data compression (PCM)
26. AM
27. FM
28. Linear data compression (PCM)
For robustness testing of still-image watermarking, program tools for testing the robustness of watermarking algorithms for still images, such as StirMark (Stirmark 1997) and Checkmark (Checkmark 2001), are generally used. So far, however, there is no standardization of robustness testing for still-image watermarking technology sponsored by any association or organization. For robustness testing of digital video watermarking, a technical subgroup called the Data Hiding Sub-Group (DHSG) was formed under the CPTWG in June 1997 to evaluate the technical feasibility of a total of eleven watermark technology proposals. After examination and some combination of the proposals, only 2 of the 11 proposals originally submitted remained in August 1999. Since the intention of the CPTWG is to choose a watermarking scheme from the proposals as a standard technique for DVD copy control, the details are introduced in Section 3.3.1.
2.2 Content Monitoring System
Watermark detection is also an important task for copyright protection, by which we can certify and supervise the legality of contents kept by owners or users. With the rapid growth of network techniques and improvements in telecommunication infrastructure and compression techniques, the distribution of digital contents such as still pictures, music, and video over the Internet has become widespread, and every day vast amounts of digital contents are available to millions of people around the world. For applications of copyright protection, therefore, it is essential and urgent to have a system by which the legality of contents moving on the network can be certified and supervised compulsorily. Many companies have provided content-monitoring services to users; for example, M.Ken (M.Ken Japan) has delivered a content-monitoring system for still images, PatrolRobot, and IBM (IBM Japan) has delivered a system for video monitoring, IBM EMMS (Electronic Media Management System). Similar systems have been used not only on the Internet but also elsewhere, such as the monitoring service for broadcast systems called ConfirMedia delivered by Verance (Verance USA). Figure 4 shows a diagram of a content monitoring system. Generally, there are two kinds of digital content monitoring systems. Type I is a system by which we can supervise the legality of digital pictures displayed on websites. In a type I monitoring system, the monitor shown in Figure 4 patrols website to website following a route formulated by the supervisor and certifies the legality of the pictures displayed on each website automatically and intelligently. If there is a picture from which a registered watermark is extracted, the address of that website and the information relevant to the picture are recorded and sent to the related owner or distributors automatically. With the reports of the content monitoring system, the owner or distributors can certify the legality of the use of the pictures and then deal with the matter if the use of the pictures infringes copyright. A type I monitoring system has two major characteristics. First, it can be used not only to supervise the copyright or usage rights of digital pictures displayed on websites, but also to supervise the authenticity and genuineness of logo marks displayed on websites belonging to certain agents or distributed by authorized agents. Second, it can work 24 hours a day and check the pictures or logo marks displayed on a website without needing permission. Type II is a system by which we can supervise contents delivered to users by downloading, such as digital video or audio delivered online. In a type II system, the monitor shown in Figure 4 is a detection agent who downloads contents likely to be pirated editions and checks the legality of their distribution. If the digital watermark extracted from the content does not correspond to the stated status of the content, the content is identified as a pirated edition, and the detection agent can then act on the copyright infringement as in a type I system. The major characteristic of a type II system is that it can supervise the legality of the contents being distributed. However, the action of a type II system may be restricted when permission is required to enter the website and download the contents. Moreover, without advance information about websites where contents are suspected to be pirated editions, it is difficult for the monitoring system to patrol the websites automatically.
Figure 4. A diagram of a content monitoring system
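At its core, a type I monitor is a crawl-extract-report loop. The sketch below is schematic: crawl route, extract_watermark, and report are hypothetical stand-ins for the proprietary components of systems such as PatrolRobot, not their actual interfaces.

# Schematic type I content-monitoring loop (all names hypothetical).

def monitor(route, registry, extract_watermark, report):
    """Patrol websites on `route` and flag images bearing registered marks."""
    for site in route:                       # route formulated by supervisor
        for image in site.images():
            wm = extract_watermark(image)    # returns None if no mark found
            if wm is not None and wm in registry:
                # Record the site address and image details, then notify
                # the owner or distributor registered for this watermark.
                report(owner=registry[wm], site=site.url, image=image.url)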
2.3 Copyright Administration
The other important issue for applications of copyright protection is copyright administration for watermarked contents. There are some issues here, beyond the scope of watermarking technique, that we have to consider first of all. First, the watermark itself does not inherently provide any legal information of ownership; in other words, a watermark used to protect intellectual property must be registered with a trusted agent. Second, the problem of ownership deadlock exists: with current watermarking techniques, it is difficult to distinguish who watermarked the content first, since a watermark embedded by one party can later be replaced by, or coexist with, a watermark embedded by somebody else. It is therefore necessary to have a copyright administration for digital contents before an application of copyright protection is put into practice. On the issue of copyright administration, some associations and organizations, such as the Japanese Society for Rights of Authors, Composers and Publishers (JASRAC), are working hard to develop copyright administration for watermarked contents in order to promote copyright protection using watermarking technology. Here, we introduce the plan for the Designs for the Administration of Works Using New Technology (DAWN 2001), sponsored by JASRAC, as an example of copyright administration for music contents.
JASRAC. Established in 1939, JASRAC is still the only organization for the management of copyright in music. The purpose of JASRAC is to protect the rights of music copyright holders, to ensure the smooth diffusion of musical works, and thereby to contribute to the administration and growth of music culture. Owners of copyright in music assign the copyright to JASRAC in its capacity as trustee. The rights and obligations between the owners and JASRAC, as prescribed in JASRAC's general terms, are the conditions of trust. JASRAC receives a royalty from users of the music in accordance with the tariffs for the use of musical works, and initiates legal proceedings against unauthorized users. Under the Law of Copyright Management Business enacted in October 2001, the general terms and conditions of trust and the tariffs shall be disclosed to the public.

DAWN 2001. In order to realize, by the year 2001, a global system of music usage utilizing electronic authorization and watermarking technologies, JASRAC sponsored the DAWN 2001 plan, which
will serve as a blueprint for a new copyright administration system, including copyright protection technologies that will ensure the integrity of owners' rights in digital online music usage, as well as other types of usage. The plan focuses mainly on the following issues.

1. Creating a unified copyright information and licensing system, and a comprehensive set of rules for music usage and copyright clearance procedures.
2. Monitoring and preventing illegal copyright usage while providing efficient usage licenses by utilizing new technology.
3. Ensuring expeditious work registration and accurate royalty distribution.
4. Introducing new services made possible by the incorporation of new technology - work registration by copyright owners and usage returns by music users via the network is just one possibility.
With the accomplishment of DAWN 2001, users will be able to legally receive music distributed by authorized distributors. If the receiver of an Internet-distributed work wishes to exploit the work on the Internet, another authorization will be negotiated in order to prevent illegal usage. At the same time, in order to promote the progress of DAWN 2001 and to identify audio watermarking techniques with a feasible level of technology in respect of robustness and audibility, JASRAC sponsored STEP 2000 and STEP 2001 in 2000 and 2001, respectively, for robustness testing. The details of STEP 2000 and STEP 2001 can be found in Section 2.1.3.
3 Applications for Authentication
Authentication is the second kind of application for digital watermarking. As mentioned in the Introduction, the major work of authentication using watermarking technology includes the following three parts.
1. The management of digital contents for commerce.
2. The authentication of certification matter such as stocks, bills, ID cards, etc.
3. Copy control for digital contents, such as DVD copy control, and copy control for replica machines, such as copy machines and scanning machines.
In this section, we introduce the applications for authentication described above.
3.1 Management of Digital Contents
The fast growth of the Internet and advances in digital content compression technologies have greatly promoted the distribution of digital contents over networks. However, as described by the Content ID Forum (cIDf Japan), several factors hamper the distribution of digital contents: (1) there is no method to monitor illegal copying of contents, which creates concerns about distribution over the network; (2) there is no method to conduct efficient searches for a desired content among worldwide contents, nor a formal procedure for purchasing content; (3) there is no method to obtain copyright information on a content that a user would like to reuse; and (4) there is no method to tell whether a digital content is original or not. Therefore, it is necessary and urgent to have a management system for digital contents.
To address the problems mentioned above, cIDf has put forward a proposal for a Content ID, in order to provide a strong mechanism for the management of digital contents and to promote digital content commerce over networks.
[Figure: a digital content carries a Unique Code and a Distributed Content Descriptor (DCD), together with a database (DB) registration.]
Figure 5. The overall structure of Content ID
By the definition of cIDf, the Content ID is a set of attributes that realize the content management needed for content distribution. The structure of the Content ID is a two-layer watermark, which includes a unique code, called the ID management center number, and an attribute related to copyrights. Figure 5 shows the overall structure of the Content ID formulated by cIDf. As shown in Figure 5, the Content ID consists of two parts: the ID center management number (Unique Code) and the Distributed Content Descriptor (DCD). The Unique Code consists of a 16-bit header code and a number of arbitrary bit length assigned by an ID management center to the content. The 16-bit header code includes the version number, the region code, and the ID management center number (a bit-packing sketch follows the function list below). The DCD includes only attributes that will never be renewed after publication. With the accomplishment of the Content ID by cIDf, the major services that can be provided by the Content ID have the following functions:
Copyright function. Enables anyone to get copyright-related information on a particular content, and attribute information such as the date of creation.

Content-attribute search function. Provides searching and retrieving of contents under uniform standards.

Barcode function. Enables efficient collection of content distribution history, content sales history, and other content-related information across all agents and throughout the commercial world.

Fraudulent-use detection function. Enables fraudulent use of content on the network to be detected using the Content ID as a key. Also enables users to examine the usage rules of the content, providing mechanisms to ensure observance of those usage rules.

Certification function. Enables content to be certified as free from alteration and tampering after being issued by the official author.

Editing-history reference function. Enables viewing of the edit history of content stored in an IPR database.

Database common-key function. Simplifies searching and mutual referencing through common identification codes assigned when constructing digital archives.
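The 16-bit header code described above can be pictured as a simple bit-packing exercise. The field widths used below (4-bit version, 4-bit region code, 8-bit center number) are an assumption made for illustration; cIDf's actual layout is not specified here.

def pack_header(version: int, region: int, center: int) -> int:
    """Pack the assumed 4/4/8-bit fields into a 16-bit header code."""
    assert version < 16 and region < 16 and center < 256
    return (version << 12) | (region << 8) | center

def unpack_header(header: int):
    """Recover (version, region, center) from the 16-bit header code."""
    return (header >> 12) & 0xF, (header >> 8) & 0xF, header & 0xFF

assert unpack_header(pack_header(1, 2, 0x4D)) == (1, 2, 0x4D)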
[Figure: nine labelled steps - 1. content creator; 2. request for registration; 3. issue of partial Content ID; 4. registration of work attributes; 5. permission; 6. request for issue of ID; 7. issue of full Content ID; 8. registration of distribution attributes; 9. delivery - involving the IPR-DB and the Content ID management center.]
Figure 6. A flow of content processing
Figure 6 shows a flow of content processing for the Content ID, which is completed by the following steps.

1. The Content ID is issued by a Content ID Management Center, which can be operated by content holders, content creators, or even third parties, under the supervision of the world Registration Authority (RA).
2. A unique Content ID is embedded into a content using a header and/or a digital watermark. Parts of the attribute information can also be embedded into the content as Distribution Content Descriptors (DCD).
3. The full set of content attribute information is stored in databases (IPR-DB) managed by Content ID Management Centers, using the Content ID as a key.
3.2 Authentication for Certification Matters
The second application for authentication is watermarking for certification matter such as stocks, bonds, bills, coupons, etc. One major characteristic of this kind of application is that the content to be examined is a printed material, not a digital one. In other words, the techniques for extracting an embedded watermark from printed material are more difficult than the traditional watermarking techniques used for digital contents. In recent years, with the development of watermarking technology, there have been more and more applications for the authentication of certification matter using watermarking technology. Classified by objective, there are roughly two kinds of authentication applications for certification matters. Type I is authentication for certification matter such as stocks, bonds, bills, and coupons; the characteristic of this kind of certification matter is that the object of watermarking is a designed pattern or a text document. Type II is authentication for certification matter such as the photos printed in ID cards and passports, and the images used for brand marks; the characteristic of this kind of certification matter is that the embedded watermark is extracted from a printed image. The watermarking methods used for these two kinds of certification matters are therefore technically different.
3.2.1 Watermarking for Certification I

Embedding watermark into design. One watermarking technique used for type I certification matter such as bonds, bills, and coupons is the method of embedding a watermark into the design pattern printed on the certification. Figure 7 shows an example of embedding a watermark into the design of a coupon. The watermark information is embedded into the design by making use of the structural characteristics of the waves in the design, such as the interval of the wave lines and differences in the form of the waves. Watermark detection is accomplished by capturing an image of the wave pattern using a scanner or digital camera, and then extracting the watermark from the re-digitized image.
[Coupon text: "This coupon is good for FIVE DOLLARS OFF your next purchase at the Global E-Shop. Expires: December 20, 2000."]
Figure 7. An example of embedding a watermark into the design of a coupon
Embedding watermark into noise pattern. The second watermarking technique used for type I certifications is the method of embedding a watermark into a noise pattern printed on the back of the certification. Figure 8 shows an example of embedding a watermark into the noise pattern printed on the back of the coupon shown in Figure 7. As shown in Figure 8, a watermark containing 64 bits of information is embedded into a noise pattern with a block size of 128 by 128 pixels using pseudo-random sequences, and the watermarked noise pattern is then printed on the back of the coupon block by block, repeated over the whole back of the coupon. Watermark detection is accomplished by capturing an image of the noise pattern, with a block size indicated by the short-dotted line on the back of the coupon, using a scanner or digital camera, and then extracting the embedded watermark from the re-digitized image. One characteristic of watermarking using noise patterns is that even if the printed pattern is blurred by marks such as a signature, like the block image outlined by the long-dotted line in Figure 8, the embedded watermark can still be extracted correctly. This is due to the correlation property of pseudo-random sequences, by which the embedded watermark can be recovered by correlation as long as over 70% of the data remain intact.
Figure 8. An example of embedding a watermark into the noise pattern printed on the back of the coupon shown in Figure 7
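The block-repetition scheme just described can be sketched as follows: a 64-bit payload is spread over a 128 x 128 pseudo-noise block, the block is tiled over the page, and detection correlates chip groups with the keyed pattern, so partial smudging is tolerated. The amplitude, chip allocation, and sign-vote decision are assumptions for illustration.

import numpy as np

BITS, BLOCK = 64, 128 * 128          # 64-bit payload in a 128x128 block
CHIPS = BLOCK // BITS                # chips carrying each bit

def make_block(payload, key, alpha=2.0):
    """Spread each payload bit over CHIPS samples of a keyed PN pattern."""
    pn = np.random.default_rng(key).choice([-1.0, 1.0], size=BLOCK)
    symbols = np.repeat(np.where(payload == 1, 1.0, -1.0), CHIPS)
    return alpha * symbols * pn      # tiled repeatedly over the page back

def read_block(block, key):
    """Correlate chip groups with the PN pattern; sign votes give the bits."""
    pn = np.random.default_rng(key).choice([-1.0, 1.0], size=BLOCK)
    votes = (block * pn).reshape(BITS, CHIPS).sum(axis=1)
    return (votes > 0).astype(np.uint8)

rng = np.random.default_rng(1)
payload = rng.integers(0, 2, BITS, dtype=np.uint8)
block = make_block(payload, key=7)
block[rng.random(BLOCK) < 0.25] = 0  # ~25% of the pattern smudged over
assert np.array_equal(read_block(block, key=7), payload)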
Embedding watermark into text. The other watermarking technique used for type I certifications is the method of embedding a watermark into the text printed on the matter, such as characters, tables, and seal impressions. However, this method is not as robust as the methods using design patterns and noise patterns described above, nor can it embed as much data in the printed material.
3.2.2 Watermarking for Certification II

Embedding watermark into photo. One watermarking technique used for type II certifications, such as the photos printed in ID cards and passports, is the method of embedding a watermark into the photo printed on the subject using traditional digital watermarking techniques. Figure 9 shows an example of embedding a watermark into the photo printed on a company ID card. Watermark detection is accomplished by capturing the photo from the ID card through a special card-reading machine, and then extracting the watermark from the re-digitized photo. The same technique can be used for passports, etc.
Figure 9. An example of embedding a watermark into the photo printed on an ID card
The second example of embedding a watermark into an image for type II certifications is its use in commercial brands and labels, called "watermark brands" or "watermark labels." Figure 10 shows an example of a watermark brand with a watermark embedded in the brand image. Watermark detection is accomplished by capturing the brand image through a hand scanner or a special watermark-brand reading machine, and then extracting the watermark from the re-digitized image. The watermark brand has the following major characteristics. First, the watermark brand contains authentication information, unlike a general paper brand or label. Second, the cost of a watermark brand is very low compared with a general hologram brand.
[Brand label text: Brand Name: WM-Brand; No.: B20021205300]
Figure 10. An example of a watermark brand with a watermark embedded in the brand image
However, one major problem in the use of watermark brands is how to prevent counterfeiting of the watermark brand. Because the watermark-brand technique requires high robustness, such that the watermark embedded in the brand image can be extracted from an image re-digitized by scanner or digital camera, a watermark of the same robustness will also remain in a counterfeit image produced by a copy machine or scanner.
3.3 Authentication for Copy Control
Authentication for copy control is the third important application of watermarking technology for authentication. Currently, there are two kinds of copy-control applications for copyright protection. One is copy control for digital contents such as DVD-R/DVD-RW and CD-R/CD-RW. The other is copy control for replica machines such as copy machines and scanners. In the following parts, we introduce these two kinds of applications respectively.
3.3.1 Copy Control for DVD
At present, the prevention of illegal copying of DVD disks relies on the content-scrambling system (CSS) technique to encrypt and play back movies. But with the appearance of recordable DVDs and other digital recording equipment, such as digital tape recorders and personal computers with large storage capacity, a new copy-protection means is needed to prevent illegal copies of copyrighted digital content. One promising technology to meet this requirement is digital watermarking. Because the watermark information is mixed into the video signal directly, it is difficult to remove from the signal without damaging the quality of the contents. In June 1997, a technical subgroup called the Data Hiding Sub-Group (DHSG) was formed under the Copy Protection Technical Working Group (CPTWG) to evaluate the technical feasibility of a total of eleven watermark technology proposals. The intention of the CPTWG is to choose a watermarking scheme from the proposals as a standard technique for DVD copyright protection. At the beginning, 11 proposals of watermarking technologies were submitted, but 2 of them were not relevant, so 9 proposals were examined. In August 1999, after examination and some combination of the proposals, only 2 proposals remained. One is the method proposed by the Millennium Group, which consists of three companies: Macrovision (Macrovision USA), Digimarc (Digimarc USA), and Philips (Philips Netherlands). The other is the method proposed by the Galaxy Group, which consists of five companies: IBM (IBM Japan), NEC (NEC Japan), Pioneer (Pioneer USA), Hitachi (Hitachi Japan), and Sony (Sony Japan).
These two groups' proposals were evaluated with respect to various qualities, such as transparency, reliability, and survivability. The basic consideration for DVD copy control using watermarking technology is to embed at least 2 bits of information, called the Copy Control Information (CCI), into the video signal. According to the requirements formulated by the CPTWG, all possible video content should fall into one of the four CCI states described in Table 3. In copy-control operation, the embedded CCI is detected from the video signal by a watermark detector attached to the recording device, and a corresponding operation occurs in response to the requirement of the CCI.

Table 3. According to requirements formulated by CPTWG, all possible video content should fall into one of four categories

- Free Copy: no copy restrictions whatsoever.
- Copy Never: no copies allowed.
- One Copy Allowed: one generation of copies may be made.
- Copy No More: the copy state of a recording after a first-generation copy has been made.
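The recorder-side behavior implied by Table 3 can be written as a small state machine: "One Copy Allowed" content may be recorded once, and the recording itself is marked "Copy No More". This sketch illustrates the CCI semantics only; it is not the CPTWG detection algorithm.

from enum import Enum

class CCI(Enum):
    FREE_COPY = "free_copy"
    COPY_NEVER = "copy_never"
    ONE_COPY = "one_copy_allowed"
    COPY_NO_MORE = "copy_no_more"

def record(cci: CCI) -> CCI:
    """Return the CCI state of the new recording, or raise if copying is barred."""
    if cci is CCI.FREE_COPY:
        return CCI.FREE_COPY              # no restrictions whatsoever
    if cci is CCI.ONE_COPY:
        return CCI.COPY_NO_MORE           # first-generation copy made
    raise PermissionError(f"copying not allowed in state {cci.value}")

assert record(CCI.ONE_COPY) is CCI.COPY_NO_MORE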
The major advantage of copy control using watermarking technology is that the embedded CCI survives even if the video content is transmitted through an analog video channel, recorded to video cassette, or re-digitized. By contrast, digital encryption schemes such as the CSS technique cannot extend their protection over the analog video channel. There are various strict requirements for this kind of application. First, watermark detectors must be built into millions of low-cost devices. Second, watermark detectors must work at video rates. Moreover, since the DVD standard employs MPEG coding, the watermarking scheme must work well with MPEG. There are 13 essential technical requirements for the copy control system, described in the Call for Proposals issued by the DHSG (DHSG, 1997) and listed in Table 4.

Table 4. The essential technical requirements for copy control system described in the Call for Proposal issued by DHSG
1. Transparency
2. Low cost digital detection
3. Digital detection domain
4. Generational copy control for one copy
5. Low false positive detection
6. Reliable detection
7. Watermark will survive normal video processing in consumer use
8. Licensable under reasonable terms
9. Export/Import
10. Technical maturity
11. Data payload
12. Minimum impact on content preparation
13. Data rate
Although the two groups' proposals have been evaluated with respect to various qualities, i.e., transparency, reliability, and survivability, so far these two proposals still do not meet the requirements of the DHSG.
Figure 11. An example of a document management system with copy control for replica machines. In the figure, a scanner and a copy machine, each equipped with copy control, are connected to a computer; watermarked and plain documents flow through them as printed, scanned, and copied (digital) versions. Function 1 (copy permission): a document with a watermark is granted copy/scanning permission, while a document without a watermark is denied. Function 2 (watermark embedding): information for document tracing is embedded into the copied or scanned document.
3.3.2 Copy Control for Replica Machines
Copy control for replica machines, such as copy machines and scanners, is another interesting application for authentication. With the rapid development of replica-machine technology, the authentication of permission for document copying (scanning) has become an urgent task for watermarking technology because of the ease of reproducing digital content. Therefore, in recent years, many manufacturers of replica machines have had to consider adding copy-control functions based on watermarking technology to their machines. Figure 11 shows an example of a document management system with copy-control functions for replica machines. In Figure 11, both the copy machine and the scanner are installed with the document management functions. Generally, there are two fundamental functions for copy control. One is the control of copy permission, by which the copyright is authenticated through watermark detection: if the document was watermarked with the correct information, the copy or scanning operation is permitted; otherwise, the operation is denied. The other is to embed information for document tracing, such as the operator's ID and the date, automatically into the copied or scanned document. With document management functions using watermarking technology, it is possible to protect the copyright of documents and to trace replica documents effectively. A minimal sketch of this flow follows.
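Every interface in the sketch is hypothetical, standing in for whatever watermark detector and embedder a real replica machine's firmware would use:

// Hypothetical interfaces to the machine's watermarking library.
bool DetectPermissionWatermark(const unsigned char *page, int w, int h);
void EmbedTracingWatermark(unsigned char *page, int w, int h,
                           const char *operatorId, const char *date);

// Function 1: authenticate copy permission by watermark detection.
// Function 2: embed tracing information into the permitted copy.
bool ProcessCopyRequest(const unsigned char *scanned, unsigned char *copy,
                        int w, int h, const char *operatorId,
                        const char *date) {
    if (!DetectPermissionWatermark(scanned, w, h))
        return false;                          // copy/scanning denied
    EmbedTracingWatermark(copy, w, h, operatorId, date);
    return true;                               // operation permitted, traceable
}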
4 Applications for Data Hiding
Data hiding is the third kind of application of digital watermarking. Generally, there are two kinds of applications for data hiding, as follows.
1. Embedding data used as linking information into digital contents, by which users can link digital contents or printed materials
directly to a place where users can find more information about the contents or printed materials.
2. Embedding data used as retrieval information into digital contents, by which users can retrieve or classify the contents effectively.
In this section, we introduce some representative data-hiding applications for information linking and information retrieval, respectively. As mentioned in the Introduction, the data hiding introduced in this chapter is substantially different from that used for steganography, although both try to hide information in digital contents. The reason is that the purpose of steganography is to transmit information secretly by hiding it in digital contents, while the purpose of the data hiding introduced in this chapter is to hide information used as reference information in digital contents.
4.1 Data Hiding for Information Linking
With the rapid development of network and computer techniques, the Internet has become a global digital library, through which we can get almost all the information we need, whenever and wherever, just by accessing a computer in a small corner. Data hiding for information linking can play an important role in linking users to this global digital library. There are various applications for information linking using watermarking technology. Figure 12 shows examples of information linking using watermarking technology. The printed materials used for data hiding include the following.
Photos. For example, photos of noted places can be embedded with data connecting to a website where users can get more information about those places.
Books. For example, photos of characters and famous places printed on the pages of a book can be embedded with data connecting to a website where readers can get more information about them.
Journals. Besides photos, advertisements printed in journals can also be embedded with data connecting to a website, where readers can investigate the information in detail and register online.
Other materials. Other printed materials, such as tickets and gift certificates, from which users hope to acquire more related information.
Figure 12. Examples of information linking using watermarking technology
With the development of computer technology and high-tech devices such as digital cameras and cellular phones, there are many means by which we can capture an image from printed materials and connect to the related website effectively by extracting the embedded data from the captured image.
Digital camera. As shown in Figure 12, the data embedded in printed matter can be extracted by using a digital camera connected to a computer. The process of the information link is completed as follows. (1) Capture an image of the printed material using the digital camera and send the image to a computer that is connected to the Internet. (2) Extract the data automatically from the camera image with detection software installed on the computer. (3) Connect to the website automatically by using the extracted data and show the related information on the screen of the computer. One interesting product is MediaBridge, delivered by Digimarc (Digimarc USA), by which users can capture a picture with a symbol like "D" printed on the pages of a journal simply by using a digital camera connected to a computer, and then get into the website related to that picture automatically.
Cellular Phone. The function of cellular phone has improved greatly recently by which users can capture an image with high quality. As showed in Figure 12, the data embedded in printed material can be extracted by using a cellular phone. The process of the mformation link is completed as follows. (1) Capturing an image of printed material by using a cellular phone and sending the image to a management center. (2) Extracting the data automatically from the image of cellular phone by the management center. (3) Connecting the cellular phone to the website of the data automatically by using extracted data and showing the related mformation on the window of the cellular phone.
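The C++ sketch below is purely illustrative; every function name in it is hypothetical:

// Hypothetical components of the linking pipeline; the capture itself
// (step 1) is performed by the digital camera or cellular phone.
bool ExtractEmbeddedData(const unsigned char *image, int w, int h,
                         char *data, int maxLen);   // step (2)
void LookupUrl(const char *data, char *url, int maxLen);
void OpenInBrowser(const char *url);                // step (3)

void LinkFromCapturedImage(const unsigned char *image, int w, int h) {
    char data[64], url[256];
    if (ExtractEmbeddedData(image, w, h, data, sizeof(data))) {
        LookupUrl(data, url, sizeof(url));  // map the payload to a website
        OpenInBrowser(url);                 // show the related information
    }
}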
4.2 Data Hiding for Information Retrieving
The development of computer technology has made it easy to construct a database containing an immense amount of digital content. However, there are two tedious tasks in constructing and using a database. One task is to classify the original materials for the database, and the other is to retrieve the information contained in the database for particular uses. For classification, we generally have to classify the original materials by human judgment. For information retrieval, we generally have to make use of sensitivity retrieval techniques. Therefore, both tasks are very time-consuming and inefficient. With data hiding using watermarking technology, the efficiency of both classification and retrieval can be greatly improved.
Figure 13. An example of an image database using watermarking technology for classification and retrieval. In the figure, the detected watermark (Human, Animal, House, or Mountain) drives the classification system, and an input keyword (Human, Animal, House, or Mountain) drives the retrieval system; both operate on the database.
Figure 13 shows an example of a still-image database using watermarking technology for classification and retrieval. The classification of the original images can easily be completed by reading the attribute information embedded in the images in advance. In the same way, the retrieval of images from the database can be completed by inputting an attribute message used as a keyword into the retrieval system; the images corresponding to the keyword message are then extracted automatically. A minimal sketch of this keyword-based flow is given below.
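The sketch is purely illustrative; the detector function is hypothetical, standing in for whatever watermark extractor the database uses:

#include <map>
#include <string>
#include <vector>

// Hypothetical detector returning the attribute word embedded in an image.
std::string DetectAttributeWatermark(const std::string &imageFile);

typedef std::map<std::string, std::vector<std::string> > ImageIndex;

// Classification: file each image under its embedded attribute keyword.
ImageIndex BuildIndex(const std::vector<std::string> &imageFiles) {
    ImageIndex index;
    for (size_t i = 0; i < imageFiles.size(); ++i)
        index[DetectAttributeWatermark(imageFiles[i])].push_back(imageFiles[i]);
    return index;
}

// Retrieval: a keyword lookup replaces content analysis at query time.
const std::vector<std::string> *Retrieve(const ImageIndex &index,
                                         const std::string &keyword) {
    ImageIndex::const_iterator it = index.find(keyword);
    return it == index.end() ? 0 : &it->second;
}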
Figure 14. Family tree for applications of digital watermarking technology. The tree covers: applications for copyright protection (2.1.1 Watermark of Mark Type; 2.1.2 Watermark of Literal Type; 2.1.3 Standardization for Robustness Requirements; 2.3 Copyright Administration), applications for authentication (3.1 Management of Digital Contents; 3.2 Authentication; 3.2.1 Watermarking for Certification I and II; 3.3.1 Copy Control for DVD; 3.3.2 Copy Control for Replica Machines), and applications for data hiding (4.1 Data Hiding for Information Linking; 4.2 Data Hiding for Information Retrieving).
5 Conclusions
We have reviewed the applications of digital watermarking technology for copyright protection, authentication, and data hiding, respectively. Figure 14 shows a family tree of all the applications introduced in this chapter. Many of these applications already have commercial products on the market, while some are still proposals under consideration. As digital watermarking technology and its application environments mature in the near future, it is possible that all of them will have commercial products on the market. Note that, for the purpose of introducing the applications of watermarking technology, we have roughly divided the applications into three categories, as shown in Figure 14. In practice, it is difficult to draw a clear boundary between them. For example, the application for the management of digital contents described in Section 3.1 also includes the work of administration of copyright described in Section 2.3. Similarly, although the major duty of copy control for digital contents and replica machines is to authenticate the qualification and rights of operators, it also covers the range of copyright protection as well.
In recent years, watermarking technologies have developed very fast, and at the same time more and more demands for applications of watermarking technology have been presented. However, as mentioned previously, there are still some problems, as follows, that severely limit the practical use of watermarking technology.
Limitation of watermarking technology. This issue belongs to the technical problems of digital watermarking. Because the space within content where watermarking techniques can operate is very limited, it is impossible to apply many kinds of watermarking techniques ambitiously in some applications. Therefore, certain technical decisions, such as the balance between robustness and content quality and the balance between cost and functionality, are critical.
Administration of watermarking technology. This issue belongs to the non-technical problems of watermarking technology. As mentioned in Section 2.3, the watermark itself does not originally provide any legal information of ownership; i.e., without being registered with a trusted agent, a watermark embedded by individuals or associations could be invalid in law. Furthermore, it is difficult to distinguish who watermarked the content first using current watermarking techniques. Therefore, all of them require a unified administration for copyright protection.
Standardization of watermarking technology. This issue also belongs to the technical problems of digital watermarking. In order to have a unified administration for watermarking technology, there should be standards for both the watermarking technology and the watermark format, such as the JPEG and JPEG2000 standards for digital image compression, the MPEG standards for video compression, and MP3 for digital audio compression.
Compared with the problem of the limitation of watermarking technology, the problems of administration and standardization mentioned above are more important issues for watermarking applications. In other words, although the limitation of watermarking technology is a major challenge, without the accomplishment of standardization and administration it is impossible to apply watermarking technology to its applications regularly. Finally, even though digital watermarking technology is still in an immature stage and needs more research, not only in techniques but also in many other issues such as the standardization of the technique, the administration of copyright, and the laws concerned, many watermarking technologies are already in commercial use.
The reason for this is that the bitter experience of data piracy has made customers eager to protect their intellectual property by choosing watermarking technology. In any case, whenever the time of mature conditions for the applications of watermarking technology arrives, the applications of watermarking technology will remain challenging topics in this age of digitalization and information.
References
Checkmark (2001), http://watermarking.unige.ch/Checkmark/
cIDf (Japan), http://www.cidf.org/english/index.html
CPTWG, http://www.cptwg.org/
DAVIC, http://www.davic.org/
DAWN 2001, http://www.jasrac.or.jp/profile/business/dawn/releasee.htm
Digimarc (USA), http://www.digimarc.com
Hitachi (Japan), http://www.hitachi.com/
IBM (Japan), http://www.trl.ibm.com/extfnt-e.htm
JASRAC (Japan), http://www.jasrac.or.jp/ejhp/index.htm
Macrovision (USA), http://www.macrovision.com/
MarkAny (Korea), http://www.markany.com/eng/default.htm
M.Ken (Japan), http://www.mken.co.jp/
NEC (Japan), http://www.nec.com/
Philips (Netherlands), http://www.philips.com/
Pioneer (USA), http://www.pioneer.com/
SDMI, http://www.sdmi.org/
Sony (Japan), http://www.sony.com/
STEP 2000, http://www.nri.co.jp/english/news/2000/0019.htm
STEP 2001, http://www.jasrac.or.jp/ejhp/news/10
Stirmark (1997), http://www.cl.cam.ac.uk/~mgk25/stirmark.htm
Verance (USA), http://www.verance.com/index.html
APPENDIX
Appendix A: VQ-Based Scheme I
The VQ-based watermarking system mentioned in Chapter 7 is attached on the CD-ROM and explained here. This demo system provides the functions for embedding and extraction. Section 1 illustrates how to use the provided functions and Section 2 explains its source files.
1 How to Use The System
In the attached CD-ROM, the executable program, test images, VQ codebooks, some data files, and the source files of the system all can be obtained from the related directories. Table 1 lists the content of the related directories and Table 2 lists the specification of the VQ codebooks. Table 1. Content of the directories (‘X’ is the CD-ROM drive).
Items                Path
Executable Program   X:\A\Exec
Source Files         X:\A\Source
Test Files           X:\A\Data
Table 2. Specification of the attached codebooks.
File Name   Size   Image for Codebook Training   Method   Threshold
L_L4.cb     128    Lena                          LBG      0.0001
L_L4.cb     256    Lena                          LBG      0.0001
L_L4.cb     512    Lena                          LBG      0.0001
P_L4.cb     512    Peppers                       LBG      0.0001
Please note that the attached demo program does not require an installation process, and it can be executed under the Windows operating system only. To operate the demo program, please click the executable program; the main form, as in Figure 1, will then appear. The main functions provided in this demo program are also shown in Figure 1.
Figure 1. Main form of the demo system.
1.1 Embedding Procedure
To execute the embedding procedure, please click the “Embed Watermark” option from the “Run” menu or click the related icon on the tool bar on the main form (see Figure 1). After that, the form shown in Figure 2 will appear. Table 3 lists the specification of the input and output files. Please follow it to assign the suitable files for the embedding procedure.
Figure 2. Form for embedding.
Table 3. Specification of the input/output files for the embedding procedure.
Items               Image Size   Color    File Format
Original Image      512x512      Gray     BMP
Watermark           128x128      Binary   BMP
Watermarked Image   512x512      Gray     BMP
The steps for the embedding procedure are:
Step 1: Assign a gray BMP file as the original image, such as the image of Lena in Figure 2.
Step 2: Assign a binary BMP file as the watermark.
Step 3: Assign a VQ codebook.
Step 4: If the pixel permutation procedure for the watermark is needed, please check the option.
Step 5: Assign the file name for the watermarked image.
Step 6: Click the "Embed" button to execute the embedding procedure. After a few seconds, the watermarked result will be displayed on the form. We suggest that the reader compare the VQ reconstructed image with the watermarked one.
1.2 Extraction Procedure
To execute the extraction procedure, please click the "Extract Watermark" option from the "Run" menu or click the related icon on the tool bar (see Figure 1). After that, the form shown in Figure 3 will appear. Table 4 lists the specification of the input and output files. Please refer to it to assign the suitable files and follow the steps below to execute the extraction procedure.
Table 4. Specification of the input/output files for the extracting procedure.
Items                 Image Size   Color    File Format
Watermarked Image     512x512      Gray     BMP
Extracted Watermark   128x128      Binary   BMP
Figure 3. Form for extracting.
Step 1: Assign an image which contains the watermark signal, such as the one on the right-hand side of Figure 3.
Step 2: Assign a VQ codebook, which should be the same as the one used in the embedding procedure.
Step 3: Select whether the pixel permutation procedure for the watermark is needed. If this option was selected in the embedding procedure, please check it here as well, or the extracted result will not be correct.
Step 4: Assign the file name for the extracted watermark.
Step 5: Click the "Extract" button to run the whole extraction procedure. A few seconds later, the extracted watermark will be displayed on the form. An example of the extracted watermark is shown in the top-left corner of Figure 3.
1.3 Evaluating Functions
This demo program also provides some evaluating functions (PSNR, MSE, NC, and BCR) for images in BMP file format. (The PSNR and MSE functions can be applied to gray-valued images, and the NC and BCR functions can be applied to binary-valued images.) Their definitions are listed below, where $X$ and $Y$ are two images of the same size $M \times N$, with $X = \{x(i,j) \mid 0 \le i < M,\ 0 \le j < N\}$ and $Y = \{y(i,j) \mid 0 \le i < M,\ 0 \le j < N\}$.

Mean Square Error (MSE):
$$\mathrm{MSE}(X,Y) = \frac{1}{MN}\sum_{i=0}^{M-1}\sum_{j=0}^{N-1}\bigl(x(i,j)-y(i,j)\bigr)^2$$

Peak Signal to Noise Ratio (PSNR):
$$\mathrm{PSNR}(X,Y) = 10 \times \log_{10}\frac{255^2}{\mathrm{MSE}(X,Y)}$$

Normalized Correlation (NC):
$$\mathrm{NC}(X,Y) = \frac{\sum_{i=0}^{M-1}\sum_{j=0}^{N-1} x(i,j)\,y(i,j)}{\sum_{i=0}^{M-1}\sum_{j=0}^{N-1} x(i,j)^2}$$

Bit Correct Rate (BCR):
$$\mathrm{BCR}(X,Y) = \frac{1}{MN}\sum_{i=0}^{M-1}\sum_{j=0}^{N-1}\overline{x(i,j)\oplus y(i,j)}$$

where $\oplus$ denotes exclusive-or on the binary pixel values and the overline denotes complement.
To apply any of the evaluating functions to two images, please click the related icon (see Figure 1) on the main form. After the form like Figure 4 appears, please follow the steps below to execute the evaluating procedure. Please note: the MSE and PSNR functions are designed for gray-valued BMP files only, and the NC and BCR functions are designed for binary-valued BMP files only.
Figure 4. Form for evaluating.
Step 1: Assign the file names of the two images.
Step 2: Select the function.
Step 3: Click the "Go" button to calculate the selected function for the two images. The calculated result will be displayed on the form and on the main form.
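For readers who want the metrics outside the demo GUI, the sketch below follows the CalcMSE, CalcPSNR, and CalcBCR signatures documented in Section 2.5; it is our own minimal illustration, not the shipped source:

#include <math.h>

// Mean square error between two gray-valued buffers of equal size.
double CalcMSE(unsigned char *buf1, unsigned char *buf2, int size) {
    double sum = 0.0;
    for (int i = 0; i < size; ++i) {
        double d = (double)buf1[i] - (double)buf2[i];
        sum += d * d;
    }
    return sum / size;
}

// PSNR in dB for 8-bit images (peak value 255); callers should treat
// MSE == 0 (identical images) as a special case.
double CalcPSNR(unsigned char *buf1, unsigned char *buf2, int size) {
    double mse = CalcMSE(buf1, buf2, size);
    return 10.0 * log10(255.0 * 255.0 / mse);
}

// Bit correct rate between two binary streams (one pixel per byte).
double CalcBCR(unsigned char *buf1, unsigned char *buf2, int size) {
    int correct = 0;
    for (int i = 0; i < size; ++i)
        if (buf1[i] == buf2[i]) ++correct;
    return (double)correct / size;
}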
2 Source Codes
The attached demo system in the CD-ROM was designed and compiled with Borland C++ Builder 5.0. Table 5 lists each of the member files in the Borland C++ Builder project. To modify or compile the source codes of the system, please copy the files from the CD-ROM to your hard disk, and please remember to disable the read-only attribute of all the files. In this section, only the main subroutines and functions of the source files are listed and illustrated. For the others, please refer to the comments in each source file.
Table 5. Content of the source files in this system.

2.1 Embed.cpp
The subroutines and functions for the embedding procedure are defined in this file. The main ones are listed below.

void BtnEmbedClick(TObject *Sender);
Description: Execute the embedding procedure.
Parameter(s): TObject *Sender: the object which calls this routine.
Note: When the "Embed" button on the form is clicked, this routine will be executed.

bool CheckInputInfo(void);
Description: Check the input information for the embedding procedure.
Return Value: Return false if the input information is not complete; otherwise, return true.

bool Init(void);
Description: Read the data from the assigned input files.
Return Value: If the input files cannot be opened correctly, return false; otherwise, return true.

void Embed(void);
Description: Embed the watermark bits into the obtained VQ indices.

bool SaveResults(void);
Description: Save the watermarked image.
Return Value: If the output image cannot be saved, return false; otherwise, return true.

void ShowResults(TObject *Sender);
Description: Display the output results of the embedding procedure.
Parameter(s): TObject *Sender: the object which calls this routine.
2.2 Extract.cpp
For the extraction procedure, the subroutines and functions are defined in this file. The main ones are listed below.
void BtnExtractClick(TObject *Sender);
Description: Execute the extraction procedure.
Parameter(s): TObject *Sender: the object which calls this routine.
Note: When the "Extract" button on the form is clicked, this routine will be executed.

bool CheckInputInfo(void);
Description: Check the input information for the extraction procedure.
Return Value: Return false if the input information is not complete; otherwise, return true.

bool Init(void);
Description: Read the data from the assigned input files.
Return Value: If the input files cannot be opened correctly, return false; otherwise, return true.

void Extract(void);
Description: Extract the watermark bits from the obtained VQ indices.

bool SaveResults(void);
Description: Save the extracted watermark for the extraction procedure.
Return Value: If the extracted image cannot be saved successfully, return false; otherwise, return true.

void ShowResults(TObject *Sender);
Description: Display the output results of the extraction procedure.
Parameter(s): TObject *Sender: the object which calls this routine.
2.3 Evaluate.cpp
This file defines the evaluating functions, which can be applied for evaluating the performance of the watermarking system on BMP images.
void BtnGoClick(TObject *Sender);
Description: Execute the selected evaluating function.
Parameter(s): TObject *Sender: the object which calls this routine.
Note: This routine will be executed when the "Go" button on the form is clicked.

bool CheckInputInfo(void);
Description: Check the input information for the evaluating procedure.
Return Value: Return false if the input information is not complete; otherwise, return true.

void Init(String func);
Description: Set the related information on the form for the selected function.
Parameter(s): String func: the name of the evaluating function. The available function names are "PSNR", "MSE", "NC", and "BCR".

void ShowResults(double outcome);
Description: Display the output result on the form.
Parameter(s): double outcome: the calculated result.
2.4 VQ-class.cpp
The functions for the standard VQ procedure are defined in this file. They are illustrated below.

int ReadCodebook(char *filename, unsigned char codebook[][DIMENSION]);
Description: Read the VQ codebook into memory.
Parameter(s):
1. char *filename: the name of the codebook file.
2. unsigned char codebook[][DIMENSION]: the address of the buffer where the read codewords will be placed. Here DIMENSION defines the length of the codeword, and its value is 16.
Return Value: The codeword number of the codebook.

int ReadIndexFile(char *in_file, short int *out_buf);
Description: Read the VQ indices from the input file and store them in the assigned memory.
Parameter(s):
1. char *in_file: the name of the input file.
2. short int *out_buf: the address of the buffer where the read VQ indices will be placed.
Return Value: The total number of indices.

bool SaveIndexFile(short int *in_buf, char *out_file, int size);
Description: Save the VQ indices as a file.
Parameter(s):
1. short int *in_buf: the address of the buffer where the VQ indices are stored.
2. char *out_file: the name of the output file.
3. int size: the total number of indices.
Return Value: Return true if the output file can be generated successfully; otherwise return false.

void TableLookup(int size);
Description: Execute the VQ table-lookup procedure.
Parameter(s): int size: the number of the vectors.
Note: The table-lookup results will be placed in the default array.

void ReconstructVQImage(short int *index, unsigned char codebook[][DIMENSION], unsigned char out[][DIMENSION]);
Description: Execute the VQ table-lookup procedure to reconstruct the image by referring to the assigned VQ indices and codebook.
Parameter(s):
1. short int *index: the address of the buffer where the VQ indices are stored.
2. unsigned char codebook[][DIMENSION]: the address of the buffer where the codewords are stored.
3. unsigned char out[][DIMENSION]: the address of the buffer where the encoded data will be placed.

bool ReconstructVQImage(short int *index, char *cb_file, unsigned char out[][DIMENSION]);
Description: Execute the VQ table-lookup procedure to reconstruct the image by referring to the assigned VQ indices and the codebook file.
Parameter(s):
1. short int *index: the address of the buffer where the VQ indices are stored.
2. char *cb_file: the file name of the VQ codebook.
3. unsigned char out[][DIMENSION]: the address of the buffer where the encoded data will be placed.
Return Value: Return false if the codebook file cannot be opened successfully; otherwise return true.
bool ReconstructVQImage(char *idx_file, char *cb_file, char *out_file);
Description: Execute the VQ table-lookup procedure to reconstruct the image by referring to the assigned VQ index file and codebook file.
Parameter(s):
1. char *idx_file: the file name of the VQ index file.
2. char *cb_file: the file name of the VQ codebook.
3. char *out_file: the file name of the output image.
Return Value: Return false if either of the input/output files cannot be opened successfully; otherwise, return true.

void PDSSearch(unsigned char block[][DIMENSION], short int *index, int block_amount);
Description: Apply the partial distortion search (PDS) to obtain the nearest codeword from the codebook for each of the input vectors.
Parameter(s):
1. unsigned char block[][DIMENSION]: the address of the buffer where the input vectors (or sub-images) are stored.
2. short int *index: the address of the buffer where the obtained indices will be stored.
3. int block_amount: the total number of the input vectors.
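The partial distortion search abandons a candidate codeword as soon as its accumulated distortion exceeds the best distance found so far, giving the same result as a full search at lower cost. Below is a minimal sketch for one input vector (our illustration, assuming the 16-element codewords of this appendix; the shipped PDSSearch processes a whole block array):

#define DIMENSION 16  // codeword length, as defined in this appendix

// Return the index of the nearest codeword for one input vector.
int PDSNearest(const unsigned char vec[DIMENSION],
               const unsigned char codebook[][DIMENSION], int cbSize) {
    long best = -1;   // best (smallest) squared distance so far
    int bestIdx = 0;
    for (int c = 0; c < cbSize; ++c) {
        long dist = 0;
        for (int k = 0; k < DIMENSION; ++k) {
            long d = (long)vec[k] - (long)codebook[c][k];
            dist += d * d;
            if (best >= 0 && dist >= best)
                break;                  // early rejection: cannot win
        }
        if (best < 0 || dist < best) {
            best = dist;
            bestIdx = c;
        }
    }
    return bestIdx;
}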
bool Search(unsigned char in[][DIMENSION], char *codebook_file, short int *index, int block_amount);
Description: Execute the nearest-codeword search for each of the input vectors by referring to the assigned codebook file.
Parameter(s):
1. unsigned char in[][DIMENSION]: the address of the buffer where the input vectors (or sub-images) are stored.
2. char *codebook_file: the file name of the VQ codebook.
3. short int *index: the address of the buffer where the obtained VQ indices will be stored.
4. int block_amount: the total number of the input vectors.
Return Value: Return false if the codebook file cannot be opened correctly; return true otherwise.

bool VQ(unsigned char in[][DIMENSION], char *codebook_file, short int *index, unsigned char out[][DIMENSION], int block_amount);
Description: For each of the vectors (sub-images) in the assigned input array, this routine will try to obtain the nearest codeword from the assigned codebook file and put it in the assigned output array.
Parameter(s):
1. unsigned char in[][DIMENSION]: the address of the buffer where the input vectors (or sub-images) are stored.
2. char *codebook_file: the file name of the VQ codebook.
3. short int *index: the address of the buffer where the obtained VQ indices will be stored.
4. unsigned char out[][DIMENSION]: the address of the buffer where the reconstructed vectors will be placed.
5. int block_amount: the total number of the input vectors.
Return Value: Return false if the codebook file cannot be opened correctly; return true otherwise.
bool VQ_class::VQ(unsigned char *in_buf, char *codebook_file, char *index_file, char *out_image, int width, int height);
Description: For the image stored in the assigned input array, this routine will:
1. Divide it into many vectors (or sub-images).
2. Obtain the nearest codewords for all of the vectors by referring to the assigned codebook file.
3. Execute the table-lookup procedure to reconstruct an output image.
4. Save the output image under the assigned file name.
Parameter(s):
1. unsigned char *in_buf: the address of the buffer where the input image is stored.
2. char *codebook_file: the file name of the VQ codebook.
3. char *index_file: the file name of the output index file.
4. char *out_image: the file name of the output image.
5. int width, height: the width and height of the output image.
Return Value: Return false if either of the input/output files cannot be opened/saved successfully; otherwise return true.

bool VQ(char *in_image, char *cb_file, char *index_file, char *out_image);
Description: For the assigned image file, this routine will apply the VQ procedure to it by referring to the assigned codebook file, and save the obtained VQ indices and the reconstructed image as files under the assigned file names.
Parameter(s):
1. char *in_image: the file name of the input image.
2. char *cb_file: the file name of the VQ codebook.
3. char *index_file: the file name of the output index file.
4. char *out_image: the file name of the output image.
Return Value: Return false if either of the input/output files cannot be opened/saved successfully; otherwise return true.
2.5 Tools.h
In this file, some tool functions for image processing are defined. The main functions are illustrated below.

bool ReadBinaryBMPImage(char *filename, unsigned char *buf, int *width, int *height);
Description: Read the image data of a binary-valued BMP file into memory.
Parameter(s):
1. char *filename: the name of the input BMP file.
2. unsigned char *buf: the address of the buffer where the image data will be placed.
3. int *width: the address of an integer where the width of the image will be stored.
4. int *height: the address of an integer where the height of the image will be stored.
Return Value: Return false if the input file cannot be opened correctly; otherwise, return true.
Note:
1. The buffer where the image will be stored should be created before calling this routine.
2. The size of the output buffer should be at least width*height bytes (one byte per pixel).
3. This routine does not check whether the size of the buffer is large enough to store the read data.
4. This routine does not check whether the input BMP file is a binary one.

bool SaveBinaryBMPImage(unsigned char *buf, char *filename, int width, int height);
Description: Save the input data as a binary-valued BMP file.
Parameter(s):
1. unsigned char *buf: the address of the buffer where the image data are stored.
2. char *filename: the name of the output BMP file.
3. int width, height: the width and height of the image.
Return Value: Return false if the output file cannot be generated correctly; otherwise, return true.
Note: Each binary pixel in the input buffer should be stored in one byte, which means the size of the input buffer is width*height bytes, not width*height/8 bytes.

bool ReadGrayBMPImage(char *filename, unsigned char *buf, int *width, int *height);
Description: Read the image data of a gray-valued BMP file into memory.
Parameter(s):
1. char *filename: the name of the input BMP file.
2. unsigned char *buf: the address of the buffer where the image data will be placed.
3. int *width: the address of an integer where the width of the image will be stored.
4. int *height: the address of an integer where the height of the image will be stored.
Return Value: Return false if the input file cannot be opened correctly; otherwise, return true.
Note:
1. The buffer where the image will be stored should be created before calling this routine, and its size should be large enough.
2. This routine does not check whether the size of the buffer is large enough for storing the read data.
3. This routine does not check whether the input BMP file is gray-valued.

bool SaveGrayBMPImage(unsigned char *buf, char *filename, int width, int height);
Description: Save the image data as a gray-valued BMP file.
Parameter(s):
1. unsigned char *buf: the address of the buffer where the image data are stored.
2. char *filename: the name of the output BMP file.
3. int width: the width of the image.
4. int height: the height of the image.
Return Value:
Return false if the output file cannot be generated correctly; otherwise, return true.

void BlockDividing(unsigned char *in, unsigned char *out, int img_len, int blk_len);
Description: Divide the input image (gray-valued) into many sub-images.
Parameter(s):
1. unsigned char *in: the address of the buffer where the input image is stored.
2. unsigned char *out: the address of the buffer where the divided sub-images will be placed.
3. int img_len: the side length of the input image.
4. int blk_len: the side length of the sub-image.
Note:
1. The size of the input image is img_len*img_len pixels.
2. The size of the sub-image is blk_len*blk_len pixels.

void InverseBlockDividing(unsigned char *in, unsigned char *out, int img_len, int blk_len);
Description: Piece together the sub-images to generate an output image (gray-valued).
Parameter(s):
1. unsigned char *in: the address of the buffer where the sub-images are stored.
2. unsigned char *out: the address of the buffer where the recovered image will be placed.
3. int img_len: the side length of the output image.
4. int blk_len: the side length of the sub-image.
Note:
1. The size of the output image is img_len*img_len pixels.
2. The size of the sub-image is blk_len*blk_len pixels.
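As an illustration, BlockDividing could be implemented as below, assuming the image is stored row by row and the sub-images are written out one after another (a sketch; the shipped source may differ in detail):

// Divide an img_len x img_len gray image into blk_len x blk_len blocks.
void BlockDividing(unsigned char *in, unsigned char *out,
                   int img_len, int blk_len) {
    int pos = 0;
    for (int by = 0; by < img_len; by += blk_len)
        for (int bx = 0; bx < img_len; bx += blk_len)
            for (int y = 0; y < blk_len; ++y)
                for (int x = 0; x < blk_len; ++x)
                    out[pos++] = in[(by + y) * img_len + (bx + x)];
}

InverseBlockDividing simply reverses the two sides of the assignment.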
void PixelPermutation(unsigned char *in, unsigned char *out, int *ref, int size);
Description: Execute the pixel permutation procedure for the input image (gray-valued) by referring to the referred indices.
Parameter(s):
1. unsigned char *in: the address of the buffer where the input data are stored.
2. unsigned char *out: the address of the buffer where the output data will be stored.
3. int *ref: the address of the buffer where the referred indices are stored.
4. int size: the total number of pixels in the input data.
Note: The total number of referred indices should be the same as the number of pixels.

bool PixelPermutation(unsigned char *in, unsigned char *out, char *ref_file, int size);
Description: Execute the pixel permutation procedure for the input image (gray-valued) by referring to the referred index file.
Parameter(s):
1. unsigned char *in: the address of the buffer where the input data are stored.
2. unsigned char *out: the address of the buffer where the output data will be stored.
3. char *ref_file: the file name of the referred indices.
4. int size: the total number of pixels in the input data.
Return Value: Return false if the referred index file cannot be opened; otherwise, return true.
Note:
The total number of referred indices should be the same as the number of pixels.
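A minimal sketch of the permutation pair is given below, assuming ref[i] holds the source position of output pixel i (the convention in the shipped source may differ):

// Scramble the pixels according to the referred index table.
void PixelPermutation(unsigned char *in, unsigned char *out,
                      int *ref, int size) {
    for (int i = 0; i < size; ++i)
        out[i] = in[ref[i]];
}

// Undo the scrambling performed above.
void InversePixelPermutation(unsigned char *in, unsigned char *out,
                             int *ref, int size) {
    for (int i = 0; i < size; ++i)
        out[ref[i]] = in[i];
}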
void InversePixelPermutation(unsigned char *in, unsigned char *out, int *ref, int size);
Description: Execute the inverse pixel permutation procedure for the input data (gray-valued) by referring to the referred indices.
Parameter(s):
1. unsigned char *in: the address of the buffer where the input data are stored.
2. unsigned char *out: the address of the buffer where the output data will be stored.
3. int *ref: the address of the buffer where the referred indices are stored.
4. int size: the total number of pixels in the input data.
Note: The total number of referred indices should be the same as the number of pixels.

bool InversePixelPermutation(unsigned char *in, unsigned char *out, char *ref_file, int size);
Description: Execute the inverse pixel permutation procedure for the input data (gray-valued) by referring to the index file.
Parameter(s):
1. unsigned char *in: the address of the buffer where the input data are stored.
2. unsigned char *out: the address of the buffer where the output data will be stored.
3. char *ref_file: the file name of the referred indices.
4. int size: the total number of pixels in the input data.
Return Value: Return false if the referred index file cannot be opened; otherwise, return true.
Note: The total number of referred indices should be the same as the number of pixels.

long CalcEuclideanDis(unsigned char *block1, unsigned char *block2, int size);
Description: Calculate the Euclidean distance between 2 data streams (gray-valued).
Parameter(s):
1. unsigned char *block1: the address of the buffer where the 1st data stream is stored.
2. unsigned char *block2: the address of the buffer where the 2nd data stream is stored.
3. int size: the total number of elements in either of the input streams.
Return Value: The calculated Euclidean distance.
Note: The two data streams should have the same size.

double CalcSED(unsigned char *buf1, unsigned char *buf2, int size);
Description: Calculate the squared Euclidean distance between 2 data streams (gray-valued).
Parameter(s):
1. unsigned char *buf1: the address of the buffer where the 1st data stream is stored.
2. unsigned char *buf2: the address of the buffer where the 2nd data stream is stored.
3. int size: the total number of elements in either of the input streams.
Return Value: The calculated squared Euclidean distance.
Note: The two data streams should have the same size.

int CalcHammingDis(unsigned char *block1, unsigned char *block2, int size);
Description: Calculate the Hamming distance between 2 binary data streams.
Parameter(s):
1. unsigned char *block1: the address of the buffer where the 1st data stream is stored.
2. unsigned char *block2: the address of the buffer where the 2nd data stream is stored.
3. int size: the total number of elements in either of the input streams.
Return Value: Return the calculated Hamming distance if the function is executed successfully; otherwise, return -1.
Note: The two data streams should have the same size.

int CalcHammingDis(char *image1, char *image2);
Description: Calculate the Hamming distance between 2 binary BMP images.
Parameter(s):
1. char *image1: the file name of the 1st BMP image.
2. char *image2: the file name of the 2nd BMP image.
Return Value:
Return the calculated Hamming distance if the function is executed successfully; otherwise, return -1.
Note: The two images should have the same size.

double CalcMSE(unsigned char *buf1, unsigned char *buf2, int size);
Description: Calculate the mean square error (MSE) between 2 data streams (gray-valued).
Parameter(s):
1. unsigned char *buf1: the address of the buffer where the 1st data stream is stored.
2. unsigned char *buf2: the address of the buffer where the 2nd data stream is stored.
3. int size: the total number of elements in either of the input streams.
Return Value: The calculated MSE value.
Note: The two data streams should have the same size.

double CalcMSE(char *image1, char *image2);
Description: Calculate the mean square error (MSE) between 2 BMP images (gray-valued).
Parameter(s):
1. char *image1: the file name of the 1st BMP image.
2. char *image2: the file name of the 2nd BMP image.
Return Value: Return the calculated MSE value if the function is executed successfully; otherwise, return -1.0.
Note:
The two images should have the same size.

double CalcSNR(unsigned char *buf1, unsigned char *buf2, int size);
Description: Calculate the signal-to-noise ratio (SNR) between 2 data streams (gray-valued).
Parameter(s):
1. unsigned char *buf1: the address of the buffer where the 1st data stream is stored.
2. unsigned char *buf2: the address of the buffer where the 2nd data stream is stored.
3. int size: the total number of elements in either of the input streams.
Return Value: Return the calculated SNR value.
Note: The two data streams should have the same size.

double CalcSNR(char *image1, char *image2);
Description: Calculate the signal-to-noise ratio (SNR) between 2 BMP images (gray-valued).
Parameter(s):
1. char *image1: the file name of the 1st BMP image.
2. char *image2: the file name of the 2nd BMP image.
Return Value: Return the calculated SNR value if the function is executed successfully; otherwise, return -1.0.
Note: The two images should have the same size.
double CalcPSNR(unsigned char *buf1, unsigned char *buf2, int size);
Description: Calculate the peak signal-to-noise ratio (PSNR) between 2 data streams (gray-valued).
Parameter(s):
1. unsigned char *buf1: the address of the buffer where the 1st data stream is stored.
2. unsigned char *buf2: the address of the buffer where the 2nd data stream is stored.
3. int size: the total number of elements in either of the input streams.
Return Value: Return the calculated PSNR value.
Note: The two data streams should have the same size.

double CalcPSNR(char *image1, char *image2);
Description: Calculate the peak signal-to-noise ratio (PSNR) between 2 BMP images (gray-valued).
Parameter(s):
1. char *image1: the file name of the 1st BMP image.
2. char *image2: the file name of the 2nd BMP image.
Return Value: Return the calculated PSNR value if the function is executed successfully; otherwise return -1.0.
Note: The size of the input images should be the same.

double CalcBCR(unsigned char *buf1, unsigned char *buf2, int size);
Description: Calculate the bit correct rate (BCR) between 2 binary data streams.
Parameter(s):
1. unsigned char *buf1: the address of the buffer where the 1st data stream is stored.
2. unsigned char *buf2: the address of the buffer where the 2nd data stream is stored.
3. int size: the total number of elements in either of the input streams.
Return Value: Return the calculated BCR value.
Note: The two data streams should have the same size.

double CalcBCR(char *image1, char *image2);
Description: Calculate the bit correct rate (BCR) between 2 binary BMP images.
Parameter(s):
1. char *image1: the file name of the 1st BMP image.
2. char *image2: the file name of the 2nd BMP image.
Return Value: Return the calculated BCR value if the function is executed successfully; otherwise return -1.0.
Note: The size of the input images should be the same.

double CalcNC(unsigned char *buf1, unsigned char *buf2, int size);
Description: Calculate the normalized correlation (NC) between 2 binary data streams.
Parameter(s):
1. unsigned char *buf1: the address of the buffer where the 1st data stream is stored.
2. unsigned char *buf2: the address of the buffer where the 2nd data stream is stored.
3. int size: the total number of elements in either of the input streams.
Return Value: Return the calculated NC value.
Note: The size of the data streams should be the same.

double CalcNC(char *image1, char *image2);
Description: Calculate the normalized correlation (NC) between 2 binary BMP images.
Parameter(s):
1. char *image1: the file name of the 1st BMP image.
2. char *image2: the file name of the 2nd BMP image.
Return Value: Return the calculated NC value if the function is executed successfully; otherwise return -1.0.
Note: The size of the input images should be the same.

double CalcSimilarity(unsigned char *buf1, unsigned char *buf2, int size);
Description: Calculate the similarity between 2 binary data streams.
Parameter(s):
1. unsigned char *buf1: the address of the buffer where the 1st data stream is stored.
2. unsigned char *buf2: the address of the buffer where the 2nd data stream is stored.
3. int size: the total number of elements in either of the input streams.
Return Value: Return the calculated similarity.
Note: The size of the data streams should be the same.
double CalcSimilarity(char *image1, char *image2);
Description: Calculate the similarity between 2 binary BMP images.
Parameter(s):
1. char *image1: the file name of the 1st BMP image.
2. char *image2: the file name of the 2nd BMP image.
Return Value: Return the calculated similarity value if the function is executed successfully; otherwise return -1.0.
Note: The size of the input images should be the same.
void ShowOpenFileErrorMsg(char *filename);
Description: Display a message box with an open-file-error message.
Parameter(s): char *filename: the name of the file which could not be opened.

void XOR(unsigned char *in1, unsigned char *in2, unsigned char *out, int size);
Description: Execute XOR on 2 binary data streams.
Parameter(s):
1. unsigned char *in1: the address of the buffer where the 1st data stream is stored.
2. unsigned char *in2: the address of the buffer where the 2nd data stream is stored.
3. unsigned char *out: the address of the buffer where the output data will be placed.
4. int size: the total number of elements in either of the input streams.
Note: The input streams should have the same size.
Appendix B: VQ-Based Scheme II
The VQ-based watermarking system mentioned in Chapter 7 is attached on the CD-ROM and explained here. This demo system provides the functions for embedding and extraction. Section 1 will illustrate how to use the provided functions and Section 2 will explain its source files.
1 How to Use The System
In the attached CD-ROM, the executable program, test images, VQ codebooks, some data files, and the source files of the system all can be obtained from the related directories. Table 1 lists the content of the related directories and Table 2 lists the specification of the VQ codebooks. Table 1. Content of the directories (‘X’ is the CD-ROM drive).
Items                Path
Executable Program   X:\B\Exec
Source Files         X:\B\Source
Test Files           X:\B\Data
Table 2. Specification of the attached codebooks.

File Name   Size   Image for Codebook Training   Method   Threshold
L_G4.cb     128    Lena                          GLA      0.0001
L_G4.cb     256    Lena                          GLA      0.0001
L_G4.cb     512    Lena                          GLA      0.0001
LP_L4.cb    512    Lena + Peppers                LBG      0.0001
Please note that the attached demo program does not require an installation process, and it can be executed under the Windows operating system only.
To operate the demo program, please click the executable program; the main form, as in Figure 1, will then appear. The main functions provided in this demo program are also shown in Figure 1.
Figure 1. Main form of the demo system.
1.1 Embedding Procedure
To execute the embedding procedure, please click the “Embed Watermark” option from the “Run” menu or click the related icon on the tool bar (see Figure 1). After that, the form shown in Figure 2 will appear. Table 3 lists the specification of the input and output files. Please follow it to assign the suitable files for the embedding procedure.
Figure 2. Form for embedding.
Table 3. Specification of the input/output images for the embedding procedure.
Items            Image Size   Color    File Format
Original Image   512x512      Gray     BMP
Watermark        128x128      Binary   BMP
Output Image     512x512      Gray     BMP
The steps for the embedding procedure are:
Step 1: Assign a gray BMP file as the original image.
Step 2: Assign a VQ codebook.
Step 3: Assign a binary BMP file as the first watermark.
Step 4: Assign a binary BMP file as the second watermark.
Step 5: Assign a binary BMP file as the third watermark.
Step 6: If the pixel permutation procedure for the watermarks is necessary, please check the option.
Step 7: Assign the file name for the output image.
Step 8: Assign the file name for the first generated key.
Step 9: Assign the file name for the second generated key.
Step 10: Assign the file name for the third generated key.
Step 11: Click the "Embed" button to execute the embedding procedure. After a few seconds, the output image will be displayed on the form, and the three generated keys will be saved under the assigned file names.
1.2 Extraction Procedure
To execute the extraction procedure, please click the "Extract Watermark" option from the "Run" menu or click the related icon on the tool bar (see Figure 1). After that, the form shown in Figure 3 will appear. Table 4 lists the specification of the input and output files. Please follow it to assign the input and output files, and follow the steps below to execute the extraction procedure.
Table 4. Specification of the input/output images for the extracting procedure.

Items                   Image Size   Color    File Format
Watermarked Image       512x512      Gray     BMP
Extracted Watermark 1   128x128      Binary   BMP
Extracted Watermark 2   128x128      Binary   BMP
Extracted Watermark 3   128x128      Binary   BMP
Figure 3. Form for extracting.
Step 1: Assign an image which contains the watermark information, such as the one in Figure 3.
Step 2: Assign a VQ codebook, which should be the same as the one used in the embedding procedure.
Step 3: Assign the first key for extracting the first watermark.
Step 4: Assign the second key for extracting the second watermark.
Step 5: Assign the third key for extracting the third watermark.
Step 6: Select whether the pixel permutation procedure for the watermarks is needed. If this option was selected in the embedding procedure, please check it here as well, or the extracted result will not be correct.
Step 7: Assign the file name for the first extracted watermark.
Step 8: Assign the file name for the second extracted watermark.
Step 9: Assign the file name for the third extracted watermark.
Step 10: Click the "Extract" button to run the whole extraction procedure. A few seconds later, the extracted watermarks will be displayed on the form.
1.3 Evaluating Functions
This demo program also provides some evaluating functions (PSNR, MSE, NC, and BCR) for images in BMP file format. (The PSNR and MSE functions can be applied to gray-valued BMP images, and the NC and BCR functions can be applied to binary-valued BMP images.) An illustration of how to use these functions was given in the earlier appendix; please refer to it if needed.
2 Source Codes
The attached demo system in the CD-ROM was designed and compiled with Borland C++ Builder 5.0. Table 5 lists each of the member files in the Borland C++ Builder project. To modify or compile the source files of the system, please copy the files from the CD-ROM to your hard disk, and please remember to disable the read-only attribute of all the files. In this section, only the main subroutines and functions of the source files will be listed and illustrated. For the others, please refer to the comments in each source file.
Table 5. Content of the source files in this system.

Files        Content
ProjectB.*   Project information
Main.*       Shell of the system
Embed.*      Functions for the embedding procedure
Extract.*    Functions for the extraction procedure
Evaluate.*   Functions for evaluation
VQ-class.*   Functions for the VQ procedure
Tools.*      Tool functions for image processing
About.*      Information about this system

2.1 Embed.cpp
The subroutines and functions for the embedding procedure are defined in this file. The main ones are listed below.

void BtnEmbedClick(TObject *Sender);
Description: Execute the embedding procedure.
Parameter(s): TObject *Sender: the object which calls this routine.
Note: When the "Embed" button on the form is clicked, this routine will be executed.

bool CheckInputInfo(void);
Description: Check the input information for the embedding procedure.
Return Value: Return false if the input information is not complete; otherwise, return true.

bool ReadWatermark(char *filename, unsigned char *out, bool permute);
Description: Read the data of the watermark image and execute the pixel permutation procedure (PPP) on the data if needed.
Parameter(s):
1. char *filename: the file name of the watermark.
2. unsigned char *out: the address of the buffer where the read data will be placed.
3. bool permute: true to execute the PPP on the data, false otherwise.
Return Value: If the input image cannot be opened correctly, return false; otherwise, return true.

bool Init(void);
Description: Read the data from the assigned input files.
Return Value: If the input files cannot be opened correctly, return false; otherwise, return true.

void Embed(void);
Description: Embed the watermark bits into the obtained VQ indices.

bool SaveKey(unsigned char *in, char *filename);
Description: Save the generated key as a file.
Parameter(s):
1. unsigned char *in: the address of the buffer where the key data are stored.
2. char *filename: the file name of the output file.
Return Value: If the output file cannot be saved correctly, return false; otherwise, return true.
bool SaveResults(void);
Description: Save the output image and the generated keys.
Return Value: If the output files cannot be saved, return false; otherwise, return true.

void ShowResults(TObject *Sender);
Description: Display the output results of the embedding procedure.
Parameter(s): TObject *Sender: the object which calls this routine.

float CalcIndexMean(short int *index, int block_no, int h0, int h1, int w0, int w1);
Description: Calculate the mean value for a VQ index by referring to its surrounding indices.
Parameter(s):
1. short int *index: the address of the buffer where the VQ indices are stored.
2. int block_no: the order of the vector (block) to be weighted.
3. int h0: h0 will be 0 if the block to be weighted is on the left edge of the input image; otherwise, h0 is -1.
4. int h1: h1 will be 0 if the block to be weighted is on the right edge of the input image; otherwise, h1 is -1.
5. int w0: w0 will be 0 if the block to be weighted is on the top edge of the input image; otherwise, w0 is -1.
6. int w1: w1 will be 0 if the block to be weighted is on the bottom edge of the input image; otherwise, w1 is -1.
Return Value: Return the calculated mean value of the index.
float CalcIndexVar(short int *index, int block_no, float mean, int h0, int h1, int w0, int w1);
Description: Calculate the variance value for a VQ index by referring to its surrounding indices.
Parameter(s):
1. short int *index: the address of the buffer where the VQ indices are stored.
2. int block_no: the order of the vector (block) to be weighted.
3. float mean: the mean value of the index.
4. int h0: h0 will be 0 if the block to be weighted is on the left edge of the input image; otherwise, h0 is -1.
5. int h1: h1 will be 0 if the block to be weighted is on the right edge of the input image; otherwise, h1 is -1.
6. int w0: w0 will be 0 if the block to be weighted is on the top edge of the input image; otherwise, w0 is -1.
7. int w1: w1 will be 0 if the block to be weighted is on the bottom edge of the input image; otherwise, w1 is -1.
Return Value: Return the calculated variance of the index.

void Polarity(short int *index, unsigned char *out1, unsigned char *out2, unsigned char *out3);
Description: Generate the polarity data streams by referring to the calculated mean values and variance values of the VQ indices.
Parameter(s):
1. short int *index: the address of the buffer where the VQ indices are stored.
2. unsigned char *out1: the address of the buffer where the 1st polarity data stream will be placed.
3. unsigned char *out2: the address of the buffer where the 2nd polarity data stream will be placed.
4. unsigned char *out3: the address of the buffer where the 3rd polarity data stream will be placed.
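As an illustration of how a polarity stream can follow from these statistics, the fragment below sets each bit by comparing an index against its neighbourhood mean. This is one plausible rule for a single stream; the actual rules producing the three streams in this file may differ.

// Sketch: one polarity bit per block, assuming the illustrative rule
// "bit = 1 when the index is at least its neighbourhood mean".
void PolarityOneStream(short int *index, unsigned char *out,
                       int block_count, const float *mean)
{
    for (int i = 0; i < block_count; ++i)
        out[i] = (index[i] >= mean[i]) ? 1 : 0;
}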
2.2
Extract.cpp
For the extracting procedure, the subroutines and functions are defined in this file. The main ones are listed below.
void BtnExtractClick(TObject *Sender);
Description: Execute the extracting procedure.
Parameter(s): TObject *Sender: the object which calls this routine.
Note: When the "Extract" button on the form is clicked, this routine will be executed.

bool CheckInputInfo(void);
Description: Check the input information for the extracting procedure.
Return Value: Return false if the input information is not complete; otherwise, return true.

bool ReadKey(char *filename, unsigned char *out);
Description: Read the data of the key from the assigned file.
Parameter(s):
1. char *filename: the file name of the key file.
2. unsigned char *out: the address of the buffer where the key data will be placed.
Return Value:
If the input file cannot be opened correctly, return false; otherwise, return true.
bool Init(void);
Description: Read the data from the assigned input files.
Return Value: If the input files cannot be opened correctly, return false; otherwise, return true.
void Extract(void);
Description: Recover the watermark bits from the obtained polarity data streams.

bool SaveWatermark(unsigned char *in, char *filename, bool permute);
Description: Execute the inverse pixel permutation procedure (IPPP) upon the data of the watermark (if needed) and save the data as a file.
Parameter(s):
1. unsigned char *in: the address of the buffer where the data of the watermark are stored.
2. char *filename: the file name of the output image.
3. bool permute: true to execute the IPPP upon the data, false otherwise.
Return Value: If the watermark cannot be saved successfully, return false; otherwise, return true.
bool SaveResults(void);
Description: Save the extracted watermarks.
Return Value:
If one of the extracted watermarks cannot be saved successfully, return false; otherwise, return true.
void ShowResults(TObject *Sender);
Description: Display the extracted watermarks and the BCR values.
Parameter(s): TObject *Sender: the object which calls this routine.
2.3
Evaluate.cpp
This file defines the evaluating functions, which can be applied for evaluating the performance of the watermarking system. The illustrations of the source codes in this file have been listed in the earlier appendix; please refer to the related section.
2.4
VQ_class.cpp
The functions for the VQ procedures are defined in this file. The illustration of them has appeared in the earlier appendix; please refer to the related section for the information.
2.5
Tools.h
In this file, some tool functions for image processing are defined. The illustrations of these functions have been given in the earlier appendix; please read the related appendix to obtain the details.
Appendix C: Spatial-Based Scheme
The spatial domain based watermarking system mentioned in Chapter 13 is attached in the CD-ROM and explained here. This demo system provides the functions for embedding and extraction. Section 1 will illustrate how to use the provided functions and Section 2 will explain its source files.
1
How to Use The System
To operate the demo program, please first obtain the executable program and some test images from the attached CD-ROM. Table 1 lists the folders and their content.

Table 1. Content of the folders ('X' is the CD-ROM drive).
Items                Path
Executable Program   X:\C\Exec
Source Files         X:\C\Source
Test Files           X:\C\Data
After executing the executable program, the main form like Figure 1 will appear. The main functions provided in this demo program are also shown in Figure 1.
Figure 1. Main form of the demo system.
1.1
Embedding Procedure
To execute the embedding procedure, please click the "Embed Watermark" option from the "Run" menu or click the related icon on the tool bar (see Figure 1). After that, the form shown in Figure 2 will appear. Table 2 lists the specification of the input and output image files. Please follow it to assign the suitable files for the embedding procedure.

Table 2. Specification of the input/output images for the embedding procedure.
Items              Image Size  Color   File Format
Original Image     512x512     Gray    BMP
Watermark          128x128     Binary  BMP
Watermarked Image  512x512     Gray    BMP
Figure 2. Form for embedding.
The steps for the embedding procedure are:
Step 1: Assign a gray BMP file as the host image, such as the image of Lena in Figure 2.
Step 2: Assign a binary BMP file as the watermark, such as the logo in the top-left corner of Figure 2.
Step 3: Assign a key file, which will be used for selecting the pixels where the watermark bits will be embedded. If you do not have such a file, please leave this field alone; the default file will be used.
Step 4: Set the value of the delta parameter. The larger the value, the better the robustness, but the poorer the image quality.
Step 5: If the pixel permutation procedure for the watermark is needed, please check the option.
Step 6: Assign the file name for the output watermarked image.
Step 7: Click the "Embed" button to execute the embedding procedure. After a few seconds, the watermarked result will be displayed on the form.
1.2
Extracting Procedure
To execute the extraction procedure mentioned in Section 2 of Chapter 13, please click the "Extract Watermark" option from the "Run" menu or click the related icon on the tool bar (see Figure 1). After that, the form shown in Figure 3 will appear. Please follow Table 3 to assign the suitable input and output files, and the steps below to operate the extracting procedure.

Table 3. Specification of the input/output files for the extracting procedure.
Items                Image Size  Color   File Format
Watermarked Image    512x512     Gray    BMP
Extracted Watermark  128x128     Binary  BMP
Figure 3. Form for extracting.
Step 1: Assign a BMP file which contains the watermark information as the input image, such as the one on the right-hand side of Figure 3.
Step 2: Assign a key file, which should be the same as the one used in the embedding procedure. This key will be used to select the pixels from which the watermark bits will be extracted. If you do not have a suitable key file, please leave this field alone; the system will use the default file.
Step 3: Select whether the pixel permutation procedure for the watermark is needed. If this option was selected in the embedding procedure, please check it here as well, or the extracted result will be wrong.
Step 4: Assign the file name for the extracted watermark.
Step 5: Click the "Extract" button to run the extraction procedure. After a few seconds, the extracted result will be displayed on the form. (An example of the extracted watermark is shown in the top-left corner of Figure 3.)
1.3
Evaluating Functions
This demo program also provides some evaluating functions (PSNR, MSE, NC, and BCR) for images in BMP file format. (The PSNR and MSE functions can be applied upon gray-valued BMP images, and the NC and BCR functions can be applied upon binary-valued BMP images.) The illustration of how to use these functions has been given in the earlier appendix; please refer to it if needed.
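For reference, the standard definitions behind these metrics can be sketched as follows; the function names are illustrative, not the ones exported by Evaluate.cpp.

#include <cmath>

// Mean squared error between two 8-bit gray images of `size` pixels.
double MSE(const unsigned char *a, const unsigned char *b, int size)
{
    double sum = 0.0;
    for (int i = 0; i < size; ++i) {
        double d = (double)a[i] - (double)b[i];
        sum += d * d;
    }
    return sum / size;
}

// PSNR in dB, with peak value 255 for 8-bit images.
double PSNR(const unsigned char *a, const unsigned char *b, int size)
{
    double mse = MSE(a, b, size);
    return (mse > 0.0) ? 10.0 * std::log10(255.0 * 255.0 / mse)
                       : 99.0;  // identical images: report a cap
}

// BCR: fraction of watermark bits that survive extraction (0 to 1).
double BCR(const unsigned char *w1, const unsigned char *w2, int bits)
{
    int correct = 0;
    for (int i = 0; i < bits; ++i)
        if (w1[i] == w2[i]) ++correct;
    return (double)correct / bits;
}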
2
Source Codes
The attached demo system in the CD-ROM was designed and compiled with Borland C++ Builder 5.0. Table 4 lists the content of each member file in the Borland C++ Builder project. To modify or compile the source files of the system, please copy the files from the CD-ROM to your hard disk, and please remember to disable the read-only attribute of all the files. In this section, only the main subroutines of the source files will be listed and illustrated. Others, such as the evaluation functions and tool functions, have been explained in the earlier appendix, and thus will be omitted here.
Table 4. Content of the source files in this system.
2.1
Embed.cpp
The subroutines and functions for the embedding procedure are defined in this file. The main functions are listed below.
void BtnEmbedClick(TObject *Sender);
Description: Execute the embedding procedure.
Parameter(s): TObject *Sender: the object which calls this routine.
Note: When the "Embed" button on the form is clicked, this routine will be executed.

bool CheckInputInfo(void);
Description: Check the input information for the embedding procedure.
Return Value: Return false if the input information is not complete; otherwise, return true.
bool Init(void);
Description: Read the data from the assigned input files.
Return Value: If the input files cannot be opened correctly, return false; otherwise, return true.
void Embed(void);
Description: Embed the assigned watermark into the assigned cover image.

bool SaveResults(void);
Description: Save the watermarked image for the extracting procedure.
Return Value: If the output image cannot be saved, return false; otherwise, return true.

void ShowResults(TObject *Sender);
Description: Display the output results of the embedding procedure.
Parameter(s): TObject *Sender: the object which calls this routine.

bool ReadPixelEmbeddingKey(char *filename, unsigned char *out);
Description: Read the pixel embedding key, which will be referred to while embedding.
Parameter(s):
1. char *filename: the name of the key file.
2. unsigned char *out: the address of the buffer where the read key data will be placed.
Return Value: Return false if the key file cannot be opened correctly; otherwise, return true.

int CalcMean(unsigned char *in, int n);
Description: Calculate the mean value of the surrounding pixels of the n-th pixel in one block.
Parameter(s):
1. unsigned char *in: the address of the buffer where the sub-image (or block) is placed.
2. int n: the order of the pixel in the block (e.g., to calculate the mean value of the surrounding pixels of the 13th pixel, set n=13).
Return Value: Return the calculated mean value.
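To make this concrete, here is a minimal sketch of such a neighbourhood mean together with a hypothetical use of the delta parameter from Section 1.1. The 4x4 block size (a 512x512 host carrying a 128x128 watermark) and the rule "push the pixel delta above or below its local mean" are illustrative assumptions, not necessarily the exact scheme of Chapter 13.

// Sketch: mean of the pixels surrounding the n-th pixel of a 4x4 block
// (n = 0..15, row-major), counting only neighbours inside the block.
int CalcMean(unsigned char *in, int n)
{
    int row = n / 4, col = n % 4, sum = 0, count = 0;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            int r = row + dy, c = col + dx;
            if ((dy == 0 && dx == 0) || r < 0 || r > 3 || c < 0 || c > 3)
                continue;  // skip the pixel itself and out-of-block cells
            sum += in[r * 4 + c];
            ++count;
        }
    return sum / count;
}

// Hypothetical embedding of one bit with strength delta: a larger delta
// separates the two cases further, so robustness improves while the
// visual quality of the watermarked image drops.
void EmbedBit(unsigned char *block, int n, int bit, int delta)
{
    int m = CalcMean(block, n);
    int v = bit ? m + delta : m - delta;
    block[n] = (unsigned char)(v < 0 ? 0 : (v > 255 ? 255 : v));
}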
2.2
Extract.cpp
For the extracting procedure, the subroutines and functions are defined in this file. The main ones are listed below.
void BtnExtractClick(TObject *Sender);
Description: Execute the extracting procedure.
Parameter(s): TObject *Sender: the object which calls this routine.
Note: When the "Extract" button on the form is clicked, this routine will be executed.

bool CheckInputInfo(void);
Description: Check the input information for the extraction procedure.
Return Value: Return false if the input information is not complete; otherwise, return true.

bool Init(void);
Description:
Read the data from the assigned input files.
Return Value: If the input files cannot be opened correctly, return false; otherwise, return true.

void Extract(void);
Description: Extract a watermark from the assigned input file.

bool SaveResults(void);
Description: Save the output results of the extracting procedure.
Return Value: If the extracted image cannot be saved, return false; otherwise, return true.

void ShowResults(TObject *Sender);
Description: Display the output results of the extracting procedure.
Parameter(s): TObject *Sender: the object which calls this routine.
2.3
Evaluate.cpp
This file defines the evaluating functions, which can be applied for evaluating the performance of the watermarking system. The illustrations of the source codes in this file have been listed in the earlier appendix; please refer to the related section.
2.4
Tools.h
In this file, some tool functions for image processing are defined. The illustrations of these functions have been given in the earlier appendix; please read the related appendix to obtain the details.
Appendix D: GA Training Program for Spatial-Based Scheme
In this appendix, the GA training program for the spatial-domain-based watermarking system, which was illustrated in Section 3 of Chapter 13, is presented. The trained results of this demo program can be used in the spatial-based watermarking scheme (see Appendix C) to improve the PSNR values of the watermarked images and the BCR values of the extracted watermarks when the considered attack happens. Section 1 will introduce how to use the provided function of the demo system, and Section 2 will explain the source files of this system.
1
How to Use The System
The executable program, test images, and the source files for the demo system can be obtained from the attached CD-ROM. Table 1 lists the folders and their content.

Table 1. Content of the folders ('X' is the CD-ROM drive).
Items                Path
Executable Program   X:\D\Exec
Source Files         X:\D\Source
Test Files           X:\D\Data

After executing the demo program, the main form (see Figure 1) of the system will appear. The main functions of this system are illustrated in the following sub-section.
Figure 1. Main form of the demo system.
1.1
GA Training Procedure
To execute the GA training program, please refer to the specification of the input files listed in Table 2 and the steps below.

Table 2. Specification of the input/output images for the training procedure.
Items           Image Size  Color   File Format
Original Image  512x512     Gray    BMP
Watermark       128x128     Binary  BMP
Step 1: In the main form of the demo program (see Figure 1), please assign a gray BMP file as the original image.
Step 2: Assign a binary BMP file as the watermark.
Step 3: Set the value of the delta parameter. The larger the value, the better the robustness, but the poorer the image quality.
Step 4: Set the population size for the GA training procedure. Its maximum value is 10 in this system.
Step 5: Set the value of lambda for the GA fitness function to adjust the weighting of PSNR or BCR.
Step 6: Set the number of GA training iterations.
Step 7: Assign the mutation rate. Its value should be between 0 and 100 (in %). Note that in the demo program there is no crossover procedure, so only the mutation rate has to be set (see the sketch after this list).
Step 8: If you want to assign the selected pixels for the first GA iteration, please check this option and select the pixels.
Step 9: Choose whether the GA training program trains with the DCT attack. If not, the training procedure will only optimize the PSNR.
Step 10: Assign the file name for the trained results.
Step 11: Click the "Train" button to execute the GA training procedure.
Please note that the GA training procedure takes some time to produce results; the training time depends on the GA population size, the number of GA iterations, and the speed of the computer used. The program will not respond during the training period.
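Since this GA works with mutation only, its per-generation update reduces to randomly disturbing the genes of each individual. The sketch below assumes each individual stores the indices of its selected pixels; the names and the gene layout are assumptions based on the descriptions in Section 2.2.

#include <cstdlib>

// Sketch: mutation-only GA step. With probability m_rate (in %), a gene
// (the index of a selected pixel) is replaced by a new random index in
// [0, genes_max).
void Mutate(int *genes, int gene_count, float m_rate, int genes_max)
{
    for (int g = 0; g < gene_count; ++g)
        if ((std::rand() % 100) < m_rate)
            genes[g] = std::rand() % genes_max;
}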
2
Source Codes
The attached system in the CD-ROM was designed and compiled with Borland C++ Builder 5.0. Table 3 lists the member files in this project. To modify or compile the source files of the system, please copy the files from the CD-ROM to your hard disk, and please remember to disable the read-only attribute of all the files.

Table 3. Content of the source files in this system.
Files        Content
ProjectD.*   Project information
Train.*      Shell of the system
Setup.*      Shell for initializing the selected pixels
GA_class.*   Functions for the GA training procedure
DCT_class.*  Functions about DCT
About.*      Information of this system
Tools.*      Tool functions for image processing
In this section, only the main subroutines of the source files will be listed and illustrated. Others, such as the evaluation functions and
tool functions, have been explained in the earlier appendix, and thus will be omitted here.
2.1
Train.cpp
This file defines the user interface of the demo program and some shell functions for the training procedure. The main functions are listed below.
void BitBtnTrainClick(TObject *Sender);
Description: Execute the GA training procedure.
Parameter(s): TObject *Sender: the object which calls this routine.
Note: When the "Train" button on the form is clicked, this routine will be executed.

bool CheckInputInfo(void);
Description: Check the input information for the training procedure.
Return Value: Return false if the input information is not complete; otherwise, return true.

bool Init(void);
Description: Read the input info and the data from the assigned input files.
Return Value: If the input files cannot be opened correctly, return false; otherwise, return true.

bool SaveResults(void);
Description:
Save the GA trained results.
Return Value: If the trained results cannot be saved, return false; otherwise, return true.
2.2
GA_class.cpp
In this source file, the subroutines and functions for the GA training procedure are defined. The main ones are listed below.
void SetGAParameters(int ind_amount, float lambda, int iteration, float m_rate);
Description: Set the GA parameters.
Parameter(s):
1. int ind_amount: the GA population size.
2. float lambda: a parameter for the GA fitness function.
3. int iteration: the number of GA iterations.
4. float m_rate: the GA mutation rate.

void SetAttackingParameters(int attack_amount);
Description: Set the info about the assigned attack.
Parameter(s): int attack_amount: the number of attacks.

void InitSelectedPixels(int a, int b, int c, int d);
Description: Set the used pixels for the first iteration.
Parameter(s): int a, b, c, d: the indices of the 4 used pixels.

void Init0(bool need_init);
Description: Set the initial values of some variables.
Parameter(s): bool need_init: true for setting the initial values of the selected pixels; false for not.
Note: This routine will save the selected pixels as a text file for checking.
void Init1(void);
Description: Copy the data of the original image.
Note: This routine will copy the cover image into bufB[].

void Mutate(int ind_no);
Description: Mutate the genes of the assigned individual.
Parameter(s): int ind_no: the number of the assigned individual.

void Embed(int ind_no);
Description: Embed the watermark bits into the original image.
Parameter(s): int ind_no: the number of the assigned individual.
Note:
1. Before executing this routine, the original image has to be stored in bufB[].
2. The info of the selected pixels is stored in the assigned individual.

void Attack(void);
Description: Apply the DCT attack upon the watermarked image.
Note:
1. Before executing this routine, the watermarked image has to be placed in bufB[].
2. After executing this routine, the attacked image will be stored in bufB2[].

void Extract(int ind_no);
Description: Extract the embedded watermark bits from the selected pixels of the assigned individual.
Parameter(s): int ind_no: the number of the assigned individual.
Note:
1. This routine uses the data stored in bufB2[].
2. The extracted watermark bits are stored in bufD[].

float CalcBCR(unsigned char *buf1, unsigned char *buf2, int size);
Description: Calculate the Bit Correct Rate.
Parameter(s):
1. unsigned char *buf1: the address of the buffer where the first data stream is stored.
2. unsigned char *buf2: the address of the buffer where the second data stream is stored.
Return Value: The calculated BCR value.
Note: In this function, 0 <= BCR <= 1.

void Evaluate(int ind_no);
Description: Calculate the fitness score for the assigned individual.
Parameter(s): int ind_no: the number of the assigned individual.
Note: If it trains without attack, the PSNR score is the fitness score; otherwise, fitness score = PSNR + lambda * BCR.
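Since the note fixes the fitness function, the evaluation step can be sketched directly; the helper values psnr and bcr are assumed to come from the evaluation functions described earlier.

// Sketch: fitness of one individual, following the note above. Without
// an attack, the fitness is the PSNR of the watermarked image; with the
// DCT attack, the BCR of the watermark extracted from the attacked
// image is added with weight lambda.
float Fitness(float psnr, float bcr, bool with_attack, float lambda)
{
    return with_attack ? psnr + lambda * bcr : psnr;
}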
void SortScore(int block_no);
Description: For the assigned block of each individual, this function will sort the fitness scores using the bubble sorting algorithm.
Parameter(s): int block_no: the number of the assigned block.
Note:
1. After sorting, the first score of fitness_score[] will be the largest one and the last score will be the smallest one.
2. After sorting, the first value of order[] is the index of the individual whose fitness score is the best one.
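A direct rendering of the described sort, keeping order[] aligned with the scores, could look as follows; the population-size parameter is an assumption.

// Sketch: bubble-sort the fitness scores in descending order while
// tracking which individual each score belongs to, as the note above
// describes. Afterwards, order[0] indexes the best individual.
void SortScore(float *fitness_score, int *order, int pop_size)
{
    for (int i = 0; i < pop_size - 1; ++i)
        for (int j = 0; j < pop_size - 1 - i; ++j)
            if (fitness_score[j] < fitness_score[j + 1]) {
                float tf = fitness_score[j];
                fitness_score[j] = fitness_score[j + 1];
                fitness_score[j + 1] = tf;
                int to = order[j];
                order[j] = order[j + 1];
                order[j + 1] = to;
            }
}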
void Backup(int block_no);
Description: For the block with the best fitness score, keep its info.
Parameter(s): int block_no: the number of the assigned block.
Note: This routine will keep the best fitness score, PSNR score, BCR score, and the selected pixels.
void Select(void);
Description: Select the individual with the best fitness score.
void Regenerate(void);
Description: Replace the content of each individual with the content of the best individual.
void Check(FILE *fp, int iter);
Description: Save the important info for the current iteration.
Parameter(s):
1. FILE *fp: a file pointer where the info will be stored.
2. int iter: the current GA iteration.
Note: The trained results will be saved every 10 iterations.
bool SaveResults(char *dat_file, char *chk_file);
Description: Save the trained results.
Parameter(s):
1. char *dat_file: the file name for the trained results.
2. char *chk_file: the file name for the check file.
Return Value: Return true if the results are saved successfully; otherwise, return false.
Note: Besides saving the trained results as a data file, this routine will also save them as a text file for checking.

void TrainWithoutAttack(FILE *fp);
Description: Execute the GA training without attack.
Parameter(s): FILE *fp: a file pointer. The trained info of each iteration will be saved to the file the pointer refers to.

void TrainWithAttack(FILE *fp);
Description: Execute the GA training with attack.
Parameter(s): FILE *fp: a file pointer. The trained info of each iteration will be saved to the file the pointer refers to.
void Train(bool need_init);
Description: The main function for executing the GA training procedure.
Parameter(s): bool need_init: set true to initialize the selected pixels for the first GA iteration; set false otherwise.

int CalcMean(unsigned char *in, int n);
Description: Calculate the mean value of the surrounding pixels of the n-th pixel in one block.
Parameter(s):
1. unsigned char *in: the address of the buffer where the sub-image (or block) is placed.
2. int n: the order of the pixel in the block (e.g., to calculate the mean value of the surrounding pixels of the 13th pixel, set n=13).
Return Value: Return the calculated mean value.
2.3
DCT_class.cpp
The functions for the DCT procedure are defined in this file. They are:

void InitQTable(int q_factor);
Description: Set the initial values for the Q-table.
Parameter(s): int q_factor: the value of the quality factor q.

void InitScanPath(void);
Description: Set the scanning path for the zig-zag scan.
void InitCos(void);
Description: Initialize the cosine table.

void Init(int q);
Description: Initialize the variables for the related DCT procedures.
Parameter(s): int q: the DCT quality factor.

void DCT(unsigned char *block, int *dct_block);
Description: Apply the DCT procedure for one 8*8 sub-image without executing the zig-zag scan.
Parameter(s):
1. unsigned char *block: the address of the buffer where the sub-image is stored.
2. int *dct_block: the address of the buffer where the transformed DCT coefficients will be stored.

void IDCT(int *dct_block, unsigned char *block);
Description: Apply the inverse DCT procedure for one 8*8 block without executing the zig-zag scan.
Parameter(s):
1. int *dct_block: the address of the buffer where the input DCT coefficients are stored.
2. unsigned char *block: the address of the buffer where the recovered sub-image data will be stored.

void DCT(unsigned char *in_buf, int *out_buf);
Description:
Apply the DCT procedure for one 8*8 sub-image.
Parameter(s):
1. unsigned char *in_buf: the address of the buffer where the sub-image is stored.
2. int *out_buf: the address of the buffer where the transformed DCT coefficients will be stored.
void IDCT(int *in_buf, unsigned char *out_buf);
Description: Apply the inverse DCT procedure for one 8*8 block.
Parameter(s):
1. int *in_buf: the address of the buffer where the input DCT coefficients are stored.
2. unsigned char *out_buf: the address of the buffer where the recovered image data will be stored.
void BatchDCT(unsigned char in[M][S], int out[M][S], int n);
Description: Apply the DCT procedure upon the n 8*8 sub-images.
Parameter(s):
1. unsigned char in[M][S]: the address of the buffer where the data of the sub-images are stored.
2. int out[M][S]: the address of the buffer where the transformed DCT coefficients will be placed.
3. int n: the total number of 8*8 sub-images.
void BatchIDCT(int in[M][S], unsigned char out[M][S], int n);
Description: Apply the inverse DCT procedure upon the input DCT coefficients.
Parameter(s):
1. int in[M][S]: the address of the buffer where the input data are stored.
2. unsigned char out[M][S]: the address of the buffer where the recovered data will be placed.
3. int n: the total number of 8*8 sub-images.

void DCTAndIDCT(unsigned char *in_buf, unsigned char *out_buf, int q_factor, int width, int height);
Description: Apply the DCT procedure upon the input data, then the IDCT procedure upon the transformed data.
Parameter(s):
1. unsigned char *in_buf: the address of the buffer where the input data are stored.
2. unsigned char *out_buf: the address of the buffer where the recovered data will be placed.
3. int q_factor: the quality factor for the DCT Q-table.
4. int width, height: the width and height of the image.

bool DCTAndIDCT(unsigned char *in_buf, char *out_file, int q_factor, int width, int height);
Description: Apply the DCT procedure upon the input data, then the IDCT procedure upon the transformed data. The recovered image data will be saved as the assigned output file.
Parameter(s):
1. unsigned char *in_buf: the address of the buffer where the input data are stored.
2. char *out_file: the file name of the output BMP image.
3. int q_factor: the quality factor for the DCT Q-table.
4. int width, height: the width and height of the image.
Return Value:
Return false if the output file cannot be generated successfully; otherwise, return true.

bool DCTAndIDCT(char *in_file, char *out_file, int q_factor);
Description: Apply the DCT procedure upon the input image (in file format), then the IDCT procedure upon the transformed data. The recovered image data will be saved as the assigned output file.
Parameter(s):
1. char *in_file: the file name of the input BMP image.
2. char *out_file: the file name of the output BMP image.
3. int q_factor: the quality factor for the DCT Q-table.
Return Value: Return false if the I/O file cannot be opened/saved successfully; otherwise, return true.
Note:
1. The default size of the input/output image is 512*512 pixels.
2. The input/output file is in BMP file format and has to be a gray-valued image.
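Taken together, these routines implement a JPEG-like attack: a forward DCT, quantization by a Q-table scaled with the quality factor, and an inverse DCT. The lossy quantization step can be sketched as below; the JPEG-style scaling rule is an assumption, since the book does not spell out the exact formula.

// Sketch: quantize then dequantize one 8x8 coefficient block with a
// Q-table, the lossy step of the DCT-and-IDCT attack. The q_factor
// scaling shown follows the common JPEG convention (50 keeps the base
// table; smaller values coarsen it).
void QuantizeBlock(int dct_block[64], const int base_qtable[64], int q_factor)
{
    int scale = (q_factor < 50) ? 5000 / q_factor : 200 - 2 * q_factor;
    for (int i = 0; i < 64; ++i) {
        int q = (base_qtable[i] * scale + 50) / 100;
        if (q < 1) q = 1;
        dct_block[i] = (dct_block[i] / q) * q;  // quantize, then dequantize
    }
}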
2.4
Tools.h
In this file, some tool functions for image processing are defined. The illustrations of these functions have been given in the earlier appendix; please read the related appendix to obtain the details.
Appendix E: Visual Cryptography
A two-out-of-two visual secret sharing system (Chapter 16) for binary images is attached in the CD-ROM. This appendix introduces the demo system: Section 1 shows how to operate the splitting procedure and the stacking procedure, and Section 2 illustrates its source files.
1
How to Use The System
The executable program, test images, and the source files can be obtained from the attached CD-ROM. Table 1 lists the folders and their content.

Table 1. Content of the folders ('X' is the CD-ROM drive).
Items                Path
Executable Program   X:\E\Exec
Source Files         X:\E\Source
Test Files           X:\E\Data
To operate the executable program, we suggest that readers first copy the necessary files to the hard disk and execute the program from there.
1.1
Splitting Procedure
To execute the splitting procedure, please run the program and refer to Table 2 to assign the suitable input file. Then the steps below can be applied.
Table 2. Specification of the input/output files for the splitting procedure.
Items            Image Size  Color   File Format
Input Image (W)  128x128     Binary  BMP
1st Share (Ws1)  256x256     Binary  BMP
2nd Share (Ws2)  256x256     Binary  BMP
Figure 1. Form for the splitting procedure.
Step 1: Click the "Split" tab on the main form. The form for splitting (see Figure 1) will appear.
Step 2: Assign a binary BMP file as the input image.
Step 3: Assign the file names for the first share and the second share respectively.
Step 4: Click the "Split" button to execute the splitting procedure.
Step 5: The generated two shares will be displayed on the form.
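The size change in Table 2 (a 128x128 input producing 256x256 shares) reflects the classic two-out-of-two scheme, in which every secret pixel expands into a 2x2 pattern on each share. A minimal sketch of that idea follows; the two complementary patterns used here are one common choice, and the actual pattern set in Main.cpp may differ.

#include <cstdlib>

// Sketch: split one binary pixel (0 = white, 1 = black) into two 2x2
// patterns. For a white pixel both shares receive the same random
// pattern (stacking shows 2 black sub-pixels); for a black pixel the
// shares receive complementary patterns (stacking shows 4).
void SplitPixel(int secret, int share1[4], int share2[4])
{
    static const int patterns[2][4] = { {1, 0, 1, 0}, {0, 1, 0, 1} };
    int p = std::rand() % 2;
    for (int i = 0; i < 4; ++i) {
        share1[i] = patterns[p][i];
        share2[i] = secret ? patterns[1 - p][i] : patterns[p][i];
    }
}

// Stacking printed shares acts as an OR of the black sub-pixels.
int StackSubPixel(int s1, int s2) { return s1 | s2; }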
1.2
Stacking Procedure
To obtain the secret information from the generated shares, the stacking procedure can be started by selecting the "Stack" tab on the form. Table 3 lists the specification of the input/output files and Figure 2 displays the interface of this procedure.
Table 3. Specification of the input/output files for the stacking procedure.
Items                 Image Size  Color   File Format
1st Share (Ws1)       256x256     Binary  BMP
2nd Share (Ws2)       256x256     Binary  BMP
Recovered Image (Wr)  256x256     Binary  BMP
Figure 2. Form for the stacking procedure.
Step 1: Click the "Stack" tab on the form. The form for the stacking procedure is shown in Figure 2.
Step 2: Assign the input images. (For the specification of the input files, please refer to Table 3.)
Step 3: Assign the file name for the stacked result.
Step 4: Click the "Stack" button to execute the stacking procedure.
Step 5: The stacked image will be displayed on the form.
2
Source Codes
The attached demo system in the CD-ROM was designed and compiled with Borland C++ Builder 5.0. To modify or compile the source files of the system, please copy the files from the CD-ROM to your hard disk, and remember to disable the read-only attribute of all files. Table 4 lists the member files in the Borland C++ Builder project. Table 4. Content of the source files in this system.
Files    Content
Main.*   Shell (interface) of the system
Tools.*  Tool functions for image processing
In this section, only the main subroutines of the source files will be listed. Others, such as the tool functions, have been explained in the earlier appendices; please refer to the related appendices for their details. Further details and comments about the functions can be obtained in each of the source files.
2.1
Main.cpp
The user interface of this system and the functions for the splitting procedure and the stacking procedure are all defined in this file. The main subroutines of this file are listed below:

void BtnSplitClick(TObject *Sender);
Description: Execute the splitting procedure.
Parameter(s): TObject *Sender: the object which calls this routine.
Note: When the "Split" button on the form is clicked, this routine will be executed.

bool Split_CheckInputInfo(void);
Description: Check the input information for the splitting procedure.
Return Value: Return false if the input information is not complete; otherwise, return true.

bool Split_Init(void);
Description: Read the data from the assigned input file.
Return Value: If the input file cannot be opened correctly, return false; otherwise, return true.

void Split(void);
Description: Split the input image into two shares.
Note:
The input data have to be stored in the default buffer beforehand, and the generated shares will be stored in other default buffers.

bool Split_SaveResults(void);
Description: Save the generated images.
Return Value: If the output images cannot be saved, return false; otherwise, return true.

void Split_ShowResults(void);
Description: Display the output results of the splitting procedure.

void BtnStackClick(TObject *Sender);
Description: Execute the stacking procedure.
Parameter(s): TObject *Sender: the object which calls this routine.
Note: When the "Stack" button on the form is clicked, this routine will be executed.

bool Stack_CheckInputInfo(void);
Description: Check the input information for the stacking procedure.
Return Value: Return false if the input information is not complete; otherwise, return true.

bool Stack_Init(void);
Description: Read the data from the input files and check the size of them.
Return Value:
If the input files cannot be opened correctly, return false; otherwise, return true.
void Stack(void);
Description: Stack the shares to generate the stacked image.
Note: The data of the shares have to be stored in the default buffers beforehand. The generated image will be stored in another default buffer.

bool Stack_SaveResults(void);
Description: Save the stacked image.
Return Value: If the output image cannot be saved, return false; otherwise, return true.
void Stack_ShowResults(void);
Description: Display the output result of the stacking procedure.
2.2
Tools.h
In this file, some tool functions for image processing are defined. The illustrations of these functions have been given in the earlier appendix; please read the related appendix to obtain the details.
Appendix F: Modified Visual Cryptography
In this appendix, a modified visual cryptography system is presented. This demo system provides the functions for splitting a binary image into two shares without expanding the size of the output images. The operation of the system is illustrated in Section 1 and the source files are explained in Section 2.
1
How to Use The System
To operate the system mentioned in Section 3 of Chapter 17, the executable program, test images, and the source files can be obtained from the attached CD-ROM. Table 1 lists the related directories and their content.

Table 1. Content of the directories ('X' is the CD-ROM drive).
Items                Path
Executable Program   X:\F\Exec
Source Files         X:\F\Source
Test Files           X:\F\Data
Note: We suggest that readers who want to try the program first make a copy of it on the hard disk.
1.1
Splitting Procedure
To execute the splitting procedure, please run the demo program and refer to Table 2 to assign the suitable file as the input image. Then the steps below can be applied.
Table 2. Specification of the input/output files for the splitting procedure.
Items            Image Size  Color   File Format
Input Image (W)  128x128     Binary  BMP
1st Share (Ws1)  128x128     Binary  BMP
2nd Share (Ws2)  128x128     Binary  BMP
Figure 1. Form for the splitting procedure.
Step 1: Click the “Split” tab on the main form. The form for splitting (see Figure 1) will appear.
Step 2: Assign a binary BMP file as the input image. (Some test binary images can be obtained from the "Data" folder.)
Step 3: Assign the file names (or use the default file names) for the output images.
Step 4: Click the "Split" button to execute the splitting procedure.
Step 5: After a few seconds, the generated shares will be displayed on the form.
1.2
Stacking Procedure
To recover the original image from the two shares, please follow the steps below and refer to the specification of the input/output files in Table 3.

Table 3. Specification of the input/output files for the stacking procedure.
Items                 Image Size  Color   File Format
1st Share (Ws1)       128x128     Binary  BMP
2nd Share (Ws2)       128x128     Binary  BMP
Recovered Image (Wr)  128x128     Binary  BMP
Step 1: Click the "Stack" tab on the form.
Step 2: In the appeared form (see Figure 2), please assign the two shares as the input images.
Step 3: Assign the file name for the output image.
Step 4: Assign the operator for the stacking procedure. (The default one is XOR.)
Step 5: Click the "Stack" button to execute the recovery procedure. The recovered result will be displayed on the form.
Figure 2. Form for the stacking procedure.
Note: In the splitting procedure, the system applies the XOR operator to split the input image into two shares. In the above recovery procedure, besides the default XOR operator, readers can try other operators (such as AND or OR) to see what happens.
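Because the split is a plain XOR, the whole scheme fits in a few lines. The sketch below assumes one byte per binary pixel (0 or 1) and illustrative buffer names.

#include <cstdlib>

// Sketch: size-preserving split. Share 1 is random; share 2 is the
// secret XORed with share 1, so XOR-stacking recovers the secret
// exactly, while OR or AND only yield a degraded result.
void SplitXOR(const unsigned char *secret, unsigned char *s1,
              unsigned char *s2, int size)
{
    for (int i = 0; i < size; ++i) {
        s1[i] = (unsigned char)(std::rand() % 2);
        s2[i] = secret[i] ^ s1[i];
    }
}

void StackXOR(const unsigned char *s1, const unsigned char *s2,
              unsigned char *recovered, int size)
{
    for (int i = 0; i < size; ++i)
        recovered[i] = s1[i] ^ s2[i];  // equals secret[i]
}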
2
Source Codes
The attached demo system in the CD-ROM was designed and compiled with Borland C++ Builder 5.0. To modify or compile the source files of the system, please copy the files from the CD-ROM to your hard disk, and remember to disable the read-only attribute of all files. Table 4 lists the member files in the Borland C++ Builder project. Table 4. Content of the source files in this project.
Files    Content
Main.*   Shell (interface) of the system
Tools.*  Tool functions for image processing
In this section, only the main subroutines of the source files will be listed. Others, such as the tool functions, have been explained in the earlier appendices; please refer to the related appendices for their details.
2.1
Main.cpp
In this file, the user interface of this system and the functions for the splitting procedure and the stacking procedure are defined. The main subroutines of this file are:
void BtnSplitClick(TObject *Sender);
Description: Execute the splitting procedure.
Parameter(s): TObject *Sender: the object which calls this routine.
Note:
When the "Split" button on theform is clicked, this routine will be executed. bool CheckInputInfol(void); Description: Check the input informationfor the splitting procedure. Return Value: Return false fi the input information is not complete; otherwise, return true. void BtnStackClick(T0bject *Sender); Description: Execute the stacking procedure. Parameter(s): TObject *Sender: The object which calls this routine. Note: When the "Stack1'button on theform is clicked, this routine will be executed. bool CheckInputInfo2(void); Description: Check the input informationfor the stacking procedure. Return Value: Return false fi the input information is not complete; otherwise, return true.
2.2
Tools.h
In this file, some tool functions for image processing are defined. The illustrations of these functions have been given in the earlier appendix; please read the related appendix to obtain the details.
Appendix G: VC-Based Scheme
The VC-based watermarking demo program mentioned in Section 2 of Chapter 17 will be introduced here. Section 1 illustrates how to use the provided functions and Section 2 explains its source files.
1
How to Use The System
To operate the demo system mentioned in Section 2 of Chapter 17, please first obtain the executable program and some test images from the attached CD-ROM. Table 1 lists the related folders and their content.

Table 1. Content of the folders ('X' is the CD-ROM drive).
Items                Path
Executable Program   X:\G\Exec
Source Files         X:\G\Source
Test Files           X:\G\Data
After executing the demo program, the main form (see Figure 1) of the system will appear. The provided functions of this system are also shown in Figure 1. They are illustrated in the following sections.
Figure 1. Main form of the demo system.
1.1
Splitting Procedure
To operate the splitting procedure, please refer to Table 2 to assign the suitable files as the input images and follow the steps below.

Table 2. Specification of the input/output files for the splitting procedure.
Items             Image Size  Color   File Format
Input Watermark   512x512     Binary  BMP
Public Watermark  512x512     Binary  BMP
Secret Watermark  512x512     Binary  BMP
Figure 2. The form for splitting.
To split an input watermark into a public one and a secret one, please click the related icon (see Figure 1) or the "Split Watermark" option from the "Run" option in the main menu. Then, the steps below are applied:
Step 1: In the appeared form (see Figure 2), assign a gray BMP file as the original image.
Step 2: Assign a binary BMP file as the input watermark.
Step 3: Set the value of threshold λ (i.e., λ = 10).
Step 4: Assign the file name for the generated public watermark.
Step 5: Assign the file name for the generated secret watermark.
Step 6: Click the "Split" button on the form. After a few seconds, the generated public watermark and the secret one will be displayed on the right-hand side of the form. (An example of the generated secret watermark is shown in Figure 2.)
The generated public watermark can be used as the input one for the mentioned embedding scheme, and the secret watermark can be used in the verification procedure.
1.2
Stacking Procedure
Like the splitting procedure, the stacking procedure in this demo program is designed for BMP images only. Please refer to Table 3 to assign the suitable input files. Also, if you do not have your own test images, some example files can be found in the attached CD-ROM.

Table 3. Specification of the input/output files for the stacking procedure.
Items              Image Size  Color   File Format
Public Watermark   512x512     Binary  BMP
Secret Watermark   512x512     Binary  BMP
Stacked Image      512x512     Binary  BMP
Reduced Watermark  256x256     Gray    BMP
To execute the stacking procedure, please click the related icon (see Figure 1) or the “Stack Watermarks” option from the “Run” option in the main menu. After the form like Figure 3 appears, the steps below can be applied:
Figure 3. The form for stacking.
Step 1: In the appeared form, please assign the file name of the public watermark. The image will be displayed on the form.
Step 2: Assign the file name of the secret watermark.
Step 3: Assign the file name for the stacked result.
Step 4: Assign the file name for the reduced result.
Step 5: Click the "Stack" button to execute the stacking procedure. After a few seconds, the output results will be shown on the form. (Figure 3 shows an example of the result obtained by stacking a public watermark and a secret watermark.)
Please note that, due to the algorithm used in the reducing procedure, the saved reduced image is in gray-valued format.
1.3
Embedding Procedure
To execute the embedding procedure, please click the related icon (see Figure 1) or the "Embed Watermark" option from the "Run" option in the main menu. Then, the form like Figure 4 for embedding will appear. Please refer to Table 4 to assign the suitable files and follow the steps below to operate the embedding procedure.

Table 4. Specification of the input/output files for the embedding procedure.
Items              Image Size  Color   File Format
Original Image     512x512     Gray    BMP
Public Watermark   512x512     Binary  BMP
Watermarked Image  512x512     Gray    BMP
Figure 4. The form for embedding.
Step 1: In the appeared form, assign a gray BMP file as the original image. The image will be displayed on the form.
Step 2: Assign the file name of the public watermark. The image will be displayed on the form.
Step 3: Set the value of threshold λ. Its value should be the same as the one in the splitting procedure.
Step 4: Set the value of threshold δ (i.e., δ = 10).
Step 5: Assign the file name for the watermarked image.
Step 6: Click the "Embed" button to execute the embedding procedure. After a few seconds, the watermarked image will be displayed on the form, such as the one in Figure 4.
1.4
Extraction
To execute the extracting procedure, please click the related icon (see Figure 1) or the "Extract Watermark" option from the "Run" option in the main menu. When the form like Figure 5 appears, please refer to Table 5 to assign the suitable files and follow the steps below to operate the extracting procedure. Table 5. Specification of the input/output files for the extracting procedure.
Figure 5. The form for extracting.
Step 1: Assign the file name of the watermarked image. The image will be displayed on the form.
Step 2: Assign the name of the secret watermark, which will then be displayed on the form.
Step 3: Set the value of threshold λ, which should be the same as the value used in the embedding procedure.
Step 4: Assign the file name for the rebuilt public watermark.
Step 5: The program will stack the secret watermark and the rebuilt one to generate a stacked image. Please assign the file name for it.
Step 6: The program will reduce the stacked image to its original image size and save it as an output image. Please assign a file name for it.
Step 7: Click the "Extract" button to execute the whole extracting procedure. After a few seconds, the generated images will be saved under the assigned file names and displayed on the form. (Figure 5 displays an example of the reduced result.)
Please note that, due to the algorithm used in the reducing procedure, the saved reduced image is in gray-valued format.
1.5
Performance Evaluation
This demo program also provides some evaluating functions (PSNR, MSE, NC, and BCR) for images in BMP file format. (The PSNR and MSE functions can be applied upon gray-valued BMP images, and the NC and BCR functions can be applied upon binary-valued BMP images.) The illustration of how to use these functions has been given in the earlier appendix; please refer to it if needed.
2
Source Codes
The attached system in the CD-ROM was designed and compiled with Borland C++ Builder 5.0. Table 6 lists the content of each member file in the Borland C++ Builder project. To modify or compile the source files of the system, please copy the files from the CD-ROM to your hard disk, and please remember to disable the read-only attribute of all the files. In this section, only the main subroutines of the source files will be listed and illustrated. Others, such as the evaluating functions and
tool functions, have been explained in the earlier appendix, and thus will be omitted here. Table 6. Content of the source files in this system.
2.1
Split.cpp
In this file, the functions and subroutines for the splitting procedure are defined. The most important ones are listed below.
void BtnSplitClick(TObject *Sender);
Description: Execute the splitting procedure.
Parameter(s): TObject *Sender: the object which calls this routine.
Note: When the "Split" button on the form is clicked, this routine will be executed.

bool CheckInputInfo(void);
Description: Check the input information for the splitting procedure.
Return Value:
Return false if the input information is not complete; otherwise, return true.
bool Init(void);
Description: Read the data from the assigned input file.
Return Value: If the input file cannot be opened correctly, return false; otherwise, return true.
void Classify(unsigned char H[][4], int threshold, char *BT);
Description: Classify all the input blocks to obtain the block types.
Parameter(s):
1. unsigned char H[][4]: the buffer where the input blocks are placed.
2. int threshold: the threshold λ for the classification procedure.
3. char *BT: the address of the buffer where the data of block types will be placed.

void GeneratePublicWM(unsigned char *ref, unsigned char wm[][4]);
Description: Generate a public watermark.
Parameter(s):
1. unsigned char *ref: the address of the buffer where the data of block types are placed.
2. unsigned char wm[][4]: the address of the buffer where the generated public watermark will be placed.

void GenerateSecretWM(unsigned char *W, unsigned char *ref, unsigned char wm[][4]);
Description: Generate a secret watermark.
Parameter(s):
1. unsigned char *W: the address of the buffer where the original watermark is placed.
2. unsigned char *ref: the address of the buffer where the data of block types are placed.
3. unsigned char wm[][4]: the address of the buffer where the generated secret watermark will be placed.
void Split(void);
Description: Split the input image into a public one and a secret one.

bool SaveResults(void);
Description: Save the output results of the splitting procedure.
Return Value: If the output results cannot be saved, return false; otherwise, return true.
void ShowResults(TObject *Sender);
Description: Display the output results of the splitting procedure.
Parameter(s): TObject *Sender: the object which calls this routine.
2.2
Stack.cpp
For the stacking procedure, the functions and subroutines are defined in this file. The main ones are listed below.

void BtnStackClick(TObject *Sender);
Description: Execute the stacking procedure.
Parameter(s): TObject *Sender: the object which calls this routine.
Note: When the "Stack" button on the form is clicked, this routine will be executed.

bool CheckInputInfo(void);
Description: Check the input information for the stacking procedure.
Return Value: Return false if the input information is not complete; otherwise, return true.

bool Init(void);
Description: Read the data from the assigned input files.
Return Value: If the input files cannot be opened correctly, return false; otherwise, return true.

void Stack(void);
Description: Stack the shares to generate the stacked image.
Note: This routine will refer to Wp[] and Ws[], and put the generated result in W[].

void Reduce(void);
Description: Reduce the size of the stacked image.
Note: This routine refers to W[] and puts the reduced result in Wr[].
bool SaveResults(void);
Description: Save the output images of the stacking procedure.
Return Value:
If the output images cannot be saved, return false; otherwise, return true.
void ShowResults(TObject *Sender);
Description: Display the output results of the stacking procedure.
Parameter(s): TObject *Sender: the object which calls this routine.
2.3
Embed.cpp
The subroutines and functions for the embedding procedure are defined in this file. The main functions are listed below.
void BtnEmbedClick(TObject *Sender);
Description: Execute the embedding procedure.
Parameter(s): TObject *Sender: the object which calls this routine.
Note: When the "Embed" button on the form is clicked, this routine will be executed.

bool CheckInputInfo(void);
Description: Check the input information for the embedding procedure.
Return Value: Return false if the input information is not complete; otherwise, return true.
bool Init(void);
Description: Read the data from the assigned input files.
Return Value:
If the input files cannot be opened correctly, return false; otherwise, return true.
void Embed(int delta);
Description: Embed the assigned watermark into the assigned cover image.
Parameter(s): int delta: a threshold delta.

bool SaveResults(void);
Description: Save the watermarked image.
Return Value: If the output image cannot be saved, return false; otherwise, return true.

void ShowResults(TObject *Sender);
Description: Display the output results of the embedding procedure.
Parameter(s): TObject *Sender: the object which calls this routine.
2.4
Extract.cpp
For the extracting procedure, the subroutines and functions are defined in this file. The main ones are listed below.
void BtnExtractClick(TObject *Sender);
Description: Execute the extracting procedure.
Parameter(s): TObject *Sender: the object which calls this routine.
Note: When the "Extract" button on the form is clicked, this routine will be executed.
bool CheckInputInfo(void);
Description: Check the input information for the extracting procedure.
Return Value: Return false if the input information is not complete; otherwise, return true.

bool Init(void);
Description: Read the data from the assigned input files.
Return Value: If the input files cannot be opened correctly, return false; otherwise, return true.
bool SaveResults(void);
Description: Save the output results of the extracting procedure.
Return Value: If the output images cannot be saved, return false; otherwise, return true.

void ShowResults(TObject *Sender);
Description: Display the output results of the extracting procedure.
Parameter(s): TObject *Sender: the object which calls this routine.
2.5
Evaluate.cpp
This file defines the evaluating functions, which can be applied for evaluating the performance of the watermarking system. The illustrations of the source codes in this file have been listed in the earlier appendix; please refer to the related section.
2.6
Tools.h
In this file, some tool functions for image processing are defined. The illustrations of these functions have been given in the earlier appendix; please read the related appendix to obtain the details.
Appendix H: Gain/Shape VQ-Based Watermarking System
A demo program for the system mentioned in Section 3 of Chapter 17 is introduced in this appendix. Section 1 introduces how to operate the demo program, and Section 2 explains its source files.
1
How to Use The System
To operate the demo system, please obtain the executable program, the test images, and the codebooks from the attached CD-ROM. Table 1 lists the related directories and Table 2 lists the specification of the attached data files in the "Data" directory.

Table 1. Related directories of this system ('X' is the CD-ROM drive).
Items                Path
Executable Program   X:\H\Exec
Source Files         X:\H\Source
Data Files           X:\H\Data
Table 2. Illustration of the attached data files.
File Name       Illustration
lena_512.bmp    A 512x512 gray image, which is used as the host image.
w_rose_128.bmp  A 128x128 binary image, which is used as a watermark.
Gain.cb         A gain codebook with 16 codewords.
Shape.cb        A shape codebook with 16 codewords.
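The two codebooks reflect gain/shape VQ, in which each input vector is coded as a scalar gain multiplied by a unit-energy shape codeword. A minimal sketch of an encoder under that standard formulation follows; the codebook layout and names are assumptions, not the format of the attached .cb files.

#include <cmath>

// Sketch: gain/shape VQ encodes a vector x of dimension dim as
// x ~ gain[gi] * shape[si], with unit-energy shape codewords. The best
// shape maximizes the correlation <x, s>; the best gain then quantizes
// that correlation.
void GainShapeEncode(const float *x, int dim,
                     const float *shapes, int n_shapes,  // n_shapes x dim
                     const float *gains, int n_gains,
                     int *si, int *gi)
{
    *si = 0; *gi = 0;
    float best_corr = -1e30f;
    for (int s = 0; s < n_shapes; ++s) {
        float corr = 0.0f;
        for (int d = 0; d < dim; ++d)
            corr += x[d] * shapes[s * dim + d];
        if (corr > best_corr) { best_corr = corr; *si = s; }
    }
    float best_err = 1e30f;
    for (int g = 0; g < n_gains; ++g) {
        float err = std::fabs(gains[g] - best_corr);
        if (err < best_err) { best_err = err; *gi = g; }
    }
}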
After executing the demo program, the main form (see Figure 1) will appear. In this form, the provided functions ("Embed" and "Extract") can be executed by clicking the related icons. The main functions will be illustrated in the following sections.
Figure 1. Main form of the system.
1.1
Splitting and Stacking Procedures
This demo program provides no function for the splitting/stacking procedure mentioned in Section 3 of Chapter 17. Thus, to split an input watermark into two shares or to recover the original watermark from the shares, the system mentioned in Appendix F can be applied. Please refer to the illustration in the related appendix to split or to recover the input watermark.
1.2
Embedding Procedure
After splitting the original watermark into two unrecognizable watermarks by applying the demo program mentioned in Appendix F, the two watermarks are used as the inputs of the embedding
procedure. Table 3 specifies the file format of the input/output files for this demo program.

Table 3. Specification of the input/output images for the embedding procedure.
Items              Image Size  Color   File Format
Host Image         512x512     Gray    BMP
1st Watermark      128x128     Binary  BMP
2nd Watermark      128x128     Binary  BMP
Watermarked Image  512x512     Gray    BMP
Figure 2. Form for embedding.
To execute the embedding procedure, please click the "Embed Watermark" option from the "Run" menu or click the related icon on the tool bar (see Figure 1). The form for the embedding procedure, like Figure 2, will appear. After that, please follow the steps below and refer to Figure 2 to operate the demo program.
Step 1: Assign a gray BMP file as the host image.
Step 2: Assign two binary BMP files as the input watermarks.
Step 3: Assign the gain codebook and the shape codebook. (If you do not have the codebooks, a shape codebook and a gain one can be obtained from the "Data" folder in the CD-ROM. Please refer to Table 2.)
Step 4: Assign the file name for the output image.
Step 5: Assign the file names for the generated keys.
Step 6: Click the "Embed" button to execute the whole embedding procedure.
Step 7: After a few seconds, an OK message box will appear and the output image will be displayed on the form.
1.3
Extracting Procedure
The extracting procedure can be started by clicking the related icon (see Figure 1) on the tool bar or by selecting the "Extract Watermark" option from the "Run" option in the main menu. After that, a form like Figure 3 will appear. Table 4 lists the specification of the input/output files. Please refer to it to assign the input files for the demo program, and refer to Figure 3 and the steps below to operate the extracting procedure.
Figure 3. Form for extraction.

Table 4. Specification of the input/output images for the extracting procedure.
Items                    Image Size  Color   File Format
Watermarked Image        512x512     Gray    BMP
1st Extracted Watermark  128x128     Binary  BMP
2nd Extracted Watermark  128x128     Binary  BMP
Step 1: Assign an image which contains the watermark signal as the input image, such as the one in Figure 3.
Step 2: Assign the shape codebook and the gain codebook which were used in the embedding procedure.
Step 3: Assign the key files which were generated in the embedding procedure.
Step 4: Assign the file names for the extracted watermarks.
Step 5: Click the "Extract" button to execute the extracting procedure.
Step 6: After a few seconds, an OK message box will appear. The extracted watermarks will be saved under the assigned file names and displayed on the form.
To recover the original watermark from the two extracted ones, please execute the system mentioned in Appendix F.
1.4
Performance Evaluation
This demo program also provides some evaluating functions (PSNR, MSE, NC, and BCR) for images in BMP file format. (The PSNR and MSE functions can be applied upon gray-valued BMP images, and the NC and BCR functions can be applied upon binary-valued BMP images.) The illustration of how to use these functions has been given in the earlier appendix; please refer to it if needed.
2 Source Codes
The attached demo system in the CD-ROM was designed and compiled with Borland C++ Builder 5.0. Table 5 lists the member files of this Borland C++ Builder project. To modify or compile the source files of the system, please copy the related files from the CD-ROM to your hard disk, and remember to disable the read-only attribute of all files. In this section, only the main subroutines of the source files will be listed and illustrated. Others, such as the evaluation functions and tool functions, have been explained in the earlier appendix and are therefore omitted here.

Table 5. Content of the source files in this system.
2.1 Embed.cpp
The subroutines and functions for the embedding procedure are defined in this file. The main functions are listed below.
void BtnEmbedClick(TObject *Sender);
Description: Execute the embedding procedure.
Parameter(s): TObject *Sender: the object which calls this routine.
Note: When the "Embed" button on the form is clicked, this routine will be executed.

bool CheckInputInfo(void);
Description: Check and read the input information for the embedding procedure.
Return Value: Return false if the input information is not complete; otherwise, return true.
bool Init(void);
Description: Read the data from the assigned input files.
Return Value: If the input files cannot be opened correctly, return false; otherwise, return true.

float Weighting(short int *index, int block_no, int h0, int h1, int w0, int w1);
Description: Weight a VQ index by referring to its surrounding indices.
Parameter(s):
1. short int *index: the address of the buffer where the VQ indices are stored.
2. int block_no: the order of the vector (block) to be weighted.
3. int h0: h0 will be 0 if the block to be weighted is on the left edge of the input image; otherwise, h0 is -1.
4. int h1: h1 will be 0 if the block to be weighted is on the right edge of the input image; otherwise, h1 is -1.
5. int w0: w0 will be 0 if the block to be weighted is on the top edge of the input image; otherwise, w0 is -1.
6. int w1: w1 will be 0 if the block to be weighted is on the bottom edge of the input image; otherwise, w1 is -1.
Return Value: Return the weighting value of the index.

void Polarity(short int *index, unsigned char *out);
Description: Generate a polarity data stream by referring to the input VQ indices.
Parameter(s):
1. short int *index: the address of the buffer where the VQ indices are stored.
2. unsigned char *out: the address of the buffer where the generated polarity data will be placed.

bool SaveResults(unsigned char *in, char *filename);
Description: Save the generated key.
Parameter(s):
1. unsigned char *in: the address of the buffer where the generated key is stored.
2. char *filename: the name of the output key file.
Return Value: Return false if the output file cannot be generated; otherwise, return true.

void ShowResults(TObject *Sender);
Description: Display the output results of this procedure.
Parameter(s): TObject *Sender: the object which calls this routine.
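The bodies of Weighting() and Polarity() are not reproduced in this appendix. Purely as an illustration of the idea behind them, and not necessarily the exact rule implemented in Embed.cpp, a polarity stream can be formed by comparing each block's VQ index with the mean of its existing 8-neighbours; the h0/h1/w0/w1 flags above serve exactly to mark which neighbours exist for border blocks.

// Illustrative sketch only: one polarity bit per block, set to 1 when the
// block's VQ index exceeds the mean index of its existing 8-neighbours.
void PolaritySketch(const short int *index, unsigned char *out,
                    int blocksPerRow, int blocksPerCol)
{
    for (int r = 0; r < blocksPerCol; ++r) {
        for (int c = 0; c < blocksPerRow; ++c) {
            double sum = 0.0;
            int count = 0;
            for (int dr = -1; dr <= 1; ++dr) {
                for (int dc = -1; dc <= 1; ++dc) {
                    int rr = r + dr, cc = c + dc;
                    if ((dr != 0 || dc != 0) &&
                        rr >= 0 && rr < blocksPerCol &&
                        cc >= 0 && cc < blocksPerRow) {
                        sum += index[rr * blocksPerRow + cc];
                        ++count;
                    }
                }
            }
            double mean = (count > 0) ? sum / count : 0.0;
            // Polarity bit for this block.
            out[r * blocksPerRow + c] =
                (index[r * blocksPerRow + c] > mean) ? 1 : 0;
        }
    }
}

A stream of this kind depends only on the VQ indices, so it can be regenerated at the extracting side; this is consistent with the key-based extraction steps described in Section 1.3.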
2.2 Extract.cpp
The subroutines and functions for the extracting procedure are defined in this file. The main functions are listed below.
void BtnExtractClick(TObject *Sender);
Description: Execute the extracting procedure.
Parameter(s): TObject *Sender: the object which calls this routine.
Note: When the "Extract" button on the form is clicked, this routine will be executed.

bool CheckInputInfo(void);
Description: Check and read the input information for the extracting procedure.
Return Value: Return false if the input information is not complete; otherwise, return true.

bool Init(void);
Description: Read the data from the assigned input file.
Return Value: If the input file cannot be opened correctly, return false; otherwise, return true.

bool ReadEmbeddingKey(char *filename, unsigned char *out);
Description: Read the key file for extracting.
Parameter(s):
1. char *filename: the name of the input key file.
2. unsigned char *out: the address of the buffer where the data of the key will be stored.
Return Value: Return false if the key file cannot be opened correctly; otherwise, return true.

bool SaveResults(void);
Description: Save the output results of the extracting procedure.
Return Value: If the output image cannot be saved, return false; otherwise, return true.

void ShowResults(TObject *Sender);
Description: Display the output results of the extracting procedure.
Parameter(s): TObject *Sender: the object which calls this routine.
2.3 GSVQ-class.cpp
In this file, the functions for the gain/shape vector quantisation are defined. The description of each function is listed below; further details can be found in the source file.
int ReadShapeCodebook(char *filename, double codebook[][DIMENSION]);
Description: Read the shape codebook into memory.
Parameter(s):
1. char *filename: the name of the shape codebook file.
2. double codebook[][DIMENSION]: the address of the buffer where the read codewords will be placed. Here DIMENSION defines the length of the codeword.
Return Value: Return the number of codewords in the shape codebook.

int ReadGainCodebook(char *filename, double *codebook);
Description: Read the gain codebook into memory.
Parameter(s):
1. char *filename: the name of the gain codebook file.
2. double *codebook: the address of the buffer where the read codewords will be placed.
Return Value: Return the number of codewords in the gain codebook.

bool ReadIndexFile(char *infile, short int *index);
Description: Read the VQ indices from the assigned input file.
Parameter(s):
1. char *infile: the name of the input file.
2. short int *index: the address of the buffer where the read VQ indices will be placed.
Return Value: Return true if the data of the input file can be read successfully; otherwise, return false.

bool SaveIndexFile(short int *index, char *outfile);
Description: Save the VQ indices as a file.
Parameter(s):
1. short int *index: the address of the buffer where the VQ indices are stored.
2. char *outfile: the name of the output file.
Return Value: Return true if the output file can be generated successfully; otherwise, return false.

void Search(unsigned char in[][DIMENSION], short int *index1, short int *index2, int block_amount);
Description: Execute the nearest search for the input vectors.
Parameter(s):
1. unsigned char in[][DIMENSION]: the address of the buffer where the input vectors (or sub-images) are stored. Here DIMENSION denotes the length of each vector.
2. int block_amount: the number of the input vectors (blocks).
3. short int *index1: the address of the buffer where the obtained shape indices will be stored.
4. short int *index2: the address of the buffer where the obtained gain indices will be stored.

void TableLookup(short int *index1, short int *index2, unsigned char out[][DIMENSION], int block_amount);
Description: Execute the G/S VQ table-lookup procedure.
Parameter(s):
1. short int *index1: the address of the buffer where the shape indices are stored.
2. short int *index2: the address of the buffer where the gain indices are stored.
3. int block_amount: the number of the vectors (blocks).
4. unsigned char out[][DIMENSION]: the address of the buffer where the obtained codewords will be stored. Here DIMENSION denotes the length of each vector.

bool Encode(unsigned char in[][DIMENSION], char *s_cb_file, char *g_cb_file, short int *index1, short int *index2, int block_amount);
Description: Execute the G/S VQ encoding procedure.
Parameter(s):
1. unsigned char in[][DIMENSION]: the address of the buffer where the input vectors (or sub-images) are stored. Here DIMENSION denotes the length of each vector.
2. char *s_cb_file: the name of the shape codebook file.
3. char *g_cb_file: the name of the gain codebook file.
4. int block_amount: the number of the vectors (blocks).
5. short int *index1: the address of the buffer where the obtained shape indices will be stored.
6. short int *index2: the address of the buffer where the obtained gain indices will be stored.
Return Value: Return false if the codebook files cannot be opened successfully; otherwise, return true.

bool Decode(char *s_cb_file, char *g_cb_file, short int *index1, short int *index2, unsigned char out[][DIMENSION], int block_amount);
Description: Execute the G/S VQ decoding procedure.
Parameter(s):
1. char *s_cb_file: the name of the shape codebook file.
2. char *g_cb_file: the name of the gain codebook file.
3. short int *index1: the address of the buffer where the shape indices are stored.
4. short int *index2: the address of the buffer where the gain indices are stored.
5. int block_amount: the number of the vectors (blocks).
6. unsigned char out[][DIMENSION]: the address of the buffer where the obtained vectors (or sub-images) will be stored. Here DIMENSION denotes the length of each vector.
Return Value: Return false if the codebook files cannot be opened successfully; otherwise, return true.
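To make the role of the two codebooks concrete, the following sketch shows one way a single block could be gain/shape encoded: the gain (here taken, as one possible definition, to be the Euclidean norm of the block) is scalar-quantised against the gain codebook, while the shape index is chosen by maximum correlation against the shape codebook. This illustrates the general G/S VQ idea only; the actual Encode() in GSVQ-class.cpp may define the gain and the distance measure differently.

#include <cmath>

const int DIMENSION = 16;   // length of each vector, as in the source file

// Illustrative gain/shape encoding of one vector: pick the gain codeword
// closest to the vector's norm and the shape codeword with the largest
// inner product (assuming unit-energy shape codewords, this is equivalent
// to a nearest-neighbour search on the normalised vector).
void EncodeOneBlock(const unsigned char in[DIMENSION],
                    const double shapeCb[][DIMENSION], int shapeCount,
                    const double *gainCb, int gainCount,
                    short *shapeIndex, short *gainIndex)
{
    // Gain: Euclidean norm of the input vector (one possible definition).
    double gain = 0.0;
    for (int i = 0; i < DIMENSION; ++i)
        gain += double(in[i]) * double(in[i]);
    gain = std::sqrt(gain);

    // Nearest gain codeword (scalar quantisation).
    int bestG = 0;
    double bestGDist = std::fabs(gain - gainCb[0]);
    for (int g = 1; g < gainCount; ++g) {
        double d = std::fabs(gain - gainCb[g]);
        if (d < bestGDist) { bestGDist = d; bestG = g; }
    }

    // Best shape codeword: maximum correlation with the input vector.
    int bestS = 0;
    double bestCorr = -1e300;
    for (int s = 0; s < shapeCount; ++s) {
        double corr = 0.0;
        for (int i = 0; i < DIMENSION; ++i)
            corr += double(in[i]) * shapeCb[s][i];
        if (corr > bestCorr) { bestCorr = corr; bestS = s; }
    }

    *gainIndex  = short(bestG);
    *shapeIndex = short(bestS);
}

Decoding then reconstructs each block as the selected gain codeword multiplied by the selected shape codeword, which is the table-lookup step performed by TableLookup() and Decode() above.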
2.4 Evaluate.cpp
This file defines the evaluation functions, which can be applied to evaluate the performance of the watermarking system. The source code in this file has been illustrated in the earlier appendix; please refer to the related section.
2.5 Tools.h
In this file, some tool functions for image processing are defined. These functions have been illustrated in the earlier appendix; please refer to it for the details.
Authors' Contact Information
(Listed in alphabetical order)
Mauro Barni is with University of Siena, Italy
barni@dii.unisi.it

Christoph Busch is with Fraunhofer Institute for Computer Graphics (FhG-IGD), Germany
[email protected]

Roberto Caldelli is with University of Florence, Italy
[email protected]

Chin-Chen Chang is with National Chung Cheng University, Taiwan, ROC
[email protected]

Brian Chen was with Massachusetts Institute of Technology, USA
[email protected]

Yong Hee Choi is with Yountel Inc., Korea
see210@yountel.com

Shu-Chuan Chu is with Flinders University, Australia
[email protected]

Hsueh-Ming Hang is with National Chiao Tung University, Taiwan, ROC
[email protected]

Dimitrios Hatzinakos is with University of Toronto, Canada
[email protected]

Jinwoo Hong is with Broadcasting Content Protection Research Team, ETRI, Korea
jwhong@etri.re.kr

Chao-Hsing Hsu is with Chien Kuo Institute of Technology, Taiwan, ROC
hsu@ckit.edu.tw

Hsiang-Cheh Huang is with National Chiao Tung University, Taiwan, ROC
[email protected]

Jiwu Huang is with Zhongshan University, China
isshjw@zsu.edu.cn

Kuo-Feng Hwang is with National Taichung Institute of Technology, Taiwan, ROC
kfhwang@ieee.org

Akira Inoue is with M. Ken Co., Ltd., Japan
[email protected]

Lakhmi C. Jain is with University of South Australia, Australia
[email protected]

Hyoung Joong Kim is with Kangwon National University, Korea
khj@kangwon.ac.kr

Deepa Kundur is with Texas A&M University, USA
deepa@ee.tamu.edu

Sai Ho Kwok is with The Hong Kong University of Science and Technology, Hong Kong, China
[email protected]

Heung-Kyu Lee is with Korea Advanced Institute of Science and Technology, Korea
hklee@casaturn.kaist.ac.kr

Wei-Po Lee is with National University of Kaohsiung, Taiwan, ROC
[email protected]

Iuon-Chang Lin is with National Chung Cheng University, Taiwan, ROC
[email protected]

Zheng Liu is with M. Ken Co., Ltd., Japan
[email protected]

Zhe-Ming Lu is with Harbin Institute of Technology, China
[email protected]

Masaharu Mizumoto is with Osaka Electro-Communication University, Japan
[email protected]

Nikolaos Nikolaidis is with University of Thessaloniki, Greece
nikolaid@zeus.csd.auth.gr

Xiamu Niu is with Harbin Institute of Technology, China
[email protected]

Jeng-Shyang Pan is with National Kaohsiung University of Applied Sciences, Taiwan, ROC
[email protected]

Ioannis Pitas is with University of Thessaloniki, Greece
pitas@zeus.csd.auth.gr

Alessandro Piva is with University of Florence, Italy
[email protected]

John F. Roddick is with Flinders University, Australia
roddick@infoeng.flinders.edu.au

Peng Shi is with University of Glamorgan, U.K. He was with Defence Science & Technology Organisation, Australia
[email protected]

Yan Shi is with Kyushu Tokai University, Japan
[email protected]

Yun Q. Shi is with New Jersey Institute of Technology, USA
[email protected]

Chin-Shiuh Shieh is with National Kaohsiung University of Applied Sciences, Taiwan, ROC
[email protected]

Jongwon Seok is with Broadcasting Content Protection Research Team, ETRI, Korea
jwseok@etri.re.kr

Karen Su was with University of Toronto, Canada
[email protected]

Feng-Hsing Wang is with University of South Australia, Australia
[email protected]
INDEX

A
A/D conversion 11, 230, 277, 531, 567
access level verification 565
accumulator 586
activation function 399-400, 416
Adaptive Transform Acoustic Coding (ATRAC) 12, 38, 642, 652
annotation 532, 561, 563
ant colony system 97, 108, 110, 116
ant system 97, 105, 107-108, 110, 115-117
    initialization 105
    pheromone 105
    updating phase 106
    walking phase 105
ARIS audio watermarking technology 650
artificial intelligence (AI) 67
attack 3, 11, 321, 351, 390
    added white noise 230
    aspect-ratio conversion 230, 242
    averaging 219, 278, 566, 576
    bending 230, 242, 246
    blurring 418-419, 422, 424, 434, 447-448, 453
    collusion 268, 275, 279, 281, 290-291, 293, 300, 303
    color space conversion 230, 618
    compression 11, 23-24, 142, 154-155, 157, 174, 177, 230, 232-235, 239-240, 283, 285-286, 292-293, 300, 304, 312, 320, 336, 339-340, 360, 382, 388, 418-419, 422, 431, 434, 447, 449, 453, 516, 528, 531, 554, 567, 570-571, 573, 595, 652-653, 678
    cropping 18, 23-24, 155, 157, 175, 177, 206, 230, 293, 300, 304, 320, 325, 339, 385, 532-533, 569-571, 573
    D/A conversion 11, 277, 531, 567
    destruction 23
    downmixing 11
    dynamic range compression 11
    filtering 14, 23-25, 153, 190, 281, 336, 339, 352, 360, 431, 434, 522, 569, 571, 573
    flipping 339
    format conversion 277, 516, 567, 580, 587
    frame dropping 142, 219, 257-258, 268, 284, 300, 573, 576
    frame erasure 532, 570
    frame-rate conversion 230
    geometrical distortion 219, 241, 243-246
    jitter 204, 533
    line-scan conversion 230
    lossy compression 219, 221, 230, 283, 292, 418-419, 422, 431, 434, 447, 449, 453, 595
    motion-compensating 142, 158, 291, 524-527, 576
    noise addition 339, 516, 528, 531, 570-571, 573
    noise reduction 230
    pitch shifting 11
    random stretching 11
    rate conversion 230, 567
    re-ordering 268
    rotation 23, 25, 230, 242-243, 284, 304, 324-325, 339, 571, 573, 578
    sampling rate conversion 230
    scaling 18, 74, 152, 155, 157, 189, 203-204, 207, 210, 221, 230, 233, 242-244, 253, 276, 279-280, 291, 293-294, 298-300, 304, 339, 418-419, 422, 453, 521, 529, 571, 573, 577, 580, 600-601
    sharpening 339, 418-419, 422, 447-448, 451, 453, 571
    shearing 230, 242, 325, 339
    shifting 23, 25
    slow motion 230
    swapping 219, 275, 284-287, 566, 576
    synchronization 23, 201, 204, 206, 208, 275
    transcoding 142, 144, 269, 567
    translation 230, 242-243, 434, 530-531, 583
authentication 18, 20, 169, 185, 212, 268, 422, 474-476, 532, 563-564, 615, 630, 639, 642-644, 658, 662, 665-666, 671, 676-677
    for certification 662
    for copy control 666
    management of digital contents 658
autocepstrum 202
AVI file 641

B
B-frame 291, 296, 578-579
back propagation algorithm 401
back propagation network (BPN) 398, 415
bandwidth 22, 165, 187, 247, 254, 269, 275, 292, 519-520, 534, 608-610
Barkcode 208
beat-scaling transform 210
benchmark 3-4, 25-26, 31, 239, 241, 315-322, 324, 326, 330, 334-335, 338-341, 343-345
    algorithm complexity 321
    message decoding performance 331
    payload 332
    visual quality 322, 334
    watermark detection performance 325
bit correct ratio (BCR) 140, 381, 417-418, 421, 447, 452
Bit Domain Labeling 573
bit error rate (BER) 136, 331, 339, 341-342
bit-rate 141-142, 158, 223-224, 286, 526-527, 567, 573, 579-580, 587
block truncation coding (BTC) 174
BMP file 391, 618, 641
broadcast monitoring 10, 306-307, 561, 564, 568
Business-to-Business (B2B) 616
Business-to-Consumer (B2C) 616

C
C language 97, 119, 123, 129
capacity 12-14, 20, 142, 187, 196, 210, 229, 270, 273, 288, 369, 565, 588, 601, 606-610, 640, 656, 667
central authority (CA) 460
cepstrum 202
CERTIMARK 26, 316, 343, 345
channel 6, 11, 157-158, 266, 269, 272, 275, 307, 358, 434, 460, 474, 519-520, 534-535, 548, 564, 594-595, 597-601, 604-610, 628, 668
    bursty 535
    cable 269
    error-prone 564
    fading 535
    mobile 6
    noisy 548, 594
    secure 460, 474
    transmission 6
    wide-band 275
    wireless 358, 564
channel coding (see also "error correction codes (ECC)") 136, 272, 534
    BCH codes 136, 534-535, 546, 548, 552
    Hamming distance 534
Checkmark 26, 340, 344, 653
Chinook Communications 269
Chirp-Z transform 169
chromosome coding 104
chromosome crossover 101
chromosome mutation 102
cluster 41, 43, 48-49, 51, 61, 151, 167, 170, 173, 179-180, 532-533
code division multiple access (CDMA) 157, 247-249, 254, 275, 281-284
    Direct Sequence CDMA 247-249
Coefficient Domain Labeling 573-574
co-evolutionary system 90, 104
Color Index Table (CIT) 473
communication theory 265, 272
compact disc (CD-ROM) 614
compress ratio (CR) 554
compression 11, 23-24, 142, 147, 154-157, 165-168, 170, 174, 177, 193, 219-224, 230, 232-235, 239-240, 267, 283, 285-286, 292-293, 300, 304, 312, 320, 324, 333, 336, 339-340, 360, 382, 388, 395-398, 403, 418-419, 422, 431, 434, 447, 449, 453, 516, 524, 528, 531, 554, 562, 566-567, 570-571, 573-574, 576, 588, 595, 641-642, 652-653, 658, 678
compression rate 165, 167, 232, 320, 449, 567
conditional access 614
ConfirMedia 654
Content ID 629, 658-661
    Distributed Content Descriptor (DCD) 659-661
    ID center management number (Unique code) 659
Content ID Forum (cIDf) 658-660
content monitoring 642, 644-645, 653-655
content-scrambling system (CSS) 667-668
Contribution watermark 228
convolution 244, 254, 257-258, 279, 283
copy control 12, 268, 306-307, 325, 532, 564, 614, 631, 642, 644, 653, 658, 666-671, 677
Copy Control Information (CCI) 12, 668
Copy Protection Technical Working Group (CPTWG) 650, 653, 667-668
    Data Hiding Sub-Group (DHSG) 653, 667, 669
Copyright administration 644-645, 655-657
Copyright Management Information (CMI) 13
copyright protection 6, 22, 31, 169, 185, 231, 315, 320, 325, 351-352, 429-430, 453, 474-476, 515-516, 563, 614, 624, 634, 639-640, 642-645, 649-650, 653, 655-657, 666-667, 677-678
    literal type watermark 646, 648-649
    mark type watermark 646-649
copyright verification 187
correlation 138, 140, 149, 177, 185, 188, 190-192, 196, 201-204, 208-210, 242, 254-258, 276-278, 283, 324, 330, 358, 371-372, 522-523, 529-531, 548, 551, 568-569, 575-577, 583, 631, 664
    auto-correlation 185, 191, 202, 208, 210, 242, 244, 246, 254, 256-258
    cross-correlation 138, 177, 191, 258, 268, 276-278, 358, 371-372
cost function (see also "fitness function") 69, 351, 358, 360-361, 363, 368, 374
cryptanalysis 187
cryptography 18, 22-23, 143, 192, 229, 266, 302, 358, 422, 459-461, 468, 473-474, 476, 613, 619, 634
cryptosystem 442, 450, 460

D
data embedding 5, 526-527, 548, 593, 632
data hiding 5, 287, 309, 515-517, 532-533, 545, 548-551, 607, 610, 639-644, 653, 671-672, 675, 677
    for information linking 653, 658, 666-668
    for information retrieving 672, 675
desynchronization 206, 208, 271, 275
Digimarc 270, 613-618, 623, 667, 674
Digital Audio Visual Council (DAVIC) 650
digital contents management 658
Digital Property Rights Infrastructure (DPRI) 615
digital rights management (DRM) 308, 320, 613-620, 622-624, 631-634
digital signature 174-175, 268, 422
digital versatile disk (DVD) 231, 515-516, 527-528, 532, 553, 561, 565, 578, 614, 644, 672, 674
discrete cosine transform (DCT) 21, 25, 147, 150-151, 153-156, 158, 169, 179, 189, 222-224, 230, 232-236, 238, 252, 284-286, 290-294, 296-298, 302-303, 352-359, 367, 369, 373, 402-404, 422, 431, 524, 526-528, 545-546, 548, 566, 573-576, 583-587, 590, 594
discrete Fourier transform (DFT) 21, 147, 149-150, 154, 169, 189, 242, 299-301, 566, 570-571, 591, 594
    magnitude 150, 153, 570-571
    phase 150
discrete wavelet transform (DWT) 21, 147, 151, 154, 169, 189, 284, 395-398, 409-414, 416, 422-423, 552-553, 581-583
distortion compensation 600-601, 605
distortion measure 594, 606
    square error 594, 595, 606-607
    weighted square error 594, 606
distortion-to-noise ratio (DNR) 601, 605-608
dither modulation 601-603
double-sideband suppressed carrier (DSB-SC) 519
drift compensation 291, 527
dynamic link libraries (DLL) 339-340, 343

E
echo-hiding 186, 208
Electronic Media Management System (EMMS) 653
encryption 18, 229, 267, 269, 308, 515, 528, 614, 622, 647, 668
end of block (EOB) 575
entropy coding 526, 584
equal error rate (EER) 328, 341-342
error concealment 563-564
error correction codes (ECC) (see also "channel coding") 304, 515, 533-534, 539, 610
    BCH codes 136, 534-535, 546, 548, 552
    convolutional codes 534
    Hamming codes 247
error detection 20, 563-564
Euclidean distance 166, 171
European Broadcasting Union (EBU) 219-220, 225, 227-229, 231, 233, 239-241, 247, 259
evolution 67-68, 70-71, 73, 77, 79, 85, 98-99, 114, 275, 436
evolutionary algorithm (EA) 67-71, 77, 80, 85-89
    initialization 68
    re-creation 68-69
    reproduction 68
    selection 68-69
evolutionary programming (EP) 68-69, 77, 79-80, 90
evolution strategy (ES) 68, 70, 77-79
    multi-member ES model 78
exclusive-or (XOR) 18, 141, 176-177, 257, 357, 418, 447, 492, 506, 508
execution time 67, 321, 333, 335, 344
expression tree 100
extended markup language (XML) 343, 344

F
false alarm (see also "false positive") 319, 325-326, 328, 331, 339, 341-342, 344-345, 528, 571
    probability 326, 331, 339, 344-345
false negative (see also "false rejection") 191, 326, 345
false positive (see also "false alarm") 191, 229, 325, 345, 516, 669
false rejection (see also "false negative") 326, 328, 331, 342
    probability 326, 330
fast Fourier transform (FFT) 154, 278, 544, 569
fingerprinting 227, 247, 268
finite impulse response (FIR) 577
fitness function (see also "cost function") 69, 73, 101, 387
fitness-proportional selection 73-74
Fourier-Mellin transformation 242
frequency domain 149, 158, 201-204, 220, 284, 295, 302, 391, 396, 402, 409, 431, 433-435, 561, 568, 570, 609, 646
frequency-scale modification 206-207
fuzzy c-means (FCM) 43, 48-52, 55, 58
fuzzy rule 41-44, 47-48, 51, 54-56, 59, 61

G
Galaxy proposal 527, 667
Galois field (GF) 469
Gaussian 45, 78-79, 154-156, 192, 277, 330, 343, 529, 531, 595, 599, 601, 606-607
Gaussian Q-function 599
genetic algorithm (GA) 68-69, 71, 73, 75, 77-78, 80-81, 84, 90, 97-100, 102-104, 115, 140, 179, 351-353, 377-378, 383, 391-392
    chromosome 68, 72, 76, 99-102, 104, 386
    crossover 69, 73, 75-76, 79, 83-85, 99, 101-102, 359, 385, 388
    crossover rate 101
    evaluation 101
    gene 71-72, 76-78, 81, 101-102
    individual 68, 71-75, 77-79, 81, 84, 88, 384-388, 390
    initialization 68, 70, 72, 84, 100, 105, 111
    multiple-objective 104
    mutation 69, 73, 77-80, 85, 99-100, 102, 359, 385, 388
    mutation rate 85, 102
    selection 68-70, 72-76, 80, 85-86, 88, 90, 98-99, 101, 103-104, 359
    selection factor (SF) 101, 104
genetic programming (GP) 68, 77, 80-84, 90, 100
    closure 82-83
    dynamic-length tree structure 81
    sufficiency 82-83
group of frame (GOF) 285, 291, 298-299
group of picture (GOP) 243, 285, 572, 576, 578-579

H
H.261 151, 524
H.263 151, 524, 567
H.26x 576
Hamming distance 475, 534, 579
Hamming weight 471
hash function 18, 174, 302, 402, 632
Huffman coding 224, 526
human auditory system (HAS) 186, 193
human sensory system 165
human visual system (HVS) 137, 143, 284, 288, 292-295, 298-300, 397, 431-432, 443, 445, 452, 459, 566, 571, 577, 588
hypervideo 270, 308
hypothesis testing 190, 196, 318, 325-326

I
identification 227, 247, 268-269, 432, 474-476, 565, 614, 616, 631, 642, 648, 660
ImageBridge 616
indexing 185, 541-542, 561, 631, 643-644
information hiding 5, 272, 536
information theory 273
integrity verification 563
intellectual copyright protection 474-476
intellectual property right 3, 395, 430, 527, 614, 632-633
interleaving 515, 517, 532-533, 535, 537-542, 544-552, 554
Internet 3-4, 6, 18, 22, 26-27, 32, 168, 351, 358, 382, 429, 613-614, 616, 619, 622, 624, 629-630, 632, 653, 657-658, 672, 674
International Federation of the Phonographic Industry (IFPI) 11
International Telecommunications Union (ITU) 232, 241, 323-324, 576
    ITU-R 500 323-324
    ITU-R 601 232, 241
InterTrust 613, 615-616, 619, 623, 634
InterTrust RightsSystem 613, 616, 619
intra frame (I-frame) 222, 574, 578, 584

J
Japan Society for Rights of Authors, Composers and Publishers (JASRAC) 11, 186, 650-651, 656-657
    Designs for the Administration of Works using New technology (DAWN 2001) 656-657
    STEP2000 11, 186
    STEP2001 11, 186
Joint Photographic Experts Group (JPEG) 23-24, 151, 153-155, 157, 174, 177, 230, 233, 336, 339, 360, 367-369, 377, 382, 388, 391, 398, 403, 418-419, 422, 434, 449, 510, 571, 573, 641, 678
    spectral selection mode 368-369
JPEG2000 340, 395, 397, 641, 678
Just Another Watermarking Algorithm (JAWS) 10, 141, 221, 259, 271, 277-279, 281, 283, 307, 568, 570
just noticeable difference (JND) 148, 156, 295-298

K
key 6-8, 18-19, 22-23, 137, 141, 169, 172, 174-175, 177, 179, 187, 189-190, 229, 233, 235, 245, 265, 286, 288, 290, 300, 302, 316-317, 319-321, 326, 329, 334-335, 338-341, 343-345, 353, 356, 358-359, 369, 383-384, 388, 391-392, 404, 411, 414-415, 423, 430, 433-434, 442, 444, 450, 460, 481, 571, 584, 586, 619, 623, 631-632, 647, 649, 660-661
    multiple 326, 334-335, 344-345
    private 18, 21, 280-281, 414, 423, 571
    public 18-19, 22, 415, 442, 450
    secret 19, 22-23, 137, 141, 169, 172, 175, 177, 179, 189-190, 229, 235, 245, 265, 288, 316, 320, 356, 358, 383, 391-392, 404, 411, 414, 430, 434, 442, 444, 451, 460, 481, 584, 586, 631, 647, 649
    session 460
    single 319, 326

L
least significant bit (LSB) 14, 16, 20, 140, 282, 286-287, 431, 433, 452, 573-574
Library of Support Vector Machines (LIBSVM) 445-446
linear feedback shift register 24, 291
local selection 73, 75
lossy compression 11, 165, 219, 221, 230, 235, 283, 292, 418-419, 422, 431, 434, 447, 449, 453, 595

M
macro-block (MB) 524, 574, 584
maximum length sequence (M-sequence) 191-192, 210, 282-283
maximum ratio combining (MRC) 280-281
MD5 hash function 18
mean opinion score (MOS) 324
mean-square-error (MSE) 55, 138, 168, 381, 446-447
media identification 614, 631
media tracing 614, 631, 671
MediaBridge 270, 674
meta-heuristic 97-98, 115
Microsoft Windows Media Rights Manager (WMRM) 613, 615, 617, 622, 623, 634-635
Millennium system 516, 527-532
most significant bit (MSB) 18, 443, 445, 452
Motion JPEG (MJPEG) 230, 531, 570
Motion Picture Experts Group (MPEG) 12, 141-142, 151, 158, 219-224, 230, 232, 234-235, 239-240, 284-287, 290-291, 296, 300, 304, 308, 395, 398, 524, 526, 531, 553-554, 561-562, 567-568, 570, 573-574, 576, 578, 580-581, 583, 587-588, 641, 668-669, 678
    MPEG-1 12, 151, 222, 230, 524, 578, 641
        Layer 3 (MP3) 12, 206, 642, 652, 678
    MPEG-2 12, 142, 151, 222, 230-232, 234-235, 239-240, 284-287, 290-291, 304, 524, 531, 553-554, 570, 573, 578, 580, 587, 641
        AAC 12, 642, 652
    MPEG-4 220, 395, 398, 524, 561-562, 580-581, 583, 587-588, 641
multimedia player (MP) 628-630, 633
multiresolution approximation (MRA) 396-397
multiresolution representation (MRR) 396

N
natural selection 67, 98, 101
neural network 41, 47, 72, 395-396, 398, 402, 404-409, 412, 414-415, 419, 422-423, 435, 451
neuro-fuzzy learning 41-45, 47, 51, 54, 56, 59, 61
    Gaussian type 45
Neymann-Pearson criterion 569, 583
noise visibility function (NVF) 279-280, 324
normalized cross-correlation (NC) 177, 358, 360-363, 365, 369-374, 510

O
OCTALIS project 227
Optimark 26, 340-341, 343, 345
optimization 68, 71, 97-100, 104-105, 110-115, 275, 290, 292, 302, 353, 436-437

P
P-frame 291, 296, 578-579
parallel evolutionary algorithm 86, 88, 90
    coarse-grain model 86-89
    fine-grain model 86, 88-89
parallel genetic algorithm 75, 104
particle swarm optimization 97, 110-115
    initialization 111
    memory updating 112
    position updating 112
    velocity updating 111
patchwork 185-186, 188, 196-200, 212
    Modified Patchwork (MPA) Scheme 199, 200, 212
pattern recognition 396, 398, 435, 569, 640, 647-648
payload 14, 221, 229, 233-234, 265-269, 277, 332-333, 341-342, 516, 528-530, 553, 568-569, 574, 578, 669
peak signal-to-noise ratio (PSNR) 137-138, 140, 168, 172-173, 179-180, 324, 339-341, 343, 345, 358, 360-361, 363, 365-366, 369, 381, 384, 387-388, 417-418, 446-448, 494, 509, 554
perceptual coding 265, 272, 308
Picture Types (PTY) 578-579
population-based optimization 68, 86, 111
post-process 417, 446-447
premature convergence 85
pre-process 43, 48, 51, 56, 61, 186, 190, 258, 385, 417, 446-447, 646
Production watermark 228
progressive transmission 351, 360, 367-369, 373-374
pseudo noise (PN) sequence 203-204, 249, 254-255, 258, 275-276, 292, 517-521, 631
pseudo-random 135, 137-138, 185, 188-192, 195-197, 209-210, 221, 233, 353, 410, 414, 517, 577, 582, 584, 602-603, 647, 649, 663-664
pseudo random number generator (PRNG) 233, 353, 378, 406, 410-411, 414
    seed 275, 378, 411, 414
pseudo-random sequence 188-189, 191-192, 196, 209-210, 577, 647, 649, 663-664
psycho-acoustic model 188, 192-196
    frequency masking 195, 302, 603
    temporal masking 193, 195, 267, 296, 299, 304
pyramid structure 396

Q
quadtree 410-411, 414
quantization 21, 24, 165-170, 174, 222, 233, 235, 286, 524, 526, 531, 570, 593, 596-598, 600, 602-603, 605
quantization index modulation (QIM) 21, 593, 595-597, 600-601, 606-607
Quick Time 641

R
random access 231
rank-based selection 73-75
Real Audio 12, 642, 652
Real Player 641
real-time 195, 219-220, 222, 231-232, 238, 267, 270, 273, 275, 277, 285, 300, 302, 306-308, 530-531, 566-567
receiver operating characteristic (ROC) 136, 328-330, 333, 335-336, 338, 341
Recording Industry Association of America (RIAA) 186
redundant-chip coding 209-210
replica modulation 185, 188, 201-204, 206, 208, 212
resynchronization 517
rights management system (RMS) 320, 627-631, 653, 671
roulette wheel selection 73, 101
RSA system 18
run-length coding 524, 526, 573

S
Safranek-Johnson model 156
salient features 204-206
salient point 208-209
SDMI DRM 613, 615, 624, 633
    licensed compliant module (LCM) 624-626, 630
    portable device (PD) 271, 624-627, 630, 650
    portable media (PM) 619, 622, 624
    secure authenticated channel (SAC) 628
SDMI-based RMS 627-631
    multimedia player (MP) 628-630, 633
    rights management database (RMDB) 628-630
    rights management system server (RMS server) 628-630
secret sharing 459-462, 468-469, 471, 476
Secure Digital Music Initiative (SDMI) 68, 187, 613, 615, 622-628, 630, 633-634, 650
    hackSDMI 187
self-marking 185, 188, 204, 211-212
SHA-1 175
sigmoid function 400, 416
signal estimation theory 272
signal processing 169, 286, 292, 299, 304, 320, 396, 430, 532, 595, 640
signal-to-noise ratio (SNR) 12, 22, 281, 324, 599, 601, 604-605, 608, 610
simplified fuzzy reasoning method 45
sliding window 207
software development kit (SDK) 617
source code 339, 344, 597
spatial domain 18, 20-21, 24, 135-136, 138, 140-143, 149-151, 169, 242, 245, 293, 352, 354, 377, 395, 402, 404, 430, 432-433, 435, 481-482, 516, 528-529, 566, 568, 571, 646
Spatially Localized Image Dependent (SLIDE) watermarking 279
spread spectrum (SS) 14, 21, 24, 137, 140, 154-157, 185-186, 188, 192, 203, 211-212, 247, 272, 274-277, 282, 285, 290-292, 294, 296, 299, 515-517, 521, 524, 551, 601, 605-607, 631
    amplitude-modulation spread spectrum (AM-SS) 601-605, 607
    direct-sequence spread spectrum 518
spread-transform dither modulation (STDM) 601-605
steganography 5, 265, 272, 643, 672
Stirmark 25-26, 339-340, 344, 653
subband 151, 195, 234-235, 238, 396-398, 409-410, 414, 552-553, 582, 609
successive packing (SP) 533, 540-548, 550
sum of squared error (SSE) 402, 416
support vector machine (SVM) 429, 432-433, 435-436, 438-439, 441, 443-446, 451-453
symbol error rate (SER) 547-550
symmetrical phase only filtering (SPOMF) 569
synchronization 23, 25, 185-186, 201-202, 204, 206-210, 212, 219, 230-231, 239, 241, 256, 271, 276, 283, 290, 293, 300, 515, 517, 522-523, 530-531, 533, 551, 553, 571-572, 576, 583-584, 610
    frame 515, 517, 523, 551, 553
    frequency 515, 517, 523, 551, 553
    phase 551
    symbol 551
synchronization code 206, 208

T
tabu search 173
tamper-assessment 268
temporal domain 149, 151, 566
temporal wavelet transform (TWT) 302-303
time domain 157, 196, 198, 201, 203-204, 206, 208, 282, 609, 646
time-base modulation 188
time-scale modification 185-186, 188, 204-206, 212
tournament selection 73, 75, 104
Tracing Authors' right by Labeling Image Service and Monitoring Access Network (TALISMAN) 219-220, 231, 233, 239, 259
transform domain 14-15, 21, 147-148, 150, 152-154, 156, 158-159, 169, 189, 198, 274, 284-285, 292-293, 304, 351-353, 367, 528, 565
traveling salesperson problem (TSP) 100, 105
trusted third party (TTP) 414-415, 422
two alternative, forced choice (2AFC) test 323
two-set 185, 188, 196, 211-212

U
unequal error protection (UEP) 18

V
variable length code (VLC) 222, 285-287, 290, 573-574
Vatican Library project 15
vector quantization (VQ) 21, 23, 25, 165-167, 169-170, 174, 176-177, 352, 391, 482, 500-501, 503, 511
    codebook expansion 173-174
    codebook partitioning 179-180
    codebook size 167, 173-174, 176
    codeword search 166
    compression quality 167
    compression rate 165, 167
    gain-shape 482, 501
    index based VQ watermarking 170-174
    index constrained VQ 174, 182
    joint codebook design and partitioning 180
    LBG algorithm 167-168, 173
    security key based VQ watermarking 174-177
video data hiding 515-517, 532-533, 550-551
video object (VO) 561, 579-583, 587
video object plane (VOP) 584-587
video-on-demand (VOD) 268, 429, 515, 566
video watermarking (VWM) group 527
visual authentication 474, 476
visual cryptography (VC) 459-461, 468, 473-474, 476, 481-482, 500, 511
visual secret sharing (VSS) 461-463, 467-469, 471-476, 481
VLSI 10

W
WaterCast 270
watermark 3-32, 135-143, 147-159, 165-180, 185-212, 219-259, 265-309, 315-345, 351-374, 377-392, 395-424, 429-453, 459-476, 481-511, 515-555, 561-589, 593-610, 613-634, 639-679
    additive 135-136
    attack 3, 11, 321, 351, 390
    benchmark 3-4, 25-26, 31, 239, 241, 315-322, 324, 326, 330, 334-335, 338-341, 343-345
    blind 187, 196-197, 202, 242, 268, 293, 298, 300, 302, 316, 577, 650
    capacity (see also "payload") 12-14, 20, 142, 187, 196, 210, 229, 270, 273, 288, 369, 565, 588, 601, 606-610, 640, 656, 667
    channel 6, 272
    code 564, 566-567, 573, 575
    collaborative 9
    complexity 10, 13, 20, 141, 196, 202, 208, 238, 258, 267, 270-271, 274, 281, 283-285, 287-288, 292, 298-300, 302, 304-307, 315, 321-322, 333-334, 342, 344, 520, 530-531, 567, 569-570, 573, 579
    decoding 7, 607
    detection 6, 187, 206, 239, 253, 265, 268, 272, 277, 284, 317-318, 320-321, 325, 327-328, 331, 341, 344, 516, 522, 529-531, 567-568, 573, 650, 653, 662-663, 665, 671
    digital 3-6, 9-11, 13, 15, 20, 22, 25, 31, 135, 165, 169, 180, 231, 233, 266, 272-273, 275, 304, 351-352, 395-396, 423, 430, 475, 515, 532, 563, 565, 580, 613-615, 618, 626-628, 631-632, 634, 639-641, 643-645, 650, 654, 658, 661, 664, 667, 671, 677-678
    embedding 6, 20, 22, 137, 159, 196, 221-222, 231-233, 235, 246, 253, 265, 270-272, 291, 306, 321-322, 334, 343, 352-353, 356, 402, 405-406, 433, 441-442, 444, 452, 481, 493, 531, 564, 566, 570, 575, 582-584, 640, 642, 644-647, 649, 656, 665, 678
    encoding 8, 278
    extraction 6, 21, 24-25, 138, 172-173, 353, 396, 402, 405, 423-424, 441, 444-445, 452-453, 483, 494, 580, 654
    fidelity 322-323, 534, 601, 606-607
    fragile 5, 9, 16, 18, 20, 136, 174-175, 212, 266, 272, 316, 564
    hard decision detector 318, 327, 331-333, 335, 337-338, 341
    hardware implementation 10, 265, 271, 304, 322
    imperceptible 3, 14-16, 20, 137-138, 140, 186, 202, 270, 272, 275-276, 280, 283, 285-291, 294, 297, 299, 301-302, 304, 351, 358, 516, 530, 533, 564, 618, 640
    inaudibility 10-11, 188-189, 192-196, 198, 208, 211, 615
    insertion 6, 8, 274, 277, 532, 570, 613
    invisibility 11, 14-15, 220, 228, 249-250, 254, 265, 273, 294, 324, 396, 404, 423, 430, 432, 553, 573, 584, 586, 615
    management 229, 640, 642
    message decoding 317-318, 320-321, 326-327, 331, 341, 344
    multiple 178-179, 293, 320, 335-336, 640
    multiple bit system 317
    multiplicative 147-148, 159, 570-571
    non-blind 187-188, 270, 316
    non-oblivious 21, 22
    object-based 562, 579-581, 588
    oblivious 22, 34
    payload (see also "capacity") 14, 221, 229, 233-234, 265-269, 277, 332-333, 341-342, 516, 528-530, 553, 568-569, 574, 578, 669
    perceptible 9, 15-16, 169
    perceptual quality 269, 315, 319, 322, 324-325, 334-335, 343, 650
    performance 14, 198, 276, 281, 315, 318-319, 321, 328, 332, 336-337, 343, 381, 391, 446, 494, 509, 531, 554, 595, 646
    practical product 10, 640-641, 677
    practicality 143, 270
    private 21
    public 22, 233-234, 482-483, 485, 488, 493
    raw video sequence 141-142, 220-221, 266, 562, 568, 570, 572, 581
    reliability 14, 255, 270, 284, 293, 298, 304, 315, 318, 536, 551, 568-569, 595, 640, 643, 647, 668-669
    requirement 11, 13, 20, 220, 249-250, 338, 433, 475, 584, 645, 649, 668-669
    robust 3-5, 9, 11, 16, 18, 20-21, 25, 135, 141, 147, 153, 169-170, 174, 177, 185, 192, 205-206, 208, 212, 219, 239, 246, 266, 273, 275, 277, 285, 300, 304, 316, 319, 345, 351-353, 373, 395-396, 398, 409, 419, 422-423, 430-431, 434, 453, 530-531, 551, 563, 575, 588, 596, 618, 634, 664
    robustness 11, 13-14, 20-21, 24, 26, 135, 140, 142, 148, 169, 178-179, 186, 189, 192-193, 195-196, 202, 210, 220, 229-230, 234-235, 239, 247, 267, 270, 272, 274-276, 278-281, 283-284, 287-290, 292, 298-301, 304, 319, 336, 351, 353, 356, 358, 361, 373-374, 377, 379-382, 385, 387, 390-391, 395-396, 398, 402, 404, 406, 408, 413, 418, 422, 424, 429-435, 447, 452-453, 516-517, 528, 531-533, 545, 547-550, 552-554, 563, 566, 571-573, 578, 580, 584, 587-588, 593, 595, 597-598, 600, 602, 606, 640, 642, 646, 649-653, 657, 666, 677
    secret 229, 233-234, 480, 483, 485, 488, 493
    security 6, 13, 140, 169, 174, 229, 255, 268-269, 302, 308-309, 315-316, 333, 402, 408, 429-435, 441, 450, 453, 462, 474-475, 481, 532, 552, 573, 613-615, 618-622, 624, 650
    semi-fragile 5, 9, 16, 20, 174, 266, 316
    shaping 188, 192-193, 195, 209
    soft decision detector 318, 327-328, 332-333, 335, 337-338, 340-343
    tamper resistance 12
    transparency 12-14, 236, 239, 459, 461-463, 469, 471, 473-474, 620, 668-669
    universal 10
    visible 15, 430, 644
    with original 13
    without original 13
    zero bit system 317, 333, 341-342
Watermark Minimum Segment (WMS) 229, 243-247, 249-250, 252-258
Watson metric 340
WAVE file 642
Weber's law 148
weighted PSNR 324, 340
Windows Media 12, 613-615, 617, 622, 641-642, 652
WMRM (see also "Microsoft Windows Media Rights Manager") 613, 615, 617, 622, 623, 634-635

X
XOR (see also "exclusive-or") 18, 141, 176-177, 257, 357, 418, 447, 492, 506, 508
XML (see also "extended markup language") 343, 344

Z
zero-crossing 207, 208
zigzag scan 151, 354, 524, 526