Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board David Hutchison Lancaster University, UK Takeo Kanade Carnegie Mellon University, Pittsburgh, PA, USA Josef Kittler University of Surrey, Guildford, UK Jon M. Kleinberg Cornell University, Ithaca, NY, USA Alfred Kobsa University of California, Irvine, CA, USA Friedemann Mattern ETH Zurich, Switzerland John C. Mitchell Stanford University, CA, USA Moni Naor Weizmann Institute of Science, Rehovot, Israel Oscar Nierstrasz University of Bern, Switzerland C. Pandu Rangan Indian Institute of Technology, Madras, India Bernhard Steffen University of Dortmund, Germany Madhu Sudan Microsoft Research, Cambridge, MA, USA Demetri Terzopoulos University of California, Los Angeles, CA, USA Doug Tygar University of California, Berkeley, CA, USA Gerhard Weikum Max-Planck Institute of Computer Science, Saarbruecken, Germany
5646
Alain Trémeau Raimondo Schettini Shoji Tominaga (Eds.)
Computational Color Imaging Second International Workshop, CCIW 2009 Saint-Etienne, France, March 26-27, 2009 Revised Selected Papers
Including 114 colored figures
Volume Editors

Alain Trémeau
Université Jean Monnet, Laboratoire Hubert Curien UMR CNRS 5516
18 rue Benoit Lauras, 42000 Saint-Etienne, France
E-mail: [email protected]

Raimondo Schettini
Università degli Studi di Milano-Bicocca
Piazza dell’Ateneo Nuovo 1, 20126 Milano, Italy
E-mail: [email protected]

Shoji Tominaga
Chiba University
1-33, Yayoi-cho, Inage-ku, Chiba-shi, Chiba, 263-8522, Japan
E-mail: [email protected]
Library of Congress Control Number: 2009930845
CR Subject Classification (1998): I.4, I.3, I.5, I.2.10, F.2.2
LNCS Sublibrary: SL 6 – Image Processing, Computer Vision, Pattern Recognition, and Graphics
ISSN: 0302-9743
ISBN-10: 3-642-03264-8 Springer Berlin Heidelberg New York
ISBN-13: 978-3-642-03264-6 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. springer.com © Springer-Verlag Berlin Heidelberg 2009 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper SPIN: 12701643 06/3180 543210
Preface
We would like to welcome you to the proceedings of CCIW 2009, the Computational Color Imaging Workshop, held in Saint-Etienne, France, March 26–27, 2009. This, the second CCIW, was organized by the Université Jean Monnet and the Laboratoire Hubert Curien UMR 5516 (Saint-Etienne, France) with the endorsement of the International Association for Pattern Recognition (IAPR), the French Association for Pattern Recognition and Interpretation (AFRIF) affiliated with IAPR, and the "Groupe Français de l'Imagerie Numérique Couleur" (GFINC). The first CCIW was organized in 2007 in Modena, Italy, with the endorsement of IAPR. That workshop was held along with the International Conference on Image Analysis and Processing (ICIAP), the main conference on image processing and pattern recognition organized every two years by the Group of Italian Researchers on Pattern Recognition (GIRPR) affiliated with IAPR. Our first goal, since we began planning the workshop, was to bring together engineers and scientists from various imaging companies and from technical communities all over the world to discuss diverse aspects of their latest work, ranging from theoretical developments to practical applications in the field of color imaging, color image processing and analysis. The workshop was therefore intended for researchers and practitioners in the digital imaging, multimedia, visual communications, computer vision, and consumer electronics industries who are interested in the fundamentals of color image processing and its emerging applications. We received many excellent submissions. Each paper was reviewed by three reviewers, and the general chairs then carefully selected only 23 papers in order to achieve a high scientific level at the workshop. The final decisions were based on the criticisms and recommendations of the reviewers and the relevance of papers to the goal of the workshop. Only 58% of the papers submitted were accepted for inclusion in the program. In order to give an overview of current research directions in computational color imaging, six different sessions were organized:

• Computational color vision models
• Color constancy
• Color image/video indexing and retrieval
• Color image filtering and enhancement
• Color reproduction (printing, scanning, and displays)
• Multi-spectral, high-resolution and high dynamic range imaging
In addition to the contributed papers, four distinguished researchers were invited to this second CCIW to deliver keynote speeches on current research directions in hot topics on computational color imaging:
• Hidehiko Komatsu, on Information Processing in Higher Brain Areas
• Qasim Zaidi, on General and Specific Color Strategies for Object Identification
• Theo Gevers, on Color Descriptors for Object Recognition
• Gunther Heidemann, on Visual Attention Models and Color Image Retrieval

There are many organizations and people to thank for their various contributions to the planning of this meeting. We are pleased to acknowledge the generous support of Chiba University, the Dipartimento di Informatica Sistemistica e Comunicazione of the Università degli Studi di Milano-Bicocca, the Région Rhône-Alpes and Saint-Etienne Métropole. Special thanks also go to all our colleagues on the Conference Committee for their dedication and work, without which this workshop would not have been possible. Finally, we envision the continuation of this unique event, and we are already making plans for organizing the next CCIW workshop in Milan in 2011.
April 2009
Alain Trémeau Raimondo Schettini Shoji Tominaga
Organization
Organizing Committee

General Chairs
Alain Trémeau (Université Jean Monnet, Saint-Etienne, France)
Raimondo Schettini (Università di Milano-Bicocca, Milan, Italy)
Shoji Tominaga (Chiba University, Chiba, Japan)
Program Committee

Jesús Angulo (Ecole des Mines de Paris, France)
James K. Archibald (Brigham Young University, USA)
Sebastiano Battiato (Università di Catania, Italy)
Marco Bressan (Xerox, France)
Majed Chambah (Université de Reims, France)
Cheng-Chin Chiang (National Dong Hwa University, Taiwan)
Bibhas Chandra Dhara (Jadavpur University, India)
Francesca Gasparini (Università di Milano-Bicocca, Italy)
Takahiko Horiuchi (Chiba University, Japan)
Hubert Konik (Université de Saint-Etienne, France)
Patrick Lambert (Université de Savoie, France)
J. Lee (Brigham Young University, USA)
Jianliang Li (Nanjing University, P.R. China)
Peihua Li (Heilongjiang University, China)
Chiunhsiun Lin (National Taipei University, Taiwan)
Ludovic Macaire (Université de Lille, France)
Lindsay MacDonald (London College of Communication, UK)
Massimo Mancuso (STMicroelectronics, France)
Jussi Parkkinen (University of Joensuu, Finland)
Steve Sangwine (University of Essex, UK)
Gerald Schaefer (Aston University, UK)
Ishwar K. Sethi (Oakland University, Rochester, USA)
Xiangyang Xue (Fudan University, China)
Rong Zhao (Stony Brook University, USA)
Silvia Zuffi (CNR, Italy)
Local Committee

Eric Dinet (Laboratoire Hubert Curien, Saint-Etienne, France)
Damien Muselet (Laboratoire Hubert Curien, Saint-Etienne, France)
Frédérique Robert (IM2NP UMR CNRS 6242, Toulon, France)
Dro Désiré Sidibé (Laboratoire Hubert Curien, Saint-Etienne, France)
Xiaohu Song (Laboratoire Hubert Curien, Saint-Etienne, France)
Sponsoring Institutions

Laboratoire Hubert Curien, Saint-Etienne, France
Université Jean Monnet, Saint-Etienne, France
Région Rhône-Alpes, France
Saint-Etienne Métropole, France
Università di Milano-Bicocca, Milan, Italy
Chiba University, Japan
Table of Contents
Invited Talk

Color Information Processing in Higher Brain Areas . . . . . . . . . . . . . . . . 1
Hidehiko Komatsu and Naokazu Goda

Computational Color Vision Models

Spatio-temporal Tone Mapping Operator Based on a Retina Model . . . . 12
Alexandre Benoit, David Alleysson, Jeanny Herault, and Patrick Le Callet

Colour Representation in Lateral Geniculate Nucleus and Natural Colour Distributions . . . . . . . . . . . . . . . . 23
Naokazu Goda, Kowa Koida, and Hidehiko Komatsu

Color Constancy

Color Constancy Algorithm Selection Using CART . . . . . . . . . . . . . . . . 31
Simone Bianco, Gianluigi Ciocca, and Claudio Cusano

Illuminant Change Estimation via Minimization of Color Histogram Divergence . . . . . . . . . . . . . . . . 41
Michela Lecca and Stefano Messelodi

Illumination Chromaticity Estimation Based on Dichromatic Reflection Model and Imperfect Segmentation . . . . . . . . . . . . . . . . 51
Johji Tajima

Color Image/Video Indexing and Retrieval

An Improved Image Re-indexing Technique by Self Organizing Motor Maps . . . . . . . . . . . . . . . . 62
Sebastiano Battiato, Francesco Rundo, and Filippo Stanco

KANSEI Based Clothing Fabric Image Retrieval . . . . . . . . . . . . . . . . 71
Yen-Wei Chen, Shota Sobue, and Xinyin Huang

A New Spatial Hue Angle Metric for Perceptual Image Difference . . . . . 81
Marius Pedersen and Jon Yngve Hardeberg

Structure Tensor of Colour Quaternion Image Representations for Invariant Feature Extraction . . . . . . . . . . . . . . . . 91
Jesús Angulo

Color Image Filtering and Enhancement

Non-linear Filter Response Distributions of Natural Colour Images . . . . 101
Alexander Balinsky and Nassir Mohammad

Perceptual Color Correction: A Variational Perspective . . . . . . . . . . . . 109
Edoardo Provenzi

A Computationally Efficient Technique for Image Colorization . . . . . . . 120
Adrian Pipirigeanu, Vladimir Bochko, and Jussi Parkkinen

Texture Sensitive Denoising for Single Sensor Color Imaging Devices . . . 130
Angelo Bosco, Sebastiano Battiato, Arcangelo Bruna, and Rosetta Rizzo

Color Reproduction (Printing, Scanning, Displays)

Color Reproduction Using Riemann Normal Coordinates . . . . . . . . . . . 140
Satoshi Ohshima, Rika Mochizuki, Jinhui Chao, and Reiner Lenz

Classification of Paper Images to Predict Substrate Parameters Prior to Print . . . . . . . . . . . . . . . . 150
Matthias Scheller Lichtenauer, Safer Mourad, Peter Zolliker, and Klaus Simon

A Colorimetric Study of Spatial Uniformity in Projection Displays . . . . 160
Jean-Baptiste Thomas and Arne Magnus Bakke

Color Stereo Matching Cost Applied to CFA Images . . . . . . . . . . . . . . 170
Hachem Halawana, Ludovic Macaire, and François Cabestaing

JBIG for Printer Pipelines: A Compression Test . . . . . . . . . . . . . . . . 180
Daniele Ravì, Tony Meccio, Giuseppe Messina, and Mirko Guarnera

Synthesis of Facial Images with Foundation Make-Up . . . . . . . . . . . . . 188
Motonori Doi, Rie Ohtsuki, Rie Hikima, Osamu Tanno, and Shoji Tominaga

Multi-spectral, High-Resolution and High Dynamic Range Imaging

Polynomial Regression Spectra Reconstruction of Arctic Charr’s RGB . . . 198
J. Birgitta Martinkauppi, Yevgeniya Shatilova, Jukka Kekäläinen, and Jussi Parkkinen

An Adaptive Tone Mapping Algorithm for High Dynamic Range Images . . . 207
Jian Zhang and Sei-ichiro Kamata

Material Classification for Printed Circuit Boards by Spectral Imaging System . . . . . . . . . . . . . . . . 216
Abdelhameed Ibrahim, Shoji Tominaga, and Takahiko Horiuchi

Supervised Local Subspace Learning for Region Segmentation and Categorization in High-Resolution Satellite Images . . . . . . . . . . . . . . 226
Yen-wei Chen and Xian-hua Han

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
Color Information Processing in Higher Brain Areas∗

Hidehiko Komatsu1,2 and Naokazu Goda1,2

1 National Institute for Physiological Sciences, Okazaki, Japan
2 The Graduate University for Advanced Studies (SOKENDAI), Okazaki, Japan
[email protected]
Abstract. A significant transformation of color signals takes place in the primary visual cortex, where neurons tuned to various directions in color space are generated. The resulting multi-axis color representation appears to be the basic principle of color representation throughout the visual cortex. Color signals are conveyed through the ventral stream of the cortical visual pathway and finally reach the inferior temporal (IT) cortex. Lesion studies have shown that the IT cortex plays a critical role in color vision. Color discrimination is accomplished by using the activities of a large number of color selective IT neurons with various properties. Both discrimination and categorization are important aspects of our color vision, and we can switch between these two modes depending on the task demand. The IT cortex receives a top-down signal coding the task, and this signal adaptively modulates the color selective responses in the IT cortex such that the neural signals useful for the ongoing task are efficiently selected.
1 Neural Pathway for Color Vision
The visual systems of the human and monkey brains are functionally differentiated and consist of multiple parallel pathways [1]. Color information is carried by specific types of retinal cells and transmitted along specific fibers in the optic nerve [2][3]. Visual signals leaving the eye are relayed at the lateral geniculate nucleus (LGN) and then reach the primary visual cortex (or V1) situated at the most posterior part of the cerebral cortex. The LGN has a multi-layered organization, and color information is coded only in specific layers. The cerebral cortex contains a number of visual areas, and these areas form two major streams of visual signals. Of these, color information is carried by the ventral visual stream, which is thought to be involved in the visual recognition of objects. The ventral visual stream starts from sub-regions of V1, includes sub-regions of area V2 and area V4, and finally reaches the inferior temporal cortex (or IT cortex) (Fig. 1). In humans, damage to the ventral cortical area around the fusiform gyrus results in the loss of color sensation (achromatopsia), so this area should play a critical role in color vision. In the macaque monkey, a very good animal model of human color vision, the IT cortex plays a very important role in color vision because selective damage to the IT
This work is supported by a Japanese Grant-in-Aid for Scientific Research (B) and a grant for Scientific Research on Priority Areas from MEXT of Japan.
Fig. 1. Visual pathway in the monkey brain related to color vision. V1: primary visual cortex, V2: area V2, V4: area V4, IT: inferior temporal cortex. TE and TEO correspond to the anterior and posterior part of IT.
cortex results in severe deficits in color discrimination [4-6]. In this paper, we describe our research on how color information is represented and transformed at different stages of the visual pathway, and how neuronal activity in the IT cortex is related to behavior guided by color signals.
2 Representation of Color Information
Color vision originates from the comparison of signals from photoreceptors with different spectral sensitivity functions. Humans and macaque monkeys have three types of cone photoreceptors that are maximally sensitive to long (L), middle (M) and short (S) wavelengths; they are called L, M and S cones, respectively. Comparison of signals from different types of cones occurs in the retinal circuit, and the resulting difference signals are sent to the LGN through the optic nerve. At this stage, it is known that color information is carried by two types of color selective neurons, namely, red-green (R/G) color opponent neurons and blue-yellow (B/Y) color opponent neurons. The former type of cell codes the difference between L-cone and M-cone signals (either L-M or M-L). The latter type codes the difference between the S-cone signal and the sum of the signals from the remaining two cone types (S-(L+M)). Different laboratories have used different color stimuli to characterize the color selectivity of neurons. In our laboratory, we have used color stimuli based on the CIE-xy chromaticity diagram [7][8]. To study the color selectivity of a neuron, we used a set of color stimuli that were systematically distributed on the chromaticity diagram and mapped the responses on the diagram (Fig. 2). Each color stimulus had the same luminance, shape and area. Color stimuli were presented on the computer display one by one at the same position in the receptive field of the recorded neuron. We employed the CIE-xy chromaticity diagram because of the general familiarity of this
Color Information Processing in Higher Brain Areas
3
Fig. 2. Color stimuli used in our laboratory that were systematically distributed in the chromaticity diagram. A: Colors plotted on the CIE-xy chromaticity diagram. B: Colors replotted on the MacLeod-Boynton (MB) chromaticity diagram. In both A and B, + indicates the chromaticity coordinates of color stimuli distributed regularly on the CIE-xy chromaticity diagram, ○ those of color stimuli distributed regularly on the MB chromaticity diagram, and Δ the equal-energy white point. Cardinal axes in the MB diagram [L-M and S-(L+M)] are also shown. From [8] with modification.
diagram, and because we can easily describe the color selectivity in terms of combinations of cone signals, since the XYZ space on which the CIE-xy diagram is based and the LMS space representing cone signals are connected by a linear transformation. Using this method, the color selectivity of neurons in the LGN and V1 was compared [8]. Figure 3 left shows typical examples of the color selectivity of LGN neurons. The response magnitude to each color stimulus is expressed as the diameter of a circle plotted at the position in the chromaticity diagram that corresponds to the chromaticity coordinates of that color. Open circles represent excitatory responses and filled circles represent inhibitory responses. Cell 1 showed strong responses to red colors and no response to cyan to green colors; this is an example of an R/G color opponent neuron. Cell 2 showed strong excitatory responses to blue colors and strong inhibitory responses to colors around yellow; this is an example of a B/Y color opponent neuron. In these diagrams, contour lines of equal-magnitude responses are also plotted. Like these example neurons, LGN neurons generally had straight response contours. To examine how the cone signals are combined to generate these neural responses, the response contours were re-plotted on the MacLeod-Boynton (MB) chromaticity diagram [9] by using the cone spectral sensitivities as a transformation matrix [10], and the direction in which the response magnitude changes most steeply (the tuning direction) was determined. The lower half of Figure 3 left shows the tuning directions of the 38 LGN neurons recorded. They were concentrated in only very limited directions in color space: two large peaks were observed at 0 deg and 180 deg. These correspond to the difference signal between L and M cones: 0 deg corresponds to the L-M signal, and 180 deg to the M-L signal. Together, these peaks correspond to the R/G color opponent
Fig. 3. Left: Color selectivity of two example LGN neurons (top) and distribution of the tuning directions of LGN neurons (bottom). Right: Color selectivity of two example V1 neurons (top) and distribution of the tuning directions of V1 neurons (bottom). See text for the detail. From [8] with modification.
neurons. There was a smaller peak at around 90 deg. This corresponds to the difference signal between the S cone signal and the sum of the L and M cone signals, namely the S-(L+M) signal, and represents the B/Y color opponent neurons. We can therefore think that, at this stage, color is decomposed along the two axes that constitute the MB chromaticity diagram, namely the L-M and S-(L+M) axes. Different colors correspond to different weights on each of these axes. Color is represented in V1 in a way quite different from that in the LGN. The right half of Figure 3 shows the color selectivity of two example neurons in V1. Compared with LGN neurons, there were two major differences. First, the tuning directions of V1 neurons vary widely and are not restricted to certain directions as observed in the LGN. The response contours of cell 3, for example, have an orientation that is never observed in the LGN. Figure 3 right bottom shows the tuning directions of 73 V1 neurons. They are widely distributed across many directions in color space, which indicates that different hues are represented by different neurons in V1. These results indicate that there is a dramatic change in the way hue is represented between the LGN and V1. Secondly, many color selective V1 neurons had clearly curved response contours (e.g. cell 4), which were unusual in the LGN, where neurons in principle had straight response contours. The filled parts of the bar graphs in Figure 3 indicate neurons in which a model yielding curved response contours fit the data significantly better than any model having only straight response contours. Curved response contours make it possible to restrict the responses to any region of the chromaticity diagram, and can generate sharp tuning to any hue. Therefore, the neural process involved in forming the curved response contours must be closely related to the process of generating selectivity to various hues in the cerebral cortex. We can also think about the difference in color representation between the LGN and V1 in the following way. The
neural pathway connecting the eye and V1 through the LGN consists of only a limited number of nerve fibers compared with the number of photoreceptors. In order to transmit visual information efficiently under such a constraint, color information is encoded in a compressed form to reduce redundancy. In contrast, in the cerebral cortex the capacity constraint is less severe because of the large volume of the cortex, and a different computational principle may dominate. It appears that the visual cortex adopted a strategy of explicitly representing different hues with different neurons. Presumably, there is some biological advantage to representing hues independently by different sets of neurons.
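For readers who want to relate the two representations numerically, the short sketch below (an editorial illustration, not part of the original study) converts a CIE XYZ tristimulus value to cone excitations and then to the two cardinal cone-opponent quantities that span the MacLeod-Boynton diagram. The matrix coefficients are the commonly quoted Smith-Pokorny approximations, not values taken from [8].

```python
import numpy as np

# Approximate Smith-Pokorny cone fundamentals expressed in (Judd-modified) XYZ.
# Treat the coefficients as illustrative, commonly quoted values.
XYZ_TO_LMS = np.array([
    [ 0.15514, 0.54312, -0.03286],   # L cone
    [-0.15514, 0.45684,  0.03286],   # M cone
    [ 0.0,     0.0,      0.01608],   # S cone
])

def macleod_boynton(xyz):
    """Map an XYZ tristimulus value to MacLeod-Boynton coordinates and to the
    two cardinal cone-opponent signals, L-M and S-(L+M)."""
    L, M, S = XYZ_TO_LMS @ np.asarray(xyz, dtype=float)
    lum = L + M                      # luminance proxy in this space
    return {"l": L / lum,            # MacLeod-Boynton abscissa
            "s": S / lum,            # MacLeod-Boynton ordinate
            "L-M": L - M,
            "S-(L+M)": S - lum}

# Example: an equal-energy stimulus (X = Y = Z).
print(macleod_boynton([1.0, 1.0, 1.0]))
```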
3 Transformation of Color Signals in Early Visual Areas
The difference in the color selectivities of neurons between the LGN and V1 indicates that a significant transformation of color signals takes place in V1. Although the actual neural processing underlying this transformation is not known, the two-stage model shown in Figure 4A can explain the properties of V1 neurons quite well. At the first stage of the model, signals from R/G and B/Y color opponent neurons are linearly summed with various combinations of weights and the resulting signal is then rectified. As a result of this first stage, neurons with straight response contours tuned to various directions in color space are formed. At the second stage, the signals from multiple first-stage cells are linearly summed and then rectified. By repeating linear summation and rectification twice, color selective neurons with curved response contours sharply tuned to specific hues are formed. Figure 4B
Fig. 4. A: Two-stage model to explain the transformation of color selectivity in V1. B: Responses of an example V1 neuron tuned to yellow (left), the outputs of the best model with a single stage (middle), and the outputs of the best model with double stages (right).
illustrates the responses of an example V1 neuron tuned to yellow (real response), the outputs of the best model with a single stage, and the outputs of the best model with double stages.
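The two-stage scheme of Figure 4A can be written down compactly: each stage is a weighted linear sum followed by half-wave rectification. The toy sketch below illustrates that principle only; the weights and tuning directions are arbitrary choices, not fitted values from the study.

```python
import numpy as np

def rectify(x):
    """Half-wave rectification: firing rates cannot be negative."""
    return np.maximum(x, 0.0)

def stage1_unit(rg, by, theta):
    """First stage: linear sum of the two opponent inputs along direction theta
    in the (L-M, S-(L+M)) plane, then rectification.  Such a unit has straight
    response contours."""
    return rectify(np.cos(theta) * rg + np.sin(theta) * by)

def stage2_unit(rg, by, thetas, weights):
    """Second stage: weighted sum of several first-stage units, rectified again.
    Applying sum + rectification twice can carve out a response region with
    curved contours, i.e. sharp tuning to a particular hue."""
    drive = sum(w * stage1_unit(rg, by, t) for w, t in zip(weights, thetas))
    return rectify(drive)

# Evaluate the toy unit on a grid of opponent-signal values.
rg, by = np.meshgrid(np.linspace(-1, 1, 201), np.linspace(-1, 1, 201))
# Arbitrary illustrative parameters: three stage-1 inputs, one inhibitory.
response = stage2_unit(rg, by, thetas=[0.0, 0.6, 2.5], weights=[1.0, 1.0, -1.5])
```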
4 Relationship between Neural Responses and Behavior in the Inferior Temporal Cortex
Color selectivity tuned to specific hues is commonly observed at each stage of the cortical visual pathway [11-15], and we believe this is the fundamental principle of color representation in the cerebral cortex. The inferior temporal cortex (IT cortex), the highest stage of the ventral stream, is thought to play a critical role in color vision because lesions there cause severe deficits in color discrimination. Color selective neurons tuned to specific hues are also found in the IT cortex [7] (Fig. 5). To study how color selective IT neurons contribute to color discrimination, the quantitative relationship between color judgments made by monkeys and the responses of color selective IT neurons was examined [16]. Neuronal activities and behavior recorded simultaneously while the monkeys performed a color judgment task were compared. A color discrimination threshold was computed from the responses of each color selective IT neuron. To do this, we first computed the probability distribution of the response magnitudes for each color, and a receiver-operating-characteristic (ROC) analysis was conducted to compute the probability that an ideal observer could discriminate two colors separated by a certain distance in color space. The color discrimination threshold of each individual neuron was then computed from the relationship between the color difference and the probability of a correct response. When the color discrimination thresholds based on the neuronal activities and those
Fig. 5. Color selectivity of six example neurons recorded from the IT cortex tuned to different hues. The color selectivity of each neuron is represented as response contours within the examined region of the CIE-xy chromaticity diagram (broken line). In each panel, the thicker (thinner) response contour indicates the locations where the responses are 75% (50%) of the maximum response. From [7] with modification.
based on the monkeys' behavior were compared, the neural color discrimination threshold was on average 1.5 times larger than that of the monkey, indicating that neural sensitivity tended to be somewhat lower than behavioral sensitivity. On the other hand, there was a strong positive correlation between neuronal activity and the monkeys' behavior with regard to the way the discrimination threshold depends on color. The CIE-xy chromaticity diagram is not a uniform color space; in other words, even if two pairs of colors are separated by the same distance on the CIE-xy chromaticity diagram, their differences may not be perceptually the same [17]. As a result, the discrimination threshold obtained from the monkeys' behavior changes depending on the position in the chromaticity diagram. We therefore examined how the neural color discrimination threshold depends on the position in the chromaticity diagram and how it is related to the behavioral threshold. To do this, the chromaticity diagram was divided into 10 areas and the average discrimination threshold in each area was computed for both neurons and behavior. The relationship between the mean neural and behavioral thresholds across different areas of the chromaticity diagram for one monkey is shown in Figure 6. There was a strong positive correlation between these two values, which clearly indicates that the activities of color selective IT neurons are closely correlated with the color discrimination behavior of the monkeys. To study how individual IT neurons contribute to the monkeys' color judgments, the correlation between the trial-to-trial fluctuation of the neural responses and the color judgment of the monkey was examined by an ideal-observer analysis [16]. It was found that the contribution of each individual neuron is relatively small and that there is no systematic relationship between the sensitivity to color differences or the sharpness of the color selectivity of a neuron and the degree to which it contributes to the color judgment. These results suggest that signals from a large population of color selective neurons with various properties, rather than a small subset of neurons with especially high sensitivity, contribute to color perception and color discrimination behavior.
Fig. 6. Relationship between the mean neural and behavioral thresholds across different areas of the chromaticity diagram for one monkey. Each symbol corresponds to different region in the CIE-xy chromaticity diagram. See text for more detail. From [16] with modification.
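The ideal-observer analysis described above can be paraphrased in a few lines of code. The sketch below is an editorial illustration of the mechanics only: it uses made-up Gaussian response statistics and an assumed 75% correct criterion, neither of which comes from [16].

```python
import numpy as np

def roc_area(resp_a, resp_b):
    """Probability that an ideal observer tells two colors apart from single-trial
    responses: the area under the ROC curve, estimated as the chance that a
    random response to color B exceeds a random response to color A."""
    a = np.asarray(resp_a)[:, None]
    b = np.asarray(resp_b)[None, :]
    return np.mean(b > a) + 0.5 * np.mean(b == a)

def neural_threshold(color_diffs, trials_per_color, criterion=0.75):
    """Color difference at which discrimination performance reaches the assumed
    criterion (linear interpolation between the tested differences)."""
    perf = np.array([roc_area(trials_per_color[0], t) for t in trials_per_color])
    return float(np.interp(criterion, perf, color_diffs))

# Toy example: responses grow (and stay noisy) as the test color moves away
# from the reference color.  Values are illustrative, not recorded data.
rng = np.random.default_rng(0)
diffs = np.linspace(0.0, 0.05, 6)                 # distance in the xy diagram
trials = [rng.normal(10 + 300 * d, 3.0, size=200) for d in diffs]
print(neural_threshold(diffs, trials))
```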
5 Cognitive Control of Color-Related Behavior
In daily life, we often treat similar though perceptually distinguishable colors in the same way, as a group, and give them the same name (e.g. green, red). Such categorical perception is an important cognitive function: it enables our cognitive system, with its finite resources for information processing, to manage efficiently the infinitely variable objects and events in the environment, and it is an important aspect of our color perception. On the other hand, we make fine discriminations between similar colors in certain situations, such as when we scrutinize food or clothes at a shop. We can switch between these two functions, namely categorization and fine discrimination, depending on the situation or the task demands. It is believed that the prefrontal cortex plays a central role in such cognitive control of visual behaviors [18]. However, it is not well understood how the neural responses in the visual cortical areas are affected by the top-down signal from the prefrontal cortex coding the current situation, and how they are modulated depending on the task demands. In order to understand how the cognitive control of color-related behavior involves neurons in the visual cortex, the activities of color selective IT neurons were analyzed in monkeys trained to perform both a color categorization task and a fine discrimination task [19]. While a single IT neuron was recorded, the two tasks were switched and the change in the neural responses was examined. In each task, 11 color stimuli separated at constant intervals between red and green on the CIE-xy chromaticity diagram were used. In the categorization task, one of the 11 colors was presented as the sample, and the monkey judged whether the sample was red or green. In the discrimination task, one of the 11 colors was presented as the sample, followed by two similar colors (choice colors), and the monkey had to select the choice color that was the same as the sample. In this latter task, the monkey had to discriminate colors even when both were within the same color category (red or green). A majority of neurons (64%) exhibited a significant change in their responses to the sample color depending on the ongoing task. The responses of four example neurons to the 11 sample colors are shown in Figure 7. A large majority of these neurons (77%) showed stronger responses in the categorization task, and the responses during passive viewing were similar to those during the categorization task. These results suggest that the default mode of cognitive control is categorical judgment, and that IT neurons are in general more active in this condition. It was also shown that, as a result of this response change, the neural signals differentiating the red category from the green category are enhanced during the categorization task and suppressed during the discrimination task. Thus, the top-down signal adaptively modulates the gain of the neural responses in the IT cortex such that the neural signals useful for the ongoing task are efficiently selected. Interestingly, the color selectivity of a neuron itself does not change despite the change in response amplitude, so these neurons can transmit precise color information regardless of the task. This is in marked contrast to the prefrontal cortex, where neurons exhibit selectivity corresponding to the categorical judgment [20].
These results suggest that the cognitive control of visual behavior by the top-down signal from the prefrontal cortex, which codes the current situation or the ongoing task demand, involves the adaptive modulation of the activity of visual cortex neurons that carry precise sensory information.
Fig. 7. Responses of four example IT neurons that were modulated by the ongoing task. Responses to the 11 sample colors recorded during the categorization task (solid line) and those during the discrimination task (broken line) are indicated. Cells in a and b exhibited stronger responses during the categorization task, and those in c and d during the discrimination task. See text for more detail. From [19] with modification.
Figure 8 schematically illustrates how the color signals originating from the three types of cones are transformed in the visual system, and how the cognitive control of behavior using color stimuli is executed by the top-down signal from the prefrontal
Fig. 8. Schematic illustration of the color signal transformation and cognitive control of color signals in the brain. IT: inferior temporal cortex, PFC: prefrontal cortex, PP: posterior parietal cortex. See text for more detail.
cortex to the IT cortex. This schema is a hypothetical one based on incomplete experimental data, and it lacks several important aspects of color processing, such as the spatial processing of color signals. Nevertheless, because color information stems from the activities of three types of cones, it can basically be described in a three-dimensional space, and for this reason the understanding of the neural processing of color vision is probably the most advanced field in all of visual neuroscience. In this schema, we aimed to capture what we think is the essence of the neural processing of color itself, or hue, and we hope it will be a useful guide for the further development of research in color vision.
References 1. Maunsell, J.H., Newsome, W.T.: Visual processing in monkey extrastriate cortex. Annu. Rev. Neurosci. 10, 363–401 (1987) 2. Komatsu, H.: Mechanisms of central color vision. Curr. Opin. Neurobiol. 8, 503–508 (1998) 3. Solomon, S.G., Lennie, P.: The machinery of colour vision. Nat. Rev. Neurosci. 8, 276– 286 (2007) 4. Heywood, C.A., Gaffan, D., Cowey, A.: Cerebral achromatopsia in monkeys. Eur. J. Neurosci. 7, 1064–1073 (1995) 5. Buckley, M.J., Gaffan, D., Murray, E.A.: Functional double dissociation between two inferior temporal cortical areas: perirhinal cortex versus middle temporal gyrus. J. Neurophysiol. 77, 587–598 (1997) 6. Huxlin, K.R., Saunders, R.C., Marchionini, D., Pham, H.A., Merigan, W.H.: Perceptual deficits after lesions of inferotemporal cortex in macaques. Cereb. Cortex 10, 671–683 (2000) 7. Komatsu, H., Ideura, Y., Kaji, S., Yamane, S.: Color selectivity of neurons in the inferior temporal cortex of the awake macaque monkey. J. Neurosci. 12, 408–424 (1992) 8. Hanazawa, A., Komatsu, H., Murakami, I.: Neural selectivity for hue and saturation of colour in the primary visual cortex of the monkey. Eur. J. Neurosci. 12, 1753–1763 (2000) 9. MacLeod, D.I., Boynton, R.M.: Chromaticity diagram showing cone excitation by stimuli of equal luminance. J. Opt. Soc. Am. 69, 1183–1186 (1979) 10. Smith, V.C., Pokorny, J.: Spectral sensitivity of the foveal cone photopigments between 400 and 500 nm. Vision Res. 15, 161–171 (1975) 11. Lennie, P., Krauskopf, J., Sclar, G.: Chromatic mechanisms in striate cortex of macaque. J. Neurosci. 10, 649–669 (1990) 12. Wachtler, T., Sejnowski, T.J., Albright, T.D.: Representation of color stimuli in awake macaque primary visual cortex. Neuron 37, 681–691 (2003) 13. Kiper, D.C., Fenstemaker, S.B., Gegenfurtner, K.R.: Chromatic properties of neurons in macaque area V2. Vis. Neurosci. 14, 1061–1072 (1997) 14. Zeki, S.: The representation of colours in the cerebral cortex. Nature 284, 412–418 (1980) 15. Conway, B.R., Moeller, S., Tsao, D.Y.: Specialized color modules in macaque extrastriate cortex. Neuron 56, 560–573 (2007) 16. Matsumora, T., Koida, K., Komatsu, H.: Relationship between color discrimination and neural responses in the inferior temporal cortex of the monkey. J. Neurophysiol. 100, 3361–3374 (2008)
17. MacAdam, D.L.: Visual sensitivities to color differences in daylight. J. Opt. Soc. Am. 32, 247–274 (1942) 18. Miller, E.K., Cohen, J.D.: An integrative theory of prefrontal cortex function. Annu. Rev. Neurosci. 24, 167–202 (2001) 19. Koida, K., Komatsu, H.: Effects of task demands on the responses of color-selective neurons in the inferior temporal cortex. Nat. Neurosci. 10, 108–116 (2007) 20. Freedman, D.J., Riesenhuber, M., Poggio, T., Miller, E.K.: A comparison of primate prefrontal and inferior temporal cortices during visual categorization. J. Neurosci. 23, 5235–5246 (2003)
Spatio-temporal Tone Mapping Operator Based on a Retina Model

Alexandre Benoit1, David Alleysson2, Jeanny Herault3, and Patrick Le Callet4

1 LISTIC, 74940 Annecy le Vieux, France
2 LPNC, 38040 Grenoble, France
3 Gipsa Lab, 38402 Grenoble, France
4 IRCCyN, 44321 Nantes, France
[email protected]
Abstract. From moonlight to bright sunshine, real-world visual scenes contain a very wide range of luminance; they are said to be High Dynamic Range (HDR). Our visual system is well adapted to explore and analyze such variable visual content. It is now possible to acquire such HDR content with digital cameras; however, it is not possible to render it all on standard displays, which have only Low Dynamic Range (LDR) capabilities. This rendering usually generates bad exposure or loss of information. It is necessary to develop locally adaptive Tone Mapping Operators (TMO) to compress HDR content to LDR content while keeping as much information as possible. The human retina is known to perform such a task to overcome the limited range of values which can be coded by neurons. The purpose of this paper is to present a TMO inspired by the properties of the retina. The presented biological model allows reliable dynamic range compression with natural color constancy properties. Moreover, its non-separable spatio-temporal filter enhances HDR video content processing with an added temporal constancy. Keywords: High Dynamic Range compression, tone mapping, retina model, color constancy, video sequence tone mapping.
1 Introduction
In this paper, we propose a method to compress High Dynamic Range images in order to make visual data perceptible on display media with lower dynamic range capabilities. HDR images reflect our real-life visual world; our eyes perceive every day a wide variety of visual scenes with very different luminance values. Our visual system is able to cope with such a wide variety of input signals and extract salient information. However, as discussed in [1], neurons cannot code a wide range of input values. Thus, at the retina level a compression process occurs in order to preserve all relevant information, including color, in the coding process. A similar idea has recently been introduced with digital High Dynamic Range imaging. It is now possible to create HDR images even with standard digital cameras [2] or light simulation [3]; nevertheless, HDR displays that would be able to render all the acquired data are still under development [4]. Current displays are Low Dynamic Range, and direct HDR image visualization would hide a
large part of the information. An alternative is the development of Tone Mapping Operators [5, 6], which allow HDR images to be rendered on standard LDR displays while preserving most of the information to be seen. Nevertheless, as discussed in [7], an HDR image cannot be perceived similarly to its LDR version. The human factors related to this problem are not yet known, but some artifacts created by the tone mapping conversion can already be measured [8]. They are related to halo effects and color distortions and lead to a corruption of naturalness. For now, studies on perceived quality [5, 6, 9] are the only available assessment solutions. The current challenge is to design a TMO able to limit artifacts and preserve the general ambiance of the original HDR visual scene. Several operators have already been proposed and compared [6, 9], leading to a wide variety of approaches. From computer vision methods to those inspired by the visual system, each TMO follows a different approach and requires a specific parameter set for each processed image. In addition, a new challenge, not yet explored, is related to video sequence processing. The aim is to generate successive tone mapped images which give a natural perception sensation without the temporal instabilities created by frame-by-frame image optimization. In this paper, we propose a new TMO based on a retina model. The approach models retina local adaptation properties, as described in the work of Meylan et al. [10], and is completed by specific spatio-temporal filters of the retina. The added contribution involves retina processes which enable spatio-temporal noise removal, temporal stability and spectral whitening. The paper is organized as follows: Section 2 describes the proposed retina model and its properties for HDR compression. Section 3 illustrates the effect of such a filter in the case of static and dynamic content processing.
2 Retina Biological Model
The human retina architecture is based on cellular layers which process the visual information from the photoreceptors, the visual data entry point, to the ganglion cells output. The input signals are locally processed step by step so that details, motion and color information are enhanced and conditioned for high level analysis at the visual cortex level. Here, we focus on the known parts of the human retina which are suitable for the design of a Tone Mapping Operator. The aim is to show that tone mapping is already performed in low level vision so that higher level visual tasks are facilitated. Furthermore, modeling these early vision properties leads to a fast and efficient TMO. We choose to work with the model described in [1], which takes into account the different low level processes occurring in the retina. We particularly focus on the foveal vision area and its output, called the Parvocellular channel, which brings detail and color information to the central nervous system. The aspects taken into account are detailed in the following:
• Local luminance and local contrast adaptation at the photoreceptor and ganglion cell levels. This biological property is directly linked to our dynamic range compression topic.
• The spatio-temporal filtering occurring at the Outer Plexiform Layer (OPL) level. This filtering whitens the input image frequency spectrum and enhances image details. Moreover, its temporal properties allow noise reduction and temporal stability.
• Color sampling: the input image is spatially sampled by sensors with different color sensitivities. Our TMO allows gray scale and color images to be processed in the same way and introduces color constancy properties.
The architecture of the proposed model follows the biological architecture and is depicted in figure 1.
Fig. 1. Simplified architecture of the proposed retina model. Color processing is an additional step, which requires preliminary input image multiplexing and output picture demultiplexing.
The input image can be either a raw gray image or a multiplexed color one. Color processing then appears as an additional step, which consists of a preliminary color multiplexing stage, followed by filtering and then a demultiplexing stage. The key point of the model is actually the two local adaptation steps, corresponding respectively to the photoreceptors and the ganglion cells, and the OPL filter placed between them. As discussed in [1], the photoreceptors' local adaptation is modulated by the OPL filter.

2.1 Local Adaptation
Photoreceptors are able to adjust their sensitivity with respect to the luminance of their spatio-temporal neighborhood. This is modeled by the Michaelis-Menten [1] relation, which is normalized for a luminance range of [0, Vmax] (Eq. 1). Vmax represents the maximum pixel value in the image (255 in the case of standard 8-bit images); it may vary greatly in the case of High Dynamic Range images.

C(p) = R(p) / (R(p) + R0(p)) · (Vmax + R0(p))    (1)

R0(p) = V0 · L(p) + Vmax · (1 − V0)    (2)

In this relation, the response C(p) of photoreceptor p depends on the current excitation R(p) and on a compression parameter R0(p) which is linearly linked to the local
Fig. 2. Photoreceptors local luminance adaptation. Left: output response with regard to the local R0 (p) value. Right: effect on a HDR image (from: www.openexr.org).
luminance L(p) (cf. Eq. 2) of the neighborhood of photoreceptor p. This local luminance L(p) is computed by applying a spatial low-pass filter to the input image. This low-pass filtering is actually achieved by the horizontal cells network, as presented later on. Moreover, in order to increase flexibility and make the system more accurate, we add to R0(p) the contribution of a static compression parameter V0 with a value range of [0;1]. The compression effect is reinforced when V0 tends to 1 and attenuated when it approaches 0. The photoreceptors' V0 value is set to 0.7 as a generic experimental value. Figure 2 shows the evolution of sensitivity with respect to R0(p) and illustrates the effect of such a compression on a back-lit picture. Sensitivity is reinforced for low values of R0(p) and is kept linear for high values. As a result, this model enhances contrast visibility in dark areas while maintaining it in bright areas.

2.2 OPL: Spatio-temporal Filtering and Contour Enhancement
The cellular interactions of the OPL layer can be modeled with a non-separable spatio-temporal filter [1] whose transfer function for a 1D signal is defined in Eq. (3), where fs and ft denote spatial and temporal frequency, respectively. Its transfer function is drawn in figure 3.a. This filter can be considered as a difference between two low-pass spatio-temporal filters which model the photoreceptor network ph and the horizontal cell network h of the retina. As discussed in [10, 1], the output of the horizontal cells network (Fh) is limited to very low spatial frequencies and can be interpreted as the local luminance L(p) required by the photoreceptors' local adaptation step (Eq. 2). Moreover, the temporal low-pass effect of the Fh filter allows the local luminance computation to be temporally smoothed. Finally, as a general rule, the global FOPL filter has a spatio-temporal high-pass effect at low frequencies, which results in a spectral whitening of the input, and a low-pass effect at high frequencies, which enables the removal of structural noise.
FOPL(fs, ft) = Fph(fs, ft) · (1 − Fh(fs, ft)),
with Fi(fs, ft) = 1 / (1 + βi + 2αi(1 − cos(2π fs)) + j 2π τi ft),  for i = ph or i = h.    (3)
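As a numerical illustration of Eq. (3), the snippet below (our own sketch, not the authors' implementation) evaluates the magnitude of FOPL over a grid of spatial and temporal frequencies. It uses the generic values quoted in the text for βph, αph and αh; the time constants τph and τh are arbitrary illustrative picks.

```python
import numpy as np

def low_pass(fs, ft, beta, alpha, tau):
    """F_i(fs, ft) of Eq. (3): one spatio-temporal low-pass cell network."""
    return 1.0 / (1.0 + beta + 2.0 * alpha * (1.0 - np.cos(2.0 * np.pi * fs))
                  + 1j * 2.0 * np.pi * tau * ft)

def opl_transfer(fs, ft,
                 beta_ph=0.7, alpha_ph=0.0, tau_ph=1.0,   # photoreceptor network
                 beta_h=0.0,  alpha_h=7.0,  tau_h=1.0):   # horizontal cell network
    """F_OPL = F_ph * (1 - F_h): band-pass along fs (whitening at low spatial
    frequencies, attenuation of high-frequency noise)."""
    f_ph = low_pass(fs, ft, beta_ph, alpha_ph, tau_ph)
    f_h = low_pass(fs, ft, beta_h, alpha_h, tau_h)
    return f_ph * (1.0 - f_h)

# Magnitude response on a grid of normalized spatial/temporal frequencies.
fs, ft = np.meshgrid(np.linspace(0.0, 0.5, 256), np.linspace(0.0, 0.5, 64))
gain = np.abs(opl_transfer(fs, ft))
```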
Fig. 3. OPL filter transfer function and illustration of its effect on an image and its spectrum (input image is the output of the photoreceptors previous stage)
βph is the gain of filter Fph. Setting βph to 0 cancels the luminance information, while a higher value allows luminance to be partially processed; βph is typically set to 0.7, which gives a good general effect. Parameters αi and τi stand for the space and time constants of the filters, respectively. Temporal noise is reduced when τph is increased and, similarly, the higher τh, the slower the temporal adaptation. The spatial filtering constant αph should remain low (close to 0) in order to preserve high spatial frequencies. αh allows the local luminance to be extracted; its generic value is set to a space constant of 7 pixels. One of the most relevant effects is spectral whitening, which compensates for the 1/f spectrum of natural images, as shown in figure 3.b-c; it occurs over a limited portion of the spatial frequency spectrum.

2.3 Retina Ganglion Cells Final Dynamic Compression Step
The ganglion cells of the Parvocellular channel receive the information coming from the OPL. They act as local contrast enhancers and are modeled by the Michaelis-Menten law in the same way as the photoreceptors, as discussed in [11], but with different parameters. Indeed, the local luminance value is related to a smaller area around each cell since the receptive fields are smaller. Also, as a general rule, the compression effect is more powerful at the ganglion cell level: the parameter V0 is then higher than that of the photoreceptors and is typically set to 0.9. Figure 4 shows the result of this local adaptation applied to the OPL output stage. This last filtering allows the final luminance compression of originally very dark areas.
Fig. 4. Effect of the ganglion cells local adaptation (input image is the output of the OPL previous stage)
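To make the two local adaptation steps concrete, here is a minimal single-frame sketch of Eqs. (1) and (2) (our own illustration, not the authors' code). A Gaussian blur stands in for the horizontal-cell, respectively ganglion-level, neighborhood used to estimate L(p); the neighborhood sizes are assumed values, and the OPL filtering that normally sits between the two stages is omitted.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_adaptation(img, v0, sigma, vmax=None):
    """Michaelis-Menten local adaptation (Eqs. 1-2).
    img   : 2-D luminance array (HDR values allowed)
    v0    : static compression strength in [0, 1]
    sigma : spatial extent of the neighborhood used for L(p) (assumed Gaussian)
    """
    img = np.asarray(img, dtype=float)
    vmax = img.max() if vmax is None else vmax
    L = gaussian_filter(img, sigma)                  # local luminance L(p)
    R0 = v0 * L + vmax * (1.0 - v0)                  # Eq. (2)
    return (img / (img + R0 + 1e-12)) * (vmax + R0)  # Eq. (1)

def retina_core(hdr_luminance):
    """Photoreceptor stage (V0 = 0.7, wide neighborhood) followed by the
    ganglion-cell stage (V0 = 0.9, smaller neighborhood); sigma values are
    illustrative assumptions."""
    photo = local_adaptation(hdr_luminance, v0=0.7, sigma=7.0)
    return local_adaptation(photo, v0=0.9, sigma=3.0)
```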
2.4 Color Processing
To deal with color, it is possible to take advantage of the photoreceptors' color sampling properties. As described in [10], since in the fovea the photoreceptors (mainly cones) sample the visual scene with three different sensitivities (Long, Medium and Short wavelengths: L, M and S cones), the spectrum of the color sampled image presents special properties: the luminance spectrum is centered on the low frequencies while the color information spectrum is located at higher frequencies. Modeling such properties consists in multiplexing the color image before processing it with the previously presented retina model and demultiplexing it afterwards, as presented in figure 1. As a consequence, designing a TMO with such an approach while preserving high spatial frequencies allows the luminance map to be tone mapped and the color information to be kept. Moreover, from a computational cost point of view, processing color only requires adding a color multiplexing/demultiplexing stage to the gray level processing core. To this end, we propose to use the color demultiplexing algorithm presented in [12], which supports different color sampling methods (Bayer, Diagonal,
Fig. 5. Color multiplexing and demultiplexing on the TMO “boundaries”
Random). Following the TMO scheme depicted in figure 1, figure 5 shows the color management steps of the algorithm. With Bayer color sampling, the color spectrum is translated to high spatial frequencies. Thus, since the OPL filter with a low value of αph keeps those frequencies, the color information remains unchanged. To sum up, the human retina naturally achieves color tone mapping, and its simplified model allows digital HDR images to be processed in the same way. Note that the current application deals with high dynamic range in terms of luminance; issues related to gamut limitation are not considered.
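The following sketch (again an editorial illustration) shows only the plumbing of this color path: the RGB input is multiplexed into a single-channel Bayer mosaic, which would then be fed to the gray-level retina core, and afterwards demultiplexed. Plain weighted-average interpolation is used as a stand-in for the LMMSE demosaicing of [12], so it does not reproduce the actual algorithm.

```python
import numpy as np
from scipy.signal import convolve2d

def bayer_multiplex(rgb):
    """Keep one color sample per pixel following an RGGB Bayer pattern,
    yielding the single-channel image fed to the gray-level retina core."""
    h, w, _ = rgb.shape
    mosaic = np.zeros((h, w))
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]   # R
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]   # G
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]   # G
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]   # B
    return mosaic

def bayer_demultiplex(mosaic):
    """Naive weighted-average demosaicing, used here only as a placeholder for
    the LMMSE demultiplexing method of [12]."""
    h, w = mosaic.shape
    masks = np.zeros((h, w, 3), dtype=bool)
    masks[0::2, 0::2, 0] = True               # R locations
    masks[0::2, 1::2, 1] = True               # G locations
    masks[1::2, 0::2, 1] = True
    masks[1::2, 1::2, 2] = True               # B locations
    kernel = np.array([[0.25, 0.5, 0.25],
                       [0.5,  1.0, 0.5],
                       [0.25, 0.5, 0.25]])
    out = np.zeros((h, w, 3))
    for c in range(3):
        known = np.where(masks[:, :, c], mosaic, 0.0)
        weight = masks[:, :, c].astype(float)
        num = convolve2d(known, kernel, mode='same', boundary='symm')
        den = convolve2d(weight, kernel, mode='same', boundary='symm')
        out[:, :, c] = num / np.maximum(den, 1e-12)
    return out

# Usage sketch: mosaic = bayer_multiplex(hdr_rgb); run the retina core on the
# mosaic; rgb_out = bayer_demultiplex(processed_mosaic).
```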
3 Design of a Retina Based TMO
As shown in the previous section, the retina model includes many properties required for low level image tone mapping. Indeed, its primary goal is to condition visual signals in order to make them compliant with the subsequent image analysis “operators”. In addition to its image processing properties, this model has a low computational cost: the gray level processing part requires only 16 operations per pixel. Color management depends on the chosen demultiplexing method; here it requires fewer than 200 linear operations per pixel. Consequently, we follow the biological model and use all the tools described in figure 1. The proposed TMO is rather similar to the one proposed by Meylan et al. [10], but it adds the contribution of the OPL filter, which achieves a more complete retina model by taking temporal aspects into account. The main advantage is that the intermediate OPL filter allows spectral whitening and temporal information management. For simplicity, we first describe its properties for static visual scenes and then deal with video sequence processing.

3.1 Static Content
Even if the human factors related to HDR imaging are not well known, some critical points have been reported in the literature. A brief overview of the main issues and the responses introduced by our method is presented in the following. Refer to figure 6 for image samples.
• The first point is related to artifact generation in the tone mapping process: since luminance is compressed, algorithms have a tendency to generate halos and contours around high-luminance-gradient boundaries of visual scenes [5, 13], e.g., sun and light boundaries. Our algorithm limits such artifacts since the low spatial frequency energy is lowered and the logarithmic compression of the photoreceptors is very local (typically 7 pixels wide).
• Color constancy is the second challenge. It is often impaired when color components are manipulated independently [2]. Owing to the multiplexing stage, our method naturally keeps color information in the high spatial frequencies and does not change it. Furthermore, object color information is preserved, independently of the illuminant color [1].
• Natural rendering is the general issue for image tone mapping. Even though detail extraction in the bright and dark areas of a visual scene is the final goal, it can change the image's initial balance and reduce its dynamic [2, 14, 6]. The global luminance ambiance should be preserved in order to maintain the initial « goal of
the image » and the areas at which the photographer wants the observers to look. Concerning that specific issue, our algorithm allows the luminance ambiance to be less compressed by lowering the value of the photoreceptors' V0 parameter. It is a matter of trade-off: either the luminance ambiance survives or all details are extracted. In addition, it is generally difficult to identify a unique TMO parameter setting which would allow many different images to be tone mapped and look “natural”. Each TMO algorithm generally has its own parameters which should be optimized for each image, resulting in a supervised tone mapping process. However, some algorithms already present good parameter stability and generate natural-looking images for a large set of pictures, e.g. Meylan et al. [10], Mantiuk's algorithm [15] and our contribution. Figure 6 gives examples of tone mapping results with the proposed algorithm and other results coming from [10], another human visual system based TMO [15], which includes visual cortex models but fewer retina properties, and Reinhard's TMO [2], which is more focused on the local adaptation properties of the retina. Note that generic default values were used as the single parameter set for each method. In these examples, color constancy is achieved by both [10] and our method since they manage color in a similar way. Meylan's method compresses luminance to a larger extent, which allows details to be more visible. In parallel, the proposed method better preserves the global luminance ambiance and produces minimal halo effects. As a compromise, dark details may remain hidden because the variance of R0(p) is reduced by an additional temporal
Fig. 6. Tone mapping results comparison between our approach and other existing methods (refer to [16] for higher resolution results). Our algorithm does not generate halos (refer to the two last examples), nevertheless, dark areas’ details can be hidden (refer to the first two examples). The other compared algorithms can better extract details in dark areas but halos or color distortion can appear. Also, locked parameters and accurate tone mapping are not supported by all the algorithms presented in this sample.
Fig. 7. Samples of HDR video sequence tone mapping results (refer to [16] for higher resolution results). In this test sequence, the sun generates optical halos on the camera; the most important changes are the sudden disappearance and reappearance of the sun depending on the camera position and on the objects in the visual scene. Our method allows details to be constantly extracted whatever luminance changes occur. Moreover, colors and image dynamics remain stable.
filtering. When comparing all the methods, we can see how different the results can be from one algorithm to another. Apart from constancy problems, the trade-off between naturalness and artifact limitation on the one hand and detail extraction on the other is the most difficult to reach. Also, our method, [10] and [15] appear in these samples as optimal TMOs with a constant parameter set. However, only future visual quality tests will allow rigorous comparisons to be made.
3.2 Video Content
As mentioned in [17], TMOs encounter critical problems when the visual content changes with time. Indeed, critical light changes can appear from frame to frame, and since current TMOs are optimized independently for each frame, the reconstructed video sequence can be degraded by successive uncorrelated tone mapping operations. This problem is even more difficult to overcome when dealing with global image TMOs, which generate a tone mapped image depending on the global image luminance. The problem is more limited in the case of local operators such as [10, 13] and our method. However, video content tone mapping is a very recent challenge and it is not yet possible to compare results due to the lack of real HDR video content. In figure 7 we propose some samples of an HDR video sequence shot to illustrate some properties of our method. Sudden light changes do not impact the global luminance of the tone mapped images. Our TMO includes temporal constants which act on the adaptation of the local luminance processing. In particular, the spatio-temporal low-pass filter of the horizontal cells in the OPL stage computes the local luminance with a memory effect. This results in smooth transitions when luminance changes suddenly. Also, the photoreceptor stage provides spatio-temporal noise removal at the beginning of the tone mapping process.
4 Conclusion
This paper proposes a Tone Mapping Operator (TMO) based on a human retina model. It imitates some of the foveal retina functionalities, including its luminance compression properties, and adds the contribution of temporal information processing. This “biological” algorithm is able to process a wide variety of images with natural rendering and has good potential for video sequence processing. Moreover,
the method presents good color constancy properties, and its flexible color management eases practical implementation. Future work will aim at identifying the human factors that characterize the most appropriate TMO and at adjusting its parameters accordingly, as well as at carrying out visual perception tests in order to further assess the subjective quality of tone mapping operators.
Acknowledgement
This work was supported by the FuturIm@ge project within the “Media and Network” French cluster. We also thank Grzegorz Krawczyk from MPI Informatik, who gave us access to these rare HDR video sequences.
References
[1] Hérault, J., Durette, B.: Modeling Visual Perception for Image Processing. In: Sandoval, F., Prieto, A.G., Cabestany, J., Graña, M., et al. (eds.) IWANN 2007. LNCS, vol. 4507, pp. 662–675. Springer, Heidelberg (2007)
[2] Reinhard, E., Ward, G., Debevec, P., Pattanaik, S.: High Dynamic Range Imaging: Acquisition, Display, and Image Based Lighting. Morgan Kaufmann, San Francisco (2005)
[3] Ward, G.: The RADIANCE Lighting Simulation and Rendering System. In: Computer Graphics Proceedings, Annual Conference Series (1994)
[4] Hoefflinger, B. (ed.): High-Dynamic-Range (HDR) Vision: Microelectronics, Image Processing, Computer Graphics. Springer, Heidelberg (2007)
[5] Akyüz, A.O., Reinhard, E.: Perceptual Evaluation of Tone Reproduction Operators Using the Cornsweet-Craik-O’Brien Illusion. ACM Transactions on Applied Perception 4(4), 1–29 (2008)
[6] Kuang, J., Yamaguchi, H., Liu, C., Johnson, G.M., Fairchild, M.D.: Evaluating HDR rendering algorithms. ACM Trans. Appl. Percept. 4(2), 9 (2007)
[7] Johnson, G.: Cares and concerns of CIE TC8-08: Spatial appearance modeling and HDR rendering. In: SPIE Proceedings Series, Image Quality and System Performance, pp. 148–156 (2005)
[8] Aydin, T.O., Mantiuk, R., Myszkowski, K., Seidel, H.-P.: Dynamic Range Independent Image Quality Assessment. ACM Transactions on Graphics (Proc. of SIGGRAPH 2008) 27(3) (to appear)
[9] Kuang, J., Yamaguchi, H., Liu, C., Johnson, G.M., Fairchild, M.D.: Evaluating HDR rendering algorithms. ACM Transactions on Applied Perception 4, Article 9 (2007)
[10] Meylan, L., Alleysson, D., Susstrunk, S.: A Model of Retinal Local Adaptation for the Tone Mapping of Color Filter Array Images. Journal of the Optical Society of America A (JOSA A) 24(9), 2807–2816 (2007)
[11] Smirnakis, S.M., Berry, M.J., Warland, D.K., Bialek, W., Meister, M.: Adaptation of Retinal Processing to Image Contrast and Spatial Scale. Nature 386, 69–73 (1997)
[12] Chaix de Lavarène, B., Alleysson, D., Hérault, J.: Practical Implementation of LMMSE Demosaicing Using Luminance and Chrominance Spaces. Computer Vision and Image Understanding 107(1), 3–13 (2007)
[13] Mantiuk, R., Daly, S., Kerofsky, L.: Display Adaptive Tone Mapping. ACM Transactions on Graphics (Proc. of SIGGRAPH 2008) 27(3) (to appear)
[14] Yoshida, A., Blanz, V., Myszkowski, K., Seidel, H.: Perceptual Evaluation of Tone Mapping Operators with Real-World Scenes. In: Rogowitz, B.E., Pappas, T.N., Daly, S.J. (eds.) Human Vision and Electronic Imaging X, IS&T/SPIE’s 17th Annual Symposium on Electronic Imaging, pp. 192–203. SPIE, San Jose (2005)
[15] Mantiuk, R., Myszkowski, K., Seidel, H.-P.: A Perceptual Framework for Contrast Processing of High Dynamic Range Images (revised and extended version). ACM Transactions on Applied Perception 3(3), 286–308 (2006)
[16] Proposed TMO results, http://benoit.alexandre.vision.googlepages.com/HDR
[17] Didyk, P., Mantiuk, R., Hein, M., Seidel, H.-P.: Enhancement of Bright Video Features for HDR Displays. In: Computer Graphics Forum (Proc. of EGSR 2008)
Colour Representation in Lateral Geniculate Nucleus and Natural Colour Distributions
Naokazu Goda 1,2, Kowa Koida 1,2, and Hidehiko Komatsu 1,2
1 National Institute for Physiological Sciences, Okazaki 444-8585, Japan
2 Sokendai, Okazaki 444-8585, Japan
{ngoda, koida, komatsu}@nips.ac.jp
Abstract. We investigated the representation of a wide range of colours in the lateral geniculate nucleus (LGN) of macaque monkeys. We took the approach of reconstructing a colour space from the responses of a population of neurons. We found that, in the derived colour space (‘LGN colour space’), the red and blue regions were compressed whereas the purple region was expanded, compared with a linear cone-opponent colour space. We found that the expanding/compressing pattern in the LGN colour space was related to the colour histogram derived from a natural image database. Quantitative analysis showed that the response functions of the population of neurons were nearly optimal according to the principle of ‘minimizing errors in the estimation of stimulus colour in the presence of response noise’. Our findings support the idea that the colour representation at the early neural processing stage is adapted for efficient coding of colour information in the natural environment.
Keywords: opponency, efficient coding, colour histogram.
1 Introduction
It is well established that signals encoded by the three classes of cones (L, M and S) are combined at the retina to generate cone-opponent signals, which are then conveyed to the primary visual cortex via the lateral geniculate nucleus (LGN). The retinal and LGN cone-opponent types of neurons are well modelled as linear transformations of the cone signals, L+M, L-M, and S-(L+M), with appropriate weights. In this framework, colours can be represented by a three-dimensional space consisting of three orthogonal axes, each of which exclusively represents the signal of the L+M, L-M or S-(L+M) mechanism [1, 2]. Although this colour space, which we call the ‘linear cone-opponent space’, is useful to characterise physiological and psychophysical data in terms of colour mechanisms at the level of the retina and the LGN, it is well recognized that this colour space is not straightforwardly related to the CIELUV, CIELAB, or Munsell spaces, which are perceptually uniform. This suggests that colour signals are nonlinearly transformed into the perceptual colour representation at some neural level. Where does this transformation occur? Although it is likely that this nonlinear transformation involves computations at the cortical levels [3, 4], it is also possible that it begins at the subcortical level.
Hanazawa et al. [5] investigated the response properties of LGN colour-selective neurons in a macaque monkey, which is a good animal model of human colour vision, using a wide range of colour stimuli including highly saturated colours. They found that more than half of the neurons have response nonlinearities. These nonlinearities, most of which were compressive nonlinearities operating in the high-contrast range of colours, as well as the variability of the response tuning among neurons, may explain complex, nonlinear characteristics observed at the perceptual level in humans. In the present study, we revisited the colour representation in the LGN of the macaque monkey to explore the response nonlinearity at this neural level and its functional role. Here we took the approach of reconstructing a colour space from the responses of a population of neurons [6, 7]. We found that the derived colour space (‘LGN colour space’) was nonlinearly related to the linear cone-opponent colour space. Interestingly, the nonlinearity can partly explain why there are five basic hues in the Munsell colour space. This implies that the nonlinearity of the cone-opponent neurons at the subcortical level is involved in the transformation of cone signals into the perceptual representation. Furthermore, we found that the nonlinear encoding at this neural level was nearly optimal with respect to natural colour distributions. These findings give important clues for understanding the relationships among colour perception, neural responses, and natural colour statistics.
2 Analysis of Colour Representation in LGN
We analysed the responses of 38 LGN colour-selective neurons to 24 different chromaticities, recorded from a macaque monkey performing a fixation task (see [5] for details). Figure 1a shows the chromaticity of the stimuli in a scaled and translated version of the MacLeod-Boynton (MB) chromaticity diagram [1], in which the horizontal axis exclusively represents L-M cone-opponent signals and the vertical axis represents S-(L+M) cone-opponent signals in a linear fashion. Here we call this space (and its linear transformations) the ‘linear cone-opponent space’. The stimulus was a stationary square of uniform colour covering the entire receptive field, presented on a dark grey background (2.5 cd/m2, CIE x=0.310, y=0.317). The luminance of the stimuli was held constant at 20 cd/m2 or 7 cd/m2. The stimuli were presented at least five times each for 500 ms in a pseudo-random order. The visual response was defined as the mean discharge rate during the stimulus presentation minus the baseline activity (200–0 ms before stimulus presentation). The neurons were classified into three types based on the peak colour tuning direction in the MB chromaticity diagram: 19 L-M type neurons, 14 M-L type neurons, and 5 S-(L+M) type neurons. There was no (L+M)-S type of neuron.
2.1 Reconstruction of Colour Space from Neural Data
We applied classical multi-dimensional scaling (MDS) to the neural data to derive a uniform colour space in which distances corresponded to the pooled response differences of the 38 LGN neurons. The pooled response difference between a pair of colours (the ‘neural distance’) was defined as the Euclidean distance between the 38-dimensional neural response vectors. The responses of each type of neurons were weighted by the inverse of the number of neurons, assuming that there are equal numbers of these different types
Fig. 1. (a) Colours used for the analysis plotted in the linear cone-opponent space (a scaled and translated MB chromaticity diagram). (b) Two-dimensional colour space reconstructed from the responses of LGN neurons using multi-dimensional scaling (LGN colour space). The two axes in the LGN colour space were aligned with those in the linear cone-opponent space.
of neurons in the LGN. For comparison with the linear cone-opponent space, the MDS-derived space was aligned to the linear cone-opponent space using a Procrustes transformation (translation, reflection, orthogonal rotation, and scaling). Since the relative scale of the two axes of the linear cone-opponent space is unknown, we adjusted this relative scale so as to produce the best correspondence between the transformed MDS-derived space and the linear cone-opponent space. Figure 1b shows the MDS-derived two-dimensional space, which accounted well for the neural distance data. The correlation coefficient between the distances in this space and the neural distances was quite high (0.998). Here we call the derived space the ‘LGN colour space’. In this LGN colour space, the saturated blue and red regions were compressed compared with the linear cone-opponent space. This clearly shows that cone signals are nonlinearly transformed at this neural level. Interestingly, the purple region was relatively expanded in the LGN colour space; purple was located around the midpoint between red and blue in the linear cone-opponent space, whereas it was located near the upper right corner in the LGN colour space. The distance between purple and white was comparable with that between blue and white in the LGN colour space. Thus, purple is one of the salient colours at the LGN level. This is an interesting characteristic because purple is one of the basic hues in the Munsell colour space (red, green, yellow, blue and purple, which are equally spaced in that space). This suggests that the nonlinear transformations into the perceptual colour representation may begin at this neural level.
2.2 Model of LGN Colour Space
How is the LGN colour space related to the linear cone-opponent space? We hypothesized that (1) the LGN colour space is composed of independent cone-opponent L-M and S-(L+M) axes, but (2) there is a simple compressive nonlinearity (saturation)
Fig. 2. (a) Population average of the neural responses along the two axes (open circles) and the modelled response functions (continuous lines). For the L-M axis, the responses of the M-L type of neurons were averaged with those of the L-M type of neurons after inverting the sign. (b) A two-dimensional space reconstructed by the model.
along each of the two axes. Next, we examined how well this simple model can account for the LGN colour space. Figure 2 shows the response functions along the two axes (a) and the two-dimensional colour space reconstructed by the model (b). The response function along each axis was modelled by a sigmoid function (hyperbolic tangent) [5]. The modelled two-dimensional colour space could replicate the pattern of the LGN colour space (fig. 2b), although there were still distortions that could not be explained by the model. The distances in the modelled LGN colour space were more highly correlated with the neural distances than those in the linear space. Thus the pattern in the LGN colour space is at least partly explained by a simple saturation of the responses along the two axes, although other, more complex nonlinearities may be involved as well.
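As a rough illustration of the procedure in Sects. 2.1 and 2.2, the sketch below reconstructs a two-dimensional space from a neural response matrix with classical MDS and aligns it to the linear cone-opponent space with a Procrustes transformation. The data layout (a NumPy response matrix plus a list of class labels) and the omission of the relative-scale adjustment of the two reference axes are our assumptions, not part of the original analysis.

```python
import numpy as np
from scipy.spatial import procrustes

def classical_mds(dist, n_dims=2):
    """Classical (Torgerson) MDS from a distance matrix."""
    n = dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (dist ** 2) @ J          # double-centred squared distances
    evals, evecs = np.linalg.eigh(B)
    order = np.argsort(evals)[::-1][:n_dims]
    return evecs[:, order] * np.sqrt(np.maximum(evals[order], 0.0))

def lgn_colour_space(responses, type_labels, ref_coords, n_dims=2):
    """responses: (n_neurons, n_colours) trial-averaged responses;
    type_labels: class of each neuron ('L-M', 'M-L', 'S-(L+M)');
    ref_coords: (n_colours, 2) stimulus coordinates in the linear
    cone-opponent space, used only for the alignment step."""
    # weight each neuron by the inverse of its class size, as in the paper
    counts = {t: type_labels.count(t) for t in set(type_labels)}
    w = np.array([1.0 / counts[t] for t in type_labels])[:, None]
    weighted = responses * w
    # pooled 'neural distance': Euclidean distance between response vectors
    diff = weighted[:, :, None] - weighted[:, None, :]
    neural_dist = np.sqrt((diff ** 2).sum(axis=0))
    coords = classical_mds(neural_dist, n_dims)
    # Procrustes alignment (translation, reflection, rotation, scaling);
    # note that scipy standardises both configurations before matching
    _, aligned, disparity = procrustes(ref_coords, coords)
    return aligned, neural_dist, disparity
```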
3 Relationships between LGN Colour Representation and Natural Colour Distributions
What role does this nonlinearity play? One hypothesis is that the nonlinear transformation is related to an optimal encoding of the colours in the natural environment. Computational studies have suggested that the colour mechanisms at the retina and LGN levels are adapted to transmit colour information in the natural environment through optic nerve fibres with limited capacity [8, 9]. The compressive nonlinearity of the response functions can also be interpreted computationally in terms of an optimal encoding of natural inputs [10, 11]. To test this hypothesis, we examined how the LGN colour representation is related to natural colour distributions.
3.1 Analysis of Natural Colour Distributions
Histograms of natural colours were evaluated using 327 natural scene images from the McGill calibrated colour image database [12], which is a collection of calibrated
Fig. 3. (a) Examples of natural images. (b) Histograms of natural colours evaluated using the natural image database. Colour density is shown on a pseudo-colour scale in the linear cone-opponent space (left) and in the LGN colour space (right). Marginal distributions are also shown along each axis.
natural images including plants, landscapes and so on (fig. 3a). The images were preprocessed taking light adaptation at the photoreceptor level into account. We applied von Kries scaling to the LMS cone excitations of each image [13]; this adjusts the gains of the LMS cone excitations independently so that the mean luminance and chromaticity over the entire scene are constant (unit luminance of illuminant C). Then we computed two-dimensional colour histograms in the linear cone-opponent space as well as in the LGN colour space using all images. Figure 3b shows the derived colour histograms plotted in the linear cone-opponent space and in the LGN colour space. Natural colours were highly concentrated around the white point in the linear cone-opponent space (fig. 3b left), whereas they were more flatly distributed in the LGN colour space (fig. 3b right). Importantly, the compressed regions in the LGN colour space (saturated blue and red) corresponded to the low-density regions in the linear cone-opponent space. This trend supports the hypothesis that the compressive nonlinearity is related to natural colour distributions.
3.2 Are the Response Functions Optimal for Natural Colour Distributions?
To examine more quantitatively whether the LGN colour representation is optimised for natural colour distributions, we asked whether the response characteristics of the LGN neurons are optimal in terms of optimisation theories. Assuming that the two axes in the LGN colour space are orthogonal, we analysed the response function along each axis based on two theories. One is the ‘Pleistochrome’ theory constructed by von der Twer and MacLeod [11]. According to this theory, the optimal response
function g(x), is derived from the probability density function of the input, p(x), by eq. 1. This function minimizes the error of estimation of input signal (e.g., chromaticity) from the output (response) in the presence of output noise. Another more popular theory is the ‘Infomax’ developed by Laughlin [10] and recently used by Long et al. [14] for investigating relationships between colour perception and natural colour statistics. According to this theory, the optimal response function is derived by the cumulative probability density function (eq. 2). In both cases, the optimal response functions are derived if the input distributions are given.
g(x) = ∫_{−∞}^{x} p(u)^{1/3} du    (1)

g(x) = ∫_{−∞}^{x} p(u) du    (2)
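As an illustration of eqs. (1) and (2), the sketch below derives both optimal response functions numerically from samples of a one-dimensional cone-opponent signal; the histogram-based density estimate and the normalisation of the outputs to [0, 1] are our own choices, not part of the original analysis.

```python
import numpy as np

def optimal_response_functions(samples, n_bins=256):
    """Derive the 'Pleistochrome' (eq. 1) and 'Infomax' (eq. 2) response
    functions from samples of a cone-opponent signal, e.g. the L-M values
    of natural-image pixels. Both outputs are normalised to [0, 1]."""
    p, edges = np.histogram(samples, bins=n_bins, density=True)
    dx = np.diff(edges)
    # eq. (1): cumulative integral of p(u)^(1/3) (minimum estimation error)
    pleistochrome = np.cumsum(p ** (1.0 / 3.0) * dx)
    pleistochrome /= pleistochrome[-1]
    # eq. (2): cumulative integral of p(u), i.e. the CDF (maximum information)
    infomax = np.cumsum(p * dx)
    infomax /= infomax[-1]
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, pleistochrome, infomax

# usage sketch: lm = pooled (L - M) values of natural-image pixels
# x, g_pleisto, g_infomax = optimal_response_functions(lm)
```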
Figure 4 shows the response functions optimal for the distributions of the L-M and S-(L+M) cone-opponent signals predicted from the two theories. The functions predicted from the ‘Pleistochrome’ theory fitted the neural response data well for both axes. They are also close to the response functions of the LGN colour space model (fig. 2a). On the other hand, the functions predicted from the ‘Infomax’ were steeper than the neural response functions. These results suggest that the colour mechanisms at the LGN level are nearly optimal for natural colour distributions, according to the principle of minimizing the error in estimating colour from the responses. Contrary to our results, von der Twer and MacLeod [11] did not find evidence that the chromatic response functions of the LGN neurons were optimal for natural colour statistics. The discrepancy between their results and ours may be due to differences in the image databases used for evaluating natural colour distributions. They used images collected by Ruderman et al. [15], which included only limited classes of natural scenes. The colours in these images are more heavily concentrated around the white
Fig. 4. (a) Comparisons between the response functions of the LGN neurons (open circles) and the optimal response functions predicted from the ‘Pleistochrome’ theory (continuous line, A) and the ‘Infomax’ theory (dotted line, B) along the L-M axis (left) and the S-(L+M) axis (right). Upper panels show the natural colour distributions along each axis. (b) A two-dimensional space composed of the optimal response functions predicted from the ‘Pleistochrome’ theory.
point than those in the database that we used. Thus, the density of highly saturated colours should be underestimated if that database was used. Note also that we computed colour distributions after applying von Kries adaptation. Without von Kries adaptation, the colour distribution becomes flatter and the optimal response functions for those distributions will become more linear than the neural response functions. This means that receptor gain control, as well as compressive nonlinearity, has an important role in efficient coding of colours.
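A minimal sketch of the preprocessing and histogramming used in Sect. 3.1 follows; the conversion of the images to LMS cone excitations is assumed to be done elsewhere, and reducing the illuminant-C normalisation to a mean-based gain adjustment, as well as the axis range of the histogram, are our simplifications rather than the authors' exact procedure.

```python
import numpy as np

def von_kries_adapt(lms, target_mean):
    """Scale the L, M, S channels independently so that the image mean
    matches a target (e.g. the LMS coordinates of illuminant C).
    lms: (H, W, 3) cone-excitation image."""
    gains = np.asarray(target_mean, dtype=float) / lms.reshape(-1, 3).mean(axis=0)
    return lms * gains

def cone_opponent_histogram(lms_images, bins=64, lim=1.0):
    """2-D histogram of (L-M, S-(L+M)) values pooled over a set of images."""
    edges = np.linspace(-lim, lim, bins + 1)
    hist = np.zeros((bins, bins))
    for lms in lms_images:
        L, M, S = (lms[..., c].ravel() for c in range(3))
        h, _, _ = np.histogram2d(L - M, S - (L + M), bins=[edges, edges])
        hist += h
    return hist / hist.sum(), edges
```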
4 Conclusion
We reconstructed a colour space from the responses of a population of LGN neurons. We found that the derived colour space (‘LGN colour space’) was considerably expanded/compressed compared with the linear cone-opponent colour space. Interestingly, the expanding/compressing pattern may partly explain the emergence of five basic hues in colour appearance. This suggests that the nonlinear transformations into the perceptual colour representation may begin at this early neural processing stage. Furthermore, we found that the expanding/compressing pattern in the LGN colour space can be explained by a simple compressive nonlinearity of the cone-opponent type of neurons. Such nonlinear response characteristics were nearly optimal for natural colour distributions according to the principle of ‘minimizing errors in the estimation of stimulus colour in the presence of response noise’. This supports the idea that the colour representation at the early neural processing stage is adapted for efficient coding of colour information in the natural environment.
References
1. MacLeod, D.I.A., Boynton, R.M.: Chromaticity Diagram Showing Cone Excitation by Stimuli of Equal Luminance. J. Opt. Soc. Am. 69, 1183–1186 (1978)
2. Derrington, A.M., Krauskopf, J., Lennie, P.: Chromatic Mechanisms in Lateral Geniculate Nucleus of Macaque. J. Physiol. 357, 241–265 (1984)
3. De Valois, R.L., Cottaris, N.P., Elfar, S.D., Mahon, L.E., Wilson, J.A.: Some Transformations of Color Information from Lateral Geniculate Nucleus to Striate Cortex. Proc. Natl. Acad. Sci. U.S.A. 97, 4997–5002 (2000)
4. Stoughton, C.M., Conway, B.R.: Neural Basis for Unique Hues. Curr. Biol. 18, R698–R699 (2008)
5. Hanazawa, A., Komatsu, H., Murakami, I.: Neural Selectivity for Hue and Saturation of Colour in the Primary Visual Cortex of the Monkey. Eur. J. Neurosci. 12, 1753–1763 (2000)
6. Valberg, A., Seim, T., Lee, B.B., Tryti, J.: Reconstruction of Equidistant Color Space from Responses of Visual Neurons of Macaque. J. Opt. Soc. Am. A 3, 1726–1734 (1986)
7. Young, R.A.: Principal-component Analysis of Macaque Lateral Geniculate Nucleus Chromatic Data. J. Opt. Soc. Am. A 3, 1735–1742 (1986)
8. Buchsbaum, G., Gottschalk, A.: Trichromacy, Opponent Colours Coding and Optimum Colour Information Transmission in the Retina. Proc. Roy. Soc. Lond. B 220, 89–113 (1983)
9. Atick, J.J., Li, Z., Redlich, A.N.: Understanding Retinal Color Coding from First Principles. Neural Comp. 4, 559–572 (1992)
10. Laughlin, S.B.: A Simple Coding Procedure Enhances a Neuron’s Information Capacity. Z. Naturforsch. 36(c), 910–912 (1981)
11. von der Twer, T., MacLeod, D.I.: Optimal Nonlinear Codes for the Perception of Natural Colours. Network 12, 395–407 (2001)
12. Olmos, A., Kingdom, F.F.A.: McGill Calibrated Colour Image Database, http://tabby.vision.mcgill.ca
13. Webster, M.A., Mollon, J.D.: Adaptation and the Color Statistics of Natural Images. Vision Res. 37, 3283–3298 (1997)
14. Long, F., Yang, Z., Purves, D.: Spectral Statistics in Natural Scenes Predict Hue, Saturation, and Brightness. Proc. Natl. Acad. Sci. U.S.A. 103, 6013–6018 (2006)
15. Ruderman, D.L., Cronin, T.W., Chiao, C.: Statistics of Cone Responses to Natural Images: Implications for Visual Coding. J. Opt. Soc. Am. A 15, 2036–2045 (1998)
Color Constancy Algorithm Selection Using CART
Simone Bianco, Gianluigi Ciocca, and Claudio Cusano
DISCo (Dipartimento di Informatica, Sistemistica e Comunicazione), Università degli Studi di Milano-Bicocca, Viale Sarca 336, 20126 Milano, Italy
{bianco,ciocca,cusano}@disco.unimib.it
Abstract. In this work, we investigate how illuminant estimation techniques can be improved by taking into account intrinsic, low-level properties of the images. We show how these properties can be used to drive, given a set of illuminant estimation algorithms, the selection of the best algorithm for a given image. The selection is made by a decision forest composed of several trees that vote for one of the illuminant estimation algorithms. The most voted algorithm is then applied to the input image. Experimental results on the widely used Ciurea and Funt dataset demonstrate the accuracy of our approach in comparison with other state-of-the-art algorithms.
1 Introduction
Computational color constancy aims to estimate the actual color in an acquired scene disregarding its illuminant. A scene can be modelled as a collection of Lambertian surfaces illuminated by a single, constant illuminant. The image values for a Lambertian surface located at the pixel with coordinates (x, y) can be seen as a function ρ(x, y), mainly dependent on three physical factors: the illuminant spectral power distribution I(λ), the surface spectral reflectance S(λ) and the sensor spectral sensitivities C(λ). Using this notation ρ(x, y) can be expressed as

ρ(x, y) = ∫ω I(λ) S(x, y, λ) C(λ) dλ,    (1)
where ω is the wavelength range of the visible light spectrum, and ρ and C(λ) are three-component vectors. Since the three sensor spectral sensitivities are usually respectively more sensitive to the low, medium and high wavelengths, the three-component vector of sensor responses ρ = (ρ1, ρ2, ρ3) is also referred to as the sensor or camera RGB = (R, G, B) triplet. The goal of color constancy is to estimate the color I of the scene illuminant, i.e. the projection of I(λ) on the sensor spectral sensitivities C(λ):

I = ∫ω I(λ) C(λ) dλ.    (2)
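The two integrals can be approximated by discrete sums over sampled spectra. The sketch below is purely illustrative: the wavelength grid and the flat/Gaussian placeholder spectra are made up for the example and do not correspond to any data used in the paper.

```python
import numpy as np

lam = np.arange(400, 701, 10)      # assumed visible range, 10 nm steps
dlam = 10.0

def camera_response(illum, refl, sens):
    """Discrete version of eq. (1): one response per sensor channel."""
    return (illum[:, None] * refl[:, None] * sens).sum(axis=0) * dlam

def illuminant_color(illum, sens):
    """Discrete version of eq. (2): projection of I(lambda) on the sensors."""
    return (illum[:, None] * sens).sum(axis=0) * dlam

# placeholder spectra: flat illuminant, grey surface, Gaussian-shaped sensors
I_spec = np.ones(lam.shape)
S_spec = np.full(lam.shape, 0.5)
C = np.stack([np.exp(-0.5 * ((lam - c) / 30.0) ** 2) for c in (600, 540, 450)], axis=1)
print(camera_response(I_spec, S_spec, C), illuminant_color(I_spec, C))
```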
Since the only information available is the set of sensor responses ρ across the image, color constancy is an under-determined problem [1], and thus further assumptions and/or knowledge are needed to solve it. Typically, some information about the camera being used is exploited, and/or assumptions about the statistical properties of the expected illuminants and surface reflectances. When these assumptions are not fulfilled, the illuminant estimation is expected to be very inaccurate and leads to an erroneous color correction. We investigated here whether it is possible to automatically derive the suitability of an illuminant estimation algorithm for a given image by analyzing a set of visual features. To validate this hypothesis we developed an illuminant estimation framework and evaluated its performance on a publicly available dataset of images. Given a set of illuminant estimation algorithms, the framework determines how the estimation of the illuminant of a given image should be computed. The prediction of the suitability of each algorithm is carried out by an image classifier based on an ensemble of decision trees. The trees have been trained to identify the best algorithm in the set considered, on the basis of the values of a set of low-level visual features. For the most part these are general purpose features taken from the pattern recognition and image analysis fields. Some features have been specifically designed for the illuminant estimation problem. Within this framework, an illuminant estimation strategy has been evaluated which selects, for each image, a single algorithm on the basis of the responses of the trees. Several computational color constancy algorithms exist in the literature which may be included in our framework, each based on different assumptions. Hordley [2] gives an excellent review of illuminant estimation algorithms. In this work, we chose five algorithms, but different algorithms can be used, or added to the set. Recently Van de Weijer et al. [3] have unified a variety of algorithms. These algorithms correspond to instantiations of the following equation:

(∫ |∇^n ρσ(x, y)|^p dx dy)^{1/p} = k I,    (3)

where n is the order of the derivative, p is the Minkowski norm, ρσ(x, y) = ρ(x, y) ⊗ Gσ(x, y) is the convolution of the image with a Gaussian filter Gσ(x, y) with scale parameter σ, and k is a constant chosen such that the illuminant color I has unit length. In this work, varying the three variables (n, p, σ) we have generated four algorithm instantiations that correspond to well-known and widely used color constancy algorithms (a code sketch of these instantiations is given after the list):
1. Gray World (GW) algorithm [4], which is based on the assumption that the average reflectance in a scene is achromatic. It can be generated setting (n, p, σ) = (0, 1, 0) in Equation 3.
2. White Point (WP) algorithm [5], also known as Maximum RGB, which is based on the assumption that the maximum reflectance in a scene is achromatic. It can be generated setting (n, p, σ) = (0, ∞, 0) in Equation 3.
3. Gray Edge (GE1) algorithm [3], which is based on the assumption that the p-th Minkowski norm of the first order derivative in a scene is achromatic. It can be generated setting (n, p, σ) = (1, p, σ) in Equation 3.
4. Second Order Gray Edge (GE2) algorithm [3], which is based on the assumption that the p−th Minkowski norm of the second order derivative in a scene is achromatic. It can be generated setting (n, p, σ) = (2, p, σ) in Equation 3. The fifth algorithm considered is the Do Nothing (DN) algorithm, which gives for every image the same estimation for the color of the illuminant, I = [1 1 1].
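The sketch below instantiates Equation 3 for the five algorithms listed above. It assumes a linear RGB image stored as a floating-point NumPy array; the second-order case is approximated by applying the gradient-magnitude operator twice, the default (p, σ) values for GE1 and GE2 are placeholders (the paper tunes them on training data), and saturated-pixel handling is omitted.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def estimate_illuminant(img, n=0, p=1.0, sigma=0.0):
    """Instantiation of Equation 3 on an (H, W, 3) linear RGB image.
    n: derivative order, p: Minkowski norm, sigma: Gaussian scale."""
    est = np.zeros(3)
    for c in range(3):
        ch = gaussian_filter(img[..., c], sigma) if sigma > 0 else img[..., c]
        for _ in range(n):                      # gradient magnitude, n times
            gy, gx = np.gradient(ch)
            ch = np.sqrt(gx ** 2 + gy ** 2)
        ch = np.abs(ch)
        est[c] = ch.max() if np.isinf(p) else (ch ** p).mean() ** (1.0 / p)
    norm = np.linalg.norm(est)
    return est / norm if norm > 0 else np.ones(3) / np.sqrt(3)

# the five algorithms as (n, p, sigma) settings; DN ignores the image
GW  = lambda img: estimate_illuminant(img, 0, 1.0, 0)
WP  = lambda img: estimate_illuminant(img, 0, np.inf, 0)
GE1 = lambda img, p=6.0, s=2.0: estimate_illuminant(img, 1, p, s)
GE2 = lambda img, p=6.0, s=2.0: estimate_illuminant(img, 2, p, s)
DN  = lambda img: np.ones(3) / np.sqrt(3)
```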
2 Classification and Regression Trees for Algorithm Selection
To perform algorithm selection we used decision trees built according to the Classification and Regression Trees (CART) methodology [6]. Briefly, the classifiers are produced by recursively partitioning the feature space, each split being formed by conditions related to the feature values. In tree terminology subsets are called nodes: the feature space is the root node, terminal subsets are terminal nodes, and so on. Once a tree has been built, a class is assigned to each of the terminal nodes, and when a new case is processed by the tree, its predicted class is the class associated with the terminal node into which the case finally moves on the basis of its feature values. The construction process is based on training sets of cases of known class. Tree classifiers provide a clear understanding of the conditions that drive the classification process. Moreover, they imply no distributional assumptions for the features. To improve generalization accuracy we decided to perform the classification by also using what is called a “perturbing and combining” method [7]. Methods of this kind, which generate in various ways multiple versions of a base classifier and use these to derive an aggregate classifier, have proved very successful in improving accuracy. We used bagging (bootstrap aggregating), since it is particularly effective when the classifiers are unstable, as trees are, that is, when small perturbations in the training sets, or in the construction process of the classifiers, may result in significant changes in the resulting prediction. With bagging the multiple versions of the base classifier are formed by making bootstrap replicates of the training set and using them as new training sets. The aggregation is made by majority vote. In any particular bootstrap replicate each element of the training set may appear multiple times, or not at all, since the replicates are obtained by resampling with replacement. Our classifier is trained on a training set of images labeled with the corresponding best algorithm. The straightforward application of the CART training process to this problem leads to poor results. This is due to the fact that some properties of the problem are not taken into account in the formulation: i) some algorithms generally perform better than others; ii) the performances of the algorithms are correlated, so that the consequences of a non-optimal choice may present a high variability. The first point is addressed by estimating, for each algorithm, the a-priori probability that it is the best algorithm. For the second point, each pair of algorithms is considered and the average difference in performance obtained when one of the two algorithms corresponds to the best choice is
computed. In other words, we computed the expected cost (i.e. degradation in performance) caused by the choice of an algorithm when another algorithm is the best choice. These costs are used during training to influence label assignment in such a way that the tree is optimized to minimize the expected misclassification cost instead of the number of errors.
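A rough sketch of the cost-sensitive bagged selection is shown below using scikit-learn's CART implementation; the feature matrix, integer labels of the best algorithm and the cost matrix are assumed to be given, and approximating the cost-sensitive label assignment by per-sample weights is our simplification of the training procedure described above, not the authors' exact method.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_decision_forest(X, y_best, cost, n_trees=50, seed=0):
    """X: (n_images, n_features); y_best: index (0..4) of the best algorithm
    per image; cost: (5, 5) misclassification-cost matrix.
    Each image is weighted by the average cost of mispredicting it."""
    rng = np.random.default_rng(seed)
    sample_w = cost[y_best].mean(axis=1)
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(y_best), size=len(y_best))   # bootstrap replicate
        tree = DecisionTreeClassifier(random_state=int(rng.integers(1 << 31)))
        tree.fit(X[idx], y_best[idx], sample_weight=sample_w[idx])
        forest.append(tree)
    return forest

def select_algorithm(forest, x):
    """Majority vote of the forest for a single feature vector."""
    votes = [int(t.predict(x.reshape(1, -1))[0]) for t in forest]
    return int(np.bincount(votes).argmax())
```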
3 Image Features
In the literature many features exist that can be used to describe the image content [8,9,10,11]. For our problem, we have limited the choice of features to the low-level category, since such features do not require prior knowledge of the image content and are able to describe some aspects of the image in a compact and efficient way.

The RGB color histogram is one of the most widely used image descriptors [12,13] and represents the color distribution of the image. It possesses several useful properties that make it a robust visual feature, such as compactness, invariance and robustness with respect to geometric transformations of the original image like rotation and scaling. We quantized the RGB color space by uniformly dividing each color axis into 3 intervals, for a total of 27 histogram bins.

The edge direction histogram can be used to determine the edge structures within an image and thus allows us to distinguish between different image classes. For example, strong edges can be found in buildings, roads, and other man-made structures. On the other hand, pictures of natural scenes usually do not show strong edges, and since the subject has no clear structure they do not show a specific pattern. Edges are computed by applying a derivative-of-Gaussian filter. The orientations are then thresholded and quantized into 18 bins, each corresponding to an angular interval of 10 degrees.

Wavelet statistics provide information at different levels of resolution about the textures and structures within the image. Wavelet multiresolution analysis is often used in content-based retrieval for similarity retrieval, target search, compression, texture analysis, biometrics, etc. [14,15,16]. For our purposes the wavelet statistics features are extracted from the luminance image using a three-iteration Daubechies wavelet decomposition, producing a total of ten bands. The energy, i.e. the amount of information within each band, expressed in terms of the mean and variance of the absolute values in each band, provides a concise description of the image’s content. This feature is thus composed of 20 components.

YCbCr color moments are used to describe the color distribution of an image. The color distribution of an image can, in fact, be considered a probability distribution and can therefore be characterized uniquely by its central moments alone, as can any probability distribution [17]. We computed the first two central moments, mean, and standard deviation of each color channel of the YCbCr color space for a total of 9 values. The choice of the YCbCr color
space allows the separation of the luminance component from the chrominance components in a simple way using a linear transformation.

The number of distinct colors is related to the color range of the image. Since several illuminant estimation algorithms are based on the Gray World assumption, the color range is an indication of whether this assumption holds true for the given image or not. To remove small variations in the color appearance, and thus limit the influence of noise in the computation of the feature, the RGB color channels are quantized by considering only the six most significant bits.

The percentage of clipped color components takes into account the extent of highly saturated color pixels, i.e. pixels having the maximum value that can be represented on the device. We discriminate between pixels with zero, one, two or all three color components clipped (8 different cases). The values are accumulated in a histogram normalized with respect to the total number of pixels in the image, so that the histogram represents a probability density distribution.

The cast index is aimed at identifying the presence of a relevant color cast within the image. This is important since a strong cast may be an indication that a particular illuminant is present. This feature is inspired by the work done in [18], where the cast is detected and classified into several classes according to its relevance. In this work, we do not consider the class of the cast but instead its distribution statistics (2 components), computed as in [18]. We modify the original formulation by changing the color space representation from CIELAB to YCbCr, which does not require knowledge of the white point of the scene.

Edge strength is an important feature since many illuminant estimation algorithms rely on statistics about the edges in the images. These estimations are reliable if computed on strong edges; otherwise they are less accurate. We compute a histogram of edge magnitudes in order to capture the strength of the edges. The edges are detected as in the case of the edge direction histogram and the magnitudes are quantized into 5 intervals.

All the features have been chosen uniquely for their ability to describe the content of an image. It is left to the classifier to choose which features, as well as which specific components within a feature, are more relevant to discriminate between the classes selected for the problem under analysis. Moreover, while all the features must be computed for the images in the training sets, only the features actually chosen and used by the classifier need to be computed for the images in the test sets and for new images to be processed. This approach is made possible by the use of CART trees as classifiers. Other classification methodologies (such as support vector machines and neural networks) would have required a complex feature selection (and normalization) step.
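As an illustration, the sketch below computes two of the simpler descriptors described above: the 27-bin RGB histogram and the YCbCr colour moments. The BT.601 conversion weights are an assumption on our part, and only the mean and standard deviation of each channel are shown (a subset of the 9 values reported above); the remaining features follow the same pattern.

```python
import numpy as np

def rgb_histogram_27(img):
    """Normalised 3x3x3 RGB histogram for a uint8 (H, W, 3) image."""
    q = np.clip(img.astype(int) // 86, 0, 2)       # 3 intervals per channel
    idx = q[..., 0] * 9 + q[..., 1] * 3 + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=27).astype(float)
    return hist / hist.sum()

def ycbcr_moments(img):
    """Mean and standard deviation of the Y, Cb, Cr channels (BT.601 weights)."""
    rgb = img.astype(float)
    y  = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    cb = 128 - 0.168736 * rgb[..., 0] - 0.331264 * rgb[..., 1] + 0.5 * rgb[..., 2]
    cr = 128 + 0.5 * rgb[..., 0] - 0.418688 * rgb[..., 1] - 0.081312 * rgb[..., 2]
    return np.array([f(c) for c in (y, cb, cr) for f in (np.mean, np.std)])
```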
4 Experimental Results
To evaluate our approach we measured its performance on a subset of the dataset of images presented by Ciurea and Funt [19] which is commonly used in the
evaluation of color constancy algorithms, as it is labeled with the ground truth illuminants. In this dataset 15 digital video clips were recorded (at 15 frames per second) in different settings such as indoor, outdoor, desert, markets, cityscape, etc., for a total of two hours of video. From each clip, a set of images was extracted, resulting in a dataset of more than 11000 images. A gray sphere appears in the bottom right corner of the images and was used to estimate the true color of the scene illuminant. Since the dataset sources were video clips, the extracted images show high correlation. To remove this correlation, only a subset of images should be used from each set. Taking into account that the image sets came from video clips, we applied a two-stage video-based analysis to select the images to be included in the final illuminant dataset. For more details about the dataset extraction see [20]. The final dataset so extracted consisted of 1135 images. These have been randomly subdivided into a training set of 340 images (about 30% of the dataset) and a test set of 795 images. The training set has been used to:
– find the best parameters of the illuminant estimation algorithms;
– estimate the a-priori probabilities related to the algorithms (i.e. the probability that an algorithm is the best one);
– estimate the matrix of misclassification costs.
A cross validation on the test set has been adopted to train and evaluate the decision forest and to assess the overall performance of the strategy.
4.1 Performance Evaluation
In order to evaluate the performance of the algorithms considered, we have to define an error measure. Since in estimating the scene illuminant it is more important to estimate its color than its overall intensity, the error measure has to be intensity-independent. As suggested by Hordley and Finlayson [21], we use as error measure the angle between the RGB triplet of the illuminant color (ρw) and the algorithm’s estimate of it (ρ̂w):

eANG = arccos( ρwᵀ ρ̂w / (‖ρw‖ ‖ρ̂w‖) ).    (4)

Hordley and Finlayson [21] showed that a good descriptor for the angular error distribution is the median error. To verify whether the performances of different algorithms are statistically different, a test which is able to compare the whole error distributions of different algorithms is needed. Since standard probability models cannot represent the underlying errors well, we need a test that does not make any a-priori assumptions about the underlying error distributions. To compare the performance of two color constancy algorithms, in addition to the median angular error we have used the Wilcoxon Sign Test (WST) [22].
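A direct implementation of the angular error of eq. (4) is given below; the function name and the clipping guard against rounding errors are ours.

```python
import numpy as np

def angular_error(rho_w, rho_est):
    """Angle (in degrees) between ground-truth and estimated illuminant RGBs."""
    cos = np.dot(rho_w, rho_est) / (np.linalg.norm(rho_w) * np.linalg.norm(rho_est))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
```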
4.2 Tuning of the Color Constancy Algorithms
Two of the color constancy algorithms considered (GE1 and GE2) needed a training phase to tune the parameters (n, p, σ). As a training set,
we used the same 300 images used in [20] in order to make the results easily comparable. Starting from the 340 training images, 40 have been discarded in order to balance the frequency of indoor and outdoor images. The performances of the algorithms are evaluated using the median angular error. Since the median error is a nonlinear statistic, we needed a multidimensional nonlinear optimization algorithm: our choice was to use a Pattern Search Method (PSM). PSMs are a class of direct search methods for nonlinear optimization [23,24]. PSMs are simple to implement and do not require any explicit estimate of derivatives. Furthermore, global convergence can be established under certain regularity assumptions on the function to minimize [25].
4.3 Training and Evaluation of the Classifier
We used cross validation to evaluate the performance of the classifier. The angular error of the illuminant estimation algorithms on the whole dataset is computed. This allows the estimation, for each algorithm, of the a-priori probability that it is the best choice, and of the matrix of misclassification costs. These values, estimated on the 340 images of the training set, are reported in Table 1 and in Table 2. At this point, a ten-fold cross validation is used to train and to evaluate the algorithm selection strategy. Table 3 shows the confusion matrix obtained on the test set. Each row corresponds to an algorithm and reports the distribution of the output of the classifier estimated on the subset of the test set for which that algorithm is the best choice.

Table 1. A-priori probabilities, corresponding to the five illuminant estimation algorithms, estimated on the images of the training set

Algorithm   Probability
DN          0.33
GW          0.34
WP          0.04
GE1         0.12
GE2         0.17

Table 2. Matrix of the estimated misclassification costs, estimated on the images of the training set

                           Predicted Algorithm
Best Algorithm     DN      GW      WP      GE1     GE2
DN                0.00   10.90    1.98    6.41    4.10
GW                8.43    0.00    5.67    4.13    6.28
WP                0.50   10.19    0.00    4.93    2.68
GE1               2.80    5.48    2.29    0.00    0.77
GE2               2.86    6.18    1.89    0.67    0.00
Table 3. Confusion matrix of the classifier used for algorithm selection, estimated on the images of the test set

                           Predicted Algorithm
Best Algorithm     DN      GW      WP      GE1     GE2
DN                0.85    0.06    0.01    0.04    0.04
GW                0.24    0.61    0.01    0.10    0.05
WP                0.37    0.00    0.11    0.37    0.15
GE1               0.39    0.29    0.04    0.17    0.11
GE2               0.45    0.15    0.02    0.13    0.26
Most of the images for which the DN algorithm is the best choice are correctly classified (85% accuracy). For the other algorithms the correct classification rate ranges from 61% (GW) to 11% (WP). However, considering the a-priori distribution of the five algorithms, the best algorithm is chosen 55% of the time and the second best algorithm 11% of the time, while the frequencies of selection of the third, the fourth, and the worst algorithm are 16%, 12%, and 5%, respectively. It should be considered that the classifier has not been trained with the aim of finding the best algorithm, but with the aim of finding the algorithm with the lowest expected error, taking into account the errors determined by misclassifications. This means that the performance of the classifier should not be evaluated in terms of classification accuracy, but in terms of the angular error of the selected algorithms. In fact, in more than 70% of the test cases the loss of performance due to the choice of a suboptimal algorithm is below one degree of angular error with respect to the best algorithm.

The average angular error of our algorithm selection strategy is about 4.76 degrees, while the median angular error is about 3.21 degrees. These results are compared in Table 4 with those obtained by the five single algorithms and by three combining algorithms: AVG, which simply averages the estimations given by the five algorithms considered [26]; LMS, which consists of a weighted average of the outputs of the individual algorithms [26]; and N2M, which averages the outputs of the three individual algorithms that gave the closest illuminant estimations, automatically excluding the two that gave the furthest estimations [27]. The performance of our approach is clearly superior to that of the single and combined algorithms, at least on the dataset we considered.

Table 4. Summary of the results obtained on the test set by our algorithm selection strategy (AS), compared with the performance of the five single algorithms and of the three combining algorithms. The best score for each column is reported in bold.

Algorithm   Median   Mean   WSTs
DN           6.05    8.07    0
GW           5.95    7.27    0
WP           5.48    7.45    2
GE1          4.47    5.84    4
GE2          4.65    6.23    3
AVG          4.66    5.99    3
N2M          4.79    5.82    3
LMS          4.12    5.29    7
AS           3.21    4.76    8
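As a side note, the N2M combining rule compared in Table 4 can be sketched as follows; ranking the estimates by their angular distance to the normalised mean estimate is our interpretation of "closest", which [27] may define differently.

```python
import numpy as np

def n2m_combine(estimates):
    """estimates: (5, 3) unit-norm illuminant estimates of the five algorithms.
    Average the three estimates closest to the consensus direction and drop
    the two outliers (our reading of the rule of [27])."""
    est = np.asarray(estimates, dtype=float)
    mean = est.mean(axis=0)
    mean /= np.linalg.norm(mean)
    cos = est @ mean / np.linalg.norm(est, axis=1)
    keep = np.argsort(np.arccos(np.clip(cos, -1.0, 1.0)))[:3]
    out = est[keep].mean(axis=0)
    return out / np.linalg.norm(out)
```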
5 Conclusions
In this work we have presented a framework for automatic illuminant estimation based on the selection of simple algorithms. To improve illuminant estimation accuracy, a decision forest is trained to identify the best algorithm within a set for a given image. The choice of the best algorithm is based on a set of low-level features representing the pictorial content of the images. Experimental results, performed on a subset of uncorrelated images extracted from the widely used Funt and Ciurea dataset, demonstrate that our approach is able to improve the results compared with some state-of-the-art algorithms. From our experiments the approach proposed reduced the median angular error by 22.1% with respect to the best illuminant estimation algorithm considered (LMS).
References
1. Funt, B., Barnard, K., Martin, L.: Is machine colour constancy good enough? In: Burkhardt, H.-J., Neumann, B. (eds.) ECCV 1998. LNCS, vol. 1406, pp. 445–459. Springer, Heidelberg (1998)
2. Hordley, S.D.: Scene illuminant estimation: Past, present, and future. Color Research & Application 31(4), 303–314 (2006)
3. van de Weijer, J., Gevers, T., Gijsenij, A.: Edge-based Color Constancy. IEEE Transactions on Image Processing 16(9), 2207–2214 (2007)
4. Buchsbaum, G.: A spatial processor model for object color perception. Journal of the Franklin Institute 310, 1–26 (1980)
5. Cardei, V., Funt, B., Barnard, K.: White point estimation for uncalibrated images. In: Proc. IS&T/SID 7th Color Imaging Conference, pp. 97–100 (1999)
6. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Wadsworth and Brooks/Cole (1984)
7. Breiman, L.: Bagging predictors. Machine Learning 24, 123–140 (1996)
8. Schettini, R., Ciocca, G., Zuffi, S.: Indexing and retrieval in color image databases. Color Imaging Science: Exploiting Digital Media, 183–211 (2002)
9. Antani, S., Kasturi, R., Jain, R.: Survey on the use of pattern recognition methods for abstraction, indexing and retrieval of images and video. Pattern Recognition 35, 945–965 (2002)
10. Eakins, J.P.: Towards intelligent image retrieval. Pattern Recognition 35, 3–14 (2002)
11. Sikora, T.: The MPEG-7 visual standard for content description - An overview. IEEE Transactions on Circuits and Systems for Video Technology 11(6), 696–702 (2001)
12. Swain, M.J., Ballard, D.H.: Color indexing. International Journal of Computer Vision 7(1), 11–32 (1991)
13. Gong, Y., Chuan, C.H., Xiaoyi, G.: Image indexing and retrieval using color histograms. Multimedia Tools and Applications 2, 133–156 (1996)
14. Idris, F., Panchanathan, S.: Storage and retrieval of compressed images using wavelet vector quantization. Journal of Visual Languages and Computing 8, 289–301 (1997)
15. Scheunders, P., Liven, S., Van de Wouwer, G., Vautrot, P., Van Dyck, D.: Wavelet-based texture analysis. International Journal of Computer Science and Information Management 1(2), 22–34 (1997)
16. Mojsilovic, A., Rackov, D., Popovic, M.: On the selection of an optimal wavelet basis for texture characterization. IEEE Transactions on Image Processing 9(12), 2043–2050 (2000)
17. Stricker, M.A., Orengo, M.: Similarity of color images. In: Proc. SPIE Storage and Retrieval for Image and Video Databases III Conference, pp. 381–392 (1995)
18. Gasparini, F., Schettini, R.: Color balancing of digital photos using simple image statistics. Pattern Recognition 37(6), 1201–1217 (2004)
19. Ciurea, F., Funt, B.: A Large Image Database for Color Constancy Research. In: Proc. IS&T/SID 11th Color Imaging Conference, pp. 160–164 (2003)
20. Bianco, S., Ciocca, G., Cusano, C., Schettini, R.: Improving Color Constancy Using Indoor-Outdoor Image Classification. IEEE Transactions on Image Processing 17(12), 2381–2392 (2008)
21. Hordley, S.D., Finlayson, G.D.: Re-evaluating Color Constancy Algorithms. In: Proc. 17th International Conference on Pattern Recognition, pp. 76–79 (2004)
22. Wilcoxon, F.: Individual comparisons by ranking methods. Biometrics 1, 80–83 (1945)
23. Lewis, R.M., Torczon, V.: Pattern search algorithms for bound constrained minimization. SIAM Journal on Optimization 9, 1082–1099 (1999)
24. Lewis, R.M., Torczon, V.: Pattern search methods for linearly constrained minimization. SIAM Journal on Optimization 10, 917–941 (2000)
25. Lewis, R.M., Torczon, V.: On the convergence of pattern search algorithms. SIAM Journal on Optimization 7, 1–25 (1997)
26. Cardei, V.C., Funt, B.: Committee-Based Colour Constancy. In: IS&T/SID Seventh Color Imaging Conference: Color Science, Systems and Applications, pp. 311–131 (1999)
27. Bianco, S., Gasparini, F., Schettini, R.: A Consensus Based Framework For Illuminant Chromaticity Estimation. Journal of Electronic Imaging 17(02), 023013 (2007)
Illuminant Change Estimation via Minimization of Color Histogram Divergence
Michela Lecca and Stefano Messelodi
Fondazione Bruno Kessler, IRST - 38100 Povo, Trento - Italy
{lecca, messelod}@fbk.eu
Abstract. We present a new method for computing the change of light possibly occurring between two pictures of the same scene. We approximate the illuminant variation with the von Kries diagonal transform and estimate it by minimizing a functional that measures the divergence between the image color histograms. Our approach shows good performances in terms of accuracy of the illuminant change estimation and of robustness to pixel saturation and Gaussian noise. Moreover we illustrate how the method can be applied to solve the problem of illuminant invariant image recognition.
1 Light and Color
Color descriptors are considered among the most important features in content-based image retrieval and indexing [15]. Colors are in fact robust to noise, rescaling, rotation and changes in image resolution. The main drawback in the use of color for object and image retrieval is the strict dependency of the color on the light in the scene. Color variations can be produced in different ways, for instance by changing the number, the position or the spectrum of the light sources. Moreover, the color of a picture often depends on the characteristics of the device used to capture the scene. The development of a device- and illuminant-invariant image representation is an old, but still unsolved, attractive problem in Computer Vision [15]. In this paper, we propose a method for estimating the variation of illuminant between images of a scene taken under different light conditions. More precisely, we restrict our attention to the photometric changes induced by different kinds of lamps or by variations in the voltage of the lamps illuminating a scene. We assume that the illumination varies uniformly over the whole image and we adopt the von Kries diagonal model, in which the responses of a camera sensor under two different illuminants are related by a diagonal linear transformation. This model has been proved to be a good approximation for illuminant changes [6], [7], especially in the case of narrow-band sensory systems [4], and it is employed in many color enhancement techniques, e.g. [2], [5], [3], [16]. Our technique estimates the von Kries transform between an image and a re-illuminated version of it by a least-squares method that minimizes a dissimilarity measure, named divergence, between their color histograms. The accuracy of the estimate obtained by our method has been measured on synthetic and real-world datasets,
showing good performances even in the presence of saturated pixels, Gaussian noise and variations of the color quantization. Moreover, we describe how our method can be applied to the illuminant invariant image retrieval task and we compare it with other image retrieval approaches. Synopsis - Section 2 describes our approach, its performance is discussed in Section 3, and Section 4 shows how it can be applied to illuminant invariant image recognition. Section 5 illustrates our future plans.
2 Diagonal Transform Computation
Let (R0, G0, B0) be the response of a camera to a given illuminant and let (R, G, B) be the response of the same camera to an unknown illuminant. The von Kries diagonal model approximates the change of illuminant mapping (R, G, B) onto (R0, G0, B0) by a diagonal transformation K that rescales each channel independently, i.e. (R, G, B) →K (α0 R0, α1 G0, α2 B0), where α0, α1, α2 are non-zero positive real numbers, which we refer to as the von Kries parameters. In our method, the color of an image is described by the distributions of the values of the three channels R, G, B. Each distribution is represented by a histogram of N bins, where N is in the range {1, . . . , 256}. Hence, the color feature of an image I is represented by a triplet H := (H^0, H^1, H^2) of histograms. We refer to H as the color histograms, whereas we name its components channel histograms. Let I0 and I1 be two images, where I1 is possibly a rescaled, rotated and differently illuminated version of I0. Let H0 and H1 be the color histograms of I0 and I1 respectively, and let H0^i and H1^i indicate the ith component of H0 and H1 respectively. Hereafter, we assume that each channel histogram Hj^i is normalized so that Σ_{x=1}^{N} H0^i(x) = Σ_{x=1}^{N} H1^i(x). The channel histograms of two images which differ by illumination are stretched with respect to each other according to the diagonal model; hence, for each i we have

Σ_{k=1}^{x} H1^i(k) = Σ_{k=1}^{αi x} H0^i(k),    (1)
where, as the data is discrete, the value αi·x is cast to an integer in the range [1, 256]. Our estimate of αi consists of two phases: firstly, for each x in [1, 256] we compute the point y in [1, 256] such that

Σ_{k=1}^{x} H0^i(k) = Σ_{k=1}^{y} H1^i(k).    (2)
Then we obtain the coefficient αi as the slope of the best line fitting the pairs (x, y). The best line is defined by means of a least squares method. The computation of the pairs (x, y) satisfying (2) is done using the following algorithm consisting of two steps:
Initialization: Let R0 and R1 indicate the left- and right-hand side of (2), respectively. Firstly, we compute the minimum values of x and y such that R0 and R1 are greater than zero. Then we set M := min(R0, R1). Let L be a list of points and let W be a list of weights, i.e. real numbers, with L and W initially empty.
Iterations: Iteratively,
1. we push the pair (x, y) in L and M in W;
2. in order to satisfy equation (2), if M is equal to R0 (R1 resp.), i.e. R0 < R1 (R0 > R1 resp.), we increment x (y resp.) by one until M = R1 (M = R0 resp.). We update M and then R0 and R1 by

R0 := R0 − M;   R1 := R1 − M.    (3)

Note that, except for the initialization step, M could be null: in this case, both x and y are incremented until M becomes strictly positive and then R0 and R1 are updated as in (3);
3. we repeat steps 1 and 2 until x or y is 255.
We estimate the value of αi by minimizing with respect to α the following functional, that we call divergence:

d_α(H0^i, H1^i) := Σ_k M_k d((x_k, y_k), A)² = Σ_k M_k (α·x_k − y_k)² / (α² + 1).    (4)
Here M_k and (x_k, y_k) indicate the k-th items of the lists W and L respectively, while d((x_k, y_k), A) is the Euclidean distance between the point (x_k, y_k) and the line A: y = αx. The weights M_k are introduced to make the estimate robust to color quantization and to possible noise affecting the images. We observe that: (i) d_α(H0^i, H1^i) = 0 ⇔ H0^i(αp) = H1^i(p) for each p in {1, . . . , N}; and (ii) d_α(H0^i, H1^i) = d_{1/α}(H1^i, H0^i). From these properties it follows that d_α is a measure of dissimilarity (divergence) between the channel histograms stretched one onto the other. In particular, if d_α is zero, then the two histograms are related by a stretching of the x axis. Note that, since the values of R, G, B are in [1, 256], the values of α0 R0, α1 G0, α2 B0 possibly greater than 256 are truncated to 256 (saturated pixels). Therefore, to make the estimate as robust as possible with respect to pixel saturation, the N-th bins of the histograms H0^i and H1^i are not considered when determining the von Kries transform. This explains why, in step 3 of the iterative phase, the algorithm stops when x or y reaches 255. However, performance decreases as the number of saturated pixels increases (see Section 3). Figure 1 shows an example where the same scene has been acquired under two illuminants (a) and (b), while (c) is obtained by remapping (b) onto (a) by our estimated von Kries transform between (a) and (b). As can be seen, (a) and (c) look very similar. We note that, when no changes of size or in-plane orientation occur, the von Kries map relating two images I and I′ can be estimated by finding, for each color channel, the best line fitting the pairs of sensory responses (p_i, p′_i) at the
Fig. 1. (a) a picture and (b) a re-illuminated version of (a); (c) is obtained by remapping (b) onto (a) by the von Kries transform estimated. (a) and (c) appear highly similar.
i-th pixels of I and I′ respectively. Our approach, by contrast, applies a least-squares method in the space of the color histograms, and therefore makes the estimate of the von Kries coefficients robust to image rescaling and/or rotation.
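To make the two-phase estimate concrete, the following Python sketch (our own illustration, not the authors' code; the function names match_pairs and estimate_alpha are hypothetical) pairs the cumulative channel histograms as in Eq. (2) and then minimizes the divergence of Eq. (4) over a grid of candidate slopes, as a simple stand-in for the least-squares fit described above.

```python
import numpy as np

def channel_histogram(channel, n_bins=256):
    # Histogram of one color channel; values are assumed to lie in [1, 256].
    hist, _ = np.histogram(channel, bins=n_bins, range=(1, 256))
    return hist.astype(float)

def match_pairs(h0, h1):
    # Pair bin positions (x, y) whose cumulative sums match (Eq. (2)),
    # weighting each pair by the matched mass M (the W list of the paper).
    r0, r1 = h0.copy(), h1.copy()
    x = int(np.argmax(r0 > 0))
    y = int(np.argmax(r1 > 0))
    n = len(h0)
    pairs, weights = [], []
    while x < n - 1 and y < n - 1:          # the last (saturated) bin is skipped
        m = min(r0[x], r1[y])
        if m > 0:
            pairs.append((x + 1, y + 1))    # 1-based bin positions
            weights.append(m)
            r0[x] -= m
            r1[y] -= m
        if r0[x] == 0:
            x += 1
        if r1[y] == 0:
            y += 1
    return np.array(pairs, dtype=float), np.array(weights, dtype=float)

def estimate_alpha(h0, h1):
    # Estimate a von Kries coefficient by minimizing the divergence of Eq. (4)
    # over a grid of candidate slopes (a simple stand-in for the least-squares fit).
    pairs, w = match_pairs(h0, h1)
    x, y = pairs[:, 0], pairs[:, 1]
    candidates = np.linspace(0.1, 3.0, 2901)
    divergences = [np.sum(w * (a * x - y) ** 2 / (a * a + 1)) for a in candidates]
    return float(candidates[int(np.argmin(divergences))])
```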
3 Accuracy of Our Estimate
The accuracy of our estimate of the von Kries diagonal transform has been tested on different synthetic and real-world databases. For brevity, in this paper we report the results obtained with three public databases (TESTS51, ALOI and ECCV98). More details are available in [11]. Let I0 be an image of a scene under a reference illuminant, and let I be an image of the same scene taken under an unknown illuminant. In the following we refer to I0 as the reference image, while we refer to I as the test image. The accuracy of the estimate of the von Kries transform has been evaluated as

A = 1 − L1(I, K^est(I0)),    (5)
where L1(I, K^est(I0)) is the L1 distance computed in RGB space between I and the transform K^est(I0) of I0, and K^est indicates the estimated von Kries transform. This distance has been normalized to range in [0, 1]. Therefore, the closer A is to 1, the better the estimate of the von Kries transform is. To evaluate the goodness of our image correction, for each pair (I0, I), we compared the accuracy measure (5) with the value A0 = 1 − L1(I, I0). In these experiments we do not consider changes of image size or orientation. The difference between the accuracy measure (5) of the best fit applied to the channel images (mentioned at the end of Section 2) and that obtained by means of our approach is negligible: about 3·10^−4 in the worst case (ECCV98). Our algorithm has linear complexity with respect to the number of image pixels and to the color quantization N. Therefore, it is particularly efficient, also in comparison to other methods, like for instance [3]. The time for the estimation of the von Kries coefficients for a pair of images of size 150 × 200 is less than 40 ms on a standard Pentium4 CPU at 2.8 GHz. Tests on TESTS51 - The dataset TESTS51 has been built starting from the public dataset of Ponce and others [14] (http://www-cvr.ai.uiuc.edu/).
This database consists of a set of images of 8 different objects and of a set of 51 test-pictures in which the objects appear under different conditions (occluded, rescaled, rotated, differently illuminated, . . . ). The 51 test-pictures have been taken as references, while the test images have been obtained by rescaling the color channels of the reference images by 20 diagonal linear functions of the form Fβw(R, G, B) = βw(R, G, B), with βw = 0.2 + 0.2w and w = 0, . . . , 19. For each test image I, we estimated the 20 von Kries transforms K_w^est mapping the corresponding reference onto I. Figure 2(left) shows the mean value of the accuracy measure (5) versus the parameter βw, for different color quantizations. The mean value of A0 is 0.22, while the mean value of A is 0.9999 for N = 256. The precision of our estimates β_w^est of βw, w = 0, . . . , 19, has been measured using the error E_w = 1 − β_w^est/βw. The closer E_w is to zero, the better is the accuracy of the determination of the von Kries transform. A strictly negative (positive, resp.) value of E_w indicates that the estimate is greater (smaller, resp.) than the real parameter. Figure 2(right) shows the mean value Ē_w of E_w averaged over the test images, by varying βw and the color quantization. The best parameter estimates and the best accuracy have been obtained for the finest color quantization.
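A minimal sketch of the two evaluation quantities used in this section, assuming images stored as arrays with values in [0, 255] and the L1 distance normalized by its maximum value (one plausible reading of the normalization described above); the function names are our own.

```python
import numpy as np

def accuracy(test_img, corrected_ref):
    # Eq. (5): A = 1 - normalized L1 distance in RGB space (here, the mean
    # absolute difference divided by 255 as a plausible normalization).
    l1 = np.abs(test_img.astype(float) - corrected_ref.astype(float)).mean() / 255.0
    return 1.0 - l1

def relative_error(beta_est, beta_true):
    # E_w = 1 - beta_est / beta_true; negative values mean the estimate is
    # greater than the true parameter.
    return 1.0 - beta_est / beta_true
```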
Fig. 2. TESTS51: (Left) Mean value of the accuracy measure (5) by varying the color quantization. (Right) Mean error Ē_w by varying the parameter βw and the color quantization.

Tests on ALOI - ALOI [9] (http://staff.science.uva.nl/~aloi/) is a collection of 110,250 images of 1,000 objects acquired under different conditions. In ALOI, each frontal object view has been shot under 12 different light conditions, produced by varying the color temperature of five lamps illuminating the scene. More precisely, the lamp voltage was modified to be Vj = j × 0.047 Volts, with j ∈ J = {110, 120, 130, 140, 150, 160, 170, 180, 190, 230, 250}. The object images captured under the illuminant with voltage V110 have been taken as references, while the other object images have been used for testing. For each reference O110, we estimate the von Kries transform K_j^est mapping O110 onto the test image Oj taken under the illuminant with voltage Vj, j ∈ J − {110}.
Fig. 3. ALOI: (Left) Mean accuracy (5) versus the illuminants for different color quantizations. (Right) Estimates of the von Kries parameters and their standard deviation bars for N = 256.
Figure 3(left) shows the mean accuracy (5) for the different lamp voltages and variations of color quantization. On average, A0 = 0.9913, while A = 0.9961 for N = 256. We note that for j = 140 the accuracy is lower than for the other lamp voltages. This is because the voltage V140 determines a large increment of the light intensity and therefore produces a large number of saturated pixels. In principle, the transform K mapping the object image O110 onto Oj should have the same parameters as the transform K′ mapping a different object image O′110 onto O′j, because the illuminant change is the same. In practice, since the von Kries model is only an approximation of the illuminant variation phenomenon, the parameters of K and K′ differ. Hence we measure the robustness of the determination of the coefficients αi, i = 0, 1, 2, by analyzing the standard deviation of their estimates. Figure 3(right) reports the averages and standard deviations of the von Kries parameters estimated with a color quantization of 256. Deviations increase when the image brightness is increased, i.e. when the number of saturated pixels becomes larger. Tests on ECCV98 - Here we consider a subset of the database [8] consisting of the images of 11 objects captured under 5 different illuminants (halogen, mb-5000, mb-5000+3202, syl-cwf, ph-ulm). We refer to this subset as ECCV98. We took the object images captured under the illuminant halogen as references. This data is available at http://www.cs.sfu.ca. The mean accuracy for each illuminant and for different color quantizations is reported in Table 1. On average, A0 is 0.9311, while A = 0.9734. Table 2 shows the mean values of the von Kries coefficients and their standard deviations, as for ALOI.
4 Application to Image Recognition
Let us consider a set of known images (references) and let I be an unknown image (query). The illuminant invariant image recognition consists of finding the reference I0 that, although re-illuminated, is the most similar to the query.
Table 1. ECCV98: Mean accuracy (5) by varying the illuminant and the color quantization

Illuminant       256      128      64       32       16
mb-5000          0.9767   0.9766   0.9763   0.9755   0.9723
mb-5000+3202     0.9719   0.9716   0.9702   0.9638   0.9308
ph-ulm           0.9733   0.9733   0.9732   0.9731   0.9726
syl-cwf          0.9718   0.9717   0.9717   0.9716   0.9710
Table 2. ECCV98: Values of the von Kries parameters and their errors for N = 256 bins

Illuminant       α0 ± Δα0           α1 ± Δα1           α2 ± Δα2
mb-5000          0.4601 ± 0.1206    0.8255 ± 0.1949    1.6004 ± 0.3787
mb-5000+3202     0.1915 ± 0.0406    0.5287 ± 0.0835    2.0121 ± 0.3177
ph-ulm           0.7405 ± 0.1059    1.0792 ± 0.1675    1.1068 ± 0.1892
syl-cwf          0.8596 ± 0.1357    0.9581 ± 0.1291    1.6924 ± 0.2772
Our solution is outlined as follows: we compute the von Kries transforms mapping each reference onto the query and we associate a dissimilarity score to each of these transforms. The solution I0 is the reference image whose von Kries transform T(I0) has the minimum score with respect to I. More precisely, let H be the color histogram of I. For each reference Ir of the database D, with color histogram Hr, (i) we estimate the parameters α0, α1 and α2 of the von Kries transform K mapping Ir onto I; (ii) for each i we compute the divergence d_{αi}(H^i, Hr^i) defined in (4) and the dissimilarity score

δ = Σ_i d_{αi}(H^i, Hr^i).    (6)
Thus, the solution of the image recognition problem is the image I0 of D such that the score (6) is minimized. Due to the dependency of the divergence (4) on the values αi (i = 0, 1, 2), δ does not satisfy the triangle inequality, and thus it is not a distance. Nevertheless, it is a query-sensitive dissimilarity measure, in the sense that it depends on the query [1]. The use of the score (6) instead of an Lp metric (p ≥ 1) between the color histograms is justified by the greater robustness of (6) to color quantization. We say that a query I is correctly recognized if the reference image Ir of D minimizing (6) is a re-illuminated version of I. The performance of our approach has been evaluated using a recognition rate, defined as the ratio between the number of test images correctly recognized and the total number of test images. In our experiments, we considered the reference and the test sets of TESTS51 and ALOI defined in Section 3, while we excluded the test set ECCV98 because
it contains only a few images and hence is inadequate for testing retrieval performance. By comparing the reference and the test images using the score (6), without estimating the von Kries map or enhancing the color, we obtained the following mean recognition rates: 0.20 for TESTS51 and 0.77 for ALOI. The recognition rates obtained by means of our approach on TESTS51 and ALOI are shown in Figure 4.
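The retrieval step described above can be sketched as follows, assuming the hypothetical match_pairs and estimate_alpha helpers outlined at the end of Section 2 are available; the database layout and the function names are illustrative only.

```python
import numpy as np

def divergence(h0, h1, alpha):
    # Divergence of Eq. (4) evaluated at the estimated slope alpha, reusing the
    # hypothetical match_pairs helper from the earlier sketch.
    pairs, w = match_pairs(h0, h1)
    x, y = pairs[:, 0], pairs[:, 1]
    return float(np.sum(w * (alpha * x - y) ** 2 / (alpha ** 2 + 1)))

def recognition_score(query_hists, ref_hists):
    # Score of Eq. (6): sum over the three channels of the divergence at the
    # per-channel von Kries estimate mapping the reference onto the query.
    score = 0.0
    for h_ref, h_query in zip(ref_hists, query_hists):
        alpha = estimate_alpha(h_ref, h_query)
        score += divergence(h_ref, h_query, alpha)
    return score

def retrieve(query_hists, database):
    # database: list of (name, (H0, H1, H2)) entries of channel histograms.
    return min(database, key=lambda entry: recognition_score(query_hists, entry[1]))[0]
```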
Fig. 4. Robustness of the recognition rate with respect to different color histogram quantizations for the datasets TESTS51 (left) and ALOI (right)
In the case of ALOI, we compared our results with the recognition rates obtained by two other approaches, employing the color normalization algorithms Gray World and ACE [13] (Automatic Color Equalization), respectively. The Gray World algorithm used in these tests performs the color enhancement by rescaling each channel by its mean value. The technique ACE relies on the Retinex theory [12], and combines the Gray World enhancement with a color constancy algorithm that separates the spectral distribution of the scene illuminant from the image brightness. We chose these color balancing techniques among the others because they are very popular. In our experiments, each reference as well as each query has been normalized by ACE and by Gray World; then their color histograms have been compared by means of the score in (6). Figure 5(left) shows the recognition rates obtained. The performance of our approach is very similar to that given by employing the Gray World color balancing, whereas ACE gave the worst results both in terms of recognition rate and computational run time. In fact the complexity of ACE is O(n²), while that of our approach and of Gray World is O(n), where n is the number of image pixels. Our approach strongly differs from the recognition methods in which the illuminant invariance is achieved by enhancing the colors of the reference and test images [8]. In these methods, color enhancement is obtained by means of a color constancy algorithm that re-illuminates each input image as if it had been seen under a canonical known illuminant [2]. In the image/object recognition framework, each reference in D and every query are firstly illuminated under the canonical
Fig. 5. ALOI: Left - Comparison among different methods for illuminant-invariant image retrieval. The line labeled non normalized shows the recognition rate when no color balancing is applied. Right - Recognition rate in the presence of Gaussian noise.
illuminant by a color constancy algorithm, and then their re-illuminated versions are compared. If the references are captured under the canonical illuminant, the von Kries coefficients are computed as the element-wise ratios between the canonical illuminant and the estimate of the illuminant of the query. The color enhancement of references and query is completely avoided in our approach, as we directly compare the color histogram of the query to that of each reference through the query-sensitive score (6). Therefore, our method is more efficient than the recognition procedures based on color normalization, because it does not require any color pre-processing of the references in the database or of the query. Finally, we tested our recognition performance when Gaussian noise was added to the pictures. In particular, the test images of ALOI have been modified by convolving each image with a Gaussian filter with standard deviation σ = 0.5, 1.0, 1.5, 2.0, 2.5, 3.0. Figure 5(right) shows the results achieved: the recognition rate decreases with increasing levels of noise; in particular, for σ greater than 2.0 it is smaller than 0.75.
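For reference, a minimal Python sketch of the kind of Gray World normalization used here only as a comparison baseline (rescaling each channel by its mean); the exact variant and target value adopted in the experiments may differ.

```python
import numpy as np

def gray_world(img):
    # Rescale each channel so that its mean matches the global mean intensity;
    # this is one common formulation of the Gray World normalization and may
    # differ in detail from the variant used in the paper.
    img = img.astype(float)
    means = img.reshape(-1, 3).mean(axis=0)
    out = img * (means.mean() / means)
    return np.clip(out, 0, 255).astype(np.uint8)
```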
5 Conclusions and Future Directions
Our estimate of the illuminant change performed very well on the synthetic and real data considered, and offers high accuracy in illuminant-invariant image recognition. Moreover, unlike the pixel-wise best fit mentioned in Section 2, our use of color histograms allows us to estimate the illuminant changes occurring between two images even when they differ in size and orientation. Our future plans include a comparison of our approach with other methods for the estimation of the illuminant variation and its integration into the object recognizer MEMORI [10] to make this system robust to changes of light.
References
1. Athitsos, V., Hadjieleftheriou, M., Kollios, G., Sclaroff, S.: Query-sensitive embeddings. In: Proc. of SIGMOD 2005, pp. 706–717. ACM Press, New York (2005)
2. Barnard, K., Cardei, V., Funt, B.: A comparison of computational color constancy algorithms. II: Experiments with image data. IEEE Transactions on Image Processing 11(9), 985–996 (2002)
3. Berens, J., Finlayson, G.: Log-opponent chromaticity coding of colour space. In: 15th IEEE Int. Conf. on Pattern Recognition, pp. 206–211 (2000)
4. Chong, H.Y., Gortler, S.J., Zickler, T.: The von Kries hypothesis and a basis for color constancy. In: Proc. of IEEE ICCV (October 2007)
5. Finlayson, C., Hordley, S., Schaefer, G., Tian, G.Y.: Illuminant and device invariance using histogram equalisation. In: IS&T and SID's 11th Color Imaging Conference, pp. 205–211 (2003)
6. Finlayson, G.D., Drew, M.S., Funt, B.V.: Diagonal transforms suffice for color constancy. In: Proc. of International Conference of Computer Vision (1993)
7. Finlayson, G.D., Drew, M.S., Funt, B.V.: Color constancy: generalized diagonal transforms suffice. J. Optical Society of America 11(11), 3011–3019 (1994)
8. Funt, B.V., Barnard, K., Martin, L.: Is machine colour constancy good enough? In: Burkhardt, H.-J., Neumann, B. (eds.) ECCV 1998. LNCS, vol. 1406, pp. 445–459. Springer, Heidelberg (1998)
9. Geusebroek, J.M., Burghouts, G.J., Smeulders, A.W.M.: The Amsterdam library of object images. International Journal of Computer Vision 61(1), 103–112 (2005)
10. Lecca, M.: Object recognition in color images by the self configuring system MEMORI. International Journal of Signal Processing 3(3), 176–185 (2006)
11. Lecca, M., Messelodi, S.: Estimating illuminant changes in color images by color histogram comparison. Technical Report FBK-irst, TR 2008-04-001 (April 2008)
12. Provenzi, E., De Carli, L., Rizzi, A.: Mathematical definition and analysis of the retinex algorithm. Journal of the Optical Society of America. Optics, Image Science, and Vision 22(12) (2005)
13. Rizzi, A., Gatta, C., Marini, D.: From Retinex to ACE: issues in developing a new algorithm for unsupervised color equalization. Journal of Electronic Imaging 13(1) (2004)
14. Rothganger, F., Lazebnik, S., Schmid, C., Ponce, J.: 3D object modeling and recognition using local affine-invariant image descriptors and multi-view spatial constraints. International Journal of Computer Vision 66(3), 231–259 (2006)
15. Schettini, R., Ciocca, G., Zuffi, S.: A survey of methods for colour image indexing and retrieval in image databases. In: Color Imaging Science: Exploiting Digital Media (2001)
16. Swain, M.J., Ballard, D.H.: Color indexing. Int. J. Comput. Vision 7(1), 11–32 (1991)
Illumination Chromaticity Estimation Based on Dichromatic Reflection Model and Imperfect Segmentation
Johji Tajima
Nagoya City University, Yamanohata, Mizuho, Nagoya 467-8501, Japan
[email protected]
Abstract. The illumination chromaticity estimation based on the dichromatic reflection model has not been made practical, since the method needs image segmentation beforehand. However, its two-dimensional model is sufficiently robust when it is combined with the least squares method. The proposed algorithm executes a color space division instead of the segmentation. The original image is divided into small color regions, each of which corresponds to one of the color sub-spaces. Though this division is an imperfect image segmentation, illumination chromaticity estimation based on the chromaticity distribution in the color regions is possible. Experimental results show that this method is also applicable to images of apparently matt surfaces. Keywords: Illumination color estimation, Dichromatic reflection model, Color space division.
1 Introduction
Illumination color/chromaticity estimation from an image is one of the essential problems in color image processing. Computational color constancy algorithms based on this estimation have been proposed for the identification of objects under various illuminations. If the illumination color is known, object identification by color information becomes easier. In digital cameras, 'white balancing' is an important function. It estimates the illumination color/chromaticity of the scene in a photographed image; then some other digital process compensates for the estimated color shift of the image. Color image simulation under other illuminations would also be possible if the illumination color can be estimated. Conventionally, the "Maximum of RGB (MOR)" algorithm or the "Gray World Assumption (GWA)" algorithm has been used for the illumination estimation in practice. They work reasonably well in normal cases. However, MOR fails if the image does not contain any 'white' object or objects with the maximum RGB values. It is self-evident that the GWA does not hold in many situations. For example, the average color of an image of green leaves is not gray. Recently, many algorithms have been proposed from the viewpoint of color constancy. The "Color by Correlation" algorithm [1] compares the chromaticity gamut of
an input image with the illumination color gamut, which is filled by possible object colors under that illumination. If any color of the image is outside the illumination color gamut, the image is judged not to have been taken under that illumination. Possible illuminations are estimated in that way. The pattern recognition approach [2][3] is more straightforward, though the information used is similar to that in the color by correlation algorithm. A chromaticity diagram is divided into many tiny regions. The chromaticity distribution of an image is described as an occupation map of the regions. Considering the maps as feature vectors, a pattern learning process is carried out using many images. After the learning process, arbitrary images can be classified. Chromaticity estimation performance is reported to be good. The drawback of these algorithms is that the image to be classified must have many colors. As an image with a small number of colors occupies a small area on the chromaticity diagram, its chromaticity distribution could have arisen under various illuminations; that makes it difficult to limit the range of possible illuminations. In addition, the pattern recognition approach needs many images for learning. This paper deals with an illumination chromaticity estimation method based on the "Dichromatic Reflection Model," with which the estimation is possible, in principle, even if the image has only two colors. However, it has been considered impractical, since the image must be segmented before the estimation process. This paper shows that the method is practical even if the image segmentation is imperfect.
2 Illumination Color Estimation Based on the Dichromatic Reflection Model
Shafer proposed the "Dichromatic Reflection Model (DRM)" for computer vision [4]. The model is simply described by Eq. (1):
(R, G, B)^t = α (Ro, Go, Bo)^t + β (Rw, Gw, Bw)^t    (1)
where (Ro, Go, Bo)^t is the color of the diffuse reflection (i.e. the object color), and (Rw, Gw, Bw)^t is the color of the specular reflection (i.e. the illumination color). It is known that Eq. (1) holds for almost all non-metal objects [5]. The model means that the observed color values (R, G, B)^t of an object with one homogeneous color are distributed on a plane in the RGB three-dimensional color space. When there is another object with another homogeneous color (R'o, G'o, B'o)^t in the image, observed color values are distributed on another plane. As the illumination color is common for all objects, the two planes intersect at the illumination color vector (Fig. 1). Hence the illumination color is estimated as follows:
(1) Calculate the plane on which the color distribution of each object lies.
(2) Estimate the line where the planes intersect as the illumination color vector [6][7].
Fig. 1. Illumination estimation based on the Dichromatic Reflection Model
Fig. 2. Chromaticity of dichromatically reflected light lies on straight lines
This algorithm needs only two colors in an image for the estimation. However, this algorithm has not yet come into use, since each color object (or color region on an object) must be segmented before the distribution plane is calculated. This three-dimensional DRM can be reduced to a two-dimensional model [8]. In color science, it is known that the chromaticity of the mixture of two colored lights is on the straight line that connects the chromaticities of the two colors on the chromaticity diagram. The chromaticity (x, y)^t is calculated by Eq. (2), if the tri-stimulus values are CIE-XYZ values. It is also true for (r, g)^t, if the tri-stimulus values are RGB values.

x = X / (X + Y + Z),   y = Y / (X + Y + Z)    (2)

This means that (x, y)^t is on the straight line that connects the object chromaticity (xo, yo)^t and the illumination chromaticity (xw, yw)^t. If there are many objects, many lines that are oriented toward the illumination chromaticity should lie on the chromaticity diagram (Fig. 2). Detecting the intersection of these lines, we can obtain the illumination chromaticity. The illumination chromaticity estimation using this two-dimensional model has been studied. Lehmann and Palm [9] applied the model to color lines around specular highlights in an image. This algorithm needs specular highlights in the image. Finlayson and Schaefer [10] discussed the model to estimate the chromaticity of Planckian radiator-like illuminations, assuming the segmentation is perfect. Ebner and Herrmann [11] applied the model to image regions which have been segmented, and whose color saturation is high. However, though it is necessary for the three-dimensional model to perfectly segment the image and precisely detect the plane, it is expected for the simplified two-dimensional model that the illumination chromaticity estimation by line detection could be made robust, so that the image segmentation might not have to be perfect. In practical cases, the chromaticity of an object (or a color region) does not exactly lie on a line because of image noise. For real applications, the following procedure is more reasonable.
(1) Apply principal component analysis (PCA) to the chromaticity value distribution of the pixels in each color region. Two eigenvalues σ1 and σ2 are obtained.
(2) If the relation σ1 >> σ2 holds, we assume that the chromaticity values lie on the line which is in the direction of the first principal component.
(3) As the lines are not expected to cross each other at a point in real cases, the chromaticity point with the smallest squared distances from the lines is estimated to be the illumination chromaticity.
The distance d_i between the i-th line (Eq. (3)) and the point (x, y) is expressed by Eq. (4):

a_i x + b_i y + c_i = 0    (3)

d_i = |a_i x + b_i y + c_i| / sqrt(a_i² + b_i²)    (4)
The least squares method, in which F(x, y) = Σ_i w_i d_i² is minimized, definitely determines the illumination chromaticity (xw, yw)^t, where w_i is a weight, which may be the area of the color region, since the statistics would be more reliable if the region is larger. The minimization is carried out by solving the simultaneous equations (Eq. (5)):

∂F(x, y)/∂x = 0,   ∂F(x, y)/∂y = 0    (5)
Now that the illumination chromaticity estimation is formulated as a least squares problem, the estimation can be made robust by using many color regions, even if each individual distribution-line estimate is not very reliable because of image noise.
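A small numpy sketch of this least squares step (our own illustration), assuming each detected line is given by its coefficients (a_i, b_i, c_i) of Eq. (3) and a weight w_i (e.g., the region area); folding the 1/(a_i²+b_i²) factor of Eq. (4) into the weight reduces Eq. (5) to a 2×2 linear system.

```python
import numpy as np

def illuminant_chromaticity(lines, weights):
    # lines: rows (a_i, b_i, c_i) describing a_i x + b_i y + c_i = 0 (Eq. (3));
    # weights: w_i, e.g. the pixel count of each 'good' color region.
    a, b, c = np.asarray(lines, dtype=float).T
    w = np.asarray(weights, dtype=float) / (a ** 2 + b ** 2)  # absorb 1/(a^2+b^2) of Eq. (4)
    # Normal equations of Eq. (5): a 2x2 linear system in (x, y).
    A = np.array([[np.sum(w * a * a), np.sum(w * a * b)],
                  [np.sum(w * a * b), np.sum(w * b * b)]])
    rhs = -np.array([np.sum(w * a * c), np.sum(w * b * c)])
    x, y = np.linalg.solve(A, rhs)
    return float(x), float(y)
```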
3 Color Space Division for Imperfect Segmentation
The reason why the illumination color estimation based on the DRM has not been developed further might be the impression that the color distribution plane can be estimated only when a color region contains a wide-range color variation from the diffuse reflection to the specular reflection. For obtaining such a color region, a very powerful image segmentation technique would be necessary. However, in the previous section, the problem was reduced to distribution line estimation in the two-dimensional chromaticity space, where it is easier to detect a small chromaticity shift, even if the color region is small. For this purpose, we try to use a color image segmentation algorithm that segments an image based on the color information. The algorithm might divide a large color region that contains a wide-range color variation into several small sub-regions, each containing only a part of the wide color variation. However, it is sufficient if the sub-region contains a part of the chromaticity shift from the object chromaticity (xo, yo)^t to the illumination chromaticity (xw, yw)^t. The color space division described in this section is a technique to be used to imperfectly segment an image. This technique was originally developed to represent a color image with a small number of colors. Using the algorithm, a full color
(24-bit/pixel) image could be represented by 256 colors without visual degradation [12]. In this application, the algorithm is adjusted so that normal images are represented by about 20~50 colors. The color space division is carried out in the following steps:
(1) Convert the color space from RGB to CIELAB, and consider the whole three-dimensional space as one color sub-space.
(2) Compute the pixel color distribution in each color sub-space, and apply 'principal component analysis (PCA)'.
(3) Divide the sub-space by a plane that is perpendicular to the first principal axis, so that the two new sub-spaces satisfy the criterion of the discriminant analysis [13]. This division is not carried out when either of the following two conditions is satisfied.
(a) The color difference between the mean colors of the pixels in the two sub-spaces is smaller than a predetermined threshold (=th). th is determined so that all colors within the final sub-space may be considered to have the same color.
(b) The number of pixels in one of the two sub-spaces is smaller than a predetermined threshold (=n). This prevents a sub-space from being generated by the image noise.
(4) If there is no sub-space that can be divided, finish the division. Otherwise, return to step (2).
The division takes place in the three-dimensional L*-a*-b* color space, and every pixel is classified into one of the sub-spaces. For explanation purposes, this division is illustrated in the two-dimensional a*-b* color space in Fig. 3. A color distribution is illustrated as a contour map. For the pixel colors in the whole space S0, the principal axis PC0 is calculated. The division plane DP0, which is perpendicular to PC0, divides S0 to generate sub-spaces S1 and S2. This procedure is repeated for the sub-spaces, and sub-spaces S11 and S12 are generated from S1 based on the principal axis PC1 and the division plane DP1. S21 and S22 are generated from S2 by the same procedure.
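The following Python sketch illustrates a single division step under two assumptions that the text leaves implicit: the color difference is taken as the Euclidean distance in CIELAB, and the discriminant criterion [13] is applied to a histogram of the projections onto the first principal axis. The recursion over sub-spaces and the pixel re-labeling are omitted, and all names are hypothetical.

```python
import numpy as np

def split_subspace(lab_pixels, th=10.0, n_min=256):
    # One division step: project the pixels of a sub-space onto their first
    # principal axis and split at an Otsu-style discriminant threshold [13].
    # Returns two index arrays, or None when stopping condition (a) or (b) holds.
    mean = lab_pixels.mean(axis=0)
    centered = lab_pixels - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    proj = centered @ vt[0]                      # coordinates along the first principal axis
    hist, edges = np.histogram(proj, bins=64)
    centers = 0.5 * (edges[:-1] + edges[1:])
    total = hist.sum()
    best_t, best_sigma = None, -1.0
    for k in range(1, len(hist)):
        w0, w1 = hist[:k].sum(), hist[k:].sum()
        if w0 == 0 or w1 == 0:
            continue
        m0 = (hist[:k] * centers[:k]).sum() / w0
        m1 = (hist[k:] * centers[k:]).sum() / w1
        sigma_b = w0 * w1 * (m0 - m1) ** 2 / total ** 2   # between-class variance
        if sigma_b > best_sigma:
            best_sigma, best_t = sigma_b, edges[k]
    if best_t is None:
        return None
    left = np.where(proj < best_t)[0]
    right = np.where(proj >= best_t)[0]
    # stopping condition (b): either part is too small
    if len(left) < n_min or len(right) < n_min:
        return None
    # stopping condition (a): mean colors closer than th
    # (Euclidean distance in CIELAB is assumed as the color difference)
    diff = np.linalg.norm(lab_pixels[left].mean(axis=0) - lab_pixels[right].mean(axis=0))
    if diff < th:
        return None
    return left, right
```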
Fig. 3. Color space division
Fig. 4. Color region image. Each region corresponds to the color sub-space.
Though this is the color space division procedure, the color image is also divided into color regions, each of which corresponds to a sub-space (Fig.4). However, multiple color regions in an image may correspond to a single color sub-space. To make sure that the chromaticity distribution of one color region is analyzed, the connected component labeling process is applied to the color region image. By this process, many connected color regions are generated. For these color regions, chromaticity distribution is analyzed.
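As a rough illustration of this labeling step (the paper does not specify an implementation; scipy's ndimage.label with 4-connectivity is only one possible choice):

```python
import numpy as np
from scipy import ndimage

def connected_color_regions(region_map):
    # region_map: 2-D array with the color sub-space index of every pixel.
    # Give every 4-connected component of every sub-space its own label, so
    # that each resulting color region corresponds to one connected image area.
    labels = np.zeros(region_map.shape, dtype=np.int32)
    offset = 0
    for idx in np.unique(region_map):
        component, n = ndimage.label(region_map == idx)
        mask = component > 0
        labels[mask] = component[mask] + offset
        offset += n
    return labels, offset
```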
4 Experiments
4.1 Illumination Chromaticity Estimation
As the first experiment, a scene that contains typical dichromatic reflection objects was processed (Fig.5). Colorful plastic objects are illuminated by fluorescent lamps in a lighting booth (Gretag-Macbeth Judge II®). Figure 5(a) shows the scene under the illuminant D65 simulator (Condition 1). The image was taken by a consumer use
Fig. 5. Experiment on an image with dichromatic objects: (a) Input image under Condition 1. (b) Color region image (54 colors). (c) Chromaticity distribution of reliable pixels. (d) Illumination chromaticity estimation.
digital camera, a Fujifilm FinePix F31fd. The auto white balancing function was off, and the color balance was set to 'Daylight'. The auto exposure control function was used, and specular highlights are included in this image. The image size was reduced to 640×480 pixels. The obtained RGB pixel values are dealt with as sRGB values. The color space division procedure requires two parameters, as described in Section 3. The parameters th and n were set to 10 and 256, respectively. For this image, the whole color space was divided into 54 sub-spaces. After the connected component labeling, the original image was divided into 7297 color regions. The color region image is shown in Fig. 5(b). A large color region with wide color variation (e.g. the red color region in the bell part of the toy trumpet) is divided into four or five color regions, while some color regions (e.g. the pink spoon) are united with a neighboring region. The chromaticity estimation was carried out with reliable pixels in color regions with 'good' characteristics. Four threshold values were defined. The chromaticity calculation is reliable only for pixels that are neither very dark nor saturated. Hence, the chromaticity was calculated for the pixels that satisfy the following conditions: (a) R + G + B ≥ RGBmin and (b) max(R, G, B) ≤ RGBmax. In these calculations, the linearized R, G and B values (i.e., not the original sRGB values) were used. A color region with 'good' characteristics is a color region in which the number of reliable pixels is sufficiently large and the pixel chromaticities are distributed in a linear shape. The minimum for the number of reliable pixels was defined as nr. The distribution shape was characterized by the ratio between the standard deviations σ1 and σ2 along the two principal axes. These threshold values were temporarily set to RGBmin=50, RGBmax=240, nr=200 and ratio=2.0, respectively. However, the estimation result was not sensitive to the threshold selection. Figure 5(c) shows the chromaticity distribution of the reliable pixels. On the x-y chromaticity diagram, the blackbody chromaticity locus (yellow) and the daylight chromaticity locus (orange) were added. In addition, the first principal axes for 'good' color regions are depicted with white lines in Fig. 5(d), where the estimated illumination chromaticity is shown with a red cross, too. The estimated chromaticity for this
Fig. 6. Typical chromaticity distribution in a color region. The chromaticity of the color region painted in white in (a) is distributed in the white region in (b).
Fig. 7. Experiment under Condition 2: (a) Input image under Condition 2. (b) Illumination chromaticity estimation.
image was (x1, y1) = (0.2741, 0.3064). A typical chromaticity distribution of a color region is shown in Fig. 6(a) and (b). The area painted in white in Fig. 6(a) is a part of a red region, where there are 2516 reliable pixels. The chromaticity distribution of the region is also painted in white in Fig. 6(b). The ratio (=σ1/σ2) of the region is 6.52, and its linear shape can be clearly observed. Though the correct chromaticity of the D65 illuminant is (x, y) = (0.3127, 0.3290), the camera characteristics affect the color reproduction. The comparison with the chromaticity of the neutral gray (N7) on the inner surface of the booth is more appropriate. Assuming that the image data is generated in the sRGB space, the mean chromaticity of the gray background of the image in Fig. 5(a) was (0.283, 0.312). The chromaticity estimation error was about -0.009 in the x direction and -0.006 in the y direction. The same scene was also taken under the illuminant A simulator (Condition 2, Fig. 7(a)). The image looks very yellowish. The same processing as above was applied, and Fig. 7(b) shows the estimation result. In this case, the estimated chromaticity was (x2, y2) = (0.4775, 0.4158). The mean chromaticity of the gray background in this image was (0.448, 0.415). The estimation error was about +0.027 in the x direction and +0.001 in the y direction. The estimation for the scene was successful, and the estimation error was reasonably small, though the estimation of x for Condition 2 was not so accurate. However, the scene consists of plastic objects, whose reflection is ideally composed of diffuse reflection and specular reflection. We applied the algorithm to another scene, which contains only a 'Gretag-Macbeth Color Checker®' under Condition 1. The surface of the colored areas is very matt, and no specular reflection can be observed (Fig. 8(a)). The same image processing as for the previous image (Fig. 5) was applied. As the Color Checker is flat and matt, each color area on the Color Checker was clearly separated as one color region after the color space division (Fig. 8(b)). The whole color space was divided into 30 sub-spaces. After the connected component labeling, the original image was divided into 6064 color regions. Figure 8(c) shows the chromaticity distribution of the reliable pixels, and Fig. 8(d) shows the first principal axes and the estimated illumination chromaticity.
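For completeness, a hedged sketch of how reliable-pixel chromaticities can be computed under the assumptions stated above: sRGB linearization, the standard sRGB-to-XYZ matrix, Eq. (2) for (x, y), and the thresholds applied here to the stored 8-bit values (the paper may apply them to the linearized values instead); all names are our own.

```python
import numpy as np

# Standard sRGB (D65) to CIE XYZ matrix (assumption: the camera output is
# simply treated as sRGB data, as stated in the text).
M_SRGB2XYZ = np.array([[0.4124, 0.3576, 0.1805],
                       [0.2126, 0.7152, 0.0722],
                       [0.0193, 0.1192, 0.9505]])

def linearize(srgb):
    # Undo the sRGB transfer function; input values in [0, 1].
    return np.where(srgb <= 0.04045, srgb / 12.92, ((srgb + 0.055) / 1.055) ** 2.4)

def reliable_chromaticities(img, rgb_min=50, rgb_max=240):
    # img: H x W x 3 uint8 sRGB image. Keep only pixels that are neither very
    # dark nor saturated, linearize them and return (x, y) chromaticities (Eq. (2)).
    rgb = img.reshape(-1, 3).astype(float)
    mask = (rgb.sum(axis=1) >= rgb_min) & (rgb.max(axis=1) <= rgb_max)
    xyz = linearize(rgb[mask] / 255.0) @ M_SRGB2XYZ.T
    s = xyz.sum(axis=1, keepdims=True)
    s[s == 0] = 1.0
    return xyz[:, :2] / s
```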
Fig. 8. Experiment on an image with matt objects. (a) Original image. (b) Color region image (30 colors). (c) Chromaticity distribution of reliable pixels. (d) Illumination chromaticity estimation.
Very interestingly, the color distribution of the pixels in each large color region, which corresponds to one color area on the Color Checker, is sufficiently elliptical and can be used for the illumination chromaticity estimation. The estimation result was (x1, y1) = (0.2810, 0.3069). The mean chromaticity of the gray background, in this case, was (0.286, 0.312). The chromaticity estimation error was about -0.005 in both the x and y directions. The same scene was also taken under Condition 2, and the estimation was carried out. In this case, the estimated chromaticity was (x2, y2) = (0.4169, 0.4125). The estimation error was about -0.034 in the x direction and -0.001 in the y direction.
4.2 Discussion
In the previous subsection, it was shown that the illumination chromaticity estimation algorithm based on the two-dimensional DRM and the color space division is applicable to real images. One image contained typical objects with diffuse and specular reflection, while the other contained objects with very matt surfaces. The estimation algorithm worked very well not only for the former image, which has ideal characteristics for the algorithm, but also for the latter image without apparent specular reflection. In the real world, object surfaces would have reflection characteristics between these two. The estimation algorithm should therefore be applicable to most surfaces in real scenes.
The estimation result was sufficiently good for the images under Condition 1 (illuminant D65 simulator). However, it was not for the images under Condition 2 (illuminant A simulator). Under Condition 2, the x coordinate estimation was especially inaccurate. The reason is not clear at this moment. Though more experiments should be carried out for the analysis and the algorithm improvement, it may be partly due to the fact that the camera used was a consumer model. Its color calibration may not be accurate, or some color rendering process may have been applied to the original signal. In addition, it may be due to the chromaticity distribution under Condition 2. From Fig. 7(b), it is observed that the chromaticity distribution is pushed toward the upper area of the chromaticity diagram, and most of the first principal axes are oriented nearly in parallel. This may cause the inaccuracy, especially in the x direction.
5 Conclusions
In this paper, an illumination chromaticity estimation method using the two-dimensional DRM combined with imperfect image segmentation was proposed. Though the color space division algorithm may divide a large homogeneous color region into several small regions or merge several color regions, the robust chromaticity estimation could find the illumination chromaticity based on the least squares method, allowing segmentation errors to a certain extent. Wide dynamic range color regions that include both diffuse reflection and specular reflection areas were not necessary. The proposed algorithm was applied to real images taken by a digital camera. The result was encouraging. In addition, it turned out that the two-dimensional DRM is also applicable to apparently matt surfaces like the Gretag-Macbeth Color Checker. The scenes used for this experiment included many colors. In principle, the algorithm should also be useful for scenes with only a few colors. However, no single algorithm is suitable for all images. In practice, combining the algorithm with other simpler methods (e.g. MOR or GWA) would be desirable.
Acknowledgement
This work was partly supported by KAKENHI 20500160.
References
1. Finlayson, G.D., Hubel, P.M., Hordley, S.: Color by Correlation. In: Proc. 5th Color Imaging Conference, pp. 6–11 (1997)
2. Barnard, K., Cardei, V., Funt, B.: A Comparison of Computational Color Constancy Algorithms – Part I & II. IEEE Trans. on Image Processing 1(9), 972–996 (2002)
3. Funt, B., Xiong, W.: Estimating Illumination Chromaticity via Support Vector Regression. In: Proc. 12th Color Imaging Conference, pp. 47–52 (2004)
4. Shafer, S.A.: Using Color to Separate Reflection Components. Color Res. Appl. 10, 210–218 (1985)
5. Tominaga, S.: Surface Identification Using the Dichromatic Reflection Model. IEEE Trans. PAMI-13, 658–670 (1991)
6. Saito, T.: Method and Apparatus for Illumination Color Measurement of a Color Image. Japanese Patent 2081885 (1988) (in Japanese)
7. Tominaga, S.: Consideration on a Color Reflection Model for Object Surfaces. IPSJ, 1988CVIM-059 (1989) (in Japanese)
8. Saito, T.: Method and Apparatus for Illumination Chromaticity Measurement of a Color Image. Japanese Patent 2508237 (1989) (in Japanese)
9. Lehmann, T.M., Palm, C.: Color Line Search for Illuminant Estimation in Real-World Scenes. J. Opt. Soc. Am. A 18(11), 2679–2691 (2001)
10. Finlayson, G.D., Schaefer, G.: Solving for Colour Constancy using a Constrained Dichromatic Reflection Model. IJCV 42(3), 127–144 (2001)
11. Ebner, M., Herrmann, C.: On Determining the Color of the Illuminant Using the Dichromatic Reflection Model. In: Kropatsch, W.G., Sablatnig, R., Hanbury, A. (eds.) DAGM 2005. LNCS, vol. 3663, pp. 1–8. Springer, Heidelberg (2005)
12. Tajima, J., Ikeda, T.: High Quality Color Image Quantization, Utilizing Human Vision Characteristics. J. IIEEJ, 293–301 (1989) (in Japanese)
13. Ohtsu, N.: A Threshold Selection Method from Gray-Level Histograms. IEEE Trans. SMC-9(1), 62–66 (1979)
An Improved Image Re-indexing Technique by Self Organizing Motor Maps
Sebastiano Battiato, Francesco Rundo, and Filippo Stanco
Dipartimento di Matematica e Informatica, University of Catania, Viale A. Doria, 6, 95125 Catania, Italy
{battiato, rundo, fstanco}@dmi.unict.it
http://iplab.dmi.unict.it
Abstract. The paper presents a novel Motor Map neural network for re-indexing color mapped images. The overall learning process is able to smooth the local spatial redundancy of the indexes of the input image. Differently from previous work, the proposed optimization process is specifically devoted to re-organizing the matrix of differences of the indexes, computed according to some predefined patterns. Experimental results show that the proposed approach achieves good performance both in terms of compression ratio and zero-order entropy of local differences. Its computational complexity is also competitive with previous works in the field.
1 Introduction
Color-mapped images [1] make use of an index map to store the different colors involved, maintaining for each pixel the location of the corresponding index. Re-indexing techniques reduce the local redundancy of the indexes by trying to find an optimal reordering, avoiding the need to consider all possible color indexings (M! for an image with M colors). The existing re-indexing algorithms are devoted to obtaining color and index similarity [2]. The color based solutions [3, 4, 5, 6, 7, 8] assign consecutive symbols to visually similar colors according to some heuristic measures (typically, the indexes are sorted by luminance order [3]). Alternatively, index based methods [9, 10, 11, 12] are guided by both information theory and local adaptive considerations, even if they have an intrinsic inefficiency in numerically optimizing the palette re-indexing. To overcome this problem different heuristics have been proposed [2, 13, 14]. Recently, in [15, 16] improvements have been obtained by pre-processing the indexed input image with respect to the available encoder. Such methods provide only a much more sophisticated "coded" representation and they cannot be classified as methods for re-indexing. In [17], a Motor Map (MM) neural network [18] has been used to properly learn the shape of the palette clustering, searching (in the output stage of the network) for the optimum indexing scheme. The overall results have proved the ability of the self organizing process to obtain effective results. In this paper we propose a method that outperforms existing approaches by properly designing a MM neural network that works on and manipulates directly the
matrix of differences of the input image. This allows us to exploit the solution space in a profitable way. As in [17], the ability of the MM to find an optimal solution without requiring knowledge of the underlying model has been crucial in this context. The overall performance has been evaluated by considering the same repository used in [2], which contains images with different sizes and numbers of colors. Experimental results show how the proposed method outperforms previous results in the field both in terms of overall compression ratio and residual zero-order entropy of the images, while also offering the best performance in terms of computational complexity. The paper is structured as follows. Section 2 introduces the re-indexing problem, while the proposed approach is detailed in Section 3. Experimental results are presented in Section 4. Conclusions are drawn in Section 5.
2 Problem Formulation
The re-indexing problem can be stated as expressed in [2,14]. Let I be an image of m × n pixels, and M be the number of distinct colors. I can be represented as I(x, y) = P(I′(x, y)), where P = {S1, S2, . . . , SM} is the set of all the colors in I, and I′ is an m × n matrix of indexes in {1, 2, . . . , M}. An image represented in such a fashion is called an indexed image (or color mapped image) and P is its palette. For an indexed image, an ordered scan of the indexes in I′, named p1, . . . , p_{m×n}, is usually performed. The residual entropy of the local differences can be considered to estimate the overall "energy" of the signal. The information needed to reconstruct the original image is: 1) the color of pixel p1; 2) a table providing the correspondence between the colors S1, S2, . . . , SM and the indexes i1, i2, . . . , iM; 3) the set of differences D(I′) = {d_{x,y} | x = 1, 2, . . . , m, y = 1, 2, . . . , n}, where each d_{x,y} is a local difference obtained by considering some specific patterns, as better specified below. Information theory states that any lossless scheme to encode the set of differences D(I′) requires a number of bits per pixel (bpp) greater than or equal to the true source entropy. For our purposes it is sufficient to measure the zero-order entropy of the statistical distribution of D(I′), properly managing the proposed optimization process. If the indexes i1, i2, . . . , iM are ordered so as to produce an almost uniform distribution of the values d_{x,y}, the entropy value will be large. Conversely, a zero-peaked distribution in D(I′) gives a lower entropy value. Hence, finding an optimal indexing scheme is a crucial step for any lossless compression of indexed images.
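As an illustration of these quantities, the short Python sketch below (our own, with hypothetical names) computes the difference matrix for the simple previous-column pattern and its zero-order entropy in bits per symbol:

```python
import numpy as np

def pattern_v1_differences(index_map):
    # D(I') with the previous-column pattern: d_{x,y} = I'(x,y) - I'(x,y-1).
    return np.diff(index_map.astype(int), axis=1)

def zero_order_entropy(diffs):
    # Zero-order entropy (bits per symbol) of the distribution of differences.
    _, counts = np.unique(diffs, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())
```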
3 The MM Re-indexing Algorithm
3.1 MM for Re-indexing
The re-indexing problem is addressed in the literature by means of two main approaches [2]: the ones based on color information and the ones based on information retrieved from the index matrix. In [17] we showed that the MM
is able to reach high performance in searching for an optimum indexing scheme by means of a color based training set. In this paper we show how the MM is also able to reach a near-optimum indexing scheme by means of an index based training set. This shows the high capability of the MM to solve complex problems such as image re-indexing. The self organization capability is strongly related to the "knowledge" of the problem to be solved: more information and more detailed problem descriptions increase the ability of the MM to provide the right solution for the problem. The results show that the optimum re-indexing scheme is more related to index based information than to the color palette, as the obtained results outperform those obtained with the previous version [17]. This is a result which had not been discovered until now. The proposed algorithm is based on the ability of the MM neural network to learn the "features" of the input pattern, providing an appropriate output stimulus. The learning process has been modified with respect to the algorithm in [17] in order to adapt itself to reduce the local spatial redundancy. We assume that the MM can learn the main information about the entropy minimization of the source image I if it is trained by the elements of the matrix D and not by using only the color of the index. In this way the structural characteristics (e.g., presence of edges, flat areas, etc.) of the underlying image are properly addressed. As shown in the experimental section, the final matrix of indexes obtained at the end of the optimization process is clearly smoother than before, allowing better compression results to be achieved. The overall approach can be described by the following steps.
Step 1. The topology of the MM has been established by making use of a lattice structure of (m−1)×(n−1) neurons, where m and n are the original dimensions of the input image I. Let Q = (m − 1) × (n − 1) be the number of neurons of the MM. Each neuron is composed of an input weight w_i^in, i = 1, . . . , Q, with w_i^in ∈ [0, 1], an output weight w_i^out, i = 1, . . . , Q, and a variable b_i, i = 1, . . . , Q, which stores the average increase of the reward function. The range of the w_i^out values is [1, . . . , M]. In our case, the following reward function has been chosen:

Reward = −‖D(I′)‖²    (1)

The above reward is strictly proportional to the zero-order entropy of the difference matrix D(I′). Moreover, the selection of the above reward function leads the MM to find an optimum palette index scheme which minimizes the entropy of the image and the relative compression ratio. Regarding the output stimulus produced by the MM during the learning phase, in this work it has been forced equal to w_i^out. The weight w_i^out is set equal to a random index generated during the learning process when the corresponding neuron wins. Before starting the learning phase, the MM (both input layer and output layer) is initialized randomly, and a palette pre-processing is applied (i.e., the input colors are sorted by increasing lightness factor).
Step 2. Let D(I′) be the matrix of differences computed starting from the index matrix I′. The set of differences D(I′) has been computed by using different local pattern configurations [19]:
D_V1(I′) = {I′_{x,y} − I′_{x,y−1}}    (2)

D_V2(I′) = {I′_{x,y} − I′_{x−1,y}}    (3)

D_V3(I′) = {(2·I′_{x,y} − I′_{x,y−1} − I′_{x−1,y}) / 2}    (4)

D_V4(I′) = {(3·I′_{x,y} − I′_{x,y−1} − I′_{x−1,y} − I′_{x−1,y−1}) / 3}    (5)
where x, y are the corresponding valid indexes in the original m×n image I. Using the initial configuration V1 and considering, for simplicity, only the absolute value of the differences, we obtain the initial configuration.
Step 3. Each element of the matrix D, normalized in the range [0, 1], is fed to the input layer of the MM, one element at each iteration, searching for the winner neuron (i.e., the neuron which has the minimum value of the following quantity):
(6)
where x = 1, 2, . . . , m; and y = 1, 2, . . . , n. The winner neuron provides an updating of the output stimulus which is, in this case, a new index (i.e., a new index for the related color on the corresponding palette performing also the related swaps on the indexes image). The new index indrand x,y is generated randomly in the range [1, 2, . . . , M ]. After the index updating, the new reward function can be computed updating only the elements in the associated matrix D(I ) which have been involved in the indexes swapping, in order to speed up the algorithm execution time. The ΔReward will be computed as: ΔReward = (Rewardnew − Rewardold )2
(7)
The average increasing of the reward function is weighted by the bnew win : old old bnew win = bwin + ρ(ΔReward − bwin )
(8)
where ρ is a positive value related to the smoothing action. Step 4. If the ΔReward ≥ bnew win and the new current entropy (really the new current sum of absolute differences) is better than the already ones processed till now, the new index scheme will be accepted and the weights of the winner neurons will be updated: in in in wwin (t + 1) = wwin (t) + η(dx,y − wwin (t)) out wwin (t
+ 1) =
indrand x,y
(9) (10)
where η is the learning rate factor. After that, the learning steps (from 2 to 4) is repeated until the stop criteria is verified. Conversely, the new index scheme will
66
S. Battiato, F. Rundo, and F. Stanco
Fig. 1. Flow chart of the proposed MM algorithm
be rejected and the previous ones will be restored. Fig. 1 reports a schematic representation of the overall process. In the proposed MM architecture, the neuron has not an adaptive neighboring and the learning rate remains constant during all the learning phase. The stop of the learning process is reached when the computed entropy is less or equal to a specific lower bound value or after a fixed number of epochs (an epoch is a number of cycles needed for presenting all the input patterns to neural network). To avoid local minimum a standard random scrolling of the weights are employed; it provides a perturbation of the neurons by using a gaussian random variable. 3.2
3.2 Algorithm Parameters
The MM parameters are chosen according to trial-and-error policies as well as heuristic considerations. In particular, we have chosen η = 0.95 and ρ = 0.90. The maximum number of learning cycles k has been heuristically set to 10^6 in all the experiments.
3.3 Computational Complexity
Let M be the overall number of colors involved in an input image I having N = m × n pixels. The proposed technique requires a preprocessing phase devoted to sorting the input colors according to their lightness factor. Each learning cycle has to compute the reward function, by considering just a single index swap in the index difference matrix D(I′). The overall computational complexity is O(M log M) + O(kN²), where k is the number of learning cycles. Further experiments will be devoted to measuring the average number of iterations k required to achieve convergence. In the current implementation we have used 10^6 as the maximum possible value. As shown in [17], the proposed MM re-indexing is competitive with the state-of-the-art solutions: O(M log M) for luminance order, O(M² log M) for Battiato's approach [14], O(M³) for Zeng's [13] and its modification [19], and O(M⁴) for Memon [12]. See [2, 17, 20] for further considerations on the computational complexity of the re-indexing process.
4 Experimental Results
In order to check the performance of the MM as a palette re-indexing algorithm (called MMap new), we propose a comparison between our method and the most important reordering methods [12, 13, 14, 17, 19]. For the sake of comparison, the dataset used is the same as in [2]. In our experiments we have used the following two groups: a set of natural images also known as the 'kodak' database, and a set of popular natural images. These contain quantized versions (non-dithered) of the same images with 64 colors. Table 1 reports the final residual entropy of the local differences obtained by making use of the patterns Vi, i = 1, ..., 4, applied to the two datasets, considering the current approach MMap new and MMap [17]. The proposed new approach clearly outperforms [17] in all cases. As stated above, the MMap new re-indexing scheme has been applied by computing the residual zero-order entropy of the local differences D(I′) by considering first the pattern V1 and then re-applying the overall process just considering pattern V2. A useful comparison of the final entropy values between the proposed method MMap new and the others is reported in Table 2.

Table 1. Residual entropy of local differences computed by using Vi, i = 1, ..., 4

MMap [17]   V1     V2     V3     V4
Natural1    2,534  2,693  2,621  2,735
Natural2    2,753  2,787  2,774  2,874

MMap new    V1     V2     V3     V4
Natural1    1,416  1,512  1,496  1,545
Natural2    2,739  2,737  2,741  2,834
Fig. 2. Different re-indexing schemes. (a) Color image; (b) Original indexes (Entropy=1,785); (c) Memon [12] (Entropy=1,556); (d) mZeng [19] (Entropy=1,523); (e) MMap [17] (Entropy=1,676); (f) MMap new (Entropy=0,889).
Finally, Table 3 reports the bit rates in terms of bpp (bits per pixel) obtained by lossless compression of the datasets after palette reordering with JPEG-LS¹ and PNG², respectively. For all coding engines, the bpp values of the proposed approach are considerably lower than those of the other methods; in some cases the differences are substantial. Finally, we show the final re-ordered palette for a

¹ SPMG/JPEG-LS Encoder 1.0.
² Matlab 7.0.1.
Table 2. Residual zero-order entropy of images before and after using the palette reordering methods (Memon [12], Zeng [13], mZeng [19], Battiato [14] and MMap [17])
          Random  Lum.   Memon  Zeng   mZeng  Battiato  MMap   MMap new
Natural1  3,616   3,101  2,444  2,491  2,491  2,576     2,534  1,416
Natural2  4,414   3,944  3,100  3,236  3,226  3,370     3,133  2,739
Table 3. Lossless compression results in bits per pixel, obtained with JPEG-LS and PNG applied to the indexed images after using the palette re-ordering methods (Memon [12], Zeng [13], mZeng [19], Battiato [14] and MMap [17])

Type     Image     Random  Lum.   Memon  Zeng   mZeng  Battiato  MMap   MMap new
JPEG-LS  Natural1  3,661   3,002  2,641  2,804  2,709  3,144     2,710  2,431
         Natural2  3,967   3,069  2,793  3,060  2,912  3,281     2,921  2,998
PNG      Natural1  4,234   3,613  3,438  3,497  3,529  3,116     3,009  2,091
         Natural2  5,396   4,715  4,558  4,545  4,671  4,130     3,307  3,424
single image just to visually evaluate the smoothness obtained with the different methods (Fig. 2).
5 Conclusion and Future Work
Palette reordering is a very effective approach for improving the compression of color-indexed images. In this paper, we described a technique that shows good performance in the generation of an optimum palette scheme, making use of a self-organizing MM that works directly on the spatial smoothness of the matrix of differences of the input image. Preliminary experiments confirm the real effectiveness of the proposed approach. Future work will be devoted to extending the experimental phase to larger datasets. The possibility of designing a proper reward function, specifically tailored to the target codec engine, will also be considered.
References 1. Battiato, S., Lukac, R.: Color-Mapped Imaging. In: Furth, B. (ed.) Encyclopedia of Multimedia, pp. 83–88. Springer, Heidelberg (2008) 2. Pinho, A.J., Neves, A.J.R.: A survey on palette reordering methods for improving the compression of color-indexed images. IEEE Transactions on Image Processing 13(11) (November 2004) 3. Zaccarin, A., Liu, B.: A novel approach for coding color quantized images. IEEE Transactions on Image Processing 2, 442–453 (1993) 4. Spira, A., Malah, D.: Improved lossless compression of color-mapped images by an approximate solution of the traveling salesman problem. In: IEEE Int. Conf. Acoustics, Speech, Signal Processing, May 2001, vol. III, pp. 1797–1800 (2001) 5. Po, L.M., Tan, W.T.: Block address predictive color quantization image compression. Electron. Lett. 30(2), 120–121 (1994)
6. Hadenfeldt, A.C., Sayood, K.: Compression of color-mapped images. IEEE Trans. Geosci. Remote Sens. 32, 534–541 (1994) 7. Lai, J.Z.C., Liaw, Y.-C.: A novel approach of reordering color palette for indexed image compression. IEEE Signal Processing Letters 14(2), 117–120 (2007) 8. Chuang, W.-H., Pei, S.-C.: A low-complexity palette re-indexing technique based on sampling-swapping. In: Proc. of IEEE International Conference on Image Processing, pp. 1029–1032 (2008) 9. Fojtik, J., Vaclav, H.: Invisible modification of the palette colour image enhancing lossless compression. In: Amin, A., Pudil, P., Dori, D. (eds.) SPR 1998 and SSPR 1998. LNCS, vol. 1451, pp. 1029–1036. Springer, Heidelberg (1998) 10. Waldemar, P., Ramstad, T.A.: Subband coding of color images with limited palette size. In: IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP 1994, April 1994, No. V, pp. 353–356 (1994) 11. Gormish, M.J.: Compression of palletized images by colour. In: Proc. of IEEE International Conference on Image Processing (1995) 12. Memon, N., Venkateswaran, A.: On ordering colour maps for lossless predictive coding. IEEE Trans. Image Proc. 5(11), 1522–1527 (1996) 13. Zeng, W., Li, J., Lei, S.: An efficient colour re-indexing scheme for palette-based compression. In: Proc. of 7th IEEE International Conference on Image Processing, pp. 476–479 (2000) 14. Battiato, S., Gallo, G., Impoco, G., Stanco, F.: An efficient re-indexing algorithm for color-mapped images. IEEE Transactions on Image Processing 13(11), 1419– 1423 (2004) 15. You, K.-S., Han, D.-S., Jang, E.S., Jang, S.-Y., Lee, S.-K., Kwak, H.-S.: Ranked image generation for arithmetic coding in indexed color image. In: Proceedings of 7th International Workshop on Enterprise networking and Computing in Healthcare Industry, HEALTHCOM, June 2005, pp. 299–302 (2005) 16. Neves, A.J.R., Pinho, A.J.: A bit-plane approach for lossless compression of colorquantized images. In: Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP 2006, May 2006, vol. 13 (2006) 17. Battiato, S., Rundo, F., Stanco, F.: Self organizing motor maps for color-mapped image re-indexing. IEEE Transactions on Image Processing 16(12), 2905–2915 (2007) 18. Arena, P., Fortuna, L., Frasca, M., Sicurella, G.: An adaptive, self-organizing dynamical system for hierarchical control of bio-inspired locomotion. IEEE Transactions on Systems, Man, and Cybernetics, Part B 34, 1823–1837 (2004) 19. Pinho, A.J., Neves, A.J.R.: On the relation between memon’s and the modified zeng’s palette reordering methods. Image Vision Comput. 24(5), 534–540 (2006) 20. Pei, S.-C., Chuang, Y.-T., Chuang, W.-H.: Effective palette indexing for image compression using self-organization of kohonen feature map. IEEE Transaction on Image Processing 15(9), 2493–2498 (2006)
KANSEI Based Clothing Fabric Image Retrieval

Yen-Wei Chen¹,², Shota Sobue², and Xinyin Huang³

¹ Elect & Inf. Eng. School, Central South Univ. of Forest and Tech., Changsha, China
² Graduate School of Science and Engineering, Ritsumeikan University, Japan
³ School of Education, Soochow University, Suzhou, China
Abstract. KANSEI is a Japanese term which means psychological feeling or image of a product. KANSEI engineering refers to the translation of consumers' psychological feeling about a product into perceptual design elements. Recently KANSEI based image indexing or image retrieval have been done by using interactive genetic algorithms (IGA). In this paper, we propose a new technique for clothing fabric image retrieval based on KANSEI (impressions). We first learn the mapping function between the fabric image features and the KANSEI and then the images in the database are projected into the KANSEI space (psychological space). The retrieval is done in the psychological space by comparing the query impression with the projection of the images in database. Keywords: Image retrieval, KANSEI, mapping function, image features, psychological space, impression, semantic differential (SD) method, neural network, principal component analysis.
1 Introduction

Recently there has been a rapid growth in the number of digital images. Image retrieval has become an important issue for database management and computer vision. To date, much research has been done on image retrieval. It can be divided into two groups: text-based image retrieval and content-based image retrieval [1]. Text-based image retrieval [2] started in the late 1970s. In a text-based image retrieval system, some keywords must first be manually annotated to each image in the database. The drawbacks of text-based image retrieval are the vast amount of labor required for manual image annotation and the imprecision of the annotation due to the subjectivity of human perception. In the early 1990s, contents-based image retrieval was proposed to overcome the above problems. In contents-based image retrieval, images are indexed by their own visual content, such as color and texture [1, 3-7]. Both text-based and contents-based methods lack the capability of utilizing human intuition and KANSEI (impression). KANSEI is a Japanese term which means the psychological feeling or image of a product. KANSEI engineering refers to the translation of consumers' psychological feeling about a product into perceptual design elements [8]. Recently KANSEI based image indexing or image retrieval systems have been built by using interactive genetic algorithms (IGA) [9,10]. In IGA-based image retrieval systems, the retrieval results are evaluated by users and the retrieval process is repeated until the user is
satisfied. The IGA-based image retrieval system can retrieve images based on human intuition or KANSEI (impression), but it is a time- and labor-consuming method. In this paper, we propose a new technique for clothing fabric image retrieval based on KANSEI (impressions). We first learn the mapping function between the fabric image features and the human KANSEI factors. In our previous studies, we significantly estimated the mapping functions from the image feature space to the KANSEI space for four groups with different ages [11]. We use the semantic differential (SD) method to extract the KANSEI factors (impressions), such as bright and warm, from humans while they view a fabric image. A neural network is used to learn the mapping functions from the image feature space to the human KANSEI factor space (psychological space), and then the images in the database are projected into the psychological space. The retrieval is done in the psychological space by comparing the query impression with the projections of the fabric images in the database.
2 Mapping Functions

In order to make a quantitative study on the relationship between the image features and KANSEI factors, we construct an image feature space and a KANSEI factor space (psychological space), as shown in Fig. 1.
Fig. 1. Mapping Functions from image feature space into psychological space
One fabric image has one point in the image feature space and has a corresponding point in the KANSEI factor space, which is an impression (KANSEI) of the subject to the image, just like a projection of the image feature to the psychological space. The relationship between the image features and the KANSEI can be described by the mapping function from the image feature space to the psychological space. The input of the mapping function is the image features and the output of the function is the KANSEI factors (impressions). The mapping functions (relationships) can be learned by finding the corresponding points in the image feature space and psychological space.
3 Psychological Features (Impressions)

In order to find a corresponding point in the psychological space, we use the semantic differential (SD) method to extract the KANSEI factors (impressions), such as bright and warm, from 8 adults while they view an image (material). In this research, we use
Table 1. 23 pairs of adjectives
1. strange - familiar
2. unique - usual
3. bright - dark
4. interesting - uninteresting
5. scary - not scary
6. pretty - ugly
7. western - eastern
8. gorgeous - quiet
9. natural - artificially
10. clear - indistinct
11. fine - rough
12. adult - childlike
13. refreshing - messy
14. gentle - indifferent
15. deep - faint
16. regular - irregular
17. modern - classical
18. warm - cool
19. transparent - opaque
20. simple - complex
21. jaunty - placid
22. manly - womanly
23. like - dislike
Fig. 2. Examples of clothing fabric images for training
words (adjective pairs) to measure the KANSEI. By careful selection, we chose 23 pairs of adjectives, which are shown in Table 1, as measures of KANSEI. We chose 168 clothing fabric images [12] for learning the mapping functions and for validation. Some examples are shown in Fig. 2. We asked 8 college students to rate their impression of each image with the 23 pairs of adjectives on a 7-level scale (-3, -2, -1, 0, 1, 2, 3).
4 Image Features

Low-level visual features such as color, texture and shape information in an image are used as image features.
Fig. 3. Typical color fabric images and their color features
4.1 Color Features

Several color features have been proposed to represent the color composition of images [13]. Color histograms are widely used to capture the color information in an image [14]. They are easy to compute and tend to be robust against small changes of camera viewpoint. We first transform the color image from the RGB space to the HSV space, and the hue value, which is used as the color feature, is quantized into 360 bins. The gray levels (R=G=B) are represented by four bins, so the dimension of the color feature vectors is 364. Two typical color images and their color feature vectors are shown in Fig. 3. In order to find an efficient representation of the color features, we use principal component analysis (PCA) to reduce the dimension of the color feature space. The 364-dimensional color space can be reduced to 30 dimensions while maintaining 90% of the information.

4.2 Texture Features

The texture features are represented by use of the Fourier transform power spectrum P(r,θ), which expresses the periodic patterns of the image in polar coordinates, where r is the amplitude of the frequency and θ is the direction angle of the frequency. The following two features, with dimensions of 50 and 180, respectively, are used for texture representation:
p(r) = Σ_{θ=0}^{π} P(r, θ)    (1)

q(θ) = Σ_{r=0}^{w/2} P(r, θ)    (2)
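As a rough illustration of Eqs. (1)-(2), the Python sketch below computes the two marginal texture features from the 2D FFT power spectrum. The binning into 50 radial and 180 angular bins and the helper name are assumptions for this example, not the authors' code.

```python
import numpy as np

def texture_features(gray, n_r=50, n_theta=180):
    """Radial p(r) and angular q(theta) sums of the Fourier power spectrum."""
    P = np.abs(np.fft.fftshift(np.fft.fft2(gray))) ** 2
    h, w = P.shape
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    r = np.hypot(yy - cy, xx - cx)                       # frequency amplitude
    theta = np.mod(np.arctan2(yy - cy, xx - cx), np.pi)  # direction in [0, pi)

    r_bins = np.minimum((r / r.max() * n_r).astype(int), n_r - 1)
    t_bins = np.minimum((theta / np.pi * n_theta).astype(int), n_theta - 1)

    p = np.bincount(r_bins.ravel(), weights=P.ravel(), minlength=n_r)      # p(r)
    q = np.bincount(t_bins.ravel(), weights=P.ravel(), minlength=n_theta)  # q(theta)
    return np.concatenate([p, q])   # 230-dimensional texture vector before PCA
```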
Fig. 4. A typical fabric image and its texture features
Fig. 5. A typical image and its Gabor filtered image (α=π/2)
The typical fabric image and its features are shown in Fig. 4. The dimension of the texture feature vector, which is composed of p(r) and q(θ), is 230, and it can be reduced to only 2 by PCA while maintaining 70% of the information.

4.3 Shape Features
We use Gabor filters to extract shape features. A directional Gabor filter can be expressed as:
F(x, y) = e^{−π(x²a² + y²b²)} cos(ux + vy)    (3)

u = f cos α,  v = f sin α,  f: frequency

where α is the direction of the filter. Four directional Gabor filters with angles of 0, π/4, π/2 and 3π/4, respectively, are used for shape feature extraction. Each filtered image is divided into 10 × 10 sub-images and the mean value of each sub-image is used as
shape features. Thus the dimension of the shape feature vector is 400, and it can be reduced to 15 by PCA while maintaining 80% of the information. A typical image and its Gabor filtered image (α=π/2) are shown in Fig. 5. It can be seen that the horizontal features are extracted.
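A hedged Python sketch of this shape-feature step (Eq. (3) plus 10 × 10 block averaging) is given below; the kernel size and the scale parameters a, b and f are placeholder values, since they are not specified above.

```python
import numpy as np

def gabor_kernel(size=31, a=0.1, b=0.1, f=0.2, alpha=0.0):
    """Directional Gabor filter of Eq. (3); a, b, f are assumed example values."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    u, v = f * np.cos(alpha), f * np.sin(alpha)
    return np.exp(-np.pi * (x**2 * a**2 + y**2 * b**2)) * np.cos(u * x + v * y)

def shape_features(gray):
    """400-dimensional shape vector: 4 orientations x 10 x 10 block means."""
    feats = []
    for alpha in (0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):
        k = gabor_kernel(alpha=alpha)
        # filter via FFT-based (circular) convolution, same size as the image
        F = np.fft.fft2(gray) * np.fft.fft2(k, s=gray.shape)
        filtered = np.real(np.fft.ifft2(F))
        h, w = filtered.shape
        for i in range(10):                 # mean of each 10 x 10 sub-image
            for j in range(10):
                block = filtered[i * h // 10:(i + 1) * h // 10,
                                 j * w // 10:(j + 1) * w // 10]
                feats.append(block.mean())
    return np.array(feats)                  # later reduced to 15 dimensions by PCA
```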
5 Learning and Validation

In this paper, we use a neural network as a model of the mapping function, as shown in Fig. 6. The neural network can be used to approximate arbitrary nonlinear functions. The neural network, or the mapping function, can be learned by finding the corresponding points in the image feature space and the psychological space. The input of the neural network is the image features and the output is the corresponding impression. The number of input neurons is 47 (30+2+15), the number of output neurons is 23 (the number of pairs of adjectives), and the range of the output neurons is [-3, 3]. The number of neurons in the middle layer is 67. We chose 165 images as training images to train the neural network. Once the neural network is trained, we use the remaining 3 images, which are not included in the training images, as test images for validation. We compare the outputs (estimated impressions) of the test images with the real impressions obtained by the SD method and calculate the mean squared error (MSE) between the estimated impression and the real impression. The experimental process is shown in Fig. 6(b).
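A minimal sketch of such a 47-67-23 network is given below, assuming a plain fully connected architecture trained on mean squared error; the paper does not specify the training algorithm or framework, so scikit-learn's MLPRegressor is used purely for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# X: (n_images, 47) concatenated color/texture/shape features after PCA
# Y: (n_images, 23) averaged SD ratings in [-3, 3], one column per adjective pair
def train_mapping(X, Y):
    model = MLPRegressor(hidden_layer_sizes=(67,), activation="tanh",
                         max_iter=5000, random_state=0)
    model.fit(X, Y)
    return model

def validate(model, X_test, Y_test):
    Y_hat = np.clip(model.predict(X_test), -3, 3)   # keep outputs in the SD range
    return np.mean((Y_hat - Y_test) ** 2, axis=0)   # MSE per impression
```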
Fig. 6. (a) 3-layer neural network; (b) experiment process for training and validation
Fig. 7. (a) Averaged MSE of each impression; (b) Overall MSE of each image
The experiments are repeated 56 times with different training and test images. Averaged MSE of each impression is shown in Fig.7(a). Since the value of the impression is -3~3, the estimation error is less than 10%. The overall MSE of each image is shown in Fig.7(b).
6 KANSEI Based Image Retrieval

As an application, we developed a KANSEI based clothing fabric image retrieval system. The flowchart of the system is shown in Fig. 8.
Fig. 8. Flowchart of KANSEI based image retrieval system
The input query is KANSEI words (impressions). The image features (color, texture and shape features) of the fabric images in the database are first extracted by using the methods described in Sec. 4, and then the image feature vector is projected (transformed) into the psychological space (impression vector) by the mapping function (the trained neural network). We calculate the Euclidean distance between the query impression vector and each transformed fabric image impression vector; the image with the minimum distance is retrieved as the output (a minimal sketch of this matching step is given after Table 2). Examples of the retrieval results are shown in Fig. 9(a) and 9(b). The retrieval results for query impressions of “bright”,
Fig. 9. Retrieval results for query impressions. (a) “bright”, “fine”, and “faint”; (b) “dark”, “rough” and “deep”.
“fine”, and “faint” are shown in Fig. 9(a), and the retrieval results for the query impressions of “dark”, “rough” and “deep” are shown in Fig. 9(b). It can be seen that the retrieved images match the query impressions. In order to make a quantitative evaluation, the levels of the retrieved images, measured by the SD method (Sec. 3), are listed in Table 2. “Bright” has a positive value and the highest level is +3, while “dark” has a negative value and the highest level is -3. As shown in Table 2, the retrieved images are also evaluated by humans as “bright” and “dark”, respectively.

Table 2. Human evaluated levels of the retrieval images
Query   No.1   No.2   No.3
Bright  2.17   1.67   1.67
Dark    -1.83  -0.10  -0.13
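Below is the promised sketch of the retrieval step in Python: every database image's feature vector is pushed through the trained mapping into the 23-dimensional impression space and compared to the query impression with the Euclidean distance. The query encoding (one target level per selected adjective pair, zero elsewhere) is an assumption made for this illustration.

```python
import numpy as np

def retrieve(model, db_features, query, n_results=3):
    """Return indexes of database images closest to a query impression.

    db_features : (n_images, 47) image feature matrix
    query       : dict mapping adjective-pair index (0..22) to a target level,
                  e.g. {2: 3} for a strongly "bright" query
    """
    q = np.zeros(23)
    for pair, level in query.items():
        q[pair] = level
    impressions = model.predict(db_features)          # project DB into KANSEI space
    dists = np.linalg.norm(impressions - q, axis=1)   # Euclidean distance to query
    return np.argsort(dists)[:n_results]
```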
Fig. 10. Retrieval images for a query of “bright” by conventional IGA
The retrieval images for a query of “bright” by the conventional IGA [15] are shown in Fig. 10. It can be seen that in order to obtain a satisfactory result, the retrieval process has to be repeated 10 times (10 generations). It is time and labor consuming.
7 Conclusion

We proposed a novel approach to learn the mapping function from the image feature space to the human KANSEI space (psychological space) using a multi-layer neural network. Experimental results show that for a given image the KANSEI (impression) estimation error is less than 10%. As an application, we also developed a KANSEI based clothing fabric image retrieval system. Significant positive results have been obtained. We can retrieve the desired clothing fabric images from the database by using only some KANSEI words (impression words).
Acknowledgements This work is supported in part by the "Open Research Center" Project for Private Universities: matching fund subsidy from Japanese Ministry of Education, Culture, Sports, Science, and Technology.
References 1. Rui, Y., Huang, T.S.: Image retrieval: Current techniques, promising directions, and open Issues. Journal of Visual Communication and Image Representation 10, 39–62 (1999) 2. Tamura, H., Yokoya, N.: Image database systems: A survey. Pattern Recognition 17(1) (1984) 3. Niblack, W., et al.: The QBIC project: Querying images by content using color, texture, and shape. In: Storage and Retrieval for Image and Video Databases, pp. 173–187. SPIE (1993) 4. Hirata, K., Kato, T.: Query by visual example: Content based image retrieval. In: Advances in Database Technology, EDBT 1992, pp. 56–61 (1992) 5. Pentland, A., Picard, R., Sclaroff, S.: Photobook: Content-based manipulation of image databases. In: SPIE Storage and Retrieval for Image and Video Databases. SPIE (1994) 6. Zeng, X.-Y., Chen, Y.-W., Nakao, Z., Cheng, J., Lu, H.: Independent Component Analysis for Color Indexing. IEICE Trans. Information & Systems E87-D, 997–1003 (2004) 7. Han, X.-H., Chen, Y.-W., Sukegawa, T.: A supervised nonlinear neighbourhood embedding of color histogram for image indexing. In: Proc. of 15th IEEE International Conference of Image Processing (ICIP 2008), USA, pp. 949–953 (2008) 8. Grimsæth, K.: Linking emotions and product features. KANSEI Engineering, 1–45 (2005) 9. Cho, S.B., Lee, J.Y.: A human-oriented image retrieval system using iterative genetic algorithm. IEEE Trans. Systems, Man and Cybernetics, Part A 32, 452–458 (2002) 10. Takagi, H., Noda, T., Cho, S.: Psychological Space to Hold Impression Among Media in Common for Media Database Retrieval System. In: Proc. of IEEE International Conference on System, Man, and Cybernetics (SMC 1999), Tokyo, Japan, vol. VI (1999) 11. Huang, X., Sobue, S., Kanda, T., Chen, Y.-W.: Linking KANSAI and image features by multi-layer neural networks. In: Apolloni, B., Howlett, R.J., Jain, L. (eds.) KES 2007, Part II. LNCS (LNAI), vol. 4693, pp. 318–325. Springer, Heidelberg (2007) 12. http://www.epattern.cn/ 13. Sticker, M.A., Orengo, M.: Similarity of color images. In: Proc. SPIE Storage Retrieval Still Image Video Database, pp. 381–392 (1996) 14. Deng, Y., Manjunath, B.S.: An efficient color representation for image retrieval. IEEE Trans. Image Processing 10, 140–147 (2001) 15. Sobue, S.: KANSEI based image retrieval by SVM and IGA. Ritsumeikan University Master’s Thesis (2008)
A New Spatial Hue Angle Metric for Perceptual Image Difference

Marius Pedersen¹,² and Jon Yngve Hardeberg¹

¹ Gjøvik University College, Gjøvik, Norway
² Océ Print Logic Technologies S.A., Créteil, France
Abstract. Color image difference metrics have been proposed to find differences between an original image and a modified version of it. One of these metrics is the hue angle algorithm proposed by Hong and Luo in 2002. This metric does not take into account the spatial properties of the human visual system, and could therefore miscalculate the difference between an original image and a modified version of it. Because of this we propose a new color image difference metric based on the hue angle algorithm that takes into account the spatial properties of the human visual system. The proposed metric, which we have named SHAME (Spatial Hue Angle MEtric), has been subjected to extensive testing. The results show improvement in performance compared to the original metric proposed by Hong and Luo.
1 Introduction

During the last two decades many different color image difference metrics have been proposed, some for overall image quality and some for specific distortions. New and improved metrics are created every year, but so far no one has been able to create a universal color image difference metric. The CIE published the CIELAB (L∗a∗b∗) color space specification [1], with the idea of a perceptually uniform color space. In a color space like this it is straightforward to calculate the distance between two colors by using the Euclidean distance. This metric is known as ΔE∗ab, and it has also been used to calculate the difference between color images by calculating the color difference of all pixels. A spatial extension to the CIELAB color difference formula (S-CIELAB) was proposed by Zhang and Wandell [2]; it introduced a spatial pre-processing to the CIELAB color difference formula [1] by using a spatial filter to simulate the human visual system. The image is first separated into an opponent-color space, and each opponent color image is convolved with a kernel determined by the visual spatial sensitivity of that color dimension. Finally the filtered image is transformed into CIE-XYZ, and further into CIELAB, where a pixelwise ΔE∗ab is calculated. The hue angle algorithm proposed by Hong and Luo [3] is based on the CIELAB color difference. This metric corrects some of the drawbacks of the CIELAB color difference formula, for example that all pixels are weighted equally. Even though the metric shows good results for two different images [3], it does not include spatial filtering of the image and is therefore unsuitable for halftoned images where the viewing
distance is crucial for the visual impression of artifacts. It has been shown to have problems in calculating perceived image difference [4,5,6]. Due to this we propose a new image difference metric with spatial filtering simulating the human visual system called SHAME (spatial hue angle metric).
2 The Proposed Metric

A new color image difference metric is proposed based on the hue angle algorithm, and two different spatial filtering methods are tested. We give an overview of the hue angle algorithm, and then the two spatial filtering methods.

2.1 The Hue Angle Algorithm

Hong and Luo [3] proposed a full-reference color image difference metric built on the CIELAB color difference formula [1]. This metric is based on the known fact that systematic errors over the entire image are quite noticeable and unacceptable. The metric is based on some conjectures; summarized from Hong and Luo [3] these are:

– Pixels or areas of high significance can be identified, and suitable weights can be assigned to these.
– Pixels in larger areas of the same color should be given a higher weight than those in smaller areas.
– Larger color differences between the pixels should get higher weights.
– Hue is an important color perception for discriminating colors within the context.

The first step is to transfer each pixel in the image from L∗, a∗, b∗ to L∗, C∗ab, hab. Based on the hue angle (hab) a histogram over the 360 hue angles is computed, and sorted in ascending order based on the number of pixels with the same hue angle into an array k. Then weights can be applied to four different parts (quartiles) of the histogram; by doing this Hong and Luo corrected the drawback that the CIELAB formula weights the whole image equally. The first quartile, containing n hue angles (that is, the smallest areas with the same hue angle), is weighted with 1/4 and saved to a new array hist. The second quartile, with m hue angles, is weighted with 1/2. The third quartile, containing l hue angles, is given 1 as a weight, and the last quartile, with the remaining hue angles, is weighted with 9/4:

hist(i) = k(i) ∗ 1/4,  i ∈ {0, ..., n}
hist(i) = k(i) ∗ 1/2,  i ∈ {n + 1, ..., n + m}
hist(i) = k(i) ∗ 1,    i ∈ {n + m + 1, ..., n + m + l}
hist(i) = k(i) ∗ 9/4,  otherwise

The average color difference, computed using ΔE∗ab, is calculated for all pixels having the same hue angle and stored in CD[hue]. Then the overall color difference for the image, CDimage, is calculated by multiplying the weights based on the quartiles for every pixel with the average CIELAB color difference for the hue angle:

CDimage = Σ_{hue=0}^{359} hist[hue] ∗ CD[hue]²/4
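For orientation, here is a hedged Python sketch of the hue angle algorithm as summarized above; the array handling, the quartile splitting over the 360 hue bins and the final combination are simplified and should not be read as the reference implementation.

```python
import numpy as np

def hue_angle_difference(lab_ref, lab_mod):
    """Hong-and-Luo-style image difference between two CIELAB images (H, W, 3)."""
    # hue angle (degrees) and pixelwise Euclidean (CIELAB) colour difference
    hue = np.degrees(np.arctan2(lab_ref[..., 2], lab_ref[..., 1])) % 360
    hue = hue.astype(int).ravel()
    de = np.sqrt(((lab_ref - lab_mod) ** 2).sum(axis=-1)).ravel()

    k = np.bincount(hue, minlength=360).astype(float)   # hue histogram
    cd = np.zeros(360)
    for h in range(360):                                # average dE per hue angle
        mask = hue == h
        if mask.any():
            cd[h] = de[mask].mean()

    order = np.argsort(k)                               # ascending by area
    quart = len(order) // 4
    weights = np.empty(360)
    for rank, h in enumerate(order):                    # quartile weights
        weights[h] = (0.25, 0.5, 1.0, 2.25)[min(rank // quart, 3)]

    hist = k * weights
    return float(np.sum(hist * cd ** 2 / 4))            # overall CDimage
```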
2.2 Spatial Filtering

We propose two different spatial filtering methods for the hue angle algorithm. The first spatial filtering is adopted from S-CIELAB [2]. The image goes through color space transformations: first the RGB image is transformed into CIEXYZ and further into the opponent color space (O1, O2, O3) [2]:

O1 = 0.279X + 0.72Y − 0.107Z
O2 = −0.449X + 0.29Y − 0.077Z
O3 = 0.086X − 0.59Y + 0.501Z

Now the image contains a channel with the luminance information (O1), one with the red-green information (O2) and one with the blue-yellow information (O3). Then a spatial filter is applied, where the data in each channel is filtered by a 2-dimensional separable spatial kernel:

f = k Σ_i w_i E_i
where

E_i = k_i e^{−(x² + y²)/σ_i²},
and k_i normalizes E_i such that the filter sums to 1. The parameters w_i and σ_i are different for the color planes, as seen in Table 1. k is a scale factor which normalizes each color plane so that its two-dimensional kernel f sums to one.

Table 1. The parameters used for the spatial filtering, where w_i is the weight of the plane and σ_i is the spread in degrees of visual angle, as described by Zhang and Wandell [2]

Plane        Weights w_i             Spreads σ_i
Luminance    0.921, 0.105, -0.108    0.0283, 0.133, 4.336
Red-Green    0.531, 0.330            0.0392, 0.494
Blue-Yellow  0.488, 0.371            0.0536, 0.386
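A compact Python sketch of this pre-processing (opponent transform plus the sum-of-Gaussians kernels of Table 1) is given below; the conversion of spreads from degrees of visual angle to pixels through a samples-per-degree parameter is an assumption of this illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# (weight, spread in degrees of visual angle) pairs from Table 1
PLANES = [
    [(0.921, 0.0283), (0.105, 0.133), (-0.108, 4.336)],   # luminance
    [(0.531, 0.0392), (0.330, 0.494)],                    # red-green
    [(0.488, 0.0536), (0.371, 0.386)],                    # blue-yellow
]
XYZ_TO_OPP = np.array([[0.279, 0.72, -0.107],
                       [-0.449, 0.29, -0.077],
                       [0.086, -0.59, 0.501]])

def scielab_filter(xyz, samples_per_degree=32):
    """Spatially filter an (H, W, 3) CIEXYZ image in opponent space."""
    opp = xyz @ XYZ_TO_OPP.T
    out = np.empty_like(opp)
    for c, params in enumerate(PLANES):
        acc = np.zeros(opp.shape[:2])
        for w, spread in params:
            # E_i = exp(-(x^2+y^2)/sigma^2) corresponds to a Gaussian with
            # standard deviation sigma/sqrt(2), converted here to pixels
            sigma = spread * samples_per_degree / np.sqrt(2.0)
            acc += w * gaussian_filter(opp[..., c], sigma)
        out[..., c] = acc / sum(w for w, _ in params)   # combined kernel sums to 1
    return out @ np.linalg.inv(XYZ_TO_OPP).T            # back to CIEXYZ
```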
The second spatial filtering proposed is adopted from Johnson and Fairchild [7]. By specifying and implementing the spatial filters using contrast sensitivity functions (CSF) in the frequency domain, rather than in the spatial domain as in the first spatial filtering, more precise control of the filters is obtained [7], but usually at the cost of computational complexity. The luminance filter is a three-parameter exponential function, based on research by Movshon and Kiorpes [8]:

CSF_lum(p) = a · p^c · e^{−b·p}
where a = 75, b = 0.22, c = 0.78, and p is represented in cycles per degree (cpd). The luminance CSF is normalized so that the DC modulation is set to 1.0, resulting in a low-pass filter instead of a bandpass filter. This will also enhance any image differences where the human visual system is most sensitive to them [7]. For the chrominance CSFs, a sum of two Gaussian functions is used:

CSF_chroma(p) = a1 · e^{−b1·p^{c1}} + a2 · e^{−b2·p^{c2}}
where different parameters for a1, a2, b1, b2, c1 and c2 have been used, as seen in Table 2.

Table 2. The parameters used for the spatial filtering in the frequency domain of the chrominance channels

Parameter    a1          b1          c1         a2          b2          c2
Red-Green    109.14130   -0.00038    3.42436    93.59711    -0.00367    2.16771
Blue-Yellow  7.032845    -0.000004   4.258205   40.690950   -0.103909   1.648658
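The following Python fragment sketches how such CSF curves could be built and applied as frequency-domain filters; the mapping from FFT bins to cycles per degree through a samples-per-degree parameter, and the channel layout, are assumptions of this illustration rather than details given in the text.

```python
import numpy as np

def csf_lum(p, a=75.0, b=0.22, c=0.78):
    """Band-pass luminance CSF, normalized to a peak of 1 (the paper further
    raises the low-frequency side to 1 to obtain a low-pass filter)."""
    csf = a * np.power(p, c) * np.exp(-b * p)
    return csf / csf.max()

def csf_chroma(p, a1, b1, c1, a2, b2, c2):
    return a1 * np.exp(-b1 * np.power(p, c1)) + a2 * np.exp(-b2 * np.power(p, c2))

def filter_channel(channel, csf, samples_per_degree=32):
    """Multiply the channel's spectrum by a radially symmetric CSF."""
    h, w = channel.shape
    fy = np.fft.fftfreq(h) * samples_per_degree      # cycles per degree along y
    fx = np.fft.fftfreq(w) * samples_per_degree      # cycles per degree along x
    p = np.hypot(*np.meshgrid(fy, fx, indexing="ij"))
    return np.real(np.fft.ifft2(np.fft.fft2(channel) * csf(p)))

# example usage on an opponent-space image `opp` of shape (H, W, 3):
# lum = filter_channel(opp[..., 0], csf_lum)
# rg  = filter_channel(opp[..., 1], lambda p: csf_chroma(p, 109.14130, -0.00038,
#                      3.42436, 93.59711, -0.00367, 2.16771))
```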
2.3 Applying Spatial Filtering to the Hue Angle Algorithm

The images are spatially filtered with the previously introduced spatial filtering methods. This results in a filtered original and a filtered modified version of the original, which are used as input to the hue angle algorithm, as shown in Figure 1. The hue angle algorithm, applied to images filtered respectively with the first and the second filter, is from now on referred to as SHAME-I and SHAME-II. The new metric will theoretically have several key features from both the S-CIELAB and the hue angle measure:

– Weight allocation: pixels in larger areas of the same color should be weighted higher.
– Simulation of the spatial properties of the human visual system.
– Undetectable distortions are ignored.
– Suitable for different kinds of distortions, not only color patches.
– Generates one value for easy interpretation.
Fig. 1. Workflow of the proposed metrics: transform the image to the opponent colorspace → apply spatial filters → transform the filtered image to CIELAB and then to CIELCH → calculate the histogram → sort the histogram and apply weights → calculate the average color difference → sum the combined hue weights and color differences.
3 Experimental Results and Discussion

Many different image databases have been proposed for the evaluation of image difference metrics. For the evaluation we have used one of these databases [9] together with a dataset of gamut mapped images [10,11,4] and a dataset with lightness changed images [5,6]. Three types of correlation are computed for the results: the Pearson product-moment correlation coefficient, the Spearman rank correlation coefficient and the Kendall tau rank correlation coefficient [12]. The first assumes that the variables are ordinal, and finds the linear relationship between variables. The second, Spearman, is a non-parametric measure of correlation that uses the ranks as basis instead of the actual values. It describes the relationship between variables without making any assumptions about the frequency distribution of the variables. The third, Kendall, is a non-parametric test used to measure the degree of correspondence between two rankings, and to assess the significance of this. The new metric, with the two different spatial filtering methods, is compared against the original hue angle algorithm [3], pixelwise ΔE∗ab, S-CIELAB [2] and S-CIELABJohnson [7] to see if the segmentation done according to the hue angles improves the performance of the metric. We also compare SHAME to SSIM [13] and UIQ [14], both being state-of-the-art metrics. The evaluation performed will show potential differences between the two proposed spatial filtering methods used in SHAME, but also how they perform against other state-of-the-art metrics.

3.1 Evaluation Using the TID2008 Database

The TID2008 database [9] has been used for evaluation of the proposed metric. This database contains a total of 1700 images, with 25 reference images with 17 types of distortions over 4 distortion levels. The mean opinion scores (MOS) are the results of 654 observers attending the experiments. Since the viewing distance was not fixed in the TID2008 database, we have used 32 samples per degree, equal to approximately 60 cm on a normal 17 inch screen. The hue angle algorithm has a low overall correlation for the TID2008 database, as seen in Figure 2. When looking at specific distortions the metric does not perform well; the highest Pearson correlation is 0.375 on the Hard dataset containing noise, compression, blurring and transmission errors. This indicates that the hue angle algorithm should be improved for the distortions found in the TID2008 database.
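For reference, the three correlation measures used throughout this section can be computed, for example, with SciPy; the variable names below are placeholders for a metric's scores and the corresponding MOS values.

```python
from scipy.stats import pearsonr, spearmanr, kendalltau

def correlations(metric_scores, mos):
    """Pearson, Spearman and Kendall correlations between metric scores and MOS."""
    return (pearsonr(metric_scores, mos)[0],
            spearmanr(metric_scores, mos)[0],
            kendalltau(metric_scores, mos)[0])
```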
Dataset   Pearson correlation   Spearman correlation   Kendall correlation
Noise     0.299                 0.311                  0.207
Noise2    0.174                 0.212                  0.161
Safe      0.286                 0.269                  0.177
Hard      0.375                 0.342                  0.243
Simple    0.306                 0.312                  0.224
Exotic    -0.063                -0.093                 -0.046
Exotic2   0.089                 0.064                  0.056
Full      0.179                 0.161                  0.113
Fig. 2. Pearson, Spearman and Kendall correlation coefficients for hue angle algorithm based on the TID2008 database. The hue angle algorithm has a low or medium performance for the different datasets, and low correlation for the full database.
scenes, and due to this a low correlation is found. This indicates that more work is needed for these types of distortions in order to develop better image difference metrics. The same analysis is valid for SHAME-II, but it has a higher correlation for all datasets and for the full database, as seen in Figure 4. It should be noted that the improvement in most cases is minimal; even so, the general performance indicates that precise spatial filtering is important for image difference metrics. The hue angle algorithm was proposed to correct some of the drawbacks of the ΔE∗ab color difference formula. When looking at the overall results from the TID2008 database, the results for these two metrics are very similar (Figure 5). For this database the extension done in the hue angle algorithm does not improve on ΔE∗ab. SHAME-I and SHAME-II have significantly better correlation than the hue angle algorithm and ΔE∗ab. S-CIELAB has been shown to perform better than ΔE∗ab [2], and since the same filtering is used for SHAME-I, S-CIELAB should also be used for comparison. From the results in Figure 5 we can see that both SHAME-I and SHAME-II perform better than S-CIELAB and S-CIELABJohnson.

Dataset   Pearson correlation   Spearman correlation   Kendall correlation
Noise     0.852                 0.865                  0.669
Noise2    0.840                 0.845                  0.646
Safe      0.840                 0.849                  0.658
Hard      0.828                 0.839                  0.645
Simple    0.844                 0.857                  0.680
Exotic    0.052                 0.006                  0.023
Exotic2   0.114                 0.065                  0.076
Full      0.544                 0.550                  0.414

Fig. 3. Pearson, Spearman and Kendall correlation coefficients for SHAME-I based on the TID2008 database. SHAME-I has high correlation coefficients for the datasets, except for Exotic and Exotic2. For the full database it has an average performance.
Dataset   Pearson correlation   Spearman correlation   Kendall correlation
Noise     0.893                 0.905                  0.726
Noise2    0.885                 0.891                  0.709
Safe      0.887                 0.894                  0.717
Hard      0.859                 0.867                  0.678
Simple    0.891                 0.895                  0.726
Exotic    0.098                 0.057                  0.053
Exotic2   0.199                 0.152                  0.126
Full      0.613                 0.609                  0.468

Fig. 4. Pearson, Spearman and Kendall correlation coefficients for SHAME-II based on the TID2008 database. SHAME-II gets high correlation coefficients for the datasets, except for Exotic and Exotic2. For the full database SHAME-II has an average performance.
This shows that the segmentation done according to the hue angle improves the metric when the images are spatially filtered. This also supports the fact that the whole image is not important when judging image difference, but that some areas are more important than others [5,6].

3.2 Evaluation Using Gamut Mapped Images

The TID2008 database contains only one distortion for each image; in order to test the metrics more extensively we have used a dataset with gamut mapped images from Dugay [10,11]. 20 different images were gamut mapped with 5 different algorithms. The 20 different images were evaluated by 20 observers in a pair comparison experiment. This is a more complex task for the observers, since many artifacts must be considered, and also a demanding task for the image difference metrics. Figure 6 shows the results from the dataset with gamut mapped images. In general all metrics have a low performance.
Metric           Pearson correlation   Spearman correlation   Kendall correlation
Hue angle        0.179                 0.161                  0.113
SHAME-I          0.544                 0.550                  0.414
SHAME-II         0.613                 0.609                  0.468
ΔE∗ab            0.174                 0.173                  0.121
S-CIELAB         0.476                 0.482                  0.354
S-CIELABJohnson  0.542                 0.538                  0.400
SSIM             0.547                 0.653                  0.437
UIQ              0.616                 0.606                  0.438
Fig. 5. Comparison of all tested image quality metrics. We can see that SHAME-I and SHAME-II clearly perform better than the hue angle algorithm, and that they perform similarly to SSIM and UIQ. It is also interesting to see how the new metric with the two spatial filtering methods performs compared to S-CIELAB and the improved S-CIELABJohnson: from the figure we can see that SHAME-I and SHAME-II have better correlation than these.
Metric           Pearson correlation   Spearman correlation   Kendall correlation
Hue angle        0.052                 0.114                  0.076
SHAME-I          0.047                 0.082                  0.054
SHAME-II         0.035                 0.077                  0.053
S-CIELAB         0.056                 0.105                  0.073
S-CIELABJohnson  0.029                 0.104                  0.071
ΔE∗ab            0.042                 0.107                  0.071
SSIM             0.163                 0.054                  0.044
UIQ              0.005                 -0.089                 -0.055

Fig. 6. SHAME-I and SHAME-II compared against other metrics for a set of gamut mapped images. All metrics have a low performance on the gamut mapped images, indicating that calculating the difference between an original and a gamut mapped image is very difficult for image difference metrics.
This was probably because the task is very complex: in gamut mapping multiple artifacts can occur, and the observers may judge them differently [10,11]. Previous research has shown that image difference metrics have problems when multiple distortions occur simultaneously, as in gamut mapping [15,16]. This is not the case for TID2008, since only one artifact at a time occurs in the images.

3.3 Evaluation Using Luminance Changed Images

The last dataset used for the evaluation has previously been used by Pedersen [6] and Pedersen et al. [5], where four images were modified in lightness, both globally and locally, resulting in 32 reproductions. This dataset differs from the previous ones due to the controlled changes only in lightness, and this should be easier for the metrics to judge than the gamut mapped images. SHAME-II has a higher correlation than SHAME-I and the hue angle algorithm, indicating that the spatial filtering done in SHAME-II improves the hue angle algorithm.
Pearson Spearman Kendall correlation correlation correlation
Hue angle SHAME-I SHAME-II ∗ ΔEab S-CIELAB S-CIELABJohnson SSIM UIQ
0.452 0.078 0.509 0.464 0.467 0.500 0.762 0.370
0.507 0.036 0.670 0.618 0.637 0.629 0.586 0.396
0.383 0.024 0.528 0.472 0.488 0.472 0.464 0.270
Fig. 7. SHAME-I and SHAME-II compared against other metrics for the lightness changed image from [5,6]. We notice that SHAME-II outperforms SHAME-I, but only a minor improvement over the hue angle algorithm. The SSIM is better than SHAME-II for the Pearson correlation, but SHAME-II is better for Spearman and Kendall, indicating that the ranking by SHAME-II is more correct than the ranking by SSIM.
SHAME-I does not have the same high correlation, and is clearly worse than the rest. When analyzing the results we can see that the SHAME-I metric miscalculated images that had a low mean luminance compared to images with a high mean luminance. We can also notice that SHAME-II has a higher Spearman and Kendall correlation than SSIM, but a lower Pearson. This indicates that the ranking done by SHAME-II is more correct than the ranking by SSIM, but that SSIM has a more correct frequency distribution. The results indicate that the more precise spatial filtering and the bandpass nature of the filter in SHAME-II are important for the performance of the metric; therefore the filtering in SHAME-II should be preferred over SHAME-I.
4 Conclusion and Further Research

The proposed metric, SHAME, uses well-known spatial filtering methods to improve a color image difference metric, which results in several advantages. Extensive testing of the proposed metrics shows an improvement over the traditional metrics, such as pixelwise ΔE∗ab and S-CIELAB. We have demonstrated the importance of weighting areas of interest and the importance of spatial filtering for color image difference metrics. The results indicate that precise control of the spatial filters will improve the performance of the metric, and therefore SHAME-II gives an advantage over SHAME-I. State-of-the-art image difference metrics also show weaknesses when judging the difference between an original and a modified version of it when more than one distortion occurs; more research should be carried out to improve the metrics in this field, both in terms of difference calculation and spatial filtering.
Acknowledgments

The authors would like to thank Gabriele Simone and Fritz Albregtsen for their advice, suggestions and feedback regarding this project. The author hereof has been enabled by Océ-Technologies B.V. to perform research activities which underlie this document. This document has been written in a personal capacity. Océ-Technologies B.V. disclaims any liability for the correctness of the data, considerations and conclusions contained in this document.
References 1. CIE: Colorimetry. Technical Report 15 (2004) 2. Zhang, X., Wandell, B.: A spatial extension of CIELAB for digital color image reproduction. In: Soc. Inform. Display 96, San Diego, 731–734 (1996), http://white.stanford.edu/˜ brian/scielab/scielab.html 3. Hong, G., Luo, M.: Perceptually based colour difference for complex images. In: Chung, R., Rodrigues, A. (eds.) 9th Congress of the International Colour Association. Proceedings of SPIE, vol. 4421, pp. 618–621 (2002) 4. Pedersen, M., Hardeberg, J.Y.: Rank order and image difference metrics. In: CGIV 2008 Fourth European Conference on Color in Graphics, Imaging and Vision, Terrassa, Spain, IS&T, pp. 120–125 (June 2008)
5. Pedersen, M., Hardeberg, J.Y., Nussbaum, P.: Using gaze information to improve image difference metrics. In: Rogowitz, B., Pappas, T. (eds.) Human Vision and Electronic Imaging VIII (HVEI 2008), San Jose, USA. SPIE proceedings, vol. 6806. SPIE (January 2008) 6. Pedersen, M.: Importance of region-of-interest on image difference metrics. Master’s thesis, Gjøvik University College (2007) 7. Johnson, G.M., Fairchild, M.D.: Darwinism of color image difference models. In: The 9th Color Imaging Conference: Color Science and Engineering: Systems, Technologies, Applications, pp. 108–112 (2001) 8. Movshon, J.A., Kiorpes, L.: Analysis of the development of spatial sensitivity in monkey and human infants. J. Opt. Soc. Am. A 5, 2166–2172 (1988) 9. Ponomarenko, N., Lukin, V., Egiazarian, K., Astola, J., Carli, M., Battisti, F.: Color image database for evaluation of image quality metrics. In: International Workshop on Multimedia Signal Processing, Cairns, Queensland, Australia (October 2008), http://www.ponomarenko.info/tid2008.htm 10. Dugay, F.: Perceptual evaluation of colour gamut mapping algorithms. Master thesis, Gjøvik University College and Grenoble Institute of Technology (2007) 11. Dugay, F., Farup, I., Hardeberg, J.Y.: Perceptual evaluation of colour gamut mapping algorithms. Color Research & Application 33(6), 470–476 (2008) 12. Kendall, M.G., Stuart, A., Ord, J.K.: Kendall’s Advanced Theory of Statistics: Classical inference and relationship, 5th edn., vol. 2. A Hodder Arnold Publication (1991) 13. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13(4), 600–612 (2004) 14. Wang, Z., Bovik, A.: A universal image quality index. IEEE Signal Processing Letters 9, 81–84 (2002) 15. Hardeberg, J.Y., Bando, E., Pedersen, M.: Evaluating colour image difference metrics for gamut-mapped images. Coloration Technology 124(4), 243–253 (2008) 16. Bonnier, N., Schmitt, F., Brettel, H., Berche, S.: Evaluation of spatial gamut mapping algorithms. In: 14th Color Imaging Conference. IS&T/SID, vol. 14, pp. 56–61 (2006)
Structure Tensor of Colour Quaternion Image Representations for Invariant Feature Extraction

Jesús Angulo

CMM-Centre de Morphologie Mathématique, Mathématiques et Systèmes, MINES Paristech; 35, rue Saint Honoré, 77305 Fontainebleau Cedex, France
[email protected]
Abstract. Colour image representation using real quaternions has shown to be very useful for linear and morphological colour filtering. This paper deals with the extension of first derivatives-based structure tensor for various quaternionic colour image representations. Classical corner and edge features are obtained from eigenvalues of the quaternionic colour structure tensors. We study the properties of invariance of the quaternion colour spatial derivatives and their robustness for feature extraction on practical examples.
1 Introduction
A colour point can be represented according to different geometric algebra structures. Real quaternions have been considered in recent years to represent and to perform colour transformations by taking into account the 3D vector nature of colour triplets. Quaternion-based colour operations, such as the colour Fourier transform, colour convolution and linear filters, have been studied mainly by [9,16,10] and by [7], and used to build colour PCA by [18]. Quaternion representations of normalized colours were used by [5] to construct colour edge detectors based on the Prewitt edge detector. We have also recently explored the interest of colour quaternions for extending mathematical morphology to colour images [2]. Extraction of differential-based features such as edges, corners or salient points is a necessary low-level image processing task in many applications such as segmentation, tracking, object matching and object classification. These features are based on Gaussian-filtered combinations of spatial derivatives [11] [3]. The most stable differential invariants involve only derivatives up to order "one" [12]. Di Zenzo stated, in [8], that a simple summation of the derivatives ignores the correlation between the channels, and proposed in his pioneering work a tensor representation of colour derivatives in order to compute the colour gradient by considering the colour image as a surface in R³. Later, Sochen et al. [19] considered a colour image as a two-dimensional manifold embedded in the five-dimensional non-Euclidean space whose coordinates are (x, y, R, G, B) ∈ R⁵, which is described by the Beltrami colour metric tensor. Colour tensor based methods have been used in various colour feature detection algorithms [6] [15] [17] [13] [21]. Weijer et al. [22] proposed a framework to combine the differential-based features with the photometric invariance theory, in order to obtain colour
photometric invariant edges, corners, etc. We propose in this study a parallel framework combining the alternative colour quaternion representations with the structure tensor. We also focus on the extraction of features such as edges and corners which present interesting colour invariance properties.
2 Colour Quaternion Image Representations
Colour Quaternion Representations. Let c = (r, g, b) be the triplet of the red, green and blue intensities for the pixel of a digital colour image. According to previous works on the representation of colour by quaternions, we consider the gray-centered RGB colour-space [9]. In this space, the unit RGB cube is translated so that the coordinate origin Ô(0, 0, 0) represents mid-gray (the middle point of the gray axis, or half-way between black and white). In order to better exploit the power of quaternion algebra, we have recently proposed in [2] to represent each colour c by a full real quaternion q in its hypercomplex form, i.e.,

c = (r, g, b) ⇒ q = ψ(c, c0) + i r̂ + j ĝ + k b̂,

where ĉ = (r̂, ĝ, b̂) = (r − 1/2, g − 1/2, b − 1/2). A quaternion has a real part or scalar part, S(q) = ψ(c, c0), and an imaginary part or vector part, V(q) = r̂i + ĝj + b̂k, such that the whole quaternion may be represented by the sum of its scalar and vector parts as q = S(q) + V(q). A quaternion with a zero real/scalar part is called a pure quaternion. The scalar component, ψ(c, c0), is a real value obtained from the current colour point and a colour of reference c0 = (r0, g0, b0). The reference c0 can be for instance the white point (1, 1, 1), but also any other colour which should impose a particular effect on the associated operator. We have considered three possible definitions for the scalar part.
1) Saturation: ψ(c, c0)_sat = s − 1/2, where s is the saturation of the luminance/saturation/hue representation in norm L1 [1].
2) Mass with respect to c0: ψ(c, c0)_mass = exp(−w_E ‖c − c0‖ − w_∠ arccos(c·c0 / (‖c‖ ‖c0‖))), where w_E = (1/√2)λ and w_∠ = (2π)⁻¹(1 − λ), with 0 ≤ λ ≤ 1.
3) Potential with respect to c0 and the nine significant colour points in the RGB unit cube: ψ(c, c0)_pot = φ⁺_E + φ⁻_E = κ⁺/(4πε₀‖c − c0‖) + Σ_{n=−4}^{4} κ⁻/(4πε₀‖c − cn‖), where the positive potential φ⁺_E represents the influence of a positive charge placed at the position of the reference colour c0 and the negative term φ⁻_E corresponds to the potential associated to nine negative charges in the significant colours of the RGB cube: (r, g, b, c, m, y, w, b and mg), {cn}_{n=−4}^{4}.
More details on the properties of these scalar parts are given in [2]. Any quaternion may be represented in polar form as q = ρe^{ξθ}, with ρ = √(a² + b² + c² + d²), ξ = (bi + cj + dk)/√(b² + c² + d²) and θ = arctan(√(b² + c² + d²)/a). In this polar formulation, ρ = |q| is the modulus of q; ξ is the pure unitary quaternion associated to q (by the normalisation, the quaternion representation of a colour discards distance information, but retains orientation information relative to mid-gray, which corresponds in fact to the chromatic or hue-related information), sometimes called the eigenaxis; and θ is the angle, sometimes called the eigenangle,
between the real part and the 3D imaginary part. The eigenaxis of a colour quaternion, $\xi$, is independent of its scalar part. The imaginary term $\sqrt{b^2 + c^2 + d^2} = \sqrt{(r - 1/2)^2 + (g - 1/2)^2 + (b - 1/2)^2}$ is the norm of the colour vector in the centered cube and can be considered as a perceived energy of the colour (i.e., relative energy with respect to the mid-gray), being maximal for the eight significant colours associated to the cube corners. Note that black and white have the same value as the six chromatic colours. The modulus $\rho$ is an additive combination of the imaginary part and the scalar part.
Using the product of quaternions, it is possible to describe vector decompositions. A full quaternion q may be decomposed about a pure unit quaternion $p_u$ in its parallel/perpendicular form [9]: $q = q_{\perp} + q_{\parallel}$; the parallel part of q according to $p_u$, also called the projection part, is given by $q_{\parallel} = S(q) + V_{\parallel}(q)$, and the perpendicular part, also named the rejection part, is obtained as $q_{\perp} = V_{\perp}(q)$, where $V_{\parallel}(q) = \frac{1}{2}\left(V(q) - p_u V(q) p_u\right)$ and $V_{\perp}(q) = \frac{1}{2}\left(V(q) + p_u V(q) p_u\right)$. In the case of colour quaternions, $p_u$ corresponds to the pure unit quaternion associated to the reference colour $c_0$, which is denoted $q_{u_0}$. It should be remarked that the rejection part is a pure quaternion and that its value is independent of the scalar part of q, but, of course, it depends on the reference colour used for the decomposition. We can particularise the expression to obtain the following parallel part and perpendicular part for the unit quaternion associated to $c_0 = (r_0, r_0, r_0)$ ($\Rightarrow r_{0,u} = 1/\sqrt{3}$), which represents the decomposition along the grey axis:
$$V_{\parallel}(q) = \tfrac{1}{3}\left[(\hat{r} + \hat{g} + \hat{b})\,i + (\hat{r} + \hat{g} + \hat{b})\,j + (\hat{r} + \hat{g} + \hat{b})\,k\right], \qquad V_{\perp}(q) = \tfrac{1}{3}\left[(2\hat{r} - \hat{g} - \hat{b})\,i + (2\hat{g} - \hat{r} - \hat{b})\,j + (2\hat{b} - \hat{r} - \hat{g})\,k\right].$$
We notice that these projection components correspond respectively to the luminance and the chromaticity terms [1]. Hence the colour image is decomposed into the intensity information along the grey axis (parallel part) and the chromatic information (perpendicular part). Taking another example, for instance $c_0 = (r_0, r_0/2, 0)$ ($\Rightarrow r_{0,u} = 2/\sqrt{5}$), we have
$$V_{\parallel}(q) = \tfrac{4}{5}\left[(\hat{r} + \hat{g}/2)\,i + (\hat{r}/2 + \hat{g}/4)\,j\right] \qquad \text{and} \qquad V_{\perp}(q) = \tfrac{4}{5}(\hat{r}/4 - \hat{g}/2)\,i + \tfrac{4}{5}(\hat{g} - \hat{r}/2)\,j + \hat{b}\,k.$$
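To make the projection/rejection decomposition concrete, the following sketch (illustrative NumPy code, not from the paper) computes the gray-centered colour vector and its parts parallel and perpendicular to an arbitrary reference colour; the function names and the interface are hypothetical. For c0 = (1, 1, 1) the two parts reduce to the luminance-like and chromatic terms given above. Note that in the paper the parallel part also carries the scalar component S(q); the sketch returns only the vector parts.

```python
import numpy as np

def par_perp_decomposition(c, c0):
    """Split a gray-centered colour vector into the parts parallel and
    perpendicular to the pure unit quaternion defined by a reference colour.

    c, c0 : RGB triplets in [0, 1]; c0 must not be exactly mid-gray.
    Returns (v_par, v_perp), whose sum is the centered colour vector V(q).
    """
    c_hat = np.asarray(c, dtype=float) - 0.5   # gray-centered colour, V(q)
    u = np.asarray(c0, dtype=float) - 0.5      # centered reference colour
    u = u / np.linalg.norm(u)                  # axis of the pure unit quaternion q_u0
    v_par = np.dot(c_hat, u) * u               # projection part V_par(q)
    v_perp = c_hat - v_par                     # rejection part V_perp(q)
    return v_par, v_perp

# Example: decomposition along the grey axis, c0 = (1, 1, 1)
v_par, v_perp = par_perp_decomposition((0.8, 0.4, 0.2), (1.0, 1.0, 1.0))
print(v_par)   # intensity (luminance-like) part
print(v_perp)  # chromatic part
```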
Quaternionic Colour Images. Let f(x, y) = (f_R(x, y), f_G(x, y), f_B(x, y)) be a colour image in the standard red, green and blue representation, i.e., a multivariate image of three vector components. Associating to each colour pixel the three alternative colour quaternion representations, and after fixing the scalar part $\psi_{c_0}$ and consequently the reference colour $c_0$, three quaternionic colour images can be defined. The hypercomplex quaternion colour image is denoted as $f_{hyper}(x, y) = \left(f_{\psi_{c_0}}(x, y), f_i(x, y), f_j(x, y), f_k(x, y)\right)$, where $f_i(x, y) = \hat{f}_R(x, y)$, $f_j(x, y) = \hat{f}_G(x, y)$ and $f_k(x, y) = \hat{f}_B(x, y)$. The polar quaternion colour image is given by the following 5-variable function $f_{polar}(x, y) = \left(f_{\rho}(x, y), f_{\theta}(x, y), f_{\xi,i}(x, y), f_{\xi,j}(x, y), f_{\xi,k}(x, y)\right)$. Finally, the parallel/perpendicular quaternion colour image is composed of two functions
$$f_{par/pen}(x, y) = \left(f_{\parallel}(x, y), f_{\perp}(x, y)\right) = \left(f_{\psi_{c_0}}(x, y), f_{\parallel,i}(x, y), f_{\parallel,j}(x, y), f_{\parallel,k}(x, y), f_{\perp,i}(x, y), f_{\perp,j}(x, y), f_{\perp,k}(x, y)\right).$$
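As an illustration of the polar variables entering $f_{polar}$, the sketch below (hypothetical code, not from the paper) converts a single RGB pixel, together with a chosen scalar part, into its modulus, eigenangle and eigenaxis; applying it pixel-wise yields the 5-variable polar image.

```python
import numpy as np

def polar_quaternion(c, psi):
    """Polar form (rho, theta, xi) of the colour quaternion q = psi + i*r^ + j*g^ + k*b^.

    c   : RGB triplet in [0, 1]
    psi : chosen scalar part, e.g. psi_sat = s - 1/2
    """
    v = np.asarray(c, dtype=float) - 0.5       # gray-centered vector part
    nv = np.linalg.norm(v)                     # |V(q)|, the "perceived energy"
    rho = np.hypot(psi, nv)                    # modulus |q|
    theta = np.arctan2(nv, psi)                # eigenangle (arctan of |V(q)| / S(q))
    xi = v / nv if nv > 0 else np.zeros(3)     # eigenaxis, a pure unit quaternion
    return rho, theta, xi

rho, theta, xi = polar_quaternion((0.9, 0.2, 0.1), psi=0.25)
```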
3 2D Structure Tensor for Quaternionic Colour Images
Spatial First-Derivatives and Structure Tensor. Given a multivariate image i(x, y), the 2D structure tensor is defined as [3]:
$$G(i)(x, y) = \omega_{\sigma} * \left(\nabla i(x, y)\,\nabla i(x, y)^{T}\right) = \begin{pmatrix} g_{xx}(x, y) & g_{xy}(x, y) \\ g_{xy}(x, y) & g_{yy}(x, y) \end{pmatrix}$$
This 2 × 2 matrix represents the averaged dyadic product of the 2D spatial intensity gradient $\nabla i(x, y) = \left(\frac{\partial i(x, y)}{\partial x}, \frac{\partial i(x, y)}{\partial y}\right)^{T}$, where $\omega_{\sigma}$ stands for a Gaussian smoothing with a standard deviation of $\sigma$. The elements of the tensor are invariant under rotation and translation of the spatial axes. This tensor must not be mistaken for the Hessian matrix, which involves second derivatives. In the case of a standard RGB colour image we have G(f)(x, y) with $g_{xx}(x, y) = \omega_{\sigma} * (f_R^x)^2 + \omega_{\sigma} * (f_G^x)^2 + \omega_{\sigma} * (f_B^x)^2$ (idem. $g_{yy}(x, y)$ mutatis mutandis $f_C^x$ by $f_C^y$) and $g_{xy}(x, y) = \omega_{\sigma} * (f_R^x f_R^y) + \omega_{\sigma} * (f_G^x f_G^y) + \omega_{\sigma} * (f_B^x f_B^y)$, where to simplify the notation the spatial derivatives of colour component C are $f_C^x = \frac{\partial f_C(x, y)}{\partial x}$ and $f_C^y = \frac{\partial f_C(x, y)}{\partial y}$.
Feature Detectors. The two real eigenvalues of the structure tensor G(x, y) at each point are given by $\lambda_{1,2} = \frac{1}{2}\left(g_{xx} + g_{yy} \pm \sqrt{(g_{xx} - g_{yy})^2 + (2 g_{xy})^2}\right)$. These eigenvalues are correlated with the local image properties of edgeness and cornerness, i.e., $\lambda_1 \gg 0$, $\lambda_2 \approx 0$ and $\lambda_1 \approx \lambda_2$ respectively. More precisely, based on the spatial functions $\lambda_1(x, y)$ and $\lambda_2(x, y)$, several edge and corner feature indicator functions have been proposed in the literature. The classical Harris and Stephens corner detector [11] works on the Gaussian curvature of the surface, and this quantity is corrected by the square average of the principal curvature: $H(i)(x, y) = \det(G(x, y)) - k\,\mathrm{trace}(G(x, y))^2 = \lambda_1(x, y)\lambda_2(x, y) - k\left(\lambda_1(x, y) + \lambda_2(x, y)\right)^2$, with typically k = 0.04. Instead of this original formulation, we propose to use for corner detection the modification proposed by Noble [14], which leads to better results without needing the parameter k:
$$N(i)(x, y) = \frac{\lambda_1(x, y)\,\lambda_2(x, y)}{\lambda_1(x, y) + \lambda_2(x, y)}.$$
Most of the previous works consider the magnitude of $\lambda_1(x, y)$ for edge detection. We prefer here the sounder theoretical approach introduced by Sochen et al. [19], based on the determinant of the Beltrami colour metric tensor $\tilde{G}$. The colour metric tensor can be reformulated as a function of the 2D structure tensor [4]: $\tilde{G}(i)(x, y) = I_2 + G(i)(x, y)$, where $I_2$ is the identity matrix. The Beltrami colour edge can now be defined as
$$B(i)(x, y) = \det(\tilde{G}(x, y)) = 1 + \mathrm{trace}(G(x, y)) + \det(G(x, y)) = 1 + \left(\lambda_1(x, y) + \lambda_2(x, y)\right) + \lambda_1(x, y)\lambda_2(x, y).$$
Adaptation to Quaternionic Representations: Quaternionic Image Derivatives. The various quaternionic image representations involve different structure tensor values. Here we formulate the quaternionic image derivatives in terms of the RGB spatial derivatives, i.e., $f_R^x, f_G^x, f_B^x$ for the x-derivative. The idea of this calculation is basically to apply the derivative Chain Rule. To be more precise, let r = r(x, y), g = g(x, y) and b = b(x, y) have first-order partial derivatives at the point (x, y) and suppose that the multivariate function i = i(r, g, b) = (r(x, y), g(x, y), b(x, y)) is differentiable in r, g and b. The first-order partial derivative at point (x, y) is given by $\frac{\partial i}{\partial x} = \frac{\partial i}{\partial r}\frac{\partial r}{\partial x} + \frac{\partial i}{\partial g}\frac{\partial g}{\partial x} + \frac{\partial i}{\partial b}\frac{\partial b}{\partial x}$, and idem. for $\frac{\partial i}{\partial y}$.
Let us start with the scalar part. For the saturation it is obtained that
$$f_{\psi_{sat},c_0}^{x} = \begin{cases} \tfrac{3}{2} f_{max}^{x} - \tfrac{1}{2} f_R^{x} - \tfrac{1}{2} f_G^{x} - \tfrac{1}{2} f_B^{x} & \text{if } f_m \geq f_{med} \\[4pt] \tfrac{1}{2} f_R^{x} + \tfrac{1}{2} f_G^{x} + \tfrac{1}{2} f_B^{x} - \tfrac{3}{2} f_{min}^{x} & \text{if } f_m \leq f_{med}, \end{cases}$$
and identically for $f_{\psi_{sat},c_0}^{y}$. The term $f_{\psi_{sat},c_0}^{x}$ is quasi-invariant to specular changes [21], and consequently in this way $f_{hyper}^{x}$ decorrelates the derivative with respect to specular variations; note that $(f_i^x, f_j^x, f_k^x)$ is the specular variant part. In the case of the mass with respect to $c_0$, for the sake of simplicity we fix $\lambda = 1$, and we have:
$$f_{\psi_{mass},c_0}^{x} = \Theta(c_0)\left[(f_R - r_0) f_R^{x} + (f_G - g_0) f_G^{x} + (f_B - b_0) f_B^{x}\right],$$
where $\Theta(c_0) = -w_E (\Delta_{c_0})^{-1/2} \exp\!\left(-w_E (\Delta_{c_0})^{1/2}\right)$ with $\Delta_{c_0} = (f_R - r_0)^2 + (f_G - g_0)^2 + (f_B - b_0)^2$. In the case of the potential function as scalar part, the derivative involves quite similar terms:
$$f_{\psi_{pot},c_0}^{x} = -9Q\,(\Delta_{c_0})^{-3/2}\left[(f_R - r_0) f_R^{x} + (f_G - g_0) f_G^{x} + (f_B - b_0) f_B^{x}\right] + \sum_{n=-4}^{4} Q\,(\Delta_{c_n})^{-3/2}\left[(f_R - r_n) f_R^{x} + (f_G - g_n) f_G^{x} + (f_B - b_n) f_B^{x}\right].$$
We observe that both $f_{\psi_{mass},c_0}^{x}$ and $f_{\psi_{pot},c_0}^{x}$ weight each colour derivative with the corresponding distance term to the reference colour. Using these terms in $f_{hyper}^{x}$ we are able to introduce a particular effect with respect to the reference colour $c_0$.
The derivative in the polar representation $f_{polar}^{x}$ has, on the one hand, the three intensity normalised derivatives from $\xi$, i.e., $f_{\xi,i}^{x} = f_R^{x}/\sqrt{\mu}$, $f_{\xi,j}^{x} = f_G^{x}/\sqrt{\mu}$, $f_{\xi,k}^{x} = f_B^{x}/\sqrt{\mu}$, where $\mu = f_i^2 + f_j^2 + f_k^2$. For the modulus and the eigenangle, it is achieved respectively
$$f_{\rho}^{x} = \frac{f_R f_R^{x} + f_G f_G^{x} + f_B f_B^{x}}{\sqrt{\psi_{c_0}^2 + \mu}} + \frac{\psi_{c_0}}{\sqrt{\psi_{c_0}^2 + \mu}}\,\psi_{c_0}^{x} \qquad \text{and} \qquad f_{\theta}^{x} = \frac{\psi_{c_0}}{\sqrt{\mu}\,(\psi_{c_0}^2 + \mu)}\left(f_R f_R^{x} + f_G f_G^{x} + f_B f_B^{x}\right) - \frac{\sqrt{\mu}}{\psi_{c_0}^2 + \mu}\,\psi_{c_0}^{x}.$$
These last two expressions depend particularly on the derivative of the scalar part. But in any case,
quite complex factors weight the values of each colour derivative; consequently their precise analytical analysis is not easy, even if their practical interest can be explored from empirical examples.
As shown in Section 2, the parallel/perpendicular quaternionic variables depend strongly on the reference $c_0$. Choosing the significant point $c_0 = (1, 1, 1)$, the derivatives of the parallel quaternion image are:
$$f_{\parallel}^{x} = \left(\frac{f_R^{x} + f_G^{x} + f_B^{x}}{3}, \frac{f_R^{x} + f_G^{x} + f_B^{x}}{3}, \frac{f_R^{x} + f_G^{x} + f_B^{x}}{3}\right) \quad \text{and} \quad f_{\perp}^{x} = \left(\tfrac{2}{3} f_R^{x} - \tfrac{1}{3} f_G^{x} - \tfrac{1}{3} f_B^{x},\; \tfrac{2}{3} f_G^{x} - \tfrac{1}{3} f_R^{x} - \tfrac{1}{3} f_B^{x},\; \tfrac{2}{3} f_B^{x} - \tfrac{1}{3} f_R^{x} - \tfrac{1}{3} f_G^{x}\right).$$
Hence, this decomposition with $c_0 = (1, 1, 1)$ decorrelates the derivative information into an intensity variant part and the shadow-shading-specular quasi-invariant part. In addition, the choice of other references $c_0$ leads to other interesting decorrelations in this framework. For instance, the simplest $c_0 = (1, 0, 0)$ produces $f_{\parallel}^{x} = (f_R^{x}, 0, 0)$ and $f_{\perp}^{x} = (0, f_G^{x}, f_B^{x})$, that is, a decomposition along the red derivative and the orthogonal green/blue derivatives. In a similar way, for the colour $c_0 = (1, 1/2, 0)$ it is obtained $f_{\parallel}^{x} = \left(\tfrac{4}{5} f_R^{x} + \tfrac{2}{5} f_G^{x},\; \tfrac{2}{5} f_R^{x} + \tfrac{1}{5} f_G^{x},\; 0\right)$ and $f_{\perp}^{x} = \left(\tfrac{1}{5} f_R^{x} - \tfrac{2}{5} f_G^{x},\; \tfrac{4}{5} f_G^{x} - \tfrac{2}{5} f_R^{x},\; f_B^{x}\right)$. For many practical situations, it is better to take advantage of the decomposition and to use separately $f_{\parallel}^{x}$ and $f_{\perp}^{x}$. Note also that the global derivative of the function $f_{par/pen}$ includes also the term of the scalar part $f_{\psi_{c_0}}^{x}$ associated to the parallel quaternion.
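A compact way to exercise these detectors is to assemble the structure tensor from smoothed products of channel derivatives and then evaluate the Noble and Beltrami energies from its eigenvalues. The sketch below is illustrative code, not the paper's implementation; it accepts any stack of channels (for instance the RGB channels or the perpendicular-part derivatives just discussed), and the finite differences and Gaussian smoothing choices are assumptions of the sketch.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def structure_tensor_energies(channels, sigma=2.0):
    """Noble corner energy N and Beltrami edge energy B of a multivariate image.

    channels : iterable of 2D arrays, the components of i(x, y)
    sigma    : standard deviation of the Gaussian smoothing omega_sigma
    """
    gxx = gyy = gxy = 0.0
    for c in channels:
        cy, cx = np.gradient(c.astype(float))            # spatial first derivatives
        gxx = gxx + gaussian_filter(cx * cx, sigma)       # omega_sigma * (f^x)^2
        gyy = gyy + gaussian_filter(cy * cy, sigma)
        gxy = gxy + gaussian_filter(cx * cy, sigma)
    trace = gxx + gyy
    det = gxx * gyy - gxy ** 2
    disc = np.sqrt(np.maximum((gxx - gyy) ** 2 + 4.0 * gxy ** 2, 0.0))
    lam1 = 0.5 * (trace + disc)                           # larger eigenvalue
    lam2 = 0.5 * (trace - disc)                           # smaller eigenvalue
    noble = det / (trace + 1e-12)                         # N = lam1*lam2 / (lam1 + lam2)
    beltrami = 1.0 + trace + det                          # B = det(I2 + G)
    return noble, beltrami, lam1, lam2
```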
4 Application to Invariant Feature Detection
We propose now to extract significant features from the corner energy image N(i)(x, y) and the edge energy image B(i)(x, y). These quite simple algorithms are based on morphological tools.
Colour Corner Extraction. The corner extraction from image N(i)(x, y) requires two classical steps. 1) The local maxima within a ball of radius R are obtained by finding the invariant points to a gray level dilation, i.e., M(x, y) = 1 if $\delta_R(N(i))(x, y) = N(i)(x, y)$, otherwise M(x, y) = 0. For all our examples, R = 3 × 3 pixels. This binary mask is used to recover the initial intensities at the maxima points: m(x, y) = N(i)(x, y) × M(x, y) (where × is the pointwise multiplication). 2) Thresholding the function m(x, y) at value $u_{max}$ to keep only the most significant maxima: $\hat{M}(x, y) = 1$ iff $m(x, y) > u_{max}$, where the parameter is defined as a percentage of the maximal intensity of N(i), $u_{max} = (\alpha_{max}/100)\,\max(N(i)(x, y))$. For all our examples, $\alpha_{max}$ = 4%.
Colour Geometry Sketch. The colour geometry sketch, more original, can be interpreted as a rough definition of the main colour image edges. The idea is to have a representation of the distribution of the edges according to their colour and geometry (size, orientation, etc.) without defining the regions. This simple descriptor can be useful for tracking, registering, etc. The needed steps are: 1) Intensity transformation of image N(i)(x, y) using a γ-correction function
[Figure panels: f(x, y); B(f_hyper)(x, y) with ψ_sat, c0 = (1, 1, 1); B(f)(x, y) with ψ_mass, c0 = (1, 0, 0); N(f_hyper)(x, y) with ψ_sat, c0 = (1, 1, 1); N(f_hyper)(x, y) with ψ_mass, c0 = (1, 0, 0); N(f)(x, y) with ψ_mass, c0 = (1, 0, 0)]
Fig. 1. Comparison with the image “Road” (from [20]) of colour geometry sketch for two Beltrami quaternionic colour edge energy functions B(i)(x, y) and of colour corners for three Noble quaternionic colour corner energy functions N (i)(x, y)
followed by adjustment of dynamics in a discrete interval $[t_{min}, t_{max}]$: $e(x, y) = (N(i)(x, y))^{\gamma}$ and $\hat{e}(x, y) = t_{min} + \frac{e(x, y) - \min(e(x, y))}{\max(e(x, y)) - \min(e(x, y))}\, t_{max}$. For all our examples γ = 2 and $[t_{min}, t_{max}] = [0, 255]$. 2) Extraction of the contours of the flat-zones of image $\hat{e}$ having a minimal area of $a_{region}$ pixels: E(x, y) = 1 iff $\hat{e}(x + i, y + j) = \hat{e}(x, y)$ $\forall i, j \in \{0, 1\}$ and $\mathrm{Area}(R(x, y)) > a_{region}$, where R(x, y) is the connected component associated to point (x, y). In practice, this step is implemented using a region growing procedure to remove the regions of size $\leq a_{region}$. For all our examples $a_{region}$ = 50 pixels. 3) Contour filtering using the supremum of orientated linear openings of size $l_{min}$: $S(x, y) = \bigvee_{\beta} \gamma^{L}_{l_{min},\beta}$, where the used orientations are β = {0°, 45°, 60°, 90°, 120°, 135°}. The aim is to regularize the contours and to keep only those of length greater than $l_{min}$ according to the 6 main directions of the discrete grid. In fact, the preserved contour segments are then valued with the original colours: s(x, y) = S(x, y) × f(x, y).
For all the examples of this paper, the Gaussian derivatives are computed with σ = 2. In Fig. 1 an example is given with a road image from the ROMA
[Figure panels: f(x, y); N(f_par/pen)(x, y) with ψ_mass, c0 = (1, 1, 1); N(f_⊥)(x, y) with ψ_mass, c0 = (1, 1, 1)]
Fig. 2. Invariance of colour corners extraction using two Noble quaternionic colour corner energy functions N (i)(x, y). The three images were acquired from the same scene under different lighting conditions (natural light, fluorescence tube, tungsten bulb).
database [20]. The aim of this comparative example is to illustrate the colour geometry sketch and the colour corner extraction, and in particular to show how the effect associated to a particular reference colour can be imposed. As we can observe, the hypercomplex representation with c0 = (1, 1, 1) extracts the main contours and corners independently of their colour. If we focus on the red features by fixing c0 = (1, 0, 0), the parallel part representation focuses more specifically on the target colour structure, although, of course, other very significant unsaturated structures are also detected. The illumination invariance of colour corner extraction is illustrated in Fig. 2, where the three images were acquired from the same scene under different lighting conditions (natural light, fluorescence tube, tungsten bulb). We verify in particular, as expected in theory, that the chromatic information represented by the perpendicular part to c0 = (1, 1, 1) is more robust than the intensity information (i.e., the parallel part). Using two images differing in viewpoint and orientation of lighting, and including highlights, we evaluate in Fig. 3 the invariance of the colour geometry sketch using two Beltrami
[Figure panels: f(x, y); B(f_hyper)(x, y) with ψ_mass, c0 = (1, 1, 1); B(f_⊥)(x, y) with ψ_mass, c0 = (1, 1, 1)]
Fig. 3. Invariance of colour geometry sketch using two Beltrami quaternionic colour edge energy functions B(i)(x, y). The two images differ in viewpoint and orientation of lighting.
quaternionic colour edge energy functions. We observe again that the chromatic information yields more robust and invariant features than the global colour information provided in this example by the hypercomplex representation.
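For reference, the two corner-extraction steps described at the beginning of this section can be prototyped with a grey-level dilation and a percentage threshold. The sketch below is illustrative code (not the author's implementation); the function name and parameter defaults follow the values quoted in the text and are otherwise assumptions.

```python
import numpy as np
from scipy.ndimage import grey_dilation

def extract_corners(noble, size=3, alpha_max=4.0):
    """Binary corner mask from a Noble corner energy image N(i)(x, y).

    Step 1: points invariant under a grey-level dilation of the given window
            size are local maxima; keep their original energies.
    Step 2: keep only maxima above alpha_max % of the global maximum energy.
    """
    dilated = grey_dilation(noble, size=(size, size))
    local_max = (noble == dilated)                      # mask M(x, y)
    m = noble * local_max                               # energies at the maxima
    u_max = (alpha_max / 100.0) * noble.max()           # threshold as a percentage
    return m > u_max                                    # final corner mask

# corners = extract_corners(noble_energy)               # boolean corner map
```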
5 Conclusion
In this paper, we proposed the extension of the first-derivative-based structure tensor to various quaternionic colour image representations. From the quaternionic colour structure tensor, classical corner and edge features have been derived, obtaining in particular colour corners and a colour geometry sketch. Experiments show that the colour quaternion-based features are more flexible and powerful than their RGB counterparts; in particular, it is possible to focus on features associated to a particular colour, and their invariance properties lead to robust extraction results.
References
1. Angulo, J., Serra, J.: Modelling and Segmentation of Colour Images in Polar Representations. Image and Vision Computing 25(4), 475–495 (2007)
2. Angulo, J.: Quaternion colour representations and derived total orderings for morphological operators. In: CGIV 2008, pp. 417–422 (2008)
3. Bigun, J., Granlund, G., Wiklund, J.: Multidimensional orientation estimation with applications to texture analysis and optical flow. IEEE Trans. Patt. Anal. and Mach. Intell. 13(8), 775–790 (1991)
4. Bunyak, F., Palaniappan, K., Nath, S.K., Seetharaman, G.: Flux tensor constrained geodesic active contours with sensor fusion and persistent object tracking. Journal of Multimedia 2(4), 20–33 (2007)
5. Cai, C., Mitra, S.K.: A normalized color difference edge detector based on quaternion representation. In: ICIP 2000 (2000)
6. Cumani, A.: Edge detection in multispectral images. CVGIP: Graphical Models and Image Processing 53(1) (1991)
7. Denis, P., Carré, P., Fernandez-Maloigne, C.: Spatial and spectral quaternionic approaches for colour images. Computer Vision and Image Understanding 107(1-2), 74–87 (2007)
8. Di Zenzo, S.: A note on the gradient of a multi-image. Computer Vision, Graphics, and Image Processing 33(1), 116–125 (1986)
9. Ell, T.A., Sangwine, S.J.: Hypercomplex Wiener-Khintchine theorem with application to color image correlation. In: IEEE ICIP 2000, vol. II, pp. 792–795 (2000)
10. Ell, T.A., Sangwine, S.J.: Hypercomplex Fourier transform of color images. IEEE Transactions on Image Processing 16(1), 22–35 (2007)
11. Harris, C., Stephens, M.: A combined corner and edge detector. In: Proc. 4th Alvey Vision Conf., vol. 15, pp. 147–151 (1988)
12. Montesinos, P., Gouet, V., Deriche, R.: Differential invariants for color images. In: IAPR International Conference on Pattern Recognition (ICPR 1998), pp. 838–840 (1998)
13. Naik, S.K., Murthy, C.A.: Standardization of edge magnitude in color images. IEEE Trans. on Image Processing 15(9), 2588–2595 (2006)
14. Noble, J.A.: Finding corners. Image and Vision Computing 6(2), 121–128 (1988)
15. Ruzon, M.A., Tomasi, C.: Edge, junction, and corner detection using color distributions. IEEE Trans. Patt. Anal. and Mach. Intell. 23(11), 1281–1295 (2001)
16. Sangwine, S.J., Ell, T.A.: Mathematical approaches to linear vector filtering of colour images. In: CGIV 2002, pp. 348–351 (2002)
17. Schmid, C., Mohr, R., Bauckhage, C.: Evaluation of interest point detectors. International Journal of Computer Vision 37(2), 151–172 (2000)
18. Shi, L., Funt, B.: Quaternion color texture segmentation. Computer Vision and Image Understanding 107(1-2), 88–96 (2007)
19. Sochen, N., Kimmel, R., Malladi, R.: A general framework for low level vision. IEEE Transactions on Image Processing 7(3), 310–318 (1998)
20. Veit, T., Tarel, J.-P., Nicolle, P., Charbonnier, P.: Evaluation of Road Marking Feature Extraction. In: Proceedings of 11th IEEE Conference on Intelligent Transportation Systems (ITSC 2008), Beijing, China, October 12-15 (2008)
21. van de Weijer, J., Gevers, T., Geusebroek, J.M.: Edge and corner detection by photometric quasi-invariants. IEEE Trans. Patt. Anal. and Mach. Intell. 27(4) (2005)
22. van de Weijer, J., Gevers, T., Smeulders, A.W.M.: Robust Photometric Invariant Features From the Colour Tensor. IEEE Trans. on Image Processing 15(1), 118–127 (2006)
Non-linear Filter Response Distributions of Natural Colour Images
Alexander Balinsky¹ and Nassir Mohammad¹,²
¹ School of Mathematics, Cardiff University, Cardiff, CF24 4AG, UK
² Hewlett Packard Laboratories, Bristol, BS34 8QZ, UK
{BalinskyA,MohammadN3}@Cardiff.ac.uk, [email protected]
Abstract. We observe a non-Gaussian heavy tailed distribution for the non-linear filter
$$\gamma(U)(r) = U(r) - \sum_{s \in N(r)} w(Y)_{rs}\, U(s), \qquad (1)$$
applied to the chromacity channel ’U’ (and equivalently to ’V’) on individual natural colour images in the colour space YUV. We fit a Generalised Gaussian Distribution (GGD) to the histogram of the filter response, and observe the shape parameter (α) to lie within the range 0 < α < 2, but rarely α > 1. Keywords: Non-Gaussian Statistics, Image Colorization, Non-Linear Filter Response, Natural Colour Statistics.
1 Introduction
Statistical analysis of natural luminance images has revealed an interesting property: non-Gaussian behaviour of image statistics, i.e. high kurtosis, heavy tails, and sharp central cusps (see e.g. [1], [2], [3], [4], [5], [6]). This property has been extensively studied via the empirical distributions on large databases of natural images, establishing image statistics, under common representations such as wavelets or subspace bases (PCA, ICA, Fishers etc.), as non-Gaussian. For example, a popular mechanism for decomposing natural images locally, in space and frequency, using wavelet transforms leads to coefficients that are quite non-Gaussian, with the histograms displaying heavy tails and sharp cusps at the median [7].
In this study we show that this striking phenomenon readily follows across to natural colour images. Given an RGB image we convert it to the colour space YUV. (The chromacity images U and V are similar and so we only explain our workings for the U component, where the exact same procedure is repeated for the V component.) Our filter takes as input the chromacity channel U and the intensity image Y; the proposed filter is given below,
$$\gamma(U)(r) = U(r) - \sum_{s \in N(r)} w(Y)_{rs}\, U(s), \qquad (2)$$
where r represents a two dimensional point, N(r) a neighborhood (e.g. 3x3 window) of points around r, and $w(Y)_{rs}$ a weighting function. For our purpose we define two weights:
$$w(Y)_{rs} \propto e^{-(Y(r) - Y(s))^2 / 2\sigma_r^2}, \qquad (3)$$
$$w(Y)_{rs} \propto 1 + \frac{1}{\sigma_r^2}(Y(r) - \mu_r)(Y(s) - \mu_r), \qquad (4)$$
where $\mu_r$ and $\sigma_r^2$ are the mean and variance of the intensities in a window around r. The proposed filter thus takes a point r in U and subtracts a weighted average of chromacity values in the neighborhood of r. The $w(Y)_{rs}$ is a weighting function that sums to one over s, large when Y(r) is similar to Y(s), and small when the two intensities are different. Filters with weights (3) and (4) have been used in [8] for the colorization problem. These types of filters are compatible with the hypothesis that the essential geometric contents of an image are contained in its level lines (see [9] for more details).
We explain some further symbols that are used in this paper: Assuming X to be a random variable on R with $\mu$ and $\sigma^2$ the mean and variance of X, respectively, we define:
$$k = \frac{E(X - \mu)^4}{\sigma^4}, \qquad S = \frac{E(X - \mu)^3}{\sigma^3},$$
where k is the kurtosis, S is the skewness. For a normal distribution kurtosis and skewness take the values, 3 and 0, respectively. We also note here that our pictures of probability distributions are shown with the vertical scale not probability but log of probability. This is very important as it shows the nonGaussian nature of these probability distributions more clearly, especially the nature of the tails. Subscript notation e.g. kU and kV denote statistic values for the ’U’ and ’V’ filtered components of an image, respectively.
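To make the filter concrete, the sketch below (illustrative code, not the authors' implementation) applies γ to a chromacity channel over a 3x3 neighbourhood using the intensity-based weight (3). Two simplifications are assumptions of the sketch: a fixed σ is used instead of the local σ_r of (3), and the centre pixel is excluded from its own neighbourhood before the weights are normalised to sum to one.

```python
import numpy as np

def gamma_filter(U, Y, sigma=0.1):
    """Non-linear filter response gamma(U)(r) = U(r) - sum_s w(Y)_rs U(s)
    over a 3x3 neighbourhood, with the Gaussian intensity weight (3)."""
    H, W = U.shape
    out = np.zeros_like(U, dtype=float)
    for i in range(1, H - 1):                 # skip the one-pixel boundary
        for j in range(1, W - 1):
            nY = Y[i-1:i+2, j-1:j+2].astype(float)
            nU = U[i-1:i+2, j-1:j+2].astype(float)
            w = np.exp(-(Y[i, j] - nY) ** 2 / (2.0 * sigma ** 2))
            w[1, 1] = 0.0                     # assumption: r is not in N(r)
            w /= w.sum()                      # weights sum to one over s
            out[i, j] = U[i, j] - (w * nU).sum()
    return out[1:-1, 1:-1]                    # response inside the valid region
```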
2 Non-linear Filter Response of Individual Images
Figure 1 shows a sample of 8 natural colour images from our dataset of 25 images which are all bitmap uncompressed, captured using a Canon digital SLR camera, and were chosen to cover a wide spectrum of natural scenes in order to give some measure of robustness to our findings. We did not pay too much attention to the methods of capture, or any subsequent re-calibration as we wish to work with colour images captured via any mode, and believe that when images are considered to be natural this will have little effect on the general properties of the filter response. Applying the non-linear filter on each of the colour channels, U and V, in the image outputs an intensity matrix on which we compute a histogram. We note that application of the filter is only possible within a boundary of the original
[Figure panels: (a) balloons, (b) indoors, (c) houses, (d) sky, (e) objects, (f) seaside, (g) night, (h) nature]
Fig. 1. Here we display a sample of 8 pictures taken from our dataset of 25 images. In order to give a measure of robustness to our findings we chose pictures covering a wide spectrum of natural scenes, ranging from natural landscapes to urban environments. Images shown here are all truecolour RGB obtained by a Canon digital SLR camera of varying resolutions in uncompressed bitmap format, and reduced to sizes in the region of 200x200 pixels using Adobe photoshop.
image, dependent on the size of the neighborhoods used in the filter construction. In our case the filter was not computed on a one pixel boundary of the image. Outputs of the filter on both the colour channels for two of our sample images, 'balloons' and 'objects', are shown in Figure 2 as grey-scale intensity images. In [1] they consider the response of derivative filters on calibrated natural luminance images and model the histograms using the GGD. Similarly we fit the following GGD model to our data,
$$f(x) = \frac{1}{Z}\, e^{-|x/s|^{\alpha}}, \qquad (5)$$
[Figure panels: (a) balloons U, (b) balloons V, (c) objects U, (d) objects V]
Fig. 2. Filter response of each of the colour channels, U and V, of two of our sample images, ’balloons’ and ’objects’, using the first weighting function (3)
where Z is a normalising constant so that the integral of f(x) is 1, s the scale parameter and α the shape parameter; these are directly related to the variance and kurtosis by:
$$\sigma^2 = \frac{s^2\, \Gamma(\tfrac{3}{\alpha})}{\Gamma(\tfrac{1}{\alpha})} \quad \text{and} \quad k = \frac{\Gamma(\tfrac{1}{\alpha})\,\Gamma(\tfrac{5}{\alpha})}{\Gamma^2(\tfrac{3}{\alpha})}. \qquad (6)$$
Special cases of this distribution occur when α = 1 or α = 2, giving the Laplacian or Gaussian distribution, respectively. We calculate the parameters of the model (numerically) directly from the variance and kurtosis using (6). In [1] they observe that such a calculated model is very close to the best fitting model obtained by minimisation of the mean square error.
Figures 3 and 4 show the histograms of the filter response (using weighting function (3)) on each chromacity channel, U, V, for two of our sample images, with the GGD fitting overlaid. The responses are typically concentrated around zero and highly non-Gaussian, exhibiting large kurtosis and heavy tails, as compared with the normal distribution.
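The moment relations (6) let one recover the GGD parameters without an explicit curve fit: solve the kurtosis equation for α numerically, then obtain s from the variance. The sketch below is illustrative code (not the authors'), using SciPy's log-gamma and Brent root finder; the bracket [0.05, 10] is an assumption that covers sample kurtosis values from near-Gaussian up to very heavy tails.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import gammaln

def fit_ggd(data):
    """Estimate the GGD shape (alpha) and scale (s) from sample moments via (6)."""
    x = np.asarray(data, dtype=float).ravel()
    var = x.var()
    kurt = ((x - x.mean()) ** 4).mean() / var ** 2

    # kurtosis as a function of alpha: Gamma(1/a) Gamma(5/a) / Gamma(3/a)^2
    def kurt_of_alpha(a):
        return np.exp(gammaln(1.0 / a) + gammaln(5.0 / a) - 2.0 * gammaln(3.0 / a))

    # kurt_of_alpha is decreasing, so a sign change exists if kurt lies in range
    alpha = brentq(lambda a: kurt_of_alpha(a) - kurt, 0.05, 10.0)
    # variance relation: sigma^2 = s^2 Gamma(3/alpha) / Gamma(1/alpha)
    s = np.sqrt(var * np.exp(gammaln(1.0 / alpha) - gammaln(3.0 / alpha)))
    return alpha, s
```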
Fig. 3. Distribution of the filter response for both chromacity channels U and V for the image ’balloons’ from figure 1 using the first weighting function (3)
Table 1 shows the associated parameters for each filtered image using the first weighting function (3). We observe that kurtosis is greater than that of the normal distribution for all the images considered. α is seen to lie within the range [0, 1], with the parameter varying from image to image, the only exception being the distribution of the U-filtered response for image: ’indoors’. This was the only component of an image to show α > 1 in our diverse dataset of images.
Fig. 4. Distribution of the filter response for both chromacity channels U and V for the image 'objects' from figure 1 using the first weighting function (3). The non-Gaussian, high-kurtosis, heavy-tailed distribution is clearly observed and is typical of the images used in our dataset.
Generally, responses also exhibit some degree of skewness and have very low variance. Table 2 shows the statistics obtained by filtering the same set of 8 images, but using the second weighting function. Results are similar and show again that the filter response is highly non-Gaussian.
Table 1. Statistics of the non-linear filter response for our sample images using the first weighting function (3)

Image    | αU    | kU    | SU    | αV    | kV     | SV
---------|-------|-------|-------|-------|--------|------
balloons | 0.695 | 11.23 | -0.17 | 0.624 | 14.23  | 0.03
indoors  | 1.11  | 5.22  | -0.05 | 0.619 | 14.45  | 0.16
houses   | 0.624 | 14.18 | -0.39 | 0.633 | 13.74  | 0.68
sky      | 0.344 | 94.00 | 0.64  | 0.328 | 114.87 | -2.58
objects  | 0.54  | 20.35 | 0.68  | 0.662 | 12.43  | -0.08
seaside  | 0.539 | 20.44 | 0.63  | 0.491 | 26.60  | 0.09
night    | 0.944 | 6.52  | 0.03  | 0.561 | 18.37  | 0.13
nature   | 0.745 | 9.76  | -0.11 | 0.826 | 8.11   | 0.26
Table 2. Statistics of the non-linear filter response for our sample images using the second weighting function (4)

Image    | αU    | kU    | SU    | αV    | kV     | SV
---------|-------|-------|-------|-------|--------|------
balloons | 0.685 | 11.57 | -0.19 | 0.624 | 14.19  | -0.01
indoors  | 1.094 | 5.31  | -0.07 | 0.599 | 15.62  | 0.16
houses   | 0.61  | 14.98 | -0.41 | 0.607 | 15.14  | 0.76
sky      | 0.339 | 99.38 | 0.78  | 0.321 | 126.27 | -2.48
objects  | 0.534 | 21.02 | 0.62  | 0.654 | 12.79  | -0.07
seaside  | 0.54  | 20.39 | 0.60  | 0.489 | 26.84  | 0.14
night    | 0.931 | 6.66  | 0.07  | 0.556 | 18.85  | 0.18
nature   | 0.736 | 10.00 | -0.07 | 0.811 | 8.37   | 0.26
We note here that the JPEG standard of image compression and storage is commonplace, and hence we wanted to see how the filter holds up under this form of compression. In order to do this we converted samples of the bitmap images from our dataset to the JPEG standard and filtered the images. Results (not shown) were again similar: a non-Gaussian, heavy tailed distribution of the filter response on both chromacity channels.
3 Summary
In this paper we observe that conversion of an RGB natural colour image into the space YUV and subsequent application of the non-linear filter independently on each of the chrominance channels, U and V, results in a filter response that is highly non-Gaussian, exhibiting heavy tails and large kurtosis. We fit a Generalised Gaussian Distribution to the histogram of the filter response and obtain shape parameters α that vary from image to image, lying within the range 0 < α < 2, but rarely α > 1. In order to show some measure of robustness we used a dataset of images that covered a diverse range of natural scenes, with the
distribution of the filter response always the same, i.e. non-Gaussian with heavy tails. In future work we intend to develop Bayesian analysis of the colorization problem using the GGD (5) as a regularization term. Our results indicate an interesting connection between ’compressive sensing’ and the U,V components of natural images.
Acknowledgments We would like to thank Stephen Pollard and Andrew Hunter from HP Labs, Bristol, UK for providing some of the images used in the study and for fruitful discussion. This work was supported in part by grants from the Engineering and Physical Sciences Research Council (EPSRC) and Hewlett Packard Labs, awarded through the Smith Institute Knowledge Transfer Network.
References
1. Huang, J., Mumford, D.: Statistics of Natural Images and Models. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, Fort Collins, CO, vol. 1, pp. 541–547 (1999)
2. Mallat, S.G.: A Theory for Multiresolution Signal Decomposition: The Wavelet Representation. IEEE Trans. Pattern Anal. Machine Intell. 11, 674–693 (1989)
3. Mumford, D.: Empirical Investigations into the Statistics of Clutter and the Mathematical Models it Leads To, Lecture for the Review of ARO Metric Pattern Theory Collaborative, Brown Univ., Providence, RI (2000)
4. Wainwright, M.J., Simoncelli, E.P., Willsky, A.S.: Random Cascades on Wavelet Trees and their use in Analysing and Modelling Natural Images. Appl. Comput. Harmon. Anal. 11, 89–123 (2001)
5. Field, D.J.: Relations between the Statistics of Natural Images and the Response Properties of Cortical Cells. J. Opt. Soc. Amer. A 4(12), 2379–2394 (1987)
6. Olshausen, B.A., Field, D.J.: Natural Image Statistics and Efficient Coding. Network: Computation in Neural Systems 7 (1996)
7. Srivastava, A.: Stochastic Models for Capturing Image Variability. IEEE Signal Processing Magazine 19(5), 63–76 (2002)
8. Levin, A., Lischinski, D., Weiss, Y.: Colorization using optimization. ACM Transactions on Graphics 23(3), 689–694 (2004)
9. Caselles, V., Coll, B., Morel, J.M.: Geometry and color in natural images. Journal of Mathematical Imaging and Vision 16(2), 89–105 (2002)
Perceptual Color Correction: A Variational Perspective
Edoardo Provenzi
Departament de Tecnologies de la Informació i les Comunicacions, Universitat Pompeu Fabra, C/Tànger 122-140, 08018, Barcelona, Spain
Abstract. Variational techniques provide a powerful tool for understanding image features and creating new efficient algorithms. In the past twenty years, this machinery has been also applied to color images. Recently, a general variational framework that incorporates the basic phenomenological characteristics of the human visual system has been built. Here we recall the structure of this framework and give noticeable examples. We then propose a new analytic expression for a parameter that regulates contrast enhancement. This formula is defined in terms of intrinsic image features, so that the parameter no longer needs to be empirically set by a user, but it is automatically determined by the image itself.
1 Introduction
Human vision is a process of great complexity that involves many features, e.g. shape and pattern recognition, movement analysis and color perception. In this paper we will focus on the last one. This process begins with light capture by the three different types of cone pigments inside the retina. When a light stimulus activates a cone, a photochemical transition occurs, producing a nerve impulse that reaches the brain, where it is analyzed and interpreted. However, neither retinal photochemistry nor the propagation of nervous impulses is well understood, hence a deterministic characterization of the visual process is unavailable. For this reason, the majority of color perception models follow a descriptive approach, trying to simulate macroscopic characteristics of color vision, rather than reproduce neurophysiological activity. A common characteristic of these models is that the Human Visual System (HVS) features are taken as inspiration to devise the explicit equations of perceptual algorithms.
The point of view adopted in [1] is rather different, since it focuses on the translation of the basic macroscopic phenomenological characteristics of color vision into mathematical axioms to be fulfilled by a variational energy functional in order to be considered perceptually inspired. Remarkably, it can be proven that only one class of energy functionals can comply with all the axioms at once. Once a perceptual energy is fixed, the Euler-Lagrange equations corresponding to its minimization give rise to a computational algorithm that can be used to perform perceptual color correction. The advantage of this point of view
lies in the intertwining between the algorithm equation and the corresponding variational energy, which permits a better understanding of the algorithm behavior in terms of important image features such as tone dispersion or contrast. A parameter appearing in these models is a coefficient which controls the degree of contrast enhancement. Here we propose an image-driven formula that respects the HVS phenomenology and automatically sets this parameter pixel-wise, so that it does not need to be superimposed by a user.
2 Basic Human Color Perception Phenomenology
The phenomenological properties of human color perception that we consider as basic for our axiomatic construction are three: color constancy, local contrast enhancement and visual adaptation.
Let us begin with human color constancy: it is well known that the HVS has strong 'normalization' properties, i.e. humans can perceive the colors of a scene almost independently of the spectral electromagnetic composition of a uniform illuminant, usually called color cast [2]. This peculiar ability is known as human color constancy and it is not perfect [3] because, even though human color perception strongly depends on the context, i.e. on the relative radiance rather than on the absolute one, absolute luminance information also plays a role in the entire color perception process. The majority of color models try to remove the color cast due to the illuminant by looking for the invariant component of the light signal: the physical reflectance of objects. However, it is well known that the separation between illuminant and reflectance is an ill-posed problem, unless one imposes further constraints which are not verified by all images [4]. Instead, in [1], another approach based on contrast enhancement is used to remove color cast. That method is based on the consideration that an image with color cast is always characterized by one chromatic channel with remarkably different standard deviation σ with respect to the others (not to be confused with an image with a dominant color, e.g. the close up of a leaf, in which the average value of a channel prevails). Since standard deviation is a measure of average contrast, it is clear that contrast enhancement can help decrease the difference between $\sigma_R$, $\sigma_G$ and $\sigma_B$, by spreading the intensity values of all separate chromatic channels, thus reducing color cast.
Let us now recall the local contrast enhancement property: while looking at a natural scene, the HVS enhances local edges in order to better distinguish shapes and objects. Well known phenomena exhibiting this effect are Mach bands and simultaneous contrast [5]. Local HVS features related to color perception can be detected also in more complex scenes and go under the general name of 'chromatic induction' [6,7]. Experimental evidence shows that the strength of chromatic induction between two different areas of a scene decreases monotonically with their Euclidean distance, even though a precise analytical description is not yet available [7].
Finally, let us remember the adaptation properties of the HVS. The range of light intensity levels to which the HVS can adapt is impressive, on the order of $10^{10}$, from the scotopic threshold to the glare limit [5]. However, the HVS cannot operate over such a range simultaneously; rather, it adapts to each scene to prevent saturation, which would depress contrast sensitivity [8]. During this adaptation, the HVS shifts its sensitivity to light stimuli in order to present only modulations around the average intensity level of a scene [8]. This provides a phenomenological motivation for the so-called 'gray-world' (GW) assumption, which says that the average color in a scene is gray [9].
3 Assumptions for a Perceptually-Inspired Variational Energy Functional
The content of this section has been discussed and proved in detail in [1]. Here we only aim at summarizing the most relevant information of that work. Let us fix the notation. Given a discrete RGB image, we denote by I = {1, . . . , W} × {1, . . . , H} ⊂ Z² its spatial domain, W, H ≥ 1 being integers; x = (x₁, x₂) and y = (y₁, y₂) denote the coordinates of two arbitrary pixels in I. We will always consider a normalized dynamic range in [0, 1], so that a color image function is denoted by I : I → [0, 1]³, I(x) = (I_R(x), I_G(x), I_B(x)), where I_c(x) is the intensity level of the pixel x ∈ I in the chromatic channel c ∈ {R, G, B}. All computations will be performed on the scalar components of the image, thus treating each chromatic channel independently, written, for simplicity, as I(x).
3.1 Assumption 1: General Structure of a Perceptual Energy Functional
In order to write the general structure of a perceptually inspired energy functional, note that human color perception is characterized by both local and global features: contrast enhancement has a local nature, i.e. spatially variant, while visual adaptation and attachment to the original data (implied by the failure of color constancy) have a global nature, i.e. spatially invariant, in the sense that they do not depend on the intensity distribution in the neighborhood. These basic considerations imply that a perceptually inspired energy functional should contain two terms: one spatially-dependent term whose minimization leads to a local contrast enhancement and one global term whose minimization leads to a control of the departure from both original pointwise values and the middle gray, which, in our normalized dynamic range, is 1/2.
Let us first describe the general form of the contrast enhancement terms. For that we need a contrast measure c(a, b) between two gray levels a, b > 0 (to avoid some singular cases, we shall assume that intensity image values are always positive). We require the contrast function c : (0, +∞) × (0, +∞) → R to be continuous, symmetric in (a, b), i.e. c(a, b) = c(b, a), increasing when min(a, b) decreases or max(a, b) increases. Basic examples of contrast measures are c(a, b) = |a − b| ≡ max(a, b) − min(a, b) or c(a, b) = max(a, b)/min(a, b).
Since our purpose is to enhance contrast by minimizing an energy, we define an inverse contrast function c̄(a, b), still continuous and symmetric in (a, b), but decreasing when min(a, b) decreases or max(a, b) increases. Notice that, if c(a, b) is a contrast measure, then c̄(a, b) = −c(a, b) or c̄(a, b) = 1/c(a, b) is an inverse contrast measure, so that basic examples of inverse contrast are: c̄(a, b) = min(a, b) − max(a, b) or c̄(a, b) = min(a, b)/max(a, b).
Let us now introduce a normalized weighting function to localize the contrast computation. Let w : I × I → R⁺ be a positive symmetric kernel, i.e. such that w(x, y) = w(y, x) > 0, for all x, y ∈ I, that measures the mutual influence between the pixels x, y. The symmetry requirement is motivated by the fact that the mutual chromatic induction is independent of the order of the two pixels considered. Usually, we assume that w(x, y) is a function of the Euclidean distance ‖x − y‖ between the two points. We shall assume that the kernel is normalized, i.e. that
$$\sum_{y \in I} w(x, y) = 1 \quad \forall x \in I. \qquad (1)$$
Given an inverse contrast function c̄(a, b) and a positive symmetric kernel w(x, y), we define a contrast energy term by
$$C_w(I) = \sum_{x \in I} \sum_{y \in I} w(x, y)\, \bar{c}(I(x), I(y)). \qquad (2)$$
The symmetry assumption implies that c̄(a, b) = c̃(min(a, b), max(a, b)) for some function c̃ (indeed well defined by this identity). Notice that c̃ is non-decreasing in the first argument and non-increasing in the second one.
Let us now consider the term that should control the dispersion. As suggested previously, it should realize an attachment to the initial given image I₀ and to the average illumination value, which we assume to be 1/2. Thus, we define two dispersion functions: d₁(I(x), I₀(x)) to measure the separation between I(x) and I₀(x), and d₂(I(x), 1/2) which measures the separation from the value 1/2. Both d₁ and d₂ are continuous functions d₁,₂ : R² → R⁺ such that d₁,₂(a, a) = 0 for any a ∈ R, and d₁,₂(a, b) > 0 if a ≠ b. We write $d_{I_0,\frac{1}{2}}(I(x)) = d_1(I(x), I_0(x)) + d_2(I(x), \tfrac{1}{2})$, and the dispersion energy term as
$$D(I) = \sum_{x \in I} d_{I_0,\frac{1}{2}}(I(x)). \qquad (3)$$
We can now formulate our first assumption. Assumption 1. The general structure of a perceptually inspired color correction energy functional is Ew (I) = D(I) + Cw (I), (4) where Cw (I) and D(I) are the contrast and dispersion terms defined in (2) and (3), respectively. The minimization of D must provide a control of the dispersion around 1/2 and around the original intensity values. The minimization of Cw must provide a local contrast enhancement.
3.2 Assumption 2: Properties of the Contrast Function
In order to find out which properties the contrast term should satisfy, let us observe that an overall change in intensity, measured by the generic quantity λ > 0, does not affect the visual sensation. This requires the contrast function c to be homogeneous, recalling that c is homogeneous of degree n ∈ Z if
$$c(\lambda a, \lambda b) = \lambda^{n} c(a, b) \quad \forall \lambda, a, b \in (0, +\infty), \qquad (5)$$
where a and b are synthetic representations of I(x) and I(y). Of course, if n = 0, c automatically disregards the presence of λ, but we can say more: since λ can take any positive value, if we set λ = 1/b, we may write equation (5) as:
$$c(a, b) = b^{n}\, c\!\left(\frac{a}{b}, 1\right) \quad \forall a, b \in (0, +\infty), \qquad (6)$$
so, when n = 0, bⁿ = 1 and thus c results as a function of the ratio a/b, which intrinsically disregards overall changes in light intensity. If n > 0, then λ has a global influence and could be removed by performing a suitable normalization (for instance, dividing by the n-th power of the highest intensity level). We can formalize these considerations in our second assumption.
Assumption 2. We assume that the inverse contrast function c̄(a, b) is homogeneous.
Thanks to the arguments presented so far, we have that inverse contrast functions which are homogeneous of degree n = 0 are those that can be written as a monotone non-decreasing function of min(I(x), I(y))/max(I(x), I(y)). Let us now introduce into the discussion the important Weber-Fechner's law [5], which says that the so-called Weber-Fechner ratio $R_{WF} \equiv \frac{I_1 - I_0}{I_0}$, i.e. the ratio between an intensity variation ΔI ≡ I₁ − I₀ and the background intensity I₀, remains constant. The consequence is that the same variation is perceived in a weaker way as the strength of the intensity increases. Even though Weber-Fechner's law is not perfect [10], the intensity range over which it is in good agreement with experience (called 'Weber-Fechner's domain') is still comparable to the dynamic range of most electronic imaging systems [10]. Since $R_{WF} = I_1/I_0 - 1$, Weber-Fechner's law is saying that the perceived contrast is a function of I₁/I₀. This motivates us to say that c̄(a, b) is a generalized Weber-Fechner contrast function if c̄ is an inverse contrast function which can be written as a non-decreasing function of min(a, b)/max(a, b). Hence, we can particularize Assumption 2 as follows.
Assumption 2'. We assume that c̄ is a generalized Weber-Fechner contrast function.
Noticeable examples of contrast terms are the following:
1 min(I(x), I(y)) w(x, y) , 4 max(I(x), I(y)) x∈I y∈I
(7)
114
E. Provenzi
Cwlog (I)
1 := w(x, y) log 4 x∈I y∈I
min(I(x), I(y)) max(I(x), I(y))
,
(8)
The upper symbol in the above definitions of Cw simply specifies the monotone min(I(x),I(y)) function applied on the basic contrast variable t := max(I(x),I(y)) . To refer to f any one of them we use the notation Cw (I), where f = id, log. Notice that the function t = min(I(x), I(y))/ max(I(x), I(y)) is minimized when min(I(x), I(y)) takes the smallest possible value and max(I(x), I(y)) takes the largest possible one, which corresponds to a contrast stretching. Thus, minimizing an increasing function of the variable t, will produce a contrast enhancement. 3.3
Assumption 3: Entropic Dispersion Term
The main features of the dispersion term have to be its attachment to the initial given image I0 and to the average illumination value, which we assume to be 1/2. In principle, to measure the dispersion of I with respect to I0 or 1/2, any distance function can be used. However, let us notice that, given that contrast terms are expressed as homogeneous functions of degree 0, the variational derivatives are homogeneous functions of degree -1. Since the previous axioms do not give any precise indication about the analytical form of the dispersion term that should be chosen, we search for functions able to maintain coherence with this homogeneity. E A good candidate for this is the entropic dispersion term Dα,β (I): α
1 x∈I
2
log
1 − 2I(x)
1 I0 (x) − I(x) +β I0 (x) log − (I0 (x) − I(x)) , 2 I(x) x∈I
(9) where α, β > 0, which is based on the relative entropy distance [11] between I and 1/2 (the first term) and between I0 and I(the second term). Notice that, if a > 0 and f (s) = a log as − (a − s), s ∈ (0, 1], has a global minimum in s = a. In particular, this holds when a = I0 (x) or a = 1/2. Given the statistical interpreE tation of entropy, we can say that minimizing Dα,β (I) amounts to minimizing the disorder of intensity levels around 1/2 and around the original data I0 (x). E Thus, Dα,β (I) accomplishes the required tasks of a dispersion term.
4 Minimization of the Energy Functionals $E_{w,\alpha,\beta}^{f}(I) = D_{\alpha,\beta}^{E}(I) + C_w^{f}(I)$
The minimization of $E_{w,\alpha,\beta}^{f}(I)$, f = id, log, −M, corresponds to a trade-off between two opposing mechanisms: on one hand we have entropic control of dispersion around 1/2 and around the original data, on the other hand we have local contrast enhancement. The existence of a minimum in the discrete framework can be guaranteed for a quite general class of energy functionals; see [1] for details.
Assume that α, β > 0 are fixed. If $E_{w,\alpha,\beta}^{f}(I) = D_{\alpha,\beta}^{E}(I) + C_w^{f}(I)$, then, by linearity of the variational derivative, we have $\delta E_{w,\alpha,\beta}^{f}(I) = \delta D_{\alpha,\beta}^{E}(I) + \delta C_w^{f}(I)$. The minimum of $E_{w,\alpha,\beta}^{f}(I)$ satisfies $\delta E_{w,\alpha,\beta}^{f}(I) = 0$. To search for the minimum, a semi-implicit discrete gradient descent strategy with respect to log I can be used. This corresponds to using a gradient descent approach in which the metric is the relative entropy, instead of the usual quadratic distance (see [11]). The continuous gradient descent equation is
$$\partial_t \log I = -\delta E_{w,\alpha,\beta}^{f}(I), \qquad (10)$$
with t being the evolution parameter. Since $\partial_t \log I = \frac{1}{I}\,\partial_t I$, we have
$$\partial_t I = -I\,\delta E_{w,\alpha,\beta}^{f}(I). \qquad (11)$$
Let us now discretize the scheme: choosing a finite evolution step Δt > 0 and setting $I^k(x) = I_{k\Delta t}(x)$, k ∈ N, with $I^0(x)$ being the original image, then, by direct computation of $\delta D_{\alpha,\beta}^{E}(I)$, the semi-implicit discretization of (11) is
$$\frac{I^{k+1}(x) - I^k(x)}{\Delta t} = \alpha\left(\frac{1}{2} - I^{k+1}(x)\right) + \beta\left(I_0(x) - I^{k+1}(x)\right) - I^k(x)\,\delta C_w^{f}(I^k)(x). \qquad (12)$$
The terms $-2I^k(x)\,\delta C_w^{id}(I^k)(x) \equiv R_{w,I^k}^{id}(x)$ and $-2I^k(x)\,\delta C_w^{log}(I^k)(x) \equiv R_{w,I^k}^{log}(x)$ can be explicitly written as [1]:
$$R_{w,I^k}^{id}(x) = \sum_{y \in I} w(x, y)\left[\frac{I^k(y)}{I^k(x)}\,\mathrm{sign}^{+}\!\left(I^k(x) - I^k(y)\right) - \frac{I^k(x)}{I^k(y)}\,\mathrm{sign}^{-}\!\left(I^k(x) - I^k(y)\right)\right]; \qquad (13)$$
$$R_{w,I^k}^{log}(x) = \sum_{y \in I} w(x, y)\,\mathrm{sign}\!\left(I^k(x) - I^k(y)\right), \qquad (14)$$
where we set, for every ξ ∈ R,
$$\mathrm{sign}(\xi) := \begin{cases} 1 & \text{if } \xi > 0, \\ \tfrac{1}{2} & \text{if } \xi = 0, \\ -1 & \text{if } \xi < 0, \end{cases} \qquad \mathrm{sign}^{+}(\xi) := \begin{cases} 1 & \text{if } \xi > 0, \\ \tfrac{1}{2} & \text{if } \xi = 0, \\ 0 & \text{if } \xi < 0, \end{cases} \qquad \mathrm{sign}^{-}(\xi) = 1 - \mathrm{sign}^{+}(\xi). \qquad (15)$$
Equation (12) can be used to implement iterative computational algorithms for perceptual color image enhancement. In [12,1,13] it is shown that the algorithm corresponding to f = log is a variational version of the ACE algorithm [14], while the one corresponding to f = id is a variational version of the (anti-)symmetrized Retinex algorithm [2,15].
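A direct transcription of the semi-implicit scheme with the ACE-like term (14) might look as follows. This is illustrative code, not the authors' implementation: the double sum over pixel pairs is evaluated explicitly on a flattened channel (whereas the cited works approximate it with fast convolutions), and np.sign returns 0 rather than 1/2 at exact ties, a simplification noted in the comments.

```python
import numpy as np

def enhance_channel(I0, w, alpha=1.0, beta=1.0, dt=0.1, n_iter=20):
    """Semi-implicit gradient descent (12) with f = log, i.e. R as in (14).

    I0 : 1D array of pixel intensities in (0, 1] (one chromatic channel, flattened)
    w  : (N, N) symmetric kernel with rows normalized to sum to one, as in (1)
    """
    I = I0.astype(float).copy()
    for _ in range(n_iter):
        # R^log_{w,I}(x) = sum_y w(x, y) sign(I(x) - I(y));
        # np.sign(0) = 0 instead of the 1/2 used in (15) -- negligible in practice
        diff = I[:, None] - I[None, :]
        R = (w * np.sign(diff)).sum(axis=1)
        # update (16): implicit in the attachment terms, explicit in R
        I = (I + dt * (alpha / 2.0 + beta * I0 + 0.5 * R)) / (1.0 + dt * (alpha + beta))
        I = np.clip(I, 1e-6, 1.0)             # keep intensities positive
    return I
```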
5 An Image-Driven Contrast Enhancement Parameter
From (12), it follows that the discrete equation that represents the variational method is the following:
$$I^{k+1}(x) = \frac{I^k(x) + \Delta t\left(\frac{\alpha}{2} + \beta I^0(x) + \frac{1}{2} R_{I^k}^{f}(x)\right)}{1 + \Delta t(\alpha + \beta)}, \qquad (16)$$
where α and β represent the strength of the attachment to 1/2 and to I⁰, respectively. $R_{I^k}^{f}(x)$ performs contrast enhancement and can be written in the general form $R_{I^k}^{f}(x) = \sum_{y \in I} w(x, y)\, r^{f}(I^k(x), I^k(y))$, where, for f = id, log, we have $r^{id}(I^k(x), I^k(y)) = \frac{I^k(y)}{I^k(x)}\,\mathrm{sign}^{+}(I^k(x) - I^k(y)) - \frac{I^k(x)}{I^k(y)}\,\mathrm{sign}^{-}(I^k(x) - I^k(y))$ and $r^{log}(I^k(x), I^k(y)) = \mathrm{sign}(I^k(x) - I^k(y))$.
In practice, the sign functions appearing in $r^{id}$ and $r^{log}$ are too singular to be used without producing artifacts, so that a smoothed version is needed; we take
$$\mathrm{sign}_s(I^k(x) - I^k(y)) = \frac{\arctan\!\left(s\,(I^k(x) - I^k(y))\right)}{\max_{y \in I}\left\{\left|\arctan\!\left(s\,(I^k(x) - I^k(y))\right)\right|\right\}},$$
with s > 1 defining its slope. It turns out that a proper choice of the parameter s is crucial to perform a suitable color enhancement. So far, this parameter was left as a free user parameter; here we propose a formula to set s as a function of image features in line with human visual perception, so that it does not need to be imposed by a user with an expensive try-and-look procedure. The function we propose is the following:
$$s_w(x) = \frac{1}{\mu_w(x)} \cdot \frac{1 - \sigma_w(x)}{\sigma_w(x)}, \qquad (17)$$
where $\mu_w(x)$ is the local average intensity and $\sigma_w(x)$ the local standard deviation:
$$\mu_w(x) = \sum_{y \in I} w(x, y)\, I^k(y), \qquad \sigma_w(x) = \sqrt{\sum_{y \in I} w(x, y)\left(I^k(y) - \mu_w(x)\right)^2}. \qquad (18)$$
The first factor ('WF factor') is introduced in accordance with the Weber-Fechner law: the HVS is more sensitive to contrast variations in low intensity areas rather than in bright ones; coherently with this, the WF factor increases $s_w(x)$ in low intensity areas. The second factor ('homogeneity factor') expresses the fact that the HVS is more sensitive to contrast variations in homogeneous zones than in detailed ones; thus, using the local standard deviation as a measure of detail inside a given image area, we have that $s_w(x)$ increases when $\sigma_w(x)$ gets small. Because of the dependence on x, it is impossible to depict $s_w(x)$; however, it is clear that both factors appearing in the formula show a hyperbolically decreasing behavior as the local average intensity or local standard deviation increase.
Fig. 1. First row: original images. Second row: filtered versions with fixed slope parameter s = 10. Third row: filtered versions with varying slope parameter $s_w(x)$ as in (17). The contrast term used is $C_w^{id}$ and the parameters α and β are both set to 1.
Both $\mu_w(x)$ and $\sigma_w(x)$ can be computed through convolutions (denoted with the usual symbol ∗), which can be rapidly implemented thanks to the Fast Fourier Transform (FFT). In fact, by definition, $\mu_w(x) = (I^k * w)(x)$ and, expanding the binomial expression in the definition of $\sigma_w(x)$,
$$\sigma_w^2(x) = \sum_{y \in I} w(x, y)\, I^k(y)^2 - 2\mu_w(x) \sum_{y \in I} w(x, y)\, I^k(y) + \mu_w(x)^2 \sum_{y \in I} w(x, y), \qquad (19)$$
but, remembering that w is normalized and the definition of $\mu_w(x)$, we can write $\sigma_w^2(x) = (w * (I^k)^2)(x) - [(w * I^k)(x)]^2$.
Since the convolutions involved in the computation of s(x) must in any case be computed to approximate the function $R_{w,I^k}^{f}(x)$, as explained in [12,1], this formula does not increase the computational complexity of the algorithm, which remains O(N log(N)), where N is the total number of pixels.
Hereafter, we present some results that will show the soundness of this proposal for $s_w(x)$. The local automatic contrast parameter $s_w(x)$ still provides sound results and a better rendition of details in dark zones, as can be noticed in all images, in particular in the book one.
Remark: After many experiments, we empirically found out that a kernel function w corresponding to overall good results is extended over the entire image and has the analytical expression w(x, y) = 1/‖x − y‖. It would be interesting to see if there is a relationship between this empirical function and physiological or psychophysical properties of human vision.
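Under the simplifying assumption of a translation-invariant kernel (so that the weighted sums become true convolutions, as with the 1/‖x − y‖ kernel of the remark above), the local statistics and the slope map (17) can be obtained from two FFT-based filterings. The sketch below is illustrative code rather than the paper's implementation; the small eps guard is an assumption added to keep the division well defined.

```python
import numpy as np
from scipy.signal import fftconvolve

def slope_map(I, kernel, eps=1e-6):
    """Automatic contrast slope s_w(x) of eq. (17) from FFT convolutions (19).

    I      : 2D intensity channel in (0, 1]
    kernel : 2D non-negative convolution mask (translation-invariant w),
             normalized here to sum to one, as required by (1)
    """
    kernel = kernel / kernel.sum()
    mu = fftconvolve(I, kernel, mode='same')                    # mu_w = w * I
    var = fftconvolve(I ** 2, kernel, mode='same') - mu ** 2    # (19) with w normalized
    sigma = np.sqrt(np.maximum(var, 0.0))
    # Weber-Fechner factor times homogeneity factor, eq. (17)
    return (1.0 / (mu + eps)) * (1.0 - sigma) / (sigma + eps)
```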
6 Conclusions
We have summarized a recently developed variational framework for perceptual color correction models [1]. We have also proposed an image-driven formula to automatically set a parameter that influences the strength of contrast enhancement. This function is in line with human vision properties and its computation does not increase the computational complexity of the algorithm.
Acknowledgement
The author acknowledges the Ramón y Cajal fellowship by Ministerio de Ciencia y Tecnología de España.
References
1. Palma-Amestoy, R., Provenzi, E., Caselles, V., Bertalmío, M.: A perceptually inspired variational framework for color enhancement. IEEE Transactions on Pattern Analysis and Machine Intelligence 31(3), 458–474 (2009)
2. Land, E., McCann, J.: Lightness and Retinex theory. Journal of the Optical Society of America 61(1), 1–11 (1971)
3. West, G.: Color perception and the limits of color constancy. Journal of Mathematical Biology 8, 47–53 (1979)
4. Hurlbert, A.: Formal connections between lightness algorithms. Journ. Opt. Soc. Am. A 3, 1684–1693 (1986)
5. Gonzales, R., Woods, R.: Digital image processing. Prentice-Hall, Englewood Cliffs (2002)
6. Creutzfeld, O., Lange-Malecki, B., Wortmann, K.: Darkness induction, retinex and cooperative mechanisms in vision. Exp. Brain Res. 67, 270–283 (1987)
7. Zaidi, Q.: Color and brightness induction: from Mach bands to three-dimensional configurations. Cambridge University Press, New York (1999)
8. Shapley, R., Enroth-Cugell, C.: Visual adaptation and retinal gain controls 3, 263–346 (1984)
9. Buchsbaum, G.: A spatial processor model for object colour perception. Journal of the Franklin Institute 310, 337–350 (1980)
10. Pratt, W.: Digital Image Processing. J. Wiley & Sons, Chichester (2007)
11. Ambrosio, L., Gigli, N., Savaré, G.: Gradient flows in metric spaces and in the space of probability measures. Lectures in Mathematics. Birkhäuser, Basel (2005)
12. Bertalmío, M., Caselles, V., Provenzi, E., Rizzi, A.: Perceptual color correction through variational techniques. IEEE Trans. on Image Proc. 16, 1058–1072 (2007)
13. Bertalmío, M., Caselles, V., Provenzi, E.: Issues about Retinex Theory and Contrast Enhancement. International Journal of Computer Vision 83, 101–119 (2009)
14. Rizzi, A., Gatta, C., Marini, D.: A new algorithm for unsupervised global and local color correction. Pattern Recognition Letters 24, 1663–1677 (2003)
15. Provenzi, E., De Carli, L., Rizzi, A., Marini, D.: Mathematical definition and analysis of the Retinex algorithm. Journal of the Optical Society of America A 22(12), 2613–2621 (2005)
A Computationally Efficient Technique for Image Colorization Adrian Pipirigeanu, Vladimir Bochko, and Jussi Parkkinen Department of Computer Science and Statistics, University of Joensuu, 80101 Joensuu, Finland
[email protected],
[email protected]
Abstract. In this paper, a fast technique for image colorization is considered. The proposed method transfers colors from a color image (source) to a gray level image (target). For the source image, we use segmented uniformly colored regions (dielectric surfaces) under single color illumination. The method maps the gray level image into the color space by means of a parametrical mapping learnt using PCA and principal component regression. The experiments show the method's feasibility for colorizing objects as well as textures.
1 Introduction
The demand for colorization is increasing, especially in the movie industry, television, the medical industry, computer graphics and scientific visualization. Colorization is still expensive, complicated and time-consuming, and requires human participation. There are several successful colorization schemes [11], [8]. These methods consider two images: a color image (source) and a gray level image (target). Color is then transferred from the source to the target where pixels match in both images. The methods implement matching by using achromatic information: intensity, pixel neighborhood statistics, and texture features. In Welsh's method, the user defines the color regions (swatches) used in colorization [11]. In Levin's method, interactive colorization is improved by using color scribbles [8]. Color scribbles are drawn on the gray level image and color then propagates to the neighboring pixels via optimization. A further improvement was made in the work of Horiuchi and Tominaga, where appropriate color seeds are automatically selected and propagated to adjacent pixels [5]. Their colorization scheme, incorporating spatial pixel location and luminance value matching, is efficiently used in color image coding. The automatic colorization of gray level images is proposed by Vieira et al. [10]. They eliminate human participation in the selection of the source images by using content-based image retrieval methods. A colorization approach where colorized images look realistic was proposed by Drew and Finlayson [4]. The colorized images are obtained by matching the gradient perceived in the original gray level image. In this case, the input color may mismatch the synthetic colors.
For methods [11], [8], [5], [10], [3], realistic colorization may be difficult because they are not based on a physical model. Therefore, other methods incorporating physical image properties have been proposed [6], [1]. Horiuchi and Kotera consider colorization of gray level images based on a diffuse-only reflection model [6]. Their approach makes colorization more realistic. Abadpour and Kasaei propose an efficient PCA based colorization method [1]. For a segmentation defined by the user, the method generates the color vector related to the gray level pixel value as a function of the gray level pixel value, the mean of the gray level image, the mean vector and the first eigenvector of the color image. The experimental results show that the PCA based method is superior to Welsh's method for several test images. However, methods [6], [1] are only suitable for color images described by an incomplete physical model, presented either by diffuse reflection (shadow and color) or by color and highlight. For example, Fig. 8 from [1] illustrates this and indicates that PCA is good for analysis of linear or low degree nonlinear data structures. Our method works well with both the incomplete and the complete physical models because it explores nonlinear correlation between principal components. We propose a modification of the authors' algorithm presented in [2]. To make the algorithm more computationally efficient, we replace the regression based on machine learning by a closed form solution. The method works with segmented uniformly colored regions (dielectric surfaces) under single color illumination. The algorithm colorizes objects as well as textures. In Section 2, we describe the methodology of our approach. The experimental results are presented in Section 3. Finally, we give our conclusions in Section 4.
2 Methodology

2.1 Data Model for Colorization
Our design scheme for colorization includes a source, that is, an RGB color image, and a target, that is, a gray level image. The colors are then transferred from the source to the target. We assume that the source image has a roughly constant hue. The physical approach then suggests that the dichromatic reflection model describes the data in color space [7]. The model considers the data structure as a linear combination of two linear clusters related to highlight and body reflection. Then, the first two eigenvectors span the data space. Alternatively, we consider the data structure as a single nonlinear cluster [2]. In this case, the intrinsic dimensionality of the data is one. To describe the data structure we still need the first two eigenvectors, but the second principal component is not required because we approximate it using the first principal component. We assume that there is a single-valued mapping between these components. Thus, in addition to the two eigenvectors, we have to find the single-valued function of the first principal component approximating the second principal component. On the one hand, it is important for colorization to reduce the data dimensionality to one component where the color information is discarded. Then we simply replace the component with the gray level image. On the other hand, the first two eigenvectors and the mapping function contain information about color. This is
Fig. 1. a) A flow chart for image analysis. b) A flow chart for image colorization.
used in our colorization scheme because we have a straightforward way to embed color into the gray level image.

2.2 Regressive PCA
In this section, we consider an approach that combines two techniques: PCA and principal components regression. We call this approach regressive PCA (RPCA). A similar technique, referred to as principal components regression (PCR), is used for enhancing regression accuracy in the feature space defined by the first principal components [9]. RPCA occupies an intermediate place between design schemes dealing with linear and (high degree) nonlinear data structures, due to its limitation of assuming a single-valued mapping between components. Unlike [2], the main point of this paper is to use an analytic closed form solution. As a result, we expect to reduce the computation time, an issue that is especially important when the volume of data being colorized is large. The most suitable way for us is to represent the RGB color image as a vector x = (x_1, x_2, x_3)^T, where the vector elements are random variables and T denotes transposition. We consider that x_1 is the red component, x_2 the green component and x_3 the blue component. For computing, instead of the random
variables we use the sample or observation sets. We obtain these sets by stacking or concatenating the pixel values along rows or columns in each component image to arrange them as long vectors. Then these vectors are the elements in x. For the vector x, the elements are the row vectors, while for the transposed vector x^T the elements are the column vectors. Thus, we assume that x is a three-dimensional observed vector

x = (x_1, x_2, x_3)^T.    (1)

For PCA, the vector x in Eq. 1 is first centered by subtracting its mean μ = E(x),

x ← x − μ,    (2)

where the symbol ← represents substitution when the value of the right-hand side is computed and substituted in x. The covariance matrix is then defined as

C = E(xx^T),    (3)

where E denotes an expectation operator. PCA involves an eigen decomposition as follows:

C = U Λ U^T,    (4)

where U = (e_1, e_2, e_3), e_i = (e_{i1}, e_{i2}, e_{i3})^T are the eigenvectors and Λ is a diagonal matrix of the sorted eigenvalues of the covariance matrix, Λ = diag(λ_1, λ_2, λ_3), λ_1 > λ_2 > λ_3. Thus, the first two principal components are given by

y_2 = U_2^T x,    (5)

where y_2 = (y_1, y_2)^T and U_2 = (e_1, e_2). Then, polynomial regression is used to approximate the second principal component by the first principal component as follows:

\hat{y}_2 = f(y_1, w) = \sum_{i=1}^{M} w_i Φ_i(y_1) = w^T Φ(y_1),    (6)

where \hat{y}_2 is the approximated y_2 principal component, f() is a mapping function, Φ represents the polynomial basis functions, Φ = (Φ_1, ..., Φ_M)^T, and w is a vector of the polynomial coefficients or a parametrical vector, w = (w_1, ..., w_M)^T. M is the order of the polynomial used for approximation (in our case M = 3). The vector w is learnt from the regression approximation of y_2 by y_1. The polynomial basis functions are Φ_j = y^j, j = 1, ..., M, so Eq. 6 becomes

\hat{y}_2 = (w_1, w_2, w_3)(y_1, y_1^2, y_1^3)^T.    (7)

The parametric vector minimizing the approximation error is

w = (Φ^T Φ)^{-1} Φ^T y_2.    (8)
In Eq. 8, we use Φ ← ΦT , where ← means substitution, and the samples of y2 are arranged as a column. The term Φ† = (ΦT Φ)−1 ΦT is the Moore-Penrose pseudoinverse of the matrix Φ.
The real and approximated components are an estimate \hat{y}_2 = (y_1, \hat{y}_2)^T. Finally, since U is an orthogonal matrix satisfying U U^T = I, where I is an identity matrix, reconstruction is defined as

\hat{x} = U_2 \hat{y}_2 + μ.    (9)
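As a rough illustration of this closed-form fit (our own sketch, not the authors' code; all function and variable names are ours), the steps of Eqs. 2-8 can be written with a few NumPy calls, assuming the masked source pixels are given as an n × 3 array:

```python
import numpy as np

def fit_rpca(rgb, order=3):
    """Fit the regressive PCA model (Eqs. 2-8) to masked RGB samples.

    rgb   : (n, 3) array of RGB samples from the masked source region.
    order : M, the order of the approximating polynomial (M = 3 in the text).
    Returns the mean, the first two eigenvectors U2 and the parametric vector w.
    """
    mu = rgb.mean(axis=0)
    xc = rgb - mu                                   # Eq. 2: centering
    cov = xc.T @ xc / len(xc)                       # Eq. 3: covariance matrix
    lam, vec = np.linalg.eigh(cov)                  # Eq. 4: eigen decomposition
    idx = np.argsort(lam)[::-1]                     # sort eigenvalues descending
    U2 = vec[:, idx[:2]]                            # first two eigenvectors
    y1, y2 = (xc @ U2).T                            # Eq. 5: first two components
    Phi = np.stack([y1 ** j for j in range(1, order + 1)], axis=1)
    w = np.linalg.pinv(Phi) @ y2                    # Eq. 8: pseudoinverse solution
    return mu, U2, w
```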
2.3 System Diagram
We consider two flow charts related to image analysis and colorization. The image analysis is important in order to examine how reconstruction is made in comparison with the original images. Fig. 1 shows the flow charts. For image analysis, the color image is given. In addition, the user manually defines the mask corresponding to the region or the object which is used as a source image in colorization. These regions should contain shadows, colors and highlights, as well. PCA is used only for the masked regions. After PCA decomposition, principal components regression is implemented. We approximate the second principal component by the first one and reconstruct the image using PCA. Finally, we obtain the reconstructed image that is an image with synthetic colors. For image colorization, the source image (color image) and the target image (gray level image) are given. The user manually defines the masks for both images. After selecting the mask area and PCA decomposition of the color image, we compute the parametrical vector Eq. 8. Then, the selected region of the gray level image replaces the first principal component y1 . The single requirement here is that the gray level image should have the same dynamic range as the first principal component. Therefore, we scale the gray level image so that its minimum and maximum pixel values are equal to the minimum and maximum pixel values of the first principal component. Then we find the second principal component using polynomial regression Eq. 7 and reconstruct the image. As a result, we obtain the colorized image. One problem arises in colorization when the first eigenvector changes its direction. In this case, the colorized image has negative contrast. To avoid this, we use the following empirical approach. First, we measure the average Cavg , minimum Cmin and maximum Cmax for the first principal component and Iavg , Imin and Imax for the intensity image I = R + G + B. Then, we compute sign(Cavg − (Cmax + Cmin )/2) = sign(Iavg − (Imax + Imin )/2).
(10)
If Eq. 10 is not true, the pixel values of the gray level image are multiplied by −1. This provides a positive contrast of the resultant image.
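A matching sketch of the colorization step itself (again our own illustration, reusing the fit_rpca function sketched in Section 2.2): the masked gray level values are scaled to the dynamic range of the first principal component, inverted if the test in Eq. 10 fails, regressed through Eq. 7 and reconstructed with Eq. 9.

```python
import numpy as np

def colorize(gray, rgb_source, order=3):
    """Colorize masked gray level values from a masked RGB source region."""
    mu, U2, w = fit_rpca(rgb_source, order)
    y1_src = (rgb_source - mu) @ U2[:, 0]

    # Eq. 10: invert the gray values if the first eigenvector gives negative contrast.
    intensity = rgb_source.sum(axis=1)              # I = R + G + B
    s_pc = np.sign(y1_src.mean() - (y1_src.max() + y1_src.min()) / 2)
    s_in = np.sign(intensity.mean() - (intensity.max() + intensity.min()) / 2)
    g = (gray - gray.min()) / (gray.max() - gray.min() + 1e-12)
    if s_pc != s_in:
        g = 1.0 - g

    # Scale to the dynamic range of the first principal component, then Eqs. 7 and 9.
    y1 = y1_src.min() + g * (y1_src.max() - y1_src.min())
    y2 = sum(w[j] * y1 ** (j + 1) for j in range(len(w)))
    return np.stack([y1, y2], axis=1) @ U2.T + mu
```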
3 Experiments
We conducted experiments with several images of fruits and vegetables. They were obtained with a Canon EOS 40D reflex camera in laboratory conditions (a lightbox with daylight). Although two light sources were used, the data structure in the color space was similar to one light source and well defined by the physical model. Then, the images in RAW format were converted into JPEG format. The images are Apple, Green Stem, Oranges, Tomato, Avocado & Banana, Onion, Red Pepper and Yellow Pepper. The image size (width and height) except for the Green Stem is 1280 × 853 pixels. The size of the Green Stem image is 853 × 1280. Fig. 2 shows the test images.

Fig. 2. Test images. The first row: Apple, Green Stem, Oranges and Tomato. The image Green Stem is rotated. The second row: Avocado & Banana, Onion, Red Pepper and Yellow Pepper.
Fig. 3. Apple and Red Pepper. The original images (first column). The images are partially covered by a semi-transparent mask. PRPCA (second column), PCA (third column) and RRPCA (fourth column) reconstruction and the error maps.
Fig. 4. Avocado & Banana and Orange. The original images (first column). The images are partially covered by a semi-transparent mask. PRPCA (second column), PCA (third column) and RRPCA (fourth column) reconstruction and the error maps.
Table 1. Average and maximum S-CIELAB ΔE

Image           PRPCA ΔEavg  ΔEmax     PCA ΔEavg  ΔEmax     RRPCA ΔEavg  ΔEmax
Apple           1.78         43.16     2.80       164       1.22         17.18
Avocado         5.14         58.2      7.32       127.88    3.45         69.6
Banana          3.29         137.39    3.32       146.62    3.15         129.7
Green Stem      4.29         41.69     6.60       53.04     2.94         38.26
Onion           1.86         33.1      3.43       56.72     0.89         22.04
Oranges         4.25         73.37     5.38       154.66    2.67         53.99
Red Pepper      6.43         41.52     9.16       72.46     2.05         19.88
Tomato          5.92         37.83     7.08       81.91     3.05         31.82
Yellow Pepper   1.15         30.60     1.64       69.74     0.66         27.39

In order to evaluate the performance, we compare the proposed method with standard PCA and RPCA based on RBF regression [2]. The parameters for the RBF neural network are 10 iterations and 7 hidden neurons. First, we test the algorithm for accurate reproduction of color images. For selected image regions, PCA, RPCA based on polynomial regression (PRPCA) and RPCA based on RBF regression (RRPCA) are used. The color differences between the original images and the reconstructed images are measured by using
the S-CIELAB ΔE [12]. We also use an error map. For the error map, the error values greater than 100 are truncated and the remaining values are scaled in the range [0-255]. Fig. 3 illustrates the reconstruction results and the error maps made by using the three methods. Only parts of the color objects are used. The regions used in colorization contain shadow, color and highlight. PRPCA and RRPCA visually reproduce images very close to the original images. The error maximum for PRPCA and RRPCA is around the boundaries of highlight areas. It was found that RRPCA gives the best results. PCA has large error values in the highlight regions due to its lack of approximation of the nonlinear data structure. The visual reproduction of highlights by PCA is also not very good. Similar results are obtained for color textures. Fig. 4 shows texture surfaces and their reproduction along with an error map. This is a case of multiple highlights. These images are more difficult to approximate in comparison with the objects. One can see that the error values in color and color-shadow regions are increased. PRPCA has visual color and highlight reproduction close to RRPCA. Table 1 summarizes the color differences computed for the test images. RRPCA is superior to the other methods. The error rate of the proposed method (PRPCA) is better than that of PCA. The Banana image does not have highlight areas, therefore the RPCA methods have results comparable with PCA.

In addition, we compare the methods by computation time. The methods are implemented using Matlab 7.0 on an Intel Core 2 Duo CPU at 2 GHz with 2 GB of RAM. Table 2 shows the computational time. PCA is fastest. Since PRPCA combines PCA and principal components regression, its computation time is longer than the PCA computation time, but the method is still fast. RRPCA, using machine learning, is however much more computationally demanding in comparison with PCA and PRPCA.

Table 2. Computational time, s

Image           PRPCA   PCA     RRPCA
Apple           0.285   0.162   15.068
Avocado         0.116   0.067   6.544
Banana          0.081   0.046   4.575
Green Stem      0.049   0.029   2.938
Onion           0.088   0.052   5.119
Oranges         0.391   0.197   20.149
Red Pepper      0.206   0.113   10.56
Tomato          0.255   0.124   12.73
Yellow Pepper   0.063   0.037   3.534

Finally, we conducted experiments with image colorization. We use the colors of the fruits and vegetables described above. For colorization, we take a synthetic 3D object, a synthetic teapot and a synthetic texture image. Fig. 5 shows the colorized images. Sometimes the highlight areas in the objects and texture are narrowed, since the scaling procedure affects the intensity values of the gray level images. PCA results are not very good. The colorized images have
Fig. 5. The colorization results. For each object and texture the PCA resultant images are given in the first column, the PRPCA resultant images in the second column and the RRPCA resultant images in the third column. (a) The 3D object; for rows from top to bottom we use: Apple, Onion, Oranges, Red Pepper, Tomato and Yellow Pepper. (b) The teapot and texture; for the teapot rows from top to bottom we use: Oranges and Red Pepper; for the texture rows from top to bottom we use: Apple, Onion and Tomato.
poor contrast and color reproduction. The images colorized by PRPCA have good contrast and highlights. The images are sharp and clear. RRPCA also has good highlight reproduction and contrast. Colors are, however, usually more vivid with better appearance. Thus, PRPCA is a tradeoff between PCA and RRPCA. Since visual color reproduction by PRPCA is close to RRPCA and its computation time close to PCA, PRPCA is suitable for practical use.
4 Conclusions
In this paper, we considered a fast technique for image colorization. The technique, using PCA and polynomial regression, does not require segmentation of the image into surface color and highlight regions. The visual colorization results are similar to the best method using RBF regression, while the computation
time is short and close to that of PCA. The technique is simple to implement and can be used for colorizing objects as well as textures.
References

1. Abadpour, A., Kasaei, S.: An efficient PCA-based color transfer method. Journal of Visual Communication and Image Representation 18, 15–34 (2007)
2. Bochko, V., Parkkinen, J.: A spectral color analysis and colorization technique. IEEE Computer Graphics and Applications 26, 74–82 (2006)
3. Chen, T., Wang, Y., Schilings, V., Meinel, C.: Grayscale image matting and colorization. In: Proc. ACCV 2004, pp. 1164–1169 (2004)
4. Drew, M.S., Finlayson, G.D.: Realistic colorization via the structure tensor. In: Proc. ICIP, pp. 457–460 (2008)
5. Horiuchi, T., Tominaga, S.: Color image coding by colorization approach. EURASIP Journal on Image and Video Processing 8, 1–9 (2008)
6. Horiuchi, T., Kotera, H.: Colorization for monochrome image based on diffuse-only reflection model. In: Proc. AIC 2005, pp. 353–356 (2005)
7. Klinker, G.J., Shafer, S.A., Kanade, T.: A physical approach to color image understanding. International Journal of Computer Vision 4, 7–38 (1990)
8. Levin, A., Lischinski, D., Weiss, Y.: Colorization using optimization. ACM Trans. Graphics 23, 689–694 (2004)
9. Shawe-Taylor, J., Cristianini, N.: Kernel methods for pattern analysis. Cambridge University Press, Cambridge (2004)
10. Vieira, L.F.M., do Nascimento, E.R., Fernandes Jr., A., Carceroni, R.L., Vilela, D.R., de Araújo, A.A.: Fully automatic coloring of grayscale images. Image and Vision Computing 25, 50–60 (2007)
11. Welsh, T., Ashikhmin, M., Mueller, K.: Transferring color to grayscale images. Proc. ACM Siggraph 20, 277–280 (2002)
12. Zhang, X., Wandell, B.: A spatial extension of CIELAB for digital color image reproduction. In: Proc. Soc. Information Display Symp. Technical Digest, vol. 27, pp. 731–734 (1996)
Texture Sensitive Denoising for Single Sensor Color Imaging Devices

Angelo Bosco 1, Sebastiano Battiato 2, Arcangelo Bruna 1, and Rosetta Rizzo 2

1 STMicroelectronics, Stradale Primosole 50, 95121 Catania, Italy
2 Università di Catania, Dipartimento di Matematica ed Informatica, Viale A. Doria 6, 95125 Catania, Italy
[email protected],
[email protected],
[email protected],
[email protected]
Abstract. This paper presents a spatial noise reduction technique designed to work on CFA (Color Filter Array) data acquired by CCD/CMOS image sensors. The overall processing preserves image details by using heuristics related to HVS (Human Visual System) and texture detection. The estimated amount of texture and HVS sensitivity are combined to regulate the filter strength. Experimental results confirm the effectiveness of the proposed technique. Keywords: Denoising, Color Filter Array, HVS, Texture Detection.
1 Introduction

The image formation process in consumer imaging devices is intrinsically noisy. This is especially true for low-cost devices such as mobile phones and PDAs, mainly in low-light conditions and in the absence of a flash-gun. In terms of denoising, linear filters can be used to remove Gaussian noise (AWGN), but they also significantly blur the edge structures of an image. Many sophisticated techniques have been proposed to allow edge-preserving noise removal, such as: [12] and [13], which perform multiresolution analysis and processing in the wavelet domain; [3], which uses anisotropic non-linear diffusion equations but works iteratively; and [1] and [10], which are spatial denoising approaches. In this paper we propose a novel spatial noise reduction method that directly processes the raw CFA data, combining HVS (Human Visual System) heuristics, texture/edge preservation techniques and sensor noise statistics, in order to obtain an effective adaptive denoising. The proposed algorithm introduces the use of HVS properties directly on the CFA raw data from the sensor to characterize or isolate unpleasant artifacts. The complexity of the proposed technique is kept low by using only spatial information and a small fixed-size filter processing window, allowing real-time performance on low cost imaging devices (e.g., mobile phones, PDAs). The paper is structured as follows. In the next section some details about the CFA and HVS characteristics are briefly discussed; in Section 3 the overall details of the
proposed method are presented. An experimental section reports the results and some comparisons with other related techniques. The final section outlines directions for future work.
2 CFA Data and HVS Properties

In typical imaging devices a color filter is placed on top of the imager, making each pixel sensitive to one color component only. A color reconstruction algorithm interpolates the missing information at each location and reconstructs the full RGB image. The color filter selects the red, green or blue component for each pixel; the most common arrangement is known as the Bayer pattern [4]. In the Bayer pattern the number of green elements is twice the number of red and blue pixels, due to the higher sensitivity of the human eye to green light, which, in fact, has a higher weight when computing the luminance. The HVS response is a complex, highly nonlinear phenomenon that is not yet completely understood and involves many parameters. It is well known that the HVS has a different sensitivity at different spatial frequencies [15]: in areas containing mean frequencies the eye has a higher sensitivity. Furthermore, chrominance sensitivity is weaker than luminance sensitivity. The HVS response does not depend entirely on the luminance value itself; rather, it depends on the local luminance variations with respect to the background, an effect described by the Weber-Fechner law [7]. These properties of the HVS have been used as a starting point to devise a CFA filtering algorithm, which provides the best performance if executed as the first algorithm of the IGP (Image Generation Pipeline) [2]. Luminance from CFA data can be extracted as explained in [11], but for our purposes it can be roughly approximated by the green channel values before gamma correction.
3 Algorithm

3.1 Overall Filter Block Diagram

A block diagram describing the overall filtering process is illustrated in Fig. 1. Each block will be separately described in detail in the following sections. The fundamental blocks of the algorithm are:

• Signal Analyzer Block: computes a filter parameter incorporating the effects of human visual system response and signal intensity in the filter mask. (Section 3.2 for further details)
• Texture Degree Analyzer: determines the amount of texture in the filter mask using information from the Signal Analyzer Block. (Section 3.4)
• Noise Level Estimator: estimates the noise level in the filter mask taking into account the texture degree. (Section 3.5)
• Similarity Thresholds Block: computes the thresholds that are used to determine the weighting coefficients for the neighborhood of the central pixel.
Fig. 1. Overall Filter Block Diagram
• Weights Computation Block: uses the thresholds computed by the Similarity Thresholds Block and assigns a weight to each neighborhood pixel, representing the degree of similarity between pixel pairs. (Section 3.6)
• Filter Block: actually computes the final weighted average generating the final filtered value. (Section 3.7)

3.2 Signal Analyzer Block

As noted in [5] and [8] it is possible to approximate the minimum intensity gap that is necessary for the eye to perceive a change in pixel values. This phenomenon is known as luminance masking or light adaptation. A higher gap in intensity is needed to perceive a visual difference in very dark areas, whereas for mid and high pixel intensities a small difference in value between adjacent pixels is more easily perceived by the eye [8]. It is also crucial to observe that in data from real image sensors, the constant AWGN model does not fit well the noise distribution for all pixel values. In particular, as discussed in [6], the noise level in raw data is predominantly signal-dependent and increases as the signal intensity rises; hence, the noise level is higher in very bright areas. We decided to incorporate the above considerations of luminance masking and sensor noise statistics into a single curve as shown in Fig. 2. The shape of this curve allows compensating for lower eye sensitivity and increased noise power in the proper areas of the image, allowing adaptive filter strength in relation to the pixel values. A high HVS value (HVSmax) is set for both low and high pixel values: in dark areas the human eye is less sensitive to variations of pixel intensities, whereas in bright areas the noise standard deviation is higher. The HVS value is set low (HVSmin) at mid pixel intensities.
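A minimal way to realize such a curve (our own sketch; the paper only fixes the qualitative shape of Fig. 2, so the parabolic profile and the HVSmin/HVSmax values below are illustrative assumptions):

```python
def hvs_weight(pixel, bitdepth=8, hvs_min=2.0, hvs_max=8.0):
    """Illustrative HVS weighting curve: HVSmax for dark and bright pixels,
    HVSmin at mid intensities (cf. Fig. 2). Parameter values are assumptions."""
    full = (1 << bitdepth) - 1
    mid = full / 2.0
    t = ((pixel - mid) / mid) ** 2   # 0 at mid gray, 1 at the extremes
    return hvs_min + (hvs_max - hvs_min) * t
```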
Fig. 2. HVS curve used in the proposed approach
The HVS coefficient computed by this block will be used by the Texture Degree Analyzer that outputs a degree of texture taking also into account the above considerations (Section 3.4). As stated in Section 2, in order to make some simplifying assumptions, we use the same HVS curve for all CFA colour channels, taking as input the pixel intensities directly from the sensor.

3.3 Filter Masks

The proposed filter uses different filter masks for green and red/blue pixels to match the particular arrangement of pixels in the CFA array. The size of the filter mask depends on the resolution of the imager: at higher resolution a small processing window might be unable to capture significant details. For our processing purposes a 5x5 window size provided a good trade-off between hardware cost and image quality. Typical Bayer processing windows are illustrated in Fig. 3.
Fig. 3. Filter Masks for Bayer Pattern Data
3.4 Texture Degree Analyzer

The texture analyzer block computes a reference value Td that is representative of the local texture degree. This reference value approaches 1 as the local area becomes increasingly flat and decreases as the texture degree increases (Fig. 4). The computed coefficient is used to regulate the filter strength so that high values of Td correspond to flat image areas in which the filter strength can be increased. Depending on the color of the pixel under processing, either green or red/blue, two different texture analyzers are used. The red/blue filter power is increased by slightly modifying the texture analyzer making it less sensitive to small pixel differences (Fig. 5).
Fig. 4. Green Texture Analyzer
Fig. 5. Red/Blue texture analyzer
The texture analyzer block output depends on a combination of Dmax, the maximum difference between the central pixel and the neighborhood, and TextureThreshold, a value that is obtained by combining information from the HVS response and noise level, as described below (2). The green and red/blue texture analyzers are defined as follows:

T_d(green) = \begin{cases} 1 & D_{max} = 0 \\ -\frac{D_{max}}{TextureThreshold} + 1 & 0 < D_{max} \le TextureThreshold \\ 0 & D_{max} > TextureThreshold \end{cases}    (1)

T_d(red/blue) = \begin{cases} 1 & D_{max} \le Th_{R/B} \\ -\frac{D_{max} - Th_{R/B}}{TextureThreshold - Th_{R/B}} + 1 & Th_{R/B} < D_{max} \le TextureThreshold \\ 0 & D_{max} > TextureThreshold \end{cases}

hence:
− if Td = 1 the area is assumed to be completely flat;
− if 0 < Td < 1 the area contains a variable amount of texture;
− if Td = 0, the area is considered to be highly textured.

The texture threshold for the current pixel, belonging to Bayer channel c (c = R, G, B), is computed by adding the noise level estimation to the HVS response (2):

TextureThreshold_c(k) = HVSweight(k) + NL_c(k−1)    (2)
where NL_c denotes the noise level estimation on the previously processed pixel of the same Bayer color channel c (see Section 3.5) and HVSweight (Fig. 2) can be interpreted as a jnd (just noticeable difference); hence an area is no longer flat if the Dmax value exceeds the jnd plus the local noise level NL. The green texture analyzer (Fig. 4) uses a stronger rule for detecting flat areas, whereas the red/blue texture analyzer (Fig. 5) detects more flat areas, being less sensitive to small pixel differences below the ThR/B threshold.
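In code, the two analyzers of Eq. (1) reduce to clamped linear ramps; the sketch below is our own illustration and assumes TextureThreshold has already been formed as in Eq. (2):

```python
def texture_degree_green(d_max, texture_threshold):
    """Green texture analyzer, Eq. (1): 1 on flat areas, 0 on strong texture."""
    if d_max == 0:
        return 1.0
    if d_max <= texture_threshold:
        return 1.0 - d_max / texture_threshold
    return 0.0

def texture_degree_rb(d_max, texture_threshold, th_rb):
    """Red/blue texture analyzer: insensitive to differences below ThR/B."""
    if d_max <= th_rb:
        return 1.0
    if d_max <= texture_threshold:
        return 1.0 - (d_max - th_rb) / (texture_threshold - th_rb)
    return 0.0
```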
3.5 Noise Level Estimator

In order to adapt the filter strength to the local characteristics of the image, a noise level estimation is required. The proposed noise estimation solution is pixel based and is implemented taking into account the previous estimation to calculate the current one. The noise estimation equation is designed so that:

i) if the local area is completely flat (Td = 1), then the noise level is set to Dmax;
ii) if the local area is highly textured (Td = 0), the noise estimation is kept equal to the previous region (i.e., pixel);
iii) otherwise a new value is estimated.

Each color channel has its own noise characteristics, hence noise levels are tracked separately for each color channel. The noise level for each channel c (c = R, G, B) is estimated according to the following formula:
NL_c(k) = T_d(k) \cdot D_{max}(k) + [1 − T_d(k)] \cdot NL_c(k−1)    (3)
where Td(k) represents the texture degree at the current pixel and NL_c(k−1) is the previous noise level estimation, evaluated on the previously processed pixel of the same colour. This equation satisfies requirements i), ii) and iii).

3.6 Weighting Coefficients

The final step of the filtering process consists in determining the weighting coefficients Wi to be assigned to the neighboring pixels of the filter mask. The absolute differences Di between the central pixel and its neighborhood must be analyzed in combination with the local information (noise level, texture degree and pixel intensities) for estimating the degree of similarity between pixel pairs (Fig. 6).
Fig. 6. Wi coefficients weight the similarity degree between the Pc and its neighborhood
As stated in Section 2, if the central pixel Pc belongs to a textured area, then only small pixel differences must be filtered. The lower degree of filtering in textured areas allows maintaining the local sharpness, removing only pixel differences that are not perceived by the HVS. Let:

− Pc be the central pixel of the working window;
− Pi, i = 0, ..., 7, be the neighborhood pixels;
− Di = abs(Pc − Pi), i = 0, ..., 7, the set of absolute differences between the central pixel and its neighborhood.
In order to obtain the Wi coefficients, each absolute difference Di must be compared against two thresholds Thlow and Thhigh that determine if, in relation to the local information, the i-th difference Di is:

− small enough to be heavily filtered,
− big enough to remain untouched,
− an intermediate value to be properly filtered.

To determine which of the above cases is valid for the current local area, the local texture degree is the key parameter to analyze. It is important to remember at this point that, by construction, the texture degree coefficient (Td) incorporates the concepts of dark/bright and noise level; hence, its value is crucial to determine the similarity thresholds to be used for determining the Wi coefficients. In particular, the similarity thresholds are computed according to the following rules:

1. if the local area is flat, both thresholds (Thlow, Thhigh) are set to Dmax, which means that all neighborhood pixels whose difference from the central pixel is less than Dmax have maximum weight;
2. if the local area is fully textured, then Thlow is set to Dmin and Thhigh is set to the average point between Dmin and Dmax, meaning that only pixels whose difference from the central pixel is very small have the maximum weight;
3. if the local area has a medium degree of texture (0 < Td < 1), the situation is as depicted in Fig. 7, where the similarity weight progressively decreases as the i-th difference increases.

Once the similarity thresholds have been fixed, it is possible to finally determine the filter weights by comparing the Di differences against them (Fig. 7).
Fig. 7. Weights assignment. The i-th weight denotes the degree of similarity between the central pixel in the filter mask and the i-th pixel in the neighborhood.
3.7 Final Weighted Average

Let W_0, ..., W_N (N: number of neighborhood pixels) be the set of weights computed for each neighboring element of the central pixel Pc. The final filtered value Pf is obtained by a weighted average as follows:

P_f = \frac{1}{N} \sum_{i=0}^{N} [W_i P_i + (1 − W_i) P_c]    (4)
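The pieces of Sections 3.4-3.7 combine into a single pass per CFA pixel. The following sketch is our own illustration, not the authors' implementation: it reuses the texture-degree functions sketched above, the default parameter values are arbitrary, and the linear interpolation of the similarity thresholds with Td for the intermediate case (rule 3) is our assumption, since the paper only describes it qualitatively.

```python
import numpy as np

def filter_pixel(pc, neighbors, nl_prev, hvs_w, is_green=True, th_rb=4.0):
    """Filter one CFA pixel from its same-color neighborhood (Sections 3.4-3.7).
    Returns the filtered value and the updated noise level NL_c(k)."""
    diffs = np.abs(neighbors - pc)                       # Di
    d_max, d_min = diffs.max(), diffs.min()

    texture_threshold = hvs_w + nl_prev                  # Eq. (2)
    if is_green:
        td = texture_degree_green(d_max, texture_threshold)
    else:
        td = texture_degree_rb(d_max, texture_threshold, th_rb)

    nl = td * d_max + (1.0 - td) * nl_prev               # Eq. (3)

    # Similarity thresholds: rule 1 for flat areas, rule 2 for textured areas,
    # and (our assumption) a Td-weighted blend of the two for intermediate areas.
    th_low = td * d_max + (1.0 - td) * d_min
    th_high = td * d_max + (1.0 - td) * (d_min + d_max) / 2.0

    # Weights: full similarity below th_low, none above th_high, ramp in between.
    if th_high > th_low:
        w = np.clip((th_high - diffs) / (th_high - th_low), 0.0, 1.0)
    else:
        w = (diffs <= th_high).astype(float)

    pf = np.sum(w * neighbors + (1.0 - w) * pc) / len(neighbors)   # Eq. (4)
    return pf, nl
```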
4 Experimental Results

In order to assess the visual quality of the proposed method, we have compared it with the SUSAN (Smallest Univalue Segment Assimilating Nucleus) filter [14] and multistage median filters [9], classical noise reduction algorithms. This choice is motivated by the comparable complexity of these solutions. Though more complex recent methods for denoising image data achieve very good results, they are not yet suitable for real-time implementation. The test noisy image in Fig. 8 was obtained by adding noise with standard deviation σ = 10. Fig. 9(b)-(e) shows the filtered results obtained, respectively, with SUSAN, multistage median-1, multistage median-3 and the proposed technique on the cropped and zoomed detail of Fig. 8 shown in Fig. 9(a). To perform the test, all the input images were bayerized before processing. Fig. 10 shows how well the proposed method performs in terms of PSNR compared to the other algorithms used in the test over the 24 Standard Kodak Images.
Fig. 8. Noisy image (PSNR 32.8 dB)
(a) Cropped and zoomed noisy image (PSNR 32.8.1 dB)
(b) SUSAN (PSNR 32.5 dB)
(c) Multistage median -1 filter. (PSNR 32.9 dB)
(d) Multistage median -3 filter. (PSNR 29.8 dB)
(e) Proposed method. (PSNR 33.8 dB)
Fig. 9. (a) Cropped and zoomed noisy image in Fig. 8. (b) SUSAN. (c) Multistage median-1 filter. (d) Multistage median-3 filter. (e) Proposed method.
Fig. 10. PSNR results (noise level σ = 10) over the 24 Standard Kodak Images, comparing PSNR(ref-noisy), PSNR(ref-proposed), PSNR(ref-SUSAN filtered), PSNR(ref-MMEDIAN-1 filtered) and PSNR(ref-MMEDIAN-3 filtered).
5 Conclusions and Future Works

A spatial adaptive denoising algorithm has been presented; the method exploits characteristics of the human visual system and sensor noise statistics in order to achieve pleasant results in terms of perceived image quality. The noise level and texture degree are computed to adapt the filter behaviour to the local characteristics of the image. The algorithm is suitable for real time processing of images acquired in CFA format since it requires simple operations and divisions that can also be implemented via lookup tables. Future works include the extension of the processing masks along with the study and integration of other HVS characteristics.
References

1. Amer, A., Dubois, E.: Fast and reliable structure-oriented video noise estimation. IEEE Transaction on Circuits System Video Technology 15(1) (2005)
2. Battiato, S., Mancuso, M.: An Introduction to the Digital Still Camera Technology. ST Journal of System Research, Special Issue on Image Processing for Digital Still Camera 2, 2–9 (2001)
3. Barcelos, C.A.Z., Boaventura, M., Silva, E.C.: A Well-Balanced Flow Equation for Noise Removal and Edge Detection. IEEE Transactions on Image Processing 12(7), 751–763 (2003)
4. Bayer, B.E.: Color Imaging Array, US Patent No. 3,971,965 (1976)
5. Chou, C.-H., Li, Y.-C.: A perceptually tuned subband image coder based on the measure of just-noticeable-distortion profile. IEEE Transactions on Circuits and Systems for Video Technology 5(6), 467–476 (1995)
6. Foi, A., Trimeche, M., Katkovnik, V., Egiazarian, K.: Practical Poissonian-Gaussian Noise Modeling and Fitting for Single-Image Raw-Data. IEEE Transactions on Image Processing 17(10), 1737–1754 (2008)
7. Gonzales, R., Woods, R.: Digital Image Processing, 3rd edn. Prentice Hall, Englewood Cliffs (2007)
8. Hontsch, I., Karam, L.J.: Locally adaptive perceptual image coding. IEEE Transactions on Image Processing 9(9), 1472–1483 (2000)
9. Kalevo, O., Rantanen, H.: Noise Reduction Techniques for Bayer-Matrix Images. In: Proceedings of SPIE Electronic Imaging, Sensors, Cameras, and Applications for Digital Photography III 2002, San Jose, CA, USA, vol. 4669 (2002)
10. Kim, Y.-H., Lee, J.: Image feature and noise detection based on statistical hypothesis tests and their applications in noise reduction. IEEE Transactions on Consumer Electronics 51(4), 1367–1378 (2005)
11. Lian, N., Chang, L., Tan, Y.-P.: Improved color filter array demosaicking by accurate luminance estimation. In: IEEE International Conference on Image Processing, vol. 1, pp. 41–44 (2005)
12. Portilla, J., Strela, V., Wainwright, M.J., Simoncelli, E.P.: Image Denoising Using Scale Mixtures of Gaussians in the Wavelet Domain. IEEE Transactions on Image Processing 12(11), 1338–1351 (2003)
13. Scharcanski, J., Jung, C.R., Clarke, R.T.: Adaptive Image Denoising Using Scale and Space Consistency. IEEE Transactions on Image Processing 11(9), 1092–1101 (2002)
14. Smith, S.M., Brady, J.M.: SUSAN - A New Approach to Low Level Image Processing. International Journal of Computer Vision 23(1), 45–78 (1997)
15. Wandell, B.: Foundations of Vision, Sinauer Associates (1995)
Color Reproduction Using Riemann Normal Coordinates

Satoshi Ohshima 1, Rika Mochizuki 1, Jinhui Chao 1, and Reiner Lenz 2

1 Graduate School of Science and Engineering, Chuo University, 1-13-27 Kasuga, Bunkyo-ku, Tokyo, Japan
[email protected]
2 Department of Science and Technology, Linköping University, SE-60174 Norrköping, Sweden
[email protected], http://www.itn.liu.se/~reile
Abstract. In this paper we use tools from Riemann geometry to construct color reproduction methods that take into account the varying color perception properties of observers. We summarize the different steps in the processing: the estimation of the discrimination thresholds, the estimation of the Riemann metric, the construction of the metric-preserving or color-difference-preserving mapping and the usage of Semantic Differentiation (SD) techniques in the evaluation. We tested the method by measuring the discrimination data for 45 observers. We illustrate the geometric properties of the color spaces obtained and show the result of the metric-preserving maps between the color space of color-normal observers and a color-weak observer. We then apply the obtained maps to produce color corrected reproductions of two natural images.
1 Introduction
All color reproduction methods in use today are rather rigid in the sense that the goal of the reproduction is decided in advance and often based on some kind of standard. A typical example is a print where the tolerance of the final product is specified in terms of some ΔE. This does not take into account that color perception varies from observer to observer and even changes depending on many other factors. This approach is sufficient for most cases since human perception is quite flexible and reasonably similar between different observers. An extreme case where it obviously fails is color blindness, where the observer is unable to see the difference between certain pairs of colors. A color reproduction strategy that aims at reproduction for individual observers can only be realized in a feedback loop where the reproduction process takes into account the color perception properties of the intended observer. In computer vision a similar strategy is known as "active vision". An active vision system interacts with its environment by changing camera parameters or other imaging modalities to collect as much information as possible while analyzing the scene.
In an adaptive reproduction system we can also try to interact with the user in order to create a reproduction that is based on the perception of the individual user. Instead of minimizing the reproduction error in the working color spaces of the output devices, a perception based color reproduction requires a color-difference-preserving map between the perceptional color spaces of observers. The fundamental problem with this approach is of course the difficulty of obtaining a quantitative description of the perception properties of an observer. A standard method to characterize color perception properties is the measurement of discrimination thresholds that describe the local sensitivities of the observer. In this paper we describe some of our experiments where we characterize the color perception of color-normal and color-weak observers by discrimination thresholds. We then use Riemann normal coordinates [3] to build a descriptor of the color perception properties of an observer from the measured discrimination thresholds. In the third processing stage the Riemann normal coordinates are used to establish color-difference-preserving mappings between the color spaces of the observers. An evaluation of the effect of such mappings is very difficult since it requires the measurement of the subjective impressions created by the color reproductions. We use semantic differentiation (SD) techniques to evaluate the performance of the proposed reproduction strategy. This SD evaluation of the reproduction shows that our method can create impressions for the color-weak observer that are very similar to those of color-normal observers.
2 Riemann Color Spaces
The standard tools to characterize color perception are color differences. The most accessible and reliable measurements are small or local color differences. The most popular such measurements are the so-called just-noticeable-difference (jnd) thresholds or discrimination thresholds. They are obtained by measuring the minimal color difference between a reference color and a test color that the observer can detect. These threshold measurements provide at every reference color x_0 a measure of local distances in color space as follows: use the reference color x_0 as the origin and a color vector x' = x_0 + Δ' close to x_0 as a test color. Then the discrimination ellipses/ellipsoids are points on the unit circles/spheres centered at the reference color x_0 (the distance will be defined below). They are given by the color vectors x' = x_0 + Δ' where the increments Δ' satisfy the equation

(Δ')^T G(x_0) Δ' = 1    (1)
The positive definite matrix G(x_0) is uniquely determined by the ellipses/ellipsoids and vice versa. With such a matrix G(x_0) defined at every x_0, the local distance around x_0 can be computed as

‖x − x_0‖² = ‖Δ‖² = Δ^T G(x_0) Δ    (2)
A space with a local distance defined by a matrix G(x) at every point x is called a Riemann space with Riemann metric G(x). The exact definition and technical details can be found in every book on Riemann geometry, for example in [2].
We estimate the matrices G(x) as follows: First we select a color vector x0 as a reference color. This vector is used as the origin of the ellipsoid. The observer varies a test color in a certain direction until he/she can see a difference to the reference color. Then this experiment is repeated but now the direction of the variation is modified. The result of N such experiments is a sequence of N color vectors xn , n = 1 · · · N that are by definition located on the surface of the ellipsoid defining the just-noticeable differences. In our experiments we measured the discrimination threshold ellipsoids using a 10 degree visual field of size 14cm × 14cm seen from a distance of 80cm. The discrimination threshold data were measured for 45 college students (38 male, 7 female, 1 color-weak) in CIEXYZ coordinates. We choose ten uniformly distributed points within the gamut of the monitor as reference colors. We used 14 different directions for each reference color. The distribution of the 14 directions is not uniform but denser around the direction of the confusion lines and the long axes of threshold ellipsoids. The experimental setup is shown in Figure 1.
Fig. 1. Experimental setup
After the measurement of the jnd-vectors we have to compute the matrix G(x_0) as the solution of the N equations 1 = Δ_n^T G(x_0) Δ_n, n = 1, ..., N, with x_n = x_0 + Δ_n. Writing Δ = (x, y, z) and

G = \begin{pmatrix} a & b & c \\ b & d & e \\ c & e & f \end{pmatrix}

every measurement (represented by the vector Δ) gives one equation in the matrix entries a to f:

ax² + by² + cz² + dxy + exz + fyz = 1    (3)
The matrix entries a to f are obtained by least squares fitting (for the details see [5]). The characteristics of color-normal perception are represented by a matrix whose entries are the averages of the entries in the matrices of 44 color-normal
Fig. 2. Discrimination threshold ellipsoids: (a) color-normals, (b) color-weak.
observers. The average threshold ellipsoid of the color-normals and the thresholds of the color-weak observer are shown in Fig. 2a and Fig. 2b. The threshold ellipses of the color-normal and the color-weak observers are obtained as the intersections of the above ellipsoids with the chromaticity plane; they are shown in Fig. 3a and Fig. 3b. An even more important quantity in color perception is the difference between two colors with a larger color difference. Such differences are however more subjective and therefore harder to deal with. In a Riemann color space the geometric distance between two color points x_1 and x_2 can be defined as the length of the shortest curve connecting the two points. Such shortest curves are known as geodesics. When the matrices G(x) are known for all points x, the geodesics are solutions of differential equations and they can be computed using numerical differential equation solvers. Using this approach in practice is however complicated by the fact that in this application the measurement of the thresholds is very time-consuming, with the result that there are only very few samples available. Interpolation is therefore crucial in the estimation. In our implementation the interpolation uses the Akima algorithm [1].
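The least-squares step for Eq. (3) is a small linear system; the sketch below (our own illustration) stacks one row per measured jnd-vector and assembles a symmetric G from the fitted coefficients of the quadratic form (note that the off-diagonal coefficients of the form correspond to twice the off-diagonal entries of a symmetric matrix).

```python
import numpy as np

def fit_metric(deltas):
    """Least-squares fit of the local metric G at a reference color (cf. Eq. 3).

    deltas : (N, 3) array of jnd-vectors Delta_n = x_n - x_0 measured around x_0.
    Returns a symmetric 3x3 matrix G with Delta^T G Delta ~ 1 for all samples.
    """
    x, y, z = deltas[:, 0], deltas[:, 1], deltas[:, 2]
    # One equation a x^2 + b y^2 + c z^2 + d xy + e xz + f yz = 1 per measurement.
    A = np.stack([x * x, y * y, z * z, x * y, x * z, y * z], axis=1)
    a, b, c, d, e, f = np.linalg.lstsq(A, np.ones(len(deltas)), rcond=None)[0]
    return np.array([[a,     d / 2, e / 2],
                     [d / 2, b,     f / 2],
                     [e / 2, f / 2, c    ]])
```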
Fig. 3. Discrimination threshold ellipses: (a) color-normals, (b) color-weak.
Fig. 4. Riemann normal coordinates in the chromaticity (x, y) plane: (a) color-normals, (b) color-weak.
On a Riemann color space we can use the geodesics and construct a coordinate system as follows. We first select a point of origin x_0. Then we connect the points of the same distance from x_0 on the neighboring geodesics. This gives a generalized polar coordinate system on the Riemann space, similar to the coordinate system on the sphere with the north (or south) pole and the longitude/latitude coordinates. These coordinates are known as Riemann or normal coordinates. In the following two figures the measured ellipsoids were first projected onto the chromaticity plane. Then the corresponding ellipses were computed and from them the Riemann normal coordinates were obtained. Figure 4a shows the coordinates computed from the color-normal observers and Figure 4b the corresponding Riemann normal coordinates for the color-weak observer.
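For readers who want to reproduce such coordinate grids, the sketch below is our own illustration (the paper uses Akima interpolation and a Matlab Runge-Kutta solver; this version uses fixed-step RK4 and finite-difference Christoffel symbols). It traces a single geodesic of a two-dimensional metric field; shooting geodesics of equal parameter length in many directions from the origin yields the normal coordinate grid.

```python
import numpy as np

def christoffel(G, x, h=1e-4):
    """Christoffel symbols Gamma[k, i, j] of a 2D metric field G at point x,
    with the partial derivatives of G taken by central differences."""
    dG = np.zeros((2, 2, 2))                       # dG[l, i, j] = d g_ij / d x^l
    for l in range(2):
        e = np.zeros(2); e[l] = h
        dG[l] = (G(x + e) - G(x - e)) / (2 * h)
    Ginv = np.linalg.inv(G(x))
    Gam = np.zeros((2, 2, 2))
    for k in range(2):
        for i in range(2):
            for j in range(2):
                Gam[k, i, j] = 0.5 * sum(Ginv[k, l] * (dG[i, l, j] + dG[j, i, l]
                                                       - dG[l, i, j]) for l in range(2))
    return Gam

def geodesic(G, x0, v0, steps=200, dt=0.01):
    """Integrate x'' = -Gamma(x)[x', x'] from x0 with initial velocity v0 (RK4)."""
    def rhs(s):
        x, v = s[:2], s[2:]
        a = -np.einsum('kij,i,j->k', christoffel(G, x), v, v)
        return np.concatenate([v, a])
    s = np.concatenate([x0, v0]).astype(float)
    path = [s[:2].copy()]
    for _ in range(steps):
        k1 = rhs(s); k2 = rhs(s + 0.5 * dt * k1)
        k3 = rhs(s + 0.5 * dt * k2); k4 = rhs(s + dt * k3)
        s = s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        path.append(s[:2].copy())
    return np.array(path)
```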
3 Color Mapping Using Discrimination Threshold Matching
In the previous sections, we described the construction of a Riemann color space from discrimination thresholds. As a result we obtain a geometric description of the color perception of an observer either locally in the form of a Riemann space with its metric or globally in the form of Riemann normal coordinates. Given two Riemann color spaces we now construct a mapping between these color spaces that preserves the metric or color-difference properties of these spaces. We explain the basic idea by constructing an isometry between the color spaces of a color-weak observer and the color space of color-normal observers. We call such a map a “color weak” map (for a detailed description see [5]).
Fig. 5. Images of geodesic coordinates: (a) color-weak onto color-normal, (b) color-normal onto color-weak.
If the color space of the color-weak observer is C_w and the color space of the color-normal observers is C_n, then the color-weak map is a function

w : C_w → C_n,  x ↦ y = w(x)
This mapping maps a color stimulus x perceived by color-normals to y = w(x) perceived by a color-weak observer (for details see [6] and [4]). For the metric G_n(x) of the color-normal to be mapped isometrically to the corresponding metric G_w(y) of the color-weak, the Jacobian matrix D_w of w must fulfill the threshold matching condition

G_w(y) = (D_w)^T G_n(x) D_w    (4)
Applying w to the input image and showing it to color-normal observers will provide them with the same experience as the color-weak observer. Applying the inverse map w^{-1} and presenting it to the color-weak observer should give him the same impression as the original image gives to the color-normal observer. In order to evaluate the precision of the color-difference-preserving maps we calculated the image of the Riemann normal coordinates of the color spaces of the color-weak and the color-normals under the simulation map and the correction map. The results are plotted in the following figures. The precision of the Riemann normal coordinates is determined by the numerical accuracy of the interpolation algorithm and of the ODE solver used to draw geodesics. The upper limit of the errors in the maps read out from the Riemann normal coordinates is the size of the grid cell. Here we used the ODE solver based on the Runge-Kutta algorithm in Matlab. In Fig. 5a, the blue coordinates show the image of the geodesic coordinates of the color-weak under the simulation map, which is observed to be well fitted onto the yellow geodesic coordinates of the color-normal. Thus the color-difference preservation of the color-weak or simulation map is confirmed. In Fig. 5b, the blue coordinates are the geodesics of the color-normal under the correction map
or the inverse of the color-weak map. Also here we have a good fitting which means that the color-differences are preserved by the correction map.
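The threshold matching condition of Eq. (4) can also be checked pointwise for any candidate map; the sketch below (our own illustration) estimates the Jacobian by central differences and measures how far the pulled-back color-normal metric is from the color-weak metric.

```python
import numpy as np

def matching_error(w_map, Gn, Gw, x, h=1e-4):
    """Residual of Eq. (4), Gw(w(x)) - Dw^T Gn(x) Dw, at a single color x.

    w_map  : candidate map, a function from a 3D color x to y = w(x).
    Gn, Gw : functions returning the 3x3 metric at a color for the color-normal
             and the color-weak observer, respectively.
    Returns the Frobenius norm of the mismatch (0 for a perfect isometry).
    """
    Dw = np.zeros((3, 3))
    for j in range(3):                      # Jacobian, column by column
        e = np.zeros(3); e[j] = h
        Dw[:, j] = (w_map(x + e) - w_map(x - e)) / (2 * h)
    pullback = Dw.T @ Gn(x) @ Dw
    return np.linalg.norm(Gw(w_map(x)) - pullback)
```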
4 Color Correction of Natural Images
The proposed correction algorithm is applied to the natural images shown in Fig. 6b and Fig. 7b. The simulated color-weak images are in Fig. 6a and Fig. 7a, and the corrected images for the color-weak observer are in Fig. 6c and Fig. 7c, respectively. The intended application of the correction method is the production of an image that gives two different observers with different color vision properties the same impression. Since these effects are not measurable we cannot evaluate the quality of the method quantitatively. Instead we use the standard Semantic Differential (SD) method to evaluate the impressions of color-normals and the color-weak. We prepared a questionnaire and let one color-normal and the color-weak observer judge ten polar pairs of the most relevant adjectives with respect to the original images. A questionnaire with a seven-level SD score is used in the evaluation. Figures 8 and 9 show the comparisons of the SD scores between color-normals and the color-weak, before and after correction. The solid line shows the score of the color-normal and the dotted line that of the color-weak. In the SD evaluations shown on the left both color-normals and the color-weak see the original image before correction. The evaluations of the correction are shown on the right. Here the color-normal observer sees the original while the color-weak sees the corrected image. The closeness of the SD curves after correction shows that the color-normals and the color-weak achieved a similar visual impression of the original image and the corrected image, respectively.
Fig. 6. Rock Example: (a) simulated color-weak image, (b) original, (c) corrected image for the color-weak.
Fig. 7. Fall Example: (a) simulated color-weak image, (b) original, (c) corrected image for the color-weak.
Fig. 8. SD evaluation of Fig. 6b: (a) before correction, (b) after correction.
Fig. 9. SD evaluation of Fig. 7b: (a) before correction, (b) after correction.
5 Conclusion
We described how the theory and tools of Riemann geometry can be used to construct maps that compensate for the differences in the color perception of different observers. We demonstrated their application in a color correction for color-weak vision and evaluated the resulting effects with the help of an SD procedure.
Acknowledgements This research is partially supported by the Institute of Science and Engineering, Chuo University. The financial support of the Swedish Science Foundation is gratefully acknowledged.
References

1. Akima, H.: A Method of Bivariate Interpolation and Smooth Surface Fitting for Irregularly Distributed Data Points. ACM Transactions on Mathematical Software 4(2), 148–159 (1978)
2. Boothby, W.M.: An Introduction to Differentiable Manifolds and Riemannian Geometry. Academic Press, New York (1955)
3. Chao, J., Lenz, R., Matsumoto, D., Nakamura, T.: Riemann geometry for color characterization and mapping. In: Proc. CGIV, pp. 277–282. IS&T, Springfield (2008)
4. Chao, J., Osugi, I., Suzuki, M.: On definitions and construction of uniform color space. In: Proc. CGIV, pp. 55–60. IS&T, Springfield (2004)
5. Mochizuki, R., Nakamura, T., Chao, J., Lenz, R.: Correction of color-weakness by matching of discrimination thresholds. In: Proc. CGIV, pp. 208–213. IS&T, Springfield (2008)
6. Suzuki, M., Chao, J.H.: On construction of uniform color spaces. IEICE Trans. Fundamentals Elec. Comm. Comp. Sci. E85A(9), 2097–2106 (2002)
Classification of Paper Images to Predict Substrate Parameters Prior to Print

Matthias Scheller Lichtenauer, Safer Mourad, Peter Zolliker, and Klaus Simon

Swiss Federal Laboratories for Materials Testing and Research, Media Technology Lab, Ueberlandstrasse 129, 8600 Duebendorf, Switzerland
{matthias.scheller,klaus.simon,peter.zolliker}@empa.ch,
[email protected] http://empamedia.ethz.ch/
Abstract. An accurate characterization of the substrate is a prerequisite of color management in print. The use of standard ICC profiles in prepress leaves it to the printer to match the fixed substrate characteristics contained in these profiles. This raises interest in methods to predict whether a given ink, press and paper combination complies with a given characterization. We present an approach to compare physical and optical characteristics of papers in order to achieve such a prediction of compliance by classification methods. For economical and ecological reasons it is preferable to test paper without printing on it. We therefore propose non-destructive methods. Keywords: image classification, color management, dotgain, tone value increase, ISO 12647, non-destructive testing.
1 Introduction
Print by Numbers. With digital workflows in the graphical industry, digital data has become the dominant representation of color, and customers of the printing industry increasingly expect repeatable and predictable results. In order to meet these expectations, the printing process has in recent years undergone standardization efforts. Paper as a printing substrate, however, remains a source of variability. Empirically characterizing a particular ink, press and paper combination in offset print requires producing a set of plates, printing patches at different levels of ink provision, measuring with a spectrophotometer, and, in the worst case, iterating the entire process. This means "producing a profile". The costs have to be amortized by print runs with the substrate characterized. Characterizing the substrate for a single job can more than double the fixed costs in offset print. But fixed costs are the major drawback of offset relative to its competition, subsumed under the term of digital printing technologies. Thus, offset has a high interest in reusing paper characterizations.
Fig. 1. Microscopic images of papers used in classification. The resolution of the original images was about 0.4 µm per pixel. The area shown is about 0.4 × 0.4 mm.
Find a Matching Profile. In digital publishing, color is specified in a device-independent space. To map these color specifications to a print, substrate-dependent information is interpolated from data in a look-up table (ICC profile). An ICC profile describes color transformations for a single, fixed substrate and a fixed set of colorants. The users of profiles in prepress are interested in minimizing profile-handling expense by using a few generic profiles. So, instead of being used as a tool for matching data to a specific press-room situation, ICC profiles became normative references printers have to meet. Therefore, the printing industry must find procedures to check whether a given lot of a paper brand is within an acceptable range of variability relative to the data given by an ICC profile. This is a classical categorization problem that we approach with image classification methods. Organisation of this Paper. In this paper, we will introduce a few paper properties known to impact color in print, assess their impact and the actions a printer can take. We then discuss our understanding of radiative transfer in paper substrates developed during the last years, in order to explain the choice of features used in classification. Finally, we will present our attempts to classify offset papers and compare their performance to the ISO 12647-2 standard.
2 Prior Knowledge
When passing from analog reproduction by photographic techniques to a digital workflow and digital plate production, the limitations imposed by a substrate are only met in the last stage [1], but still have to be accounted for earlier. When matching the digital specification of a color to a physical representation, the following properties of paper have to be taken into account: Coating: Some papers consist of a body of mixed fibers and fillers only, while other papers are additionally surface coated in variable thickness and flatness. This influences paper gloss and roughness, the diffusion of ink components in the paper and the drying process of wet ink.
White Point: Paper color usually becomes the white point of the print gamut.
Dimensions: In most mass printing technologies, halftoning is used to achieve a color impression. Areas covered with ink dots are combined with areas left blank at a microscopic scale above 10 µm. This limit corresponds to 2400 dots per inch of a digital plate setter using a nonperiodic screen. A paper fibre has a lateral extension of about 10 to 50 µm [2]. The extension of graphical paper or cardboard perpendicular to the printed surface is 50 to 500 µm, and coating has to level out thickness variations of 5 to 15 µm per side. The thickness of an eventual coating may vary in these dimensions. The spatial dimensions of a paper surface have an impact on the minimal dot size.
Penetration: Some of the properties even depend on the paper/press/ink combination, for instance the colorimetric prediction of a print result from colorimetric measurements of wet prints in offset, due to varying penetration into the paper body over time.
Trapping: Impact printing processes bring ink onto a plate, which is then pressed against the substrate and thereby transfers the ink to it. If another ink has already been printed at a particular place, less ink may be accepted there than on the bare paper next to it, a phenomenon known as trapping. The extent of this effect depends not only on material properties, but also on the sequence and timing of printing the different colors. Trapping also influences gray balance.
Dotgain: Dotgain is also known as tone value increase. Halftoned colors usually appear darker than predicted by the Neugebauer model [3], where tone is proportional to intended area coverage. Some of this is accounted for by lateral spreading of ink (mechanical or physical dotgain), but the major cause of this effect is lateral scattering of light in the substrate (optical dotgain, see Fig. 2).
In order to account for dotgain and trapping in impact printing, the separation and amplitude-modulating halftoning algorithm can compensate for them, letting for instance a 34% tone area appear on the plate where a 50% tone level is intended on paper (a minimal sketch of such a plate correction is given below). Plate correction, separation and print sequence as well as ink provision are the major factors a printer can influence when the substrate is given. Therefore, dotgain, trapping and maximal ink provision are the parameters we have to estimate. Other parameters can be measured.
The Approach of ISO 12647. ISO provides normative references for the color locations of papers and inks so as to justify the use of a few generic profiles in prepress and proofing. This has led to the standards ISO 12647 and ISO 2846. These norms do not provide models predicting achievable print results for given substrate parameters. There are first-principles models of turbid media based on radiative transfer [5]. Transfer theory models give some insight into the optical part of dotgain and solid ink color. We will discuss such models and our experience with applying them before positioning our classification approach relative to the one of ISO.
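As a rough illustration of the plate correction mentioned above, a measured tone transfer curve (plate tone versus printed tone) can be inverted by interpolation to find the plate tone that yields an intended printed tone. The curve values below are invented for illustration; only the 34%/50% pair echoes the example in the text.

```python
import numpy as np

# Hypothetical tone transfer curve of one ink on one substrate: the printed tone
# value is larger than the plate tone value because of dotgain.
plate_tone   = np.array([0.0, 10.0, 20.0, 34.0, 50.0, 70.0, 85.0, 100.0])  # % on plate
printed_tone = np.array([0.0, 18.0, 31.0, 50.0, 66.0, 83.0, 93.0, 100.0])  # % on paper

def plate_value_for(intended_tone):
    """Plate tone (%) that is expected to print as the intended tone (%)."""
    return float(np.interp(intended_tone, printed_tone, plate_tone))

print(plate_value_for(50.0))  # -> 34.0: a 34% plate tone for an intended 50% printed tone
```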
Fig. 2. In a simple model, optical dotgain can be seen as a convolution of a probability density function of remittance pβ(x) with S(λ, x), the spectral power distribution of the incident light at position x in direction −z. Hence, light entering the substrate at xi contributes to the exiting light in a small vicinity Δx of xi. In the left half, we schematically show the influence areas of two ink dots.
Paper as a Turbid Medium. In a very thick substrate of infinite lateral extension, all photons will sooner or later be absorbed. Absorbed light might be extinct, meaning that the absorbed energy is completely transferred to heat; it may be scattered, meaning that a photon of the same wavelength as the absorbed one is emitted in a different direction; or a fluorescent process takes place, in which the re-emitted photon is shifted in both color and direction. We are interested in the local distribution of remission pβ(x|xi) of a layer, also known as the Point Spread Function, and in the total fraction of incident light being remitted (Fig. 2). If we assume that pβ(x|xi) is similar across the whole printable surface, this already explains the sensitivity of dotgain to dot pattern and resolution. Were pβ(x, y|xi, yi) known, the remittance β(λ) of a halftone pattern could be approximated as the integrated convolution ∫∫ (pβ ◦ Si)(λ) dx dy with the spectrum of the incident light Si(λ). Incident and re-emitted power spectra themselves can be convolved with functions representing the halftone patterns. Should we also account for backside reflectance, multiple layers or fluorescence, the model becomes far more complex. The extinction pα can be expressed as a product of the probability density of the length of way in a layer and a constant extinction probability per unit length. If the incident beam is collimated, this results in a different pα dx dy dz than with diffuse incident light, since a photon's expected length of way in a substrate layer will differ [6]. It can be derived from simulations [7] that multiple scattering in a paper substrate is very likely to happen. These results coincide with the experience in the printing industry that coated papers exhibit lower dotgain than uncoated papers of the same mass per area. We attribute this to higher scattering in the coating, resulting in a sharper peak of pβ. In solid patches on uncoated papers, some paper spots will shine through. This effect influences the solid ink color
and hence limits the reproducible colors. These relationships of dotgain and solid ink saturation to resolution and coating are the principal insight we could gain (a toy numerical illustration of the scattering view is sketched below). We now have to discuss the limitations we encountered when synthesizing these ideas into a radiative transfer theory model. Transfer Theory Models. A family of such models dates back to 1931 [4], when Kubelka and Munk presented a model to determine the efficiency of paint in covering the background. They used probabilistic assumptions about the directional distribution of light to predict the effect of adding more paint on opacity. It is not simple to apply a Kubelka-Munk-type model to offset halftone print, since coated papers cannot be approximated as uniform bodies. Additionally, the penetration depth and UV absorptance of inks vary, the incident light is not perfectly diffuse, and fluorescence may not be neglected. Furthermore, the model has to be extended in order to deal with halftone patterns. Nevertheless, there has been an attempt to do so in our group. The parameters of this model had to be determined by fitting. Mourad demonstrated its performance in predicting single-color wedge dotgain from white and solid patch measurements [8]. Unfortunately, the numerical fitting approach showed a tendency toward overfitting in the absence of calibration measurements of individual solid ink patches. Our model based on radiative transfer was therefore unsuitable for prediction prior to print.
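To make the convolution picture of Fig. 2 concrete, the following one-dimensional toy model blurs the light that has passed through the ink layer with a Gaussian point spread function before it exits through the ink layer again. The ink transmittance, paper reflectance, dot period and PSF width are arbitrary assumptions; the point is only that lateral scattering makes the halftone appear darker than the prediction without scattering.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

# 1-D toy model of optical dotgain: light enters through the ink layer, is scattered
# laterally in the substrate (Gaussian PSF standing in for p_beta), and exits through
# the ink layer again.  All parameter values are illustrative assumptions.
coverage, t_ink, r_paper = 0.5, 0.2, 0.9     # dot coverage, ink transmittance, paper reflectance
x = np.arange(1000)
ink = (x % 100) < coverage * 100             # periodic halftone dots
t = np.where(ink, t_ink, 1.0)                # local transmittance of the ink layer

scattered = gaussian_filter1d(t * r_paper, sigma=15, mode='wrap')   # lateral scattering
reflectance = t * scattered                                          # exit through the ink again

tone_with_scattering = 1.0 - reflectance.mean() / r_paper
tone_without_scattering = coverage * (1.0 - t_ink ** 2)
print(tone_with_scattering, tone_without_scattering)   # the first (optical dotgain) is larger
```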
3 A Non-destructive Classification Method for Paper
Most properties of a paper can be tested with non-destructive methods [9], but dotgain and trapping still have to be determined by printing sheets of a paper with particular colorants on a particular press. This is expensive for a paper batch in quality control and inapplicable for a facsimile reprint. These are the reasons for the interest in non-destructive methods to predict dotgain, trapping, the ink provision needed and the color of solid ink. Transfer theory models do not predict these parameters. We therefore turned to developing a learning system, taking all prior knowledge into account and extrapolating it to a new, unknown sheet. Considering the relation between coating and light scattering discussed above, we postulated that the thickness of the coating would be a distinguishing feature for dotgain. In this section, we will present the experimental approach and the feature extraction methods. We printed a sample of commercially available offset papers under comparable conditions and empirically tested the correlations between the substrate parameters measured, the image analysis features extracted and the colorimetric results achieved. Printing Experiment on Paper Classification. When standardizing papers, the result should be similar when printed under similar conditions. We experimentally tested the assumption that paper grades would show comparable results when printed with the same, uncorrected plate in one run with about the same amount of ink.
Fig. 3. Relative spectral power distribution of remission. We measured the remission spectra of 32 commercially available offset paper grades around 150 g/m² with a GretagMacbeth Spectrolino (45°/0°, D50, 2°) on white backing. Each curve shows the average of 20 measured sheets of one brand. The dotted line shows the spectrum of a yellowish, unbrightened APCO paper. The circles denote the location of the paper white of the ISO 12647 standard (the L* values of all papers are within the norm, around 95).
Coherent results under such a condition would justify the use of the same ICC profile for two papers. We intentionally used the same amount of ink for all papers, adjusted on a coated paper grade the printer was experienced with. Since uncoated offset papers would demand far more ink, colors on them were not as saturated as achievable. The good news is that uncoated grades were coherently off the norm in solid ink tones. The white points of most uncoated papers are off the norm by more than 5 ΔEab (Fig. 3). To summarize, some of the commercially available paper grades we tested are not covered by the norm ISO 12647-2, but are comparable with regard to paper white, dotgain and ink acceptance. These observations suggest a need to generically classify papers with regard to white point, ink acceptance and dotgain characteristics. Analysis of Paper Images. Since we assume a relationship of dotgain and ink acceptance with coating, we based our classification approach on transmission image analysis combined with physical and optical dimensions of the bare substrate. We made transmission images of all papers in the printing experiment with Leica laboratory microscope equipment (Fig. 1). Human Performance. We manually classified the images with regard to the visibility of the paper fibres. The number of groups was fixed a priori to three. This allowed humans to achieve a categorization that correlates with dotgain (Fig. 4). Automation. In order to achieve such a classification algorithmically, we investigated the correlation with the following features:
[Fig. 4 comprises two panels, "Categorisation according to ISO 12647" (top) and "Human Categorisation of images" (bottom), both plotting dotgain against grammage (g/m²).]
Fig. 4. Classification of paper with regard to dotgain. We tested 32 paper brands around 150 g/m². Each mark stands for the mean dotgain value of a gray patch printed on 20 sheets of a particular paper brand. The symbol of the mark denotes the category this brand was assigned to. For better readability, we plotted values against mean grammage.
– specific volume (cm³/g)
– thickness orthogonal to the printed surface (µm)
– grammage (g/m²)
– CIE Y histogram analysis, see Fig. 6
– coefficients after discrete cosine transformation of the image, see Fig. 7
– distribution of segment size after image segmentation
– waterlevel transformations combined with fractal dimension analysis
– waterlevel transformations combined with binary pattern statistics
The physical dimensions of the paper were included assuming that the Beer-Lambert law holds, at least approximately, for paper. Physical dimensions are also good features on their own, due to differences in specific weight between fibres and coating (Fig. 5). A sketch of how some of these features can be computed is given below.
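The sketch below shows how a few of the listed features could be computed for one transmission image; the sRGB-to-Y conversion uses the standard weights, while the function signature, the percentile choice (mirroring Fig. 6) and the feature ordering are assumptions made for illustration.

```python
import numpy as np

def paper_features(rgb, grammage_g_m2, thickness_um):
    """Simple feature vector for one transmission image (sRGB values in [0, 1])."""
    # Linearize sRGB and keep the CIE Y (luminance) channel, scaled to [0, 100].
    lin = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    y = 100.0 * (0.2126 * lin[..., 0] + 0.7152 * lin[..., 1] + 0.0722 * lin[..., 2])

    p5, p50, p95 = np.percentile(y, [5, 50, 95])    # histogram percentiles (cf. Fig. 6, left)
    var_over_mean = y.var() / y.mean()              # dispersion ratio (cf. Fig. 6, right)
    specific_volume = thickness_um / grammage_g_m2  # µm / (g/m^2) is numerically cm^3/g

    return np.array([specific_volume, thickness_um, grammage_g_m2, p5, p50, p95, var_over_mean])
```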
4 Observations and Results
The goal of our experimental feasibility study was to predict the similarity of a new, unprinted paper to existing papers for which the parameters after print were known.
Fig. 5. Relation of dotgain to specific volume and grammage. These plots show data from two printing experiments: on the left, offset print on the papers from Fig. 3 as a cross section through paper grades at an equal level of ink provision; on the right, two paper grades printed with a laser printer to explore dotgain along the grammage axis from 90 to 170 g/m².
Fig. 6. Histogram analysis. The left plot shows histogram percentiles of 32 transmission images (Fig. 1) transformed from sRGB into a CIE Y grayscale image (horizontally the 5%, 50%, 95% percentiles, vertically at the location of the peak frequency). On the right, the ratio of sample variance to sample mean is plotted against grammage, where the symbol stands for the classification according to ISO 12647-2.
The similarity was expressed in the dimensions of paper white and solid black color location, as well as the dotgain of a 40% single-color patch printed with an elliptic dot at 60 lines per cm.
Fig. 7. Spectral interpolation. We transformed 32 transmission images (Fig. 1) to CIE Y grayscale and took 32 random imax × jmax = 256 × 256 px samples out of a 1024 × 1024 px image. We interpolated with the discrete cosine transformation using Matlab dct2. The absolute values of the coefficients c ∈ {cij | i² + j² < ½ imax} were summed as weights. This sum of weights was normalized with c0 (the sample mean) and plotted against the dotgain of a grey patch (right).
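A possible reading of the procedure of Fig. 7 is sketched below, with scipy's dctn standing in for Matlab's dct2. The exact low-frequency cutoff could not be recovered from the caption, so it is exposed as a parameter; the sample count and size follow the caption.

```python
import numpy as np
from scipy.fft import dctn

def spectral_weight(y_image, n_samples=32, size=256, cutoff=128, seed=0):
    """Mean, over random sub-images, of the sum of low-frequency DCT coefficient
    magnitudes normalized by the DC coefficient c0 (cf. Fig. 7)."""
    rng = np.random.default_rng(seed)
    h, w = y_image.shape
    ii, jj = np.meshgrid(np.arange(size), np.arange(size), indexing='ij')
    low_freq = (ii ** 2 + jj ** 2 < cutoff ** 2) & ~((ii == 0) & (jj == 0))
    weights = []
    for _ in range(n_samples):
        r = rng.integers(0, h - size + 1)
        c = rng.integers(0, w - size + 1)
        coeff = dctn(y_image[r:r + size, c:c + size], norm='ortho')
        weights.append(np.abs(coeff[low_freq]).sum() / abs(coeff[0, 0]))
    return float(np.mean(weights))
```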
The categorisation of transmission images by human experts coincides with ISO 12647 in classifying papers with regard to coating. A finer differentiation with regard to dotgain seems achievable (Fig. 4). Classification by grammage, specific volume, histogram and spectral interpolation alone allowed a classification with regard to the dotgain of a single-color wedge within 4% tone value (Figs. 5, 6, 7). The b* value of paper white does not correlate with coating, dotgain or solid black color location. The initial differences in b* shown in Fig. 3 do not completely vanish along the Cyan and Magenta ramps. Most uncoated paper grades we tested do not comply with the ISO norm in the b* value, but show coherent results with regard to dotgain and density. Statistical analysis showed correlations > 0.8 of the presented features with dotgain and density, with the exception of grammage, since we arbitrarily included three paper grades with higher or lower grammage, respectively.
5 Conclusions
We presented an attempt to predict the similarity of papers with regard to dotgain and solid tone characteristics by analysis of images and data acquired with non-destructive methods. These data are measurable prior to print with equipment used in the printing industry. Although the methods presented are non-destructive, they do require prints on a training set of papers in order to find a prediction for a new paper. Fortunately enough, training sets are what printing companies produce when controlling the quality of their print jobs. While classification with regard to the dotgain of primary color wedges is achievable with the methods presented, the prediction of similarity with regard to solid ink color still has to be improved. This will be the subject of further study.
Acknowledgements. This work was partly financed by the Swiss Commission of Technology and Innovation (CTI) and the Swiss Centre of Competence for Media and Printing Technology (Ugra) in project 8804.1 PFES-ES.
References
1. Hunt, R.: The Reproduction of Colour, 5th edn., pp. 684–689. Fountain Press, Kingston-upon-Thames (1995)
2. Downey, A., Hengemihle, F.: The TAPPI Standard Paper Materials Collection of the Library of Congress, Library of Congress Preservation and Conservation Microscopy Laboratory (2006)
3. Neugebauer, H.: Die theoretischen Grundlagen des Mehrfarbendruckes. Z. Wiss. Photogr. 36, 73–89 (1937)
4. Kubelka, P., Munk, F.: Ein Beitrag zur Optik der Farbanstriche. In: Zeitschrift für technische Physik, Leipzig, pp. 593–601 (1931)
5. Chandrasekhar, S.: Radiative Transfer. Dover Publications Inc., New York (1960)
6. Yang, L., Miklavcic, S.: Revised Kubelka Munk theory III. A general theory of light propagation in scattering and absorptive media. JOSA 22(9), 1866–1872 (2005)
7. Jenny, P., Mourad, S., Stamm, T., Vöge, M., Simon, K.: Computing Light Statistics in Heterogeneous Media Based on a Mass Weighted Probability Density Function (PDF) Method. JOSA 24(8), 2206–2219 (2007)
8. Mourad, S.: Improved Calibration of Optical Characteristics of Paper by an Adapted Paper-MTF Model. Journal of Imaging Science and Technology 51, 283–291 (2007)
9. Možina, M., Černič, M., Demšar, A.: Non-destructive methods for chemical, optical, colorimetric and typographic characterisation of a reprint. Journal of Cultural Heritage 8, 339–349 (2007)
A Colorimetric Study of Spatial Uniformity in Projection Displays
Jean-Baptiste Thomas¹,² and Arne Magnus Bakke¹
¹ Gjøvik University College, The Norwegian Color Research Laboratory
² Université de Bourgogne, Laboratoire d'Electronique Informatique et Image
Abstract. In this paper we investigate the spatial color uniformity of projectors. A common assumption is that only the luminance varies along the spatial dimension. We show that the chromaticity plays a significant role in the spatial color shift and should not be disregarded, depending on the application. We base our conclusions on measurements obtained from three projectors. Two methods are used to analyze the data: a conventional approach, and a new one which considers 3D gamut differences. The results show that the color gamut difference between two spatial coordinates within the same display can be larger than the difference observed between two projectors.
1 Introduction
Color spatial uniformity for projection displays has been studied [1][2]. However, it is often considered that only the luminance is of importance, and in most applications only this aspect is corrected for. The chromaticity shift is often considered negligible. Moreover, the analysis of the color shift along the spatial dimension is mainly supported by either incomplete or qualitative results. This work presents a quantitative analysis of projector spatial non-uniformity. We base our study on two aspects. First, a conventional 2D approach is presented, which considers the analysis of a projected full-intensity patch. We then use a global comparison of the gamuts at different spatial locations to evaluate the color non-uniformity. We introduce the context and the reasons which have motivated this work in the next section. We then define our experiments and present the results we obtained. We finally discuss the influence of these results on different applications before we give our conclusions.
2 Background and Motivation
A projection system displaying an image on a screen shows some color spatial non-uniformities. These non-uniformities can come from the system properties,
such as lens alignment, but also simply from the position of the projection system relative to the screen. Since CRT displays started being analyzed, it has been widely considered that only the luminance changes along the spatial dimension [3]. This is still the assumption made by many when modelling newer displays, maintaining that the chromaticity shift is negligible compared with the change of luminance. In this paper, we demonstrate that the chromaticity shift cannot be disregarded, especially for some of today's projection system applications, such as tiled projection systems, and for color research and experiments linked with the human visual system. Much work has been done in order to characterize the color of projection displays. However, a lot of the assumptions used in this case are borrowed from the characterization of CRT monitors. These assumptions have been shown to give a reasonable approximation of the real behavior of that kind of display [4][5], but they show their limits with projection displays. Some of these assumptions are already known to be incorrect, such as the gamma shape of the response curve, which does not hold for LCD systems. Despite the studies or tentative works evaluating the color shift along the spatial dimension [1][2][6], it is still common to consider that the color varies only in luminance along the spatial dimension of a display. It is then common in many proposed correction algorithms to only consider a luminance attenuation map, such as in [7] for CRT monitors and in [8] for projector or multi-projector system corrections. In their study of multi-projector systems, Majumder et al. asserted that the spatial chromaticity shift is negligible compared to the luminance shift. However, looking at the figures presented in [9], the gamut shows a severe shift, which at first seems comparable to the difference observed from one display to a completely different one. While Majumder et al. looked at the projector gamuts in chromaticity diagrams, Bakke et al. [10] recently proposed a method for computing the difference between two gamuts in a 3-D color space. They suggested that a method using discretized representations of the gamuts can be used to compute the relative gamut mismatch between two gamut boundaries. First, a binary voxel structure is created for each gamut. The value of each position is determined using the following method: if the position is within the gamut, the value is set to one, otherwise it is set to zero. Determining the difference between two gamuts can then be simplified to counting the positions where the values of the two gamut representations differ, and multiplying this count by the volume of the cube represented by a single discretized position. The resulting number can be divided by the volume of the reference gamut, giving the relative gamut mismatch (see the sketch below).
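Written as code, the voxel comparison described above amounts to counting differing cells; the sketch below assumes the two gamuts have already been discretized into boolean occupancy grids over the same CIELAB bounding box, with voxel_volume the volume of one cell.

```python
import numpy as np

def gamut_mismatch(gamut, gamut_ref, voxel_volume):
    """Absolute and relative gamut mismatch between two boolean voxel grids
    (True means the voxel lies inside the gamut boundary)."""
    assert gamut.shape == gamut_ref.shape
    differing = np.count_nonzero(gamut != gamut_ref)            # inside exactly one gamut
    mismatch_volume = differing * voxel_volume
    reference_volume = np.count_nonzero(gamut_ref) * voxel_volume
    return mismatch_volume, mismatch_volume / reference_volume
```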
3 Experimental Setup
We performed our investigation on three displays, two LCD projectors from the same model and manufacturer (Sony VPL-AW 15), and one DLP projector
(Projection Design Action One). They are named LCD1, LCD2 and DLP in the following. All the displays were used with the default settings. In order to have accurate measurements, we used the CS-1000 spectroradiometer from Minolta. The measurements were done in a dark surrounding, so that no light is involved except that from the display. A warming-up time of at least one hour and fifteen minutes was observed before any measurement to reach a correct temporal stability. The geometry of the whole system was basically of the same type as the one used in [2]. In our first experiment, we used the same kind of approach as is described in the IEC draft [6] and in the work of Kwak and MacDonald [2]. We measured only a full-intensity white image (RGB = [255,255,255]) at 5 × 5 locations regularly spread over the display, having positioned the measurement device in approximately the same position as described in [2]. In addition to this approach, we are interested in looking at the differences in the gamut volume of the projector. We chose to limit the measurement process to 9 spatial positions among the set of 25, because of the time needed to complete the measurements. Bakke [10] showed that the gamut boundary descriptor algorithm suggested by Balasubramanian and Dalal [11] performs well on most data sets. We have therefore used the modified convex hull with a preprocessing step using a γ of 0.2 to compute the gamuts. In order to perform the gamut evaluation, we used the ICC3D framework [12]. The evaluation is performed in the CIELAB color space. We encountered a challenging issue in using this space. In past studies known to us, since the luminance was supposed to be at its highest value in the center of the display and since the observer was supposed to look at the center first, the measurement of a white patch at this position was used as the reference white. This follows the recommendation of the IEC draft [6]. However, considering the position of the display or the alignment of the lens, the point of highest luminance can be severely shifted from the center. That can happen for instance when the projector is made to be used in an office and to project the image on a wall for presentations, such as the DLP projector we tested. We therefore decided to use the brightest point of the displayed white image as the reference white. This choice has some advantages in our case. If we consider the geometry of the system and the lens alignment, choosing the reference white at the brightest point is more in accordance with the physical properties of the device. Since we base our experiment on colorimetry and do not attempt to take more human factors into consideration, we have chosen to use this as our reference white. In the following, we call the measurement of the brightest white the global reference white, and the white measured at the different locations the local reference white.
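Since the choice of reference white is central here, the sketch below converts a measured XYZ triple to CIELAB relative to an arbitrary white; passing the XYZ of the brightest position gives values relative to the global reference white, passing the white measured at the same location gives values relative to the local reference white. The formulas are the standard CIE ones; the function name and interface are assumptions.

```python
import numpy as np

def xyz_to_lab(xyz, white_xyz):
    """CIE 1976 L*a*b* of a measured XYZ triple relative to a chosen reference white."""
    t = np.asarray(xyz, dtype=float) / np.asarray(white_xyz, dtype=float)
    eps, kappa = 216.0 / 24389.0, 24389.0 / 27.0
    f = np.where(t > eps, np.cbrt(t), (kappa * t + 16.0) / 116.0)
    L = 116.0 * f[1] - 16.0
    a = 500.0 * (f[0] - f[1])
    b = 200.0 * (f[1] - f[2])
    return np.array([L, a, b])
```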
4 Results
In this section we present and discuss the results we obtained, first with the conventional evaluation, then with the 3D gamut comparison approach.
(a) Lightness, chroma and hue shift for display LCD1 60
(b) Lightness, chroma and hue shift for display LCD2
(c) Lightness, chroma and hue shift for display DLP
Fig. 1. Visualization of the color shift throughout the display. On the left, we show a visualization of the lightness shift. The maximum lightness is 100 (white), the minimum (black) is around 79. On the right, the hue and chroma shifts are plotted relative to their spatial position. The position of the circles is the reference, the crosses indicate the measured value. The angle of the segment represents the hue shift, and the norm the chroma shift in the (a*, b*) plane.
Table 1. Relative shift in lightness and chroma at 25 locations for the three tested displays

LCD1, shift in lightness ΔL*:          LCD1, shift in chroma ΔC*:
        1      2      3      4      5          1      2      3      4      5
1    -8.92  -4.85  -1.61  -1.60  -5.55      5.09   2.46   2.29   1.99   2.49
2    -7.66  -3.72  -0.37  -0.36  -5.55      4.68   1.78   1.36   1.81   1.97
3    -6.42  -4.09   0.00  -0.58  -3.74      3.53   0.87   0.00   1.65   1.56
4    -9.29  -4.77  -1.29  -1.91  -2.81      2.37   0.40   1.39   1.80   2.31
5   -11.27  -7.02  -3.78  -4.64  -5.84      3.16   3.41   4.73   3.77   1.91

LCD2, shift in lightness ΔL*:          LCD2, shift in chroma ΔC*:
        1      2      3      4      5          1      2      3      4      5
1    -6.49  -3.43  -1.14  -1.53  -6.09      4.13   3.09   1.26   1.63   2.03
2    -6.63  -2.93   0.00  -0.90  -5.96      3.17   2.68   0.00   0.92   1.38
3    -6.90  -2.85  -0.11  -2.00  -4.78      1.67   0.24   1.97   0.66   1.35
4    -5.71  -4.68  -1.94  -3.79  -5.89      1.60   2.32   4.44   2.77   0.78
5    -7.59  -6.75  -4.82  -6.09  -9.66      3.18   6.03   5.25   4.18   2.76

DLP, shift in lightness ΔL*:           DLP, shift in chroma ΔC*:
        1      2      3      4      5          1      2      3      4      5
1   -20.88 -16.72 -13.84 -14.40 -18.14      5.97   5.37   5.47   5.47   5.92
2   -20.90 -14.79 -11.49 -11.83 -16.80      5.68   4.85   4.65   4.44   5.40
3   -19.39 -11.46  -6.63  -9.29 -15.60      4.94   3.62   2.81   3.56   4.81
4   -18.06  -8.61  -1.68  -4.87 -12.63      3.53   1.70   0.92   2.41   4.09
5   -17.77  -7.62   0.00  -1.21 -11.58      3.01   0.31   0.00   2.18   3.85

4.1 Conventional Evaluation
By displaying the white patch and measuring the projected color at each position, we get an overview of the global behavior of the display. The left part of Figure 1 shows the lightness shift along the spatial dimension. This visualization is based on the measurements at 25 locations. The white surround comes from the fact that we have no information on this part of the displayed area, while we can interpolate the data inside this rectangle. We can see that the brightest point is not necessarily in the center of the screen. The color shift is illustrated in the right part of this figure. We can see the same effect as the one described in [2], a shift in the color around the center of the lens projected onto the screen (i.e., the brightest point). The LCD projectors show a shift from green/cyan to blue/red as a general behavior from the top left corner to the bottom right. The DLP shows a shift towards blue from the top to the bottom. The causes of this shift can be found in the literature [13].
Table 2. Relative shift in CIELAB units at 25 locations for the three tested displays

LCD1, ΔE*ab:
        1      2      3      4      5
1     9.26   5.24   2.80   2.93   7.53
2     7.89   4.14   1.41   1.81   7.26
3     6.61   4.41   0.00   1.05   5.14
4     9.57   5.10   1.89   1.95   3.68
5    11.64   7.97   6.05   5.75   6.64

LCD2, ΔE*ab:
        1      2      3      4      5
1     6.80   3.80   1.70   3.45   7.36
2     6.77   3.07   0.00   2.82   6.75
3     7.03   2.92   1.97   2.02   5.07
4     5.76   5.44   4.84   4.44   6.11
5     8.07   7.94   7.13   8.57  10.17

DLP, ΔE*ab:
        1      2      3      4      5
1    21.71  17.60  14.88  15.36  19.09
2    21.58  15.45  12.40  12.78  17.73
3    19.97  12.00   7.20   9.97  16.36
4    18.52   8.94   1.92   5.16  13.11
5    18.18   7.92   0.00   1.25  11.97
The results of the quantitative analysis are presented in Tables 1 and 2. The first shows the ΔL* and ΔC* relative to the brightest point. The second shows the ΔE*ab. The largest ΔE*ab observed are 11.64, 10.17 and 21.71 for LCD1, LCD2 and DLP respectively. The differences are definitely above the noticeable difference from a colorimetric point of view. For the LCDs, we noticed a maximum lightness shift of 11.27 units in the bottom left corner for LCD1, and of 9.66 units in the bottom right corner for LCD2. The corresponding chroma shifts are 3.16 and 2.76, respectively. The maximum chroma shifts for these displays are 5.09 in the upper left corner for LCD1 and 6.03 at the bottom left for LCD2, with associated lightness shifts of 8.92 and 6.75. The DLP projector shows a maximum lightness shift of 20.90 units in the upper left part of the displayed area, and 5.68 units in chroma at the same position. The maximum chroma shift is 5.97 units in the upper left corner, for 20.88 units in lightness. In some locations we can clearly see that the lightness variation is smaller than or equivalent to the chromaticity shift, such as below the center for LCD2, which shows a ΔL* of 1.94 and a ΔC* of 4.44 compared to the reference location.
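For completeness, the quantities tabulated above follow directly from the CIELAB coordinates of a measurement and of the reference position; a minimal sketch (names and argument order are assumptions):

```python
import numpy as np

def shifts(lab, lab_ref):
    """Lightness, chroma and total color differences between a measurement and the reference."""
    L, a, b = lab
    Lr, ar, br = lab_ref
    delta_L = L - Lr                                              # lightness shift (Table 1)
    delta_C = np.hypot(a, b) - np.hypot(ar, br)                   # chroma shift (Table 1)
    delta_E = np.sqrt((L - Lr)**2 + (a - ar)**2 + (b - br)**2)    # CIELAB difference (Table 2)
    return delta_L, delta_C, delta_E
```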
Table 3. Relative gamut mismatch (%) for each position compared with the gamut of the position with the highest luminance. The gamuts are calculated using the global white point as well as the local white point for each of the 9 selected locations.

LCD1, global white point:         LCD1, local white point:
        1      3      5                 1      3      5
1    27.23   4.90  17.08           1   9.57   3.30   5.72
3    23.92   0.00  16.15           3   7.49   0.00   5.53
5    32.66   9.48  13.50           5   7.90   2.07   4.09

LCD2, global white point:         LCD2, local white point:
        1      3      5                 1      3      5
1    24.84   5.83  19.75           1   9.42   2.48   4.46
3    20.18   0.00  18.79           3   6.00   0.00   2.40
5    29.75  11.01  20.82           5   5.98   1.98   2.48

DLP, global white point:          DLP, local white point:
        1      3      5                 1      3      5
1    52.36  38.02  41.06           1   8.51   6.86   6.91
3    47.73  18.29  36.28           3   7.96   3.92   6.38
5    43.22   0.00  26.93           5   6.62   0.00   4.87
When we consider the hue shift, which is shown in Figure 1 on the right, the chromaticity difference from one spatial coordinate to another can easily be larger than the lightness shift, and the hypothesis which considers the color shift as negligible can be disputed.

4.2 3D Gamut Evaluation
The reference gamut for each projector was constructed from the measurement data of the position with the highest luminance value. Table 3 contains the percentage of gamut mismatch for each position compared with this reference. As we can see, the gamut at some locations can be as much as 52% smaller than the reference, which is illustrated in Figures 2a and 2c. The luminance shift is responsible for a large part of this difference, but compensating for the luminance shift by using the local white point for calculating CIELAB values still leaves a significant maximum gamut mismatch of 8.51%, 9.42% and 9.57% for the three projectors. Figures 2b and 2d show the gamuts computed using the local white point.
(a) LCD1, global white point
(b) LCD1, local white point
(c) DLP, global white point
(d) DLP, local white point
Fig. 2. The gamut boundaries for two of the projectors at the position with the highest luminance (wireframe) compared with the gamut of the top left corner (solid and wireframe). CIELAB measurement values were computed relative to the global white point for (a) and (c), while (b) and (d) utilize the white point of each location.
This mismatch is comparable in relative volume to the error introduced when using a strictly convex hull to represent the gamut of an arbitrarily chosen device, and is greater than many inter-device gamut differences. In our experiment, the gamut mismatch between the two LCD projectors (at the reference position) is 2.75%, giving an intra-device difference 3.43 times larger than the inter-device difference. The DLP shows large differences in gamut depending on the spatial location, similar to what we showed in our analysis of lightness. Compared with the two LCDs, a larger part of the differences can be explained by the luminance shift. The remaining gamut mismatch volume mainly consists of the volume that is contained within the reference and is not a part of the gamut of the other spatial locations, which is illustrated in Figure 3. This means that there are effects in addition to the luminance shift which contribute to the reduction of the gamuts.
(a) LCD1
(b) DLP
Fig. 3. While using the white point of each location reduces the difference between the gamuts by compensating for the luminance shift, we still see some difference between the gamuts
5 Discussion
Based on our analysis of these results, there appears to be sufficient evidence to claim that the chromaticity shift has to be taken into account in some cases. Some applications might not be affected, while others might suffer seriously from this fact. It appears important to us to compensate for this problem in at least two situations: when performing psychophysical experiments for color science purposes with a projector, and when tiling projectors together to build a multi-projector system. Related to the choice we made in our experiment of using the brightest white point as a reference, we found that the gamut of the position with the largest luminance results in the largest estimated gamut volume. It is then a logical choice to use this as the basis for the reference gamut. Considering the case of a multi-projector system, since the chroma shifts in two opposite hue directions from the center of the lens, the area around the overlapping edges will show two really different colors. Note that even if the computed chrominance shift is major (we observed ΔC* of about 6 from one position to another, and greater differences can be found between extreme positions), it is not certain that, considering the spatial content of an image, the chrominance shift will break the perceived uniformity. Similarly, the reduction in gamut volume of up to 52% when using the global white point does not appear to be indicative of the perceived color capability of the projectors. However, using the local white point seems to underestimate the real difference. This is supported by the conventional approach. When we look at the full-intensity white patch, the perceived difference does not seem to be as large as the measured one.
In order to make a model which fits our perceived color appearance, we need to consider more psychovisual features, such as the color adaptation at the local and at the global level, cognition and physiology.
6 Conclusion
We have shown that the measured chromaticity shift across a projected display is important, and that considering only the luminance as non-uniform can be a critical mistake in some applications. However, considering the image content, it is reasonable to think that the perceived uniformity would not be broken. Further experiments could be done in this direction to find out what can be considered as perceived spatial uniformity. As a straightforward continuation of this work, we think it could be of great interest to utilize spatial gamut mapping algorithms using a spatially varying gamut in multi-projector systems.
References 1. Seime, L., Hardeberg, J.Y.: Colorimetric characterization of LCD and DLP projection displays. Journal of the Society for Information Display 11(2), 349–358 (2003) 2. Kwak, Y., MacDonald, L.: Characterisation of a desktop LCD projector. Displays 21(5), 179–194 (2000) 3. Brainard, D.H., Pelli, D., Robson, T.: Display characterization, Encyclopedia of Imaging Science and Technology. Wiley, New-York (2002) 4. Cowan, W., Rowell, N.: On the gun independency and phosphor constancy of color video monitor. Color Research & Application 11, S34–S38 (1986) 5. Berns, R.S., Gorzynski, M.E., Motta, R.J.: CRT colorimetry. part II: Metrology. Color Research & Application 18(5), 315–325 (1993) 6. IEC:61966-6: Color measurement and management in multimedia systems and equipment, part 3: Equipment used for digital image projection, committee Draft (August 1998) 7. Brainard, D.H.: Calibration of a computer-controlled color monitor. Color Research & Application 14, 23–34 (1989) 8. Majumder, A., Stevens, R.: Lam: Luminance attenuation map for photometric uniformity in projection based displays. In: Proceedings of ACM Virtual Reality and Software Technology, pp. 147–154 (2002) 9. Majumder, A., Stevens, R.: Color nonuniformity in projection-based displays: Analysis and solutions. IEEE Transactions on Visualization and Computer Graphics 10(2), 177–188 (2004) 10. Bakke, A.M., Hardeberg, J.Y., Farup, I.: Evaluation of gamut boundary descriptors, 50–55 (2006) 11. Balasubramanian, R., Dalal, E.: A method for quantifying the color gamut of an output device. In: Color Imaging: Device-Independent Color, Color Hard Copy, and Graphic Arts II, January 1997, vol. 3018. SPIE, San Jose (1997) 12. Farup, I., Hardeberg, J.Y., Bakke, A.M., le Kopperud, S., Rindal, A.: Visualization and interactive manipulation of color gamuts, 250–255 (2002) 13. Matthew, M., Brennesholtz, S., Stupp, E.H.: Projection Displays, 2nd edn. John Wiley & Sons, Ltd., Chichester (2008)
Color Stereo Matching Cost Applied to CFA Images
Hachem Halawana, Ludovic Macaire, and François Cabestaing
LAGIS, USTL, Bât. P2, Cité Scientifique, 59655 Villeneuve d'Ascq, France
[email protected], {ludovic.macaire,francois.cabestaing}@univ-lille1.fr http://lagis-vi.univ-lille1.fr/
Abstract. Most color stereovision setups include single-sensor cameras which provide Color Filter Array (CFA) images. In those, a single color component is sampled at each pixel rather than the three required ones (R,G,B). We show that standard demosaicing techniques, used to interpolate missing components, are not well adapted when the resulting color pixels are matched for estimating image disparities. In order to avoid this problem while exploiting color information, we propose a new matching cost designed for dense stereovision based on pairs of CFA images. Keywords: Color stereovision, CFA image, Demosaicing.
1 Introduction
Dense stereo correspondence algorithms are based on measures of the similarity between image locations in a pair of stereo images. Typically, a matching cost is computed at each pixel of the left image for all the shifts in a predefined range, i.e. for a limited set of candidate pixels in the right image. Then, the candidate pixel minimizing the cost is retained and its position yields the disparity. Matching costs assume that homologous pixels have almost the same component values, but they cope with limited radiometric changes and/or with noise. Common window-based matching costs include the sum of absolute or squared differences (SAD/SSD), normalized cross-correlation (NCC), and census transform [6]. Chambon et al. have compared widely used stereo matching costs applied to gray level and color images [2]. They have shown that taking into account color information generally improves the performance of matching costs [8]. Color images can be acquired by two types of cameras: those including three sensors associated with beam splitters and color filters providing the so-called full-color images, and those including a single image sensor. Many recent digital cameras include a single-chip CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) sensor, to increase image size while reducing device cost. The surface of such a sensor is covered with an array of small spectrally selective filters, arranged in an alternating pattern, so that each photo-sensitive element samples only one of the three color components Red (R),
Green (G) or Blue (B). These single-sensor cameras actually provide a CFA image, where each pixel is characterized by a single color component. Figure 1 shows the Bayer CFA which is the most widely used one. To estimate the color vector (R G B)T at each pixel, one has to determine the levels of the two missing components in figure 1. This process is commonly referred to as CFA demosaicing, and yields a color demosaiced image where each pixel is characterized by an estimated color vector [1] [4] [7] .
odd line:  G11 R12 G13 R14 G15 R16
even line: B21 G22 B23 G24 B25 G26
odd line:  G31 R32 G33 R34 G35 R36
even line: B41 G42 B43 G44 B45 G46
Fig. 1. Bayer Color Filter Array image (only the indicated color component is available at each pixel; the two other components are missing)
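The Bayer arrangement of Fig. 1 can be expressed as a small helper that tells which component is available at a pixel, plus a routine that simulates a CFA image from a full-color one (as done later for the Tsukuba pair). The 1-based line/column convention follows Fig. 1; everything else is an illustrative assumption.

```python
import numpy as np

def bayer_component(line, column):
    """Colour component available at (line, column) in the Bayer CFA of Fig. 1 (1-based)."""
    if line % 2 == 1:                       # odd line: G R G R ...
        return 'G' if column % 2 == 1 else 'R'
    return 'B' if column % 2 == 1 else 'G'  # even line: B G B G ...

def simulate_cfa(rgb):
    """Keep, at each pixel of a full-colour image, only the Bayer component (others set to 0)."""
    cfa = np.zeros_like(rgb)
    channel = {'R': 0, 'G': 1, 'B': 2}
    for i in range(rgb.shape[0]):
        for j in range(rgb.shape[1]):
            c = channel[bayer_component(i + 1, j + 1)]
            cfa[i, j, c] = rgb[i, j, c]
    return cfa
```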
Since the demosaicing methods intend to produce “perceptually satisfying” demosaiced images, they attempt to reduce the presence of color artifacts, such as false colors or zipper effects, by filtering the images [10]. So, useful color texture information may be erased in the color demosaiced images. However, to match homologous pixels, window-based stereo matching costs need as much local texture information as possible. Thus, the quality of stereo matching on color demosaiced image pairs may suffer either from color artifacts or from the removal of color texture caused by demosaicing schemes. In order to avoid this problem while exploiting color, we propose a new matching cost designed for stereovision based on CFA images. In section 2, we briefly introduce dense color stereovision. In the third section of this paper, we present the problems inherent in CFA stereovision. In the fourth section, we detail our proposed matching cost function which is specifically designed to compare levels of pixels in CFA images. Experimental results on synthetic and real color stereo images are provided in the last section, in order to show the effectiveness of our proposed matching cost applied to pairs of CFA images.
2 Dense Color Stereovision
Stereovision schemes aim at computing a three-dimensional representation of a scene observed by two cameras. Stereo correspondence of homologous pixels, i.e. pixels in the left and right images onto which the same physical point of the scene is projected, allows for 3D reconstruction. One of the key points of stereovision is to find these homologous pixels through stereo matching [2]. Sparse
stereovision matching techniques match only pixels marked on salient image features, such as lines or corners. Their performance depends on the quality of the primitive detection stage [3]. On the other hand, dense stereovision matching techniques search for the homologous right pixel of every left pixel. When the geometry of the stereovision setup is precisely adjusted, as in the Bumblebee system (available at http://www.ptgrey.com/products/bumblebee2/index.asp), epipolar lines correspond to horizontal lines in the images and homologous pixels have the same vertical coordinate. Let us consider a pixel in a left image, called left pixel and denoted PL, with spatial coordinates (xL yL)T. The spatial coordinates of its homologous right pixel PR, in the line at the same vertical position of the right image, are (xR yL)T (see figure 2). The disparity d, estimated at the left pixel PL, is expressed as:

d(PL) = xL − xR .    (1)
Fig. 2. Disparity between two homologous pixels
The objective of the dense stereovision scheme is to estimate the disparity at each left pixel in order to produce the disparity map from which it is possible to reconstruct the 3D scene. For this purpose, it measures local similarity between the levels of the neighbors of the considered left pixel and the levels of the neighbors of each candidate right pixel thanks to correlation scores. The sum of the squared differences (SSD) between colors of neighboring pixels is one of the most widely used matching cost functions. The SSD score between the left pixel PL with spatial coordinates (xL yL )T and a candidate pixel in the right image, with the s-shifted spatial coordinates (xL − s, yL )T , is expressed as:
SSD(xL, yL, s) = 1/(2w + 1)² · Σ_{i=−w..w} Σ_{j=−w..w} ‖C(PL(xL + i, yL + j)) − C(PR(xL + i − s, yL + j))‖² ,    (2)

where C(P) is the color vector (R G B)T of a pixel P, s is the spatial shift along the horizontal epipolar line, and w the half-width of a (2w + 1) × (2w + 1) correlation window. SSD scores computed for different right candidates, i.e. for different shifts s, are then compared. With respect to the winner-take-all (WTA) method, the candidate pixel yielding the lowest SSD score is matched to the considered left pixel and the estimated disparity d̂(PL) is given by:

d̂(PL) = arg min_s SSD(xL, yL, s) .    (3)
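Equations (2) and (3) translate directly into a brute-force block-matching loop; the sketch below assumes rectified floating-point color images and searches disparities from 0 to max_disparity. It is meant to illustrate the cost, not to be an efficient implementation.

```python
import numpy as np

def ssd_disparity(left, right, w=2, max_disparity=16):
    """Winner-take-all disparity from the colour SSD of eq. (2)-(3).

    `left` and `right` are rectified float images of shape (rows, cols, 3)."""
    rows, cols, _ = left.shape
    disparity = np.zeros((rows, cols), dtype=int)
    for y in range(w, rows - w):
        for x in range(w + max_disparity, cols - w):
            win_l = left[y - w:y + w + 1, x - w:x + w + 1]
            best_cost, best_s = np.inf, 0
            for s in range(max_disparity + 1):
                win_r = right[y - w:y + w + 1, x - s - w:x - s + w + 1]
                cost = np.sum((win_l - win_r) ** 2) / (2 * w + 1) ** 2
                best_cost, best_s = min((best_cost, best_s), (cost, s))
            disparity[y, x] = best_s
    return disparity
```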
In order to show the limits reached by applying SSD-based matching to a pair of color demosaiced images, we propose to consider the benchmark Tsukuba color stereo images [9], whose ground truth disparity map is available (figure 3(a) shows the left image of the Tsukuba pair). For comparing SSD-based performance on full and demosaiced color images, we compute artificial left and right CFA images by removing two color components of each pixel according to the Bayer CFA (see figure 1).
Fig. 3. Tsukuba left full color image (a), left demosaiced color image (b), zoom on full color image (c), and zoom on demosaiced color image (d)
Fig. 4. Percentage of pixels correctly matched by SSD applied to Tsukuba color images of figure 3
Then, the two missing color components are estimated by Hamilton's method [5] to produce demosaiced images. In those, each pixel P is characterized by an estimated three-dimensional color vector denoted C̃(P). Hamilton's method has been selected since it reaches a good compromise between demosaicing quality and processing time [10]. The original left color image (fig. 3(a)) and the demosaiced one (fig. 3(b)) look very similar. However, zooming on textured areas (fig. 3(c) and fig. 3(d)) shows that these two images are locally very different: here, false colors have appeared in the demosaiced image. We match all the pixels by computing color SSD scores on these two pairs of color stereo images. By comparing the estimated disparity d̂(PL) and the ground truth disparity d(PL), we can estimate the percentage of correctly matched pixels, i.e. pixels for which the difference between the estimated and the ground truth disparities is lower than or equal to one pixel. Figure 4 shows the percentage obtained by SSD applied to the full and demosaiced color image pairs, respectively, with respect to different correlation window half-widths w. It arises that, whatever the window width, the percentage of correctly matched pixels with demosaiced images is lower than with full color images. We notice that the difference of correctly matched pixel percentages ranges between 5% and 2%, although the demosaiced color image of Tsukuba seems visually identical to the full color one. When w ranges between 2 and 5, the assumption of constant disparity inside the correlation window is verified. The difference between the rates obtained with full and demosaiced images is mainly due to the error in the estimation of the missing color components. That explains why the percentage of correctly matched pixels increases with respect to w. However, when w is higher than 5,
the assumption of constant disparity is no longer verified and the percentage of correctly matched pixels decreases. This is the reason why the difference between the rates obtained with full and demosaiced images decreases when w is higher than 5. These experimental results demonstrate that the demosaicing step degrades the quality of stereo matching. That leads us to propose an SSD score specifically designed for CFA images.
3 CFA Stereovision
From a pair of stereo CFA images acquired by two single-sensor cameras, we aim at calculating the disparity map. We have found no algorithm in the literature that computes the disparity map by directly processing two CFA images. We propose to adapt the matching cost function designed for color images (eq. 3) in order to take into consideration the specific properties of CFA images. The main problem with CFA stereovision is that the available color components of homologous pixels in the left and the right images may be different. For example, let us examine figure 5, which shows a situation in which a physical point P in space is projected onto a green pixel PL in the left CFA image and onto a red pixel PR in the right CFA image. A green (resp. red) pixel in a CFA image is characterized by only the green (resp. red) color component. Therefore, one cannot assume that the green level of the left pixel is equal to the red level of its homologous pixel in the right CFA image. This example shows that the SSD score cannot be computed directly from CFA values, since the assumption that the CFA levels of homologous pixels are similar is not met for odd disparities.
Fig. 5. Problem of matching CFA images
More precisely, the integer disparity value for each left pixel can be:
– Even: the right homologous pixel is characterized by the same available color component in the right CFA image. The assumption about similarity of levels is met.
– Odd: the right homologous pixel is characterized by another available color component in the right CFA image. The assumption about similarity of levels is not met.
Whatever the parity of the actual disparity d(PL), the adapted SSD score must reach a minimum when s is equal to the actual disparity d, so that the estimated disparity d̂(PL) is equal to the actual one d(PL).
4 Partial Demosaicing for CFA Stereovision
We assume that the matching errors of standard SSD applied to demosaiced color images are mainly caused by the error of estimation of the two missing color components at each pixel. We propose to reduce the matching error by estimating only one missing color component at each pixel. The Bayer pattern is designed so that the pixels in one line of the CFA image are characterized by one among two possible color components. Figure 1 shows that pixels of odd (resp. even) lines are characterized by red or green levels (resp. blue or green levels). So, the modified SSD score can consider only two color components, the available one and an estimated one, rather than three. For matching purposes, we propose to estimate only the missing color component at each pixel (called hereafter second component) that exists in the same line, as shown in figure 6. We estimate only the missing red or green level of each pixel located on an odd line, and only the missing blue or green level of each pixel located on an even line. Therefore, each pixel of an odd (resp. even) line is characterized only by its red (resp. blue) and green levels in the so-called partially demosaiced image. Each pixel in the partially demosaiced image is characterized by a two-dimensional partial color vector denoted C̃P(P). The single missing color component is also estimated by Hamilton's approach [5].
odd line:  G11 R̃11  R12 G̃12  G13 R̃13  R14 G̃14  G15 R̃15  R16 G̃16   (blue component not estimated in the odd line)
even line: B21 G̃21  G22 B̃22  B23 G̃23  G24 B̃24  B25 G̃25  G26 B̃26   (red component not estimated in the even line)
Fig. 6. Partially demosaiced image
The SSD score described by eq. (3) is modified so that it can be applied to partially demosaiced images:

SSDP(xL, yL, s) = 1/(2w + 1)² · Σ_{i=−w..w} Σ_{j=−w..w} ‖C̃P(PL(xL + i, yL + j)) − C̃P(PR(xL + i − s, yL + j))‖² .    (4)
Since the pixels of horizontal lines with the same parity in the left and right partially demosaiced images are characterized by the same two color components, we can reasonably assume that the partial color vectors of two homologous pixels are similar. Since our partial SSDP score compares the partial color vectors of left and right pixels located on the same horizontal lines, we assume that SSDP reaches a minimum when the shift s is equal to the actual disparity.
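A sketch of the partial demosaicing and of the SSDP cost of eq. (4) is given below. For brevity the missing second component is estimated by averaging the two horizontal neighbours instead of Hamilton's interpolation, and the two-channel layout (green first, red-or-blue second) is an assumption; the Bayer phase follows Fig. 1.

```python
import numpy as np

def partial_demosaic(cfa, first_line_is_odd=True):
    """Two-channel partially demosaiced image from a Bayer CFA image (2-D array of levels).

    Channel 0: green; channel 1: red on odd lines, blue on even lines.  The missing
    second component of each pixel is estimated by averaging its two horizontal
    neighbours (a simple stand-in for Hamilton's interpolation)."""
    rows, cols = cfa.shape
    out = np.zeros((rows, cols, 2))
    padded = np.pad(cfa, ((0, 0), (1, 1)), mode='edge')
    est = 0.5 * (padded[:, :-2] + padded[:, 2:])      # horizontal neighbour average
    for i in range(rows):
        odd_line = (i % 2 == 0) == first_line_is_odd
        for j in range(cols):
            native_is_green = (j % 2 == 0) if odd_line else (j % 2 == 1)
            if native_is_green:
                out[i, j, 0] = cfa[i, j]   # green measured
                out[i, j, 1] = est[i, j]   # red (odd line) or blue (even line) estimated
            else:
                out[i, j, 0] = est[i, j]   # green estimated
                out[i, j, 1] = cfa[i, j]   # red or blue measured
    return out

def ssd_p(left_p, right_p, x, y, s, w=2):
    """Partial SSD cost of eq. (4) between PL(x, y) and the s-shifted right pixel."""
    win_l = left_p[y - w:y + w + 1, x - w:x + w + 1]
    win_r = right_p[y - w:y + w + 1, x - s - w:x - s + w + 1]
    return np.sum((win_l - win_r) ** 2) / (2 * w + 1) ** 2
```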
5
Experimental Results
In order to compare the quality of pixel matching, we first apply the partial SSDP score on the Tsukuba partially demosaiced images. Figure 7 shows the rates of correctly matched pixels obtained by analyzing the original full color images, the demosaiced images and the partially demosaiced images. We remark that our partial SSD score outperforms the standard SSD applied to demosaiced color images. Obviously, our method does not reach the matching quality obtained on full color images even if the difference between these two rates decreases when the size of the correlation window increases. Furthermore, the processing time needed for partial demosaicing and for computing the SSDP score is lower
Fig. 7. Percentage of correctly matched pixels by SSD applied to Tsukuba color images (full color and demosaiced) and SSDP applied to partially demosaiced images, plotted against the window half length w (w = 2 to 10; percentages range from about 80% to 88%).
Fig. 8. Percentage of exactly matched pixels by SSD applied to 'murs' color images (full color and demosaiced) and SSDP applied to partially demosaiced images, plotted against the window half length w (w = 2 to 10; percentages range from about 89% to 99%).
than that required by total demosaicing and SSD computation since only one color component is estimated for each pixel and since the partial SSDP takes into consideration two color components instead of three. We have explained in section 3 that the main problem of CFA stereovision is caused by odd disparities. In order to examine the behavior of our approach in this case, we use the photo-realistic image ‘murs’ designed by Bocquillon and available at http://www.irit.fr/~Benoit.Bocquillon/MYCVR/download.php. All the pixel locations of the left image are shifted by an odd value (nine pixels in our case) to produce the right image. The actual disparity between all pairs of homologous pixels of these two images is therefore equal to nine. Then, we measure the percentage of exactly matched pixels, i.e. pixels whose difference between estimated and ground truth disparity is zero, for the full color, demosaiced and partially demosaiced images respectively. Figure 8 shows that the exactly matched pixel rate obtained by considering the full color images is close to 100%. We also notice that the difference between exactly matched percentages obtained by analyzing demosaiced and partially demosaiced images ranges from 0% to 4%. These results demonstrate that the partial demosaicing improves the matching process, and is robust against the odd disparity problem.
6
Conclusion
In this paper, we have shown that, for images acquired by single-sensor color cameras, the demosaicing step can decrease the quality of pixel matching. We have proposed a modified SSD score, specifically designed to match pixels of stereo CFA images. We have experimentally shown that using partial demosaicing instead of total demosaicing improves the disparity estimation results. Moreover, the proposed method is faster than the classical one.
Future work will investigate why partial demosaicing improves the quality of matching, by analyzing not only the stereo matching results but also the color estimation errors. The second color component of partially demosaiced images is estimated with Hamilton's method, which uses a two-dimensional window for color estimation. In future work we will try to simplify this step by using a simple one-dimensional interpolation and study its influence on the quality of disparity estimation. It would also be interesting to study the performance of our matching applied only to the luminance information extracted from CFA images.
References

1. Battiato, S., Guarnera, M., Messina, G., Tomaselli, V.: Recent patents on color demosaicing. Recent Patents on Computer Science 1(3), 194–207 (2008)
2. Chambon, S., Crouzil, A.: Color stereo matching using correlation measures. In: SEE (ed.) Proceedings of the First International Conference on Complex Systems Intelligence and Modern Technological Applications, Cherbourg, France, September 2004, pp. 520–525 (2004)
3. Gouet, V., Montesinos, P., Pelé, D.: Stereo matching of color images using differential invariants. In: Proceedings of the 5th International Conference on Image Processing, ICIP 1998, Chicago, IL, USA, October 1998, vol. 2, pp. 152–156 (1998)
4. Gunturk, B.K., Glotzbach, J., Altunbasak, Y., Schafer, R.W., Mersereau, R.M.: Demosaicking: Color filter array interpolation. IEEE Signal Processing Magazine 22(1), 44–54 (2005)
5. Hamilton, J.F., Adams, J.E.: Adaptive color plan interpolation in single sensor color electronic camera. US Patent 5,629,734, to Eastman Kodak Co., Patent and Trademark Office, Washington, DC (May 1997)
6. Hirschmüller, H., Scharstein, D.: Evaluation of cost functions for stereo matching. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2007), Minneapolis, June 2007, pp. 1–8 (2007)
7. Lukac, R.: Single-Sensor Imaging: Methods and Applications for Digital Cameras. CRC Press, Boca Raton (2008)
8. Pinhasov, E., Shimkin, N., Zeevi, Y.: Optimal usage of color for disparity estimation in stereo vision. In: 13th European Signal Processing Conference (EUSIPCO 2005), Antalya, Turkey (September 2005)
9. Scharstein, D., Szeliski, R.: A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision 47, 7–42 (2002)
10. Yang, Y., Losson, O., Duvieubourg, L.: Quality evaluation of color demosaicing according to image resolution. In: Proceedings of the 3rd International Conference on Signal-Image Technology & Internet-based Systems (SITIS 2007), Shanghai Jiaotong University, China, December 2007, pp. 640–646 (2007)
JBIG for Printer Pipelines: A Compression Test

Daniele Ravì¹, Tony Meccio¹, Giuseppe Messina¹,², and Mirko Guarnera²
¹ Università degli studi di Catania, D.M.I., Catania, Italy
² STMicroelectronics, Advanced System Technologies, Catania, Italy
Abstract. This paper describes a compression test analysis of the JBIG standard algorithm. The aim of this work is to prove the effectiveness of this standard for images acquired through scanners and processed in a printer pipeline. The main issue of printer pipelines is the necessity to use a memory buffer to store scanned images for multiple prints. This work demonstrates that the buffer can be dimensioned for the medium compression case, resorting to multiple scans in the uncommon case of random patterns.
1
Introduction
In recent years there has been a growing demand for independent multifunctional printing and scanning devices. During standalone printing processes, there are cases where it is necessary to keep the whole image in memory, for example when it is requested to make multiple hard copies of a document acquired by the integrated scanner. While the first copy can be done "on the fly" during the scanning phase, the constraint to have identical multiple copies implies the storing of the whole scanned document. This can lead to excessive memory requirements, which is the reason why a compression method is normally used, especially for low-cost products, where it is important to embed as little physical memory as possible. Moreover, the compression must be lossless [1,2] to have identical copies of the same input media. The compression phase is normally placed just after the halftoning (see Fig. 1), because it already uses data with reduced bit planes [3]. A widely used algorithm to achieve this purpose is JBIG. 1.1
JBIG Compression Standard
JBIG is short for the 'Joint Bi-level Image Experts Group'. This was a group of experts nominated by national standards bodies and major companies to produce standards for bi-level image coding. JBIG developed IS 11544 (ITU-T T.82) for lossless compression of bi-level images [4]. It can also be used for coding grayscale and colour images with limited numbers of bits per pixel. It can be seen as a form of facsimile encoding, similar to Group 3 or Group 4 fax, offering between 20% and 80% improvement in compression over these methods (about 20:1 over the original uncompressed digital bit map).
Fig. 1. A possible printing pipeline
Basically, JBIG models the redundancy in the image as the correlation of the pixel currently being coded with a set of nearby pixels called the template. An example template might be the two pixels preceding the current one on the same line, and the five pixels centred above the current one on the previous line. Note that this choice only involves pixels that have already been seen from a scanner. The selected pixel is then arithmetically coded based on an eight-bit state so formed, so there are 256 possible contexts to be coded. The arithmetic coder and probability estimator for the contexts is IBM's (patented) Q-coder. The Q-coder uses low-precision, rapidly adaptable (the two are related) probability estimation combined with a multiply-less arithmetic coder. The probability estimation is closely tied to the interval calculations necessary for the arithmetic coding. To capture image structures, such as periodic halftone patterns, that a fixed template cannot model well, JBIG also uses adaptive templates. A description of the Q-coder as well as the prior version of JBIG can be found in the November 1988 issue of the IBM Journal of Research and Development [5,6,7,8]. JBIG can be used on both grey-scale and color images by simply applying the algorithm one bit-plane at a time. JBIG works well up to about six bits per pixel, beyond which JPEG's lossless mode works better. The Q-coder must also be used with JPEG to get this performance. Actually no lossless mode works well beyond six bits per pixel, since those low-order bits tend to be noise, which does not compress at all. In any case the actual intent of JBIG is to replace the less effective Group 3 and 4 fax algorithms. The work described in this paper aims to evaluate the performance of the JBIG algorithm through an analysis of critical (synthetic) cases. The tests will also consider some specific processing effects during the scanning phase; in particular, we focused on CIS [9] (Contact Image Sensor) scanning methodologies.
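To make the idea of context formation concrete, the sketch below builds a context index from the seven neighbouring pixels of the example template above (two to the left on the current line, five centred above on the previous line). This is a simplified, hypothetical illustration, not the actual JBIG/Q-coder implementation: with this example template the context has 2^7 = 128 states, whereas the standard's templates are larger and include adaptive pixels.

```python
def context_index(img, x, y):
    """Form a coding context for pixel (x, y) of a bi-level image from a
    simple 7-pixel causal template (illustrative only, not the JBIG standard
    template). Pixels outside the image are treated as white (0)."""
    h, w = len(img), len(img[0])

    def px(xx, yy):
        return int(img[yy][xx]) if 0 <= xx < w and 0 <= yy < h else 0

    template = [px(x - 2, y), px(x - 1, y),                      # current line
                px(x - 2, y - 1), px(x - 1, y - 1), px(x, y - 1),
                px(x + 1, y - 1), px(x + 2, y - 1)]              # previous line
    ctx = 0
    for bit in template:
        ctx = (ctx << 1) | bit
    return ctx  # index used to select the probability estimate for this pixel
```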
1.2 Contact Image Sensors
A Contact Image Sensor (CIS) is a type of optical flatbed scanner sensor that does not use the traditional CCD arrays, which rely on a system of mirrors and lenses to project the scanned image onto the arrays. CIS scanners gather
light from red, green and blue LEDs (which combine to create white light) and direct the light at the original document being scanned (see fig.2). The light that is reflected from the original is gathered by a lens and directed at an image sensor array that rests just under the document being scanned. The sensor then records the images according to the intensity of incident light. The sensors of the CIS systems cannot directly measure color hues. Instead, color is obtained as a linear combination of the three base colors (Red, Green and Blue). Each line of the image is illuminated with a light beam of one of the three colors, and reflected light is captured by the sensors which measure the intensity of the corresponding light component; the other two components are interpolated from adjacent lines, in a way similar to color demosaicing in single-sensor cameras.
Fig. 2. Contact Image Sensor schematic plan
The paper is organised as follows: in section 2 a JBIG compression experiment in a standalone system is described; section 3 shows the same experiments in a printing pipeline. Finally a conclusion is presented taking into account the analysis of the tests.
2
JBIG Compression Experiments in a Standalone System
To evaluate the performance of JBIG it is important to use a set of images whose patterns cannot be predicted via predetermined templates: randomly generated images are thus used. The synthetic image generation algorithm has been parameterized to adjust the probability of generating each color, so that images with different degrees of randomness can be produced. For black-and-white images (1 bit per pixel), the only parameter used is the probability of generating a black pixel, varying in the range [0.5 - 1] (the behaviour of the algorithm in the range [0 - 0.5] is specular). If the probability is p0 = 0.5, the output image is maximally random and least predictable, hence hardest to compress. On the other hand, if the probability is close to 1, the output image is more uniform, thus more predictable and easier to compress.
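A minimal sketch of such a parameterized random-pattern generator, together with the compression percentage used later as the evaluation index (eq. 2), could look as follows. This is illustrative code, not the authors' test harness, and the zlib codec is only a stand-in for a JBIG encoder.

```python
import numpy as np
import zlib

def random_bilevel(height, width, p_black=0.5, seed=0):
    """Random bi-level test pattern: each pixel is black with probability
    p_black (p_black = 0.5 gives the hardest, most random image)."""
    rng = np.random.default_rng(seed)
    return (rng.random((height, width)) < p_black).astype(np.uint8)

def compression_percentage(original_bytes, compressed_bytes):
    """Compression percentage of eq. (2): positive means the compressed
    stream is smaller than the original, negative means it is larger."""
    return (1.0 - len(compressed_bytes) / len(original_bytes)) * 100.0

img = random_bilevel(1000, 1000, p_black=0.5)
raw = np.packbits(img).tobytes()           # 1 bit per pixel, as in the tests
comp = zlib.compress(raw)                  # stand-in lossless codec
print(compression_percentage(raw, comp))   # close to (or below) 0% for p = 0.5
```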
JBIG for Printer Pipelines: A Compression Test
183
Fig. 3. Randomly generated patterns using (a) 1 bit per pixel with probability pi = 0.5 for white/black value, (b) 8 bits per pixel with probability pi = 0.001 for RGB values between 101 and 255 and pi = 0.0085 otherwise, (c) 8 bits per pixel with probability pi = 0.0039 for each value, and (d) 8 bits per pixel with probability pi = 0.001 for RGB values between 0 and 100 and pi = 0.0057 otherwise.
For images with n bits per pixel, there are 2^n possible colors, each with an associated probability of being randomly generated, given the constraint that the sum of all probabilities must be 1:

\sum_{i=1}^{2^n} p_i = 1   (1)
Fig. 3 shows some examples of randomly generated images, with different numbers of bit planes and different probabilities associated to each color. The index used to evaluate the experiments is the Compression Percentage, defined as:

\mathrm{CompPer} = \left(1 - \frac{\mathrm{Size}_{\mathrm{CompressedImage}}}{\mathrm{Size}_{\mathrm{OriginalImage}}}\right) \times 100   (2)

The main goal of the experiments is to demonstrate that there are cases where the compressed images are bigger than the original ones. In particular it will be
Fig. 4. Standalone test: compression percentage over randomly generated pattern images using 1, 2 and 8 bit planes. The size of the images varies from 100x100 to 5000x5000 pixels.
Fig. 5. Compression percentage vs. size of the input image, for black/white images
shown that for large enough images, in the worst case, the output images are 5% bigger than the original ones. This result is independent from the number of bits used and holds for images that are not too small; for very small images the performance is variable and depends on the initial guesses made by the JBIG arithmetic encoder. An example of compression percentage over randomly generated patterns is depicted in Fig. 4. Fig. 4 shows that, for large enough images, the compression percentage is always a negative number. This is explained by the lack of correlation between pixels in purely random (pi = 1/n for i in [1..n]) images, and is a well-known behavior of every lossless compression algorithm. Fig. 5 plots the compression percentage against the size of the input image, for bi-level images with different probabilities of a black pixel being generated.
3
JBIG Compression Experiments in a Printing Pipeline
During printing processes, there are cases where it is necessary to keep the whole image in memory, for example when it is requested to make multiple hard copies of a document acquired by the integrated scanner. This can lead to excessive memory requirements, which is why a compression method is adopted. Fig.1 shows a possible printing pipeline. Data received by the JBIG decoder is copied into a dedicated memory area, then decoded and sent to the subsequent pipeline step. As soon as the copy has been completely processed, the JBIG decoder can restart printing the document, which is stored in memory as compressed data. In a printing pipeline, data received by the final block, controlling printer heads, have very different characteristics than original data: they show much greater correlation and more repeated patterns, introduced (respectively) by the color interpolation of the CIS scanning system and by the halftoning algorithm. This means that the results shown in the previous section 2, where a stand-alone system is taken into consideration, are no longer valid for a printing pipeline (a
Fig. 6. A 5x5 Gaussian low-pass filter used to simulate the color interpolation of the CIS scanning system
Fig. 7. Difference between pure randomly generated image and blurred randomly generated image
Fig. 8. JBIG compression tests with/without low-pass filter and halftoning over different randomly generated images
random image is not a realistic input in this case). To evaluate the compression system in a printing pipeline, new experiments have been performed: randomly generated images are low-pass filtered (Fig.6), in order to simulate the acquisition
system, and then processed by the pipeline algorithms. Fig. 7 shows an example of how the image being processed is blurred compared to the original one. Results in Fig. 8 show how in such cases, even for random images, the JBIG compression percentage is above 20%. This result guarantees that no worst case is encountered: therefore, using the JBIG compression algorithm inside a printing pipeline always allows memory to be saved. Lastly, Fig. 9 shows the compression ratio for randomly generated images processed by low-pass filters with different strengths and apertures. It is easily seen how stronger low-pass filters generate greater correlation, which in turn translates to better compression ratios. Experiments performed on real use cases showed that JBIG obtained an average 46% compression percentage over a set of 100 scanned photographs and an average 60% compression percentage over a set of 100 text-based documents.
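The acquisition simulation described above could be reproduced along the following lines. This is a hedged sketch: the exact 5x5 kernel and the halftoning algorithm of the paper are not specified here, so a generic Gaussian blur and a random-threshold binarization stand in for them.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def simulate_pipeline_input(random_img, sigma=1.0, seed=0):
    """Blur a random test image with a Gaussian low-pass filter to mimic the
    correlation introduced by CIS colour interpolation, then binarize it as a
    crude stand-in for halftoning (illustrative only)."""
    blurred = gaussian_filter(random_img.astype(float), sigma=sigma)
    rng = np.random.default_rng(seed)
    # random-threshold binarization as a placeholder for the real halftoner
    return (blurred > rng.random(blurred.shape)).astype(np.uint8)
```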
Fig. 9. The compression ratio for randomly generated images processed by low-pass filters with different strengths and apertures
4
Conclusion
The JBIG encoder/decoder has been implemented into a real pipeline. In particular, to be embedded into the original pipeline, the following modifications were needed due to the constraints enforced by the pipeline steps:
– Memory dynamically allocated by each step must occupy a contiguous, well-delimited area, due to the sequential processing of the pipeline;
– Each step must run in a separate thread, to be able to perform real-time scan-print processing;
– Data transmission between steps must be performed via shared buffers, due to the architecture of the workflow;
– Lastly, the standard JBIG Split Plane function, which only works with 8 bits per pixel images, was rewritten into a new function which works with any number of bits per pixel.
The optimization of the memory buffer size used to store scanned images for multiple local copies is a critical point in printer pipelines. This work demonstrates that for compressed randomly generated images the memory necessary for this purpose is always greater than the original size (worst case: for A4 images at 600 ppi, 5% more). Thus for such cases it is necessary to use stripe processing and multiple scans (without buffering images). On the other hand, the statistics demonstrate that in the medium case the compression always performs well, and multiple prints can be achieved with a reasonable buffer size.
References

1. Denecker, K., de Neve, P.: A comparative study of lossless coding techniques for screened continuous-tone images. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, April 21-24, 1997, vol. 4, pp. 2941–2944 (1997)
2. Savakis, A.E.: Evaluation of lossless compression methods for gray scale document images. In: International Conference on Image Processing, September 10-13, vol. 1, pp. 136–139 (2000)
3. Yovanof, G.S.: Compression in a printer pipeline. In: Proceedings of the 29th Asilomar Conference on Signals, Systems and Computers, vol. 2, p. 219 (1995)
4. ITU-T T.82: Information technology – Coded representation of picture and audio information – Progressive Bi-Level Image compression (March 1993)
5. Pennebaker, W.B., Mitchell, J.L., Langdon, G.G., Arps, R.B.: An overview of the basic principles of the Q-coder adaptive binary arithmetic coder. IBM Journal of Research and Development 32(6), 717–726 (1988)
6. Mitchell, J.L., Pennebaker, W.B.: Software Implementations of the Q-Coder. IBM Journal of Research and Development 32(6), 753–774 (1988)
7. Pennebaker, W.B., Mitchell, J.L.: Probability Estimation for the Q-Coder. IBM Journal of Research and Development 32(6), 737–752 (1988)
8. Mitchell, J.L., Pennebaker, W.B.: Optimal Hardware and Software Arithmetic Coding Procedures for the Q-Coder. IBM Journal of Research and Development 32(6), 727–736 (1988)
9. Anderson, E.E., Wang, W.-L.: Novel contact image sensor (CIS) module for compact and lightweight full-page scanner applications. In: Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, May 1993, vol. 1901, pp. 173–181 (1993)
Synthesis of Facial Images with Foundation Make-Up

Motonori Doi¹, Rie Ohtsuki², Rie Hikima², Osamu Tanno², and Shoji Tominaga³
¹ Osaka Electro-Communication University, Osaka, Japan
² Kanebo COSMETICS INC., Japan
³ Chiba University, Chiba, Japan
Abstract. A method is described for synthesizing color images of whole human face with foundation make-up by using bare face image and the surface-spectral reflectance of the bare cheek. The synthesis of made-up facial images is based on the estimation of skin color with foundation make-up and the control of skin texture. First, the made-up skin color is estimated from the spectral reflectance of the bare cheek and the optical properties of the foundation. The spectral reflectance of made-up skin is calculated by the Kubelka-Munk theory. Second, smooth texture control is done by the intensity change of layers in the multi-resolution analysis with the Daubechies wavelet. Luster is enhanced and acnes are attenuated by the texture control. Experimental results show the accurate estimation of made-up skin color and the effective texture control. It is shown that the made-up face images are rendered with sufficient accuracy. Keywords: Color image, Facial image synthesis, Make-up foundation, Kubelka-Munk theory, Multi-resolution analysis, Texture synthesis.
1
Introduction
The analysis/synthesis of human skin color and texture is one of the most interesting topics for many fields including computer graphics, medical imaging, and cosmetic development. Especially, the synthesis of made-up skin color and texture is needed to examine the effect of make-up on human skin. Foundation is a cosmetic used to conceal undesirable color on the skin and to give basic color and luster to the skin. It is important to evaluate the change of skin condition caused by the application of the foundation on the skin. In a previous paper [1], we proposed a method for estimating the skin color with foundation make-up. The main idea of the method was based on the fact that skin color with foundation make-up could be computed from the measured reflectance of the bare skin and the optical properties of the foundation by the Kubelka-Munk theory. The made-up skin color without texture was estimated for small areas of the cheek. We also analyzed skin texture based on the multi-resolution analysis (MRA) using the wavelet transform [2].
The present paper describes a method for synthesizing color images of whole human face with foundation make-up by using bare face image and the surfacespectral reflectance of the bare cheek. It should be noted that the best way to show the effect of foundation application is to display the whole face image with the estimated skin color on a display device. We aim at developing a facial image synthesis system for displaying the effect of make-up foundation on human faces. However, it is difficult to acquire the detailed spectral reflectance data for all surface points on a human face. Moreover, the skin texture is altered with the applied foundation. For instance, acnes can be inconspicuous with foundation, and strong luster can exist in made-up face image. Therefore, the synthesis of made-up face images is based on the estimation of skin color with foundation and the control of skin texture.
2
Synthesis of Made-Up Facial Images
Fig.1 shows the procedure for synthesizing made-up facial images. The input data for the facial image synthesis are a bare face image and the surface-spectral reflectance of the bare skin. The representative reflectance is measured from a point on the cheek. First, the made-up skin color is estimated from the spectral reflectance of the bare skin and the optical properties of the foundation. The facial image is divided into block sub-images. Then, the skin color area is detected and converted to the estimated made-up color. The color distribution of the made-up facial image is smaller than the bare facial image. Therefore, the reduction of the color distribution for the entire skin area is done. Next, texture of the image is controlled by changing the intensity of the texture components in different levels of the MRA. This texture control works for attenuating undesirable spots on skin such as acnes, and enhancing luster patterns. Then, the texture-controlled sub-images are unified into whole facial image. Finally, a synthesized made-up face image is created by combining the skin area from the
Fig. 1. Procedure for synthesizing made-up facial images (flow: bare face image → division into sub-images; bare skin reflectance and optical properties of foundation → made-up skin color synthesis → skin color conversion → skin texture control with texture control coefficients → unification of sub-images; the skin area is then combined with the non-skin area into the synthesized made-up face image).
unified image and the non-skin area, such as hair, eyes and background, from the original bare face image into whole human face image. 2.1
Estimation of Made-Up Skin Color
We estimate skin color based on the surface-spectral reflectance, because the spectral reflectance function provides the physical property inherent to the skin surface. Therefore, we first estimate the surface-spectral reflectance with make-up foundation. The estimated reflectance is then converted to RGB values, considering illumination conditions. In order to determine a relationship between the spectral reflectance function of skin with make-up and the optical properties of skin and foundation, an optics model is assumed for the skin with foundation make-up as shown in Fig. 2. The skin optics model consists of two layers of foundation and skin, where the foundation layer is the outer layer contacting the air and the skin layer is located under the foundation layer. In this model, incident light is partly reflected at an interface between the foundation surface and the air. The light penetrating the interface is absorbed and scattered in the foundation layer. The light ray that reaches the skin surface is reflected and absorbed in the skin layer. The theory by Kubelka [3,4] is used for estimating the spectral reflectance of skin with foundation make-up based on the above optics model. In general, the optical values of reflectance and transmittance within a layer consisting of turbid materials can be calculated using the Kubelka-Munk theory, where we do not consider the complex path of scattered light inside the medium. We can derive the reflectance R and the transmittance T of the turbid layer with thickness D by solving the above equations under some assumptions as follows:

R = \frac{1}{a + b\coth(bSD)}, \qquad T = \frac{b}{a\sinh(bSD) + b\cosh(bSD)}, \qquad a = \frac{S+K}{S}, \quad b = \sqrt{a^2 - 1}   (1)
Reflection from skin with foundation
Pigments
Foundation Bare skin
Fig. 2. Optics model for the skin with foundation make-up
Fig. 3. Schema of spectral reflectance estimation for skin with foundation: the Kubelka-Munk model takes the foundation absorption Kf(λ), scattering Sf(λ) and layer thickness Df and yields the foundation reflectance Rf(λ) and transmittance Tf(λ); combined with the bare skin reflectance Rb(λ) and the interface reflectance Ri(λ) in the optics model, this gives the reflectance of skin with foundation Ra(λ).
where S and K are, respectively, the coefficients of scattering and absorption in the medium. When the object consists of two layers, multiple reflections at the interface between the higher layer and the lower layer are considered. In the two-layer model, the total reflectance including the inter-reflection is described as

R_{1,2} = R_1 + T_1^2 R_2 (1 + R_1 R_2 + R_1^2 R_2^2 + \cdots) = R_1 + \frac{T_1^2 R_2}{1 - R_1 R_2}   (2)

where T1 and R1 are the transmittance and reflectance of the upper layer, respectively, and R2 is the reflectance of the lower layer. The proposed algorithm can predict the spectral shape of the skin surface by appropriately determining the model parameters. Fig. 3 shows the schema of the spectral reflectance estimation for skin with foundation make-up. We defined a set of equations for estimating spectral skin reflectance using the Kubelka-Munk theory as Equation 3:

R_a(\lambda) = (1 - R_i(\lambda))\left( R_f(\lambda) + \frac{T_f^2(\lambda) R_b(\lambda)}{1 - R_f(\lambda) R_b(\lambda)} \right) + R_i(\lambda)   (3)
The functions of λ in this equation are described as

R_f(\lambda) = \frac{1}{a_f(\lambda) + b_f(\lambda)\coth\bigl(b_f(\lambda) S_f(\lambda) D_f\bigr)}, \qquad
T_f(\lambda) = \frac{b_f(\lambda)}{a_f(\lambda)\sinh\bigl(b_f(\lambda) S_f(\lambda) D_f\bigr) + b_f(\lambda)\cosh\bigl(b_f(\lambda) S_f(\lambda) D_f\bigr)},

a_f(\lambda) = \frac{S_f(\lambda) + K_f(\lambda)}{S_f(\lambda)}, \qquad b_f(\lambda) = \sqrt{a_f^2(\lambda) - 1}.   (4)
We note that the spectral reflectance Ra(λ) is determined by two parameters: the interface reflectance between the air and the skin surface, Ri, and the thickness
of the foundation layer, Df. We assume that Ri does not depend on wavelength. Rb(λ) is the spectral reflectance of bare skin, Kf(λ) is the absorption of the foundation and Sf(λ) is the scattering of the foundation. It should be noted that these coefficients depend on wavelength. The CIE-XYZ tristimulus values are calculated using the estimated spectral reflectance of the skin, the spectral power distribution of a light source, and the CIE color matching functions. These color values are further converted to RGB values by taking account of the characteristics of a calibrated display and the camera used for the bare face image.
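A sketch of how equations (1)–(4) could be evaluated numerically is given below. The variable names and parameter handling are ours; the fitting of Ri and Df to measured data, and the colorimetric conversion to RGB, are not shown.

```python
import numpy as np

def kubelka_munk_rt(K, S, D):
    """Reflectance and transmittance of a turbid layer (eqs. 1 and 4).
    K, S: spectral absorption and scattering coefficients (arrays over
    wavelength); D: layer thickness."""
    a = (S + K) / S
    b = np.sqrt(a ** 2 - 1.0)
    bSD = b * S * D
    R = 1.0 / (a + b / np.tanh(bSD))              # coth(x) = 1 / tanh(x)
    T = b / (a * np.sinh(bSD) + b * np.cosh(bSD))
    return R, T

def madeup_skin_reflectance(Kf, Sf, Df, Rb, Ri):
    """Spectral reflectance of skin with foundation (eq. 3): a foundation
    layer (absorption Kf, scattering Sf, thickness Df) over bare skin of
    reflectance Rb, with a wavelength-independent interface reflectance Ri."""
    Rf, Tf = kubelka_munk_rt(Kf, Sf, Df)
    return (1.0 - Ri) * (Rf + Tf ** 2 * Rb / (1.0 - Rf * Rb)) + Ri
```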
2.2 Division of Facial Image into Sub-images
The input image of a bare face is divided into several sub-images. The size of each sub-image is 256x256 pixels; this image size is convenient for the MRA. The MRA process is done separately in eye areas, nose areas, a mouth area, forehead areas, and chin areas. Each sub-image includes an overlap area with the neighboring sub-images. This area is used to avoid artifacts on edges in the texture control stage described later. 2.3
Skin Color Detection and Conversion to Made-Up Color
The detection of skin color and the conversion to made-up color are done for each sub-image. Terrillon [5] showed that the normalized r-g color space is effective for skin color-based image segmentation. The normalized r (nr) and normalized g (ng) are defined as:

nr = \frac{R}{R+G+B}, \qquad ng = \frac{G}{R+G+B}   (5)
First, we extract the skin area from the bare facial image. To perform this, all pixel values are mapped into the normalized r-g color space, and the range of skin color is determined. Next, we estimate the make-up color for each pixel included in the extracted skin area. Note that all the pixel colors are determined on the basis of the make-up color estimated for a particular point on the cheek, which is obtained from the surface-spectral reflectance analysis. Moreover, it is suggested that the whole human face with make-up foundation has a reduced color distribution, compared with the original bare face. Let (R(x, y), G(x, y), B(x, y))^T be the original pixel values at location (x, y) on the bare facial image, (Rc, Gc, Bc)^T be the original pixel values at a particular point on the cheek, and (Re, Ge, Be)^T be the made-up color values estimated from the spectral reflectance at the same point. Then the make-up color values at any location (x, y) can be estimated in the following form:

\begin{pmatrix} R'(x,y) \\ G'(x,y) \\ B'(x,y) \end{pmatrix} = \left( \begin{pmatrix} R(x,y) \\ G(x,y) \\ B(x,y) \end{pmatrix} - \begin{pmatrix} R_c \\ G_c \\ B_c \end{pmatrix} \right) \times \alpha + \begin{pmatrix} R_e \\ G_e \\ B_e \end{pmatrix}   (6)

where α is a weighting coefficient for adjusting the color distribution.
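A possible implementation of this detection-and-conversion step is sketched below. The skin-colour bounds in normalized r-g space are placeholders, since the paper does not give numerical thresholds, and the function is otherwise a direct transcription of eqs. (5) and (6).

```python
import numpy as np

def normalized_rg(img):
    """Map an RGB image (H, W, 3, float) to normalized r-g coordinates (eq. 5)."""
    s = img.sum(axis=2) + 1e-9
    return img[..., 0] / s, img[..., 1] / s

def convert_to_madeup(img, cheek_rgb, madeup_rgb, alpha=0.9,
                      nr_range=(0.35, 0.55), ng_range=(0.28, 0.40)):
    """Shift skin pixels towards the estimated made-up colour (eq. 6).
    cheek_rgb: measured bare-cheek colour; madeup_rgb: estimated made-up
    colour; alpha: colour-distribution reduction factor. The nr/ng ranges
    are hypothetical placeholders for the skin-colour bounds."""
    nr, ng = normalized_rg(img)
    skin = ((nr >= nr_range[0]) & (nr <= nr_range[1]) &
            (ng >= ng_range[0]) & (ng <= ng_range[1]))
    out = img.copy()
    out[skin] = (img[skin] - np.asarray(cheek_rgb)) * alpha + np.asarray(madeup_rgb)
    return out, skin
```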
Fig. 4. Schema of texture control: the sub-image is decomposed into layers 1..n and a base color; each layer is weighted by a coefficient W1..Wn before the layers are composed back into the sub-image.
Fig. 5. Multi-resolution analysis: (a) Level-1 decomposition into the LL, HL, LH and HH sub-bands; (b) Level-2, in which the LL sub-band is further decomposed into LLLL, LLHL, LLLH and LLHH.
2.4
Texture Control
Fig. 4 shows the procedure for texture control. Each sub-image with the estimated made-up color is decomposed into texture layers by the MRA. Then, the texture image at each layer is weighted to change its intensity. Finally, these layers are combined into one sub-image. Tsumura [6] proposed skin melanin texture control based on the Laplacian pyramid decomposition. In the present study, the wavelet decomposition with the Daubechies filter provides smooth synthesis of several textures on skin. The wavelet transform is suitable for extracting both periodic patterns and local non-periodic patterns; that is, the feature patterns are localized in both the spatial domain and the frequency domain. Therefore, the MRA by wavelet transform is useful for extracting the base color in the DC component and some feature patterns in color texture images. The MRA is done in the following steps. First, an image is decomposed into four components as shown in Fig. 5(a). These are the four components of low frequency for column and row (LL), low frequency for column and high frequency for row (LH), high frequency for column and low frequency for row (HL), and high frequency for column and row (HH). The Level-1 layer consists of the LH, HL and HH components of the original image. This level includes the highest frequency component of the image. Second, the LL component is decomposed into the four components of the Level-2 layer as shown in Fig. 5(b). Suppose a square image of 2^m-by-2^m pixels. By repeating the decomposition, we obtain the layers of Level-1 to Level-m (whose sub-bands are 2-by-2 pixels at Level-m) and the 1-by-1 DC component.
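The layer-weighting scheme described above could be realised with a standard wavelet toolbox as sketched here. PyWavelets is used purely for illustration; the per-level weights correspond to those later reported in Table 1, but the surrounding framework and function names are ours, not the authors' implementation.

```python
import pywt

def control_texture(channel, weights, wavelet="db8"):
    """Weight the detail sub-bands of one colour channel level by level.
    channel: 2-D array (one colour component of a 256x256 sub-image).
    weights: dict {level: weight}; levels absent from the dict keep weight 1.
    Returns the recomposed channel (pywt may warn for deep decompositions
    of small sub-images)."""
    levels = max(weights) if weights else 1
    coeffs = pywt.wavedec2(channel, wavelet, level=levels)
    # coeffs = [approximation, coarsest detail, ..., finest (Level-1) detail]
    new_coeffs = [coeffs[0]]
    for k, detail in enumerate(coeffs[1:], start=1):
        level = levels - k + 1            # map pywt order back to Level numbers
        w = weights.get(level, 1.0)
        new_coeffs.append(tuple(w * d for d in detail))
    return pywt.waverec2(new_coeffs, wavelet)

# Example: enhance fine luster (Levels 1-2), attenuate pigment texture (4-5),
# using the red-channel weights later listed in Table 1.
weights_red = {1: 3.00, 2: 1.30, 3: 1.00, 4: 0.94, 5: 0.82}
```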
Fig. 6. Texture components of a face image: the sub-image is decomposed into a fine surface texture, a pigment texture, a shade texture and the base color.
The DC component shows the base color of the original image. The low frequency layers can include the shading of image. The middle and high frequency layers can include feature patterns of textures. Various texture patterns appear on the surface of human skin. For the whole made-up face image, we define four components. The components are (1) base color, (2) fine surface texture including pores, (3) pigment texture including acnes, and (4) shade texture caused by 3D face shape. Fig.6 shows the texture components. The color image of each texture component in the figure is enhanced for the explanation. In this case, we find that the fine surface texture exists in layers of Levels 1, 2 and 3. The pigment texture exists in layers of Levels 4 and 5. Shade texture exists in layers of Levels 6, 7 and 8. The decomposed layers are enhanced and attenuated with the weighting coefficient. The enhancement of the fine surface texture causes the enhancement of luster, and the attenuation of the pigment texture causes the attenuation of acnes. Then, the decomposed and texture-controlled layers are combined into one sub-image. 2.5
Unification of Sub-images
The texture controlled sub-images are unified into a whole face image. In this unification, the overlap areas are cut off. Finally, the skin area with the estimated made-up skin colors and the non-skin area in the original bare face image are combined into the whole face image as a synthesized made-up face image.
3
Experiment
Experiments were executed for evaluating the proposed method. First, we took two pictures of the bare face and the foundation-applied face for a person (Subject 1) by a digital camera. We analyzed the pictures and determined the texture control parameters. Next, we took the two kinds of picture for another person (Subject 2). Moreover, the surface-spectral reflectances of the bare cheek and the foundation-applied cheek were measured by a spectrophotometer. The made-up face image is synthesized from the bare face image and the spectral reflectance of the bare cheek. The image size of a facial image was 709x928 pixels .
Table 1. Texture control parameters

Level   Weighting coefficient
        R      G      B
1       3.00   3.00   3.00
2       1.30   1.30   1.30
3       1.00   1.00   1.00
4       0.94   0.77   0.78
5       0.82   0.70   0.65

3.1 Decision of Texture Control Parameters
Sub-images of the facial image of Subject 1 were decomposed into multi-resolution layers. The order of the Daubechies wavelet was 8. Then, the standard deviations of the layers were calculated for each of the RGB color components. The standard deviations of the layers were compared between the bare face image and the made-up face image, and their ratio was calculated. This ratio was used to determine the weighting coefficients in the texture control. Table 1 shows the weighting coefficients for Levels 1 to 5. Levels 1, 2 and 3 include the fine surface texture; the weighting coefficients for Levels 1 and 2 enhance the luster. Levels 4 and 5 include the pigment texture; the weighting coefficients for Levels 4 and 5 reduce skin troubles, such as acnes. The coefficients for the other levels were set to 1. The color distribution control coefficient was set to 0.9. 3.2
Skin Color Estimation
Fig. 7 shows the estimation result of the spectral reflectance for the made-up skin of Subject 2. The reflectance was estimated to fit the measured data with the parameter control on the interface reflection and the foundation thickness. Note

Fig. 7. Measured reflectance and the estimated reflectance of made-up skin (reflectance rate [%] versus wavelength [nm] over 400–700 nm, for the measured, estimated and bare-skin spectra).
that the spectral curve of the estimated reflectance is almost coincident with the measured curve. The average estimation error over the visible wavelength range was 0.5%. Thus the accurate estimation of skin spectral reflectance is performed.
3.3
Texture Control and Rendering Results
The estimated spectral reflectances were converted to RGB values. Then, whole face image was synthesized by using the estimated colors and the texture control parameters shown in Table 1. In this case, the same control parameters were applied to all block images. Fig. 8 demonstrates a set of the bare face image, the
Fig. 8. Synthesized face image and real face images: (a) bare face, (b) synthesized made-up face, (c) real made-up face.
Fig. 9. Luster enhancement on cheek: (a) bare skin, (b) synthesized made-up skin.
Fig. 10. Attenuation of acne patterns on chin: (a) bare skin, (b) synthesized made-up skin.
synthesized made-up face image from the bare face image and the real made-up face image. We should note that the synthesized made-up face image is very close to the real made-up face image. The proposed method predicts the made-up face image with sufficient accuracy. Figures 9 and 10 show the details in the images. Fig. 9 represents the luster effect on the cheek image, where the luster is enhanced at Layers 1 and 2. Fig. 10 represents the attenuation effect on the chin image, where reddish acne patterns in the synthesized image are inconspicuous. Specialists of cosmetics development gave the comment that the synthesized image showed the feeling of make-up on human face well.
4
Conclusions
This paper has described a method for synthesizing color images of the whole human face with foundation make-up by using bare face image and the surfacespectral reflectance of the bare cheek. The synthesis of made-up face images was based on the estimation of skin color with foundation make-up and the control of skin texture. First, the made-up skin color was estimated from the spectral reflectance of the bare cheek and the optical properties of foundation. The spectral reflectance of made-up skin was calculated by the Kubelka-Munk theory. Second, the texture control was based on the MRA with wavelet analysis. Smooth texture synthesis was done by the intensity change of layers in the MRA with the Daubechies wavelet. Luster was enhanced and acnes were attenuated by the texture control. Experimental results showed the accurate estimation results for made-up skin color and the effectiveness of the texture control. Thus, the proposed method predicts the made-up facial image with sufficient accuracy.
References

1. Doi, M., Tominaga, S.: Spectral estimation of made-up skin color under various conditions. In: SPIE/IS&T Electronic Imaging, vol. 6062 (2006)
2. Doi, M., Tominaga, S.: Image Analysis and Synthesis of Skin Color Textures by Wavelet Transform. In: IEEE SSIAI, pp. 193–197 (2006)
3. Kubelka, P.: New Contributions to the Optics of Intensely Light-Scattering Materials. Part I. J. Opt. Soc. Am. 38(5), 448–457 (1948)
4. Kubelka, P.: New Contributions to the Optics of Intensely Light-Scattering Materials. Part II. J. Opt. Soc. Am. 44(4), 330–335 (1954)
5. Terrillon, J.C., Pilpre, A., Niwa, Y., Yamamoto, K.: Properties of Human Skin Color Observed for a Large Set of Chrominance Spaces and for Different Camera Systems. In: 8th Symposium on Sensing Via Imaging Information, pp. 457–462 (2002)
6. Tsumura, N., et al.: Real-time image-based control of skin melanin texture. In: ACM SIGGRAPH 2005 (2005)
Polynomial Regression Spectra Reconstruction of Arctic Charr's RGB

J. Birgitta Martinkauppi¹, Yevgeniya Shatilova², Jukka Kekäläinen¹, and Jussi Parkkinen¹
¹ University of Joensuu, PL 111, 80101 Joensuu
{Birgitta.Martinkauppi,Jussi.Parkkinen}@cs.joensuu.fi, [email protected]
² Previously at University of Joensuu
[email protected]
Abstract. Arctic charr (Salvelinus alpinus L.) exhibit red ornamentation in the abdomen area during the mating season. The redness is caused by carotenoid components and it is assumed to be related to the vitality, nutritional status, foraging ability and general health of the fish. To assess the carotenoid amount, spectral data are preferred, but it is not always possible to measure them. Therefore, an RGB-to-spectra transform is needed. We test here polynomial regression models with different training sets to find a good model especially for Arctic charr. Keywords: Arctic charr (Salvelinus alpinus L.), carotenoid, spectral data, sRGB to spectra transform.
1 Introduction Arctic charr is an endangered fish species living in Finland [1]. It is also grown in fisheries and the individuals are considered valuable assets. The most striking feature of charr is its red abdomen area during the mating season. This red ornamentation is thought to be related to the ability of the fish to acquire carotenoids from food, since animals cannot synthesize carotenoid components (e.g. [2]). It is assumed to indicate nutritional status and foraging ability. Since the carotenoid component seems to be an important factor for evaluating vitality, we are developing a system using spectral data for analyzing it. The valuable fish must survive the quality evaluation, but up until now spectral imaging has been too slow, expensive, difficult and cumbersome for an ordinary layman to use. The relation between RGB and spectra has been studied in many papers; see e.g. Baronti et al., Bochko et al., Hardeberg, Heikkinen et al. [3-6]. The 2nd and 3rd order polynomials were chosen for this work. Applying the transformation to the charr is a challenging task for many reasons. First, since the charr is a natural object, its coloration varies even within one individual. Then its surface and shape also set limitations. Of course, the camera and illumination need to be somehow characterized for the transformation. If the
system is to be applied in fisheries and test places in nature, the number of test samples is limited. In this paper, we describe tests with two polynomial regression models and training samples for obtaining the RGB-to-spectral transform dedicated to Arctic charr (see also [7]). The training samples consist of the Macbeth chart and a few pages from the Munsell book. Spectral imaging is applied to all the training and fish samples. The transform is calculated for the sRGB presentation, which is commonly used in many cameras. The sRGB is obtained from spectral data, thus making the evaluation camera independent, and the results can be thought to be optimal in this sense. To evaluate the quality of the transformation, we employed two commonly used error metrics: Root-Mean-Square Error (RMSE) for spectra and ΔE of CIELab for human vision.
2 Spectral Reconstruction Methods for Arctic Charr

The spectral reconstruction for Arctic charr needs several stages, as shown in Fig. 1. First, a training set and a polynomial are selected, and this data is used for calculating the transformation matrix:

X·W = Y,   (1)
where X = RGB values of the camera for the selected samples, Y = spectral reflectances corresponding to the samples, and W = transformation matrix. The transformation matrix is calculated in the least-squares sense using the pseudo-inverse method. The obtained transformation matrix is then applied to the test set, which consists of spectral images of charr. The quality of the reconstructed image is analyzed with two metrics: the CIELab error ΔE for human vision,
\Delta E = \sqrt{(L - L_t)^2 + (a - a_t)^2 + (b - b_t)^2},   (2)
where L, a, b are the CIELab values computed from the measured spectra and Lt, at, bt the CIELab values computed from the reconstructed (transformed) spectra; and the root-mean-square error RMSE for the spectra,
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(S(i) - \hat{S}(i)\bigr)^2},   (3)

where n = number of wavelengths, S = original, measured spectrum, and Ŝ = spectrum approximation from the transform.
Fig. 1. Schema for spectral reconstruction
The best polynomial model is selected based on the error after the calculations. Table 1 shows the terms of the 2nd and 3rd order polynomials, which both also have a constant term (the 1st order polynomial was excluded due to its simplicity).

Table 1. Terms of polynomials

Number of terms   Terms of polynomials
10                R G B R² G² B² RG RB GB 1
20                R G B R² G² B² RG RB GB RGB RGG RBB GRR GBB BRR BGG R³ G³ B³ 1
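A sketch of how the transform of eq. (1) could be fitted and applied with the 2nd order terms of Table 1 is given below. This is illustrative only: the loading of training data, the 3rd order term set and the colorimetric conversions are omitted, and the function names are ours.

```python
import numpy as np

def design_matrix_2nd(rgb):
    """Expand RGB rows (N, 3) into the ten 2nd order terms of Table 1:
    R, G, B, R^2, G^2, B^2, RG, RB, GB and a constant."""
    R, G, B = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    return np.column_stack([R, G, B, R**2, G**2, B**2,
                            R*G, R*B, G*B, np.ones_like(R)])

def fit_transform(rgb_train, spectra_train):
    """Solve X·W = Y (eq. 1) in the least-squares sense with the
    pseudo-inverse; spectra_train holds one reflectance spectrum per row."""
    X = design_matrix_2nd(rgb_train)
    return np.linalg.pinv(X) @ spectra_train   # W, shape (10, n_wavelengths)

def reconstruct(rgb, W):
    """Apply the fitted transform to new RGB values."""
    return design_matrix_2nd(rgb) @ W
```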
3 Reconstruction Results

Two different training sample sets were used in the reconstruction. All training and fish data were first subjected to spectral measurements. Then the corresponding sRGB presentation was calculated under illuminant D65, the ideal 6500 K daylight.

3.1 Reconstruction with Macbeth Chart

The training set consisted of all 24 samples of the Macbeth chart. The 2nd and 3rd order polynomials were applied to calculate the transform. Fig. 2 shows an example of
Fig. 2. The upper image is a sRGB presentation calculated from the original spectral image. The lower row display sRGB presentations computed from polynomial approximated spectra: left image is obtained using 2nd order transform, while the right image is from 3rd order approximation.
results: the sRGB presentation of the original spectra and of the spectra approximated from sRGB. The 2nd order approximation clearly produces color distortions from the human point of view, but the color quality of the 3rd order approximation is acceptable. The numerical evaluation of the data is presented in Table 2. The 3rd order polynomial transform produces a smaller ΔE for the fish image than the 2nd order one, but the RMSE is bigger for the 3rd order. This indicates over-fitting, as shown in Fig. 3.
Fig. 3. The 3rd order polynomial transform over-fits the spectra. Note that the original spectral data is noisy at the ends of the wavelength range.
Table 2. Numerical evaluation for Macbeth chart as a training set

Error metric  Data           Polynomial  Average  Standard deviation  Maximum  Minimum
ΔE            Image of Fish  2nd         9.1862   6.9114              31.1366  0.1041
ΔE            Image of Fish  3rd         2.8097   2.8057              27.2516  0.003
ΔE            Training set   2nd         0.7511   0.5437              1.9430   0.1354
ΔE            Training set   3rd         0.0551   0.0576              0.2244   0.0021
RMSE          Image of Fish  2nd         1.6078   2.2106              11.3326  0.1819
RMSE          Image of Fish  3rd         3.186    3.5024              16.2798  0.2319
RMSE          Training set   2nd         0.0354   0.0250              0.1229   0.0094
RMSE          Training set   3rd         0.0262   0.0225              0.0916   0.0026
3.2 Reconstruction with Macbeth Chart and Pages from Munsell Book

To solve the problem of over-fitting, the training samples were complemented with a few pages from the Munsell book. The Munsell book is a colour atlas which has a large number of samples for different hues. The sample pages selected from the book (like YY or RR) have hues similar to the hues present in Arctic charr. Fig. 4 displays the sRGB presentations for the new training data. The extended training set clearly improves the color quality for the 2nd order model.
Fig. 4. Upper row: sRGB from measured spectra. Lower row: left image, sRGB from 2nd order transform and right image, sRGB from 3rd order transform. The extended training set clearly reduces the color distortion in the 2nd order polynomial transform.
Tables 3 and 4 present the numerical errors ΔE and RMSE obtained using the different training sets. The results indicate that extending the training set reduces the average error in most of the cases, and that the 3rd order polynomial transform produces smaller errors. Tables 5 and 6 display the errors calculated for skin patches of Arctic charr, to test the transform under color variations. The results are the same also for these cases. The over-fitting problem is also avoided, as can be seen in Fig. 5.

Table 3. CIELab ΔE error
Data          Polynomial  Training set    Average  Standard deviation  Maximum  Minimum
Fish image    2nd         Macb            9.1862   6.9114              31.1366  0.1041
Fish image    2nd         Mac+XYY         7.7407   6.0777              28.2561  0.1058
Fish image    2nd         Mac+XYY+XYR     8.3574   6.4532              28.2481  0.0260
Fish image    2nd         Mac+YYRR(404)   8.2703   7.1867              31.8375  0.0360
Fish image    3rd         Macb            2.8097   2.8057              27.2516  0.003
Fish image    3rd         Mac+XYY         3.0271   3.5854              26.9802  0.0147
Fish image    3rd         Mac+XYY+XYR     3.2334   3.1669              27.0605  0.0054
Fish image    3rd         Mac+YYRR(404)   1.8174   2.9300              27.1408  0.0030
Training set  2nd         Macb            0.7511   0.5437              1.9430   0.1354
Training set  2nd         Mac+XYY         1.3823   1.8873              17.8670  0.0617
Training set  2nd         Mac+XYY+XYR     1.0955   1.6314              18.0721  0.0821
Training set  2nd         Mac+YYRR(404)   0.4700   0.5994              8.3434   0.0139
Training set  3rd         Macb            0.0551   0.0576              0.2244   0.0021
Training set  3rd         Mac+XYY         0.6849   1.3344              13.2440  0.0233
Training set  3rd         Mac+XYY+XYR     0.5458   1.0999              13.9019  0.0116
Training set  3rd         Mac+YYRR(404)   0.0571   0.1188              2.2247   0.0016
Table 4. RMSE error for the extended training set
Data          Polynomial  Training set    Average  Standard deviation  Maximum  Minimum
Training set  2nd         Macb            0.0354   0.0250              0.1229   0.0094
Training set  2nd         Mac+XYY         0.0254   0.0168              0.1408   0.0032
Training set  2nd         Mac+XYY+XYR     0.0276   0.0193              0.1554   0.0049
Training set  2nd         Mac+YYRR        0.0188   0.0156              0.1620   0.0030
Training set  3rd         Macb            0.0262   0.0225              0.0916   0.0026
Training set  3rd         Mac+XYY         0.0222   0.0159              0.1007   0.0023
Training set  3rd         Mac+XYY+XYR     0.0240   0.0184              0.1456   0.0020
Training set  3rd         Mac+YYRR        0.0138   0.0139              0.1637   0.0016
Fish image    2nd         Macb            1.6078   2.2106              11.3326  0.1819
Fish image    2nd         Mac+XYY         1.5047   2.2076              11.0383  0.2475
Fish image    2nd         Mac+XYY+XYR     1.4981   2.1945              11.0646  0.2938
Fish image    2nd         Mac+YYRR(404)   1.4041   1.9767              9.8274   0.1737
Fish image    3rd         Macb            3.186    3.5024              16.2798  0.2319
Fish image    3rd         Mac+XYY         1.5418   2.3626              10.9252  0.1089
Fish image    3rd         Mac+XYY+XYR     1.4258   2.3840              10.9380  0.1255
Fish image    3rd         Mac+YYRR(404)   1.5349   2.1575              9.8115   0.0940
Table 5. Numerical evaluation of a sample
Error metric  Polynomial  Training set  Average  Standard deviation  Maximum  Minimum
ΔE            2nd         Macb          0.5867   0.3341              1.7043   0.0928
ΔE            2nd         Mac+YYRR      0.2449   0.1011              0.9747   0.0149
ΔE            3rd         Macb          0.2009   0.0978              0.3766   0.0014
ΔE            3rd         Mac+YYRR      0.1534   0.0692              0.3573   0.0067
RMSE          2nd         Macb          0.3135   0.1812              0.7583   0.0636
RMSE          2nd         Mac+YYRR      0.469    0.3878              1.1527   0.0584
RMSE          3rd         Macb          0.4224   0.5151              2.8739   0.063
RMSE          3rd         Mac+YYRR      0.4886   0.417               1.2334   0.0584
Table 6. Numerical evaluation of another sample
Error metric  Polynomial  Training set  Average  Standard deviation  Maximum  Minimum
ΔE            2nd         Macb          2.2833   2.2596              14.0901  0.0905
ΔE            2nd         Mac+YYRR      1.4028   1.2312              9.9955   0.42
ΔE            3rd         Macb          0.3441   0.3428              4.1933   0.0023
ΔE            3rd         Mac+YYRR      0.445    0.2606              2.7709   0.0184
RMSE          2nd         Macb          0.3835   0.3063              1.6232   0.0738
RMSE          2nd         Mac+YYRR      0.2246   0.1044              0.5408   0.0629
RMSE          3rd         Macb          0.7983   0.6225              2.1389   0.0518
RMSE          3rd         Mac+YYRR      0.296    0.1587              0.6033   0.0528
Fig. 5. The extended training set reduces the over-fitting problem. Upper row, left image: training set Macbeth chart + Munsell YY; right image: Macbeth + Munsell YY and YR. Lower row: training set Macbeth chart + Munsell YY, YR and RR.
4 Conclusions We have tested two polynomial regression models, 2nd and 3rd order polynomials, for the sRGB-to-spectra transform with different training sets. The results indicate that the bare Macbeth chart produces poor results for both polynomials (color distortion and over-fitting) when tested with Arctic charr. When more training samples corresponding to Arctic charr coloration are added from the Munsell book, the models work better. The obtained results clearly show that we can use an RGB image to approximate the spectra of Arctic charr and thus make it part of a spectral-based carotenoid content evaluation system.
References

1. Urho, L., Lehtonen, H.: Fish species in Finland. Riista- ja kalatalous – selvityksiä 18/2008, Finnish Game and Fisheries Research Institute, Helsinki (2008)
2. Badyaev, A.V., Hill, G.E.: Evolution of sexual dichromatism: contribution of carotenoid- versus melanin-based coloration. Biological Journal of the Linnean Society 69, 153–172 (2000)
3. Baronti, S., Casini, A., Lotti, F., Porcinai, S.: Multispectral imaging system for the mapping of pigments in works of art by use of principal-component analysis. Applied Optics 37(8), 1299–1309 (1998)
4. Bochko, V., Tsumura, N., Miyake, Y.: A spectral color imaging system for estimating spectral reflectance of paint. Journal of Imaging Science and Technology 51(1), 70–78 (2007)
5. Hardeberg, J.Y.: Acquisition and reproduction of colour images: colorimetric and multispectral approaches. Ph.D. dissertation, Ecole Nationale Superieure des Telecommunications, Paris, France (1999)
6. Heikkinen, V., Jetsu, T., Parkkinen, J., Hauta-Kasari, M., Jaaskelainen, T., Lee, S.D.: Regularized learning framework in the estimation of reflectance spectra from camera responses. Journal of the Optical Society of America A 24(9), 2673–2683 (2007)
7. Shatilova, Y.: Color Image Technique in Fish Research. Master thesis, University of Joensuu (2008)
An Adaptive Tone Mapping Algorithm for High Dynamic Range Images

Jian Zhang and Sei-ichro Kamata
Graduate School of Information, Production and Systems, Waseda University
2-7 Hibikino, Wakamatsu-ku, Kitakyushu-shi, Fukuoka 808-0135, Japan
[email protected], [email protected]
http://www.waseda.jp/ips
Abstract. Real world scenes contain a large range of light intensities, from dim starlight to bright sunlight. A common task of tone mapping algorithms is to reproduce high dynamic range (HDR) images on low dynamic range (LDR) display devices such as printers and monitors. In this paper, a new tone mapping algorithm is proposed for the display of HDR images. Inspired by the adaptive process of the human visual system, the proposed algorithm utilizes center-surround Retinex processing. The novelty of our method is that the local details are enhanced according to a non-linear adaptive spatial filter (Gaussian filter), whose shape is adapted to the high-contrast edges of the image. The proposed method uses an adaptive surround instead of the traditional pre-defined circular one. Therefore, the algorithm can preserve visibility and the contrast impression of high dynamic range scenes on common display devices. The proposed method is tested on a variety of HDR images, and we also compare it to previous work. The results show good performance of our method in terms of visual quality. Keywords: Tone mapping, Retinex, high dynamic range, Hilbert scan, Gaussian filter.
1
Introduction
HDR imaging techniques can produce images that record the full dynamic range of a real-world scene. Recently, HDR images can be captured easily using a composition of multiple photographs of the same scene with different exposures [1] or new sensor technologies [2]. Thus, the availability of HDR data is becoming more commonplace. In contrast, HDR display devices are not commonplace yet because of their high price. Furthermore, most current output devices, such as computer monitors and printers, can often reproduce only a range much lower than the dynamic range of the scene, so that only part of the scene remains visible. This mismatch between HDR input and LDR output leads to the problem of how to reproduce or render such images on a standard output device. In order to solve this problem, tone mapping techniques [3]-[6] are used to map HDR values to LDR values. Tone mapping is a contrast enhancement technique which can
Fig. 1. An example of an HDR image that needs a tone mapping technique: (a) linear scaling; (b) image treated with our method
scale the HDR data in order to preserve certain characteristics of the input HDR image, such as contrast, visibility or appearance, in the resulting displayable image. Currently, a number of tone mapping algorithms exist. According to the transformation they apply to convert input luminance to output values, they are usually categorized into two main groups, i.e., global operators and local operators. Global tone mapping methods compress the original range into the output range using non-linear one-to-one mapping functions, for example, a logarithmic function, gamma function, or sigmoidal function [3]. These global algorithms tend to preserve the subjective perception of the scene and have the advantage of being simple and fast. However, simple global processing causes a loss of contrast, which is apparent in the loss of detail visibility in the dark or light parts of the image (e.g., Fig. 1). They therefore still have difficulty reproducing all the visual details. On the other hand, local tone mapping methods refer to algorithms which use local processing for reproduction. These methods are not based on a one-to-one mapping. The mapping function changes according to the spatial context of the scene, so two different pixels with the same intensity in the original image can be mapped to different display values. This allows local operator-based methods to reproduce more details of the image than global operator-based methods. However, they may also cause "halo" effects or ringing artifacts in the reproduction. The Human Visual System (HVS) can perceive a vast range of intensities by using various adaptation mechanisms. Land's Retinex is a simplified model of the HVS and can easily be adapted for computational image rendering algorithms. Moreover, increasing evidence shows that Retinex is an optimal solution to the lightness problem. In this paper, an adaptive tone reproduction algorithm based on the center-surround Retinex model is proposed. In general, our method belongs to the local operator category. Compared to previous surround-based
algorithms, our Retinex-based local processing uses an adaptive surround instead of the traditional pre-defined circular one. Because the shape of the surround changes according to high-contrast edges, the proposed method does not generate halo artifacts yet preserves the visibility of local details. Therefore, the proposed method can render high dynamic range images in a way that models the adaptation of the HVS. It has been implemented and tested on an image database. The experimental results show that this algorithm is effective and easy to use; only one parameter needs to be set.
2 The New Adaptive Processing Algorithm
In this section, an adaptive tone mapping algorithm is presented. The proposed approach can overcome the problems mentioned above while achieving satisfactory results. Our work focuses entirely on the luminance channel, so we first introduce how the luminance values are calculated. Then the Retinex-based adaptive Gaussian filtering method is proposed. The local operator makes use of local information in the spatial domain, i.e., the shape of the Gaussian filter depends on the pixel's partial derivatives. This allows the details of the image to be reproduced effectively. Note that all operations are performed in logarithmic space.

2.1 Retinex-Based Local Processing
Because we only treat the luminance in our method, we first compute the luminance L (in cd/m²) from the RGB space. In the LHS (luminance, hue, and saturation) system, the luminance is defined as

L = 0.299R + 0.587G + 0.114B   (1)
Let L be the luminance image encoded linearly, whose maximum value is 1. Corresponding to the early stage of the HVS, where a global adaptation occurs, a global tone mapping is first performed as an initial compression of the dynamic range. We assume a logarithmic relation in our tone mapping solution, which can be regarded as the first luminance adjustment of the HVS. The model presented in [7] essentially preserves good global perception. The following equation shows this mapping relationship:

L' = \frac{\log(L + \varphi) - \log(L_{min} + \varphi)}{\log(L_{max}) - \log(L_{min} + \varphi)}   (2)
In the above equation, L' is the nonlinear output luminance image, L_max and L_min are the maximum and minimum luminance of the scene, and φ is a small value that avoids the singularity occurring when black pixels are present in the image and controls the overall brightness of the reproduced image. For example, when φ decreases, the display becomes brighter, which increases the sensitivity in dark areas. The parameter φ is usually called the "key value", because it decides
whether an input value is mapped to a higher or lower value. In our research, we set it according to the average luminance L_a, which is calculated by

\varphi = L_a = \exp\!\Big(\frac{1}{N}\sum_{x=0}^{X-1}\sum_{y=0}^{Y-1}\log(L(x, y))\Big)   (3)
where N is the number of pixels in the luminance image, the ordered pair (x, y) of integers x and y gives the coordinates of a pixel, 0 ≤ x < X and 0 ≤ y < Y. After this simple global processing, the Retinex-based local processing is carried out. Traditionally, the Retinex is a member of the class of center-surround functions, where the output value is obtained by computing the difference between the log-encoded treated pixel (center) and the log-encoded value of its neighborhood (surround). Here the surround is a Gaussian function. The mathematical form of the single-scale Retinex is defined as

L_{out}(x, y) = \log(L'(x, y)) - \log(G(x, y) * L'(x, y))   (4)
where L_out is the Retinex output image and log is the natural logarithm. The symbol "*" denotes the convolution operator. G is a Gaussian filter (kernel) expressed as

G(x, y) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\big(-(x^2 + y^2)/2\sigma^2\big)   (5)

where σ² is the variance of the Gaussian filter, which also determines the size of the filter. In the traditional Retinex [5], the value of sigma is fixed. However, it is known that images with high-contrast edges are more pleasing to the human eye, and that the human visual system is sensitive to the position of high-contrast edges in smooth areas. Thus, if the fixed σ² is small, local contrast can be increased significantly, but halo artifacts are produced along high-contrast edges. On the contrary, a large σ² reduces the artifacts but provides less increase in local contrast [8]. There is a tradeoff between the increase in local contrast and good rendition of the image. We therefore make the filter variance adapt to the local characteristics of the image; that is, our Gaussian filter G(x, y) is a function of the variable σ(x, y). This is based on the strategy that different parts of an image should be smoothed differently, depending on the type of edges. The following equation measures contrast in the horizontal, vertical and diagonal directions:

\nabla L'(x, y) = L'(x-1, y-1) + L'(x+1, y-1) + L'(x-1, y+1) + L'(x+1, y+1) - L'(x-1, y) - L'(x+1, y) - L'(x, y-1) - L'(x, y+1).   (6)

In an area with high-contrast edges, the partial derivatives (∇L') are large and the filter variance should be adapted to a small value to keep the high contrast along edges. On the other hand, in a smooth area the partial derivatives are small and the variance should be large to reduce artifacts. This indicates an inverse relationship between |∇L'| and σ². Therefore, we
propose the following equation to calculate the variance of the 2-D Gaussian filter at location (x, y):

\sigma^2(x, y) = \frac{\beta}{|\nabla L'(x, y)|}   (7)

where β is a scaling factor and |·| denotes the absolute value, which ensures that σ² is positive. We employ a 2-D (2w + 1) × (2w + 1) adaptive Gaussian filter for L'(x, y), and obtain

G(x, y) * L'(x, y) = \frac{\sum_{x_t = x-w}^{x+w}\sum_{y_t = y-w}^{y+w} L'(x_t, y_t)\,\delta(x_t, y_t)\exp\!\big(-(x_t^2 + y_t^2)/\sigma^2(x, y)\big)}{\sum_{x_t = x-w}^{x+w}\sum_{y_t = y-w}^{y+w} \exp\!\big(-(x_t^2 + y_t^2)/\sigma^2(x, y)\big)}   (8)

Besides adjusting the variance σ²(x, y), the adaptive filter should also adapt the shape of the filter to the high-contrast edges in the image in order to prevent halo artifacts. In other words, the adaptive Gaussian filter does not blur the edges. We therefore use a weight δ(x_t, y_t) in (8) to adapt the shape of the Gaussian filter. δ(x_t, y_t) changes the shape of the filter based on the information of high-contrast edges, and it is calculated by

1) if (x_t, y_t) ∈ I:    δ(x_t, y_t) = ||∇L'(x_t, y_t)| − δ(x_t − 1, y_t − 1)| + |∇L'(x_t, y_t)|,
2) if (x_t, y_t) ∈ II:   δ(x_t, y_t) = ||∇L'(x_t, y_t)| − δ(x_t, y_t − 1)| + |∇L'(x_t, y_t)|,
   ...
8) if (x_t, y_t) ∈ VIII: δ(x_t, y_t) = ||∇L'(x_t, y_t)| − δ(x_t − 1, y_t)| + |∇L'(x_t, y_t)|,   (9)

where I, II, ..., VIII denote the eight regions of a (2w + 1) × (2w + 1) window, as shown in Fig. 2. The output value of the adaptive Gaussian filter centered at pixel (x, y) is given by a weighted average of the pixels surrounding the position (x, y). The weights are given by a 2-D adaptive Gaussian function whose spatial constant and shape vary in accordance with the image's high-contrast edges. From Equation (8), in
Fig. 2. Eight regions of a (2w + 1) × (2w + 1) window
fact, this is done by adapting the variance σ²(x, y) and the shape weights δ(x_t, y_t) of the filter. According to Equation (6), we obtain ∇L'(x, y) for each pixel; the values are then scaled to the range [min(∇L'_min/∇L'_max, 10^{-5}), 1], where ∇L'_min and ∇L'_max are the minimum and maximum of ∇L'(x, y) in the image, and the min(·) function is used so that the division in (7) can be performed. In this study, we limit the size of the surround by taking the parameter β (in Equation (7)) into account. If we set a default value for the size w, we can calculate the scaling factor β as

\beta = (\nabla L'_{min} \cdot w/3)^2   (10)

In our study, the default value of w is 60, so σ(x, y) is scaled to the range

\Big[\,20\big/\min(\nabla L'_{min}/\nabla L'_{max},\,10^{-5}),\; 20\,\Big].   (11)
Figure 3 shows an example of the construction of our adaptive filter. The reproduced luminance values are in the range [0, 1], but the output image still contains very high dynamic range information even ignoring numerical errors. Thus, a quantization step is required before the luminance is integrated back into the color image. This is done by a histogram equalization technique (scaling and clipping); the original range is divided into N (= 255) intervals based on the pixel distribution. After this luminance processing, the new luminance values are integrated back into the color image.
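As a concrete illustration of this luminance pipeline, the following Python sketch strings together the global log mapping (Eqs. 2-3), the contrast measure (Eq. 6), an adaptive-variance surround in the spirit of Eqs. (7)-(8), and the center-surround subtraction of Eq. (4). It is a simplified reading of the method, not the authors' implementation: the directional shape weights δ of Eq. (9) are omitted, the scaling constants of Eqs. (10)-(11) are replaced by a direct mapping of the normalized contrast onto a σ range, and the histogram-equalization quantization is replaced by a plain rescaling. All names are illustrative.

```python
import numpy as np

def adaptive_retinex_luminance(rgb, w=15):
    """Simplified sketch of the Section 2 luminance pipeline.
    The paper uses w = 60; a smaller window is used here only to keep the
    per-pixel loop fast. The shape weights delta of Eq. (9) are omitted,
    so only the variance of the surround adapts to local contrast."""
    # Luminance (Eq. 1), scaled so that its maximum is 1.
    L = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    L = L / max(L.max(), 1e-12)

    # Key value = log-average luminance (Eq. 3), then global log mapping (Eq. 2).
    phi = np.exp(np.mean(np.log(L + 1e-9)))
    Lp = (np.log(L + phi) - np.log(L.min() + phi)) / (np.log(L.max()) - np.log(L.min() + phi))

    # Local contrast (Eq. 6): corner neighbours minus horizontal/vertical neighbours.
    p = np.pad(Lp, 1, mode='edge')
    grad = np.abs(p[:-2, :-2] + p[:-2, 2:] + p[2:, :-2] + p[2:, 2:]
                  - p[1:-1, :-2] - p[1:-1, 2:] - p[:-2, 1:-1] - p[2:, 1:-1])
    grad = grad / max(grad.max(), 1e-12)          # normalize to [0, 1]

    # Per-pixel sigma, inversely related to contrast (Eq. 7). The scaling of
    # Eqs. (10)-(11) is not reproduced exactly; sigma is mapped onto [1, w/3],
    # largest in smooth areas and smallest along high-contrast edges.
    sigma2 = (1.0 + (w / 3.0 - 1.0) * (1.0 - grad)) ** 2

    # Adaptive center-surround Retinex (Eqs. 4 and 8), isotropic surround.
    H, W = Lp.shape
    pad = np.pad(Lp, w, mode='edge')
    yy, xx = np.mgrid[-w:w + 1, -w:w + 1]
    dist2 = xx ** 2 + yy ** 2
    out = np.empty_like(Lp)
    for y in range(H):
        for x in range(W):
            kern = np.exp(-dist2 / sigma2[y, x])
            patch = pad[y:y + 2 * w + 1, x:x + 2 * w + 1]
            surround = (kern * patch).sum() / kern.sum()
            out[y, x] = np.log(Lp[y, x] + 1e-9) - np.log(surround + 1e-9)

    # The paper quantizes by histogram equalization; a plain rescale is used here.
    return (out - out.min()) / (out.max() - out.min() + 1e-12)
```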
Fig. 3. An example of the adaptive filter: (a) an image with a high-contrast edge; (b) ∇L'; (c) general Gaussian filter; (d) the proposed adaptive Gaussian filter
3 Experiments
In our simulation, we first compute the HDR luminance image by Equation (1). Then the HDR luminance is mapped into the display luminance by the global compression function (2). Thirdly, each pixel is adjusted locally based on the adaptive Gaussian filtering. Finally, the LDR image is rendered. Because it is very difficult to know how light or dark the image should be displayed to be faithful to the original HDR image [3], there is so far no standard objective evaluation available for measuring the quality of displayed HDR images; evaluation mainly relies on human subjective judgment. However, a method which can increase contrast while preventing halo artifacts may be desirable for certain applications. In this paper, we use this criterion to evaluate
Fig. 4. The Memorial Church image treated with different methods: (a) the proposed method; (b) bilateral filtering; (c) Meylan's approach; (d) gradient compression
Fig. 5. The Atrium image treated with different methods: (a) the proposed method; (b) Ashikhmin's approach; (c) photographic mapping; (d) bilateral filtering
the performances of different algorithms. In Figs. 4 and 5, we compare our algorithm with other local mapping methods. Fig. 4(a) shows an image processed by our algorithm. Compared to the bilateral filtering method and Meylan's method shown in Figs. 4(b) and (c), our algorithm preserves more visual details in both the dim and bright areas while keeping a good overall impression, especially in the left and upper-right corners of the image. Fig. 4(d) shows the result obtained by gradient-domain contrast compression. This method is also good at increasing local contrast; however, the overall impression differs from the real-world scene. Figs. 5(a) and (d) (regions in blue circles) show the difference between using an adaptive filter and a non-adaptive filter (in which the variance of the Gaussian filter is fixed). The image treated with the adaptive filter clearly preserves more visual details in both the dim and the bright regions, because the adaptive processing preserves high-contrast edges.
4 Conclusions
A new adaptive tone mapping algorithm for rendering HDR images is proposed in this paper. The proposed method provides an adaptive rule for changing the Gaussian filter's shape in tone mapping; this idea is based on the way the human eye perceives real-world scenes. The shape of the surround changes according to the high-contrast edges, which allows the proposed method to avoid halo artifacts while increasing the visibility of local details in the LDR images. We tested our method on various HDR images and compared it with other approaches, and the results demonstrate the effectiveness of the new method.
References
1. Debevec, P.E., Malik, J.: Recovering high dynamic range radiance maps from photographs. In: SIGGRAPH 1997, pp. 369–378 (1997)
2. Seetzen, H., Whitehead, L.A., Ward, G.: A high dynamic range display using low and high resolution modulators. In: The Society for Information Display International Symposium (2003)
3. Reinhard, E., Stark, M., Shirley, P., Ferwerda, J.: Photographic tone reproduction for digital images. ACM Trans. on Graphics 21(3), 267–276 (2002)
4. Devlin, K.: A review of tone reproduction techniques. Computer Science, University of Bristol, Tech. Rep. CSTR-02-005 (2002)
5. Jobson, D.J., Rahman, Z., Woodell, G.A.: A multiscale retinex for bridging the gap between color images and the human observation of scenes. IEEE Transactions on Image Processing 6(7), 965–976 (1997)
6. Tumblin, J., Rushmeier, H.: Tone reproduction for realistic images. IEEE Computer Graphics & Applications 13(6), 42–48 (1993)
7. Qiu, G., Duan, J.: An optimal tone reproduction curve operator for the display of high dynamic range images. In: IEEE ISCAS 2005, No. 6, pp. 6276–6279 (2005)
8. Meylan, L., Susstrunk, S.: High dynamic range image rendering with a Retinex-based adaptive filter. IEEE Transactions on Image Processing 15(9), 1820–1830 (2006)
Material Classification for Printed Circuit Boards by Spectral Imaging System Abdelhameed Ibrahim, Shoji Tominaga, and Takahiko Horiuchi Department of Information Science, Graduate School of Advanced Integration Science, Chiba University, Japan
[email protected], {shoji,horiuchi}@faculty.chiba-u.jp
Abstract. This paper presents an approach to reliable material classification for printed circuit boards (PCBs) by constructing a spectral imaging system. The system works over the whole visible spectral range [400-700nm] with high spectral resolution. An algorithm is presented for effectively classifying the surface material at each pixel into several elements such as substrate, metal, resist, footprint, and paint, based on the surface-spectral reflectance estimated from the spectral imaging data. The proposed approach incorporates spectral reflectance estimation, spectral feature extraction, and image segmentation for material classification of raw PCBs. The performance of the proposed method is compared with other methods using an RGB-reflectance based algorithm, the k-means algorithm and the normalized cut algorithm. The experimental results show the superiority of our method in accuracy and computational cost. Keywords: Spectral imaging system, material classification, printed circuit board, spectral reflectance, region segmentation, k-means, normalized cut.
1 Introduction

Material classification is an important problem in computer vision and depends on the surface-spectral reflectance of the observed materials. The surface-spectral reflectance of an object is inherent to its material composition. Therefore, this physical property can help in recognizing objects and segmenting regions in an illumination-invariant way. With advances in computer hardware and cameras, new computer vision algorithms should be developed and applied in industry. A PCB, used in a variety of industries, is one of the most complicated objects to understand from an observed image. The surface layer of a raw PCB is composed of various elements, which are a mixture of different materials, and the area of each element is very small. These features make machine inspection difficult. There are numerous algorithms, approaches, and techniques in the area of PCB inspection [1-5]. Most of them are based on binary or gray-scale image subtraction to classify board defects. Chang et al. [1] developed a case-based reasoning evolutionary model to classify defects of PCB images based on binary image difference. An eigenvalue-based similarity measure between two gray-level images is
proposed in [2] with application to assembled PCB defect inspection. Ibrahim et al. [3] applied an image difference operation in the wavelet domain in order to minimize the computational time of PCB inspection. A contour-based window extraction approach for bare PCB inspection from gray-scale images is proposed in [4]. Leta et al. [5] present a new algorithm for the PCB inspection problem based on a gray-level image subtraction technique. Since image understanding is the first and foremost step in the inspection of PCBs, an improved image capturing system supports the detection of defects. In our previous works, a material classification algorithm was proposed based on surface-spectral reflectance [6], [7]. However, due to limitations of the spectral imaging, the method was too simple to obtain sufficient accuracy in the segmentation of PCB images. The present paper presents a non-contact measurement approach to reliable material classification for PCBs, usable in an inspection system, by constructing an improved spectral imaging system. The proposed approach incorporates spectral reflectance estimation, spectral feature extraction, and image segmentation for material classification of raw PCBs. The performance of our spectral image segmentation algorithm is compared with typical segmentation algorithms. First, RGB-based image segmentation and the previous method [6] are compared with our results to show the importance of the modified spectral imaging system. Then, we compare the segmentation results with the RGB-based k-means [8] and the RGB-based normalized cut algorithms [9]. Experimental results from a number of raw PCBs have shown the effectiveness of the developed method for the classification of complicated images.
2 Spectral Imaging System

Figure 1 shows the newly constructed spectral imaging system for raw PCBs. The camera system consists of a monochromatic CCD camera (Retiga 1300) with 12-bit dynamic range and Peltier cooling, a macro lens of C-mount connected directly to the camera, a VariSpec™ Liquid Crystal Tunable Filter (LCTF), and a personal computer.
Fig. 1. Imaging system
Fig. 2. Partial image of a raw circuit board (labeled regions: metal, substrate, resist, print, holes)
We used multiple incandescent lamps as light sources for effective surface illumination. The LCTF has a bandwidth of 10nm and a wavelength
range [400-720nm]. The image resolution is 1280x1024 pixels for an area of 35mm x 30mm. The previous system in [6] had a limited spectral resolution and range of 40nm and [450-650nm]; moreover, the image resolution and sensitivity are much improved. The viewing direction of the camera is always perpendicular to the board surface, as shown in Fig. 1. Figure 2 shows the observed image of a small part of a raw circuit board. The main elements on the circuit board surface are four materials (metal, resist-coated metal, silk-screen print, and substrate) and metal holes. Figure 3 shows the measuring geometry with multiple light sources. The observed surface reflectance depends not only on the material composition, but also on the surface geometry and roughness. In order to avoid large fluctuations of pixel values between highlight areas and matte areas, we control the illumination direction of the light sources. In our system, we use three incandescent light sources of 300W. Two light sources illuminate the same surface alternately from one of two directions (from left or right) that are mirrored about the viewing direction. The third light source works as back illumination for detecting holes. We investigated a proper illumination angle for observing PCB materials. We found that the minimum illumination angle is 20° because of the camera shadow on the board; an incidence angle of 25° was therefore chosen in our imaging system. Decreasing the incident angle below 25° creates strong specular highlights on the board, especially on metal parts, and increasing this angle above 25° makes metal parts noisier and more difficult to classify.
Fig. 3. Measuring geometry with multiple light sources
3 Reflectance Estimation Based on Material Features

3.1 Material Properties

The reflection properties of the PCB materials depend on the measuring geometry. On the basis of reflection, we can divide the PCB materials into two categories: metal parts and non-metal (dielectric) parts. In the case of metal, incident light is specularly reflected. Sharp edges of metal flakes and holes produce specular highlights and shadowing effects on the other side. Moreover, for metal surfaces and footprint material edges, strong specular highlights appear at some viewing and lighting angles. Thus, specular reflection and shadowing effects can be controlled by changing the direction of the light. For dielectric parts, the material surfaces are smooth and strong specular highlights do not occur for some illumination directions. According to the dichromatic
reflection model [10], the diffuse spectral reflectance of these materials is constant. Thus, changing the illumination angle does not have a great effect on the spectral reflectance estimation for such materials. The elements of substrate, print, footprint, and resist are classified into this type.

3.2 Spectral Reflectance Data

We use a straightforward way of obtaining a reliable estimate of the reflectance function from the camera outputs of the narrow-band filtration. Let the wavelength bands of the filter be the 31 bands λ1, λ2, ..., λ31 corresponding to 400, 410, ..., 700nm. Let
S(λk; x, y) be the surface-spectral reflectance at wavelength λk (k = 1, 2, ..., 31) at location (x, y), which can be recovered by eliminating the illumination effect from the sensor outputs as follows:

S(\lambda_k; x, y) = \frac{\rho_k(x, y)}{\int_{400}^{700} E(\lambda)\,R_k(\lambda)\,d\lambda},   (1)
where E(λ) is the spectral power distribution of the light source, ρ_k(x, y) is the k-th sensor output, and R_k(λ) is the k-th sensor spectral sensitivity function. This process is repeated for the spectral images from both lighting directions, left and right.

3.3 Unified Spectral Reflectance
The light sources illuminate the same surface alternately from one of two directions. We combine the spectral reflectance data to produce a single spectral reflectance image from the two captured images. Because the shape of the spectral reflectance characterizes the material properties at each pixel point on the PCB, some features of the spectral curves are used in the combination operation. The proposed combination process consists of the following steps (a code sketch follows the list):

1. Let S̄_k = S̄(λ_k; x, y) denote the average of the observed reflectance at wavelength λ_k over the entire image region, and calculate the average spectral reflectance (S̄_1, S̄_2, ..., S̄_31) from both images.
2. If both pixel values from the left and right images are high and both reflectances satisfy S(λ_k; x, y) > S̄_k (k = 1, 2, ..., 31), the pixel is classified into the silk-screen print area, and the higher reflectance is chosen.
3. If one pixel value is very high and the other is extremely low, the pixel includes a specular highlight of metal. The higher reflectance is chosen for a metal surface; in this way shadow areas on the board can be neglected.
4. If the two pixel values do not differ greatly in reflectance, the pixel is classified as dielectric and the average reflectance is calculated.
5. For the remaining pixels, not covered by the material areas extracted above, the higher reflectance of the two sides is chosen.
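To make these rules concrete, the following Python sketch applies them to two reflectance images estimated under the left and right lamps. The paper states the rules qualitatively, so the brightness and similarity thresholds (`high`, `low`, `tol`) and all function names are illustrative assumptions rather than values taken from the paper.

```python
import numpy as np

def combine_reflectances(S_left, S_right, high=0.6, low=0.1, tol=0.1):
    """Combination rules of Sec. 3.3 for two (H, W, 31) reflectance images
    captured under left and right illumination."""
    mean_l = S_left.mean(axis=-1)            # per-pixel brightness, left image
    mean_r = S_right.mean(axis=-1)           # per-pixel brightness, right image
    S_mean = 0.5 * (S_left + S_right)

    # Step 1: per-band average reflectance over the whole image (both captures).
    band_avg = S_mean.reshape(-1, S_mean.shape[-1]).mean(axis=0)

    # Default / step 5: keep the brighter of the two observations.
    higher = np.where((mean_l >= mean_r)[..., None], S_left, S_right)
    out = higher.copy()

    # Step 4: similar values on both sides -> dielectric, use the average.
    dielectric = np.abs(mean_l - mean_r) < tol
    out[dielectric] = S_mean[dielectric]

    # Step 2: bright on both sides and above the band averages -> silk-screen print.
    print_mask = (np.all(S_left > band_avg, axis=-1) &
                  np.all(S_right > band_avg, axis=-1) &
                  (mean_l > high) & (mean_r > high))
    # Step 3: very bright on one side, very dark on the other -> metal highlight.
    metal_mask = ((mean_l > high) & (mean_r < low)) | ((mean_r > high) & (mean_l < low))
    keep_higher = print_mask | metal_mask
    out[keep_higher] = higher[keep_higher]
    return out
```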
4 Material Classification and Image Segmentation

A material classification algorithm is proposed based on the spectral features of the spectral reflectances. The image segmentation process is divided into two sub-processes: pixel-based classification and region growing.

4.1 Pixel-Based Classification Algorithm
The following algorithm is applied to each pixel independently. Adjacent pixels with close reflectances are gathered into the same region and used as initial segments for the post-processing level. (A code sketch of these rules follows the list.)

1. The average spectral reflectance (S̄_1, S̄_2, ..., S̄_31) is calculated for the whole image.
2. Pixels with high reflectance values that satisfy S(λ_k; x, y) > S̄_k (k = 1, 2, ..., 31) over the whole visible range [400-700nm] are classified into silk-screen print.
3. The peak wavelength of each spectral curve is detected for the remaining pixels. If the spectral peak lies in the range [600–700nm] and the spectral reflectance satisfies S(λ_k; x, y) > S̄_k (k = 21, 22, ..., 31) and S(λ_k; x, y) < S̄_k (k = 1, 2, ..., 11), the pixel is classified into metal.
4. If the peak wavelength of a remaining pixel lies in the range [510–590nm] and the spectral reflectance satisfies S(λ_k; x, y) > S̄_k (k = 12, 13, ..., 20) and S(λ_k; x, y) < S̄_k (k = 21, 22, ..., 31), the pixel is classified into resist-coated metal.
5. The remaining pixels satisfying S(λ_k; x, y) < S̄_k (k = 1, 2, ..., 31) are classified into substrate.
6. Finally, the through-holes are determined independently from the observed image by using back illumination. The back-illuminated image is binarized using a threshold, and the brighter parts correspond to the holes.
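A compact reading of these per-pixel rules is sketched below for an (H, W, 31) reflectance image sampled at 400–700 nm in 10 nm steps. The label codes and helper names are illustrative, and hole detection from the back-illuminated image is not included.

```python
import numpy as np

def band(nm):
    """Index of the band sampled at `nm` (400, 410, ..., 700 nm -> 0..30)."""
    return (nm - 400) // 10

def classify_pixels(S):
    """Pixel-based rules of Sec. 4.1 for an (H, W, 31) reflectance image.
    Labels: 0 unlabeled, 1 print, 2 metal, 3 resist, 4 substrate."""
    S_avg = S.reshape(-1, 31).mean(axis=0)                 # step 1: average spectrum
    labels = np.zeros(S.shape[:2], dtype=np.uint8)
    peak = np.argmax(S, axis=-1)                           # step 3: peak wavelength index

    above = S > S_avg
    below = S < S_avg

    is_print = above.all(axis=-1)                                            # step 2
    is_metal = (~is_print & (peak >= band(600)) &
                above[..., band(600):].all(axis=-1) &
                below[..., :band(510)].all(axis=-1))                         # step 3
    is_resist = (~is_print & ~is_metal &
                 (peak >= band(510)) & (peak <= band(590)) &
                 above[..., band(510):band(600)].all(axis=-1) &
                 below[..., band(600):].all(axis=-1))                        # step 4
    is_substrate = ~is_print & ~is_metal & ~is_resist & below.all(axis=-1)   # step 5

    labels[is_print] = 1
    labels[is_metal] = 2
    labels[is_resist] = 3
    labels[is_substrate] = 4
    return labels
```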
4.2 Region Growing Algorithm
The above algorithm partitions the spectral image of a PCB into different material regions. However, some pixels remain without any label, and isolated regions with a small number of pixels can be considered noisy. Hence, an algorithm that merges these undetermined pixels into neighboring regions is needed. The initial segments are provided by the pixel-based algorithm above. Let us consider the following homogeneity condition for each region R_j:

H(R_j) = True,  j = 1, 2, ..., N,   (2)
where N is the number of initial segments. The merging process depends on calculating distances between segments S and S'. We define the distance as a spectral difference between the 31-dimensional vectors, calculated as the Euclidean distance

D = \sum_{i=1}^{K} (S_i - S_i')^2,   (3)
where K is the number of wavelengths. The region growing algorithm starts from 3x3-pixel regions with overlapping windows. The minimum distance between the current pixel and the surrounding pixels is checked to update the segments. This process continues until the merging of all adjacent regions stops. Finally, a smoothing operation is executed to obtain the final image segmentation result.
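The merging step can be sketched as follows, under the assumption that unlabeled pixels repeatedly adopt the label of their spectrally closest labeled 8-neighbour according to the distance of Eq. (3); the removal of small isolated segments and the final smoothing pass described in the paper are not reproduced here, and all names are illustrative.

```python
import numpy as np

def grow_regions(S, labels, n_iter=10):
    """Merge unlabeled pixels (label 0) of an (H, W, 31) reflectance image into
    neighboring labeled regions by minimum spectral distance (Eq. 3)."""
    H, W, _ = S.shape
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    for _ in range(n_iter):
        ys, xs = np.nonzero(labels == 0)
        if len(ys) == 0:
            break
        changed = False
        for y, x in zip(ys, xs):
            best_d, best_lab = np.inf, 0
            for dy, dx in offsets:
                ny, nx = y + dy, x + dx
                if 0 <= ny < H and 0 <= nx < W and labels[ny, nx] != 0:
                    d = np.sum((S[y, x] - S[ny, nx]) ** 2)   # Eq. (3)
                    if d < best_d:
                        best_d, best_lab = d, labels[ny, nx]
            if best_lab != 0:
                labels[y, x] = best_lab
                changed = True
        if not changed:
            break
    return labels
```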
5 Experiments

5.1 Performance of the Proposed Method
The scene of the raw circuit board shown in Fig. 2 was captured with the present spectral imaging system under incandescent lamps. The image size was 1280x1024 pixels. Two data sets of surface-spectral reflectances were estimated from the two spectral images under the two different light sources. We combined these reflectance images into one reflectance image by comparing the corresponding reflectances at the same pixel point and applying the above rules to all pixels. Then, the proposed classification algorithm was executed on the spectral reflectance image. The typical spectral reflectances obtained for the PCB in Fig. 2 are shown in Fig. 4(a). Figure 4(b) shows the classification results of the developed method.
Fig. 4. (a) Typical curves of surface-spectral reflectance for print, metal, resist, and substrate of the PCB shown in Fig. 2. (b) Material classification results for a part of the raw PCB.
In the figure, the classified regions are painted in different colors: white for silk-screen print, yellow for metal, green for resist-coated metal, black for substrate, and grey for holes. It should be noted that the observed PCB image is clearly classified into the four material regions and through-holes.

5.2 Comparison with RGB Reflectance-Based Method
In order to examine the effectiveness of surface reflectance in material classification, the spectral camera system was replaced with a digital still camera. We used a Canon EOS-1Ds Mark II camera to capture color images of the same PCB under the same illumination environment. A Kenko extension ring was inserted between the camera
body and the lens to obtain the required focus from a sufficient distance. RGB images with the same size (1280x1024) as the spectral images were obtained. The normalized color values were calculated as spectral reflectances from Eq. (1) for only the R, G, and B channels by eliminating the illumination effect. Figure 5 shows the captured RGB image, and Figure 6 shows the typical color reflectances obtained for the different PCB materials. The classification process based on the RGB reflectances was developed as follows (a code sketch follows the list):

1. The average color reflectances R̄, Ḡ, B̄ over the whole image are calculated from the red, green and blue values.
2. High-reflectance pixels satisfying the three conditions R(x, y) > R̄, G(x, y) > Ḡ, B(x, y) > B̄ are classified into silk-screen print.
3. If the remaining pixels satisfy the conditions R(x, y) > R̄ and R(x, y) > G(x, y) > B(x, y), then the pixels are classified into metal.
4. If the remaining pixels satisfy R(x, y) < B(x, y) and B(x, y) < G(x, y), then the pixels are classified into resist-coated metal.
5. The other pixels are classified into substrate.
6. Finally, the through-holes are determined by using back illumination.
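For reference, the corresponding RGB rules can be sketched in the same style; the label codes match the spectral sketch above, hole detection is again omitted, and all names are illustrative.

```python
import numpy as np

def classify_rgb(R, G, B):
    """RGB reflectance-based rules of Sec. 5.2.
    Labels: 1 print, 2 metal, 3 resist, 4 substrate."""
    Rm, Gm, Bm = R.mean(), G.mean(), B.mean()                 # step 1: channel averages
    labels = np.zeros(R.shape, dtype=np.uint8)
    is_print = (R > Rm) & (G > Gm) & (B > Bm)                 # step 2
    is_metal = ~is_print & (R > Rm) & (R > G) & (G > B)       # step 3
    is_resist = ~is_print & ~is_metal & (R < B) & (B < G)     # step 4
    labels[is_print] = 1
    labels[is_metal] = 2
    labels[is_resist] = 3
    labels[labels == 0] = 4                                   # step 5: substrate
    return labels
```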
Fig. 5. Captured color image
Fig. 6. Typical RGB reflectances of the materials
Figure 7(d) presents the RGB-based segmentation results using the above classification algorithm, shown without hole detection for easier comparison of the segmentation results. Comparing with Fig. 4(b), we can confirm the higher accuracy of the proposed reflectance-based classification algorithm. This is clear from the shapes of the materials, especially metal flakes and metal holes, where the RGB-based algorithm has many misclassified pixels and other wrongly classified pixels, especially in metal parts with specular highlight areas.

5.3 Segmentation Comparison with K-Means and Normalized Cut Algorithms
For comparison with a traditional clustering algorithm and a popular graph-theoretic algorithm, we choose the k-means [8] and normalized cut [9] algorithms. These algorithms have high computational cost and memory requirements for large images. Moreover, the high dimensionality of the spectral images makes it difficult
Fig. 7. Segmentation results by the different methods, compared with the ground truth: (a) ground truth; (b) proposed spectral-based method; (c) previous method [6]; (d) RGB reflectance-based; (e) RGB-based k-means; (f) RGB-based N-cut
to apply such algorithms to the present problem. Therefore, we apply the k-means algorithm to the RGB reflectance image and the normalized cut algorithm to the resized RGB reflectance image to check the performance of our classification method. The final segmentation results for all algorithms are summarized in Fig. 7, without hole detection, to present the performance of each algorithm for PCB segmentation. Fig. 7(a) shows the ground truth of the segmentation, which is manually generated as the desired segmentation. Fig. 7(b) shows the image segmentation result of the proposed method. Figs. 7(c)-(f) show the segmentation results of the previous method proposed in [6], the RGB reflectance-based method, k-means clustering, and the normalized cut algorithm, respectively. We changed the initial seed points for k-means many times but obtained nearly the same result. Table 1 lists the accuracy and CPU time of the compared algorithms. The methods were run on an Intel Xeon E5405 2GHz CPU with 3GB memory. The proposed, previous and RGB-based methods were implemented in C on FreeBSD; k-means and N-cut used Matlab on the same system.
Table 1. Comparison of the accuracy and CPU time for the compared methods

Method                 Quality rate   CPU time (s)
Proposed method        98.72%         8.71
Previous method [6]    96.45%         8.22
RGB-based              94.01%         6.64
RGB-based K-means      77.56%         3.86
RGB-based N-cut        74.37%         1321.57
To demonstrate the accuracy of our method, we also applied the proposed algorithm to a more complicated four-material PCB. Figure 8 shows the segmentation result for this board.
Fig. 8. Segmentation results of a four-material PCB: (a) spectral image; (b) corresponding segmentation result
In the case of a five-material PCB with footprint elements, the proposed algorithm can easily be extended by calculating the average reflectance of the remaining pixels (excluding print, resist, and metal) after step 4 in Section 4.1, and then checking step 5 for substrate; the remaining pixels are footprint. The classification of a five-material PCB is presented in Fig. 9. We can easily note that the developed method can be used
Fig. 9. Segmentation results of a five-material PCB: (a) spectral image (footprint region labeled); (b) corresponding segmentation result
for different PCBs with different numbers of materials. The classification results show high accuracy, with a CPU time of less than 9s for a 1280x1024x31 spectral PCB image.
6 Conclusion

This paper has presented an approach to reliable material classification for PCBs by constructing a spectral imaging system. The system works over the whole visible spectral range [400-700nm] with the high spectral resolution of narrow-band filtration. An algorithm was presented for effectively classifying the surface material at each pixel into several elements, such as substrate, metal, resist, footprint, and paint, based on the surface-spectral reflectance information estimated from the imaging system. The proposed approach incorporates spectral reflectance estimation, spectral feature extraction, and image segmentation for material classification of raw PCBs. The performance of the proposed method was compared with the previous method, the RGB-reflectance based algorithm, the k-means algorithm and the normalized cut algorithm. The experimental results showed the advantages of the present method in classification accuracy and computational cost. The algorithm can be applied directly to the material classification problem in a variety of raw PCBs.
References
1. Chang, P.C., Chen, L.Y., Fan, C.Y.: A case-based evolutionary model for defect classification of printed circuit board images. J. Intell. Manuf. 19, 203–214 (2008)
2. Tsai, D.M., Yang, R.H.: An eigenvalue-based similarity measure and its application in defect detection. Image and Vision Computing 23(12), 1094–1101 (2005)
3. Ibrahim, Z., Al-Attas, S.A.R.: Wavelet-based printed circuit board inspection algorithm. Integrated Computer-Aided Engineering 12, 201–213 (2005)
4. Huang, S.Y., Mao, C.W., Cheng, K.S.: Contour-Based Window Extraction Algorithm for Bare Printed Circuit Board Inspection. IEICE Trans. 88-D, 2802–2810 (2005)
5. Leta, F.R., Feliciano, F.F., Martins, F.P.R.: Computer Vision System for Printed Circuit Board Inspection. In: ABCM Symp. Series in Mechatronics, vol. 3, pp. 623–632 (2008)
6. Tominaga, S.: Material Identification via Multi-Spectral Imaging and Its Application to Circuit Boards. In: 10th Color Imaging Conference, Color Science, Systems and Applications, Scottsdale, Arizona, pp. 217–222 (2002)
7. Tominaga, S., Okamoto, S.: Reflectance-Based Material Classification for Printed Circuit Boards. In: 12th Int. Conf. on Image Analysis and Processing, Italy, pp. 238–243 (2003)
8. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. John Wiley and Sons, New York (2001)
9. Shi, J., Malik, J.: Normalized Cuts and Image Segmentation. IEEE Trans. on Pattern Analysis and Machine Intelligence 22(8), 888–905 (2000)
10. Tominaga, S.: Surface Identification using the Dichromatic Reflection Model. IEEE Trans. PAMI 13, 658–670 (1991)
Supervised Local Subspace Learning for Region Segmentation and Categorization in High-Resolution Satellite Images
Yen-wei Chen (1,2) and Xian-hua Han (1,2)
1 Elect & Inf. Eng. School, Central South Univ. of Forest and Tech., Changsha, China
[email protected]
2 Graduate School of Science and Engineering, Ritsumeikan University, Japan
Abstract. We propose a new feature extraction method based on supervised locality preserving projections (SLPP) for region segmentation and categorization in high-resolution satellite images. Compared with other subspace methods such as PCA and ICA, SLPP can preserve the local geometric structure of the data and enhance within-class local information. The generalization of the proposed SLPP-based method is discussed in this paper. Keywords: supervised locality preserving projections, region segmentation, categorization, high-resolution satellite images, subspace learning, independent component analysis, generalization.
1 Introduction

Recently, several high-resolution satellites such as IKONOS and QuickBird have been launched, and high-resolution (1m) images have become available. Region segmentation and categorization in high-resolution satellite images are important for many applications, such as remote sensing (RS) and geographic information system (GIS) updating. A satellite image is a record of the relative reflectance of particular wavelengths of electromagnetic radiation. The reflection of a particular target depends on the surface features of the target and the wavelength of the incoming radiation. Multi-spectral information has been widely used for the classification of remotely sensed images [1]. Since the spectra are determined by many factors, such as object reflectance and instrumentation response, there are strong correlations among the spectra. Principal component analysis (PCA) has been proposed to reduce the redundancy among the spectra and find an efficient representation for classification or segmentation [2]. In our previous works, we proposed applying independent component analysis (ICA) to learn an efficient spectral representation [3]. Since ICA features are higher-order uncorrelated while PCA features are only second-order uncorrelated, higher classification performance has been achieved by ICA. Though ICA is a powerful method for finding an efficient spectral
representation, it is an unsupervised approach and does not capture the local geometric structure of the data. Locality preserving projections (LPP) was proposed to approximate the eigenfunctions of the Laplace-Beltrami operator on the image manifold, and has been applied to face recognition and image indexing [4]. In this paper, we propose a new approach based on supervised locality preserving projections (SLPP) for the classification of high-resolution satellite images. The scheme of the proposed method is shown in Fig. 1. The observed multi-spectral images are first transformed by SLPP, and the transformed spectral components are then used as features for classification. A probabilistic neural network (PNN) [5] is used as the classifier. Compared with other subspace methods such as PCA and ICA, SLPP can not only find the manifold of the images but also enhance the within-class local information. In our previous work, the proposed method was successfully applied to IKONOS images, and experimental results showed that the proposed SLPP-based method outperforms the ICA-based method [6]. In this paper, we discuss the generalization of the proposed SLPP-based method. We use only one image as the training sample for SLPP subspace learning and classifier (PNN) training. The trained SLPP subspace and PNN are then used for the segmentation and categorization of other test images.
Fig. 1. The proposed method based on SLPP
The paper is organized as follows: supervised LPP for feature extraction is presented in Sec. 2, the probabilistic neural network for classification is presented in Sec. 3, and the experimental results are shown in Sec. 4. Finally, conclusions are given in Sec. 5.
2 Supervised Locality Preserving Projections (SLPP)

The problem of subspace learning for feature extraction is the following. Given a set of spectral feature vectors x_1, x_2, ..., x_m in R^n, the goal is to find an efficient representation f_i of x_i such that ||f_i − f_j|| reflects the neighborhood relationship between f_i and f_j. In other words, if ||f_i − f_j|| is small, then x_i and x_j belong to the same class. Here, we assume that the images reside on a sub-manifold embedded in the ambient space R^n.
LPP seeks a linear transformation P that projects high-dimensional data onto a low-dimensional sub-manifold preserving the local structure of the data. Let X = [x_1, x_2, ..., x_m] denote the feature matrix whose columns are the sample feature vectors in R^n. The linear transformation P can be obtained by solving the following minimization problem:

\min_{P} \sum_{ij} (P^T x_i - P^T x_j)^2 B_{ij}   (1)
where B_ij evaluates the local structure of the image space. In this paper, we use the normalized correlation coefficient of two samples as the penalty weight if the two samples belong to the same class:

B_{ij} = \begin{cases} \dfrac{x_i^T x_j}{\sqrt{\sum_{l=1}^{n} x_{il}^2}\,\sqrt{\sum_{l=1}^{n} x_{jl}^2}} & \text{if samples } i \text{ and } j \text{ are in the same class} \\ 0 & \text{otherwise} \end{cases}   (2)
By a simple algebraic manipulation, the objective function can be reduced to

\frac{1}{2}\sum_{ij}(P^T x_i - P^T x_j)^2 B_{ij} = \sum_i P^T x_i D_{ii} x_i^T P - \sum_{ij} P^T x_i B_{ij} x_j^T P = P^T X(D - B)X^T P = P^T X L X^T P   (3)

where D is a diagonal matrix whose entries are the column (or row, since B is symmetric) sums of B, D_{ii} = \sum_j B_{ij}, and L = D − B is the Laplacian matrix. The linear transformation P can then be obtained by minimizing the objective function under a constraint:
P = \arg\min_{P^T X D X^T P = 1} P^T X(D - B)X^T P   (4)
Finally, the minimization problem can be converted to solving a generalized eigenvalue problem as follows:

X L X^T P = \lambda X D X^T P   (5)
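A minimal sketch of this learning step, assuming NumPy/SciPy and column-wise sample storage, is given below. The within-class weight of Eq. (2) is implemented as the normalized correlation coefficient, and a small ridge term is added to keep the generalized eigenproblem well conditioned when the number of training samples is small; both choices, like the function names, are assumptions of this sketch rather than details given in the paper.

```python
import numpy as np
from scipy.linalg import eigh

def slpp_fit(X, y, n_components=3):
    """Learn the SLPP projection (Eqs. 1-5).
    X: (n_features, m) matrix with samples as columns; y: (m,) class labels.
    Returns P of shape (n_features, n_components)."""
    n, m = X.shape
    # Supervised weights (Eq. 2): normalized correlation within a class, 0 otherwise.
    norms = np.linalg.norm(X, axis=0)
    B = (X.T @ X) / np.outer(norms, norms)
    B[y[:, None] != y[None, :]] = 0.0

    D = np.diag(B.sum(axis=1))
    L = D - B                        # graph Laplacian

    # Generalized eigenvalue problem (Eq. 5): X L X^T p = lambda X D X^T p.
    A = X @ L @ X.T
    M = X @ D @ X.T
    w, V = eigh(A, M + 1e-8 * np.eye(n))   # ridge keeps M positive definite
    return V[:, :n_components]              # eigenvectors with smallest eigenvalues

def slpp_transform(P, X):
    """Project samples: features f = P^T x."""
    return P.T @ X
```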
3 Probabilistic Neural Network (PNN)

The PNN model is based on Parzen's results on probability density function (PDF) estimators [5]. A PNN is a three-layer feedforward network consisting of an input layer, a pattern layer, and a summation (output) layer, as shown in Fig. 2. We wish to form a Parzen estimate based on K patterns, each of which is n-dimensional and randomly sampled from c classes. The PNN for this case consists of n input units comprising the input layer, each connected to every pattern unit, while each pattern unit is connected to one and only one of the c category units. The connections from the input units to the pattern units carry modifiable weights, which
will be trained. Each category unit sums the outputs of the pattern units connected to it. A radial basis function with a Gaussian activation is used for the pattern nodes.
Fig. 2. PNN architecture
The PNN is trained in the following way. First, each pattern (sample feature) f of the training set is normalized to unit length. The first normalized training pattern is placed on the input units, and the modifiable weights linking the input units and the first pattern unit are set such that w_1 = f_1. Then, a single connection from the first pattern unit is made to the category unit corresponding to the known class of that pattern. The process is repeated with each of the remaining training patterns, setting the weights of the successive pattern units such that w_k = f_k for k = 1, 2, ..., K. After such training we have a network which is fully connected between input and pattern units, and sparsely connected from pattern to category units. The trained network is then used for segmentation and categorization in the following way. A normalized test pattern f is placed at the input units. Each pattern unit computes the inner product to yield the net activation

y_k = w_k^T \cdot f   (6)
and emits a nonlinear function of y_k; each output unit sums the contributions from all pattern units connected to it. The activation function used is \exp\!\big(-(f - w_k)^T(f - w_k)/(2\delta^2)\big); assuming that both f and w_k are normalized to unit length, this is equivalent to using \exp\!\big((y_k - 1)/\delta^2\big).
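The following sketch shows PNN training and classification as described above; the smoothing parameter `delta`, the number of classes, and the function names are illustrative choices, not values from the paper.

```python
import numpy as np

def pnn_train(F, y):
    """PNN 'training' simply stores the unit-length training features as pattern weights.
    F: (K, d) training features, y: (K,) integer class labels."""
    W = F / np.linalg.norm(F, axis=1, keepdims=True)   # one pattern unit per sample
    return W, y

def pnn_classify(W, y, f, delta=0.1, n_classes=5):
    """Classify one feature vector f: inner products with all pattern units (Eq. 6),
    Gaussian activation exp((y_k - 1)/delta^2), summed per category."""
    f = f / np.linalg.norm(f)
    acts = np.exp((W @ f - 1.0) / delta ** 2)
    scores = np.bincount(y, weights=acts, minlength=n_classes)
    return int(np.argmax(scores))
```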
4 Experimental Results

The proposed method has been applied to the classification of IKONOS images. IKONOS simultaneously collects one-meter resolution black-and-white (panchromatic) images and four-meter resolution color (multi-spectral) images. The multi-spectral images
consist of four bands in the blue (B), green (G), red (R) and near-infrared wavelength regions. The multi-spectral images can be merged with panchromatic images of the same locations to produce "pan-sharpened color" images of 1-m resolution. In our experiments, we use only the RGB spectral images for region segmentation and categorization. Two typical IKONOS color images, shown in Fig. 4(a) and Fig. 5(a), are used in our experiments. The one shown in Fig. 4(a) is used as the sample image for learning and the one shown in Fig. 5(a) is used as the test image. In our experiments, we define 5 categories: sea, forest, ground, road and others. We first randomly selected 100 points from each category. In order to keep some texture information, we use a 3 × 3 sub-block around each selected point, and the RGB values of the sub-block are used as a spectral feature vector x with a dimension of 27. The vectors x_i (i = 1, 2, ..., 5 × 100) are used to learn the SLPP subspace for feature extraction and to train the probabilistic neural network for region segmentation and categorization. The learning and training process is shown in Fig. 3. It is a two-step learning process: we first use x to learn the SLPP subspace P, and then the projection f = P^T x is used as the input for training the PNN; the same projections are also used as features for region segmentation and categorization.
Fig. 3. Two-step learning process
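Putting the two trained components together, a sketch of the per-pixel segmentation loop might look as follows; here `P` is the learned SLPP projection, `weights` and `y_train` are the stored, unit-length PNN pattern weights (in the projected space) and their labels, and `delta` and all names are illustrative.

```python
import numpy as np

def segment_image(img, P, weights, y_train, delta=0.1, n_classes=5):
    """Two-step pipeline of Sec. 4: for each pixel, build the 27-dim feature from
    its 3x3 RGB block, project it with the SLPP matrix P, and classify it with a PNN."""
    H, Wd, _ = img.shape
    out = np.zeros((H, Wd), dtype=np.uint8)
    for yy in range(1, H - 1):
        for xx in range(1, Wd - 1):
            f = img[yy - 1:yy + 2, xx - 1:xx + 2, :].reshape(-1).astype(float)
            f = P.T @ f                                        # SLPP projection
            f = f / (np.linalg.norm(f) + 1e-12)                # unit length for the PNN
            acts = np.exp((weights @ f - 1.0) / delta ** 2)    # pattern-unit activations
            out[yy, xx] = np.argmax(np.bincount(y_train, weights=acts, minlength=n_classes))
    return out
```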
Once the SLPP subspace and PNN are trained, they are used for feature extraction and region segmentation, respectively. The segmentation and categorization process is as shown in Fig. 1. The feature vector x (27 × 1) of each pixel is first projected onto the SLPP subspace, and the projection is input to the trained PNN. The output of the PNN is the index of the category. Thus the satellite image is segmented into 5 regions and each region is categorized. The region segmentation and categorization results for the sample image (Fig. 4(a)) are shown in Figs. 4(b)-4(f), and the results for the test image (Fig. 5(a)) are shown in Figs. 5(b)-5(f). It can be seen that we obtain a satisfactory segmentation result for the sample image (Fig. 4), while for the test image (Fig. 5) the result is less satisfactory; for example, a part of the sea is categorized into the forest region, as shown in Fig. 5(c). Since only one image is used as the training sample in our experiments, the generalization of the PNN is limited. The segmentation and categorization accuracy would be improved by increasing the number of sample images.
Fig. 4. Region segmentation and categorization results (sample image) (IKONOS image: Copyright (C) 2003 Japan Space Imaging Corporation)
Fig. 5. Region segmentation and categorization results (test image) (IKONOS image: Copyright (C) 2003 Japan Space Imaging Corporation)
5 Conclusions

We proposed a new approach based on supervised locality preserving projections (SLPP) for region segmentation and categorization in high-resolution satellite images. The observed multi-spectral images are first transformed by SLPP, and the transformed spectral components are then used as features for classification. A probabilistic neural network (PNN) is used as the classifier. In this paper, we used only one image as the training sample for SLPP subspace learning and classifier (PNN) training. We have shown that it is possible to segment and categorize other satellite images by using the trained SLPP subspace and PNN.
Acknowledgments

This work was supported in part by the Strategic Information and Communications R&D Promotion Program (SCOPE) under Grant No. 072311002.
References
1. Avery, T.E., Berlin, G.L.: Fundamentals of Remote Sensing and Airphoto Interpretation. Macmillan Publishing Co., New York (1992)
2. Murai, H., Omatsu, S., Oe, S.: Principal Component Analysis for Remotely Sensed Data Classified by Kohonen's Feature Mapping Preprocessor and Multi-Layered Neural Network Classifier. IEICE Trans. Commun. E78-B(12), 1604–1610 (1995)
3. Zeng, X.-Y., Chen, Y.-W., Nakao, Z.: Classification of remotely sensed images using independent component analysis and spatial consistency. Journal of Advanced Computational Intelligence and Intelligent Informatics 8, 216–222 (2004)
4. He, X., Niyogi, P.: Locality Preserving Projections. In: Advances in Neural Information Processing Systems, Vancouver, Canada, vol. 16 (2003)
5. Specht, D.F.: Enhancements to Probabilistic Neural Networks. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN 1992), vol. 1, pp. 761–768 (1992)
6. Chen, Y.-W., Han, X.-H.: Classification of High-Resolution Satellite Images Using Supervised Locality Preserving Projections. In: Lovrek, I., Howlett, R.J., Jain, L.C. (eds.) KES 2008, Part II. LNCS (LNAI), vol. 5178, pp. 149–156. Springer, Heidelberg (2008)
Author Index
Alleysson, David 12
Angulo, Jesús 91
Bakke, Arne Magnus 160
Balinsky, Alexander 101
Battiato, Sebastiano 62, 130
Benoit, Alexandre 12
Bianco, Simone 31
Bochko, Vladimir 120
Bosco, Angelo 130
Bruna, Arcangelo 130
Cabestaing, François 170
Chao, Jinhui 140
Chen, Yen-Wei 71, 226
Ciocca, Gianluigi 31
Cusano, Claudio 31
Doi, Motonori 188
Goda, Naokazu 1, 23
Guarnera, Mirko 180
Halawana, Hachem 170
Han, Xian-hua 226
Hardeberg, Jon Yngve 81
Herault, Jeanny 12
Hikima, Rie 188
Horiuchi, Takahiko 216
Huang, Xinyin 71
Ibrahim, Abdelhameed 216
Kamata, Sei-ichiro 207
Kekäläinen, Jukka 198
Koida, Kowa 23
Komatsu, Hidehiko 1, 23
Le Callet, Patrick 12
Lecca, Michela 41
Lenz, Reiner 140
Macaire, Ludovic 170
Martinkauppi, J. Birgitta 198
Meccio, Tony 180
Messelodi, Stefano 41
Messina, Giuseppe 180
Mochizuki, Rika 140
Mohammad, Nassir 101
Mourad, Safer 150
Ohshima, Satoshi 140
Ohtsuki, Rie 188
Parkkinen, Jussi 120, 198
Pedersen, Marius 81
Pipirigeanu, Adrian 120
Provenzi, Edoardo 109
Ravì, Daniele 180
Rizzo, Rosetta 130
Rundo, Francesco 62
Scheller Lichtenauer, Matthias 150
Shatilova, Yevgeniya 198
Simon, Klaus 150
Sobue, Shota 71
Stanco, Filippo 62
Tajima, Johji 51
Tanno, Osamu 188
Thomas, Jean-Baptiste 160
Tominaga, Shoji 188, 216
Zhang, Jian 207
Zolliker, Peter 150