Lecture Notes in Computer Science 2732
Edited by G. Goos, J. Hartmanis, and J. van Leeuwen
Berlin Heidelberg New York Hong Kong London Milan Paris Tokyo
Chris Taylor
J. Alison Noble (Eds.)
Information Processing in Medical Imaging
18th International Conference, IPMI 2003
Ambleside, UK, July 20–25, 2003
Proceedings
Series Editors

Gerhard Goos, Karlsruhe University, Germany
Juris Hartmanis, Cornell University, NY, USA
Jan van Leeuwen, Utrecht University, The Netherlands

Volume Editors

Chris J. Taylor
University of Manchester, Imaging Science and Biomedical Engineering
Stopford Building, Oxford Road, Manchester, UK, M13 9PT
E-mail: [email protected]

J. Alison Noble
University of Oxford, Department of Engineering Science
Parks Rd, Oxford, OX1 3PJ, UK
E-mail: [email protected]

Cataloging-in-Publication Data applied for. A catalog record for this book is available from the Library of Congress. Bibliographic information published by Die Deutsche Bibliothek: Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available on the Internet.
CR Subject Classification (1998): I.4, I.5, I.2.5-6, J.1, I.3

ISSN 0302-9743
ISBN 3-540-40560-7 Springer-Verlag Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.

Springer-Verlag Berlin Heidelberg New York, a member of BertelsmannSpringer Science+Business Media GmbH
http://www.springer.de

© Springer-Verlag Berlin Heidelberg 2003
Printed in Germany

Typesetting: Camera-ready by author, data conversion by PTP Berlin GmbH
Printed on acid-free paper
SPIN: 10929094 06/3142 543210
Preface
IPMI occupies an important position in the scientific calendar. Every two years, it brings together leading researchers in medical image formation, analysis and interpretation, for an international workshop that allows extensive, in-depth discussion of new ideas. Many of the most influential developments in the field were first presented at IPMI, and the series has done much to foster a rigorous scientific approach to information processing in medical imaging.

IPMI 2003 was held over 5 days in July 2003 at St. Martin's College, Ambleside, in the heart of the English Lake District. Full papers were invited on any aspect of information processing in medical imaging, with particular encouragement for submissions exploring generic mathematical or computational principles. Recognizing the rapidly evolving nature of the field, we encouraged a broad interpretation of medical imaging: from macroscopic to molecular imaging; from applications in patient care to those in biomedical research.

We received 123 submissions by the deadline in February 2003. Each paper was reviewed by four members of the Scientific Committee, placing particular emphasis on originality, scientific rigor, and biomedical relevance. Papers were selected for the meeting by a Paper Selection Committee, based on reviewers' rankings and their detailed comments. A total of 28 papers were accepted as oral presentations and 29 as posters. Unfortunately, the standard was so high that we had to turn down many excellent papers.

The programme that emerged continues themes that have dominated recent IPMIs – image registration, model-based segmentation, and shape analysis – but also displays encouraging diversity, with important papers on performance assessment, fMRI and MEG analysis, cardiac motion analysis, and diffusion tensor imaging.

The meeting was attended by 115 active researchers, with numbers strictly limited so as to promote the in-depth discussion that is a hallmark of IPMI. Oral presentations were allocated sufficient time for detailed exposition and each paper was followed by lengthy discussion. It is a tradition of IPMI that no time limit is placed on discussion; this presents some 'interesting' challenges for the organizers, but makes for a truly stimulating and rewarding meeting.

Another IPMI tradition is to encourage the participation of the best young researchers, allowing them to explore new ideas with some of the leading researchers in the field. IPMI 2003 was no exception, with just over half the participants attending their first IPMI. Of these, 18 were eligible for the prestigious Erbsmann prize, awarded to a young researcher making their first IPMI presentation. At the time of writing the IPMI 2003 Erbsmann prizewinner has not been decided but, whatever the outcome, it is clear from the field of candidates that the high standards of previous recipients will be maintained.

IPMI is hard work for the participants, but there is also a tradition of encouraging informal interaction – an important factor in developing the 'IPMI community.'
This year an afternoon was spent walking the lakeland fells with their breathtaking views, and cruising at sunset on Lake Windermere. The IPMI choir gave its usual performance, and the traditional US vs Rest of the World soccer match took place – as usual, the result was known to the referee in advance, but not announced until after the match!

To those who participated in the meeting we hope that these proceedings will form a useful reminder of an enjoyable and stimulating event. To those who were not able to attend, we hope that you will find this snapshot of some of the best research in information processing in medical imaging a useful reference, and an encouragement to participate in the next IPMI, which will be held in the US in 2005 (see www.ipmi-conference.com for information).

May 2003
Chris Taylor
Alison Noble
Acknowledgements

IPMI 2003 would not have been possible without the support of many dedicated individuals and generous organizations. First, the editors wish to thank all those who submitted papers to the conference – new ideas are the lifeblood of any scientific meeting and the large number of high-quality submissions meant that we had no problem in maintaining the traditionally high standards of IPMI. Our only regret is the number of excellent papers we had to reject.

Particular thanks go to the members of the Scientific Committee – despite a short timescale and a typical load of 16 full manuscripts each, they provided consistent, in-depth reviews, allowing us to identify the best papers, and they provided useful feedback to authors to help them improve their manuscripts. We are also grateful to the members of the Paper Selection Committee who shared with us the difficult task of assimilating the referees' comments and choosing the papers to include in the conference.

We gratefully acknowledge the support of our colleagues and institutions in making it possible for us to organize the meeting and prepare the proceedings. Particular thanks go to Mike Rogers for the Web-based conference administration system, Gareth Jones for considerable help in preparing the proceedings, and Angela Castledine, Pam Griffiths and Christine Cummings for general administrative and creative support.

Finally, we are grateful to the following organizations for their generous financial support, without which it would have been difficult to make the meeting accessible to the young researchers who are the future of IPMI:

Philips Medical Systems
Image Metrics
iMorphics
Mirada Solutions
Francois Erbsmann Prizewinners

1987 (Utrecht, The Netherlands): John M. Gauch, Dept. of Computer Science, University of North Carolina, Chapel Hill, USA. JM Gauch, WR Oliver, SM Pizer: Multiresolution shape descriptions and their applications in medical imaging.

1989 (Berkeley, CA, USA): Arthur F. Gmitro, Dept. of Radiology, University of Arizona, USA. AF Gmitro, V Tresp, V Chen, Y Snell, GR Gindi: Video-rate reconstruction of CT and MR images.

1991 (Wye, Kent, UK): H. Isil Bozma, Dept. of Electrical Engineering, Yale University, USA. HI Bozma, JS Duncan: Model-based recognition of multiple deformable objects using a game-theoretic framework.

1993 (Flagstaff, AZ, USA): Jeffrey A. Fessler, Division of Nuclear Medicine, University of Michigan, USA. JA Fessler: Tomographic reconstruction using information-weighted spline smoothing.

1995 (Brest, France): Maurits K. Konings, Dept. of Radiology and Nuclear Medicine, University Hospital, Utrecht, The Netherlands. MK Konings, WPTM Mali, MA Viergever: Design of a robust strategy to measure intravascular electrical impedance.

1997 (Poultney, VT, USA): David Atkinson, Radiological Sciences, Guy's Hospital, London, UK. D Atkinson, DLG Hill, PNR Stoyle, PE Summers, SF Keevil: An autofocus algorithm for the automatic correction of motion artifacts in MR images.

1999 (Visegrad, Hungary): Liana M. Lorigo, Massachusetts Institute of Technology, Cambridge, MA, USA. LM Lorigo, O Faugeras, WEL Grimson, R Keriven, R Kikinis, C-F Westin: Co-dimension 2 geodesic active contours for MRA segmentation.

2001 (Davis, CA, USA): Viktor K. Jirsa, Florida Atlantic University, FL, USA. VK Jirsa, KJ Jantzen, A Fuchs, JA Scott Kelso: Neural field dynamics on the folded three-dimensional cortical sheet and its forward EEG and MEG.
Conference Committee

Chairs

J. Alison Noble, University of Oxford, UK
Chris Taylor, University of Manchester, UK
Paper Selection Committee

Alan Colchester, University of Kent, UK
David Hawkes, Guy's Hospital, London, UK
Andrew Todd-Pokropek, University College London, UK
Scientific Committee

Stephen Aylward, University of North Carolina, USA
Christian Barillot, IRISA/INRIA, France
Harrison Barrett, University of Arizona, USA
Yves Bizais, Université de Bretagne Occidentale, France
Djamal Boukerroui, Université de Technologie de Compiègne, France
Aaron Brill, Vanderbilt University, USA
Elizabeth Bullitt, University of North Carolina, USA
Gary Christensen, University of Iowa, USA
Ela Claridge, University of Birmingham, UK
Timothy Cootes, University of Manchester, UK
Christos Davatzikos, University of Pennsylvania, USA
James Duncan, Yale University, USA
Jeffrey Fessler, University of Michigan, USA
James Gee, University of Pennsylvania, USA
Guido Gerig, University of North Carolina, USA
Polina Golland, Massachusetts Institute of Technology, USA
Michael Goris, Stanford University, USA
Derek Hill, Guy's Hospital, London, UK
Michael Insana, University of California, Davis, USA
Nico Karssemeijer, University Medical Center, Nijmegen, The Netherlands
Frithjof Kruggel, Max-Planck-Institute of Cognitive Neuroscience, Germany
Attila Kuba, University of Szeged, Hungary
Richard Leahy, University of Southern California, USA
Gabriele Lohmann, Max-Planck-Institute of Cognitive Neuroscience, Germany
Grégoire Malandain, INRIA Sophia-Antipolis, France
Wiro Niessen, University Medical Center, Utrecht, The Netherlands
Stephen Pizer, University of North Carolina, USA
Jerry Prince, Johns Hopkins University, USA
Daniel Rueckert, Imperial College London, UK
Martin Samal, Charles University, Prague, Czech Republic
Albert Sinusas, Yale University School of Medicine, USA
Milan Sonka, University of Iowa, USA
Gabor Szekely, Swiss Federal Institute of Technology, Switzerland
Baba Vemuri, University of Florida, USA
IPMI Board

Yves Bizais
Harrison Barrett
Randy Brill
Alan Colchester
Stephen Bacharach
Frank Deconinck
Robert DiPaola
James Duncan
Michael Goris
Michael Insana
Attila Kuba
Doug Ortendahl
Stephen Pizer
Andrew Todd-Pokropek
Max Viergever
Table of Contents

Shape Modelling

Shape Modelling Using Markov Random Field Restoration of Point Correspondences (Rasmus R. Paulsen, Klaus B. Hilger), p. 1
Optimal Deformable Surface Models for 3D Medical Image Analysis (P. Horkaew, G.Z. Yang), p. 13
Learning Object Correspondences with the Observed Transport Shape Measure (Alain Pitiot, Hervé Delingette, Arthur W. Toga, Paul M. Thompson), p. 25
Shape Discrimination in the Hippocampus Using an MDL Model (Rhodri H. Davies, Carole J. Twining, P. Daniel Allen, Tim F. Cootes, Christopher J. Taylor), p. 38

Posters I: Shape Modelling and Analysis

Minimum Description Length Shape and Appearance Models (Hans Henrik Thodberg), p. 51
Evaluation of 3D Correspondence Methods for Model Building (Martin A. Styner, Kumar T. Rajamani, Lutz-Peter Nolte, Gabriel Zsemlye, Gábor Székely, Christopher J. Taylor, Rhodri H. Davies), p. 63
Localization of Anatomical Point Landmarks in 3D Medical Images by Fitting 3D Parametric Intensity Models (Stefan Wörz, Karl Rohr), p. 76
Morphology-Based Cortical Thickness Estimation (Gabriele Lohmann, Christoph Preul, Margret Hund-Georgiadis), p. 89
The Shape Operator for Differential Analysis of Images (Brian Avants, James Gee), p. 101
Feature Selection for Shape-Based Classification of Biological Objects (Paul Yushkevich, Sarang Joshi, Stephen M. Pizer, John G. Csernansky, Lei E. Wang), p. 114
Corresponding Articular Cartilage Thickness Measurements in the Knee Joint by Modelling the Underlying Bone (Commercial in Confidence) (Tomos G. Williams, Christopher J. Taylor, ZaiXiang Gao, John C. Waterton), p. 126
Adapting Active Shape Models for 3D Segmentation of Tubular Structures in Medical Images (Marleen de Bruijne, Bram van Ginneken, Max A. Viergever, Wiro J. Niessen), p. 136
A Unified Variational Approach to Denoising and Bias Correction in MR (Ayres Fan, William M. Wells, John W. Fisher, Müjdat Çetin, Steven Haker, Robert Mulkern, Clare Tempany, Alan S. Willsky), p. 148

Shape Analysis

Object-Based Strategy for Morphometry of the Cerebral Cortex (J.-F. Mangin, D. Rivière, A. Cachia, D. Papadopoulos-Orfanos, D.L. Collins, A.C. Evans, J. Régis), p. 160
Genus Zero Surface Conformal Mapping and Its Application to Brain Surface Mapping (Xianfeng Gu, Yalin Wang, Tony F. Chan, Paul M. Thompson, Shing-Tung Yau), p. 172

Segmentation

Coupled Multi-shape Model and Mutual Information for Medical Image Segmentation (A. Tsai, William M. Wells, Clare Tempany, E. Grimson, Alan S. Willsky), p. 185
Neighbor-Constrained Segmentation with 3D Deformable Models (Jing Yang, Lawrence H. Staib, James S. Duncan), p. 198
Expectation Maximization Strategies for Multi-atlas Multi-label Segmentation (Torsten Rohlfing, Daniel B. Russakoff, Calvin R. Maurer), p. 210
Quantitative Analysis of Intrathoracic Airway Trees: Methods and Validation (Kálmán Palágyi, Juerg Tschirren, Milan Sonka), p. 222

Posters II: Segmentation, Colour, and Performance

Multi-view Active Appearance Models: Application to X-Ray LV Angiography and Cardiac MRI (C.R. Oost, B.P.F. Lelieveldt, M. Üzümcü, H. Lamb, J.H.C. Reiber, Milan Sonka), p. 234
Tunnelling Descent: A New Algorithm for Active Contour Segmentation of Ultrasound Images (Zhong Tao, C. Carl Jaffe, Hemant D. Tagare), p. 246
Improving Appearance Model Matching Using Local Image Structure (I.M. Scott, Tim F. Cootes, Christopher J. Taylor), p. 258
Knowledge-Driven Automated Extraction of the Human Cerebral Ventricular System from MR Images (Yan Xia, QingMao Hu, Aamer Aziz, Wieslaw L. Nowinski), p. 270
Volumetric Texture Description and Discriminant Feature Selection for MRI (Constantino Carlos Reyes-Aldasoro, Abhir Bhalerao), p. 282
CAD Tool for Burn Diagnosis (Begoña Acha, Carmen Serrano, José I. Acha, Laura M. Roa), p. 294
An Inverse Method for the Recovery of Tissue Parameters from Colour Images (Ela Claridge, Steve J. Preece), p. 306
Ideal Observer Model for Detection of Blood Perfusion and Flow Using Ultrasound (Roger J. Zemp, Craig K. Abbey, Michael F. Insana), p. 318
Permutation Tests for Classification: Towards Statistical Significance in Image-Based Studies (Polina Golland, Bruce Fischl), p. 330

Performance Characterisation

Ideal-Observer Performance under Signal and Background Uncertainty (S. Park, M.A. Kupinski, E. Clarkson, H.H. Barrett), p. 342
Theoretical Evaluation of the Detectability of Random Lesions in Bayesian Emission Reconstruction (Jinyi Qi), p. 354

Registration – Modelling Similarity

A Unified Statistical and Information Theoretic Framework for Multi-modal Image Registration (Lilla Zöllei, John W. Fisher, William M. Wells), p. 366
Information Theoretic Similarity Measures in Non-rigid Registration (William R. Crum, Derek L.G. Hill, David J. Hawkes), p. 378
A New & Robust Information Theoretic Measure and Its Application to Image Alignment (F. Wang, B.C. Vemuri, M. Rao, Y. Chen), p. 388
Gray Scale Registration of Mammograms Using a Model of Image Acquisition (Peter R. Snoeren, Nico Karssemeijer), p. 401

Registration – Modelling Deformation

Constructing Diffeomorphic Representations of Non-rigid Registrations of Medical Images (Carole J. Twining, Stephen Marsland), p. 413
Topology Preservation and Regularity in Estimated Deformation Fields (Bilge Karaçalı, Christos Davatzikos), p. 426
Large Deformation Inverse Consistent Elastic Image Registration (Jianchun He, Gary E. Christensen), p. 438
Gaussian Distributions on Lie Groups and Their Application to Statistical Shape Analysis (P. Thomas Fletcher, Sarang Joshi, Conglin Lu, Stephen M. Pizer), p. 450

Posters III: Registration, Function, and Motion

Non-rigid Image Registration Using a Statistical Spline Deformation Model (Dirk Loeckx, Frederik Maes, Dirk Vandermeulen, Paul Suetens), p. 463
A View-Based Approach to Registration: Theory and Application to Vascular Image Registration (Charles V. Stewart, Chia-Ling Tsai, Amitha Perera), p. 475
Fusion of Autoradiographies with an MR Volume Using 2-D and 3-D Linear Transformations (Grégoire Malandain, Eric Bardinet), p. 487
Bayesian Multimodality Non-rigid Image Registration via Conditional Density Estimation (Jie Zhang, Anand Rangarajan), p. 499
Spatiotemporal Localization of Significant Activation in MEG Using Permutation Tests (Dimitrios Pantazis, Thomas E. Nichols, Sylvain Baillet, Richard M. Leahy), p. 512
Symmetric BEM Formulation for the M/EEG Forward Problem (Geoffray Adde, Maureen Clerc, Olivier Faugeras, Renaud Keriven, Jan Kybic, Théodore Papadopoulo), p. 524
Localization Estimation Algorithm (LEA): A Supervised Prior-Based Approach for Solving the EEG/MEG Inverse Problem (Jérémie Mattout, Mélanie Pélégrini-Issac, Anne Bellio, Jean Daunizeau, Habib Benali), p. 536
Multivariate Group Effect Analysis in Functional Magnetic Resonance Imaging (Habib Benali, Jérémie Mattout, Mélanie Pélégrini-Issac), p. 548
Meshfree Representation and Computation: Applications to Cardiac Motion Analysis (Huafeng Liu, Pengcheng Shi), p. 560
Visualization of Myocardial Motion Using MICSR Trinary Checkerboard Display (Moriel NessAiver, Jerry L. Prince), p. 573
Velocity Estimation in Ultrasound Images: A Block Matching Approach (Djamal Boukerroui, J. Alison Noble, Michael Brady), p. 586

Cardiac Motion

Construction of a Statistical Model for Cardiac Motion Analysis Using Nonrigid Image Registration (Raghavendra Chandrashekara, Anil Rao, Gerardo Ivar Sanchez-Ortiz, Raad H. Mohiaddin, Daniel Rueckert), p. 599
Fast Tracking of Cardiac Motion Using 3D-HARP (Li Pan, Joao A.C. Lima, Nael F. Osman), p. 611

fMRI Analysis

Analysis of Event-Related fMRI Data Using Best Clustering Bases (François G. Meyer, Jatuporn Chinrungrueng), p. 623
Estimation of the Hemodynamic Response Function in Event-Related Functional MRI: Directed Acyclic Graphs for a General Bayesian Inference Framework (Guillaume Marrelec, Philippe Ciuciu, Mélanie Pélégrini-Issac, Habib Benali), p. 635
Nonlinear Estimation and Modeling of fMRI Data Using Spatio-temporal Support Vector Regression (Yongmei Michelle Wang, Robert T. Schultz, R. Todd Constable, Lawrence H. Staib), p. 647

Diffusion Imaging and Tractography

A Constrained Variational Principle for Direct Estimation and Smoothing of the Diffusion Tensor Field from DWI (Z. Wang, B.C. Vemuri, Y. Chen, T. Mareci), p. 660
Persistent Angular Structure: New Insights from Diffusion MRI Data (Kalvis M. Jansons, Daniel C. Alexander), p. 672
Probabilistic Monte Carlo Based Mapping of Cerebral Connections Utilising Whole-Brain Crossing Fibre Information (Geoff J.M. Parker, Daniel C. Alexander), p. 684

Author Index, p. 697
Shape Modelling Using Markov Random Field Restoration of Point Correspondences

Rasmus R. Paulsen (1,2) and Klaus B. Hilger (2)

(1) Oticon Research Centre Eriksholm, Kongevejen 243, DK-3070 Snekkersten, Denmark, http://www.oticon.com/
(2) Informatics and Mathematical Modelling, Technical University of Denmark, IMM, DTU, Richard Petersens Plads, Building 321, DK-2800 Kgs. Lyngby, Denmark, {rrp,kbh}@imm.dtu.dk, http://www.imm.dtu.dk/
Abstract. A method for building statistical point distribution models is proposed. The novelty in this paper is the adaptation of Markov random field regularization of the correspondence field over the set of shapes. The new approach leads to a generative model that produces highly homogeneous polygonized shapes and improves the capability of reconstruction of the training data. Furthermore, the method leads to an overall reduction in the total variance of the point distribution model. Thus, it finds correspondence between semi-landmarks that are highly correlated in the shape tangent space. The method is demonstrated on a set of human ear canals extracted from 3D-laser scans.
1 Introduction
Point distribution models (PDMs) are widely used in modeling biological shape variability over a set of annotated training data [1,2]. The generative models are highly dependent on the initial labeling of corresponding point sets, which is typically a tedious task. Moreover, the labeling is often erroneous and sparse. A good representation of the training data is particularly hard to obtain in three dimensions. Finding a basis of homologous points is thus a fundamental issue that comes before generalized Procrustes alignment [3] and decomposition [4] in the shape tangent space.

A method for building a statistical shape model of the human ear canal is presented in [5]. An extension to this method is proposed in this paper using Markov Random Field (MRF) regularization for improving the initial set of point correspondences. The new approach leads to a more compact representation and improves the generative model by better reconstruction capabilities of the 3D training data. Related work includes the application of Geometry Constrained Diffusion (GCD) [6,7] and Brownian Warps [8] for non-rigid registration. A more compact model is obtained, since the shape tangent space residuals of the new representation have increased correlation. It thus indicates that a better correspondence field is obtained between the 3D semi-landmarks. Related work on obtaining a minimum description length of PDMs is proposed in [9,10] based on information theoretic criteria.
Fig. 1. Left: An example of a surface representation of an ear canal with the anatomical landmarks and the separating planes that define the region of interest. The thin tubular structure in the top is the actual canal. The larger lower section is the concha, of which only the upper part is of interest. A cutoff plane through the concha is therefore defined. Right: The model mesh, shown by a wireframe, fitted to a target shape using Thin Plate Spline warping.
The data consists of 29 3D ear canal surfaces extracted from laser scans of ear impressions. The local surface geometry of the ear canals varies much from one individual to another. Therefore, only very few ridges and extremal points are stable when comparing groups of ear canals. A set of 18 anatomical landmarks of varying confidence are placed on each ear canal, and constitute a sparse correspondence between the surfaces of the ear canals in the training set. The surfaces of the ear canals are not closed, due to the opening of the ear canal and because the ear impressions are terminated in front of the ear drum. It is therefore necessary to identify the region of interest of each ear canal. Hence, planes are defined which separate the valid parts of the surface from the invalid parts. In Fig. 1, left, an ear canal with the anatomical landmarks and separating planes is shown. The remainder of the paper is organized in three additional sections. Section 2 describes the proposed statistical method for improving the point correspondences. Section 3 presents the results of applying the extended algorithm. In Section 4 we summarize and give some concluding remarks.
2 Methods

2.1 Surface Correspondence Using Thin Plate Spline Warping
The anatomical landmarks do not constitute an exhaustive description of the surface of the ear canal. It is therefore necessary to generate a more dense set of landmarks describing the shape. For that purpose a model mesh is constructed and fitted to all shapes in the training set. The model mesh is chosen as a decimated version of a natural well-formed ear canal labeled with the anatomical landmarks. The model mesh is fitted to each of the shapes in the training set using a Thin Plate Spline (TPS) warp based on the corresponding anatomical landmarks. TPS is a warp function that minimizes the bending energy [11].
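To make the warping step concrete, the following is a minimal NumPy sketch of a 3D TPS interpolant. It is an illustration under our own assumptions, not the authors' implementation: the biharmonic kernel U(r) = r is the standard choice in 3D, and all function and variable names are ours.

```python
import numpy as np

def tps_warp_3d(src_pts, dst_pts, query_pts):
    """Fit a 3D thin plate spline mapping the source landmarks src_pts exactly
    onto the target landmarks dst_pts, then evaluate the warp at query_pts
    (e.g. the vertices of the model mesh).  Kernel U(r) = r is assumed."""
    src_pts, dst_pts, query_pts = map(np.asarray, (src_pts, dst_pts, query_pts))
    n = len(src_pts)
    K = np.linalg.norm(src_pts[:, None, :] - src_pts[None, :, :], axis=-1)
    P = np.hstack([np.ones((n, 1)), src_pts])            # affine part, n x 4
    A = np.zeros((n + 4, n + 4))
    A[:n, :n], A[:n, n:], A[n:, :n] = K, P, P.T
    rhs = np.zeros((n + 4, 3))
    rhs[:n] = dst_pts
    coefs = np.linalg.solve(A, rhs)                      # RBF + affine coefficients
    w, a = coefs[:n], coefs[n:]
    U = np.linalg.norm(query_pts[:, None, :] - src_pts[None, :, :], axis=-1)
    Q = np.hstack([np.ones((len(query_pts), 1)), query_pts])
    return U @ w + Q @ a                                 # warped query points
```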
Since the TPS transform is exact only for the anatomical landmark locations, the vertices of the model mesh will not lie on the surface of the target shape, see Fig. 1, right. Projecting each vertex in the warped model mesh to the closest point on the target surface produces a non-rigid deformation field and generates a dense correspondence. However, using the Point to Surface Projection (PSP) introduces a critical risk of inversions, where the vertices of the model mesh shift place and cause folds in the mesh. Another secondary artifact is the non-uniformity of the correspondence vector field, shown in Fig. 2a,b, giving rise to poor reconstruction of the target shape. In order to improve the correspondence vector field and avoid the problems inherent in applying point to surface projection, a regularization must be included. Lorenz and Krahnstöver [12] propose a method for relaxing a polygonization into a more homogeneous representation; however, such methods are not suited when the polygonization is constrained to an underlying correspondence field. We propose to relax the problem by using a stochastic approach described in the following.
2.2 Markov Random Field Regularization
To obtain better reconstruction and correspondences we cast the problem of finding the deformation vector field into a Bayesian framework of MRF restoration. We thus follow the four successive stages of the Bayesian paradigm.

1: Construction of a prior probability distribution p(d) for the deformation field D matching the source shape S_s onto the target shape S_t.

2: Formulation of an observation model p(y|d) that describes the distribution of the observed shapes Y given any particular realization of the prior distribution.

3: Combination of the prior and the observation model into the posterior distribution by Bayes' theorem,

    p(d|y) = p(y|d) p(d) / p(y).    (1)

4: Drawing inference based on the posterior distribution.

We start with some useful definitions from graph theory in order to describe a probability distribution on a spatial arrangement of points. Given a graph of n connected sites S = {s_i}_{i=1}^n, a neighborhood system N = {N_s, s ∈ S} is any collection of subsets of S for which i) s ∉ N_s, and ii) r ∈ N_s ⇔ s ∈ N_r; then N_s are the neighbors of s. A clique C is a subset of sites S for which every pair of sites are neighbors. We use i ∼ j to denote that i and j are neighbors. Given a neighborhood system N on the set of sites S we now consider the probability distribution of any family of random variables indexed by S, i.e. D = {D_s | s ∈ S}. For simplicity we first consider a finite state space Λ = {1, ..., L} of D but later generalize to continuous distributions. Let Ω denote the set of all possible configurations, Ω = {d = {d_i}_{i=1}^n | d_i ∈ Λ}. A random field D is a Markov Random Field (MRF) with respect to N iff i) p(d) > 0 ∀ d ∈ Ω, and ii) p(d_s | d_r, r ≠ s) = p(d_s | d_r, r ∈ N_s) ∀ s ∈ S, d ∈ Ω. The first constraint is the positivity condition and can be satisfied by specifying a neighborhood
large enough to encompass the Markovianity condition in the second constraint. Although the second condition is on the state of neighboring sites only, it does not exclude long range correlations in the probability distribution over the entire graph. Given a neighborhood system N = {N_s} let all cliques be denoted by C. For all C ∈ C we assume that we have a family of potential functions V_C. We may now define an energy function of any given configuration of d, i.e. U(d) = Σ_{C∈C} V_C. This leads to the definition of the Gibbs measure. The Gibbs measure induced by the energy function U(d) is p(d) = (1/Z) exp(−U(d)/T), where Z is the partition function and T is a parameter referred to as temperature.

Fig. 2. a) The correspondence vector field derived using point to surface projection for moving the vertices of the source to the target shape. b) The resulting dense mesh representation of the target shape. c) The correspondence vector field derived using the Markov random field restoration of the deformation field for moving the vertices of the source to the target shape. d) The improved dense mesh representation of the target shape.
The Gibbs measure maximizes entropy (uncertainty) among all distributions with the same expected energy. The temperature controls the "peaking" of the density function. The normalizing constant may be impossible to obtain due to the curse of dimensionality, but often we need only ratios of probabilities and the constant cancels out. The Hammersley-Clifford theorem gives the relation between MRF and Gibbs random fields and states that D is a Markov random field with respect to N iff p(d) is a Gibbs distribution with respect to N [13,14]. Thus the task is to specify potentials that induce the Gibbs measure in order to encompass MRF properties of D on the graph. So far the description only encompasses a one-dimensional finite state space. However, it generalizes to multivariate distributions since any high dimensional process may be recast into a single state space with ∏_i L_i states, where L_i is the cardinality of the ith variable. Furthermore, the description generalizes to the case of continuous distributions, in which case exp(−U(d)/T) must be integrable. Since we wish to model correspondence between S_s and S_t the displacements are bound to the surfaces, in effect only posing constraints on the length of the three dimensional displacements at the individual sites. In practice the constraint may be enforced by projection of the displacements onto the closest point of the target surface in every site update of the MRF relaxation.
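The remark that only probability ratios are ever needed can be made concrete in one line; the helper below is our own, purely illustrative sketch.

```python
import numpy as np

def gibbs_ratio(u_proposed, u_current, temperature):
    # p(d')/p(d) = exp(-(U(d') - U(d)) / T): the partition function Z
    # cancels, so only energy differences are needed in practice.
    return np.exp(-(u_proposed - u_current) / temperature)
```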
2.3 Prior Distributions
Similar to pixel priors [15] we construct energy functions based on differences between neighboring sites. Extending to the multivariate case we get the general expression of the energy governing the site-priors,

    U_site(d) = Σ_{i∼j} ||d_i − d_j||_p^p    (2)
where ||·||_p is the p-norm, 1 ≤ p ≤ 2, and d_i represents the multivariate displacement in the ith site. With p = 2 the energy function induces a Gaussian prior on the deformation field. Neglecting regions with strong surface dynamics the local optimization becomes convex and the maximum likelihood (ML) estimate of the displacement at the ith site is taken as the mean of the neighboring displacements. By applying a weighted average

    d̂_i = Σ_{j∈N_i} w_j d_j / Σ_{j∈N_i} w_j    (3)
and using Gaussian weights, derived from a fixed kernel size, the maximum a posteriori (MAP) state-estimate of the MRF is similar to the steady state of the algorithm for geometry constrained diffusion (GCD). GCD of D : IR³ → IR³, mapping the surface S_s onto the surface S_t, is given in [6] by

    ∂_t D = ΔD − n_{S_t} (n_{S_t}^T ΔD) / ||n_{S_t}||²   if x ∈ S_s
    ∂_t D = ΔD                                           if x ∉ S_s    (4)
where n_{S_t} is the unit surface normal of S_t at (D(x) + x). Thus, GCD is a numerical scheme for solving a space and time discretized version of the heat equation on the deformation field with certain boundary conditions. Notice that in the MRF formulation we explicitly constrain the correspondence problem to the source and target surfaces, whereas the GCD implementation works on volume-voxel diffusion. Abandoning homogeneity and isotropy of the MRF, non-global kernels may be introduced. Thus, adaptive Gaussian smoothing may be applied, e.g. by setting the standard deviation of the kernel to the square-root of the edge length of the closest neighbor of site i on the graph. Moreover, using the p = 1 norm induces a median prior, with the ML estimate being the median of the displacements at the weighted neighboring sites. This property makes the MRF attractive for correspondence fields with discontinuities, thus avoiding the smearing of edges attained by the Gaussian prior.
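A sketch of the resulting local update, combining the weighted average of Eq. (3) with the surface constraint, is given below. The Gaussian weighting by source-mesh distance and the closest_point_on_target helper are our assumptions for illustration rather than details from the paper.

```python
import numpy as np

def relax_site(i, verts, disp, neighbors, sigma, closest_point_on_target):
    """One ICM-style update of the displacement at site i: replace d_i by the
    Gaussian-weighted mean of the neighboring displacements (Eq. 3), then
    re-project the displaced vertex onto the target surface so that the
    field remains surface-bound."""
    nbrs = np.asarray(neighbors[i])
    # Gaussian weights from a fixed kernel size sigma (assumed here to be
    # measured on the source mesh; the paper states only "fixed kernel size").
    w = np.exp(-np.sum((verts[nbrs] - verts[i]) ** 2, axis=1) / (2 * sigma ** 2))
    d_hat = (w[:, None] * disp[nbrs]).sum(axis=0) / w.sum()
    # Enforce the constraint that source vertices map onto the target surface.
    return closest_point_on_target(verts[i] + d_hat) - verts[i]
```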
2.4 Observation Models
Given a realization of the prior distribution, the observation model p(y|d) describes the conditional distribution of the observed data Y. By specifying an observation model we may favor a mapping that establishes correspondences between regions of similar surface properties. The similarity measures may include derived features of the observed data such as curvature, orientation of the surface normals, or even texture. The simple dot product between the normals may form the basis for specifying a governing energy function that favors correspondence between regions of similar orientation by

    U_norm(y|d) = Σ_i ||n_{S_s,i}^T n_{S_t,i} − 1||^q,    (5)
where n_{S_s,i} is the surface normal at location x_i on the source S_s, and n_{S_t,i} is the normal of the target surface S_t at the coordinate x_i + d_i. The parameter q > 0 controls the sensitivity of the energy function.
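A direct transcription of Eq. (5) might look as follows, where tgt_normal_at is a hypothetical helper returning the target-surface normal at a given point.

```python
import numpy as np

def u_norm(src_normals, tgt_normal_at, verts, disp, q=1.0):
    """Observation energy of Eq. (5): penalize mismatch between the source
    normal at x_i and the target normal at the displaced position x_i + d_i."""
    total = 0.0
    for i, n_s in enumerate(src_normals):
        n_t = tgt_normal_at(verts[i] + disp[i])
        total += abs(float(n_s @ n_t) - 1.0) ** q
    return total
```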
2.5 Maximum a Posteriori Estimates
Normalization of the energy terms from the different prior and observation models is typically chosen such that they operate on the same domain. However, the data analyst may choose to favor some terms over others, e.g. by relaxing the smoothness conditions in favor of correspondences between regions of similar curvature orientation of the surface normals. The posterior conditional probability distribution is given by

    p(d|y) ∝ exp(−U_total / T),    (6)

where we use U_total = (1 − α) U_norm + α U_site, in which α ∈ [0, 1] weighs the influence of the model terms. The MAP estimate is then sought as

    d̂ = argmax_d p(d|y).    (7)
The Iterative Conditional Modes (ICM) method is a typical choice of optimization if the objective functional is convex. However, this is often only the case for simple MRFs, and ML estimates are not always available. More advanced optimization can be done e.g. by simulated annealing using Gibbs sampling or the Metropolis-Hastings (MH) algorithm, followed by averaging or application of ICM in search of the most optimal state of the random field. When applying simulated annealing the a posteriori probability distribution is linked to the prior and the observation model by

    p(d|y) ∝ (p(y|d) p(d))^{1/T},    (8)
where T is the temperature governing the process. At high temperatures all states are equally likely, however, decreasing the temperature increases the influence of the model terms. If the temperature is decreased slowly enough the algorithm will converge to the MAP estimate [16]. See [17,18] for decreasing temperature schemes.
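For concreteness, a minimal simulated-annealing sketch of the MAP search is given below. The proposal scale, the geometric cooling schedule and the helper functions site_energy, obs_energy and closest_point_on_target are our own assumptions for illustration (see [17,18] for principled temperature schedules); this is not the authors' implementation.

```python
import numpy as np

def anneal(disp, verts, site_energy, obs_energy, closest_point_on_target,
           alpha=0.5, t0=1.0, cooling=0.95, sweeps=100, step=0.1, seed=0):
    """Metropolis sampling of p(d|y) ∝ exp(-U_total/T) with geometric cooling.
    site_energy(i, d_i) and obs_energy(i, d_i) return the local contributions
    of the displacement d_i to U_site and U_norm respectively."""
    rng = np.random.default_rng(seed)
    t = t0
    for _ in range(sweeps):
        for i in rng.permutation(len(disp)):
            # Propose a perturbed displacement, re-projected onto the target
            # surface so that the field stays admissible.
            cand = disp[i] + rng.normal(scale=step, size=3)
            cand = closest_point_on_target(verts[i] + cand) - verts[i]
            du = (alpha * (site_energy(i, cand) - site_energy(i, disp[i]))
                  + (1 - alpha) * (obs_energy(i, cand) - obs_energy(i, disp[i])))
            if du < 0 or rng.random() < np.exp(-du / t):
                disp[i] = cand
        t *= cooling   # decreasing temperature schedule (assumed geometric)
    return disp
```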
3 Results
Markov random field restoration using the Gaussian site-prior is applied to the training data after the TPS deformation of the model mesh, using the PSP for initialization. In Fig. 2c,d we show a correspondence field after the MRF relaxation and the resulting reconstruction of the target shape. The figure is to be compared to Fig. 2a,b using the point to surface projection. Problems in the registration field using PSP are removed by applying the MRF restoration. This is the case with respect to both the regularity of the polygonization and the reconstruction error in representing the target shape by the deformed model surface.

To obtain a measure of the uniformity of the polygonization of the target shape we examine the regularity of its triangular structure. By measuring the coefficient of variance of the edge lengths we obtain a standardized measure of the width of the underlying distribution. Results are shown in Fig. 3 for all subjects. The left plot shows the coefficients before and after MRF restoration of the correspondence field, and the right figure shows a histogram of the reductions in the coefficients of variance. A rank test shows the significance of the MRF regularization, since a reduction in the coefficient is obtained for all subjects.

The improvement in shape reconstruction is shown in Table 1. The observation model is applied with α = 0.5. This parameter should be chosen using cross-validation in a more exhaustive search for an optimal deformation field. However, since the shapes are relatively smooth and regular, the results show no significant improvement in the reconstruction error by introducing the observational term. In Fig. 4 the reconstruction error of the target shape of subject 1 is shown using PSP and MRF restoration based on the Gaussian site-prior. Notice the improved reconstruction using MRF.

When the model mesh is warped to another shape, it occurs that some correspondences are placed outside the region of interest on the target shape. Therefore, the model mesh is pruned to contain only the points that are warped to valid areas for all shapes in the training set. The model mesh contains approximately 3000 vertices after pruning. Having established a dense correspondence field it is now possible to dispose of the anatomical landmarks as well as the original meshes of the training set. The set of meshes with dense correspondence is applied in the following statistical shape analysis. The shapes are aligned by a generalized Procrustes analysis [19]. The pure shape model is built using a similarity transformation in the Procrustes alignment, while a rigid-body transformation is used to build the size-and-shape model [20]. An Active Shape Model (ASM) [2] is constructed based on a Principal Component Analysis (PCA) of the Procrustes aligned shapes.
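The coefficient-of-variance measure used above is straightforward to compute; a small sketch of our own follows, assuming an (m, 2) integer array of edge vertex indices.

```python
import numpy as np

def edge_length_cv(verts, edges):
    """Coefficient of variation (std/mean) of the mesh edge lengths, used as
    a standardized measure of the regularity of the polygonization."""
    lengths = np.linalg.norm(verts[edges[:, 0]] - verts[edges[:, 1]], axis=1)
    return lengths.std() / lengths.mean()
```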
Fig. 3. Left: Comparison between the point to surface projection (upper curve) and the MRF regularization (lower curve) by evaluating the coefficient of variance of the edge lengths of the polygonization of the target surface. Right: A histogram of the reduction in coefficient of variance over the training data.
Fig. 4. The reconstruction error [mm] for subject one using the point to surface projection (left) and the MRF correspondence restoration (right).
Table 1. Reconstruction errors [mm] using PSP and MRF regularization. The mean ± one std. is shown for each method. The site-prior is governed by the p-norm and q controls the sensitivity of the observational energy term dependent on the surface normals.

Subject | PSP | MRF p=2 | MRF p=1 | MRF p=2, q=1
1 | 0.048 ± 0.013 | 0.044 ± 0.013 | 0.049 ± 0.014 | 0.043 ± 0.013
2 | 0.046 ± 0.013 | 0.042 ± 0.013 | 0.043 ± 0.012 | 0.040 ± 0.012
3 | 0.048 ± 0.014 | 0.042 ± 0.013 | 0.043 ± 0.013 | 0.040 ± 0.012
4 | 0.044 ± 0.012 | 0.038 ± 0.011 | 0.040 ± 0.011 | 0.038 ± 0.012
5 | 0.045 ± 0.013 | 0.042 ± 0.012 | 0.043 ± 0.012 | 0.040 ± 0.012
6 | 0.045 ± 0.014 | 0.046 ± 0.015 | 0.045 ± 0.015 | 0.043 ± 0.014
7 | 0.047 ± 0.014 | 0.046 ± 0.014 | 0.046 ± 0.014 | 0.046 ± 0.015
8 | 0.040 ± 0.011 | 0.038 ± 0.011 | 0.039 ± 0.011 | 0.050 ± 0.013
9 | 0.041 ± 0.011 | 0.039 ± 0.011 | 0.039 ± 0.011 | 0.038 ± 0.011
10 | 0.049 ± 0.015 | 0.044 ± 0.013 | 0.045 ± 0.013 | 0.043 ± 0.013
11 | 0.046 ± 0.013 | 0.046 ± 0.014 | 0.045 ± 0.013 | 0.055 ± 0.014
12 | 0.050 ± 0.014 | 0.043 ± 0.013 | 0.044 ± 0.013 | 0.041 ± 0.012
13 | 0.042 ± 0.010 | 0.037 ± 0.009 | 0.039 ± 0.009 | 0.041 ± 0.009
14 | 0.048 ± 0.013 | 0.040 ± 0.011 | 0.042 ± 0.012 | 0.040 ± 0.011
15 | 0.043 ± 0.012 | 0.041 ± 0.012 | 0.040 ± 0.012 | 0.038 ± 0.011
16 | 0.049 ± 0.013 | 0.043 ± 0.012 | 0.044 ± 0.012 | 0.052 ± 0.013
17 | 0.064 ± 0.019 | 0.049 ± 0.014 | 0.059 ± 0.018 | 0.064 ± 0.016
18 | 0.051 ± 0.015 | 0.042 ± 0.012 | 0.048 ± 0.013 | 0.053 ± 0.013
19 | 0.064 ± 0.020 | 0.052 ± 0.015 | 0.058 ± 0.017 | 0.049 ± 0.015
20 | 0.053 ± 0.015 | 0.049 ± 0.015 | 0.050 ± 0.015 | 0.050 ± 0.013
21 | 0.049 ± 0.013 | 0.041 ± 0.011 | 0.045 ± 0.012 | 0.039 ± 0.010
22 | 0.048 ± 0.014 | 0.042 ± 0.012 | 0.044 ± 0.013 | 0.048 ± 0.014
23 | 0.040 ± 0.011 | 0.037 ± 0.011 | 0.038 ± 0.011 | 0.042 ± 0.011
24 | 0.043 ± 0.013 | 0.041 ± 0.013 | 0.042 ± 0.013 | 0.048 ± 0.014
25 | 0.044 ± 0.013 | 0.037 ± 0.011 | 0.039 ± 0.011 | 0.046 ± 0.012
26 | 0.056 ± 0.014 | 0.046 ± 0.011 | 0.052 ± 0.012 | 0.058 ± 0.013
27 | 0.042 ± 0.011 | 0.039 ± 0.011 | 0.040 ± 0.011 | 0.039 ± 0.012
28 | 0.049 ± 0.013 | 0.041 ± 0.011 | 0.045 ± 0.013 | 0.047 ± 0.013
29 | 0.048 ± 0.014 | 0.045 ± 0.014 | 0.045 ± 0.013 | 0.047 ± 0.013
Average | 0.048 ± 0.013 | 0.042 ± 0.012 | 0.045 ± 0.013 | 0.045 ± 0.013
Let each aligned shape be represented as a vector of concatenated x, y and z coordinates, x_i = [x_{i1}, y_{i1}, z_{i1}, ..., x_{in}, y_{in}, z_{in}]^T, i = 1, ..., s, where n is the number of vertices and s is the number of shapes. The PCA is performed on the shape matrix D = [(x_1 − x̄) | ... | (x_s − x̄)], where x̄ is the average shape. A new shape exhibiting the variance observed in the training set is constructed by adding a linear combination of eigenvectors to the average shape, x_new = x̄ + Φb, where b is a vector of weights controlling the modes of shape variation and Φ = [φ_1 | φ_2 | ... | φ_t] is the matrix of the first t eigenvectors of DD^T. The first three modes of variation of the size-and-shape model derived using Gaussian MRF regularization are shown in Fig. 5. All the generated shapes look like natural ear canals with no deformations or folds in the mesh.
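As an illustration of the model construction just described, the sketch below builds the PCA model. The use of the small s × s Gram matrix instead of the 3n × 3n covariance is a standard small-sample device and an implementation choice of ours, not a detail from the paper.

```python
import numpy as np

def build_shape_model(shapes, t):
    """PCA point-distribution model. `shapes` is an (s, 3n) array of
    Procrustes-aligned, concatenated vertex coordinates."""
    x_bar = shapes.mean(axis=0)
    D = (shapes - x_bar).T                    # 3n x s shape matrix
    # With s << 3n, eigendecompose the small s x s matrix D^T D and map its
    # eigenvectors back to shape space; its eigenvalues equal the nonzero
    # eigenvalues of D D^T.
    evals, V = np.linalg.eigh(D.T @ D)
    order = np.argsort(evals)[::-1][:t]
    Phi = D @ V[:, order]
    Phi /= np.linalg.norm(Phi, axis=0)        # orthonormal modes phi_1..phi_t
    return x_bar, Phi, evals[order]

def synthesize(x_bar, Phi, b):
    """Generate a new shape x_new = x_bar + Phi b."""
    return x_bar + Phi @ b
```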
Fig. 5. Size-and-shape model. The first three modes of variation shown at +3 (top) and −3 (bottom) standard deviations from the mean shape.
Mode 1 consists of a bending of the canal and a flattening of the concha part. Mode 2 explains some of the shape variation observed in the inner part of the ear canal. Mode 3 is a combination of a flattening and twisting of the inner part of the ear canal and a general shape change of the concha. The distribution of the modes against each other is examined using pairwise plots and no obvious abnormalities were found (results not shown). In comparing the effect of the MRF regularization over the PSP method in the shape tangent space we find a reduction of more than 4% of the total variance of the resulting point distribution model. In Fig. 6 the variance contained in each principal component is shown together with the pct. reduction of the variance in each subspace. The average reduction of variance in each subspace is approximately 8% and the pct. reduction generally increases for higher dimensions.
4 Summary and Conclusions
A method is proposed for building statistical shape models based on a training set with an initial sparse annotation of corresponding landmarks of varying confidence. A model mesh is aligned to all shapes in the training data using the Thin Plate Spline transformation based on the anatomical landmarks. From the deformed model mesh and a target shape we derive a dense registration field of point correspondences. Applying the Markov Random Field restoration
we obtain a dense, continuous, invertible registration field (i.e. a homeomorphism). The stochastic restoration acts as a relaxation on the TPS constrained model mesh with respect to the biological landmarks. The landmarks are identified with varying confidence and the MRF relaxation allows for a data driven enhancement of the object correspondences. Using the site-prior, the algorithm converges to the simplest deformation field, which creates a tendency to match points of similar geometry since the field otherwise must be more complex. Moreover, inclusion of observational models could compensate further where the prior fails in more complex regions. In the present case study of smooth and regular shapes no significant benefit of applying more complex MRFs was obtained.

In comparison to applying point to surface projection the MRF regularization provides i) improved homogeneity of the target shape polygonization free of surface folds, ii) better reconstruction capabilities, and iii) a more compact Active Shape Model description of all the training data. The point to surface projection performs reasonably well in representing the target shape over most regions of the ear canals. However, it fails in regions with strong surface dynamics and when the source and target surfaces are too far apart. The fact that the MRF regularization produces a reduction of more than 4% of the total variance contained in shape tangent space is noteworthy. The reduction is explained by increased collinearity between semi-landmarks distributed over the entire shape. It indicates an improvement in the shape representation in terms of homologous point correlation and thus constitutes a better basis for generative modeling.

Fig. 6. Left: the variance contained in each principal component, the dotted line using point to surface projection and the solid line applying the MRF regularization step. Right: the reduction in the variance as a function of dimensionality of the model. The average reduction in each subspace is approximately 7% and the reduction of the total variance in the shape tangent space more than 4%.
Acknowledgments. The authors would like to thank Dr. Rasmus Larsen, IMM, DTU, for valuable discussions on MRFs, and Audiology Technician Claus Nielsen, Oticon Research Centre Eriksholm, for annotating the ear canals.
References

1. Cootes, T.F., Taylor, C.J., Cooper, D.H., Graham, J.: Training models of shape from sets of examples. In: British Machine Vision Conference: Selected Papers 1992, Berlin, Springer-Verlag (1992)
2. Cootes, T., Cooper, D., Taylor, C., Graham, J.: Active shape models - their training and application. Comp. Vision and Image Understanding 61 (1995) 38–59
3. Gower, J.: Generalized Procrustes analysis. Psychometrika 40 (1975) 33–51
4. Larsen, R., Hilger, K.B.: Statistical 2D and 3D shape analysis using non-euclidean metrics (to appear). Medical Image Analysis (2003)
5. Paulsen, R.R., Larsen, R., Laugesen, S., Nielsen, C., Ersbøll, B.K.: Building and testing a statistical shape model of the human ear canal. In: Medical Image Computing and Computer-Assisted Intervention - MICCAI, Springer (2002) 373–380
6. Andresen, P.R., Nielsen, M.: Non-rigid registration by geometry-constrained diffusion. In: Medical Image Computing and Computer-Assisted Intervention - MICCAI, Springer (1999) 533–543
7. Andresen, P.R., Bookstein, F.L., Conradsen, K., Ersbøll, B.K., Marsh, J., Kreiborg, S.: Surface-bounded growth modeling applied to human mandibles. IEEE Transactions on Medical Imaging 19 (2000) 1053–1063
8. Nielsen, M., Johansen, P., Jackson, A., Lautrup, B.: Brownian warps: A least committed prior for non-rigid registration. In: Medical Image Computing and Computer-Assisted Intervention - MICCAI, Springer (2002) 557–564
9. Davies, R., Cootes, T., Twining, C., Taylor, C.: An information theoretic approach to statistical shape modelling. In: British Machine Vision Conference. (2001) 3–11
10. Davies, R.H., Twining, C.J., Cootes, T.F., Waterton, J.C., Taylor, C.J.: 3D statistical shape models using direct optimisation of description length. In: Proc. ECCV. Volume 3. (2002) 3–20
11. Bookstein, F.L.: Shape and the information in medical images: A decade of the morphometric synthesis. Comp. Vision and Image Understanding 66 (1997) 97–118
12. Lorenz, C., Krahnstöver, N.: Generation of point-based 3D statistical shape models for anatomical objects. Comp. Vision and Image Understanding 77 (2000) 175–191
13. Besag, J.: Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society, Series B 36 (1974) 192–236
14. Geman, D.: Random fields and inverse problems in imaging. In: Saint-Flour lectures 1988. Lecture Notes in Mathematics. Springer-Verlag (1990) 113–193
15. Besag, J.: Towards Bayesian image analysis. Journal of Applied Statistics 16 (1989) 395–407
16. Geman, S., Geman, D.: Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6 (1984) 721–741
17. Vidal, R.V.V.: Applied simulated annealing. In: Lect. Notes in Econom. and Math. Syst. Volume 396. Springer Verlag, Berlin (1993)
18. Cohn, H., Fielding, M.: Simulated annealing: searching for an optimal temperature schedule. SIAM Journal of Optimization 9 (1999) 779–802
19. Hilger, K.B.: Exploratory Analysis of Multivariate Data. PhD thesis, Informatics and Mathematical Modelling, Technical University of Denmark, DTU, Richard Petersens Plads, Building 321, DK-2800 Kgs. Lyngby (2001)
20. Dryden, I., Mardia, K.: Statistical Shape Analysis. Wiley, Chichester (1997)
Optimal Deformable Surface Models for 3D Medical Image Analysis

P. Horkaew and G.Z. Yang

Royal Society/Wolfson Foundation MIC Laboratory, Department of Computing, Imperial College of Science, Technology and Medicine, United Kingdom
{phorkaew, gzy}@doc.ic.ac.uk
Abstract. We present a novel method for building an optimal statistical deformable model from a set of surfaces whose topological realization is homeomorphic to a compact 2D manifold with boundary. The optimal parameterization of each shape is recursively refined by using hierarchical PBMs and tensor product B-spline representation of the surface. A criterion based on MDL is used to define the internal correspondence of the training data. The strength of the proposed technique is demonstrated by deriving a concise statistical model of the human left ventricle which has principal modes of variation that correspond to intrinsic cardiac motions. We demonstrate how the derived model can be used for 3D dynamic volume segmentation of the left ventricle, with its accuracy assessed by comparing results obtained from manual delineation of 3D cine MR data of 8 asymptomatic subjects. The extension of the technique to shapes with complex topology is also discussed.
1 Introduction
In cardiac imaging, an accurate delineation of anatomical boundaries is essential to the quantification of cardiac mass, volume and function. Thus far, the applicability of fully automatic methods in quantifying structural information still remains difficult, mainly due to the inconsistencies in image quality and morphological variations across subjects. Over the years, a number of different techniques have been developed to address this problem. Image segmentation based on deformable models [1,2] recovers the underlying shape by exploiting a priori knowledge about the geometry of anatomical structures. Deformable models can accommodate significant variabilities of biological structures over time and across different individuals. The Active Shape Model (ASM) [3,4,5,6] represents a robust parametric deformable structure, which captures plausible variations of the training set. The model deforms to fit to the unseen shape in a new image with specific constraints found in the Statistical Shape Model (SSM) derived from a set of labeled examples. A key challenge to statistical shape modeling is defining a set of dense correspondence points across a set of segmented shapes. Until recently, correspondence has often been defined by using subjective landmarks based on anatomical features. This approach is time consuming and prone to subjective errors,
thus leading to sub-optimum models. Over the years, the problem of automating shape correspondences has gained considerable attention in the computer vision and graphics communities due to its widespread applications in reverse engineering, computer animations, 3D shape metamorphosis, as well as medical imaging [7,8]. Whilst morphing between objects with different topologies is reported to be problematic and requires extensive human intervention, methods for shapes belonging to restricted, topologically similar classes have been discussed extensively. For instance, DeCarlo and Gallier [9] used a sparse control mesh on each surface to define a mapping between the input objects, where the change in topology was treated with degenerate faces. Kanai et al. [10] proposed geometric morphing between two arbitrary polyhedra that were homeomorphic to a sphere or a disc. Harmonic maps were used to create an overlapped interpolation domain, which had the same connectivity as the original meshes. Lamecker et al. [24] recently extended this technique to deal with shapes with arbitrary topology. The correspondence between a given pair of shapes was computed by mappings between semantic regions, which were partitioned along high curvature paths. Minimal metric dispersion was used as a basis for optimization.

In cases where an object is represented in a volumetric form, correspondence via a dense vector field has been sought as an alternative. Rueckert et al. [12] used non-rigid registration to maximize the similarity of a set of medical images. PCA was then applied to the resulting B-spline deformation grids to build a statistical shape model. A similar idea was proposed by Fleute et al. [23]. In their framework, a generic triangular mesh model was matched against a set of image features to infer the smooth deformation by minimizing the Euclidean distance. The volumetric non-rigid deformation was then used to morph the template to the actual data.

Methods for establishing global correspondences within a group of objects have found their common uses in multiple views CAD, n-way shape blending and Digital Geometry Processing (DGP), which involve many models simultaneously. Eggert et al. [11] described a variation of the Iterative Closest Point (ICP) algorithm for recovering global transformations using elastic force-based optimization. Praun et al. [13] suggested a solution that computes a consistent parameterization for a group of orientable genus zero models with identical predefined features. The parameterization provided immediate point correspondences between all the models and allowed each model to be re-meshed with the same connectivity. Kotcheff and Taylor [14] tackled the correspondence problem with direct optimization of shape parameterization. The optimality of the model was defined in terms of compactness, as measured by the determinant of its covariance matrix. Based on a similar theoretical framework, Davies et al. [15] suggested an improved approach to statistical shape modeling, in which compact shape parameterization is derived from the simplest model that provides a good fit to the entire training data. Their optimization criterion was based on the Minimum Description Length (MDL) principle [16] and shapes were described with spherical harmonic parameterization. Its extension to 3D star-like
shapes was discussed, but dealing with a set of arbitrary surfaces is not straightforward. In the work presented by Lamecker et al. [24], although it was claimed that the method could easily be extended to arbitrary topology, the algorithm did not guarantee a consistent parameterization over the entire training set. Furthermore, it is also unclear how to choose a particular shape in the set as a reference. Although global consistency can be resolved to some extent with heuristic partitioning [13], these techniques are not optimal in a statistical sense. For a given base domain, the internal mapping is computed by minimizing a shape-preserving objective function. We report later in this paper that a shape-preserving objective function alone does not necessarily guarantee a valid correspondence. We illustrate this fact by building a model of normal human left ventricles (LV), which undergo complex morphological changes at different phases of the cardiac cycle. The structure neither contains sufficient high-curvature paths for constrained partitioning nor presents a homogeneously flat region that would permit a reliable single-domain mapping. Motivated by the work of Praun et al. [13] and Davies et al. [15], this paper focuses on building an optimal statistical deformable model from a set of surfaces whose topological realization is homeomorphic to a compact 2D manifold with boundary. This is a basic building block of more complicated surfaces [21,24]. Given a set of partitioned surfaces, the optimal internal correspondence within the training data is identified by an MDL objective on a minimum-distortion space. The parameterization of each shape is recursively refined by using hierarchical Piecewise Bilinear Maps (PBM) and a tensor product B-spline representation of the surfaces. The concept of tensor analysis on manifolds [27] also provides a natural means of creating models in a hyperspace spanned by orthogonal bases, e.g., space-time statistical models. The potential value of the proposed method is demonstrated by building a concise but physiologically plausible statistical model of the left ventricle (LV) from cine 3D multi-slice Magnetic Resonance (MR) images. We demonstrate how the derived model can be used for 3D dynamic volume segmentation of the left ventricle, with its accuracy assessed by comparing results against manual delineations of images from eight asymptomatic subjects.
2 Material and Methods
It has been shown that an arbitrary surface can be decomposed into multiple patches, each topologically equivalent to a disk [23,13,21]. In [13], for example, a set of feature points and their connectivity were specified first. The geometric paths of the boundaries of adjacent patches were then traced using a heuristic method. Anatomical structures with complex topology can be separated using anatomical landmarks. In the case of the ventricles, they can be separated from the rest of the structure by using the mitral and aortic valves as the patch boundaries, which are all clearly identifiable in cine 3D MR images. A discussion of general approaches that can be used for landmark identification
and patch boundary localization, however, falls outside the scope of this paper. The main emphasis of the paper is on the establishment of optimal correspondence, from statistical as well as geometrical perspectives, among a training set with a single quadrilateral base type.
2.1 Surface Embedding Using PL Harmonic Maps
The key step in the proposed method is the embedding of the training surfaces with harmonic maps. Its main purpose is to construct a small-distortion parameterization of the topological disk M ⊂ R3 over a convex region N ⊂ R2. A well-studied mapping that has a unique solution with the desired properties is the harmonic map. Harmonic maps between Riemannian manifolds are solutions of a system of nonlinear partial differential equations [25]: let (M, g) and (N, h) be two smooth Riemannian manifolds of dimensions m and n, respectively, and φ : (M, g) → (N, h) a smooth map. Denoting (x1, ..., xm) the local coordinates of M and (u1, ..., un) those of N, the integral energy of φ over a compact domain D of M is given by:

$$E(\phi, D) = \frac{1}{2} \int_D g^{ij} h_{\alpha\beta} \frac{\partial \phi^\alpha}{\partial x^i} \frac{\partial \phi^\beta}{\partial x^j} \sqrt{|G|}\, dx^1 \cdots dx^m \qquad (1)$$
where $G = g_{ij}\, dx^i dx^j$. A harmonic map is a $C^\infty$ mapping which is a critical point of E(φ, D) with respect to variations of φ supported in D. This is, in effect, also an embedding, i.e., the inverse φ−1 is a parameterization of M over N. The mapping can be intuitively visualized as stretching the boundary of the mesh M, composed of elastic, triangular rubber sheets, over that of a homeomorphic quadrilateral N. During the internal mapping, the positions of the remaining vertices are uniquely defined by minimizing metric dispersion, a measure of the extent to which a map stretches regions of small diameter in M. The solution to a harmonic map, which minimizes the total energy of this configuration, involves solving a complex system of nonlinear partial differential equations [18]. A simpler alternative is to compute its piecewise linear (PL) approximation, which retains the homeomorphism [19], i.e.,

$$E_{\mathrm{harm}}[\phi] = \frac{1}{2} \sum_{\{i,j\} \in M} \kappa_{ij}\, \|\phi(i) - \phi(j)\|^2 \qquad (2)$$
where the spring constants κij are computed for each edge {i, j}, with k1 and k2 the two vertices opposite the edge, as follows:

$$\kappa_{ij} = \frac{1}{2} \left( \frac{(v_i - v_{k_1}) \cdot (v_j - v_{k_1})}{\|(v_i - v_{k_1}) \times (v_j - v_{k_1})\|} + \frac{(v_i - v_{k_2}) \cdot (v_j - v_{k_2})}{\|(v_i - v_{k_2}) \times (v_j - v_{k_2})\|} \right) \qquad (3)$$
The unique minimum of equation (2), which defines the continuous one-to-one correspondence, can be found by solving a sparse linear system for the values φ(i) at the critical point. Let N be the matrix of mapped vertices in R2 and
H be a sparse matrix representing the surface topology (vertex connectivity). The energy function can then be written in quadratic form [10]:

$$E = N^T H N = \begin{bmatrix} N_i^T & N_b^T \end{bmatrix} \begin{bmatrix} H_{ii} & H_{ib} \\ H_{bi} & H_{bb} \end{bmatrix} \begin{bmatrix} N_i \\ N_b \end{bmatrix} \qquad (4)$$

where the subscripts i and b indicate internal and boundary vertices, respectively. Since the energy function remains constant for the fixed (boundary) part, only the variable (internal) part needs to be solved; that is, at the critical point,

$$\nabla E = \frac{\partial E}{\partial N_i} = 2 H_{ii} N_i + 2 H_{ib} N_b = 0 \qquad (5)$$
Note that the topology matrix H has a non-zero off-diagonal element only where an edge connects the vertices corresponding to the i-th row and j-th column; H is therefore sparse. A bi-conjugate gradient method [20] was then employed to solve the sparse linear system in O(n).
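To make the construction concrete, the sketch below assembles the spring constants of equation (3) from a triangulated disk-like mesh and solves the reduced sparse system of equation (5) for the interior vertex positions. It is a minimal illustration rather than the authors' implementation: the function name, the SciPy-based solver, and the prescribed boundary handling are our own assumptions.

import numpy as np
from scipy.sparse import lil_matrix, csr_matrix
from scipy.sparse.linalg import bicgstab

def harmonic_embedding(vertices, triangles, boundary_idx, boundary_uv):
    """PL harmonic map of a disk-like mesh onto a convex 2D region by
    solving H_ii N_i = -H_ib N_b (cf. equations 2-5); `boundary_uv`
    holds the prescribed 2D positions of the boundary vertices."""
    n = len(vertices)
    H = lil_matrix((n, n))
    for tri in triangles:
        for a in range(3):
            i, j, k = tri[a], tri[(a + 1) % 3], tri[(a + 2) % 3]
            # half-cotangent weight contributed by the corner at k (eq. 3);
            # each edge receives one term from each adjacent triangle
            e1, e2 = vertices[i] - vertices[k], vertices[j] - vertices[k]
            w = 0.5 * np.dot(e1, e2) / np.linalg.norm(np.cross(e1, e2))
            H[i, j] -= w; H[j, i] -= w
            H[i, i] += w; H[j, j] += w
    H = csr_matrix(H)
    interior = np.setdiff1d(np.arange(n), boundary_idx)
    N = np.zeros((n, 2))
    N[boundary_idx] = boundary_uv
    rhs = -H[interior][:, boundary_idx] @ N[boundary_idx]
    for d in range(2):  # solve the sparse system for each 2D coordinate
        N[interior, d], _ = bicgstab(H[interior][:, interior], rhs[:, d])
    return N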
2.2 Shape Correspondence
To construct a smooth B-spline surface patch from the triangular mesh, the harmonic embedding was re-parameterized over uniform knots on the 2D base. Given a set of distinct points $X = \{x_1, \ldots, x_n \mid x_i \in \mathbb{R}^3\}$ in the parameterized base domain N, sampled from a single B-rep surface patch M, the approximating tensor product B-spline takes the form

$$s(u, v) = \sum_{i=1}^{m} \sum_{j=1}^{l} B_i(u)\, C_j(v)\, c_{ij}, \qquad c_{ij} \in \mathbb{R}^3 \qquad (6)$$
where Bi and Cj are the B-spline basis functions over uniform knots. Given the minimal-distortion map, its least squares approximation by a B-spline with a thin-plate regularization energy term yields well-defined smooth surfaces. That is, the vector values of the B-spline control points are obtained by solving a sparse linear system [17]. The regularization factor can be used to adjust the balance of the approximation between error minimization and smoothing. The correspondences of the training set, represented by tensor product B-spline surfaces s(u), were manipulated by reparameterizing these surfaces over the unit base domain:

$$s_i(u) \to s_i(\Phi_i(u)), \qquad \Phi_i(u) : [0,1] \times [0,1] \to [0,1] \times [0,1] \qquad (7)$$
Such reparameterizations are defined by piecewise bilinear maps (PBM). Multiresolution decomposition can then be applied to the PBMs, resulting in a hierarchical representation of the parameterization spaces, whereby higher-dimensional spaces can model more localized and finer detail of the distortion. In order to build a PBM lattice, the base domain was first partitioned into $2^L \times 2^L$ regular squares, where L indicates the level of detail of the reparameterization.
Each of the squares is defined by four vertex points aij. A point p lying in a square domain is mapped to the reparameterized space according to the bilinear weighted sum of its surrounding control points cij. While the vertices aij are fixed at a given detail level, the vertices cij vary to represent different maps. All possible configurations define the linear space TL. By subdividing each square domain into four smaller ones, the linear space TL+1 for the next higher level is defined. Varying these new parameters allows the corresponding parameterization to control more local distortions in the given shape. Similar to the work by Davies et al. [15], the Minimum Description Length (MDL) was chosen as the criterion for selecting the set of parameterizations that could be used to construct a statistical shape model. The MDL principle suggests choosing the model that provides the shortest description of the data. Equivalently, it casts statistical modeling as a means of generating codes whose lengths provide a metric by which candidate models can be compared. At the coarsest level (T0), the Gaussian curvature was evaluated on each surface. The PBMs were deformed such that they matched the points with highest curvature, normally corresponding to the apex of each LV. In the subsequent levels, the parameterizations were refined and the PBM parameter vectors were optimized according to the MDL objective function. It is worth emphasizing that, as the PBMs were refined to the next detail level, the sampling rate on each B-spline surface increased, creating a concurrent hierarchy on both the parameterization domain and the shapes. This ensures fast and reliable convergence of the proposed algorithm. Polak-Ribière's conjugate gradient method [20] was adopted for the optimization in this experiment. The images used for this study were acquired on a Siemens Sonata 1.5T scanner (40 mT/m, 200 mT/m/ms) using a phased-array coil with two anterior and two posterior coils. A dual flip angle (20/60) cine TrueFISP sequence (TE = 1.5 ms, TR = 3 ms, Slt = 7 mm) was used to acquire a short-axis view of the left ventricle within a single breath hold. A total of eight subjects were recruited for this study with informed consent.
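The full MDL objective of [15,16] encodes model parameters and data residuals exactly; as a rough illustration of the idea that compact models yield short descriptions, the sketch below scores a set of sampled shape vectors with a simple eigenvalue-based approximation. The function and the per-mode cost are our own simplifications, not the criterion actually optimized in this paper.

import numpy as np

def description_length(shapes, sigma2=1e-4):
    """Approximate description-length score for a set of shape vectors
    (one row per training shape).  Each PCA mode above an assumed noise
    floor sigma2 costs roughly log(lambda/sigma2); modes below it cost
    lambda/sigma2.  Lower scores indicate more compact models."""
    X = shapes - shapes.mean(axis=0)
    ns = X.shape[0]
    # eigenvalues of the covariance via the small ns x ns Gram matrix
    lams = np.clip(np.linalg.eigvalsh(X @ X.T / ns), 0.0, None)
    return float(sum(np.log(l / sigma2) if l > sigma2 else l / sigma2
                     for l in lams))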
2.3 Model Based 3D Image Segmentation
To make use of the geometrical models for image segmentation, PCA was applied to the embedded LV surfaces to establish the principal modes of variation that can be used to fit the model to actual imaging data. For a given 3D data set, the approach requires the identification of the mitral ring and the apex, which was done manually in this study. The initial pose of the LV was then estimated by minimizing the shortest Euclidean distances from the constrained points to the mean shape. Since the derived surface model was described in a B-spline parametric form, the nearest surface point to each constraint was obtained by searching on a 2D manifold. Least-squares fitting was then applied to the resultant pairs to approximate the pose parameters. Once the initial pose of the LV model is established, local deformation is applied by following the ASM approach. The given data were first filtered [26] to reduce any spurious features due to imaging noise. The updating points xi
in the 3D image, which suggest a set of adjustments, were searched for along the normal to the LV surface. The quality of the fit of the model to the imaging data was determined by the second moment of all feature points, which were identified by high gradient values located perpendicular to the surface in association with the surface control points; that is,

$$x_i = \arg\min_x \left\{ -\|\nabla V(x)\| \times (1 + n_i \cdot r(x)) \right\} \qquad (8)$$
where V is the image intensity, ni is the normal vector of the LV surface at control point i, and r(x) is the 3D orientation of the intensity pattern [26]. The resultant set of displacements was used to update the pose parameters, and the remaining residual displacement of each control point was then rectified by updating the model parameters along the principal modes of variation. The process is repeated until convergence.
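A sketch of the boundary search that equation (8) implies at a single control point is given below. It simplifies the criterion by keeping only the gradient-magnitude term and dropping the orientation factor r(x); the sampling scheme and function names are illustrative assumptions.

import numpy as np

def search_along_normal(grad_mag, point, normal, k=5, step=1.0):
    """Search for the best boundary candidate along the surface normal at
    a control point (simplified form of equation 8).  `grad_mag` is
    assumed to be a callable returning the image gradient magnitude at a
    3D position, e.g. via trilinear interpolation of a gradient volume."""
    normal = normal / np.linalg.norm(normal)
    candidates = [point + i * step * normal for i in range(-k, k + 1)]
    costs = [-grad_mag(c) for c in candidates]  # strong edges give low cost
    return candidates[int(np.argmin(costs))]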
3 Results
For this study, we used 38 LV shapes for building the statistical model and a total of 160 LV volumes (including those used for training) for assessing the accuracy of the segmentation process. With the limited number of shapes used for training, the statistical model may not capture the complete range of variation. Therefore, to allow an unseen LV volume to be segmented accurately, each control point was iteratively displaced subject to both the cost function given in equation (8) and a thin-plate spline regularization energy, similar to that proposed in [2]. The algorithm was run for three levels of recursion, giving a total of 16 PBM lattices per shape. The model built by the automatic method was compared against that obtained from uniform sampling of the B-spline surface. Fig. 1 shows an example of the harmonic embedding of an LV surface mesh. Faces of the original surface were projected onto the square disk without self-intersection, regardless of whether the object has convex or concave regions. The corresponding B-spline surface patch is shown on the right. Fig. 2 illustrates the variations captured by the first three modes of the uniformly sampled (left) and optimal (right) models. The shape parameters were varied by ±2σ, as seen within the training set. Fig. 3 shows a quantitative comparison of the statistical models from the automatic and uniform parameterizations. The variance describing each mode and the corresponding cumulative variances of both models are plotted against the number of modes. To assess the quality of the derived statistical model for segmentation, Fig. 4 illustrates the original 3D multi-slice image (a), the model-fitted LV using the first 7 modes of variation (b), and the result after shape localization with local deformation (c). It is evident that the residual after model fitting is small and that local shape deformation is able to capture the residual irregularities, which are mainly subject specific. Fig. 4 (d)
Fig. 1. (a) Triangulated surface of a left ventricle, (b) its corresponding Harmonic Map on a unit quadrilateral domain and (c) its tensor product B-spline representation.
Fig. 2. The variations captured by the first three modes of variation. The shape parameters were varied by ±2σ, as seen within the training set. In the optimal model, the 1st mode captures contraction as well as radial twisting, while the 2nd mode captures shortening. These variations correspond to those found in a normal human heart. The uniformly sampled model, however, represents invalid variations.
and (e) show the extracted LVs over the entire cardiac cycle for the same subject after model fitting and shape localization. Fig. 5 illustrates two example volume curves covering different phases of the cardiac cycle, representing different levels of success in the initial model fitting. In both cases, shape localization provided accurate final results. For the entire study group of 160 LV volumes, Fig. 5 (c) illustrates the correlation between manually delineated results and those from the proposed technique, demonstrating the accuracy that can be achieved in practice.
Fig. 3. Comparison of compactness, showing the individual (left) and cumulative (right) variance captured by each mode. The variances explained by the first three modes of the optimized and uniform models are 0.1007, 0.0308, 0.0198 and 0.0520, 0.0224, 0.0117, respectively. These modes capture 88.5% of the total variation in the optimized model, compared to only 80.8% in the uniform model.
Fig. 4. An example showing the original 3D multi-slice image (a), the segmentation result after ASM segmentation (b), and the result after applying regularized deformation (c). The extracted LVs at selected phases from the same sequence after ASM (d) and regularized deformation (e) are also shown.
4 Discussions and Conclusion
We have described an algorithm for establishing global correspondences within a group of simple surfaces based on shape embedding. For each surface, harmonic maps and a tensor product B-spline were used to find the initial parameterization on a unit quadrilateral base domain. The global parameterizations were then recursively deformed, in a coarse-to-fine manner, so that the corresponding statistical model yields the most concise description. MDL was adopted as the optimality criterion, and the strength of the proposed method is demonstrated by obtaining optimal statistical shape models of the left ventricle from 3D MR acquisitions. The resultant models, employed in an image segmentation framework based on deformable models, demonstrate the great potential of the proposed method for efficient and robust extraction of LV surfaces from in vivo cine MR data sets.
Fig. 5. Two example volume curves covering different phases of the cardiac cycle, representing different levels of success in the initial model fitting (a, b), and the scatter plot showing the correlation between manually and automatically segmented volumes for all subjects (c).
The main ingredient of our algorithm is the construction of a compact manifold representing the parameterization of a surface. We have shown that it is feasible to compute such a parameterization for a surface whose topology is homeomorphic to a 2D disc. We expect that the current work can be extended to shapes with more complex topologies, such as the aortic or mitral valves. For these structures, the geometrical landmarks are easily identifiable for separating the overall structure into physiologically independent and topologically simple patches. In other situations, however, this may not be the case. Nevertheless, the approach proposed by Eck and Hoppe [21] may be adapted for the automatic reconstruction of B-spline patch networks based on Voronoi subdivision. Although the Voronoi subdivision of a given surface may be arbitrary, it has been shown that similar base complexes can be created consistently within a group of surfaces [13]. It is evident from the results that the proposed method produces a model that is significantly more compact than the one obtained from the uniformly sampled surfaces. It is also interesting to note that the principal modes of variation of the optimal model conform to the intrinsic physiological shape variation of the normal human heart, representing contraction, radial twisting, and shortening [22].
References
1. T. McInerney and D. Terzopoulos, Deformable models in medical image analysis: a survey, Medical Image Analysis, 1996; 1: 91–108.
2. L.D. Cohen and I. Cohen, Finite-Element Methods for Active Contour Models and Balloons for 2-D and 3-D Images, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1993; 15(11): 1131–1147.
3. T.F. Cootes, C.J. Taylor, D.H. Cooper and J. Graham, Active Shape Models – Their Training and Application, Computer Vision and Image Understanding, 1995; 61(1): 38–59.
4. T.F. Cootes, A. Hill, C.J. Taylor and J. Haslam, The Use of Active Shape Models for Locating Structures in Medical Images, Image and Vision Computing, 1994; 12(6): 355–366.
5. N. Duta and M. Sonka, Segmentation and Interpretation of MR Brain Images: an Improved Active Shape Model, IEEE Trans. Med. Imag., 1998; 17: 1049–1062.
6. Y. Wang and L.H. Staib, Boundary Finding with Correspondence using Statistical Shape Models, Proc. Conf. Computer Vision and Pattern Recognition, Santa Barbara, California, 1998: 338–345.
7. G. Wolberg, Image morphing: a survey, The Visual Computer, 1998; 14(8/9): 360–372.
8. F. Lazarus and A. Verroust, Three-dimensional Metamorphosis: a Survey, The Visual Computer, 1998; 14(8/9): 373–389.
9. D. DeCarlo and J. Gallier, Topological Evolution of Surfaces, Graphics Interface '96, Toronto, Canada, Canadian Human-Computer Communications Society, pp. 194–203.
10. T. Kanai, H. Suzuki and F. Kimura, Metamorphosis of Arbitrary Triangular Meshes, IEEE Computer Graphics and Applications, 2000; 20(2): 62–75.
11. D.W. Eggert, A.W. Fitzgibbon and R.B. Fisher, Note: Simultaneous Registration of Multiple Range Views for Use in Reverse Engineering of CAD Models, Computer Vision and Image Understanding, 1998; 69(3): 253–272.
12. D. Rueckert, A.F. Frangi and J.A. Schnabel, Automatic construction of 3D statistical deformable models using non-rigid registration, in Proc. of MICCAI'01, 2001: 77–84.
13. E. Praun, W. Sweldens and P. Schröder, Consistent Mesh Parameterizations, Computer Graphics Proceedings (SIGGRAPH '01), 2001: 179–184.
14. A.C.W. Kotcheff and C.J. Taylor, Automatic construction of eigenshape models by direct optimisation, Medical Image Analysis, 1998; 2(4): 303–314.
15. R.H. Davies, C.J. Twining, T.F. Cootes, J.C. Waterton and C.J. Taylor, A Minimum Description Length Approach to Statistical Shape Modelling, IEEE Trans. Med. Imag., 2002.
16. M.H. Hansen and B. Yu, Model Selection and the Principle of Minimum Description Length, Technical Memorandum, Bell Labs, Murray Hill, N.J., 1998.
17. M.S. Floater, Meshless Parameterization and B-spline Surface Approximation, in The Mathematics of Surfaces IX, R. Cipolla and R. Martin (eds.), Springer-Verlag, 2000: 1–18.
18. T. Duchamp, A. Certain, A. DeRose and W. Stuetzle, Hierarchical Computation of PL Harmonic Embedding, Technical Report, University of Washington, 1997: 1–21.
19. M. Eck, T. DeRose, T. Duchamp, H. Hoppe, M. Lounsbery and W. Stuetzle, Multiresolution Analysis of Arbitrary Meshes, Computer Graphics Proceedings (SIGGRAPH '95), 1995: 173–182.
20. W.H. Press, S.A. Teukolsky, W.T. Vetterling and B.P. Flannery, Numerical Recipes in C, 2nd ed., Cambridge University Press, 1996, ISBN 0-521-43108-5.
21. M. Eck and H. Hoppe, Automatic Reconstruction of B-Spline Surfaces of Arbitrary Topological Type, Computer Graphics Proceedings (SIGGRAPH '96), 1996: 325–334.
22. C.C. Moore, C.H. Lugo-Olivieri, E.R. McVeigh and E.A. Zerhouni, Three-Dimensional Systolic Strain Patterns in the Normal Human Left Ventricle: Characterization with Tagged MR Imaging, Radiology, Feb. 2000; 214(2): 453–466.
23. M. Fleute, S. Lavallée and L. Desbat, Integrated Approach for Matching Statistical Shape Models with Intra-operative 2D and 3D Data, in Proc. of MICCAI'02, 2002: 364–372.
24. H. Lamecker, T. Lange and M. Seebass, A Statistical Shape Model for the Liver, in Proc. of MICCAI'02, 2002: 421–427.
25. A.P. Fordy and J.C. Wood, Harmonic Maps and Integrable Systems, Aspects of Mathematics, vol. E23, Vieweg, Braunschweig/Wiesbaden, 1994.
26. G.Z. Yang, P. Burger, D.N. Firmin and S.R. Underwood, Structural Adaptive Anisotropic Image Filtering, Image and Vision Computing, 1996; 14(2): 135–145.
27. R.L. Bishop and S.I. Goldberg, Tensor Analysis on Manifolds, Dover Publications, 1981.
Learning Object Correspondences with the Observed Transport Shape Measure

Alain Pitiot 1,2, Hervé Delingette 1, Arthur W. Toga 2, and Paul M. Thompson 2

1 Epidaure, INRIA, 2004 route des Lucioles, BP 93, 06902 Sophia-Antipolis, France
2 LONI, UCLA School of Medicine, Los Angeles, CA 90095, USA

Abstract. We propose a learning method which introduces explicit knowledge to the object correspondence problem. Our approach uses an a priori learning set to compute a dense correspondence field between two objects, where the characteristics of the field bear close resemblance to those in the learning set. We introduce a new local shape measure we call the "observed transport measure", whose properties make it particularly amenable to the matching problem. From the values of our measure obtained at every point of the objects to be matched, we compute a distance matrix which embeds the correspondence problem in a highly expressive and redundant construct and facilitates its manipulation. We present two learning strategies that rely on the distance matrix and discuss their applications to the matching of a variety of 1-D, 2-D and 3-D objects, including the corpus callosum and ventricular surfaces.
1 Introduction
From signal processing to pattern recognition, the issue of object matching permeates a broad range of image related fields. In computer vision for instance, the search for target patterns often requires matching a given template to pictorial elements in an input image [1]. In medical imaging, the objects may be instances of a given anatomical structure, for which a statistical model, a shape average, or a segmentation is desired [2]. In computer graphics, matched objects may be used to derive a series of intermediate shapes to “morph” one object into the other [3], etc. In this paper, we approach the issue of object matching as a process of computing a dense correspondence field between two objects. At a glance, defining a correspondence between two objects entails finding in them pairs of corresponding elements that share particular similarities, in terms of shape, position, or both. More formally, given two objects O1 and O2 with any a priori parameterizations represented by two functions O1 and O2 : O1 :
I1 ⊂ Rm → Rn I ⊂ Rm → Rn (m ≤ n) , O2 : 2 x →O2 (x) x →O1 (x)
we are looking for a reparameterization of O1 and O2 , that is, for two diffeomorphisms f1 and f2 , such that O1∗ = O1 ◦ f1 and O2∗ = O2 ◦ f2 and ∀x1 ∈ I1 , ∀x2 ∈ I2 , x1 “close to” x2 ⇒ O1∗ (x1 ) “very similar to” O2∗ (x2 ) (1) where “very similar to” is defined with respect to a given similarity metric. C.J. Taylor and J.A. Noble (Eds.): IPMI 2003, LNCS 2732, pp. 25–37, 2003. c Springer-Verlag Berlin Heidelberg 2003
Fig. 1. Illustration of the proposed matching framework in the case of 2-D parametric curves (m = 1, n = 2)
Following [4], to allow multiple points in I1 to be matched to a single point in I2 and conversely, we restate our problem (see Figure 1) as that of finding a monotonically increasing and continuous function

$$\varphi : I \subset \mathbb{R}^m \to I_1 \times I_2, \quad x \mapsto (\varphi_1(x), \varphi_2(x))$$

such that

$$\forall x \in I, \quad O_1(\varphi_1(x)) \;\text{"very similar to"}\; O_2(\varphi_2(x)) \qquad (2)$$
A number of automated methods for curve/surface matching that tackle some or all of the above issues have been presented in the literature. Trouvé and Younes detailed in [4] an axiomatic formulation for 1-D matching: they introduced, among others, the concepts of symmetry ($\varphi_{O_1 \to O_2}$ should be the inverse of $\varphi_{O_2 \to O_1}$) and consistent self-matching (for every object O, $\varphi_{O \to O} = (Id, Id)$; in the general case, $\varphi$ should not be too dissimilar from the identity), and proposed a matching framework for 2-D piecewise lines that satisfies their axioms. In [5], Cohen et al. compared the bending and stretching energies of one curve (O1) and a reparameterization of the other (O2*), in a PDE framework, to find the best match. Fleuté et al. [6] minimized the Euclidean distance between an input shape and a registered template, which assumed smooth transition paths in between them. Wang et al. [7] used geodesic interpolation to compute the dense correspondence field between two surfaces once an initial sparse set of corresponding points had been obtained with an automated shape-based matching algorithm. In [8], the first elliptical harmonics of the expansion of the input objects (which must have spheroidal shapes) served to establish a correspondence. In [9], Sebastian et al. used a dynamic programming approach similar to [4] to find the best match between two 2-D curves, using a similarity measure based on "alignment" between segments of the curves. Elastic registration and warping approaches have also been investigated. In [10], Thompson et al. mapped the input surfaces to two spheres whose
Fig. 2. Matching two corpus callosum outlines
coordinates were then warped under anatomical feature curve constraints. Davatzikos et al. [11] also identified corresponding points on object boundaries in two images before aligning them using elastic warping. Along different lines, Davies et al. [12] presented a curve matching method in the context of the search for the most compact statistical shape model. An information-theoretic criterion was designed and controlled the correspondence between objects. The common drawback of these approaches, despite their diversity, lies in their lack of control over the similarity measure introduced in equation 1, which is often defined a priori, once and for all, and uses only limited domain-based information (or information learned from examples). Typically, these matching processes can be reduced to optimizing a functional whose minimum corresponds to a "good" correspondence field. The difficulty of designing an adequate functional comes from the difficulty of characterizing an adequate correspondence field. In [5] for instance, the authors assume that points with similar curvature should be matched. This may suit some applications, but is not always desirable. Figure 2 illustrates such a situation: here two corpus callosum outlines have been delineated and we wish to compute their average shape:
– Suppose that part of the fornix (a2) has been improperly delineated together with the corpus callosum; then we would like segments {a1, (b1,b2)}, {(a2,a3), b3} and {a4, b4} to be matched, in spite of the fact that the curvature signature of segment a2 more closely resembles that of b2 than that of b3.
– On the other hand, we may decide to trust the delineation and assume that a lesion is the cause of the odd-looking bulge (a1) in the corpus callosum in 2.a. Then, we would like the match: {a1, b1}, {a2, b2}, {a3, b3} and {a4, b4}.
Clearly, choosing between these two scenarios requires the introduction of explicit knowledge into the matching algorithm. To overcome this issue, we propose a learning approach where a learning set helps the matching algorithm compute a correspondence field whose characteristics bear close resemblance to those of the a priori given learning correspondence fields. Our method relies on the use of a distance matrix derived from the values of a local shape measure which is computed on every pair of points of the objects to be matched. We argue that this shape distance matrix embeds the matching problem in a highly expressive and redundant construct which is more easily manipulated. This matrix is both visually interesting (as it allows for visual inspection of the specific reparameterization problem at hand) and enables us to
Fig. 3. Observed Transport Measure principle
recast the matching problem as the search for a geodesic in another metrizable space: the space of reparameterizations (which is a group). We introduce in section 2 the so-called “observed transport” shape measure and discuss the properties that make it particularly amenable to the matching problem. We then present the various learning techniques that we have developed in section 3 and discuss their applicability to 1-D, 2-D and 3-D objects along with some examples from medical imaging.
2 Observed Transport Local Shape Measure
We first define our local shape measure in a variety of cases before presenting some of its properties.

1-D case. Let $C : I \subset \mathbb{R} \to \mathbb{R}^2$, $u \mapsto (x(u), y(u))$ be a 2-D curve (open or closed), parameterized with respect to a scalar u. We define the observed transport measure $\rho_C$ as follows:

$$\forall t \in I, \quad \rho_C(t) = \int_{V_C(t)} \|C(t) - C(u)\| \cdot \|C'(u)\|\, du \qquad (3)$$

where $V_C(t)$ is the arc of C "visible" within range $r \in \mathbb{R}^+$ from C(t):

$$V_C(t) = \{\, C(u) \;\text{s.t.}\; [C(t)C(u)] \cap C = \{C(t); C(u)\} \;\text{and}\; \|C(t)C(u)\| \le r \,\}$$

with [C(t)C(u)] the line segment between points C(t) and C(u). $\rho_C(t)$ can be regarded as the minimal total amount of work it takes to transport the elementary elements du with mass $\|C'(u)\| \cdot du$ that are visible within range r from point C(t), from their location C(u) to C(t) (in the fashion of a Monge-Kantorovich transport problem [13]). Figure 3 displays (thick lines) the arcs of C that are visible from point P at range r, for a given vertebra outline.
2-D case. Let $S : I^2 \subset \mathbb{R}^2 \to \mathbb{R}^3$, $(u, v) \mapsto S(u, v) = (x(u,v), y(u,v), z(u,v))$ be a 2-D surface, parameterized with scalars u and v. $\rho_S$ becomes:

$$\rho_S(u', v') = \int_{V_S(u',v')} \|S(u, v) - S(u', v')\| \cdot \left\| \frac{\partial(x, y, z)}{\partial(u, v)} \right\| du\, dv \qquad (4)$$
where $\frac{\partial(x,y,z)}{\partial(u,v)}$ is the Jacobian of S, and $V_S$ is defined analogously to the 1-D case.

Discrete approximation. We define a discrete version of object O as an unsorted collection of N n-D points: $O = \{O_i \in \mathbb{R}^n\}_{i=1}^{N}$ (that is, we do not assume any a priori parameterization). We then use a centered-space finite difference approximation to derive a discrete version of $\rho_O$ in n-D:

$$\forall i \in 1 \ldots N, \quad \rho_O(i) = d_g \sum_{j=1,\, O_j \in V_O(O_i)}^{N} \|O_i - O_j\| \qquad (5)$$
with $d_g$ the grid step size in the n directions.

Example. Figure 4 shows how our local shape measure ρ behaves on a few 2-D and 3-D objects. Curve (a) demonstrates how our measure can model shape characteristics: even though ρ evidently depends on the curvature of the curve at the point at which it is computed, ρ(A) ≠ ρ(B) and ρ(C) ≠ ρ(D), which correctly reflects the differences in the shape landscape surrounding those points. As such, the observed transport measure is both a measure of local shape and an indicator of context, that is, of where we are in the object (with large ranges r): for instance, it adequately discriminates between the belly and the back of the corpus callosum (Figure 4.d). Note that a curvature measure would not necessarily exhibit such behavior since, for instance, in Figure 4.a, the curvatures at A and B, and at C and D, are the same. Also, our measure bears some resemblance to the "shape contexts" [14]. It can, however, intrinsically handle both continuous and discontinuous objects, and is an actual measure (that is, a scalar value, as opposed to a histogram).

Properties.
– The observed transport measure is independent of (i.e. invariant to) reparameterization.
– Given an observed transport signature (a series of consecutive observed transport values), there is only one curve that can be reconstructed from it, modulo rotation and translation.
– It is invariant with respect to rigid transformation. However, it is not scale invariant, as we believe the scale of an object is an important shape characteristic when trying to match anatomical structures. We could easily introduce scale invariance by normalizing it to the largest observed transport value across the entire object, or by using a scale parameter in subsequent optimizations.
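A direct transcription of equation (5) for a point set is sketched below. The visibility set $V_O$ is simplified here to a plain range test; a faithful implementation would also check that the segment between the two points does not cross the shape, as the definition of $V_C(t)$ requires.

import numpy as np

def observed_transport(points, r, dg=1.0):
    """Discrete observed transport measure (equation 5) for an unsorted
    n-D point set: for each point, sum the distances to all other points
    within range r, scaled by the grid step dg.  Visibility is reduced
    here to a range check only."""
    P = np.asarray(points, dtype=float)
    D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    visible = (D > 0.0) & (D <= r)
    return dg * np.where(visible, D, 0.0).sum(axis=1)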
Fig. 4. Observed transport measure (black is lowest, white is highest) computed over: (a) a u-parameterized 2-D curve, (b) a set of 2-D points, (c) a (u,v)-parameterized 2-D surface (ventricle) and (d) a set of 3-D points (corpus callosum).
3 Learning the Correspondence Field
We present in this section the learning algorithms we have developed to bias the search for a correspondence field between two objects towards instances that are admissible with respect to an a priori given learning set. We first briefly describe a non-learning algorithm before detailing how we can derive learning strategies from this first approach. We have tackled 3 distinct cases, to which all or only some of these methods can be applied:
1-D case: 2-D and 3-D u-parameterized curves: we consider objects defined on an interval of R, taking values in R2 or R3 respectively; m = 1, n = 2 or 3 with the notations of the first section.
2-D case: discrete 2-D point sets (unsorted collections of points of R2) and (u,v)-parameterized 2-D surfaces; m = 2, n = 2 or 3.
3-D case: discrete 3-D point sets (unsorted collections of points of R3); m = 3, n = 3.
3.1 Optimal Path in the Shape Distance Matrix [m = 1, n = 2 or 3]
Following the approach of Trouvé [4], we define the best reparameterization $\varphi^*_{C_1 \to C_2}$ between curves C1 and C2 to be that which minimizes the overall cumulative distance between measures computed for all pairs of matched points:

$$\varphi^*_{C_1 \to C_2} = \arg\min_{\varphi} \int_I |\rho_{C_1}(\varphi_1(u)) - \rho_{C_2}(\varphi_2(u))|\, du \qquad (6)$$

In the discrete case (and for piecewise linear curves, see [4] for details), a dynamic programming approach can be used to find the optimal reparameterization. Let D be the shape distance matrix associated with the curves $C_1 = \{C_i^1\}_{i=1}^{N_1}$ and $C_2 = \{C_i^2\}_{i=1}^{N_2}$:

$$D = [d_{ij}]_{\, i = 1 \ldots N_1,\; j = 1 \ldots N_2}, \qquad \forall (i,j) \;\; d_{ij} = \left| \rho_{C_1}(C_i^1) - \rho_{C_2}(C_j^2) \right| \qquad (7)$$
Fig. 5. Non-learning reparameterization: (a & b) reparameterized curves, (c) shape distance matrices and optimal paths (in white), (d) point by point average curves.
Finding the best reparameterization then boils down to finding in D (see Figure 1) the minimal-cost path between points S (start) and E (end), which requires that a single matching pair (M1 ∈ C1, M2 ∈ C2) be given (for open curves, one could choose the extremities; this condition can also be relaxed if circular shifts are included in the optimization as well). A dynamic programming approach then yields an O(N1·N2) complexity. Note that when a number of consecutive points have the same shape measure (in a circle, for instance), there is not a unique best path with respect to the above criterion. To bias the search towards "natural" reparameterizations (the "consistent self-matching" axiom), we introduce in equation 6 a constraint to prevent the path from deviating too much from the diagonal of D, i.e., for some α ∈ [0, 1]:

$$\varphi^*_{C_1 \to C_2} = \arg\min_{\varphi} \Big( \alpha \int_I |\rho_{C_1}(\varphi_1(u)) - \rho_{C_2}(\varphi_2(u))|\, du \;+\; (1 - \alpha) \int_I \big\| \varphi_1(u) \cdot C_2(u) - \varphi_2(u) \cdot C_1(u) \big\|\, du \Big) \qquad (8)$$
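The dynamic-programming search described above can be sketched as follows. The monotonic three-move neighbourhood and the fixed start and end corners (one given matching pair) are the usual choices for such problems; the diagonal-bias term of equation (8) is omitted for brevity.

import numpy as np

def optimal_path(D):
    """Minimal-cost monotonic path from D[0, 0] to D[-1, -1] through a
    shape distance matrix, in O(N1 * N2) time (cf. equations 6 and 7)."""
    n1, n2 = D.shape
    cost = np.full((n1, n2), np.inf)
    cost[0, 0] = D[0, 0]
    for i in range(n1):
        for j in range(n2):
            if i == 0 and j == 0:
                continue
            prev = min(cost[i - 1, j] if i > 0 else np.inf,
                       cost[i, j - 1] if j > 0 else np.inf,
                       cost[i - 1, j - 1] if i > 0 and j > 0 else np.inf)
            cost[i, j] = D[i, j] + prev
    # backtrack from E to S to recover the correspondence path
    path, (i, j) = [(n1 - 1, n2 - 1)], (n1 - 1, n2 - 1)
    while (i, j) != (0, 0):
        moves = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min((m for m in moves if m[0] >= 0 and m[1] >= 0),
                   key=lambda m: cost[m])
        path.append((i, j))
    return path[::-1], float(cost[-1, -1])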
Figure 5 displays four pairs of reparameterized curves (a pair per column) along with the point by point averages derived from them. Some corresponding pairs of points are indicated with Greek letters. Note in particular how the discriminating power of our shape measure enabled the triangular indentations to be correctly matched together in the first column, and against the corresponding points in the rectangle in the second column.
Fig. 6. Pattern matching strategy
3.2 A Pattern Matching Approach to the 2-D Reparameterization Problem [m = 1, n = 2 or 3]
An interesting feature of the shape distance matrix is that it embeds, in a highly redundant way, information about all possible reparameterizations between the two input objects. In Figure 5, for instance, we can notice clear patterns corresponding to the triangles on the first line. A local "matching scenario" (e.g. "discarding the fornix" in Figure 2, or "matching the triangles together" in Figure 5) then corresponds to a path in a sub-matrix extracted from the shape distance matrix of the objects. Note that even though our shape measure is independent of reparameterization, pairs of objects with different initial parameterizations will produce different-looking shape distance matrices. Care should thus be taken to use the same (or similar) parameterization for the objects to be matched and the ones in the learning set. We derive the corresponding algorithm (see Figure 6):
ones the operator wants the algorithm to reparameterize. Pi,j is the path in Mi,j which represents a local matching scenario, in the same fashion that the optimal cost path in Section 3.1 represents the “optimal” global matching scenario. For each instance, we also compute the distance map of its path. Let S1 = {S1,1 , . . . , S1,N1 } , . . . , SK = {SK,1 , . . . , SK,NK } be the K scenarios, with their instances Si,j = (Mi,j , Di,j ) where Mi,j is the shape distance sub-matrix, and Di,j the associated distance map. Step 2. Once we have computed the shape distance matrix M ∈ Mm,n from the two input objects O1 and O2 , a pattern matching algorithm is used to find in M sub-matrices that bear close resemblance to those of the learning set. We have developed a straightforward multi-scale framework where each sub-matrix Mi,j in the learning set is matched against sub-matrices, extracted from M , at a number of positions and scales. For each Mi,j , we record the translation t∗i,j and scale s∗i,j for which the maximal similarity is achieved: t∗i,j , s∗i,j = arg maxt,s (similarity (Mi,j , M |[tx , tx + s.mi,j ] × [ty , ty + s.ni,j ] )) where M |[tx , tx + s.mi,j ] × [ty , ty + s.ni,j ] is the sub-matrix of M of size T s.mi,j × s.ni,j which starts at index tx , ty (with t = [tx , ty ] ). We also discard instances for which the associated similarity measure is too low. Step 3. For each scenario in the learning set, we then average the distance maps of the paths associated with their instances (once we have applied the proper translation and scale from step #2). The averaging process is done pixel by pixel. In Figure 6, we average the maps of the two instances of scenario #2; no averaging is required for scenario #1 since it only has 1 instance. ∗ with the underlying shape Step 4. We then combine the average maps Di,j distance matrix M to bias the dynamic programming search towards the sub-paths from the learning set: ∗ Mx,y = Mx,y +
Ni K i=1 j=1
(λi,j .1[t∗x ,t∗x +s∗ i,j
i,j
i,j .mi,j
∗y (x, y) . ∗ ]×[t∗y i,j ,ti,j +si,j .ni,j ]
∗y T ∗ (x, y)), with t∗i,j = t∗x Di,j i,j , ti,j
(9)
The relative weights $\lambda_{i,j}$ of the average distance maps with respect to the shape distance matrix could be controlled by the quality of the match between the sub-matrices from the learning set and the matrix M. That quality could also be used to compute a weighted average distance map instead of an equal-weight one. Figure 7 illustrates this approach on two geometric examples. In the first case (first row), we make sure to match the triangles together, whereas in the second case (second row), we discard them as noise and match them against the directly corresponding rectangle pieces. The learning-set sub-matrices were taken from the matrices of Figure 5. Incidentally, the same method can be used to rule out certain sub-matches. When a pattern in the learning set has no associated sub-path, its distance map
Fig. 7. Pattern matching examples: (a) learning set, (b & c) reparameterized curves, (d) the resulting point by point average curve
is infinite everywhere and thus the dynamic programming algorithm will avoid the corresponding area in the shape distance matrix.
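Step 2's multi-scale search can be sketched as a brute-force scan over positions and scales. The nearest-neighbour resampling, the negative mean-squared-difference similarity, and the stride below are illustrative choices, not necessarily the similarity measure used in the paper.

import numpy as np

def best_placement(M, sub, scales=(0.5, 1.0, 2.0), stride=4):
    """Find the translation t* and scale s* at which a learning-set
    sub-matrix `sub` best matches the shape distance matrix M (step 2)."""
    mi, ni = sub.shape
    best_t, best_s, best_sim = None, None, -np.inf
    for s in scales:
        h, w = int(round(s * mi)), int(round(s * ni))
        if h < 2 or w < 2 or h > M.shape[0] or w > M.shape[1]:
            continue
        # nearest-neighbour resampling of the patch to the candidate scale
        ri = np.minimum((np.arange(h) / s).astype(int), mi - 1)
        rj = np.minimum((np.arange(w) / s).astype(int), ni - 1)
        patch = sub[np.ix_(ri, rj)]
        for tx in range(0, M.shape[0] - h + 1, stride):
            for ty in range(0, M.shape[1] - w + 1, stride):
                sim = -np.mean((M[tx:tx + h, ty:ty + w] - patch) ** 2)
                if sim > best_sim:
                    best_t, best_s, best_sim = (tx, ty), s, sim
    return best_t, best_s, best_sim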
3.3 Towards a Registration Approach to the n-D Reparameterization Problem [m ∈ N*, n ∈ N*, m ≤ n]
Even though noticeable patterns are still present in higher-dimensional distance matrices, the lack of a single-scalar parameterization for n-D objects prevents us from using the dynamic programming approach. However, we can still capitalize on the advantageous aspects of the shape distance matrix by considering the problem of reparameterization between two objects to be that of deforming and adapting an a priori given hyper-surface (associated with an a priori shape distance matrix) to the shape distance matrix of the input objects. In doing so, we avoid the issue of the parameterization of the input objects (and can thus consider collections of points). The resulting algorithm is very similar to that of Section 3.2:
– Given a number of 2m-D shape distance matrices computed from pairs of already matched objects (and their associated matching hyper-surfaces), we non-linearly register them to the shape distance matrix computed from the two input objects.
– The resulting non-linear transforms are then applied to the distance maps of the hyper-surfaces associated with the learning items.
– These transformed distance maps are then averaged and the zero-level set of the average map becomes the new reparameterization. Note that a matching criterion (the integral of the deformation field, for instance) could be used to compute a weighted average.
With this approach, we transform an m-D matching issue into a 2m-D registration problem. Despite the curse of dimensionality, we are left with a simpler problem given the high expressivity of the distance matrices (see [15] for a similar dimension increase for surface matching). Consequently, the performance of our method depends on the robustness and accuracy of the non-linear registration algorithm. In the 1-D case (2-D shape distance matrix) we use the
Fig. 8. Registration examples. 3 sample caudates (left) and mean caudate (right)
PASHA method [16], where the amount of regularization depends on the estimated discrepancy between the instances in the learning set and the objects to be reparameterized. We have adapted it in 4-D to treat the 2-D case (4-D shape distance matrix). Even though extending it to 6-D (the 3-D case) is not theoretically impossible, the size of the search space makes the registration intractable. We are currently experimenting with sparse-space techniques to tackle this. Figure 8 shows how our registration method behaved on a series of 20 caudate nuclei (a (u,v)-parameterized surface). One caudate was selected as a target and the remaining 19 were resampled together with it, using a 2-item learning set built by an expert neuroanatomist. We show 3 sample caudates (out of the 20) with some corresponding points (Greek letters) and the resulting mean caudate (rightmost column), obtained by averaging homologous points across the resampled test set. Visual inspection confirmed the agreement between the parameterization of the structures in the learning set and those in the test set.
3.4 Building the Learning Set
Our approaches require that the correspondence between the objects of the learning set be established a priori. This may not be a trivial task for 3-D objects with complex shapes. However, it only has to be specified once, and for a small number of instances. Alternatively, a sparse subset of the correspondence field could be specified by the user to generate a learning set. Most of the fully automated techniques presented in the introduction could produce a meaningful set that could then be manually corrected if need be. Note that using a learning set implies that the objects we want to reparameterize should not be too different from those in the learning set. In fact, similarities between objects do not matter so much as similarities between the pairs of objects to be reparameterized and the pairs of objects in the learning set. Of course, the former is a sufficient condition for the latter. However, a unique advantage of our approach lies in its ability to learn a matching strategy for even very dissimilar objects, provided that we apply it to the same dissimilar matching situations.
4 Conclusion
We have presented a learning approach to the object correspondence problem. Our method makes adequate use of known correspondences from an a priori
learning set to compute, between two given objects, a dense correspondence field whose characteristics are similar to those of the learning set. We can then exert explicit control over the reparameterization. As such, this technique proves useful for putting into correspondence the "outliers" of an object set whose "ordinary" instances may be treated with direct non-learning algorithms. We have also introduced a new local shape measure, the observed transport measure, and illustrated the highly discriminating properties that make it particularly amenable in this context. Finally, technical difficulties (the curse of dimensionality) prevented us from implementing our method for full 3-D objects. We are currently exploring alternative approaches to alleviate this problem.
References
1. Pitiot, A., Toga, A., Thompson, P.: Elastic segmentation of brain MRI via shape model guided evolutionary programming. IEEE Trans. on Medical Imaging 21 (2002) 910–923
2. Cootes, T.F., Hill, A., Taylor, C.J., Haslam, J.: Use of Active Shape Models for Locating Structures in Medical Images. Image and Vision Computing 12 (1994) 355–366
3. Kanai, T., Suzuki, H., Kimura, F.: Metamorphosis of Arbitrary Triangular Meshes. IEEE Computer Graphics and Applications 20 (2000) 62–75
4. Trouvé, A., Younes, L.: Diffeomorphic Matching Problems in One Dimension: Designing and Minimizing Matching Functionals. In: Proc. of ECCV. (2000) 573–587
5. Cohen, I., Ayache, N., Sulger, P.: Tracking Points on Deformable Objects using Curvature Information. In: Proc. of ECCV. (1992) 458–466
6. Fleuté, M., Lavallée, S., Julliard, R.: Incorporating a Statistically Based Shape Model into a System for Computer-Assisted Anterior Cruciate Ligament Surgery. Medical Image Analysis 3 (1999) 209–222
7. Wang, Y., Peterson, B., Staib, L.: Shape-Based 3D Surface Correspondence using Geodesics and Local Geometry. In: Proc. of CVPR. (2000) 644–651
8. Kelemen, A., Szekely, G., Gerig, G.: Three-Dimensional Model-based Segmentation of Brain MRI. IEEE Trans. on Medical Imaging 18 (1999) 838–849
9. Sebastian, T., Crisco, J., Klein, P., Kimia, B.: Constructing 2D Curve Atlases. In: Proc. of CVPR. (2000) 70–77
10. Thompson, P., Toga, A.: Detection, Visualisation and Animation of Abnormal Anatomic Structure with a Deformable Probabilistic Brain Atlas Based on Random Vector Field Transformations. Medical Image Analysis 1 (1997) 271–294
11. Davatzikos, C., Prince, J., Bryan, N.: Image Registration Based on Boundary Mapping. IEEE Trans. on Medical Imaging 15 (1996) 212–215
12. Davies, R., Twining, C., Cootes, T., Waterton, J., Taylor, C.: A Minimum Description Length Approach to Statistical Shape Modelling. IEEE Trans. on Medical Imaging 21 (2002)
13. Haker, S., Angenent, S., Tannenbaum, A.: Minimizing Flows for the Monge-Kantorovich Problem. SIAM Journal of Mathematical Analysis (2003) to appear.
14. Belongie, S., Malik, J., Puzicha, J.: Shape Matching and Object Recognition Using Shape Contexts. IEEE Trans. on PAMI 24 (2002) 509–522
15. Huot, E., Yahia, H., Cohen, I., Herlin, I.: Surface Matching with Large Deformations and Arbitrary Topology: A Geodesic Distance Evolution Scheme on a 3-Manifold. In: Proc. of ECCV. (2000) 769–783
16. Cachier, P., Bardinet, E., Dormont, D., Pennec, X., Ayache, N.: Iconic Feature Based Nonrigid Registration: The PASHA Algorithm. CVIU – Special Issue on Nonrigid Registration (2003) In Press.
Shape Discrimination in the Hippocampus Using an MDL Model

Rhodri H. Davies 2,3, Carole J. Twining 1, P. Daniel Allen 1, Tim F. Cootes 1, and Chris J. Taylor 1

1 Division of Imaging Science, University of Manchester, UK. [email protected]
2 Centre for Neuroscience, 3 Howard Florey Institute; University of Melbourne, Australia. [email protected], www.cfn.unimelb.edu.au/rhhd
Abstract. We extend recent work on building 3D statistical shape models, automatically, from sets of training shapes and describe an application in shape analysis. Using an existing measure of model quality, based on a minimum description length criterion, and an existing method of surface re-parameterisation, we introduce a new approach to model optimisation that is scalable, more accurate, and involves fewer parameters than previous methods. We use the new approach to build a model of the right hippocampus, using a training set of 82 shapes, manually segmented from 3D MR images of the brain. We compare the results with those obtained using another previously published method for building 3D models, and show that our approach results in a model that is significantly more specific, general, and compact. The two models are used to investigate the hypothesis that there are differences in hippocampal shape between age-matched schizophrenic and normal control subgroups within the training set. Linear discriminant analysis is used to find the combination of shape parameters that best separates the two subgroups. We perform an unbiased test that shows there is a statistically significant shape difference using either shape model, but that the difference is more significant using the model built using our approach. We show also that the difference between the two subgroups can be visualised as a mode of shape variation.
1 Introduction
Statistical models of shape show considerable promise as a basis for segmenting and interpreting images [9]. The basic idea is to establish, from a training set, the pattern of 'legal' variation in the shapes and spatial relationships of structures for a given class of images. Statistical analysis is used to give an efficient parameterisation of this variability, providing a compact representation of shape. This allows shape constraints to be applied effectively during image interpretation [9] and provides a basis for studying shape change. A key step in model-building involves establishing a dense correspondence between shape boundaries/surfaces over a reasonably large set of training images. It is important to establish the 'correct' correspondences, otherwise an inefficient parameterisation of shape can
result, leading to unnecessarily complex and non-specific models. In 2D, correspondence is often established using manually defined 'landmarks', but this is a time-consuming, error-prone and subjective process. In principle, the method extends to 3D, but in practice, manual landmarking becomes impractical. Recently there has been considerable progress in developing methods for building effective models by defining correspondences automatically across a training set of shapes [6,17,27,29,21,23,15,14,22]. The simplest approach to defining correspondences between sets of shapes is to adopt some appropriate parameterisation of the shapes and assume correspondence between equivalently parameterised points [1,3,28,21]. It has been shown previously that such approaches tend to lead to suboptimal models [14]. Alternatively, correspondences can be established using local shape features [5,4,29,23]. This has an intuitive appeal, but does not guarantee correspondences that are in any sense optimal. Another approach is to warp the space in which the shapes are embedded [7,28,27,8]. A model can then be built from the resulting deformation fields. Although this can produce plausible results, the resulting correspondences are essentially arbitrary: there are many non-rigid deformations that could match the intensities in two images, and those chosen are, in effect, a side-effect of the optimisation process employed. A more robust approach is to treat the task as an optimisation problem. Several authors describe methods for minimising the model variance as a function of the set of correspondences [20,2,25]. It can be shown, however, that model variance is not an ideal objective function [10]. Kotcheff and Taylor [22] describe an objective function based on the determinant of the model covariance. They use an explicit representation of the set of shape parameterisations {φi} and optimise the model directly with respect to {φi} using a genetic algorithm search. Their representation of {φi} is, however, problematic and does not guarantee a diffeomorphic mapping between members of the training set. In this paper we build on our previous work [15,14], which was based on explicit optimisation of a measure of model quality with respect to the set of correspondences (inspired by [22]). We measured model quality using an objective function based on a minimum description length criterion [15]. The correspondences between members of the training set were manipulated using a method of shape re-parameterisation based on the use of Cauchy kernels [14]. This approach guaranteed a diffeomorphic mapping between training shapes. The method used a multi-resolution approach to optimisation, which required several parameters to be chosen, did not necessarily find the global optimum, and did not scale well to large training sets. In this paper, we describe an optimisation approach that requires the selection of a single (non-critical) parameter, is scalable to large training sets, and finds significantly better correspondences than the previous method. We also describe how the model can be used in clinical research. We have trained shape models of the right hippocampus, using manually segmented 3D shapes from MR images of schizophrenic patients and age-matched healthy control subjects. We compare two methods of model building, one based on the
minimum description length (MDL) optimisation approach outlined above, the other using the spherical harmonic (SPHARM) method of Gerig and co-workers [3,28,21]. We propose objective methods for comparing the performance of different models, which allow the significance of differences to be evaluated, and demonstrate that the MDL approach leads to significantly better results. Using the models generated from the hippocampus data, each example can be described using a reasonably small number of shape parameters. We show how linear discriminant analysis can be used to find the linear combination of these parameters that best separates the normal control and schizophrenic groups. This defines a mode of shape variation between the two groups that can be visualised. Using this linear discriminant function, we assess the extent to which the two groups can be separated on the basis of shape alone, and show that a more significant separation is obtained using the MDL model.
2 Statistical Shape Models
A statistical shape model is built from a training set of example shapes, aligned to a common coordinate frame. Each shape, S_i (i = 1, ..., n_s), can (without loss of generality) be represented by a set of n points regularly sampled on the shape, as defined by some parameterisation φ_i. This allows each shape S_i to be represented by an n_p-dimensional shape vector x_i, formed by concatenating the coordinates of its sample points. Using Principal Component Analysis, each shape vector can be expressed using a linear model of the form:

x_i = x̄ + P b_i = x̄ + Σ_m p^m b_i^m,   (1)

where x̄ is the mean shape vector, P = {p^m} are the eigenvectors of the covariance matrix (with corresponding eigenvalues {λ_m}) that describe a set of orthogonal modes of shape variation, and b = {b^m} are shape parameters that control the modes of variation. We have shown previously that the shape model in (1) can be extended to deal with training sets of continuous shapes [15].
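To make (1) concrete, the following minimal sketch (ours, in Python/NumPy; names such as build_shape_model are illustrative and not from any published implementation) builds the linear model from a matrix of aligned shape vectors and synthesises a new shape from a parameter vector b:

    import numpy as np

    def build_shape_model(X):
        # X: (n_s, n_p) matrix of aligned shape vectors, one row per shape
        x_mean = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)            # covariance of the training set
        eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric eigendecomposition
        order = np.argsort(eigvals)[::-1]        # sort modes by decreasing variance
        return x_mean, eigvecs[:, order], eigvals[order]

    def synthesise(x_mean, P, b):
        # x = x_mean + P b, using as many modes as there are parameters in b
        return x_mean + P[:, :len(b)] @ b

In practice, when n_s is much smaller than n_p it is cheaper to decompose the dual n_s-by-n_s matrix, but the direct form above suffices to fix notation.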
3 The SPHARM Approach
Gerig and co-workers describe a method of building shape models from a set of closed 3D surfaces by defining correspondence through the parameterisation of each shape [3,28,21]. We describe the method in some detail since it is used for comparison purposes later in the paper. An initial parameterisation of each shape is found using the method of Brechbühler et al. [3], which poses surface parameterisation as an optimisation problem by finding the mapping from the surface to a sphere that minimises area distortion. Using this parameterisation, each shape is represented by its expansion into a sum of spherical harmonics. The shapes are aligned so that the axes of their first spherical harmonics (which are ellipsoidal) coincide and principal component analysis is performed on the coefficients of the expansion. Since the expansion to spherical harmonics is a
linear process involving integration over the surface, the net effect is the continuous equivalent of equally spacing points over the surface (according to its parameterisation). SPHARM models have been used as a basis for segmenting medical images [21] and to represent shape variation in studies of structural discrimination between diseased and control groups [19,18].
4 The MDL Approach
We have previously shown that the correspondence problem can be solved by treating it as an integral part of the shape learning process, both in 2D [15,12,13] and in 3D [14]. The basic idea is to choose the correspondences that build the ‘best’ model, treating model building as an optimisation task. This requires a framework involving a method of manipulating correspondences, an objective function to assess the ‘quality’ of the model built from a given set of correspondences, and a method of optimising the objective function with respect to the set of correspondences. Each component is discussed briefly in the following sections. For a more detailed description, see [14] or [10].

4.1 Manipulating Correspondence
The problem of corresponding continuous curves/surfaces is treated as one of re-parameterisation. A different re-parameterisation function φ_i(u) is defined for each shape S_i(u) – where u is some initial parameterisation of the shape – allowing points to be moved around on the boundary/surface. For correspondences to be legal, φ_i(u) must be a diffeomorphic mapping – that is, φ_i must not cause folds or tears. Following [14], a parametric representation of each surface is obtained using the method of Brechbühler et al. [3]. Each surface, S, in the training set can then be represented using a spherical polar parameterisation:

S(u) = ( S_x(u), S_y(u), S_z(u) )^T,   (2)

where u = (θ, ψ) are spherical polar coordinates. Correspondence can now be manipulated by re-parameterising the spherical coordinates, u_i, of each training surface using a re-parameterisation function, φ_i:

S_i(θ, ψ) → S_i(θ′, ψ′),   θ′ = φ_i^θ(θ, ψ),   ψ′ = φ_i^ψ(θ, ψ).   (3)

Note that we have a separate parameterisation function φ_i = (φ_i^θ, φ_i^ψ) for each training surface. Valid parameterisation functions φ_i must be exact diffeomorphic mappings. We have shown that this can be achieved by using compositions of symmetric wrapped Cauchy functions [14]:

φ^θ(θ; w, A) = (1/(1+A)) [ θ + A arccos( ((1 + w²) cos θ − 2w) / (1 + w² − 2w cos θ) ) ],   φ^ψ(ψ) = ψ,   (4)
where w (w ≡ e^{−α}, α ∈ ℝ) is the width and A (A ≥ 0) is the amplitude of the Cauchy kernel. This describes a kernel positioned at the north pole of the sphere; for a kernel applied at an arbitrary position, a, the north pole is rotated to a. The constant term is included so that φ^θ(θ) = θ when A = 0, i.e. the parameterisation is unchanged when the Cauchy kernel has zero amplitude.
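For concreteness, a direct transcription of (4) (our own sketch; the rotation of the kernel to an arbitrary position a is omitted):

    import numpy as np

    def phi_theta(theta, w, A):
        # Symmetric wrapped Cauchy re-parameterisation of the polar angle for a
        # kernel of width w and amplitude A >= 0 placed at the north pole.
        # phi_theta(theta) = theta when A = 0, and the function maps [0, pi]
        # onto itself monotonically, so the mapping is diffeomorphic.
        c = ((1 + w**2) * np.cos(theta) - 2 * w) / (1 + w**2 - 2 * w * np.cos(theta))
        # clip guards against rounding pushing the argument outside [-1, 1]
        return (theta + A * np.arccos(np.clip(c, -1.0, 1.0))) / (1 + A)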
4.2 The MDL Objective Function
We have described previously a principled basis for choosing an objective function that directly favours models with good generalisation ability, specificity and compactness [15] (an initial version of this work appeared in [11]). The ability of a model to generalise whilst being specific depends on its ability to interpolate and, to some extent, extrapolate the training set. In order to achieve these properties, we applied the principle of Occam’s razor, which can be paraphrased as: “the simplest description of the training set will interpolate/extrapolate best”. The notion of the ‘simplest description’ can be formalised using ideas from information theory – in particular, by applying the minimum description length (MDL) principle [26]. The basic idea is to minimise the length of the message required to transmit a full description of the training set, using the model to encode the data. Since the receiver must know the encoding model in order to fully reconstruct the original data, it is necessary to measure the description length of the encoding model as well as that of the encoded data. Under appropriate conditions, the objective function can be approximated by:
F_MDL(∆) ≈ Σ_{p=1}^{n_g} [ f(n_s, R, ∆) + (n_s − 2) log σ^p ] + Σ_{q=n_g+1}^{n_g+n_min} [ f(n_s, R, ∆) + (n_s − 2) log σ_min + (n_s/2)((σ^q/σ_min)² − 1) ],   (5)

where (σ^m)² is the variance of the data in the m-th principal direction, (σ_min)² is the lower bound on the variance that we choose to model, n_g is the number of directions where the first case (σ^m > σ_min) holds, n_min is the number of directions where the second case (σ^m ≤ σ_min) holds, f(n_s, R, ∆) is a function that is constant for a given training set, and ∆ is the accuracy to which the data is coded. σ_min is directly related to ∆. Full details of the derivation can be found in [15]. In [15] the value of ∆ is chosen as an estimate of the expected uncertainty in the training data, which is rarely known. Here we overcome this by averaging F_MDL over a distribution of ∆, resulting in an objective function with more
continuous behaviour close to convergence. In the experiments reported below a uniform distribution for ∆ over the range ∆_min to ∆_max is assumed:

F = ∫_{∆_min}^{∆_max} F_MDL(∆) d∆.   (6)
The integral can be solved by numerical integration (e.g. by using Simpson’s rule [24]). We used ∆_min = 0.01 and ∆_max = 2 pixels for the results reported in this paper. Due to the high computational cost of calculating (6), we use the approximation in (5) (with a fixed value of ∆ = 0.1) to obtain an initial estimate of the final solution. This is then refined using the full objective function (6).
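For example, given a routine f_mdl(delta) that evaluates (5) for a fixed ∆, the averaged objective (6) could be computed as follows (a sketch under the stated assumption of a uniform distribution for ∆; f_mdl is a placeholder for the full description-length computation):

    import numpy as np

    def averaged_mdl(f_mdl, d_min=0.01, d_max=2.0, n=101):
        # Composite Simpson's rule for F = integral of F_MDL(delta) d(delta)
        # over [d_min, d_max]; n must be odd.
        deltas = np.linspace(d_min, d_max, n)
        values = np.array([f_mdl(d) for d in deltas])
        h = (d_max - d_min) / (n - 1)
        weights = np.ones(n)
        weights[1:-1:2] = 4.0
        weights[2:-1:2] = 2.0
        return h / 3.0 * weights @ values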
4.3 Optimising Correspondence
We have previously described a multi-resolution approach to optimising the MDL objective function by manipulating the re-parameterisations of the training set [14]. The positions, {a_k}, and widths, {w_k}, of the Cauchy kernels were fixed at each resolution and the magnitudes, {A_k}, of the kernels were used as the parameters of the optimisation. The basic idea was to begin with broad kernels and to iteratively refine the parameterisation by introducing additional, narrower kernels between the existing ones. The method gave reasonably good results, but required the choice of several parameters (initial resolution; number of resolution levels; separation of resolution levels; number of iterations per level; position, spacing and widths of Cauchy kernels at each resolution; etc.). The complexity of the optimisation algorithm also scaled poorly with the number of training shapes. We describe here a simpler scheme that involves only one parameter, scales linearly with the number of training examples, and turns out to produce better models.

In our new approach, the values for the positions, {a_k}, and widths, {w_k}, of the Cauchy kernels are chosen stochastically. The magnitudes, {A_k}, of the kernels are still used as the parameters of the optimisation. The values for {a_k} are selected from a uniform distribution over the surface of the sphere. The widths of the kernels, {w_k}, are chosen from the positive half of a Gaussian distribution with zero mean and standard deviation σ_G. The convergence of the algorithm is relatively insensitive to the value of σ_G; a value of σ_G = 12 was used in the experiments reported below.

Our previous optimisation method also scales poorly with the number of training examples. For larger training sets (of the order of 100 examples or more) the number of parameters to be optimised simultaneously prevents the local optimisation algorithm from converging reliably. It is also not well suited to an iterative model-building scheme where examples are segmented and added one by one. These problems can be overcome by optimising the parameterisation of one example at a time. This is achieved by cycling through the training set, optimising the current re-parameterisation of each example before moving on to the next iteration. Note that we still consider the entire training set (i.e. the model is built using the current parameterisations of all examples) but the parameterisation
of each example is optimised independently. To remove any bias, the ordering of the training set is permuted at random before each iteration. Finally, the positions of corresponding points depend on the pose parameters of each example as well as the shape parameterisation. We explicitly optimise the pose parameters (scaling, s; rotation, R; and translation, t) of each shape. The translation can be dealt with directly by setting the centre of gravity of each re-parameterised shape to the origin after each iteration; the other pose parameters are optimised explicitly. The algorithm is summarised below for clarity; a minimal code sketch follows the summary.

1. repeat
   a) Randomise the ordering of the examples.
   b) For each example i:
      i. randomly select a value for w_i and a_i;
      ii. optimise F with respect to A_i (the Nelder–Mead Simplex algorithm [24] was used to obtain the results reported here);
      iii. transform the centroid of the re-parameterised shape to the origin;
      iv. optimise F with respect to s_i and R_i using the Simplex algorithm.
2. until convergence.
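The sketch below (ours; mdl_cost and apply_kernel stand in for the operations described above, and a coarse grid search over the kernel amplitude replaces the Simplex step, whose details we omit) shows the overall structure of the stochastic scheme; pose optimisation is left out for brevity:

    import numpy as np

    def random_point_on_sphere(rng):
        v = rng.normal(size=3)
        return v / np.linalg.norm(v)

    def optimise_correspondences(shapes, mdl_cost, apply_kernel,
                                 sigma_g=12.0, n_passes=50, seed=0):
        # shapes: list of current shape parameterisations
        # mdl_cost(shapes): objective for the whole training set
        # apply_kernel(shape, a, w, A): shape re-parameterised by one kernel
        rng = np.random.default_rng(seed)
        for _ in range(n_passes):
            for i in rng.permutation(len(shapes)):   # random order each pass
                a = random_point_on_sphere(rng)      # kernel position (uniform)
                w = abs(rng.normal(0.0, sigma_g))    # kernel width (half-Gaussian)
                best_A, best_F = 0.0, mdl_cost(shapes)
                for A in np.linspace(0.1, 2.0, 8):   # coarse search over amplitude
                    trial = list(shapes)
                    trial[i] = apply_kernel(shapes[i], a, w, A)
                    F = mdl_cost(trial)
                    if F < best_F:
                        best_A, best_F = A, F
                if best_A > 0.0:                     # keep the best kernel found
                    shapes[i] = apply_kernel(shapes[i], a, w, best_A)
        return shapes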
5 Measuring Model Performance
We need an objective and consistent basis for evaluating the performance of different models. The properties in which we are interested are: generalisation ability, compactness and specificity. Each is described in more detail below. Each of the measures we propose is a function of the number of modes retained in the shape model. It is also possible to compute the standard error associated with each measure, allowing the significance of differences between results obtained using competing approaches to be assessed. Details are given in [10].

Generalisation Ability. The generalisation ability of a model measures its ability to represent unseen instances of the class of object modelled. This is a fundamental property – if a model is over-fitted to the training set, it will be unable to generalise to unseen examples. Generalisation ability can be measured using leave-one-out reconstruction. A model is built using all but one member of the training set and then fitted to the excluded example. The accuracy to which the model can describe the unseen example is measured, and the process is repeated excluding each example in turn:

G(M) = (1/n_s) Σ_{i=1}^{n_s} |x_i(M) − x_i|²,

where n_s is the number of shapes and x_i(M) is the model reconstruction of shape x_i using the model built excluding x_i, with M modes retained.
Specificity. Specificity is central to the usefulness of a model. A specific model should only generate instances of the object class that are similar to those in the training set. This can be assessed by generating a population of N instances using the model and comparing them to the members of the training set. We define a quantitative measure of specificity:

S(M) = (1/N) Σ_{j=1}^{N} |x_j(M) − x′_j|²,

where x_j(M) are shape examples generated by the model using M shape modes and x′_j is the nearest member of the training set to x_j(M). The value of N is not critical as long as it is large compared to n_s; in the experiments reported below we used N = 10000.

Compactness. A compact model is one that has as little variance as possible and requires as few parameters as possible to define an instance. We measure compactness using the cumulative variance for the first M modes of the model:

C(M) = Σ_{m=1}^{M} λ_m.
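The three measures are straightforward to evaluate once a model-building routine is available. A sketch (ours; build_model, reconstruct and sample_shape are placeholders for the operations defined by the shape model):

    import numpy as np

    def generalisation(X, build_model, reconstruct, M):
        # G(M): mean leave-one-out reconstruction error using M modes
        errors = []
        for i in range(len(X)):
            model = build_model(np.delete(X, i, axis=0))
            errors.append(np.sum((reconstruct(model, X[i], M) - X[i]) ** 2))
        return np.mean(errors)

    def specificity(X, sample_shape, M, N=10000, seed=0):
        # S(M): mean squared distance from N model-generated shapes to
        # their nearest neighbours in the training set X
        rng = np.random.default_rng(seed)
        total = 0.0
        for _ in range(N):
            x = sample_shape(M, rng)
            total += np.min(np.sum((X - x) ** 2, axis=1))
        return total / N

    def compactness(eigvals, M):
        # C(M): cumulative variance of the first M modes
        return np.sum(eigvals[:M])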
6 Shape Analysis in Clinical Studies
The compact parametric description of shape variation provided by a statistical shape model provides a basis for analysing shape differences between subgroups or over time. We consider the case of two subgroups. If a shape model is constructed using the methods described above for a set of training shapes, each example x_i in the training set can be described by its shape vector b_i. If the training set consists of two subgroups, linear discriminant analysis (LDA) of the shape vectors can be used to find the vector w in b space that best separates the two subgroups. The shape vectors b_i of the training shapes are projected onto w and Fisher’s criterion (the ratio of the between-class variance to the within-class variance) is optimised with respect to the choice of w. There is a closed-form solution for w [30]; a sketch is given below. Since w is a vector in b space, the shape variation that accounts for the difference between the two subgroups can be animated using x = x̄ + Pwd, where d is a scalar. Having performed LDA, we wish to know how well it is possible to separate the two subgroups using w. In our experiments the data were normally distributed along w, so we were able to use a simple t-test. In order to obtain an unbiased estimate of separability, different training and test sets should be used. This can be achieved, when only a limited number of shape examples are available, by using bootstrap sampling from the data set [16] to produce many (10,000 in the experiments reported below) independent training sets. This allows many trials of LDA to be performed, providing a distribution of t values. When two different methods of model building are used for shape analysis, their relative performance in separating the two subgroups can be assessed by comparing their distributions of t values.
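The closed-form solution and the associated t statistic are easy to state concretely. A sketch (ours; B1 and B2 hold the shape vectors of the two subgroups as rows, and a small ridge term guards against a singular within-class scatter when many modes are retained):

    import numpy as np
    from scipy import stats

    def fisher_lda(B1, B2, ridge=1e-8):
        # Fisher discriminant direction: w ~ Sw^-1 (mean1 - mean2)
        m1, m2 = B1.mean(axis=0), B2.mean(axis=0)
        Sw = (np.cov(B1, rowvar=False) * (len(B1) - 1)
              + np.cov(B2, rowvar=False) * (len(B2) - 1))  # pooled scatter
        Sw += ridge * np.eye(Sw.shape[0])
        w = np.linalg.solve(Sw, m1 - m2)
        return w / np.linalg.norm(w)

    def separation_t(B1, B2, w):
        # t statistic for the separation of the two groups projected onto w
        return stats.ttest_ind(B1 @ w, B2 @ w).statistic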
Fig. 1. A quantitative comparison of hand models built using manual landmarks to define correspondence (red), the method described in [14] (green) and that presented in this paper (blue): a) compactness; b) generalisation ability; c) specificity. Error bars are ±1 standard error.
Fig. 2. A quantitative comparison of the SPHARM (red) and MDL (blue) hippocampus models: a) compactness; b) generalisation ability; c) specificity. Error bars are ±1 standard error.
7 Results

7.1 Optimisation Strategy
The stochastic optimisation method proposed in this paper was compared to the multi-resolution scheme described in [14] and a manually defined correspondence (for details see [10]). The methods were tested on a set of 10 2D hand outlines. The manual method gave a value of F = 745.19 for the objective function, the multi-resolution scheme gave a value of F = 725.13, whilst the stochastic, multi-scale method proposed here gave a value of F = 710.02, a substantial improvement. The compactness, generalisation and specificity measures for each model are plotted in Figure 1. The results show clearly that the optimisation method proposed here offers a significant improvement in terms of generalisation and specificity. Although there is also a difference in compactness, it is not statistically significant. Similar results are obtained for other objects; see [10] for details.

7.2 Hippocampus Model
Shape models of the right hippocampus were constructed from a set of 82 right hippocampus shapes segmented from magnetic resonance images of the brain,
one using the SPHARM method [21] and one using the MDL method described above. The SPHARM model had an objective function value of F = 19529, substantially larger than that of the MDL model, which had a value of F = 18406. The compactness, specificity and generalisation ability of the two models are compared in Figure 2. The plots show that the MDL model is significantly better than the SPHARM model.
7.3 Shape Discrimination of Hippocampi in Schizophrenia
The models described in the previous section were used to discriminate between the hippocampal shape of schizophrenic patients and normal controls. The model parameters were divided into two subgroups: C = {c_i : i = 1 ... 26} (the control subjects) and S = {s_i : i = 1 ... 56} (the schizophrenic patients). LDA was performed to give a discrimination vector w. The difference in shape between the two groups can be visualised by creating shape parameters that produce a mode of variation along w, as described in Sect. 6. The discriminant mode of variation for the MDL model is shown in Figure 3. The main effect, as the discriminant parameter is moved towards the schizophrenic group, is a thickening of the structure. To investigate the separability between the control and schizophrenic groups we performed repeated classification trials for the full data set, using discriminant vectors w derived from bootstrap samples, as described in Sect. 6. Each trial gave a t value for the separation between group means so, for each model, we obtained a distribution of t values which allowed us to calculate a mean and standard deviation for t_MDL and t_SPHARM. These values depended on the number of modes retained in the model so, in each case, we took the number of modes that gave the largest ratio between the mean t value and its standard error (i.e. the t value most distinct from zero). For both models this resulted in 10 modes being retained. For the MDL model we obtained a mean t_MDL of 2.128 with a standard deviation of 0.220, giving a probability that both groups were drawn from the same distribution of p = 0.036. For the SPHARM model we obtained a mean t_SPHARM of 2.115 with a standard deviation of 0.232 and p = 0.038. Thus, not only did the MDL model have significantly better generalisation, specificity and compactness than the SPHARM model, it also gave more significant discrimination between the two patient groups. To test if the difference between the two models was significant we performed a second t-test on the distributions of t values (which were approximately normal). The difference between the means of the two distributions of t values was highly significant (t = 5.52, p ≈ 10⁻⁸), suggesting a real (though in this case relatively small) advantage for the MDL model. We repeated the experiments using a linear support vector machine classifier (which might be expected to give better generalisation), and obtained almost identical results.
Fig. 3. The discriminant mode of the hippocampal data, varied from C (control) to S (schizophrenic), using the MDL model. The range of the animation is ±3 standard deviations (computed over the training set). The most noticeable effect is a thickening of the structure.
8 Discussion and Conclusions
We have described an efficient method of optimising MDL shape models. The new method of optimisation produces significantly improved results, scales better with the number of training shapes, and eliminates many of the parameters required by the method of Davies et al. [14]. The improvement is probably due to the multi-scale nature of the method, which allows it to select Cauchy kernels of arbitrary size at any time, enabling it to escape local minima. The effect is similar to that obtained by the practice of ‘bouncing’ (restarting at a lower resolution) as often applied in multi-resolution methods. It was also shown how statistical models of shape can be used to characterise and quantify shape differences between two subgroups of a class of object. The method was applied to discriminate between the hippocampal shapes of healthy and schizophrenic subjects. It was shown that the MDL model provided better discrimination than the SPHARM model. This is due to the superior specificity of the MDL model: the SPHARM model also captures structural noise, which partially conceals the real biological effects present in the data. Although the results show that neither the SPHARM nor the MDL model can be used to classify subjects reliably on an individual basis, both can be used to discriminate between populations. This could have important applications in disease progression studies and drug trials. The results also show that fewer subjects would be required to achieve a given level of significance using the MDL model rather than the SPHARM model, an important practical consideration. This establishes the MDL approach as a practical tool for biomedical research and clinical application.
Acknowledgements. For most of this work, Rhodri Davies was funded by the BBSRC and AstraZeneca, Alderley Park, Macclesfield, UK. Carole Twining was funded by the EPSRC/MRC IRC grant (“from medical images and signals to clinical information”). The hippocampus dataset and the SPHARM model were kindly provided by G. Gerig, M. Styner and co-workers from University of North Carolina, Chapel Hill. The schizophrenia study was supported by the Stanley Foundation.
References

1. A. Baumberg and D. Hogg. Learning flexible models from image sequences. In European Conference on Computer Vision (ECCV), 1994.
2. A. Baumberg and D. Hogg. An adaptive eigenshape model. In British Machine Vision Conference (BMVC), 1995.
3. C. Brechbühler, G. Gerig, and O. Kübler. Parameterisation of closed surfaces for 3-D shape description. Computer Vision, Graphics and Image Processing, 61:154–170, 1995.
4. A. Brett and C. Taylor. Construction of 3D shape models of femoral articular cartilage using harmonic maps. In Medical Image Computing and Computer Assisted Intervention (MICCAI), 2000.
5. A. D. Brett, A. Hill, and C. J. Taylor. A method of 3D surface correspondence and interpolation for merging shape examples. Image and Vision Computing (IVC), 17:635–642, 1999.
6. A. D. Brett, A. Hill, and C. J. Taylor. A method of automatic landmark generation for automated 3D PDM construction. IVC, 18:739–748, 2000.
7. G. E. Christensen, S. C. Joshi, and M. Miller. Volumetric transformation of brain anatomy. IEEE TMI, 16:864–877, 1997.
8. D. L. Collins, C. J. Holmes, T. M. Peters, and A. C. Evans. Automatic 3D model-based neuroanatomical segmentation. Human Brain Mapping, 3:190–208, 1995.
9. T. F. Cootes, A. Hill, C. J. Taylor, and J. Haslam. The use of active shape models for locating structures in medical images. IVC, 12(6):276–285, July 1994.
10. Rh. H. Davies. Learning Shape: Optimal Models of Natural Variability. PhD thesis, University of Manchester, UK, 2002. www.isbe.man.ac.uk/theses/rhodridavies2002.pdf
11. Rh. H. Davies, T. F. Cootes, and C. J. Taylor. A minimum description length approach to statistical shape modelling. In Information Processing in Medical Imaging (IPMI), 2001.
12. Rh. H. Davies, T. F. Cootes, C. J. Twining, and C. J. Taylor. An information theoretic approach to statistical shape modelling. In BMVC, 2001.
13. Rh. H. Davies, T. F. Cootes, J. C. Waterton, and C. J. Taylor. An efficient method for constructing optimal statistical shape models. In MICCAI, 2001.
14. Rh. H. Davies, C. J. Twining, T. F. Cootes, J. C. Waterton, and C. J. Taylor. 3D statistical shape models using direct optimisation of description length. In ECCV, 2002.
15. Rh. H. Davies, C. J. Twining, T. F. Cootes, J. C. Waterton, and C. J. Taylor. A minimum description length approach to statistical shape modelling. IEEE TMI, 21(5):525–537, May 2002.
16. B. Efron, editor. The Jackknife, the Bootstrap and Other Resampling Plans. SIAM, 1982.
17. A. F. Frangi, D. Rueckert, J. A. Schnabel, and W. J. Niessen. Automatic 3D ASM construction via atlas-based landmarking and volumetric elastic registration. In IPMI, 2001.
18. G. Gerig, M. Styner, D. Jones, D. Weinberger, and J. Lieberman. Shape analysis of brain ventricles using SPHARM. In Mathematical Methods in Biomedical Image Analysis (MMBIA), 2001.
19. G. Gerig, M. Styner, M. Shenton, and J. Lieberman. Shape vs. size: Improved understanding of the morphology of brain structures. In MICCAI, 2001.
20. A. Hill and C. J. Taylor. Automatic landmark generation for point distribution models. In BMVC, 1994.
21. A. Kelemen, G. Székely, and G. Gerig. Elastic model-based segmentation of 3D neurological data sets. IEEE TMI, 18(10):828–839, 1999.
22. A. C. W. Kotcheff and C. J. Taylor. Automatic construction of eigenshape models by direct optimisation. Medical Image Analysis, 2(4):303–314, 1998.
23. D. Meier and E. Fisher. Parameter space warping: Shape-based correspondence between morphologically different objects. IEEE TMI, 21:31–47, 2002.
24. W. Press, S. Teukolsky, W. Vetterling, and B. Flannery. Numerical Recipes in C (2nd Edition). Cambridge University Press, 1992.
25. A. Rangarajan, H. Chui, and F. L. Bookstein. The softassign Procrustes matching algorithm. In IPMI, 1997.
26. J. R. Rissanen. Stochastic Complexity in Statistical Inquiry. World Scientific, 1989.
27. D. Rueckert, A. F. Frangi, and J. A. Schnabel. Automatic construction of 3D statistical deformation models using non-rigid registration. In MICCAI, 2001.
28. G. Székely, A. Kelemen, C. Brechbühler, and G. Gerig. Segmentation of 2-D and 3-D objects from MRI volume data using constrained elastic deformations of flexible Fourier contour and surface models. Medical Image Analysis, 1:19–34, 1996.
29. Y. Wang, B. S. Peterson, and L. H. Staib. Shape-based 3D surface correspondence using geodesics and local geometry. In Computer Vision and Pattern Recognition (CVPR), 2000.
30. A. Webb. Statistical Pattern Recognition. Arnold, 1999.
Minimum Description Length Shape and Appearance Models

Hans Henrik Thodberg

Informatics & Mathematical Modelling, Technical University of Denmark, 2800 Lyngby, Denmark
[email protected], http://www.imm.dtu.dk/~hht
Abstract. The Minimum Description Length (MDL) approach to shape modelling is reviewed. It solves the point correspondence problem of selecting points on shapes defined as curves so that the points correspond across a data set. An efficient numerical implementation is presented and made available as open source Matlab code. The problems with the early MDL approaches are discussed. Finally the MDL approach is extended to an MDL Appearance Model, which is proposed as a means to perform unsupervised image segmentation. Keywords: Shape Modelling, Minimum Description Length, Appearance Models, Point Correspondence Problem, Unsupervised Vision, Image Segmentation.
1 Introduction

In order to construct an Active Shape or Active Appearance Model [1,2] one needs a number of training examples, where the true location of the underlying shape is known. From thereon these models are automatically generated. This paper addresses the problem of constructing these training examples automatically. The problem is divided into two: the first is to define the shapes in terms of contours; the second is to define marks on these contours. The marks should be defined so that marks on different examples are located at corresponding locations; hence the second problem is sometimes denoted the point correspondence problem, and it has been the subject of a series of papers by Taylor and collaborators [3,4,5] founded on MDL. This paper reviews the development, describes a simple and efficient implementation and demonstrates it on open and closed contours. The Matlab code and the examples are published to facilitate the dissemination of this technique in medical imaging and other applied fields.
Finally it is proposed to extend the MDL approach also to solve the first problem: defining the shape contours in the first place, through unsupervised learning with the MDL Appearance Model.
2 History of Minimal Shape Modelling

The development of the MDL approach to the point correspondence problem is marked by three important papers.

Kotcheff and Taylor – 1998 [3]. In this paper the problem is formulated as finding a reparametrisation of each contour. The cost function Cost = Σ log(λ_m + λ_cut) is the sum over all eigenvalues λ_m “moderated” by a cut-off λ_cut, and the optimisation technique is a genetic algorithm (GA). The contribution of the paper is conceptual, while the algorithm performance is not spectacular.

Davies, Cootes and Taylor – 2001 [4]. This paper uses MDL as the basis for the cost function. The paper computes the cost of transmitting the PCA model and the PCA-coded data, and the optimal trade-off between the precisions of the various components is derived. This leads to a description length expression, which allows a determination of the optimal number of principal components independent of the precision. A new method of representing reparametrisation is introduced. The performance is impressive on several 2D data sets. The optimisation is still a GA, but in the same year more powerful optimisation methods were introduced, so that the computation time is practical – of the order of four hours in Matlab. In addition, generalisation to 3D is possible. This work attracted a lot of attention due to its combination of a principled approach and wide applicability, and it received several awards.

Davies, Twining, Cootes, Waterton and Taylor – 2002 [5]. This is the first journal article on MDL shapes and contains a change in the formalism. Gone is the full PCA MDL model, and a master example is selected to have fixed parameterisation, to prevent the shapes from collapsing onto a small part of the contour.

Questions. Several questions come to mind in this development:
• Why was the MDL approach of 2001 abandoned, and is there something wrong with it?
• Can MDL be used to determine the number of principal components in shape modelling?
• Is it possible to run MDL with something faster than genetic algorithms and still avoid local minima?
• How does one prevent the reparametrisations from diverging (running away)? Is one fixed master example sufficient?
• How does the formalism apply to open curves?
• What is the best way to begin using MDL on 2D problems?
Outline. These questions are answered in this paper, which is organised as follows:
• Section 3 describes a simple and efficient version of the MDL approach to 2D shapes and demonstrates it on artificial data.
• Section 4 analyses the theoretical development of the MDL shape approach.
• Section 5 applies the method to medical shape data.
• Section 6 generalises the method to MDL Appearance Models.
• Section 7 contains the conclusions.
Matlab source code and test data are available on www.imm.dtu.dk/~hht.
3 An Efficient MDL Shape Algorithm

This section describes the efficient MDL shape algorithm used for the simulations in this paper. The algorithm applies to a set of shapes defined as curves in 2D space. Shape sets are classified into three kinds: closed curves, open curves with fixed end-points and open curves with free end-points. Fixed end-points means that the shape has its end-points fixed at the curve end-points, while free end-points means that the “true” shape is an unknown subset of the open curve, i.e. the determination of the shape end-points is part of the task. The curves are represented as a polyline, i.e. an ordered list of points. The arc length along the curve is normalised to run from 0 to 1. We are now seeking a set of 2^L + 1 marks on each curve to represent the shape. They are called marks to indicate that they carry a meaning (like landmarks, postmarks, hallmarks etc.). For closed shapes, the start- and end-points (number 0 and 2^L) are identical. The mark locations are specified in a hierarchical manner (as described by Davies 2001), on L levels. For closed curves with 65 marks, we specify on the first level the coordinates of marks 0 and 32 by their absolute arc length position. On the second level, marks 16 and 48 are specified by parameters between 0 and 1. For example mark 16 can be anywhere on the curve between marks 0 and 32, corresponding to the extremes 0 and 1. On the third level the marks 8, 24, 40 and 56 are specified in between already fixed marks. This is continued until level 6 so that all marks are specified. For open fixed-end curves, level 1 places only mark 32, while for open free-end curves there are three marks on level 1, namely 0, 32 and 64. The end-marks are defined by two positive parameters describing the distance of the end-marks from the ends of the curve. The initial shape can be defined by marks placed evenly in arc length by setting all parameters to a = 0.5 (except for the end-points). Alternatively a priori knowledge of a good starting guess can be used. Closed curves should be roughly aligned initially. Statistical shape analysis is now performed in the usual way. The number of marks is N = 2^L for closed curves and N = 2^L + 1 for open curves (free as well as fixed). First the shapes are centred and aligned to the mean shape normalised to one, i.e. the rms radius of the mean is 1/√N. The mean is determined using Kent’s method [6] by
representing mark positions as complex numbers and diagonalising the hermitian N-by-N covariance matrix of the set; the mean is the leading eigenvector. If the number of shapes s is smaller than N, the “dual” s-by-s matrix is diagonalised instead. The covariance matrix of the aligned shapes (normalised with the number of shapes, but not with the number of points) is then formed and principal component analysis is performed, yielding the eigenvalue spectrum.

The optimisation does not need to be done on all marks, but only on marks up to a given level; typically we adjust levels 1, 2 and 3. These active marks are called nodes because the curve reparametrisations evolve kinks at these marks. The optimisation adjusts the node parameters to optimise the correspondence of all the marks over the set of examples. The parameters of levels 4, 5 and 6 are frozen at 0.5, corresponding to an even distribution in arc length, to capture the shape variation between the nodes.

The objective function is derived from the MDL principle. The cost describes the information needed to transmit the PCA representation of the shapes, i.e. the principal components. For a mode m with large eigenvalue the cost is log(λ_m), while for smaller lambdas it should tend to a constant. We therefore introduce an important parameter λ_cut which separates these two regimes and use a cost expression from Davies 2002 in the low-lambda region: log(λ_cut) + (λ_m/λ_cut − 1). (Davies’ expression was simplified by approximating (s+3)/(s−2) by 1.) Adding the constant 1 − log(λ_cut) leads to our final choice for the total cost:

Description Length = Σ_m L_m,   (1)
L_m = 1 + log(λ_m/λ_cut)   for λ_m ≥ λ_cut,
L_m = λ_m/λ_cut   for λ_m < λ_cut.

This cost has the attractive properties that it tends to zero when all eigenvalues tend to zero, and both L_m and dL_m/dλ_m are continuous at the cut-off. In plain words, when λ_m falls below λ_cut, the benefit of decreasing it further is no longer logarithmic, but levels off and reaches a minimum, 1 unit below the transition point. A mode with eigenvalue λ_cut contributes on average a variance of λ_cut/N per mark, and since the rms radius of the aligned shapes is 1/√N, the mode contributes a standard deviation per rms radius of σ_cut = √λ_cut. We specify λ_cut in terms of σ_cut and use σ_cut = 0.003 in all the simulations in this paper. This corresponds to a cut-off at 0.3 pixels for shapes with original rms radius 100 pixels. A crucial requirement for the cost is that it be insensitive to N in the high-N limit: if N is doubled, the rms radius of the aligned shapes decreases by √2. This balances the doubling of all eigenvalues that would otherwise occur, with the result that the high end of the eigenvalue spectrum – and hence the cost – is unchanged.

The shape representation assigns the same weight to all marks in the Procrustes alignment and in the PCA; one could have a weight proportional to the spacing and to some prior, but that would complicate the algorithm. As a consequence, the centre of gravity and rms radius of a shape change as the nodes shift. This gives rise to effects which are often not desirable, but are a consequence of the chosen simplicity. One effect is that the marks can pile up in some areas and thereby avoid describing the rest and reach a small description length. One way to avoid this run-away is to select a single shape as master example (as introduced by Davies 2002) for which the marks are not allowed to move. Its marks
can be positioned at landmark positions, for instance by manual annotation by an expert, at conspicuous locations, e.g. where the curvature is at a local maximum. The iterative optimisation can then begin. The nodes, e.g. 8, are ordered according to ascending level. Each node is associated with a step length, initially set to 0.01. These 8 step lengths are automatically decreased by the algorithm. Now the parameters a_node for each node and each example are probed, one at a time, according to the following pseudo-code, which runs over a number of passes, until the result has stabilised, typically 40 passes.

    Loop over passes
      Loop over nodes
        Loop over 5 steps
          Loop over examples
            Loop over + and - step
              Probe a(node) = a(node) +- step of example
              Recompute marks of example
              Do Procrustes of set
              Do PCA of set
              Compute new MDL
              If new MDL is lower accept and break loop
              Undo a(node) change
            End of +- step loop
          End of example loop
          If <5% of a(node)'s changed, divide step(node) by 2
        End of step loop
      End of node loop
    End of passes loop
The three lines in bold (recomputing the marks, the Procrustes alignment of the set and the PCA of the set) each account for approximately 1/3 of the processing time. The step lengths are adaptive, so there are no parameters to tune. For the master example a special code is used: the nodes of this example should not be moved, but if a node parameter is far out of correspondence with the rest, it is very slow for all the others to move individually to the lonely master. If Mohammed does not come to the mountain, then the mountain must come to Mohammed, and therefore when the master example is encountered in the pseudo-code above, all the other examples are given a step simultaneously to allow for a collective move towards the master. The convergence time increases with decreasing cut-off, so if a very low cut-off is desired, it can be effective to run the optimisation with a larger cut-off initially, decreasing to the desired value at the end. This was not needed for the examples in this paper.

Example Box-Bump. As a first example, consider the 24 shapes in Figure 1, using σ_cut = 0.003, 40 passes, 64 marks and 8 nodes. The basic evaluation in this procedure consists of the reparametrisations and the diagonalisation of a complex 24-by-24 matrix and a real 24-by-24 matrix. 140 evaluations are made per second on a 1.2 GHz PC in Matlab. The number of evaluations in 40 passes is 40 passes · 8 nodes · 5 steps · 24 examples · 2 signs ≈ 80,000 evaluations, which takes approximately 10 minutes. Figure 2 shows the convergence of the node parameters, and Figure 3 illustrates the modes.
Fig. 1. The 24 ‘box-bump’ shapes, created with a bump at a varying location and with varying aspect ratio of the box (first used by Davies et al.). Eight nodes are optimised; the first example was annotated manually as shown, while the corresponding nodes were placed on the other examples by the MDL algorithm in 10 minutes.
Fig. 2. The course of the optimisation of the parameter of the node to the right of the bump in the box-bump problem is shown for each example. The master example is kept fixed.
Fig. 3. The two first modes of the box-bump set after MDL. Shown is the mean shape in green with red marks. The whiskers emanating from the marks indicate three standard deviations of the principal components. The mean has indeed captured the shape of a small semicircular bump and the first component is close to a horizontal displacement of the bump. To keep the centre of gravity fixed, the movement of the bump is counterbalanced by an opposite movement of the box. The second principal component describes the aspect ratio.
Improved Control of Run-Away. In some cases, a single fixed master example is not sufficient to keep the whole set in place. For example the free end-points of open curves can drift systematically to one side or the other, neglecting the master. This is because the statistical weight of the majority can outweigh the single master and the gain of run-away exceeds the cost of a single outlier. A remedy to this mutiny is to add a stabilising term to the MDL cost. Instead of fixing the node parameters of the master, one introduces a target a_i^target for the average parameter a_i^average of each node i by means of a quadratic cost:

NodeCost = Σ_i (a_i^average − a_i^target)² / T²,   (2)
where T is a chosen tolerance, so if the average drifts e.g. T = 0.05 away from the target, one unit is added to the cost. The algorithm can in general be run in four modes. Mode 1 fixes the node parameters of one master example. Mode 2 fixes the averages of the node parameters as described above. Mode 1 maintains the pure learning idea of one labelled example among the unlabelled rest, while mode 2 has stability towards run-away and a symmetric treatment of all examples. Finally we introduce modes 3 and 4, where the dynamics is controlled by the node cost with “moving target” values, which are adjusted at the start of each pass. In mode 3, an annotated master example is provided, and the moving targets are adjusted such that the master case relaxes in agreement with the annotation. In mode 4, no annotated example is used. Instead the moving targets are adjusted such that the average node parameters relax on desired values – a neutral choice is 0.5 for all parameters, which leads the marks to be located on average evenly in arc length. All four modes are supported by the open source Matlab code. Mode 3 is to be preferred over mode 1: it maintains the attractive learning paradigm of one labelled example among many unlabelled, but avoids the troublesome, unsymmetrical treatment of the master example of mode 1.
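Putting (1) and (2) together, the total objective used in the stabilised modes can be written compactly. A minimal sketch (our transcription, not the published Matlab code; lambdas is the eigenvalue spectrum of the PCA on the aligned marks):

    import numpy as np

    def description_length(lambdas, lambda_cut):
        # Cost (1): L_m = 1 + log(lambda_m/lambda_cut) for lambda_m >= lambda_cut,
        #           L_m = lambda_m/lambda_cut otherwise
        r = np.asarray(lambdas, dtype=float) / lambda_cut
        return float(np.sum(np.where(r >= 1.0, 1.0 + np.log(np.maximum(r, 1e-300)), r)))

    def node_cost(a_average, a_target, T=0.05):
        # Stabilising term (2): quadratic penalty on drift of the average
        # node parameters away from their targets
        d = (np.asarray(a_average) - np.asarray(a_target)) / T
        return float(np.sum(d * d))

    def total_cost(lambdas, lambda_cut, a_average, a_target, T=0.05):
        return description_length(lambdas, lambda_cut) + node_cost(a_average, a_target, T)

With σ_cut = 0.003 one would take lambda_cut = 0.003**2, following the relation σ_cut = √λ_cut given in Section 3.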
4 Theoretical Development of MDL Shape Models

Davies 2001 presents a fundamental MDL analysis of PCA models. The transmission of the shape data set is done by transmitting the mean value and t eigenvectors (the model) and, for each data example, transmitting the t principal components and the n residuals (n = 2N). The residuals stem from the use of a limited number of modes t, plus an enhancement due to the finite precision used to transmit the model and the principal components. The enhancement factor is computed as α = ns / (n(s − 1) − t(n − s)). The optimal number of principal components can be determined, i.e. MDL can be used to determine the proper model complexity for the data set at hand, and if the number of data examples increases, the optimal t increases. For a fixed number t, the MDL expression is of the form

DL = Σ_{λ_m ≥ λ_cut} log λ_m + K log( Σ_{λ_m < λ_cut} λ_m ),   (3)
where K is a constant. This expression is similar to the Bayesian BIC expression in [7]. There are two problems with this MDL shape model. Firstly, α was computed wrongly; the correct expression is α = ns / (ns − (n + tn + ts)). This diverges when the dimension of the PCA coding (the mean value contributes n, the loadings vectors tn and the scores ts) equals the dimension ns of the original data. Thus the total MDL cost can be computed only for t < (ns − n)/(n + s) and not, as in Davies 2001, all the way up to s (if s < n).
Now there is little left of the delicate balance between model complexity and data misfit, which is the strength of MDL, and the optimal number of principal components is no longer determined. However, this is not needed for the point correspondence problem, and this cost is sufficiently powerful to guide the search for the optimal shape parameterisation. As Ockham said, one should not do with more what one can do with less, and in this sense the simpler 2002 version is the true way of doing MDL shape analysis. The new way is not only simpler, it is also more consistent. It can be considered as a reformulation of the problem in a proper reference frame – a transformation of the data to the natural coordinates of the problem. These coordinates, the principal components, are independent of the granularity n by which the shapes are sampled spatially, as demonstrated in Section 3. The formalism has one important and meaningful parameter σ_cut, which controls the desired level of detail in the shape modelling, as explained in Section 3. It is interesting to note the circular path of the history of the point correspondence problem: the Davies 2002 paper returns to a cost which is close to the original, intuitive notion of compactness (with a cut-off) of Kotcheff and Taylor 1998.
5 Examples from Medical Images

The MDL method is applied to a set of 24 contours of metacarpals deduced from standard projection radiographs of the hand in the posterior-anterior projection. We use 64 marks, 8 nodes and a master example, and the algorithm converges after 40 passes in 10 minutes – see Figure 4.
Fig. 4. MDL shape analysis of the second metacarpal. The mean and 3 standard deviations of the first principal component are displayed.
The method is also applied to a set of 32 contours of femurs in the supine projection deduced from projection X-rays (Figure 5). This is treated as an open contour with free end-points. Using a single master example causes a slight run-away, so the more powerful control method, mode 2, is used instead with T = 0.05. The target parameters at both ends are set to 0.04 and the internal node parameter targets are 0.5, corresponding to marks distributed on average evenly in arc length. This is the most neutral choice, and the 9 nodes of the shapes will then in general not relax at conspicuous locations. If such marks are needed, e.g. for visual validation, they can easily be constructed afterwards by interpolation between marks. The computation uses 65 marks, 9 nodes, 40 passes and takes 11 minutes.
In both examples it was checked that starting with different initial conditions leads to the same minimum, so there is no sign of problems with local minima, and therefore there is no need to use genetic algorithms for shapes of this kind.
Fig. 5. Result of MDL analysis of femur contours. Here 14 of the 32 examples are shown with the optimised node positions. It is seen that they appear to be placed in a corresponding manner, and the free end-points have selected different portions of the available shafts.
6 Generalisation to MDL Appearance Model

This work was motivated by its use for creating training examples for ASM and AAM. In the introduction, this problem was divided into two parts: (1) finding the shape contours and (2) defining marks on the contours, and only the latter problem has been treated so far. In this section it is briefly proposed how the MDL approach could be extended to attack the first problem as well. First notice that the point correspondence problem can be viewed as segmentation on a contour: the bump and the sides of the boxes are the segments of the object. These segments are annotated on the master example and the task of the optimisation is to locate the corresponding segments on the other examples. Thus we are performing unsupervised segmentation with a single labelled example. The problem is solved by brute force, using a greedy algorithm minimising the coding length of the shapes expressed in the PCA frame of reference. Now consider the analogous problem of unsupervised image segmentation: assume that we have 100 images of objects of a certain class, for instance X-ray images of a specific bone in a specific view. With just one example labelled by marks on the object boundary, it is often possible for humans to correctly segment the other 99 images in a corresponding way. However, this problem has not been solved in computer vision, despite the human proof-of-concept, despite the huge computer power available, and despite the obvious applications, e.g. the construction of AAMs for medical imaging. In addition it is an interesting cognitive question how humans manage to solve the task. It is suggested that Ockham’s principle of minimum description length, or economy in explanation, can be the guiding principle of this process. If so, the human mind must have extraordinarily flexible and powerful optimisation skills.
The successful and fast MDL shape model reviewed and improved in this paper naturally encourages us to apply a similar solution to the unsupervised object segmentation problem. The following scheme is proposed: match the master shape approximately onto the unlabelled examples using a rigid transformation. This can be done either manually or by some simple template matching method based on image correlation. The shapes on each example are now allowed to reparameterise individually along the contour, exactly as in the MDL shape model, as shown in Figure 6 (left), using the same hierarchical system of nodes. But the shapes are now also allowed to deform by displacements perpendicular to the curve, as shown in Figure 6 (right). Again this is done in a coarse-fine manner using the same node hierarchy, where the displacement drops linearly to zero towards the neighbouring nodes on the same level.
Fig. 6. The reparametrisation used in the MDL shape model (left) and the additional reparametrisation (right) introduced in the MDL Appearance Model for unsupervised vision.
The cost function is based on an appearance model of the set of images. An image template is defined in terms of a triangulation controlled by the shape marks. This is designed on the labelled example. Auxiliary points are constructed from the shape marks in order to span a margin around the object, as is often done in AAM [8], and the triangulation of the image template is defined using the marks and the auxiliary points as vertices. Image texture sampling points are defined relative to this mesh. A priori knowledge can be inserted into the modelling at this stage by increasing the density of sampling points in certain areas of the object, e.g. more points near the shape boundary and fewer points in irrelevant areas. In addition to sampling image intensities, edge intensities can be sampled [9] to emphasise that we want accurate modelling of the edges. For any placement of the marks on the examples, a shape and a texture PCA model can be built and combined into an appearance model. The cost function is then defined in terms of the eigenvalue spectrum and a suitable cut-off value. The algorithm is slower than the MDL shape method due to the sampling of some or all of, say, 10,000 texture values every time an example is reparameterised, but as demonstrated by Stegmann [10] this can be sped up using modern graphics cards.
7 Conclusions

The MDL shape method was reviewed with the following original contributions:
1) Correction of the 2001 PCA MDL formula.
2) Explanation of the problems of full MDL on PCA of shapes.
3) Efficient treatment of closed and open curves.
4) The “Mohammed and the mountain” trick.
5) Alternative control of run-away using average node parameters.
6) A new optimisation scheme with adaptive step length and a multiple coarse-fine strategy.
7) Efficient numerical method: 10 minutes on a 1.2 GHz machine in Matlab.
8) Open source Matlab code and test examples.
9) Generalisation to MDL Appearance Models (a kind of “Manchester United”) for unsupervised image segmentation from one labelled example.

Correspondence with Carole Twining is acknowledged. Pronosco is acknowledged for providing the contours of metacarpals and femurs. Davies clarifies many of the same issues in his thesis [11].
References

1. Cootes, T.F., Hill, A., Taylor, C.J., Haslam, J.: The use of active shape models for locating structures in medical images. Image Vis. Comput. 12 (1994) 355–366
2. Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active appearance models. In: Proc. European Conf. on Computer Vision, Vol. 2. Springer (1998) 484–498
3. Kotcheff, A.C.W., Taylor, C.J.: Automatic construction of eigenshape models by direct optimisation. Med. Image Anal. 2 (1998) 303–314
4. Davies, R.H., Cootes, T.F., Taylor, C.J.: A minimum description length approach to statistical shape modeling. In: 14th IPMI. Springer (2001)
5. Davies, R.H., Twining, C.J., Cootes, T.F., Waterton, J.C., Taylor, C.J.: A minimum description length approach to statistical shape modelling. IEEE Trans. Med. Imaging 21 (2002) 525–537
6. Dryden, I.L., Mardia, K.V.: Statistical Shape Analysis. Wiley (1998)
7. Minka, T.P.: Automatic choice of dimensionality for PCA. Technical Report 514, MIT Media Laboratory (2000)
8. Thodberg, H.H.: Hands-on experience with active appearance models. In: Medical Imaging 2002: Image Proc., Eds. Sonka & Fitzpatrick, Proc. SPIE Vol. 4684 (2002) 495–506
9. Cootes, T.F., Taylor, C.J.: On representing edge structure for model matching. In: Proc. IEEE CVPR, Vol. 1 (2001) 1114–1119
10. Stegmann, M.B.: Analysis and Segmentation of Face Images using Point Annotations and Linear Subspace Techniques. IMM Technical Report IMM-REP-2002-22 (2002)
11. Davies, R.H.: PhD thesis, Univ. of Manchester (2002), http://www.cfn.unimelb.edu.au/rhhd/
Evaluation of 3D Correspondence Methods for Model Building

Martin A. Styner¹, Kumar T. Rajamani¹, Lutz-Peter Nolte¹, Gabriel Zsemlye², Gábor Székely², Chris J. Taylor³, and Rhodri H. Davies³

¹ M.E. Müller Institute for Surgical Technology and Biomechanics, University of Bern, P.O. Box 8354, 3001 Bern, Switzerland, [email protected]
² Computer Vision Lab, Gloriastrasse 35, ETH-Zentrum, 8092 Zürich, Switzerland
³ Division of Imaging Science and Biomedical Engineering, Stopford Building, Oxford Road, University of Manchester, Manchester, M13 9PT, UK †
Abstract. The correspondence problem is of high relevance in the construction and use of statistical models. Statistical models are used for a variety of medical applications, e.g. segmentation, registration and shape analysis. In this paper, we present comparative studies of four different correspondence-establishing methods on three anatomical structures. The goal in all of the presented studies is a model-based application. We have analyzed both the direct correspondence via manually selected landmarks as well as the properties of the model implied by the correspondences, in regard to compactness, generalization and specificity. The studied methods include a manually initialized subdivision surface (MSS) method and three automatic methods that optimize the object parameterization: SPHARM, MDL and the covariance determinant (DetCov) method. In all studies, DetCov and MDL showed very similar results. The model properties of DetCov and MDL were better than those of SPHARM and MSS. The results suggest that for modeling purposes the best of the studied correspondence methods are MDL and DetCov.
1 Introduction
Statistical models of shape show considerable promise as a basis for segmenting, analyzing and interpreting anatomical objects from medical datasets [5,14]. The basic idea in model building is to establish, from a training set, the pattern of legal variation in the shapes and spatial relationships of structures for a given class of images. Statistical analysis is used to give a parameterization of this variability, providing an appropriate representation of shape and allowing shape
We are thankful to C. Brechbühler for the SPHARM software and to G. Gerig for support and insightful discussions. D. Jones and D. Weinberger at NIMH (Bethesda, MD) provided the MRI ventricle data. J. Lieberman and the neuro-image analysis lab at UNC Chapel Hill provided the ventricle segmentations. This research was partially funded by the Swiss National Centers of Competence in Research CO-ME (computer assisted and image guided medical interventions). The femoral head datasets were provided within CO-ME by F. Langlotz.
constraints to be applied. A key step in building a model involves establishing a dense correspondence between shape boundaries over a reasonably large set of training images. It is important to establish the correct correspondences, otherwise an inefficient parameterization of shape will be determined. The importance of the correct correspondence is even more evident in shape analysis, as new knowledge and understanding related to diseases and normal development is extracted based on the established correspondence [10,21]. Unfortunately there is no generally accepted definition of anatomically meaningful correspondence. It is thus difficult to judge the correctness of an established correspondence. In 2D, correspondence is often established using manually determined landmarks [1], but this is a time-consuming, error-prone and subjective process. In principle, the method extends to 3D, but in practice, due to the very small sets of reliably identifiable landmarks, manual landmarking becomes impractical. Most automated approaches pose the correspondence problem as that of defining a parameterization for each of the objects in the training set, assuming correspondence between equivalently parameterized points. In this paper we compare methods introduced by Brechbühler [2], Kotcheff [16] and Davies [8]. A fourth method is based on manually initialized subdivision surfaces similar to Wang [24]. These methods are presented in more detail in Sections 2.1–2.4. Similar approaches have also been proposed, e.g., by Hill [11] and Meier [18]. Christensen [4], Szeliski [22] and Rueckert [20] describe conceptually different methods for warping the space in which the shapes are embedded. Models can then be built from the resulting deformation field [13,9,20]. Brett [3], Rangarajan [19] and Tagare [23] proposed using shape features (e.g. regions of high curvature) to establish point correspondences. In the remainder of the paper, we first present the studied correspondence methods and the measures representing the goodness of correspondence used to compare the methods. In the results section we provide the qualitative and quantitative results of the methods applied to three populations of anatomical objects (left femoral head, left lateral ventricle and right lateral ventricle).
2 Methods
Alignment – As a prerequisite for any shape modeling, objects have to be normalized with respect to a reference coordinate frame. A normalization is needed to eliminate differences across objects that are due to rotation and translation. This normalization is achieved in studies based on the SPHARM correspondence (section 2.2) using the Procrustes alignment method without scaling. In the study based on the MSS correspondence (section 2.1) the alignment was achieved using manually selected anatomical landmarks. MDL and DetCov can align the object via direct pose optimization, an option not used in this paper. Principal Component Analysis (PCA) model computation – A training population of n objects described by individual vectors xi can be modeled by a multivariate Gaussian distribution. Principal Component Analysis (PCA) is performed to define axes that are aligned with the principal directions. First the mean vector x ¯ and the covariance matrix D are computed from the set of object
Fig. 1. Left: Visualization of the left and right lateral ventricle in a transparent human head. Right: The manually selected landmarks on the left ventricle template.
vectors (1). The sorted eigenvalues \lambda_i and eigenvectors p_i of the covariance matrix are the principal directions spanning a shape space with \bar{x} at its origin. Objects x_j in the shape space are described as a linear combination of the eigenvectors based on \bar{x} (2). The shape space here is defined within [-3\sqrt{\lambda_i} \ldots 3\sqrt{\lambda_i}].

\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i ; \qquad D = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x}) \cdot (x_i - \bar{x})^T    (1)

P = \{p_i\} ; \qquad D \cdot p_i = \lambda_i p_i ; \qquad x_j = \bar{x} + P \cdot b    (2)
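The model computation in (1)–(2) translates directly into code. The following NumPy sketch is our own illustration (not the authors' implementation); it assumes each training shape is a flattened vector of m corresponding 3D points, already aligned to a common reference frame:

```python
import numpy as np

def build_pca_shape_model(X):
    """PCA shape model following Eqs. (1)-(2).
    X: (n, 3m) array, one row per aligned training shape.
    Returns mean shape, eigenvalues (descending), eigenvectors (columns)."""
    n = X.shape[0]
    x_bar = X.mean(axis=0)                  # Eq. (1): mean vector
    Xc = X - x_bar
    D = (Xc.T @ Xc) / (n - 1)               # Eq. (1): covariance matrix
    lam, P = np.linalg.eigh(D)              # Eq. (2): D p_i = lambda_i p_i
    order = np.argsort(lam)[::-1]           # principal directions, descending
    return x_bar, lam[order], P[:, order]

def synthesize_shape(x_bar, lam, P, b):
    """Eq. (2): x_j = x_bar + P b, where each b_i is typically restricted
    to [-3 sqrt(lambda_i), 3 sqrt(lambda_i)]."""
    return x_bar + P @ b
```

For n much smaller than 3m, the eigendecomposition would in practice be computed from the small n × n Gram matrix rather than the full covariance matrix; the direct form above is kept for readability.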
2.1 MSS: Manually Initialized Subdivision Surfaces
This method is the only semi-automatic one; all others are fully automatic. The correspondence starts from a set of predefined anatomical landmarks and anatomically meaningful curves determined on the segmented objects using an interactive display (e.g. a spline on the crista intertrochanterica). After a systematic discretization of the higher-dimensional landmarks, a sparsely sampled point set results, which is triangulated in a standardized manner and further refined via subdivision surfaces. The correspondence on the 0th-level meshes is thus given by the manually placed control curves, and on the subsequent levels by the subdivision rule: the triangles are split into four smaller ones, and the new vertices are the midpoints of the pseudo-shortest paths between the parent vertices. This path is the projection onto the original surface of the edges connecting the parent vertices in three-space. The direction of the projection is determined by the normals of the neighboring triangles. This method was successfully applied to organs with a small number of anatomical point landmarks.
2.2 SPHARM: Uniform Area Parameterization Aligned to First Order Ellipsoid
The SPHARM description was introduced by Brechbühler [2] and is a parametric surface description that can only represent objects of spherical topology. The spherical parameterization is computed by optimizing an equal-area mapping of
the 3D quadrilateral voxel mesh onto the sphere and minimizing angular distortions [2]. The basis functions of the parameterized surface are spherical harmonics. SPHARM can be used to express shape deformations [15], and is a smooth, fine-scale shape representation, given a sufficiently small approximation error. Based on a uniform icosahedron subdivision of the spherical parameterization, we obtain a Point Distribution Model (PDM) directly from the coefficients via linear mapping [15]. The correspondence of SPHARM is determined by aligning the parameterization so that the ridges of the first order ellipsoid coincide. It is evident that the correspondence of objects with rotational symmetry in the first order ellipsoid is ambiguously defined.
2.3 DetCov: Determinant of the Covariance Matrix
Kotcheff et al. [16] and later Davies et al. [6] propose to use an optimization process that assigns the best correspondence across all objects of the training population, in contrast to MSS and SPHARM, which inherently assign a correspondence to each individual object. This view is based on the assumption that the correct correspondences are, by definition, those that build the optimal model given the training population. For that purpose they proposed to use the determinant of the covariance matrix as an objective function. The disadvantages of the original implementation were the computationally expensive genetic optimization algorithm and the lack of a re-parameterization scheme. The implementation in this paper is different and is based on the optimization method of [6], which efficiently optimizes the parameterization of the objects. This same optimization scheme was also used for the MDL criterion described in the next section. DetCov minimizes the determinant of the covariance matrix and thus explicitly favors compact models.
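As a rough illustration of this objective, the sketch below evaluates the log-determinant of the shape covariance matrix over its nonzero eigenvalues; the regularization constant eps is our own addition to handle near-zero modes and is not taken from [16] or [6]:

```python
import numpy as np

def detcov_objective(X, eps=1e-6):
    """Log-determinant of the shape covariance matrix (smaller = more
    compact model). X: (n, 3m) array of aligned, parameterized shapes."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    # For n << 3m, the nonzero eigenvalues of the covariance matrix equal
    # those of the small (n x n) Gram matrix divided by (n - 1).
    G = (Xc @ Xc.T) / (n - 1)
    lam = np.linalg.eigvalsh(G)
    return float(np.sum(np.log(lam[lam > 0] + eps)))
```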
2.4 MDL: Minimum Description Length
Davies et al. [6,8] built on the idea of the DetCov method, but proposed a different objective function for the optimization process, based on the Minimum Description Length (MDL) principle. The DetCov criterion can be viewed as a simplification of the MDL criterion. The MDL principle is based on the idea of transmitting a dataset as an encoded message, where the code originates from some pre-arranged set of parametric statistical models. The full transmission then has to include not only the encoded data values, but also the coded model parameters. Thus MDL balances the model complexity, expressed in terms of the cost of transmitting the model parameters, against the quality of fit between the model and the data, expressed in terms of the coding length. The MDL objective function has similarities to the one used by DetCov [6]. The MDL computations for all our studies were initialized with the final position of the DetCov method.
2.5 Measures of Correspondence Quality
In this section we present the measures of the goodness of correspondence used in this paper. Such measures are quite difficult to define, since there is no general agreement on a mathematical definition of correspondence. All methods in this paper produce correspondences that are fully continuous and have an inherent description of connectivity without any self-crossings. Measures of goodness that evaluate a method's completeness and continuity (e.g. as suggested in Meier et al. [18]) are thus not applicable here. We propose the use of four different measures, each biasing the analysis to its viewpoint on what constitutes correct correspondence. The first goodness measure is computed directly on the corresponding points as differences to manually selected anatomical landmarks. The three other measures are of indirect nature, since they are computed using the PCA model based on the correspondence. Further details about the following methods, beyond what is discussed in this paper, can be found in [7]. In brief, the three model-based measures are:
– Generalization: the ability to describe instances outside of the training set.
– Compactness: the ability to use a minimal set of parameters.
– Specificity: the ability to represent only valid instances of the object.

Distance to manual landmarks as gold standard – In medical imaging, human expert knowledge is often used as a substitute for a gold standard, since ground truth is only known for synthetic and phantom data, but not for the actual images. In the evaluation of correspondence methods this becomes even more evident, because the goal is not clearly defined, in contrast to other tasks such as the segmentation of anatomical structures. We propose to use a small set of anatomical landmarks, selected manually by a human expert on each object, as a comparative evaluation basis. We computed the mean absolute distance (MAD) between the manual landmarks and each method's points corresponding to the same landmarks in a template structure. For comparison, we report the reproducibility error of the landmark selection.

Model compactness – A compact model is one that has as little variance as possible and requires as few parameters as possible to define an instance. This suggests that the compactness can be determined as the cumulative variance C(M) = \sum_{i=1}^{M} \lambda_i, where \lambda_i is the i-th eigenvalue. C(M) is measured as a function of the number of shape parameters M. The standard error of C(M) is determined from the training set size n_s as \sigma_{C(M)} = \sum_{i=1}^{M} \sqrt{2/n_s}\,\lambda_i.

Model generalization – The generalization ability of a model measures its capability to represent unseen instances of the object class. This is a fundamental property, as it allows a model to learn the characteristics of an object class from a limited training set. If a model is overfitted to the training set, it will be unable to generalize to unseen examples. The generalization ability of each model is measured using leave-one-out reconstruction: a model is built using all but one member of the training set and then fitted to the excluded example. The accuracy to which the model can describe the unseen example is measured.
Fig. 2. Visualization of the correspondences of a set of landmarks from the template (see Figure 1) in three selected objects from the two ventricle populations using the different methods. The manually determined landmarks are shown as star-symbols and the SPHARM, DetCov and MDL corresponding locations are shown as spheres.
The generalization ability is then defined as the approximation error (MAD) averaged over the complete set of trials. It is measured as a function G(M) of the number of shape parameters M used in the reconstruction. Its standard error \sigma_{G(M)} is derived from the sampling standard deviation \sigma and the training set size n_s as \sigma_{G(M)} = \sigma / \sqrt{n_s - 1}.

Model specificity – A specific model should only generate instances of the object class that are similar to those in the training set. It is useful to assess this qualitatively by generating a population of instances using the model and comparing them to the members of the training set. We define the quantitative measure of specificity S(M) (again as a function of M) as the average distance of uniformly distributed, randomly generated objects in the model shape space to their nearest member in the training set. The standard error of S(M) is given by \sigma_{S(M)} = \sigma / \sqrt{N}, where \sigma is the sample standard deviation of S(M) and N is the number of random samples (N = 10,000 in our experiments). The distance between two objects is computed using the MAD. A minimal model specificity is important in cases where newly generated objects need to be correct, e.g. for model-based deformation or shape prediction.
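The three model-based measures can be made concrete in code. The sketch below reuses build_pca_shape_model from the PCA sketch above and assumes shapes are flattened (x, y, z) landmark vectors; it is intended only to illustrate the definitions of C(M), G(M) and S(M), not to reproduce the authors' software:

```python
import numpy as np

def mad(a, b):
    """Mean absolute distance between corresponding 3D points."""
    return float(np.mean(np.linalg.norm(a.reshape(-1, 3) - b.reshape(-1, 3), axis=1)))

def compactness(lam, M, ns):
    """C(M) and its standard error, per the definitions above."""
    return lam[:M].sum(), np.sqrt(2.0 / ns) * lam[:M].sum()

def generalization(X, M):
    """G(M) via leave-one-out reconstruction with M shape parameters."""
    errs = []
    for i in range(len(X)):
        x_bar, lam, P = build_pca_shape_model(np.delete(X, i, axis=0))
        b = P[:, :M].T @ (X[i] - x_bar)          # project the excluded shape
        errs.append(mad(x_bar + P[:, :M] @ b, X[i]))
    errs = np.asarray(errs)
    return errs.mean(), errs.std() / np.sqrt(len(X) - 1)

def specificity(X, x_bar, lam, P, M, N=10000, rng=np.random.default_rng(0)):
    """S(M): average distance of uniformly sampled shape-space objects
    to their nearest training member."""
    errs = []
    for _ in range(N):
        b = rng.uniform(-3, 3, size=M) * np.sqrt(lam[:M])
        x = x_bar + P[:, :M] @ b
        errs.append(min(mad(x, xi) for xi in X))
    errs = np.asarray(errs)
    return errs.mean(), errs.std() / np.sqrt(N)
```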
Fig. 3. Error graphs of compactness (C(M)), generalization (G(M)) and specificity (S(M)) for the two ventricle studies (left column: left lateral ventricle; right column: right lateral ventricle). The plots are zoomed to M values below 30, since for higher M the plotted values did not change.
Table 1. Mean, maximal and minimal MAD between the manual landmarks and the studied methods for the ventricle studies. It is clearly visible that there is little difference between DetCov and MDL. For the left ventricle, DetCov and MDL have better results than SPHARM. For comparison, the mean landmark selection error was 1.9 mm.

Method  | Left ventricle (Mean / Max / Min) | Right ventricle (Mean / Max / Min)
SPHARM  | 4.47 mm / 6.57 mm / 1.72 mm       | 4.32 mm / 6.70 mm / 1.11 mm
DetCov  | 4.00 mm / 6.16 mm / 1.50 mm       | 4.28 mm / 6.69 mm / 1.10 mm
MDL     | 4.00 mm / 6.15 mm / 1.48 mm       | 4.28 mm / 6.68 mm / 1.10 mm
Model specificity is of lesser importance in the case of shape analysis since no new objects are generated.
3 Results on 3D Anatomical Structures
In the following sections we present the results of applying the studied correspondence methods to three different populations: a left femoral head population of 36 subjects, and left and right lateral ventricle populations of 58 subjects each. The application of the model constructed from the femoral head population is model-based segmentation of the femur from CT for patients undergoing total hip replacement. The application of the two ventricle populations is shape analysis for finding population differences in schizophrenia. In this paper, we focus only on the correspondence issue. It is noteworthy that the studied populations comprise not only healthy subjects, but also patients with pathologically shaped objects.
3.1 Lateral Ventricles
This section describes the studies of the left and right lateral ventricle structures (see Figure 1) in a population of 58 subjects. The segmentation was performed on single gradient-echo MRI using an automatic brain tissue classification [17]. Postprocessing with 3D connectivity, morphological closing and minor manual editing provided simply connected 3D objects. The manual landmarks were selected by an expert with an average error of 1.9mm per landmark. In Figure 2, the results of the correspondence methods are shown for three exemplary cases. The first row shows the good correspondence with the manual landmarks seen in the majority of the objects in this study. The second row shows the frequent case among the remaining objects, in which all three methods have a rather large difference to the manual landmarks. In most cases of disagreement with the manual landmarks, all methods produced similar results. The last row shows the rare case in which SPHARM is clearly further away from the landmarks than DetCov and MDL. The opposite case was not observed.
Fig. 4. Visualization of bad alignment in the first femoral head study on three example objects seen from the same viewpoint. A large rotational alignment error around the long axis of the first order ellipsoid, which is close to the femoral neck-axis, is clearly visible.
Fig. 5. Left: Display in the MSS tool with a single femoral head object and manually placed anatomical curves (anterior viewpoint). Right: Visualization of the femoral head template (posterior viewpoint) and four of its anatomical landmarks (fovea, center of the lesser trochanter, tip of the greater trochanter).
Table 1 displays the landmark errors and Figure 3 displays the error plots; both suggest that DetCov and MDL produce very similar results. Both show smaller errors than SPHARM.
3.2 Femoral Head
This section describes the results on a population of objects from the head region of the femoral bone. The segmentations were performed from CT images with a semi-automated slice-by-slice explicit snake algorithm [12]. The correspondence study was done in two steps. Initially we computed only SPHARM, DetCov and MDL on all available cases (30 total). We realized that the SPHARM correspondence was not appropriate, so the results of the subsequent computations were meaningless (discussed further below). In a second step, we selected only those cases which contained the lesser trochanter in the dataset. For these cases (16) we then computed MSS, SPHARM, DetCov and MDL. The first study was based on the full 30 cases, including 14 datasets with missing data below the calcar. The distal cut of the femoral bone was performed through the calcar perpendicular to the bone axis. The alignment was performed
Fig. 6. Top row and bottom row left: error graphs of compactness (C(M)), generalization (G(M)) and specificity (S(M)) for the femoral head study. Bottom row right: mean, maximal and minimal MAD between the manual landmarks and the studied methods for the femoral head study. There is little difference between DetCov and MDL. SPHARM clearly shows the worst results of all studied methods. For comparison, the mean landmark selection error was 2.5mm.
using the Procrustes alignment based on the SPHARM correspondence. MSS was not computed in this case. We observed a bad alignment due to the SPHARM correspondence. In Figure 4 we visualize this inappropriate alignment in three cases. As a consequence, the DetCov and MDL results were also inappropriate. The bad SPHARM correspondence resulted from a rotational symmetry along the long first order ellipsoid axis, which is close to the neck-axis. Due to the bad correspondence, we do not present the error analysis of these cases here. The second study was based on a subset of the original population comprising only those datasets that also include the lesser trochanter. The distal cut of the femoral bone was performed by a plane defined using the lesser trochanter center, the major trochanter center and the intertrochanteric crest. For the MSS
method, the anatomical landmarks for the subdivision surfaces were chosen as follows: the fovea, the half-sphere approximating the femoral head, the circle approximating the orthogonal cross-section of the femoral neck at its thinnest location, the intertrochanteric crest and the lower end of the major trochanter. The landmarks for the MSS alignment were the lesser trochanter, the femoral head center, and the center of the circle approximating the neck at its smallest perimeter. Each landmark was selected on the respective 3D femur model either directly on the reconstructed bone surface or using 3D spherical primitives. The manual landmarks for the comparison were selected by a different expert with an average error of 2.5mm per landmark. The landmark sets for the MSS and the comparison were not exclusive, due to the scarcity of good landmarks. All correspondences were based on the MSS alignment. We observed that the SPHARM correspondence was visually better behaved in this study due to the inclusion of the lesser trochanter, which eliminated some problems with the rotational symmetry. However, Figure 6 shows that the landmark errors for the SPHARM alignment are clearly the worst of the studied methods. MSS shows the best average agreement with the manual landmarks, which is not surprising since the landmarks contained points also used to construct MSS. MDL was surprisingly better than both DetCov and MSS in regard to the minimal and maximal MAD, although the MAD differences are rather small. Figure 6 also clearly shows that MDL and DetCov have similar and better modeling properties than SPHARM and MSS. Only for G(M) is MSS better than SPHARM.
4 Conclusions
In this paper, we have presented a comparison of the SPHARM, DetCov, MDL and MSS correspondence methods in three populations of anatomical objects. The goal in all of the presented studies is a model-based application. We have analyzed both the direct correspondence via manually selected landmarks as well as the properties of the model implied by the correspondences, in regard to compactness, generalization and specificity. The results for SPHARM in the first femoral head study revealed that in the case of rotational symmetry in the first order ellipsoid, independent of the higher order terms, the correspondence is inappropriate. Since correspondence and alignment are dependent on each other, such a bad correspondence cannot be significantly improved using methods like MDL and DetCov. In all studies, DetCov and MDL showed very similar results. The model properties of DetCov and MDL were better than those of both SPHARM and MSS. The findings suggest that for modeling purposes the best of the studied correspondence methods are MDL and DetCov. The manual landmark errors are surprisingly large for all methods, even for the MSS method, which is based on landmarks. This finding is due to the high variability in the definition of anatomical landmarks by human experts, which is usually in the range of a few millimeters. In the lateral ventricle studies we plan to perform shape analysis with the model built on the MDL correspondence. Other current research in our
labs suggests that the shape analysis could gain statistical significance by using MDL rather than SPHARM. In the femoral head study, we plan to use the MDL model for shape prediction in the shape space. The specificity error is highly relevant in this study, since it is desired to generate 'anatomically correct' objects from the shape space.
References
1. Bookstein, F.L.: Morphometric Tools for Landmark Data: Geometry and Biology. Cambridge University Press (1991)
2. Brechbühler, C., Gerig, G., Kübler, O.: Parameterization of Closed Surfaces for 3-D Shape Description. Comp. Vision and Image Under. 61 (1995) 154–170
3. Brett, A.D., Taylor, C.J.: Construction of 3D Shape Models of Femoral Articular Cartilage Using Harmonic Maps. MICCAI (2000) 1205–1214
4. Christensen, G., Joshi, S., Miller, M.: Volumetric Transformation of Brain Anatomy. IEEE Trans. Med. Imag. 16(6) (1997) 864–877
5. Cootes, T., Hill, A., Taylor, C.J., Haslam, J.: The Use of Active Shape Models for Locating Structures in Medical Images. Img. Vis. Comp. 12 (1994) 355–366
6. Davies, Rh.H., Twining, C.J., Cootes, T.F., Waterton, J.C., Taylor, C.J.: 3D Statistical Shape Models Using Direct Optimization of Description Length. ECCV (2002), Vol. I
7. Davies, Rh.H.: Learning Shape: Optimal Models for Analysing Natural Variability. Dissertation, University of Manchester (2002)
8. Davies, Rh.H., Twining, C.J., Cootes, T.F., Waterton, J.C., Taylor, C.J.: A Minimum Description Length Approach to Statistical Shape Modeling. IEEE TMI 21 (2002)
9. Fleute, M., Lavallee, S.: Building a Complete Surface Model from Sparse Data Using Statistical Shape Models. MICCAI (1998) 879–887
10. Gerig, G., Styner, M.: Shape versus Size: Improved Understanding of the Morphology of Brain Structures. MICCAI (2001) 24–32
11. Hill, A., Thornham, A., Taylor, C.J.: Model-Based Interpretation of 3D Medical Images. Brit. Mach. Vision Conf. BMVC (1993) 339–348
12. Hug, J., Brechbühler, C., Székely, G.: Tamed Snake: A Particle System for Robust Semi-automatic Segmentation. MICCAI (1999) 106–115
13. Joshi, Banerjee, Christensen, Csernansky, Haller, Miller, Wang: Gaussian Random Fields on Sub-Manifolds for Characterizing Brain Surfaces. IPMI (1997) 381–386
14. McInerney, T., Terzopoulos, D.: Deformable Models in Medical Image Analysis: A Survey. Med. Image Analysis 1(2) (1996) 91–108
15. Kelemen, A., Székely, G., Gerig, G.: Elastic Model-Based Segmentation of 3D Neuroradiological Data Sets. IEEE Trans. Med. Imag. 18 (1999) 828–839
16. Kotcheff, A.C.W., Taylor, C.J.: Automatic Construction of Eigenshape Models by Direct Optimization. Med. Image Analysis 2(4) (1998) 303–314
17. Van Leemput, K., Maes, F., Vandermeulen, D., Suetens, P.: Automated Model-based Tissue Classification of MR Images of the Brain. IEEE TMI 18 (1999) 897–908
18. Meier, D., Fisher, E.: Parameter Space Warping: Shape-Based Correspondence Between Morphologically Different Objects. Trans. Med. Imag. 12 (2002) 31–47
19. Rangarajan, A., Chui, H., Bookstein, F.L.: The Softassign Procrustes Matching Algorithm. IPMI (1997) 29–42
20. Rueckert, D., Frangi, A.F., Schnabel, J.A.: Automatic Construction of 3D Statistical Deformation Models Using Non-rigid Registration. MICCAI (2001) 77–84
21. Styner, M., Gerig, G., Lieberman, J., Jones, D., Weinberger, D.: Statistical Shape Analysis of Neuroanatomical Structures Based on Medial Models. Med. Image Anal.
22. Szeliski, R., Lavallee, S.: Matching 3-D Anatomical Surfaces with Non-rigid Deformations Using Octree-Splines. Int. J. Computer Vision 18(2) (1996) 290–200
23. Tagare, H.: Shape-Based Nonrigid Correspondence with Application to Heart Motion Analysis. IEEE Trans. Med. Imag. 18(7) (1999) 570–580
24. Wang, Y., Peterson, B.S., Staib, L.H.: Shape-based 3D Surface Correspondence Using Geodesics and Local Geometry. CVPR 2 (2000) 644–651
Localization of Anatomical Point Landmarks in 3D Medical Images by Fitting 3D Parametric Intensity Models

Stefan Wörz and Karl Rohr

School of Information Technology, Computer Vision & Graphics Group, International University in Germany, 76646 Bruchsal, {woerz,rohr}@i-u.de
Abstract. We introduce a new approach for the localization of 3D anatomical point landmarks based on 3D parametric intensity models which are directly fit to the image. We propose an analytic intensity model based on the Gaussian error function in conjunction with 3D rigid transformations as well as deformations to efficiently model tip-like structures of ellipsoidal shape. The approach has been successfully applied to accurately localize anatomical landmarks in 3D MR and 3D CT image data. We have also compared the experimental results with the results of a previously proposed 3D differential operator. It turns out that the new approach significantly improves the localization accuracy.
1 Introduction
The localization of 3D anatomical point landmarks is an important task in medical image analysis. Landmarks are useful image features in a variety of applications, for example, for the registration of 3D brain images of different modalities or the registration of images with digital atlases. The current standard procedure, however, is to localize 3D anatomical point landmarks manually, which is difficult, time-consuming, and error-prone. To improve the current situation it is therefore important to develop automated methods. In previous work on the localization of 3D anatomical point landmarks, 3D differential operators have been proposed (e.g., Thirion [14], Rohr [12]). Recently, an evaluation study of nine different 3D differential operators has been performed by Hartkens et al. [8]. 2D differential approaches for extracting point landmarks in 2D medical images have been described in Le Briquer et al. [3] and Hartkens et al. [7]. For other approaches for extracting point landmarks in 2D images, see Walker et al. [15] and Likar et al. [9]. While being computationally efficient, differential operators incorporate only small local neighbourhoods of an image and are therefore relatively sensitive to noise, which leads to false detections and also affects the localization accuracy. Recently, an approach based on deformable models was introduced (Frantz et al. [5], Alker et al. [1]). With this approach, tip-like anatomical structures are modeled by surface models, which are fit to the image data using an edge-based fitting measure. However, the approach
Fig. 1. Ventricular horns of the human brain (from [13]) and the human skull (from [2]). Examples of 3D point landmarks are indicated by black dots.
requires the detection of 3D image edges as well as the formulation of a relatively complicated fitting measure, which involves the image gradient as well as 1st order derivatives of the surface model. We have developed a new approach for the localization of 3D anatomical point landmarks. In contrast to previous approaches, the central idea is to use 3D parametric intensity models of anatomical structures. In comparison to differential approaches, larger image regions and thus semi-global image information are taken into account. In comparison to approaches based on surface models, we directly exploit the intensity information of anatomical structures. Therefore, more a priori knowledge and much more image information is taken into account in our approach, improving the robustness against noise and increasing the localization accuracy. In addition, a much simpler fitting measure can be used, which does not include the image gradient or derivatives of the model. This paper is organized as follows: First, we introduce our 3D parametric intensity model (Section 2). Then, we describe the model fitting process (Section 3). Experimental results of applying our new approach to 3D synthetic data and 3D tomographic images of the human head are presented in Section 4.
2 Parametric Intensity Model for Tip-Like Structures
Our approach uses 3D parametric intensity models which are fit directly to the intensities of the image data. These models describe the image intensities of anatomical structures in a semi-global region as a function of a certain number of parameters. The main characteristic, e.g., in comparison to general deformable models, is that they exhibit a prominent point which defines the position of the landmark. By fitting the parametric intensity model to the image intensities we obtain a subvoxel estimate of the position as well as estimates of the other parameters, e.g., the image contrast. In [11] this type of approach has been used for localizing 2D corner and edge features. As an important class of 3D anatomical point landmarks we here consider tip-like structures. Such structures can be found, for example, within the human head at the ventricular system (e.g., the tips of the frontal, occipital, or temporal
horns, see Fig. 1) and at the skull (e.g., the tip of the external occipital protuberance). The shape of these anatomical structures is ellipsoidal. Therefore, to model them we use a (half-)ellipsoid defined by three semi-axes (r_x, r_y, r_z) and the intensity levels a_0 (outside) and a_1 (inside). We also introduce Gaussian smoothing specified by a parameter \sigma to incorporate image blurring effects. The exact model of a Gaussian smoothed ellipsoid cannot be expressed in analytic form and thus is computationally expensive. To efficiently represent the resulting 3D intensity structure we developed an analytic model as an approximation. This model is based on the Gaussian error function \Phi(x) = \int_{-\infty}^{x} (2\pi)^{-1/2} e^{-\xi^2/2} \, d\xi and can be written as

g_{Ell.}(\mathbf{x}) = a_0 + (a_1 - a_0) \, \Phi\!\left( \frac{\sqrt[3]{r_x r_y r_z}}{\sigma} \left( 1 - \sqrt{ \frac{x^2}{r_x^2} + \frac{y^2}{r_y^2} + \frac{(z + r_z)^2}{r_z^2} } \right) \right)    (1)

where \mathbf{x} = (x, y, z). We define the tip of the ellipsoid w.r.t. the semi-axis r_z as the position of the landmark, which is also the center of the local coordinate system. In addition, we include a 3D rigid transform R with rotation parameters (\alpha, \beta, \gamma) and translation parameters (x_0, y_0, z_0). The translation parameters define the position of the landmark in the 3D image. Moreover, we extend our model to a more general class of tip-like structures by applying a tapering deformation T with parameters \rho_x and \rho_y, and a bending deformation B with parameters \delta (strength) and \nu (direction), which are defined by

T(\mathbf{x}) = \begin{pmatrix} x (1 + z \rho_x / r_z) \\ y (1 + z \rho_y / r_z) \\ z \end{pmatrix} \quad \text{and} \quad B(\mathbf{x}) = \begin{pmatrix} x - z^2 \delta \cos\nu \\ y - z^2 \delta \sin\nu \\ z \end{pmatrix}    (2)

This results in our parametric intensity model with a total of 16 parameters:

g_M(\mathbf{x}, \mathbf{p}) = g_{Ell.}(T(B(R(\mathbf{x}))))    (3)

\mathbf{p} = (r_x, r_y, r_z, a_0, a_1, \sigma, \rho_x, \rho_y, \delta, \nu, \alpha, \beta, \gamma, x_0, y_0, z_0)    (4)
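For concreteness, the following NumPy/SciPy sketch transcribes Eqs. (1)–(4). It is our own illustration; in particular, the Euler-angle convention for the rotation part of R is an assumption, since the paper does not spell it out:

```python
import numpy as np
from scipy.special import ndtr  # standard normal CDF, i.e. Phi(x)

def g_ell(x, y, z, rx, ry, rz, a0, a1, sigma):
    """Smoothed-ellipsoid intensity model of Eq. (1)."""
    r = np.sqrt(x**2 / rx**2 + y**2 / ry**2 + (z + rz)**2 / rz**2)
    return a0 + (a1 - a0) * ndtr(np.cbrt(rx * ry * rz) / sigma * (1.0 - r))

def taper(x, y, z, rho_x, rho_y, rz):
    """Tapering deformation T of Eq. (2)."""
    return x * (1 + z * rho_x / rz), y * (1 + z * rho_y / rz), z

def bend(x, y, z, delta, nu):
    """Bending deformation B of Eq. (2)."""
    return x - z**2 * delta * np.cos(nu), y - z**2 * delta * np.sin(nu), z

def g_model(x, y, z, p):
    """Eq. (3): g_M(x, p) = g_Ell(T(B(R(x)))) for the 16-parameter
    vector p of Eq. (4)."""
    rx, ry, rz, a0, a1, sigma, rho_x, rho_y, delta, nu, al, be, ga, x0, y0, z0 = p
    # R: translate to the landmark, then rotate (x-y-z Euler angles assumed)
    pts = np.stack([x - x0, y - y0, z - z0])
    cz, sz = np.cos(ga), np.sin(ga)
    cy, sy = np.cos(be), np.sin(be)
    cx, sx = np.cos(al), np.sin(al)
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    xr, yr, zr = (Rx @ Ry @ Rz) @ pts
    xb, yb, zb = bend(xr, yr, zr, delta, nu)        # apply B after R ...
    xt, yt, zt = taper(xb, yb, zb, rho_x, rho_y, rz)  # ... then T, per Eq. (3)
    return g_ell(xt, yt, zt, rx, ry, rz, a0, a1, sigma)
```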
3 Model Fitting Approach
Estimates of the model parameters in (4) are found by a least-squares fit of the model to the image intensities g(\mathbf{x}) within semi-global regions-of-interest (ROIs), thus minimizing the objective function

\sum_{\mathbf{x} \in ROI} \left( g_M(\mathbf{x}, \mathbf{p}) - g(\mathbf{x}) \right)^2    (5)

Note, the fitting measure does not include any derivatives. This is in contrast to previous fitting measures for surface models, which incorporate the image gradient as well as 1st order derivatives of the model (e.g., [5]). For the minimization we apply the method of Levenberg-Marquardt, incorporating 1st order partial derivatives of the intensity model w.r.t. the model
parameters. The partial derivatives can be derived analytically using the generalized chain rule (e.g., [4]). Note, we do not need to compute the image gradient, as is the case with surface models. We need 1st order derivatives of the intensity model only for the minimization process, whereas the surface model approach requires 2nd order derivatives for the minimization.
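A minimal fitting loop along the lines of Eq. (5) can be sketched with an off-the-shelf Levenberg-Marquardt optimizer. Note that scipy's least_squares with a finite-difference Jacobian stands in here for the analytic derivatives used in the paper; g_model is the model sketch from Section 2, and the (z, y, x) axis ordering of the volume is our assumption:

```python
import numpy as np
from scipy.optimize import least_squares

def fit_landmark_model(volume, center, radius, p0):
    """Least-squares fit of the intensity model (Eq. (5)) inside a
    spherical ROI. volume: 3D image array; center: ROI center in voxel
    coordinates (x, y, z); p0: initial 16-parameter vector."""
    zz, yy, xx = np.indices(volume.shape)
    roi = ((xx - center[0])**2 + (yy - center[1])**2 +
           (zz - center[2])**2) <= radius**2
    xs = xx[roi].astype(float)
    ys = yy[roi].astype(float)
    zs = zz[roi].astype(float)
    g_obs = volume[roi].astype(float)

    def residuals(p):
        return g_model(xs, ys, zs, p) - g_obs   # g_M(x, p) - g(x)

    return least_squares(residuals, p0, method='lm')
```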
3.1 Means to Improve Stability
To improve the robustness as well as the accuracy of model fitting, we separated the model fitting process into three different phases. In the first phase, only a subset of the model parameters is allowed to vary in the minimization process (the parameters for the semi-axes, rotation, and smoothing). In the second phase, the parameters for the intensity levels and the translation are additionally allowed to vary. Finally, the bending and tapering parameters are included in the third phase. During model fitting, which is an iterative process, it may happen that the minimizer yields an invalid value for a certain parameter, e.g., a negative value for the smoothing parameter σ or a semi-axis. We developed two strategies to cope with this problem. The first strategy continues the minimization with the last valid parameter vector, where the problematic parameter is not allowed to vary for a few iterations. Normally, after a few iterations it is safe to activate this parameter again, as the overall parameter vector has changed. Rarely, mainly when using synthetic data, the first strategy does not solve the problem. In this case, the second strategy is applied. With this strategy the last valid parameter vector is modified by slightly changing the value of the problematic parameter towards the invalid value. In our implementation, we use the two strategies alternatingly, always starting with the preferable first strategy (which does not change the parameter values).
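Schematically, the two recovery strategies can be expressed as follows; the step factor is an illustrative choice, not a value from the paper:

```python
import numpy as np

def recover_parameters(p_invalid, p_valid, bad_idx, attempt, step=0.1):
    """Recovery when the minimizer yields an invalid value (e.g. a
    negative sigma or semi-axis) for parameter bad_idx.
    attempt 0 (preferred): restart from the last valid vector; the caller
    additionally keeps p[bad_idx] fixed for a few iterations.
    attempt 1 (fallback): move the last valid value slightly towards the
    invalid one."""
    p = np.asarray(p_valid, dtype=float).copy()
    if attempt == 1:
        p[bad_idx] += step * (p_invalid[bad_idx] - p_valid[bad_idx])
    return p
```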
3.2 Calibration of the Intensity Model
Our 3D intensity model in (1) represents an approximation to a Gaussian smoothed ellipsoid. In order to validate our model, we applied the fitting scheme to synthetic 3D images, which were obtained by discrete Gaussian smoothing of an ideal (unsmoothed) ellipsoid. In these experiments it turned out that we obtain a systematic error in estimating the landmark position. To cope with these errors we developed a nonlinear correction function which "calibrates" the model. The correction function depends on the estimated parameters \hat{r}_x, \hat{r}_y, \hat{r}_z as well as \hat{\sigma} and is given by

\tilde{z}_0 = c_1 + c_2 \hat{\sigma} + c_3 \hat{\sigma}^2 + \left( c_4 + c_5 \hat{\sigma} + c_6 \hat{\sigma}^2 \right) \, 2 \hat{r}_z \left( \hat{r}_x + \hat{r}_y \right)^{-1}    (6)
To determine the coefficients c_1, ..., c_6 we performed a large number of experiments, systematically varying the respective parameters. In total, we used more than 2000 synthetic 3D images (not considering tapering and bending). Incorporating the correction function, we achieved an average localization error of less than 0.2 voxels.
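Assuming the form of Eq. (6) as reconstructed above, the coefficients c_1, ..., c_6 can be obtained by a linear least-squares fit over the synthetic calibration runs. The paper does not state its fitting procedure, so the sketch below is only one plausible reading:

```python
import numpy as np

def fit_correction_coeffs(sigma, rx, ry, rz, observed_offset):
    """Estimate c1..c6 of the correction function (6). The input arrays
    hold, per calibration experiment, the estimated sigma and semi-axes
    and the measured systematic position error along z."""
    q = 2.0 * rz / (rx + ry)                    # elongation term of Eq. (6)
    A = np.column_stack([np.ones_like(sigma), sigma, sigma**2,
                         q, sigma * q, sigma**2 * q])
    c, *_ = np.linalg.lstsq(A, observed_offset, rcond=None)
    return c
```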
3.3 ROI Size Selection
From initial experiments it turned out that the size of the ROI used for model fitting has a major influence on the success and the accuracy. If the ROI is too small, we do not incorporate enough image information into the model fitting process to guarantee a successful fit. On the other hand, if the ROI is too large, we might include neighboring structures, which negatively influences the estimated parameters. In addition, with an increasing ROI it becomes more likely that our intensity model does not describe the anatomical structure at hand well, since a larger part of the structure has to be modelled. As a consequence, the size of the ROI should be well chosen for each landmark in order to improve the results. Also, although the tapering and bending deformations greatly extend the spectrum of shapes that can be modelled, in some cases these deformations tend to decrease the robustness of model fitting. Thus, for each landmark it should be carefully decided whether the deformations are included or not. In order to choose an appropriate size for the ROI and decide which types of deformations should be included in the model, we propose the following scheme, which was successfully applied in our experiments. For each landmark, we varied the diameter of the spherical ROI from 11 voxels to 41 voxels in steps of two voxels. For each value of the diameter, we apply our intensity model in four variants: without deformations, with bending deformation only, with tapering deformation only, and with both types of deformations. To gain information about the robustness of model fitting, we apply the model fitting for each variant 20 times with different sets of initial parameters. The different parameter sets are obtained by randomly varying the initial values in a range of ±2 voxels for the semi-axes and translations, ±8 grey levels for the intensities, ±0.25 voxels for σ, and ±0.15 radians for the angles. From these 20 fits we automatically exclude all results which are obviously outliers. These are results with a distance of more than 5 voxels to the initial position, results with an estimated parameter r_z smaller than one or both of the other semi-axes (so that the result is not a tip defined as the location of maximal curvature of the ellipsoid), and results which are drastic outliers, e.g., a semi-axis larger than 1000 voxels or a smoothing value σ of more than 10. Using the remaining fits, we calculated the product of the variances of the three translation parameters as a measure for the robustness of the model fitting w.r.t. different initial parameters. Finally, we chose the combination of ROI size and deformation variant where the robustness measure was minimal and more than half of the fits were included. Only in one case, namely the right temporal horn in one image, we accepted the minimum even with less than half of the fits included, because the initialization with the differential operator was very poor. For most landmarks, this simple heuristic leads to a good choice of the ROI size and deformation variant. However, for some landmarks the estimated position was far away from the ground truth position. In these few cases, we manually selected the ROI size and the deformation variant that resulted in an estimated position closest to the initial position while still being sufficiently robust.
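The selection heuristic can be made concrete as follows; this minimal sketch assumes `results` maps each (ROI diameter, deformation variant) setting to the list of fitted landmark positions that survived the outlier rejection described above:

```python
import numpy as np

def robustness_score(fit_positions):
    """Product of the variances of the three estimated translation
    parameters over the kept fits; smaller means more robust."""
    fit_positions = np.asarray(fit_positions)   # shape (k, 3), k <= 20
    return float(np.prod(fit_positions.var(axis=0)))

def select_setting(results, min_kept=10):
    """Pick the (diameter, deformation variant) with minimal robustness
    score among settings where more than half of the 20 fits were kept."""
    valid = {s: robustness_score(p) for s, p in results.items()
             if len(p) > min_kept}
    return min(valid, key=valid.get) if valid else None
```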
Table 1. Size and resolution of the medical 3D images used in the experiments.

Image     | Slices   | Size in Voxels  | Resolution in mm³
Woho (MR) | sagittal | 256 × 256 × 256 | 1.0 × 1.0 × 1.0
C06 (MR)  | axial    | 256 × 256 × 120 | 0.859 × 0.859 × 1.2
C06 (CT)  | axial    | 320 × 320 × 87  | 0.625 × 0.625 × 1.0

4 Experimental Results
Our approach has been applied to 3D synthetic data as well as to two 3D MR images and one 3D CT image of the human head.
4.1 3D Synthetic Data
In the first part of the synthetic experiments we applied our approach to 3D image data generated by the intensity model itself with added Gaussian noise. In total we carried out about 2400 experiments with different parameter settings and achieved a very high localization accuracy, with an error in the estimated position of less than 0.12 voxels. We also found that the approach is robust w.r.t. the choice of initial parameters. Additionally, for about 1600 experiments with similar settings but very intense Gaussian noise, down to a signal-to-noise ratio of ca. 1, the localization error turned out to be less than 0.52 voxels. In the second part of the experiments, we used 3D image data generated by discrete Gaussian smoothing of an ideal (unsmoothed) ellipsoid with added Gaussian noise. After applying the correction function (6) we found that the average error in the estimated position was 0.25 voxels. In contrast, the uncorrected position had an average error of 1.25 voxels.
4.2 3D Medical Images
We also applied the new approach to three real 3D tomographic images of the human head (datasets Woho and C06). The sizes and resolutions of the images are listed in Table 1. To achieve isotropic image data in the case of the C06 image pair, we applied an interpolation based on 3rd-order polynomials (Meijering [10]) prior to model fitting. We considered seven tip-like landmarks, i.e. the frontal, occipital, and temporal horns (left and right) as well as the external occipital protuberance. For these landmarks we used, in all three images, ground truth positions that were manually determined in agreement among up to four persons. For the CT image, we did not consider the temporal horns, since either the ground truth position was missing due to the low signal-to-noise ratio (left horn) or it was not possible to successfully fit the intensity model (right horn). Figure 2 shows the image data in a ROI of the right horn. Particularly for this landmark the image quality was relatively bad. In general, the quality of the CT image at the ventricular system was worse in comparison to the MR images.
Fig. 2. Five axial 2D slices showing a ROI of 21 × 21 × 5 voxels of the right temporal horn in the C06 image pair (top MR, bottom CT). The ground truth position of the landmark is marked by the square in the center image. The slices on the left are directly below and the slices on the right are directly above the center slice in the 3D image.
Parameter Settings. The fitting procedure described above requires the determination of suitable initial parameter values. The specification of these values is an important and non-trivial task. Often all parameter values are initialized manually, which is time-consuming. Here, we automatically initialize half of the model parameters. Values for the most important parameters, namely the translation parameters (x_0, y_0, z_0) defining the position of the landmark, were obtained by a 3D differential operator. Here we used the operator Op3 = det(C_g)/trace(C_g), where C_g is the averaged dyadic product of the image gradient [12]. The smoothing parameter σ was always initialized with 1.0 and the deformation parameters ρ_x, ρ_y, δ, and ν were all initialized with 0.0; thus, the intensity model was always initialized as an ellipsoid without deformation. The remaining parameters for the semi-axes (r_x, r_y, r_z), the intensity levels a_0 and a_1, and the rotation angles (α, β, γ) were initialized manually. For the left and right occipital horns in the Woho image, the resulting positions of the 3D differential operator Op3 are relatively far away from the ground truth positions (see Table 4). In addition, the anatomical structure of the occipital horns in this image is rather untypical, thus requiring good initial parameters for successful model fitting. Therefore, we initialized the translation parameters in these two cases manually.

Results. Tables 2, 3, and 4 show the fitting results for the considered landmarks. Having chosen the ROI size and the deformation variant, we applied the model fitting 100 times with different sets of randomly chosen initial parameters to obtain accurate means and standard deviations of the estimated parameter values. On average, model fitting succeeded for each landmark in 59 out of 100 cases, with an average of 75 iterations and a mean fitting error (positive root of the mean squared error) of ē_MFE = 20.48 grey levels. For the external occipital protuberance we obtained a relatively large fitting error. Excluding the result for this landmark, the mean fitting error improves to ē_MFE = 10.60 grey levels. There are two reasons for the larger mean fitting error of the external occipital
Fig. 3. 3D contour plots of the fitted intensity model for the external occipital protuberance within the original image pair C06 (left MR and right CT). The marked axes indicate the estimated landmark positions. Note, the size of the ROI and the used deformations are different.
protuberance. In case of the CT image, the difference in the intensity levels a_0 and a_1 is more than an order of magnitude larger than for all other landmarks. Thus, a larger mean fitting error is a consequence. In case of the two MR images, it turns out that our intensity model is not as well suited for this landmark as for the other landmarks. The reason is that the model assumes homogeneous intensities outside of the ellipsoid. In case of the external occipital protuberance, which is located directly at the skull, we always find within the ROI three different intensity levels for the intensity a_0, i.e. skin (white and grey) and air (black). Thus, the model has to average these intensities, which explains the relatively large mean fitting error and also causes the larger standard deviations in the position estimates. However, as the estimated landmark position always turns out to be very good, our model is nevertheless applicable and we included the results. We have visualized the results for this landmark for the C06 image pair in Figure 3 using 3D Slicer [6]. The fitted intensity model is visualized as a contour plot, using the model's intensity at the estimated landmark position as contour value. The average distance between the estimated landmark positions and the ground truth positions for all 19 landmarks computes to ē = 1.14mm. In comparison, using the 3D differential operator Op3, we obtain an average distance of ē_Op3 = 2.18mm. Thus, the localization accuracy of our new approach turns out to be much better. For the surface model approach [5], comparable data is available for only four landmarks, namely the left and right frontal and occipital horns of the C06 MR image. The average distance of the surface model approach for these four landmarks is ē_Surf = 1.26mm, whereas our approach yields ē = 0.68mm and the differential operator ē_Op3 = 2.17mm. Bearing in mind the small number of landmarks, we can conclude that the localization accuracy of our new approach is better than that of the surface model approach, while the differential operator Op3 yields the worst result. Note, the listed distances ē are not calibrated, since for nearly all landmarks deformations were included in the model. In Figure 4 we visualize the fitting result for the left occipital horn in the C06 (MR) image. Besides the 3D contour plot of the fitted intensity model within three adjacent slices of the original data, we also marked the estimated
Fig. 4. 3D contour plots of the fitted intensity model for the left occipital horn within the C06 (MR) image. The result is shown with and without the model for three adjacent slices of the original data. The marked axes indicate the estimated landmark positions for the new approach (white) and the differential operator Op3 (black).
Fig. 5. 3D contour plots of the fitted intensity models for the left and right frontal horn within an MR image (Woho). The result is shown for four different slices of the original data.
Table 2. Fitting results for the ventricular horns and the external occipital protuberance for the C06 image (MR) for ca. 60 experiments. The chosen diameter d of the spherical ROI and the type of deformation are listed. An asterisk marks the landmarks where the robustness criterion was not sufficient to automatically choose d and the deformation type. The estimated landmark position and intensity levels are given together with their standard deviations (in parentheses). Also, the mean fitting error ē_MFE in grey levels and the distance ē to the ground truth position are listed. For comparison, the distance ē_Op3 of the differential operator Op3 to the ground truth position is given.

C06 (MR) landmark (deformation)            | d  | x̂_0 | ŷ_0 | ẑ_0 | â_0 | â_1 | ē_MFE | ē | ē_Op3
Left frontal horn (No deformation)*        | 29 | 150.65 (0.011) | 79.58 (0.007)  | 68.14 (0.004) | 91.6 (0.0) | 22.3 (0.1) | 9.37  | 1.27 mm | 1.92 mm
Right frontal horn (Bending only)*         | 11 | 112.34 (0.000) | 76.85 (0.000)  | 69.02 (0.000) | 93.9 (0.0) | 18.8 (0.0) | 8.57  | 0.58 mm | 1.72 mm
Left occipital horn (Tapering only)        | 15 | 143.91 (0.000) | 200.85 (0.000) | 53.01 (0.000) | 84.9 (0.0) | 15.2 (0.0) | 6.37  | 0.15 mm | 3.32 mm
Right occipital horn (Tapering only)       | 17 | 107.82 (0.000) | 195.98 (0.000) | 56.04 (0.001) | 86.6 (0.0) | 20.0 (0.0) | 6.73  | 0.70 mm | 1.72 mm
Left temporal horn (Bending only)          | 19 | 164.01 (0.482) | 117.26 (0.356) | 45.38 (0.282) | 82.4 (0.2) | 12.8 (2.2) | 10.81 | 1.20 mm | 1.71 mm
Right temporal horn (Tapering and bend.)*  | 13 | 98.98 (0.001)  | 112.23 (0.001) | 40.63 (0.000) | 80.0 (0.0) | 18.8 (0.0) | 12.43 | 0.97 mm | 2.10 mm
Ext. occ. protub. (Tapering and bend.)     | 35 | 130.05 (0.147) | 230.94 (0.413) | 32.97 (0.562) | 61.6 (0.0) | 8.7 (5.0)  | 48.01 | 0.06 mm | 1.21 mm
Mean                                       |    |                |                |               |            |            |       | 0.70 mm | 1.96 mm
Table 3. Same as Table 2 but for the C06 image (CT). For this image, we restricted the maximal ROI size to 21 voxels, as the image only captures a part of the head and therefore three landmarks are close to the image borders.

C06 (CT) landmark (deformation)            | d  | x̂_0 | ŷ_0 | ẑ_0 | â_0 | â_1 | ē_MFE | ē | ē_Op3
Left frontal horn (No deformation)         | 11 | 192.80 (0.136) | 93.94 (0.086)  | 77.04 (0.142) | 1043.5 (0.2) | 996.8 (1.5)  | 12.92  | 1.33 mm | 0.63 mm
Right frontal horn (Tapering and bend.)*   | 15 | 135.31 (0.013) | 90.46 (0.005)  | 78.14 (0.011) | 1036.7 (0.0) | 1001.8 (0.0) | 11.71  | 1.26 mm | 2.10 mm
Left occipital horn (Bending only)         | 13 | 184.07 (0.584) | 260.57 (0.390) | 69.21 (0.181) | 1038.5 (0.2) | 989.7 (4.2)  | 9.65   | 0.66 mm | 0.00 mm
Right occipital horn (No deformation)      | 15 | 129.50 (0.019) | 255.77 (0.170) | 72.88 (0.009) | 1045.0 (0.9) | 994.0 (2.0)  | 9.83   | 0.94 mm | 1.33 mm
Ext. occ. protub. (No deformation)         | 17 | 161.20 (0.002) | 309.43 (0.003) | 48.01 (0.001) | 1007.9 (0.1) | 2679.0 (0.3) | 116.03 | 1.10 mm | 1.72 mm
Mean                                       |    |                |                |               |              |              |        | 1.06 mm | 1.16 mm
landmark positions for the new approach (white) and the differential operator Op3 (black). It can be seen that the model describes the depicted anatomical structures fairly well. Here, the distance from the estimated position of our approach to the ground truth position (not shown) is 0.15mm, whereas the
Table 4. Same as Table 2 but for the Woho image (MR).

Woho landmark (deformation)                | d  | x̂_0 | ŷ_0 | ẑ_0 | â_0 | â_1 | ē_MFE | ē | ē_Op3
Left frontal horn (Tapering only)          | 15 | 111.26 (0.000) | 78.26 (0.000)  | 101.84 (0.000) | 124.0 (0.0) | 23.8 (0.0) | 9.58  | 2.22 mm | 3.16 mm
Right frontal horn (Tapering only)         | 11 | 111.49 (0.000) | 77.54 (0.000)  | 132.27 (0.000) | 117.3 (0.0) | 20.1 (0.0) | 8.93  | 1.44 mm | 2.24 mm
Left occipital horn (Tapering and bend.)   | 11 | 189.38 (0.002) | 101.53 (0.001) | 91.62 (0.002)  | 107.3 (0.0) | 23.3 (0.0) | 10.15 | 2.31 mm | 4.12 mm
Right occipital horn (Tapering only)       | 11 | 182.63 (0.002) | 97.42 (0.003)  | 150.02 (0.002) | 112.7 (0.0) | 15.9 (0.1) | 7.92  | 0.68 mm | 3.61 mm
Left temporal horn (Tapering only)*        | 37 | 134.90 (0.301) | 111.86 (0.426) | 88.81 (0.155)  | 95.1 (0.0)  | 44.3 (2.1) | 21.11 | 1.80 mm | 2.83 mm
Right temporal horn (Tapering only)        | 19 | 129.24 (0.045) | 114.36 (0.042) | 150.16 (0.004) | 109.6 (0.0) | 35.8 (0.3) | 13.49 | 1.46 mm | 4.58 mm
Ext. occ. protub. (Tapering only)*         | 29 | 232.14 (0.242) | 149.73 (0.175) | 120.96 (0.571) | 84.2 (0.1)  | 26.8 (0.3) | 55.54 | 1.48 mm | 1.41 mm
Mean                                       |    |                |                |                |             |            |       | 1.63 mm | 3.14 mm
distance of the differential operator Op3 is 3.32mm. The estimated position of the differential operator is clearly inside the structure and relatively far away from the tip of the horn. This is a typical result for the long and thin ventricular horns in our experiments. The reason for this systematic localization error is the smoothing of the image data when computing the image gradient, which is necessary in order to calculate the response of the differential operator. In contrast, our approach directly exploits the image intensities (without smoothing) and is therefore not vulnerable to this effect. Also, the figures demonstrate that the spectrum of possible shapes of our intensity model is relatively large. For example, Figure 3 shows a strongly deformed ellipsoid (left) as well as a normal ellipsoid (right), whereas Figure 4 shows a long and thin tapered ellipsoid and Figure 5 shows wider tapered ellipsoids. The execution time of our algorithm mainly depends on the size of the ROI, the chosen variant of the deformation, and the quality of the initial parameters. As a typical example, the fitting time for the right temporal horn in the Woho image, including tapering and bending deformations and a ROI diameter of 19 voxels, is ca. 1s (on an AMD Athlon, 1.7GHz, running Linux).
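For illustration, the initialization operator Op3 = det(C_g)/trace(C_g) of [12], used above to obtain the initial translation parameters, can be sketched as follows. The two Gaussian scales are our illustrative choices, and the volume is assumed to be indexed (z, y, x):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def op3(volume, sigma_grad=1.0, sigma_avg=2.0):
    """Corner-type operator det(C_g)/trace(C_g), where C_g is the locally
    averaged dyadic product of the image gradient. Landmark candidates
    would be taken at local maxima of the returned response."""
    v = volume.astype(float)
    # Gaussian-derivative gradient along each of the three axes
    g = [gaussian_filter(v, sigma_grad,
                         order=tuple(int(i == k) for k in range(3)))
         for i in range(3)]
    C = np.empty(volume.shape + (3, 3))
    for i in range(3):
        for j in range(3):
            C[..., i, j] = gaussian_filter(g[i] * g[j], sigma_avg)
    det = np.linalg.det(C)
    tr = C[..., 0, 0] + C[..., 1, 1] + C[..., 2, 2]
    return det / np.maximum(tr, 1e-12)
```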
5 Discussion
The experiments verify the applicability of our new approach, which yields subvoxel positions of 3D anatomical landmarks. The intensity model describes the anatomical structures fairly well, as can be seen from the 3D contour plots. Also, the figures demonstrate that the spectrum of possible shapes of our intensity model is relatively large. Issues for further work are the automatic initialization of all parameters of the model based on differential properties of the image, as well as improving the computational efficiency of the ROI size selection.
Acknowledgement. The original MR and CT images have kindly been provided by Philips Research Hamburg and W.P.Th.M. Mali, L. Ramos, and C.W.M. van Veelen (Utrecht University Hospital) via ICS-AD of Philips Medical Systems Best.
References
1. M. Alker, S. Frantz, K. Rohr, and H.S. Stiehl, "Improving the Robustness in Extracting 3D Point Landmarks from 3D Medical Images Using Parametric Deformable Models", Proc. MICCAI 2001, Utrecht, The Netherlands, Oct. 14–17, 2001, Lecture Notes in Computer Science 2208, W.J. Niessen and M.A. Viergever (Eds.), Springer-Verlag Berlin Heidelberg, 2001, 582–590
2. R. Bertolini and G. Leutert, Atlas der Anatomie des Menschen. Band 3: Kopf, Hals, Gehirn, Rückenmark und Sinnesorgane, Springer-Verlag, Berlin, 1982
3. L. Le Briquer, F. Lachmann, and C. Barillot, "Using Local Extremum Curvatures to Extract Anatomical Landmarks from Medical Images", Medical Imaging 1993: Image Processing, 16–19 Feb. 1993, Newport Beach, California/USA, Proc. SPIE 1898, M.H. Loew (Ed.), 549–558
4. I.N. Bronstein and K.A. Semendjajew, Taschenbuch der Mathematik, 19. Auflage, Verlag Harri Deutsch, Thun und Frankfurt/Main, 1981
5. S. Frantz, K. Rohr, and H.S. Stiehl, "Localization of 3D Anatomical Point Landmarks in 3D Tomographic Images Using Deformable Models", Proc. MICCAI 2000, Pittsburgh, Pennsylvania/USA, Oct. 11–14, 2000, Lecture Notes in Computer Science 1935, S.L. Delp, A.M. DiGioia, and B. Jaramaz (Eds.), Springer-Verlag Berlin Heidelberg, 2000, 492–501
6. D.T. Gering, A. Nabavi, R. Kikinis, W.E.L. Grimson, N. Hata, P. Everett, F. Jolesz, and W.M. Wells, "An Integrated Visualization System for Surgical Planning and Guidance Using Image Fusion and Interventional Imaging", Proc. MICCAI'99, Cambridge, England, Sept. 19–22, 1999, Lecture Notes in Computer Science 1679, C. Taylor and A. Colchester (Eds.), Springer-Verlag Berlin Heidelberg, 1999, 808–819
7. T. Hartkens, K. Rohr, and H.S. Stiehl, "Evaluierung von Differentialoperatoren zur Detektion charakteristischer Punkte in tomographischen Bildern", Proc. 18. DAGM-Symposium Mustererkennung (DAGM'96), 11.–13. Sept. 1996, Heidelberg/Germany, Informatik aktuell, B. Jähne, P. Geißler, H. Haußecker, and F. Hering (Eds.), Springer-Verlag Berlin Heidelberg, 1996, 637–644
8. T. Hartkens, K. Rohr, and H.S. Stiehl, "Evaluation of 3D Operators for the Detection of Anatomical Point Landmarks in MR and CT Images", Computer Vision and Image Understanding 85, 2002, 1–19
9. B. Likar and F. Pernuš, "Automatic Extraction of Corresponding Points for the Registration of Medical Images", Medical Physics 26, 1999, 1678–1686
10. E.H.W. Meijering, K.J. Zuiderveld, and M.A. Viergever, "Image Reconstruction by Convolution with Symmetrical Piecewise nth-Order Polynomial Kernels", IEEE Trans. on Image Processing 8(2), 1999, 192–201
11. K. Rohr, "Recognizing Corners by Fitting Parametric Models", International J. of Computer Vision 9:3, 1992, 213–230
12. K. Rohr, "On 3D Differential Operators for Detecting Point Landmarks", Image and Vision Computing 15:3, 1997, 219–233
13. J. Sobotta, Atlas der Anatomie des Menschen. Band 1: Kopf, Hals, obere Extremität, Haut, Urban & Schwarzenberg, München, 19th edition, 1988
88
S. W¨ orz and K. Rohr
14. J.-P. Thirion, “New Feature Points based on Geometric Invariants for 3D Image Registration”, Int. J. of Computer Vision 18:2, 1996, 121–137 15. K.N. Walker, T.F. Cootes, and C.J. Taylor, “Locating salient object features”, Proc. 9th British Machine Vision Conference (BMVA’98), Southampton, UK, Sep., 1998, J.N. Carter and M.S. Nixon (Eds.), volume 2, BMVA Press, 1998, 557–566
Morphology-Based Cortical Thickness Estimation

Gabriele Lohmann, Christoph Preul, and Margret Hund-Georgiadis
Max-Planck-Institute of Cognitive Neuroscience, Leipzig, Germany
Abstract. We describe a new approach to estimating the cortical thickness of human brains using magnetic resonance imaging data. Our algorithm is part of a processing chain consisting of a brain segmentation (skull stripping), as well as white and grey matter segmentation procedures. In this paper, only the grey matter segmentation together with the cortical thickness estimation is described. In contrast to many existing methods, our estimation method is voxel-based and does not use any surface meshes. While this fact poses a principal limit on the accuracy that can be achieved by our method, it offers tremendous advantages with respect to practical applicability. In particular, it is applicable to data sets showing severe cortical atrophies that involve areas of high curvature and extremely thin gyral stalks. In contrast to many other methods, it is entirely automatic and very fast with computation times of a few minutes. Our method has been used in two clinical studies involving a total of 27 patients and 23 healthy subjects.
1 Introduction
The human cerebral cortex consists of highly convoluted layers of neuronal cells. Its thickness varies considerably across different regions of the brain [1],[2],[3],[4]. The neocortex is organized into six separate layers [5]. However, at the level of spatial resolution that in vivo magnetic resonance imaging currently permits, it is impossible to visualize the laminar organization. In healthy subjects, the cortex has an average thickness of about 3mm. It tends to be thinnest in the calcarine cortex (about 2mm) and thickest in the precentral gyrus (about 4mm) [2]. Previous studies have suggested that cortical thickness may be affected by various forms of brain disease, for instance by epilepsy [2], mental retardation [1], and neurodegenerative diseases [6]. In this paper, we will present an algorithm that can be used to estimate grey matter volume as well as cortical thickness. These two tasks are not identical. For cortical volume measurement it is not necessary to differentiate between the cortices of opposing sulcal walls residing within the same sulcal bed, whereas for cortical thickness estimation this distinction is essential. Both tasks however depend upon a reliable detection of the cortical surfaces that mark the boundary towards other tissue classes within the brain. For ease of notation, we will use the following definitions (see figure 1):
Fig. 1. The different types of grey matter surface. The area marked “A” is a GM/dura boundary, “B” shows a GM/CSF boundary, and “C” is a GM/GM boundary. For the estimation of cortical volume the detection of the GM/GM boundary is not essential, whereas for the estimation of cortical thickness it is essential.
1. The term "GM/CSF boundary" denotes the boundary between grey matter and CSF.
2. The term "GM/dura boundary" denotes the boundary between grey matter and the dura in places where the dura is so closely attached to the grey matter that the CSF is not visible in the MR image.
3. The term "GM/GM boundary" denotes the boundary between cortices close to sulcal fundus regions where the CSF is not visible in the MR image due to partial volume effects.

Cortical thickness estimation is a demanding task. Partial volume effects make a clear separation between different tissue classes difficult, so that even experts find it difficult to make unequivocal decisions. Furthermore, the definition of cortical thickness is not easy. Ideally, cortical thickness should reflect the length of axonal connections along the columnar organization of the cortex. Due to the limited spatial resolution of MR imaging this is however impossible. The easiest alternative approach is to define cortical thickness as the length of the shortest path from each point of the outer cortical surface towards the nearest point on the inner cortical surface. A more sophisticated definition was proposed by Jones et al. [2], who use Laplace's equation for thickness definition. Their method provides an anatomically plausible definition of cortical thickness, even though it is derived from a mathematical model rather than based upon experimental observation. Manual estimations of cortical thickness in MR images are prone to error. It is next to impossible to visually guess the length of the shortest connection from one boundary surface towards the other in a 3D voxel image. Therefore, manual
estimations of thickness were often simply based upon measurements of distances within one image slice. Unless the slice is aligned with the surface normals, this method leads to consistent over-estimation of cortical thickness. Post-mortem analysis of cortical thickness is also unreliable due to the shrinkage of the specimen. A number of researchers have therefore proposed automatic algorithms for cortical thickness estimation as well as for cortical volume estimation [3],[7],[8],[2],[9],[10],[11],[12],[13],[14]. Most of these algorithms are based on the generation of surface representations that mark the inner and outer boundaries of the cortex. The above algorithms can be roughly grouped into three categories: static isosurface generation, active contours, and geodesic active contours. Geodesic active contours are also called "level set based shape recovery" or "geometric deformable models". Isosurface generation [12],[13],[14] uses a voxel-based initial segmentation that attributes a tissue type label to each voxel. The isosurface is then generated using a surface mesh generator such as marching cubes. This surface mesh is however static, i.e. it does not change its location once it has been established. In contrast, active and geodesic contours are spatially variable by definition and are allowed to move in order to obtain a better fit to the image data. Active contour approaches include [3],[7],[8],[2],[9],[15]. Geodesic contours were used in this context by Zeng et al. [10] and Goldenberg et al. [11]. Active contours are defined by an energy minimization functional that is constrained by image features such as edges and by model features such as bending or stretching terms. Geodesic contours are defined by evolving surfaces. For an overview of active contours see [16]. A comprehensive review of geodesic contours is given by Suri et al. [17]. A number of problems must be solved in any cortical thickness estimation procedure. The first problem concerns the topological correctness and prevention of self-intersection of the surface meshes. This problem is relevant within the interior of sulcal beds, as surface meshes of the two opposing cortical sheets are very close to each other and may easily self-intersect. The point where the surfaces intersect identifies the GM/GM boundary. In addition, the prevention of self-intersection is relevant for topology preservation. A topologically correct surface mesh can be easily flattened. Fischl et al. [3] as well as MacDonald et al. [8] include special constraints within their methods to prevent self-intersection, even though some post-editing may be necessary [7]. The method by Joshi et al. [13] also requires manual editing of the surface. The second problem that surface-generating algorithms face is the problem of high curvature areas that occur both in the vicinity of sulcal fundi and at the top of gyral crowns. This problem becomes especially relevant in brain data sets showing cortical tissue loss (see figure 3). The method proposed by Fischl et al. [3] tries to alleviate inaccuracies at high curvature areas as their method produces surfaces that are only second-order smooth.
Davatzikos et al. [9] propose an active contour approach that is based upon a ribbon representation of the cortical sheet. Their method was later extended by Xu et al. [15], who presented a method for generating an active contour based on generalized gradient vector flow (GGVF) that starts out from an adaptive fuzzy c-means segmentation. Both Davatzikos et al. and Xu et al. did not primarily target the estimation of cortical thickness but the generation of an accurate surface representation. The computational demands of surface fitting can be very high. MacDonald et al. report a computation time of 30 hours on a 180 MHz Silicon Graphics R10000 processor for a single data set [8]. Likewise, Fischl et al. report a computation time of 5 hours per data set on a 500 MHz Pentium 3 processor [3]. Both author teams attribute the majority of the processing time to the prevention of self-intersection. Manual surface editing, as required in the methods by Fischl et al. [3] and Joshi et al. [13], makes the entire time spent on data evaluation even longer. The goal of the present study is to present a method that is capable of segmenting the cortical sheet so that morphometric measurements of both cortical thickness and cortical volume become possible. Our method is particularly targeted towards segmenting data sets of diseased patients with cortical atrophies. Figures 2 and 3 show examples. Note that the cortical surfaces show extremely sharp bends because the white matter stalks are very thin and reduced to one-voxel-thin sheets in some areas. Algorithms that are based on deformable models are bound to fail under such circumstances, as they require smoothness constraints to regularize the resulting surface meshes. Therefore, we propose to use an approach that is not based upon deformable surfaces. Rather, we adhere to the voxel representation of the original MR image. This strategy allows us to segment irregularly shaped structures such as the ones shown in figure 2.
2 The Data
Our input data consisted of high-resolution magnetic resonance images (MRI) of human brains. Imaging was performed at 3T on a Bruker Medspec 30/100 system. T1-weighted MDEFT [18] images were obtained with a non-slice-selective inversion pulse followed by a single excitation of each slice [19]. The spatial resolution between planes was approx. 1.5mm and the within-plane resolution was set to approx. 0.95mm × 0.95mm. The images were subsequently resampled to obtain isotropic voxels of size 1mm × 1mm × 1mm, so that each data set contained 160 slices with 200×160 pixels in each slice. All data sets were rotated into the Talairach coordinate system and covered the entire brain.
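As a concrete illustration of this resampling step, the following is a minimal sketch using scipy; the function name, default spacing values, and interpolation order are our assumptions, not the authors' actual preprocessing code:

```python
import numpy as np
from scipy.ndimage import zoom

def resample_isotropic(volume, spacing=(1.5, 0.95, 0.95), target=1.0):
    # Scale factors that map the anisotropic grid onto an isotropic
    # target grid: new_size = old_size * spacing / target.
    factors = [s / target for s in spacing]
    # Cubic spline interpolation; order=1 (trilinear) is a cheaper option.
    return zoom(volume, zoom=factors, order=3)
```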
3 Grey Matter Segmentation and Cortical Thickness
Our grey matter segmentation algorithm proceeds as follows. We assume that a white matter segmentation as described in [20] has been performed beforehand (see figure 3). Initially, a distance transform is performed on the white matter segmented image.
Fig. 2. An MR image showing a severe pathological tissue loss around the area indicated by the white cross. Images of this type pose serious problems to algorithms that require smoothness constraints due to the presence of surface areas with high curvature.
As a result, each non-white matter voxel receives a label that represents its distance from the nearest white matter voxel. Secondly, a depth labelling is performed that is based upon the white matter closure. The WM closure is obtained by a morphological closing operation using a structuring element of spherical shape with a large radius. In our experiments, the radius was set to 12 mm. The depth labelling is obtained by applying a distance transform to the WM closure. It helps to distinguish between gyral crowns and sulcal fundus regions. The algorithm proceeds by adding non-white matter voxels to the white matter image provided they pass several tests. Voxels are considered for addition in the order provided by the distance labelling. Voxels that are close to the white matter surface are inspected first. Topology preservation is an important criterion: the addition of voxels should not introduce holes or disconnected areas. The topological correctness is checked using Bertrand's algorithm [21]. Furthermore, voxels should not be added beyond the grey matter/CSF boundary. This criterion is difficult to check, and several different cases must be distinguished. We assume that white matter voxels are brighter than grey matter voxels, and grey matter voxels are brighter than CSF voxels. These conditions are fulfilled in T1-weighted MRI data. The basic idea is to check the grey value profile as the distance from the white matter surface increases, as illustrated in figure 4. Note that the profiles on gyral crowns are qualitatively similar to profiles within sulci. On gyral crowns, the GM/CSF border is characterized by a sharp drop in the profile and a slight rise after the border towards the dura. Within sulci, the profile drops and rises again towards the opposite sulcal wall. The criterion for voxel addition is checked as follows. Let v denote the voxel under inspection. We investigate its neighbourhood N of size 5 × 5 × 5 mm.
Fig. 3. The results of a white matter segmentation that is used as input into the cortical thickness estimation procedure. This image shows a severe cortical atrophy that results in extremely thin gyral stalks. The grey/white matter boundary as well as the grey matter/CSF boundary exhibit areas of high curvature and are anything but smooth.
Fig. 4. Schematic illustration of the grey value profile as the distance from the white matter surface increases. The left image shows a typical profile on gyral crowns, the right image shows the profile within sulci.
We now split this neighbourhood into three parts (see figure 5). Let $N_<$ denote the set of all voxels within $N$ that are closer to the WM surface than $v$, let $N_>$ denote the set of all voxels within $N$ that are farther away, and let $N_=$ denote all voxels that have the same distance as $v$. Furthermore, let $\mathrm{mean}_<$ denote the average grey value within $N_<$, and let $\mathrm{mean}_>$ denote the average grey value within $N_>$. In addition, we investigate the correlation between grey values and distance from the WM surface. Let $\mathrm{corr}_<$ denote the correlation between grey values and distance within $N_<$, and let $\mathrm{corr}_>$ denote this correlation within $N_>$.

Detecting the GM/CSF boundary. If $N$ resides completely within the grey matter compartment, then $\mathrm{mean}_<$ and $\mathrm{mean}_>$ are almost equal. However, at the GM/CSF boundary, $\mathrm{mean}_>$ is much smaller than $\mathrm{mean}_<$. Also, the correlation $\mathrm{corr}_<$ is negative, as the grey profile drops with increasing distance. At the same
Fig. 5. The neighbourhood $N$ around the current voxel $v$ is split into three parts: $N_<$ contains all voxels closer to the WM surface than $v$; voxels in $N_>$ are farther away from the WM. $N_=$ consists of voxels along the dotted line that intersect the neighbourhood $N$.
time, $\mathrm{corr}_>$ is positive, as the grey value profile rises after passing through the GM/CSF boundary. In summary, at the GM/CSF boundary the following conditions must hold:

$$\frac{\mathrm{mean}_<}{\mathrm{mean}_>} > t \quad \text{and} \quad \mathrm{corr}_< < 0,\; \mathrm{corr}_> > 0.$$
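A minimal sketch of this boundary test, assuming the grey values and WM-surface distances of $N_<$ and $N_>$ have already been gathered for the voxel under inspection (the value of the threshold t is a placeholder, as the paper does not state it):

```python
import numpy as np

def is_gm_csf_boundary(grey_lt, dist_lt, grey_gt, dist_gt, t=1.2):
    # grey_lt/dist_lt: grey values and WM-surface distances in N_<;
    # grey_gt/dist_gt: the same quantities in N_>.
    mean_lt, mean_gt = grey_lt.mean(), grey_gt.mean()
    # Correlation between grey value and distance on either side of v.
    corr_lt = np.corrcoef(dist_lt, grey_lt)[0, 1]
    corr_gt = np.corrcoef(dist_gt, grey_gt)[0, 1]
    return (mean_lt / mean_gt > t) and (corr_lt < 0) and (corr_gt > 0)
```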
Detecting the GM/dura boundary. The removal of dura tissue is part of our skull stripping procedure, so we assume that dura tissue is mostly removed prior to the call of the cortical thickness estimation. However, to make the procedure more robust, we perform an additional test at this point. First note that the GM/dura boundary cannot occur within the sulcal interior. This condition is easily checked using the depth labelling. Secondly, we compare the grey level of the current voxel $g_{curr}$ with the mean grey levels $\mathrm{mean}_>$ and $\mathrm{mean}_<$. At the boundary between the cortex and the dura, this grey value should be less than both $\mathrm{mean}_>$ and $\mathrm{mean}_<$. At the same time, the correlation between depth and grey values should show the same pattern as at the GM/CSF boundary. Thus, at the GM/dura boundary, the following conditions are tested:

$$\mathrm{mean}_< > g_{curr}, \quad \mathrm{mean}_> > g_{curr}, \quad \mathrm{corr}_< < 0, \quad \mathrm{corr}_> > 0.$$
Detecting the GM/GM boundary. The above conditions do not suffice to detect the GM/GM boundary between adjacent cortices of opposite sulcal walls. In this case, we employ a different strategy. We assume that the GM/GM boundary resides in the center between the two opposite sulcal walls, so that its location may be estimated using topological skeletonization techniques. In some cases, the assumption of centrality may not be justified. However, due to the insufficient spatial resolution of the MRI data, a more informed approach is not possible. We propose the following approach. We assume that the GM/GM boundary is a subset of the medial skeleton as depicted in figure 6. The medial skeleton is detected as follows. First note that a
Fig. 6. The medial skeleton. It is detected using morphological filters and topological thinning. The GM/GM boundary is a subset of the medial skeleton.
voxel $v$ is medial if all voxels in a local $n \times n \times n$ neighbourhood that are at the same depth are closer to the white matter surface than $v$. Figure 7 illustrates this idea. In a discrete voxel grid, equality of depth or distance is not well defined. Therefore, we allow a small range within which voxels are considered to have equal depth or distance. We additionally apply a topological thinning algorithm so that a thin discrete surface results. In our experiments, we use Tsao's algorithm [22] for this purpose, although other algorithms might have been used just as well (see for instance [23],[24],[25]). In the present case, 3D topological thinning leads to very stable results, as the digital objects to be processed are already very thin and almost skeletonized due to the prior selection of medial voxels. Thus, the exact choice of the thinning algorithm is not critical. A voxel is classified as belonging to the GM/GM boundary if it is medial, resides on the 3D skeleton, and is not on the GM/CSF or GM/dura boundary or within the CSF compartment.
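The medial test itself can be sketched as follows; the neighbourhood radius and depth tolerance are illustrative choices, and image-border handling is omitted:

```python
import numpy as np

def is_medial(v, depth, dist_to_wm, r=2, tol=0.5):
    # v is medial if every neighbourhood voxel at (approximately)
    # the same depth lies closer to the WM surface than v itself.
    z, y, x = v
    nb = (slice(z - r, z + r + 1), slice(y - r, y + r + 1),
          slice(x - r, x + r + 1))
    # Discrete grids make exact depth equality ill-defined, so a
    # small tolerance defines "same depth", as in the text.
    same_depth = np.abs(depth[nb] - depth[v]) < tol
    same_depth[r, r, r] = False  # exclude v itself
    return bool(np.all(dist_to_wm[nb][same_depth] < dist_to_wm[v]))
```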
3.1 Estimating Cortical Thickness
To estimate cortical thickness we need to identify all voxels that belong to either the inner or the outer surface of the cortex. The inner surface is assumed to be known from the white matter segmentation. A voxel is defined to belong to the white matter surface if there exists a 26-adjacent voxel that is not a white matter voxel. The outer surface of the cortex is more complex as the grey matter segmentation yields three different types of boundaries: the GM/CSF boundary, the GM/dura boundary and the GM/GM boundary. A grey matter voxel is defined to belong to the GM/CSF or to the GM/dura boundary if there exists
Fig. 7. Detecting the GM/GM boundary. A voxel is medial if all adjacent voxels that are at the same depth are closer to the white matter surface. In the illustration the voxels at equal depth reside along the lines marked as “isodepth”.
a 26-adjacent voxel that is neither a grey matter nor a white matter voxel. In addition, a grey matter voxel is defined to belong to the GM/GM boundary if it was marked as a GM/GM voxel during the grey matter segmentation. Cortical thickness can now be estimated using a 3D Euclidean distance transform with respect to the white matter surface [26]. For each voxel that is classified as belonging to any of the three GM boundary types we read off its distance label. The 3D Euclidean distance transform yields the length of the shortest Euclidean path in 3D from each GM boundary voxel to the nearest white matter voxel. Figure 8 shows an example.
Fig. 8. Results of cortical thickness estimation. The thickness is color-coded and superimposed onto voxels that belong to the outer cortical surface. This data set shows an atrophy of the anterior portion of the temporal lobes.
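The thickness read-out itself reduces to a few lines; the sketch below assumes binary masks for the white matter and the detected outer-boundary voxels, and uses scipy's Euclidean distance transform in place of the algorithm of [26]:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def thickness_at_boundary(wm_mask, boundary_mask, voxel_size=(1.0, 1.0, 1.0)):
    # distance_transform_edt assigns each nonzero voxel its Euclidean
    # distance to the nearest zero voxel, so passing the inverted WM
    # mask labels every non-WM voxel with its distance to the WM surface.
    dist_to_wm = distance_transform_edt(wm_mask == 0, sampling=voxel_size)
    # Read off the distance label at every outer-boundary voxel.
    return dist_to_wm[boundary_mask > 0]
```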
4 Experiments
The entire processing chain was implemented in a C/Unix environment. The computation time for a cortical thickness estimation of a single data set is less than 1 minute on a 1500 MHz AMD Athlon(TM) XP1800+ processor. We applied the entire processing chain to T1-weighted MRI data of 4 patients with a proven diagnosis of fronto-temporal dementia and also to data from 23 patients showing a cerebral microangiopathy and to a control group of 23 healthy subjects. All patients and test subjects gave informed consent. The segmentation results were visually inspected by two experts and found to be satisfactory. A quantitative evaluation is under way.
5 Conclusion
We have presented a new algorithm for cortical thickness estimation using T1-weighted MRI data. Our method is voxel-based and does not use surface-mesh representations. This fact has both advantages and disadvantages. A disadvantage is that the segmentation accuracy is principally limited by the spatial resolution of the voxel grid. In view of the fact that the cortical sheet has an average thickness of 3-5 mm, this may indeed be problematic. On the other hand, even surface-based methods cannot truly achieve subvoxel accuracy. Deformable surface models require some form of smoothness constraint that makes accurate segmentation in high curvature areas next to impossible. Thus, such methods are likely to produce quite inaccurate results in the type of data sets that we were primarily interested in, namely MRI data of patients with cortical tissue loss. Such conditions effectively prohibit the use of smooth surface models. In contrast, our approach is quite robust in the presence of irregularly shaped surfaces, and can thus be effectively applied to pathological MRI data. Visual inspection by experts has confirmed the correctness of the segmentation results. Another major advantage of our method is its computational efficiency. The entire processing chain takes no more than a few minutes on a state-of-the-art Linux workstation, while several competing methods take many hours. One reason for this computational advantage is that the detection of the GM/GM boundary (the "buried cortices" problem) is solved by very fast morphological operations. In contrast, surface-based methods must deal with this problem by including mechanisms that prevent the self-intersection of surfaces. Such mechanisms are computationally very expensive. Furthermore, our method is easy to use, as no manual interventions are needed. Again, this is a major advantage over competing methods that often require manual intervention for topological correction. Manual editing of surface meshes can be very bothersome and time-consuming, and is not required in our approach. In summary, the major advantage of our approach is its applicability in terms of ease of use, computational efficiency, and robustness when applied to data showing severe cortical tissue loss.
References
1. N. Kabani, G. Le Goualher, D. MacDonald, A.C. Evans. Measurement of cortical thickness using an automated 3-D algorithm: a validation study. Neuroimage, 13:375–380, 2001.
2. S.E. Jones, B.R. Buchbinder, I. Aharon. Three-dimensional mapping of cortical thickness using Laplace's equation. Human Brain Mapping, 11(1):12–32, 2000.
3. B. Fischl, A.M. Dale. Measuring the thickness of the human cerebral cortex from magnetic resonance images. PNAS, 97(20):11050–11055, 2000.
4. K. Brodmann. Vergleichende Lokalisationslehre der Grosshirnrinde in ihren Prinzipien dargestellt auf Grund des Zellaufbaus. Barth, Leipzig, Germany, 1909.
5. C. von Economo, G. Koskinas. Die Cytoarchitektonik der Hirnrinde des erwachsenen Menschen. Springer, Berlin, 1925.
6. P. Thompson, J. Moussai, S. Zohoori, A. Goldkorn, A.A. Khan, M.S. Mega, G.W. Small, J.L. Cummings, A.W. Toga. Cortical variability and asymmetry in normal aging and Alzheimer's disease. Cerebral Cortex, 8:492–509, September 1998.
7. A.M. Dale, B. Fischl, M.I. Sereno. Cortical surface-based analysis, I: segmentation and surface reconstruction. Neuroimage, 9(2):179–194, 1999.
8. D. MacDonald, N. Kabani, A.C. Evans. Automated 3-D extraction of inner and outer surfaces of cerebral cortex from MRI. Neuroimage, 12:340–356, 2000.
9. C. Davatzikos, J.L. Prince. An active contour model for mapping the cortex. IEEE Transactions on Medical Imaging, 14(1):65–80, 1995.
10. X.L. Zeng, L.H. Staib, R.T. Schultz, J.S. Duncan. Segmentation and measurement of the cortex from 3-D MR images using coupled-surfaces propagation. IEEE Trans. Med. Imaging, 18(10):927–937, 1999.
11. R. Goldenberg, R. Kimmel, E. Rivlin, and M. Rudzsky. Variational and level set methods in computer vision. In IEEE Workshop on Variational and Level Set Methods in Computer Vision, Vancouver, Canada, July 2001.
12. P.C. Teo, G. Sapiro, B.A. Wandell. Creating connected representations of cortical grey matter for functional MRI visualization. IEEE Transactions on Medical Imaging, 16:852–863, 1997.
13. M. Joshi, J. Ciu, K. Doolittle, S. Joshi, D. Van Essen, L. Wang, M.I. Miller. Brain segmentation and the generation of cortical surfaces. Neuroimage, 9:461–476, 1999.
14. M.I. Miller, A.B. Massie, J.T. Ratnanather, K.N. Botteron, J.G. Csernansky. Bayesian construction of geometrically based cortical thickness metrics. Neuroimage, 12:676–687, 2000.
15. C. Xu, D.L. Pham, M.E. Rettmann, D.N. Yu, and J.L. Prince. Reconstruction of the human cerebral cortex from magnetic resonance images. IEEE Trans. Med. Imaging, 18(6):467–480, 1999.
16. T. McInerney, D. Terzopoulos. Deformable models in medical image analysis: a survey. Medical Image Analysis, 1(2):91–108, 1996.
17. J.S. Suri, K. Liu, S. Singh, S.N. Laxminarayan, X. Zeng, L. Reden. Shape recovery algorithms using level sets in 2D/3D medical imagery: a state of the art review. IEEE Trans. on Information Technology in Biomedicine, 6(1):8–28, 2002.
18. K. Ugurbil, M. Garwood, J. Ellermann, K. Hendrich, R. Hinke, X. Hu, S.-G. Kim, R. Menon, H. Merkle, S. Ogawa, R. Salmi. Imaging at high magnetic fields: initial experiences at 4T. Magn. Reson. Quart., 9(259), 1993.
19. D.G. Norris. Reduced power multi-slice MDEFT imaging. J. Magn. Reson. Imaging, 11:445–451, 2000.
20. G. Lohmann, C. Preul, M. Hund-Georgiadis. Geometry-preserving white matter segmentation using T1-weighted MRI data. In Human Brain Mapping 2003 Meeting, New York, USA, June 18–22, 2003 (accepted).
21. G. Bertrand, G. Malandain. A new characterization of three-dimensional simple points. Pattern Recognition Letters, 15:169–175, Feb. 1994.
22. Y.F. Tsao, K.S. Fu. A parallel thinning algorithm for 3D pictures. Computer Graphics Image Proc., 17:315–331, 1981.
23. G. Malandain, S. Fernandez-Vidal. Euclidean skeletons. Image and Vision Computing, 16:317–327, 1998.
24. G. Borgefors, I. Nyström, G. Sanniti di Baja. Computing skeletons in three dimensions. Pattern Recognition, 32:1225–1236, 1999.
25. A. Manzanera, T. Bernard, F. Preteux, B. Longuet. nD skeletonization: a unified mathematical framework. Journal of Electronic Imaging, 11:25–37, Jan. 2002.
26. T. Saito, J.-I. Toriwaki. New algorithms for Euclidean distance transformation of an n-dimensional digitized picture with applications. Pattern Recognition, 27(11):1551–1565, 1994.
The Shape Operator for Differential Analysis of Images

Brian Avants and James Gee
University of Pennsylvania, Philadelphia, PA 19104-6389
{avants,gee}@grasp.cis.upenn.edu
Abstract. This work provides a new technique for surface oriented volumetric image analysis. The method makes no assumptions about topology, instead constructing a local neighborhood from image information, such as a segmentation or edge map, to define a surface patch. Neighborhood constructions using extrinsic and intrinsic distances are given. This representation allows one to estimate differential properties directly from the image’s Gauss map. We develop a novel technique for this purpose which estimates the shape operator and yields both principal directions and curvatures. Only first derivatives need be estimated, making the method numerically stable. We show the use of these measures for multi-scale classification of image structure by the mean and Gaussian curvatures. Finally, we propose to register image volumes by surface curvature. This is particularly useful when geometry is the only variable. To illustrate this, we register binary segmented data by surface curvature, both rigidly and non-rigidly. A novel variant of Demons registration, extensible for use with differentiable similarity metrics, is also applied for deformable curvature-driven registration of medical images.
1 Introduction
Surface-based image analysis is a topic of interest to researchers in anatomy [1,2,3], image representation and registration [4,5,6,7], and object segmentation and recognition [8]. Neurological interest in differential analysis of the brain comes in part from the convoluted shape of the cerebral cortex, considered a key to human intelligence [3]. The major sulci and gyri on the cortical surface have distinct geometric properties and are conserved between individuals, making them useful landmarks for morphometric comparisons [2]. They also delimit the boundaries of major functional regions. Furthermore, the cerebral cortex has an important role in psychiatric and neuro-degenerative conditions. For these and other reasons, detailed study of the cortical surface may help elucidate evolutionary differences between humans and other animals as well as the genesis of pathology [2,5]. A strong motivation for technical interest in surface measurements is the invariance properties of differential structure. Davatzikos [4] used a deformable surface to find a parametric model of the cortex, along with curvature information. Curvature-related point sets may be extracted and used to drive point-based image registration [9]. For medical purposes, curvature provides a measure
of shape that may be useful in inter-modality registration. Geometry may also aid in automatic landmarking, as it is often associated with meaningful anatomical structure [1,10]. The mean of the principal curvatures (mean curvature), for example, is related to the local folding of the surface [3]. Measures related to the principal curvatures are used in [10] to automatically segment the cortical sulci. Segmented sulci may subsequently be used for image registration [6]. The methods proposed here will be useful in applications similar to those cited above, but do not require the generation of meshes. As discussed in [1], the meshing process is prone to segmentation inaccuracy, step artifacts, and reconstruction errors. Correcting these errors is non-trivial, and thus ad-hoc smoothing and iterative alterations of the reconstruction are usually preferred. These post-processing methods may obscure fine image structure, making nonlinear diffusion necessary. Working directly with image data enables us to avoid meshing errors. The surface is instead represented only locally, and thus no topological assumptions are made, allowing identical application to spherical and non-spherical topology. Differential image structure is generated with a new technique for finding the shape operator that is applicable to images as well as to meshes. To minimize voxel-size discretization error, a robust neighborhood construction is included. We apply these tools to rigid and non-rigid registration of volumetric images via the computed surface representation.
2 Surface Representation
A mathematical surface, $\mathcal{M}$, is a mapping from a two-dimensional parameter domain into a higher-dimensional space, such as $\mathbb{R}^3$,

$$\mathcal{M} : u \times v \rightarrow \mathbb{R}^d. \tag{1}$$
We assume that locally the surface is of class $C^2$, that is, differentiable up to order 2. This allows computation of the principal curvatures, which help classify the type of surface and provide insight into its intrinsic and extrinsic shape variation. The principal curvatures are directly related to the regional variation of the surface's normal directions.
2.1 Local Frame
We consider the surfaces in our volumetric images as existing at the interfaces of homogeneous intensity regions. This assumption locates the surface at high gradient points in the image. This also allows us to construct the local frame for the surface directly from the gradient. We base the approach on approximating the surface by a local surface (or Monge) patch, as shown in Figure 1. The normal is first given by

$$\mathbf{N} = \frac{\nabla I(x_o)}{\|\nabla I(x_o)\|}, \tag{2}$$
where $x_o$ denotes the position in space. A local surface frame, $F$, in three dimensions is a set of three orthogonal unit vectors. These vectors are defined (non-uniquely) in 3D by the normal. We first construct a vector perpendicular to $\mathbf{N}$, such that given $\mathbf{N} = (N^1, N^2, N^3)$,

$$\mathbf{N}^{\perp} = \frac{1}{N^1}\left(-(N^2 + N^3),\, N^1,\, N^1\right),$$

if $N^1 \neq 0$. Similar constructions can be found if $N^1$ is zero [11]. The local frame is then

$$F = \left\{\mathbf{N},\; \mathbf{T}_1 = \frac{\mathbf{N}^{\perp}}{\|\mathbf{N}^{\perp}\|},\; \mathbf{T}_2 = \mathbf{N} \times \mathbf{T}_1\right\}, \tag{3}$$
where $\mathbf{T}(x_o) = (\mathbf{T}_1, \mathbf{T}_2)$ defines the local tangent plane. Note, however, that these tangents do not necessarily correspond to the principal directions (of minimum and maximum curvature), which the shape operator method, given below, will recover. This frame allows us to represent points in space near $x_o$ with a two-dimensional position, $(u, v)$, within the local co-ordinate chart. This local co-ordinate is gained by projection of the point $x$ onto the tangent plane, such that

$$(u, v) = \left((x - x_o) \cdot \mathbf{T}_1,\; (x - x_o) \cdot \mathbf{T}_2\right). \tag{4}$$
The surface near xo is then represented in local co-ordinates, giving an explicit construction of the map in Equation 1.
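A small sketch of Equations 2-4, constructing the frame and local co-ordinates from a gradient vector; the fallback for a vanishing first normal component is one of several valid constructions and is our assumption:

```python
import numpy as np

def local_frame(grad):
    # Unit normal from the image gradient (Equation 2).
    n = grad / np.linalg.norm(grad)
    n1, n2, n3 = n
    if abs(n1) > 1e-12:
        n_perp = np.array([-(n2 + n3), n1, n1]) / n1
    else:
        # Analogous perpendicular construction when N^1 vanishes [11].
        n_perp = np.array([n2, -(n1 + n3), n2])
    t1 = n_perp / np.linalg.norm(n_perp)
    t2 = np.cross(n, t1)  # completes the orthonormal frame (Equation 3)
    return n, t1, t2

def local_coords(x, xo, t1, t2):
    # Projection of a neighborhood point onto the tangent plane (Eq. 4).
    d = x - xo
    return np.dot(d, t1), np.dot(d, t2)
```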
Fig. 1. Monge patch (left) and intrinsic and extrinsic surface distances (right). The local frame and descriptions of the principal curvatures are also shown in the Monge patch image.
2.2 Construction of Local Surface Patch
We now provide two distinct views of the local neighborhood surrounding a given surface point. Each is constructed by locating points within a given distance from the point of consideration. The extrinsic neighborhood exists in an open ball surrounding the local origin. The intrinsic neighborhood is within a given geodesic radius of the origin. The differences are illustrated in Figure 1. Each representation provides a local surface patch from which we may estimate differential properties. Consider a location $x_o$ in an image at which the gradient magnitude exceeds a small threshold. We define a general membership distance as

$$m(x) = \eta(x_o, x). \tag{5}$$
The function $\eta$ measures a difference of image-related properties at the given co-ordinates. We use either the magnitude of the gradient or the (perhaps partial volume) labeling of a segmentation. If $m(x)$ is small, then the point $x$ is considered as representative of the surface interface of which $x_o$ is part. Note that membership is not exclusive, allowing points to be members of multiple regions. The extrinsic neighborhood about $x_o$ is the set of points given by

$$N_e(x_o) = \{x_i \mid \|x_i - x_o\| < \delta\}, \tag{6}$$
where $\delta$ is a threshold distance. The term "extrinsic" refers to the fact that this distance depends upon how the surface is folded. Two points distant upon a plane may be proximal under the Euclidean metric. We also call this neighborhood an open ball. We now define the intrinsic neighborhood given by the geodesic distance. The geodesic neighborhood, $N(x_o)$, is the set of points that lie within a geodesic circle of the local origin. This is a connected series of points satisfying membership, none of which have across-surface distance beyond $\delta$,

$$N(x_o) = \{x_i \mid g(x_o, x_i) < \delta\}. \tag{7}$$
The function $g(\cdot, \cdot)$ gives the shortest distance across the surface. Shortest path methods enable one to compute this neighborhood in $O(N \log N)$ time. The distance is "intrinsic" because it is independent of the local surface folding. Note that computing the average geodesic distance between all points allows us to measure compactness [3]. Typically, the geodesic neighborhood over all the edges of a brain segmentation may be constructed in under half a minute, if the geodesic distance is a few voxels. Either of these neighborhoods defines an approximation of a local surface patch within a metric distance of $x_o$. Intrinsic neighborhoods are used here for measuring the geometry of shapes in space, whereas we use extrinsic neighborhoods for computing the geometry of intensity distributions, as shown in Figure 2. Note that intrinsic neighborhoods may also be used for the latter approach,
with increased computational complexity. The open ball definition may also be more useful when the surface is represented as a point cloud. However, cortical analysis requires that the surface be represented explicitly as a thin sheet. Thus, pre-labelling the surface with an alternative segmentation method may be required, along with the intrinsic neighborhood definition. One approach is to segment the image into homogeneous regions, such that our image assumptions hold. This guarantees that the surface does indeed exist only at a thin interface.
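A sketch of the intrinsic (geodesic) neighborhood of Equation 7, grown by Dijkstra's algorithm over 26-connected surface voxels; representing the surface as a set of voxel tuples is our assumption:

```python
import heapq
import numpy as np

def geodesic_neighborhood(origin, surface_voxels, delta):
    # surface_voxels: set of (z, y, x) tuples that pass the membership
    # test m(x); distances accumulate across the surface, giving g(xo, x).
    dist = {origin: 0.0}
    heap = [(0.0, origin)]
    steps = [(dz, dy, dx) for dz in (-1, 0, 1) for dy in (-1, 0, 1)
             for dx in (-1, 0, 1) if (dz, dy, dx) != (0, 0, 0)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, np.inf):
            continue  # stale heap entry
        for dz, dy, dx in steps:
            w = (v[0] + dz, v[1] + dy, v[2] + dx)
            if w not in surface_voxels:
                continue
            nd = d + np.sqrt(dz * dz + dy * dy + dx * dx)
            if nd < delta and nd < dist.get(w, np.inf):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return dist  # voxel -> geodesic distance from the origin
```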
Fig. 2. The shape operator-derived mean (top left) and Gaussian (top right) curvature applied to a cortical surface represented in space. The mean curvature from the shape operator applied directly to the MRI intensity (with an open ball of radius 1 and a small threshold on the gradient magnitude as a membership function) is shown in the image at bottom.
3 Differential Structure
Meshless generation of differential properties often relies on numerical differentiation, as in isophote curvature [12], or on level set propagation. Finite differencing of vectors is, however, numerically unstable, especially without neighborhood information or without imposing a specific parameterization on the data. Furthermore, incorporating larger neighborhood information requires re-formulation of the finite differencing scheme. For these and other reasons, techniques which use local neighborhood information are usually preferred. Flynn and Jain [13] categorize surface curvature methods into analytical and numerical types. The former type uses least-squares approximations to fit
functions to the local surface. The latter type integrates numerical approximations to the curvature in the local neighborhood. A typical analytical approach might use the distance between a point in the neighborhood and the patch's tangent plane to fit a polynomial surface. The Gaussian and mean curvature may then be computed from the first and second derivatives of the polynomial, as performed by Boyer [14]. The output of such an approach, applied to a surface labeled in an image, is shown in Figure 3. We, however, propose a different approach that focuses on the relation of the normal variation to the curvature. A numerical advantage is gained as only first derivatives need be estimated.
Fig. 3. Shape operator (left) and polynomial fit-derived (right) magnitude of curvature on the inner skull surface. Note that the shape operator is sensitive enough to assign high curvature to small structures, such as the vessel impressions on the inner skull surface. The polynomial-fit curvature image was processed with surface-constrained smoothing, to reduce noise, while the shape operator curvature did not require smoothing.
Differential Structure from the Shape Operator. The technique given here estimates differential structure from a Monge patch's shape (or Weingarten) operator. Although applied to images, it may easily be adapted for use with meshes. The Gauss map, $\mathbf{N}$, is derived from the observation that a surface's normals, when given unit length, provide a direct mapping between the given normal and a point on the unit sphere. If we view the normal $\mathbf{N}$ as a 3D co-ordinate, then its points will give $x^2 + y^2 + z^2 = 1$, which is the equation of the sphere. The way the Gauss map changes locally relates to the surface's curvature. Thus, it is natural to measure $d\mathbf{N}(p)$, the derivative of the Gauss map near a point on the surface, $p$. Note that $\mathbf{N}_u$, the derivative of the normal in the $u$ direction, is by definition perpendicular to $\mathbf{N}$, as is $\mathbf{N}_v$. Thus, these derivatives live in the tangent plane, $(\mathbf{T}_1, \mathbf{T}_2)$, and can be expressed in that basis, such that

$$\mathbf{N}_u = a\mathbf{T}_1 + c\mathbf{T}_2, \tag{8}$$
$$\mathbf{N}_v = b\mathbf{T}_1 + d\mathbf{T}_2. \tag{9}$$
The Jacobian of $d\mathbf{N}(p)$ expressed in terms of local co-ordinates $(u, v)$ gives the shape operator, $S$,

$$S = \begin{pmatrix} a & c \\ b & d \end{pmatrix}. \tag{10}$$

The important property of the matrix $S$ is that its eigenvalues are the principal curvatures and its eigenvectors are the principal directions. Proof of these facts is given in [15]. The local shape operator comes from finding the Monge patch's Gauss map and taking its derivative with respect to the local domain. Given the neighborhood $N(x_o)$, one may approximate the local Gauss map in each of its components with a degree one polynomial,

$$N^k(u, v) = a_0 + a_1 u + a_2 v. \tag{11}$$
Taking $\mathbf{N} = (N^0, N^1, N^2)$, the superscript $k$ denotes the dimension of the Gauss map we are approximating. This equates to a least squares minimization problem,

$$\arg\min_{a}\; \|U a - r\|^2. \tag{12}$$
The matrix $U$ has rows $U_i = (1, u_i, v_i)$, where the pair $(u_i, v_i)$ gives the local co-ordinates of the $i$th normal and the subscript $i$ denotes the $i$th member of the neighborhood. The right-hand side has members $r_i = N^k_i$. Note that the singular-value decomposition requires that $U$ be decomposed only once, as its members are constant for each $k$. Thus, the majority of the cost is the same as for solving a single-dimensional least squares fit. Differentiation of the polynomial solutions for each of $(N^0, N^1, N^2)$ yields the derivatives of the Gauss map surface,

$$\mathbf{N}_u = (a^0_1, a^1_1, a^2_1), \qquad \mathbf{N}_v = (a^0_2, a^1_2, a^2_2). \tag{13}$$
Only the constants remain, as these derivatives are evaluated at the origin, such that $(u = 0, v = 0)$. The shape operator is found by projecting these computed normal derivatives onto the tangent plane, $\mathbf{T}(x_o)$, giving explicit values for the Jacobian of $d\mathbf{N}$. Eigendecomposition of this matrix then yields the principal curvatures and directions. These are used to estimate the mean, $H$, and Gaussian, $K$, curvatures,

$$H = \frac{1}{2}(\kappa_1 + \kappa_2), \qquad K = \kappa_1 \kappa_2. \tag{14}$$
Mean and Gaussian curvature on the cortical surface, as determined by the shape operator, are shown in Figure 2. The method applied to the smooth inner surface of a skull is shown in Figure 3. No smoothing of the curvature images was required.
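The whole estimation procedure (Equations 10-14) condenses into a short least-squares routine; the sketch below assumes the neighborhood normals and their local (u, v) co-ordinates have already been computed with the frame construction shown earlier:

```python
import numpy as np

def shape_operator_curvatures(normals, uv, t1, t2):
    # Fit N^k(u, v) = a0 + a1*u + a2*v per Gauss-map component
    # (Equation 11); one factorization of U serves all three k (Eq. 12).
    U = np.column_stack([np.ones(len(uv)), uv[:, 0], uv[:, 1]])
    coef, *_ = np.linalg.lstsq(U, normals, rcond=None)  # shape (3, 3)
    Nu, Nv = coef[1], coef[2]      # Gauss-map derivatives (Equation 13)
    # Project onto the tangent basis to obtain the shape operator
    # (Equation 10); transposition does not affect the eigenvalues.
    S = np.array([[np.dot(Nu, t1), np.dot(Nv, t1)],
                  [np.dot(Nu, t2), np.dot(Nv, t2)]])
    k1, k2 = np.real(np.linalg.eigvals(S))  # principal curvatures
    return 0.5 * (k1 + k2), k1 * k2         # H and K (Equation 14)
```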
Fig. 4. Segmentation of gyri (top, gyri bright) and sulci (bottom, sulci bright) by the sign of the mean curvature. The gyral and sulcal images at a coarser scale are also shown (far right).
4 Segmentation by Mean and Gaussian Curvature
An advantage of basing the computation of differential structure on the shape operator is that the image's gradient may be computed with a derivative-of-Gaussian filter, thus allowing natural adjustment for scale and noise by changing the Gaussian's scale parameter. The Gauss map and shape operator will reflect the associated scale differences. For example, cortical sulci are segmented at two scales, as shown in Figure 4, by setting σ = 2 and σ = 4. The sulcal segmentation is given by thresholding the mean curvature below zero; gyri are segmented by thresholding above zero. Recall that the mean curvature is a signed measure of shape, related to the surface's extrinsic properties, such as folding. Related results for meshes may be found in [8], where curvature zero-crossings are used to segment data into parts. Gaussian curvature is typically used to classify surfaces into primitive types, as it is an intrinsic measure. Its scalar value without the shape operator (as shown in Figure 2) is difficult to interpret. Together with the mean curvature, it may be used to classify surfaces as pits, peaks, ridges, troughs, planes, and three saddle-type shapes [3]. We classified the cortical surface into these types in Figure 5. Trough and ridge structure requires one of the principal curvatures to be zero. Neither of these shapes appeared using a curvature threshold of 1e-6. Thus, only four colors were needed to encode the surface structures (exempting planar structures, which were mapped to background), effectively labeling the cortex as either concave or convex.
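A sketch of this H-K labeling; the sign-to-anatomy mapping follows the convention stated above (gyri at positive, sulci at negative mean curvature), and the label codes are arbitrary:

```python
import numpy as np

def hk_classify(H, K, eps=1e-6):
    # Four classes survive the 1e-6 zero-threshold: ridge/trough types,
    # which require a vanishing principal curvature, do not occur.
    labels = np.zeros(H.shape, dtype=np.uint8)  # 0 = planar/background
    labels[(H > eps) & (K > eps)] = 1    # peak (gyral)
    labels[(H > eps) & (K < -eps)] = 2   # saddle ridge
    labels[(H < -eps) & (K > eps)] = 3   # pit (sulcal)
    labels[(H < -eps) & (K < -eps)] = 4  # saddle trough
    return labels
```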
Fig. 5. The cortical surface is labeled into shape primitives. The Gaussian curvature may be used along with the mean curvature to classify surfaces into eight types. Only four shape classes are found here. They are used to segment gyri (middle center) and sulci (far right) by labeling the surface as either concave or convex.
5 Curvature-Based Registration
We now propose to register images by the mean curvature. The motivation comes from the connection of this geometric measure with anatomical structure, as illustrated in the previous section. Surface registration is usually performed by flattening to a plane or by using a specific spherical parameterization over a mesh [5,7]. However, each of these methods becomes problematic if the object is composed of pieces that may or may not be disjoint, or if the object has holes. Furthermore, mapping the parameterization domain requires a covariant formalism that ensures the solution is not affected by the parameterization. Thus, we propose to register our surface representations by deforming the volume in which they are embedded. Note that we are able to use existing intensity-based registration tools for this purpose, such as the Demons algorithm [16], as the images of the curvatures may be matched directly. However, if the curvatures exist only on a thin sheet, the "capture region" for the registration may not be large enough. Therefore, multi-resolution techniques that blur the curvature function into surrounding regions are essential and are used in all subsequent experiments.
5.1 Rigid Registration of Curvature
Given a pair of binary images with similar shape, it is a difficult task to use existing rigid-registration algorithms that depend upon intensity features for their performance. Binary data provides little information, except through shape. Edge and gradient operations alone may not improve the situation significantly. However, the curvature function on the images will naturally map similarly
shaped object parts to similar intensities, thus providing an attractive situation for using intensity-based registration methods. Images of an inner skull surface extracted by two different methods are shown in Figure 6, along with their initial positions and curvature functions. The shape operator was used to gain the mean curvature function for each. Subsequently, a gradient descent rigid registration was performed to align them. The result is shown in Figure 6. We have found that this approach outperforms matching on other functions computed from these images, such as distance or edge transformations.
5.2 Non-rigid Registration of Curvature
We use a variation of the Demons algorithm for curvature-based non-rigid image registration. The idea is to treat the image similarity function as a hyper-surface and to allow it to evolve along its normal, similar to [17], but with Demons-type regularization. Recalling that $H$ is the mean curvature, the similarity for a pair of mean curvature images is

$$S(H_1, H_2, V) = (H_1 - H_2 \circ V)^2, \tag{15}$$
where $V$ is the vector field. The metric surface, $S$, is then allowed to evolve along its normal direction,

$$S_t = \frac{\nabla S}{\|\nabla S\|}. \tag{16}$$
The evolution of the metric surface is tracked by the deformation field,

$$V_{n+1} = G * \left(V_n + \frac{\nabla S}{\|\nabla S\|}\right), \tag{17}$$
where $G$ is a Gaussian kernel and the gradient is taken with respect to the displacements. The rationale is that surfaces which are allowed to evolve in their normal direction tend to simplify their shapes. We also expect the metric surface itself to flow into a minimum-energy configuration, as the formulation may be viewed as an adaptive step-size gradient descent. Validation of this approach for a variety of similarity metrics is currently underway. We now apply the non-rigid registration method given above to locating the transformation between the segmented white matter of a human and a chimp brain. Registration by curvature is justified in this case, as the constant image intensity of a segmentation would not provide this extra shape guidance to the matching, but would only follow the shape constraints native to the regularization. Figure 7 shows the registration of the gray and white matter interfaces of a human and a chimp brain. The figure shows that differences in the folding patterns remain after registration, although major folds appear to be aligned. We note that, although we show the surface rendering here, internal structures in these images, such as the ventricles and corpus callosum, were also curvature-labeled and registered. These results illustrate the effectiveness of the registration method for both smooth and convoluted surfaces.
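One iteration of this scheme might look as follows; this is a simplified sketch in which the derivative of the similarity with respect to the displacements is computed through the warped image's spatial gradient, the step is taken downhill so that the squared difference decreases, and σ is illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def demons_step(H1, H2, V, sigma=2.0, eps=1e-8):
    # Warp the moving curvature image by the current field (H2 o V).
    grid = np.indices(H1.shape).astype(float)
    warped = map_coordinates(H2, grid + V, order=1)
    diff = H1 - warped                      # residual of Equation 15
    # 2*(H1 - H2oV)*grad(H2oV) equals -dS/dV, so the additive update
    # of Equation 17 decreases the squared difference.
    g = 2.0 * diff * np.array(np.gradient(warped))
    g /= np.linalg.norm(g, axis=0) + eps    # unit normal step (Eq. 16)
    # Additive update followed by Gaussian regularization (Equation 17).
    return np.stack([gaussian_filter(V[i] + g[i], sigma)
                     for i in range(V.shape[0])])
```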
Fig. 6. Mean curvature based rigid registration of inner skull surfaces. The skull surfaces are registered in order to evaluate the shape effects of the methods by which the skulls were obtained. The far left shows the image that will be rigidly aligned with the image to its right. The result is right of center. The mean distance of the surfaces after registration was near 1 mm.
Fig. 7. The gray-white interfaces of a human (left) and a chimp (far right) are registered by the mean curvature. The result is shown in the center column.
6 Conclusions
We contributed a novel procedure for the analysis of images via surface geometry. It is emphasized that an advantage of the method is that the same code may be used for computing the shape operator directly from intensity or from a surface represented in space as a mesh or labeled image. The procedure consists of defining either an intrinsic or extrinsic neighborhood operation on the image. Subsequently, a novel method for computing the shape operator based upon the regional information was given. The mean and Gaussian curvatures, as well as principal directions, are recovered by this method. These measures were shown to correspond to meaningful anatomical labelings of the human cortex. Furthermore, the method may naturally be used at multiple scales. We also
proposed and applied a modification of the Demons algorithm, inspired by surface evolution. The formulation is general with respect to the similarity metric used. Finally, we used these methods for rigid and non-rigid registration of medical images. In future work, we would like to apply these algorithms more thoroughly to segmentation, multi-modality registration, and surface analysis. In particular, we would like to further investigate the surface evolution formulation of image registration and its extension to multi-modality registration. Comparison of the method given here with related methods [12,10] is also planned.
References
1. P. G. Batchelor, S. A. D. Castellano, D. L. G. Hill, D. J. Hawkes, T. C. S. Cox, and A. F. Dean, "Measures of folding applied to the development of the human fetal brain," IEEE Trans. Medical Imaging, vol. 21, no. 8, pp. 953–965, 2002.
2. D. C. Van Essen, H. A. Drury, S. Joshi, and M. I. Miller, "Functional and structural mapping of human cerebral cortex: solutions are in the surfaces," Proc. Nat. Acad. Sci. USA, vol. 95, pp. 788–795, 1998.
3. L. D. Griffin, "The intrinsic geometry of the cerebral cortex," Journal of Theoretical Biology, vol. 166, no. 3, pp. 261–273, 1994.
4. C. Davatzikos and R. Bryan, "Using a deformable surface model to obtain a shape representation of the cortex," IEEE Trans. Medical Imaging, vol. 15, no. 6, pp. 785–795, 1996.
5. P. Thompson and A. Toga, "A surface-based technique for warping 3-dimensional images of the brain," IEEE Trans. Medical Imaging, vol. 15, no. 4, pp. 402–417, 1996.
6. Y. Wang and L. Staib, "Shape-based 3D surface correspondence using geodesics and local geometry," in Computer Vision and Pattern Recognition, vol. II, pp. 644–651, 2000.
7. D. Meier and E. Fisher, "Parameter space warping: shape-based correspondence between morphologically different objects," IEEE Trans. Medical Imaging, vol. 21, no. 1, pp. 31–47, 2002.
8. F. Mokhtarian, N. Khalili, and P. Yuen, "Curvature computation on free-form 3-D meshes at multiple scales," Computer Vision and Image Understanding, vol. 83, no. 2, pp. 118–139, 2001.
9. H. Chui and A. Rangarajan, "A new algorithm for non-rigid point matching," Computer Vision and Pattern Recognition, vol. 2, pp. 44–51, 2000.
10. G. L. Goualher, C. Barillot, L. L. Briquer, J. C. Gee, and Y. Bizais, "3-D detection and representation of cortical sulci," in Proc. Computer Assisted Radiology (H. U. Lemke, K. Inamura, C. C. Jaffe, and R. Felix, eds.), (Berlin), pp. 234–240, Springer-Verlag, 1995.
11. K. Joshi, "On the differential geometry of the cortical surface," in Vision Geometry IV, vol. 2573, pp. 304–310, 1995.
12. J. Koenderink and A. J. van Doorn, "Surface shape and curvature scales," Image and Vision Computing, vol. 10, no. 8, pp. 557–565, 1992.
13. P. J. Flynn and A. K. Jain, "On reliable curvature estimation," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 110–116, 1989.
14. K. Boyer and R. Srikantian, "Saliency sequential surface organization for free-form object recognition," Computer Vision and Image Understanding, vol. 88, no. 3, pp. 152–188, 2002.
15. M. do Carmo, Differential Geometry of Curves and Surfaces. Prentice-Hall, 1976.
16. J. Thirion, "Non-rigid matching using demons," in IEEE Computer Vision and Pattern Recognition, pp. 245–251, 1996.
17. A. Yezzi, S. Kichenassamy, P. Olver, and A. Tannenbaum, "A gradient surface evolution approach to 3D segmentation," in International Conference on Computer Vision, pp. 810–815, 1995.
Feature Selection for Shape-Based Classification of Biological Objects

Paul Yushkevich¹, Sarang Joshi¹, Stephen M. Pizer¹, John G. Csernansky², and Lei E. Wang²

¹ Medical Image Display and Analysis Group, University of North Carolina, Chapel Hill, NC, USA
² Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, USA
Abstract. This paper introduces a method for selecting subsets of relevant statistical features in biological shape-based classification problems. The method builds upon existing feature selection methodology by introducing a heuristic that favors the geometric locality of the selected features. This heuristic effectively reduces the combinatorial search space of the feature selection problem. The new method is tested on synthetic data and on clinical data from a study of hippocampal shape in schizophrenia. Results on clinical data indicate that features describing the head of the right hippocampus are most relevant for discrimination.
1 Introduction
Recent advances in medical imaging and image processing techniques have enabled clinical researchers to link changes in shape of human organs with the progress of long-term diseases. For example, it has been reported that the shape of the hippocampus is different between schizophrenia patients and healthy control subjects [5,8,6,22]. Results of this nature help localize the effects of diseases to specific organs and may subsequently lead to better understanding of disease processes and potential discovery of treatment. This paper addresses the problem of further localizing the effects of diseases to specific regions of objects. Like a number of other methods (e.g., [5,15,24,28,6,16,9]), our approach uses statistical classification to gain insight into the differences in the shape of biological objects between distinct classes of subjects. We enhance classification by using feature selection as a tool for localizing inter-class shape differences and for improving the generalization ability of classifiers. The difference between the feature selection method proposed in this paper and the more traditional approaches to dimensionality reduction, such as principal components analysis (PCA), is that the feature subsets yielded by our method have local support in the shape representation, while features such as PCA coefficients have global support. Local feature support makes it possible to identify regions of objects where differences between classes are most significant.
Corresponding author. [email protected].
The main contribution of this paper is the extension of an existing feature selection method [2] in a way that takes advantage of special properties of features that describe shape. The extended algorithm, called window selection, searches for subsets of features that are both highly relevant for classification and are localized in shape space. Window selection takes advantage of a heuristic that the relevance of a given feature for classification correlates with the relevance of the features that describe neighboring locations. This heuristic effectively reduces the otherwise combinatorial search space of feature selection. The performance analysis of window selection, as compared to feature selection without locality, is reported in this paper for simulated and clinical data. In the synthetic experiments, classes of normally distributed data are generated in a way that simulates the locality of shape features. The ability of the selection algorithms to correctly detect relevant features and the ability to generalize well to new data are compared. The clinical data comes from a study of hippocampal shape in schizophrenia [6], and it is used to compare the results of window selection with previous findings of the relevant regions of the hippocampus. This paper is organized in five sections. Section 2 describes the details of the window selection algorithm. Sections 3 and 4 present experimental results using simulated and clinical data, respectively. Finally, Sec. 5 discusses the work planned for the future.
2 Methods

2.1 Feature Selection
Feature selection is a machine learning methodology that reduces the number of statistical features in high-dimensional classification problems by finding subsets of features that are most relevant for discrimination (e.g., [19,14,20,13,25]). Classifiers constructed in the subspace of the selected features tend to generalize better to new data than do classifiers trained on the entire feature set. This paper extends a feature selection method developed by Bradley and Mangasarian [2,3]. Their method uses elements from support vector machine theory and formulates feature selection as a smooth optimization problem that can be expressed as a sequence of linear programming problems. The input to this feature selection algorithm consists of a training set of objects that fall into two classes of sizes m and k. Each object is represented by an n-dimensional feature vector. The classes are represented by the feature matrices A_{m×n} and B_{k×n}. We wish to find the set of features, i.e., a subset of columns of A and B, that are most relevant for discriminating between the two classes. The idea of [2] is to look for a relevant subset of features by finding a hyperplane

P = { x ∈ ℝ^n : w^T x = γ }    (1)

that optimally separates the two classes, while lying in the minimal number of dimensions, as formulated by the energy minimization problem

P = arg min_{γ,w} E_sep(γ, w) + λ E_dim(w).    (2)
The term E_sep measures how well the hyperplane P separates the elements in A from the ones in B. It is expressed as

E_sep(γ, w) = (1/m) ‖(−Aw + eγ + e)_+‖_1 + (1/k) ‖(Bw − eγ + e)_+‖_1    (3)
where e represents a vector of appropriate size whose elements are all equal to 1, and (·)_+ is an operation that replaces the negative elements of its argument with zero. Let P^− and P^+ be a pair of hyperplanes parallel to P, whose distance to P is 1/‖w‖. Then, E_sep measures the distance to P^+ of those elements of A that lie on the 'wrong side' of P^+, as well as the distance to P^− of the elements of B that lie on the 'wrong side' of P^−. By wrong side, we mean that half-space of P^− or P^+ which contains the hyperplane P. The energy term E_dim in (2) is used to reduce the number of dimensions in which the hyperplane P lies. It has the general form

E_dim(w) = e^T I(w),    (4)
where I(w) is an indicator function that replaces each non-zero element of w with 1. However, since indicator functions are inherently combinatorial and badly suited for optimization, Bradley and Mangasarian suggest approximating the indicator function with a smooth function

I({w_1 … w_n}) = ( 1 − ε^{−α|w_1|}, …, 1 − ε^{−α|w_n|} ),    (5)

which, according to [1], yields the same solutions as the binary indicator function for finite values of the constant α.
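As a concrete illustration, the smoothed objective (2)–(5) can be evaluated as in the following minimal Python/numpy sketch. It assumes ε denotes the base of the natural logarithm, and the settings α = 5 and λ = 0.1 are illustrative placeholders rather than values taken from the paper.

```python
import numpy as np

def e_sep(A, B, w, gamma):
    """Separation energy (3): averaged violations of the margin
    hyperplanes P+ and P- by the elements of the two classes."""
    m, k = A.shape[0], B.shape[0]
    viol_a = np.maximum(-A @ w + gamma + 1.0, 0.0)  # (-Aw + e*gamma + e)_+
    viol_b = np.maximum(B @ w - gamma + 1.0, 0.0)   # (Bw - e*gamma + e)_+
    return viol_a.sum() / m + viol_b.sum() / k

def e_dim(w, alpha=5.0):
    """Smooth dimension penalty (4)-(5): each non-zero weight
    contributes approximately 1 via 1 - exp(-alpha * |w_i|)."""
    return np.sum(1.0 - np.exp(-alpha * np.abs(w)))

def objective(A, B, w, gamma, lam=0.1, alpha=5.0):
    """Energy (2) with the smooth indicator approximation."""
    return e_sep(A, B, w, gamma) + lam * e_dim(w, alpha)
```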
2.2 Window Selection for Shape Features
General feature selection algorithms make minimal assumptions about the nature and the properties of features. For instance, the same algorithm may be used for classifying documents on the basis of word frequency or for breast cancer diagnosis. Without prior knowledge of feature properties, the feature selection problem is purely combinatorial, since in a set of n features there are 2^n possible subsets and all of them are considered to be equally worthy candidates for selection. In shape classification problems, features are typically derived from dense geometrical object representations [4,23,18,21,10,9,7,15], and special relationships exist between features derived from neighboring locations in the objects. We hypothesize that by incorporating the heuristic knowledge of these relationships into a feature selection algorithm, we can improve its performance and stability when applied to shape classification. Features that describe shape are geometric in nature and the concept of distance between two features can be defined, usually in terms of geometric distance between locations described by the features. Furthermore, natural biological processes exhibit locality: geometric features capturing the shape of anatomical objects that are close together are likely to be highly correlated. General features, such as word frequencies in documents, may not exhibit this property of locality.
Locality makes it possible to impose a prior on the search space of a feature selection algorithm. Locality implies that feature sets consisting of one or a few clusters are more likely candidates than feature sets in which the selected features are isolated. To reward locality, the energy minimization in (2) is expanded to include an additional locality energy term E_loc(w):

P = arg min_{γ,w} E_sep(γ, w) + λ E_dim(w) + η E_loc(w).    (6)
E_loc(w) estimates the number of clusters formed by the features selected by w, thus rewarding the locality of the selected features. Let J ⊂ {1 … n} be the set of non-zero features in w. To measure how clustered the components of J are, we define an 'alphabet' of structured subsets of {1 … n} called windows, and measure the most compact description needed to express J using this alphabet. We define feature windows as structured sets of 'neighboring features'. The neighborhood relationships between the features in the set {1 … n} depend on the topology of the underlying space that is being described by the features. For instance, if features are computed at points that are regularly sampled from a boundary manifold, then two features are neighbors if the geodesic distance between the points from which they are computed is small. Let d_ij be a metric that assigns a non-negative distance to every pair of features i, j ∈ {1 … n}. This distance metric is used to define feature windows. A set W ⊂ {1 … n} is called a window of size q if (i) d_ij ≤ q for all i, j ∈ W, and (ii) there does not exist a superset of W in {1 … n} for which condition (i) holds. An alphabet of windows is just the set of all possible windows of sizes 1, …, w_max. The distance metric allows us to define windows on arbitrarily organized features. For instance, when features are organized in a one-dimensional lattice, the distance metric d_ij = |i − j| yields windows that are contiguous subsets of features, while d_ij = |i − j| mod n allows for wrap-around windows, which are useful when features are sampled along a closed curve. On higher-dimensional lattices, different distance metrics such as Euclidean or Manhattan distance generate differently shaped windows. For features computed at vertices in a mesh, windows can be constructed using transitive distance, which counts the smallest number of edges that separate a pair of vertices. Let W = {W_1 … W_N} be a set of windows of various sizes over the feature set {1 … n}. The minimal window cover of a feature subset J is defined as the smallest set α ⊂ {1 … N} for which J = ⋃_{i∈α} W_i. The locality energy component E_loc(w) is defined as the size of the minimal window cover of the set J of non-zero features in the vector w. While such a formulation is combinatorial in nature, in the following section it is elegantly expressed in terms of linear programming.
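For the one-dimensional lattice metric d_ij = |i − j|, the window alphabet and the feature-window incidence matrix Ω used in Sec. 2.3 can be enumerated directly, as in this sketch (other metrics would require a search for maximal sets, which is omitted here):

```python
import numpy as np

def window_alphabet_1d(n, max_size):
    """All windows over features {0..n-1} under d_ij = |i - j|:
    a window of size q is a maximal set with pairwise distance <= q,
    i.e. a run of q + 1 consecutive features."""
    windows = []
    for q in range(max_size + 1):
        for start in range(n - q):
            windows.append(tuple(range(start, start + q + 1)))
    return windows

def window_matrix(windows, n):
    """n x N matrix Omega with omega_ij = 1 iff feature i lies in
    window W_j; Omega u then marks the features covered by the
    windows selected by u."""
    omega = np.zeros((n, len(windows)))
    for j, win in enumerate(windows):
        omega[list(win), j] = 1.0
    return omega
```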
2.3 Linear Programming Formulation
According to Bradley and Mangasarian [2], the feature selection problem (2) can be formulated as the following smooth non-linear program:

minimize over (γ, w, y, z, v):   e^T y / m + e^T z / k + λ e^T I(v)
subject to   −Aw + eγ + e ≤ y,
             Bw − eγ + e ≤ z,
             y ≥ 0,  z ≥ 0,
             −v ≤ w ≤ v.    (7)
This formulation does not directly minimize the objective function (2), but rather it minimizes positive vectors y, z, and v, which constrain the components of the objective function. Such a transformation of the minimization problem is frequently used in support vector methodology in order to apply linear or quadratic programming to energy minimization problems. The vector v constrains w from above and below and thus eliminates the need for using the absolute value of w in the objective function, as is done in (3). The non-zero elements of v correspond to selected features. In order to introduce the locality energy E_loc into the linear program, we can express the non-zero elements of v as a union of a small number of windows, and penalize the number of windows used. Let W_1 … W_N be an 'alphabet' of windows, as defined in Sec. 2.2. Let Ω be an n × N matrix whose elements ω_ij are equal to 1 if the feature i belongs to the window W_j, and are equal to 0 otherwise. Let u be a sparse positive vector of length N whose non-zero elements indicate a set of selected windows. Then the non-zero elements of Ωu indicate the set of features that belong to the union of the windows selected by u. In order to implement window selection as a smooth non-linear program, the terms u and Ωu are used in place of v in the objective function. The resulting formulation penalizes both the number of selected windows and the number of features contained in those windows:

minimize over (γ, w, y, z, u):   e^T y / m + e^T z / k + λ e^T Ωu + η e^T I(u)
subject to   −Aw + eγ + e ≤ y,
             Bw − eγ + e ≤ z,
             y ≥ 0,  z ≥ 0,
             −Ωu ≤ w ≤ Ωu.    (8)
This formulation of the objective function is identical to the energy minimization formulation (6) if none of the windows selected by u overlap. In case of an overlap, the penalty is assessed on the combined number of features in all of the selected windows, and not on the total number of distinct features in w. We use the fast successive linear approximation algorithm outlined in [2] to solve the program (8). The algorithm is randomly initialized and iteratively solves a linear programming problem in which the concave term I(u) is approximated using a Taylor series expansion. The algorithm does not guarantee a global optimum but does converge to a minimum after several iterations. The
resulting vector u, whose non-zero elements indicate the selected windows, is very sparse. The Sequential Object-Oriented Simplex Class Library (SoPlex), developed by Roland Wunderling [26], is used for solving the linear programming problems. The parameters λ and η affect the numbers of features and windows (and hence the sizes of the windows) selected by the window selection algorithm. Larger values of λ yield fewer features, and similarly, larger values of η yield fewer windows. When both parameters are zero, the algorithm performs no feature selection and acts as a linear support vector machine classifier. The number of features yielded in this case is bounded only by the size of the training set. Bradley and Mangasarian [2] suggest reserving a small portion of the training set and using it to search for the value of parameter λ that leads to optimal cross-validation performance. In the synthetic data experiments described below, we have found that cross-validation performance is poorly suited for finding optimal parameters because of its low signal-to-noise ratio. Parameters yielded by optimization seldom correctly identified the relevant sets of features [27]. However, if in a particular application one roughly knows how many relevant features and windows are desired, then the parameter values needed to produce such windows can be determined experimentally and a search for optimal values is unnecessary.
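A sketch of this successive linear approximation loop is given below, with scipy's LP solver standing in for SoPlex. It assumes ε = e when linearising I(u), omits feasibility and convergence checks, and the iteration count is an arbitrary choice.

```python
import numpy as np
from scipy.optimize import linprog

def window_selection(A, B, Omega, lam=0.1, eta=0.1, alpha=5.0, n_iter=15):
    """Successive linear approximation for program (8).  A (m x n) and
    B (k x n) hold the two classes; Omega (n x N) is the feature-window
    incidence matrix.  Variable vector: [gamma, w, y, z, u]."""
    m, n = A.shape
    k, N = B.shape[0], Omega.shape[1]

    def row_block(g, w, y, z, u):
        return np.hstack([g, w, y, z, u])

    A_ub = np.vstack([
        # -Aw + e*gamma - y <= -e
        row_block(np.ones((m, 1)), -A, -np.eye(m),
                  np.zeros((m, k)), np.zeros((m, N))),
        # Bw - e*gamma - z <= -e
        row_block(-np.ones((k, 1)), B, np.zeros((k, m)),
                  -np.eye(k), np.zeros((k, N))),
        # w - Omega u <= 0 and -w - Omega u <= 0 (-Omega u <= w <= Omega u)
        row_block(np.zeros((n, 1)), np.eye(n), np.zeros((n, m)),
                  np.zeros((n, k)), -Omega),
        row_block(np.zeros((n, 1)), -np.eye(n), np.zeros((n, m)),
                  np.zeros((n, k)), -Omega)])
    b_ub = np.hstack([-np.ones(m), -np.ones(k), np.zeros(2 * n)])

    bounds = [(None, None)] * (1 + n) + [(0, None)] * (m + k + N)
    u = np.zeros(N)
    for _ in range(n_iter):
        # linearise the concave penalty I(u) around the current u
        c_u = lam * Omega.sum(axis=0) + eta * alpha * np.exp(-alpha * u)
        c = np.hstack([0.0, np.zeros(n), np.ones(m) / m,
                       np.ones(k) / k, c_u])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
        u = res.x[1 + n + m + k:]
    return res.x[0], res.x[1:1 + n], u   # gamma, w, selected windows u
```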
3 Results on Simulated Data
This section summarizes a simulated data experiment that is described in full in [27]. In this experiment, window selection and feature selection were compared in a setting where the features are normally distributed and the relevant features are clustered. In each variation of the experiment, two training classes were randomly sampled from pairs of 15-dimensional normal distributions with identity covariance matrices and with means that differ in only 6 of the 15 dimensions. The relevant dimensions in one case are arranged into a single contiguous block, and in another case they form two disjoint blocks of 3 features. Feature selection and window selection with windows defined using the distance metric d_ij = |i − j| were applied to the training samples. Classifiers were constructed in the subspaces defined by the selected features, and their expected generalization ability was computed empirically. The experiment was repeated for different sizes of the training set (30, 60, 90, and 120), and for each training set size, it was repeated 40 times, with the average generalization rate recorded. Figure 1 shows the results of these experiments: classifiers based on window selection outperformed the classifier based on feature selection, especially in the first case when the relevant features are arranged into a single block. Both selection schemes resulted in better classifiers than the classifier constructed on the entire feature set. Also, window selection correctly identified the relevant sets of features with significantly higher frequency than feature selection (see [27]).
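The synthetic classes can be generated along the following lines; the magnitude of the mean shift in the relevant dimensions is an assumption here, since the full experimental setup is given in [27]:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_classes(n_per_class, relevant, n_features=15, shift=1.0):
    """Two Gaussian classes with identity covariance whose means
    differ only in the 'relevant' dimensions."""
    mu = np.zeros(n_features)
    mu[list(relevant)] = shift
    A = rng.standard_normal((n_per_class, n_features))
    B = rng.standard_normal((n_per_class, n_features)) + mu
    return A, B

# case 1: one contiguous block of 6 relevant features
A1, B1 = sample_classes(60, range(4, 10))
# case 2: two disjoint blocks of 3 relevant features
A2, B2 = sample_classes(60, list(range(2, 5)) + list(range(9, 12)))
```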
Fig. 1. Performance of window and feature selection on Gaussian data. Relevant features are arranged into one block (left plot) and two blocks (right plot). Plotted are the expected error rates of the window selection algorithm (diamond, dotted line), the feature selection algorithm (square, dashed line), and global discriminant analysis (triangle, solid line) versus training sample size.
4 Results on Clinical Hippocampus Data
The window and feature selection algorithms were applied to the study of the shape of the hippocampus in schizophrenia using the data set that is identical to the one reported in [6]. The data set consists of 117 subjects, 52 of whom are schizophrenia patients, and the remaining 65 are matched healthy controls. The left and right hippocampi of each subject are described using boundary meshes that consist of 6,611 vertices and 13,218 triangular faces. These segmentations were obtained using large-deformation diffeomorphic image matching described in [15,12,5,6]. The hippocampus is not a homogeneous structure but rather consists of many identifiable sub-regions, which may be affected differently by schizophrenia. Indeed, [6] stipulates that "the pattern of shape abnormality suggested a neuroanatomical deformity of the head of the hippocampus, which contains neurons that project to the frontal cortex". However, the statistical methodology employed in [6] is based on the eigenshape formulation that does not allow local specificity of shape variation. The motivation for applying feature and window selection to this data set is to find the regions of the hippocampus where the shape differences associated with schizophrenia are most significant. In order to use window and feature selection to produce regions large enough to cover 10%–20% of the hippocampal surface, we reduced the number of features from the nearly 40,000 that result from using the x, y, z coordinates of each mesh vertex as features, to 160 summary features, which describe small patches on the surface of the hippocampus. The reduction was necessary because window selection and feature selection algorithms yield fewer features than there are subjects in the training set and because of the prohibitive computational cost of using so many features. Patch features were computed as follows. We aligned the sets of 117 left and 117 right meshes using the Generalized Procrustes algorithm [11] restricted to translation and orientation. In the process, we computed the mean left and right hippocampal meshes. We subdivided each mesh into 80 patches of roughly equal
Table 1. Results of leave-one-out experiments with feature selection and window selection on clinical data with patch summary features. Each row represents one set of 117 experiments. Legend: λ and η are the parameters from (6) that affect the number of selected features and windows, N_win is the average number of selected windows, N_feat is the average number of selected features, and R is the leave-one-out correct classification rate, in percent.

η      λ      N_win   N_feat   R (%)
0.00   0.04     –      22.9    55.6
0.00   0.08     –      16.4    65.0
0.00   0.12     –       7.5    65.0
0.00   0.16     –       4.6    68.4
0.04   0.04   11.8     28.7    68.4
0.04   0.08    8.5     19.3    69.2
0.04   0.12    4.2      8.5    62.4
0.04   0.16    2.1      4.0    54.7
0.08   0.04   10.8     28.1    61.5
0.08   0.08    5.7     13.5    64.1
0.08   0.12    2.8      6.1    64.1
0.08   0.16    1.6      2.9    59.0
0.12   0.04    9.2     24.5    68.4
0.12   0.08    3.9      9.9    62.4
0.12   0.12    2.1      4.7    57.3
0.12   0.16    1.4      2.8    61.4
area using the METIS graph partitioning software [17] on a graph whose vertices correspond to the mesh triangles and are weighted by the average areas of the triangles. The partitioned left and right mean meshes are shown in the top row of Fig. 2. Each patch was represented by a single summary feature, which measures the average inward/outward deformation of the patch with respect to the mean mesh. The use of a single feature per location makes it easier to define a distance metric between features, as having multiple features per location would either require defining the distance between them to be zero, which would result in them always being selected together, or it would require two distance functions, one for features from different locations and another for features from the same location. An alphabet of windows was defined over the patch summary features using the transitive distance function, which counts the number of patch edges that separate any two patches. Under this function, single patches form windows of size 0 and sets of mutually adjacent patches form windows of size 1. For computational efficiency, windows of larger size were not included in the alphabet. Feature selection and window selection algorithms were applied to patch summary features in a series of leave-one-out cross-validation experiments. In each leave-one-out iteration, one subject was removed from the data set, the selection algorithm was applied to the remaining subjects, an L1 support vector classifier was constructed in the subspace spanned by the selected features, the left-out subject was assigned a class label by the classifier, and this class label was compared to the true class label of the left-out subject. The average correct classification over 117 leave-one-out iterations was recorded. The feature selection and window selection experiments were repeated for different values of the modulation parameters λ and η. Table 1 shows the results of these experiments. In [6], using a 10-fold cross-validation methodology, a similar classification rate of 68.4% is reported. The methods in [6] are based on eigenanalysis of the entire set of 40,000 features. The results in Table 1 show that with intelligent
Fig. 2. Top row: mean left and mean right hippocampal meshes partitioned into 80 patches each. The meshes are shown from superior and anterior viewpoints. Second row: ten patches that were selected most frequently during leave-one-out validation of feature selection. Third row: ten windows that were selected most frequently during leave-one-out validation of window selection (some of the windows overlap, and patches that belong to more than one window are shaded darker on the cyan-red hue scale). Bottom row: p-values of the mean difference tests computed at each patch; the negative logarithm of the p-values is displayed using the cyan-red hue scale (cyan = no significance, red = high significance).
feature selection a similar classification rate can be achieved with only 160 summary features. The feature selection methodology also specifies the local regions of the hippocampus that are significant for discrimination. The second row of Fig. 2 shows the ten patches that were selected most frequently in the 117 leave-one-out experiments conducted with the feature selection algorithm with λ = 0.16. The third row of Fig. 2 shows the ten most frequently selected patch windows in the window selection experiment with λ = 0.12 and η = 0.08. Window selection results in fewer isolated features than feature selection. For reference, the bottom row of Fig. 2 plots the p-values of mean difference hypothesis tests computed at each patch. No correction for the repeated nature of tests has been applied. While the pattern of patches selected by the window and feature selection algorithms closely resembles the pattern of patches with low p-values, the selected patches do not correspond to the patches with lowest p-values. As stipulated in [6], the head of the right hippocampus was shown by window selection to be most relevant for discrimination.
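The leave-one-out protocol behind Table 1 has the following overall shape; select_features and the L1 support vector classifier are placeholder callables here, not implementations from the paper:

```python
import numpy as np

def leave_one_out_rate(X, labels, select_features, fit, predict):
    """For each of the 117 subjects: re-run selection on the remaining
    subjects, train a classifier in the selected subspace, classify the
    held-out subject, and report the fraction classified correctly."""
    n = len(X)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i
        feats = select_features(X[train], labels[train])  # feature indices
        clf = fit(X[train][:, feats], labels[train])
        correct += predict(clf, X[i, feats]) == labels[i]
    return correct / n
```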
5 Discussion and Conclusions
It is unlikely that a classification technique will one day make it possible to accurately diagnose schizophrenia on the basis of hippocampal shape. Therefore, our goal in developing the window selection algorithm was not so much to build a better classifier but rather to find the regions of the hippocampus that are significant for discrimination. With respect to this goal, the results presented in this paper are encouraging. However, these results require further validation using a different hippocampal data set. We plan to perform this validation in the future. We also plan to perform window and feature selection on hippocampal patches selected manually on the basis of biological homogeneity and function. The use of anatomically significant patches in the selection algorithms could offer new insights into schizophrenia. On the theoretical front, we plan to extend this paper's framework to select features in a hierarchical manner. Selected patches would be further partitioned into smaller patches, and the selection algorithms would be performed again on the residuals, resulting in a high-resolution set of selected features. Hierarchical feature selection would eliminate the information loss incurred by reduction to patch summary features. In conclusion, we have presented a framework for using feature selection in shape characterization, developed a new window selection algorithm for handling localized shape features, and applied feature and window selection to synthetic and clinical data. The results on clinical data confirm an earlier finding from [6] that the head of the hippocampus is significant with respect to schizophrenia and suggest that the framework does provide useful locality and effective discrimination. Acknowledgements. The research reported in this paper was carried out under partial support of the NIH grant P01 CA47982 and the Silvio Conte Center at Washington University School of Medicine grants MH56584 and MH62130.
Dr. Guido Gerig, Dr. J.S. Marron, Dr. James Damon, Dr. Keith E. Muller, Sean Ho, P. Thomas Fletcher, and other participants of the Statistics of Shape Seminar held at the University of North Carolina have contributed to this research through constructive criticism and advice. We thank Dr. Adam Cannon at Columbia University for his help in stimulating this research.
References
1. P. Bradley, O. Mangasarian, and J. Rosen. Parsimonious least norm approximation. Technical Report 97-03, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin, March 1997.
2. P. S. Bradley and O. L. Mangasarian. Feature selection via concave minimization and support vector machines. In Proc. 15th International Conf. on Machine Learning, pages 82–90. Morgan Kaufmann, San Francisco, CA, 1998.
3. P. S. Bradley, O. L. Mangasarian, and W. N. Street. Feature selection via mathematical programming. INFORMS Journal on Computing, 10:209–217, 1998.
4. T. Cootes, C. Taylor, D. Cooper, and J. Graham. Active shape models – their training and application. Computer Vision, Graphics, and Image Processing: Image Understanding, 61(1):38–59, 1994.
5. J. Csernansky, S. Joshi, L. Wang, J. Haller, M. Gado, J. Miller, U. Grenander, and M. Miller. Hippocampal morphometry in schizophrenia via high dimensional brain mapping. In Proc. National Academy of Sciences, volume 95, pages 11406–11411, 1998.
6. J. G. Csernansky, L. Wang, D. Jones, D. Rastogi-Cruz, J. A. Posener, G. Heydebrand, J. P. Miller, and M. I. Miller. Hippocampal deformities in schizophrenia characterized by high dimensional brain mapping. Am. J. Psychiatry, 159:2000–2006, 2002.
7. C. Davatzikos, M. Vaillant, S. Resnick, J. Prince, S. Letovsky, and R. Bryan. A computerized approach for morphological analysis of the corpus callosum. Journal of Computer Assisted Tomography, 20:207–222, 1995.
8. G. Gerig, M. Styner, M.E. Shenton, and J. Lieberman. Shape versus size: Improved understanding of the morphology of brain structures. In W. Niessen and M. Viergever, editors, Medical Image Computing and Computer-Assisted Intervention (MICCAI), volume 2208, pages 24–32, New York, October 2001. Springer.
9. P. Golland, B. Fischl, M. Spiridon, N. Kanwisher, R. L. Buckner, M. E. Shenton, R. Kikinis, A. M. Dale, and W. E. L. Grimson. Discriminative analysis for image-based studies. In Medical Image Computing and Computer-Assisted Intervention (MICCAI), volume 1, pages 508–515. Springer, 2002.
10. P. Golland, W.E.L. Grimson, and R. Kikinis. Statistical shape analysis using fixed topology skeletons: Corpus callosum study. In International Conference on Information Processing in Medical Imaging, LNCS 1613, pages 382–388. Springer Verlag, 1999.
11. J.C. Gower. Generalized procrustes analysis. Psychometrika, 40:33–51, 1975.
12. J.W. Haller, A. Banerjee, G.E. Christensen, M. Gado, S. Joshi, M.I. Miller, Y.I. Sheline, M.W. Vannier, and J.G. Csernansky. Three-dimensional hippocampal MR morphometry by high-dimensional transformation of a neuroanatomic atlas. Radiology, 202:504–510, 1997.
13. Tony S. Jebara and Tommi S. Jaakkola. Feature selection and dualities in maximum entropy discrimination. In Uncertainty in Artificial Intelligence: Proceedings of the Sixteenth Conference (UAI-2000), pages 291–300, San Francisco, CA, 2000. Morgan Kaufmann Publishers.
14. George H. John, Ron Kohavi, and Karl Pfleger. Irrelevant features and the subset selection problem. In International Conference on Machine Learning, pages 121–129, 1994.
15. S. Joshi, U. Grenander, and M. Miller. On the geometry and shape of brain sub-manifolds. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11:1317–1343, 1997.
16. S. Joshi, S. Pizer, P.T. Fletcher, P. Yushkevich, A. Thall, and J.S. Marron. Multiscale deformable model segmentation and statistical shape analysis using medial descriptions. Invited submission to IEEE-TMI, page t.b.d., 2002.
17. G. Karypis and V. Kumar. MeTiS – A Software Package for Partitioning Unstructured Graphs, Partitioning Meshes, and Computing Fill-Reducing Orderings of Sparse Matrices – Version 4.0. University of Minnesota, 1998.
18. András Kelemen, Gábor Székely, and Guido Gerig. Elastic model-based segmentation of 3D neuroradiological data sets. IEEE Transactions on Medical Imaging, 18:828–839, October 1999.
19. K. Kira and L. Rendell. The feature selection problem: Traditional methods and a new algorithm. In Tenth National Conference on Artificial Intelligence (AAAI-92), pages 129–134. MIT Press, 1992.
20. Ron Kohavi and George H. John. Wrappers for feature subset selection. Artificial Intelligence, 97(1-2):273–324, 1997.
21. S.M. Pizer, D.S. Fritsch, P. Yushkevich, V. Johnson, and E. Chaney. Segmentation, registration, and measurement of shape variation via image object shape. IEEE Transactions on Medical Imaging, 18:851–865, October 1999.
22. M.E. Shenton, G. Gerig, R.W. McCarley, G. Szekely, and R. Kikinis. Amygdala-hippocampus shape differences in schizophrenia: The application of 3D shape models to volumetric MR data. Psychiatry Research Neuroimaging, pages 15–35, 2002.
23. L.H. Staib and J.S. Duncan. Boundary finding with parametrically deformable models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(11):1061–1075, November 1992.
24. M. Styner. Combined Boundary-Medial Shape Description of Variable Biological Objects. PhD thesis, University of North Carolina at Chapel Hill, Chapel Hill, NC, 2001.
25. J. Weston, S. Mukherjee, O. Chapelle, M. Pontil, T. Poggio, and V. Vapnik. Feature selection for SVMs. In Advances in Neural Information Processing Systems 13, pages 668–674. MIT Press, 2001.
26. Roland Wunderling. Paralleler und Objektorientierter Simplex-Algorithmus. PhD thesis, Konrad-Zuse-Zentrum für Informationstechnik, Berlin, 1996. ZIB technical report TR 96-09.
27. P. Yushkevich. Statistical Shape Characterization using the Medial Representation. PhD thesis, University of North Carolina at Chapel Hill, Chapel Hill, NC, 2003.
28. P. Yushkevich, S.M. Pizer, S. Joshi, and J.S. Marron. Intuitive, localized analysis of shape variability. In International Conference on Information Processing in Medical Imaging, pages 402–408, Berlin, Germany, 2001. Springer-Verlag.
Corresponding Articular Cartilage Thickness Measurements in the Knee Joint by Modelling the Underlying Bone

Tomos G. Williams1, Christopher J. Taylor1, ZaiXiang Gao1, and John C. Waterton2

1 Imaging Science and Biomedical Engineering, University of Manchester, Manchester, U.K.
2 Enabling Science & Technology, AstraZeneca, Alderley Park, Macclesfield, Cheshire, U.K.
Abstract. We present a method for corresponding and combining cartilage thickness readings from a population of patients using the underlying bone structure as a reference. Knee joint femoral bone and cartilage surfaces are constructed from a set of parallel slice segmentations of MR scans. Correspondence points across a population of bone surfaces are defined and refined by minimising an objective function based on the Minimum Description Length of the resulting statistical shape model. The optimised bone model defines a set of corresponding locations from which 3D measurements of the cartilage thickness can be taken and combined for a population of patients. Results are presented for a small group of patients demonstrating the feasibility and potential of the approach as a means of detecting sub-millimetre cartilage thickness changes due to disease progression.
1 Introduction
Osteoarthritis is a major cause of suffering and disability. This has led to a growing demand for effective alternatives to surgical treatments, which are only suitable in extreme cases [2]. It is known that osteoarthritis causes degeneration of articular cartilage, although characterising cartilage and bone changes during disease progression is still the subject of current research [16,15]. MR imagery of the knee can be used to monitor cartilage damage in vivo [19,3,14]. Most studies suggest that total cartilage volume and mean thickness are relatively insensitive to disease progression [12,4,22] though there are some conflicting results [25,18]. There is evidence to suggest that osteoarthritis causes regional changes in cartilage structure with some regions exhibiting thinning or loss of cartilage whilst swelling may occur elsewhere on the articular surface. For this reason, localised measures of cartilage thickness are likely to provide a fuller picture of the changes in cartilage during the disease process.
Semi-automatic segmentation of cartilage in MR images of the knee has been shown to yield reproducible estimates of cartilage volume [25,3]; however, in healthy subjects knee articular cartilage is, on average, only 2mm thick [5,9,15] and thickness changes over the short time scale useful in drug development (6–12 months) are likely to be in the sub-millimetre region [22,3]. It is unlikely that such small changes will be detected in individual pairs of MR scans given practical scan resolutions and segmentation accuracies. Previous work has shown that small but systematic changes in thickness between two time points can be measured in a group of subjects by registering the set of cartilage segmentations and computing the mean change at each point of the cartilage surface [24,23]. These studies used elastic registration of the segmented cartilage shapes in normal volunteers. This has two obvious problems: there is no guarantee that anatomically equivalent regions of cartilage are corresponded, even in normal subjects, and the correspondences become unpredictable when the cartilage shape changes during disease (particularly when there is loss from the margins). In this paper we propose using the underlying bone as an anatomical frame of reference for corresponding cartilage thickness maps between subjects over time. This has the advantage that anatomically meaningful correspondences can be established that are stable over time, because the disease does not cause significant changes in overall bone shape. We find correspondences between anatomically equivalent points on the bone surface for different subjects using the minimum description length method of Davies et al. [7,6], which finds the set of dense correspondences between a group of surfaces that most simply account for the observed variability. This allows normals to be fired from equivalent points on each bone surface, leading to directly comparable maps of cartilage thickness.
2 Method

2.1 Overview
MR images of the knee were obtained using a fat-suppressed T1 sequence to visualise cartilage and a T2 sequence to visualise the endosteal bone surface, both with 0.625 × 0.615 × 1.6mm resolution. Semi-automatic segmentations of the femoral cartilage and endosteal surface of the femur were performed slice-by-slice using the EndPoint software package (Imorphics, Manchester, UK). These slice segmentations were used to build continuous 3D surfaces, an MDL model of the bone was constructed, and standardised thickness maps were generated as described in some detail below. The data used contained images of both left and right knees. To simplify subsequent processing, all left knees were reflected about the medial axis of the femur so they could be treated as equivalent to right knees.

2.2 Surface Generation
Articular cartilage is particularly difficult to segment due to its thin and highly curved nature. Segmenting each image slice individually using guided edge-detection
algorithms proved the most reliable method for identifying the cartilage. This produced a stack of parallel 2D segmentations. To provide a common reference across all examples, each bone segmentation was truncated to include a length of femoral shaft proportional to the width of the femoral head. Where adjacent segmentations differed significantly, additional contour lines were inserted at the midline of the two segmentations. This operation was performed recursively until neighbouring contours were sufficiently similar to allow for surface construction by triangulation of equally spaced points along the length of each contour. An example of a resulting surface is shown in figure 1(a).
Fig. 1. (a) Sample Bone Surface. The original slice segmentations are shown as solid lines. (b) Example Cartilage Surface. The inner or exosteal surface which forms the interface between the cartilage and cortical bone is coloured red and the outer surface is shaded in green
Surface construction from the cartilage segmentations proved more challenging due to significant variation between neighbouring slices and the thin, curved shape of the cartilage. Various documented approaches such as NUAGES triangulation [13] and Shape Based Interpolation [20] proved unable to produce plausible surfaces, so an alternative surface construction method was developed specifically for the cartilage. Post-processing of the segmentations was needed to identify the exosteal surface, or bone-cartilage interface, and the outer surface of the cartilage. This simplified surface construction by allowing the structure connecting each segment to be determined by the inner surface and then inherited by the outer surface. The segments' connection sequence was also specified. In general, a segment is connected to the segments on the neighbouring slices. In the case of bifurcation of the cartilage, however, multiple segments may appear on one slice, and specifying which segments should be connected to each other determines the topology
of the cartilage. Both the inner/outer surface and segment connection sequence operations were performed automatically, with manual correction if required. During cartilage surface construction, regions of the segments were categorised as either spans (connecting two segments) or ridges (overhangs where the surface is closed and connected to itself). The underlying structures were represented as quadrilateral meshes and connected to ensure that the surface was closed. Surface generation was performed by triangulation of this mesh. An example of a constructed cartilage surface is shown in figure 1(b).
2.3 Bone Statistical Shape Model
We adopted the method of Davies et al. [7,6] to find an optimal set of dense correspondences between the bone surfaces. The bone surfaces were pre-processed so that their centroids resided at the origin, and scaled so that the Root Mean Square of the vertices' distance from the centroid was unity. This initial scaling facilitated model optimisation by minimising the effect of differences in the overall size of the examples on the shape model. Additional pose refinement is incorporated in the optimisation process. Each bone surface was mapped onto a common reference; a unit sphere was chosen since it possesses the same topology as the bone and provides a good basis for the manipulation of the points, by reducing the number of point parameters from the three Cartesian coordinates of the shape vertices to two spherical coordinates. The diffusion method of Brechbühler [1] was used to produce the spherical mappings. A set of equally spaced points were defined on the surface of the unit sphere and mapped back onto each bone surface by finding their position on the spherically mapped surfaces — the triangle on which they are incident and their precise position on this triangle in barycentric coordinates — and computing the same location on the corresponding triangle on the original surface. This provided a first approximation to a set of corresponding points across the population of bone surfaces. At this stage there is, however, no reason to expect anatomical equivalence between corresponding points. The automatic model optimisation method of Davies et al. [7,8] is based on finding the set of dense correspondences over a set of shapes that produces the 'simplest' linear statistical shape model. A minimum description length (MDL) objective function is used to measure model complexity [7], and optimised numerically with respect to the correspondences. The basic idea is that 'natural' correspondences give rise to simple explanations of the variability in the data. One shape example was chosen as a reference shape and the positions of its correspondence points remained fixed throughout. The optimisation process involved perturbing the locations of the correspondence points of each shape in turn, optimising the MDL objective function. Two independent methods of modifying the positions of the correspondence points were used: global pose and local Cauchy transform perturbations on the unit sphere. Global pose optimisation involved finding the six parameters (x, y, z translation and rotation) applied to the correspondence points of a shape that minimise the objective function. Reducing the
sizes of the population of shapes trivially reduces the MDL objective function, so the scale of each shape was fixed throughout the optimisation. Local perturbation of the correspondence points on the unit sphere, guaranteed to maintain shape integrity, is achieved by using Cauchy kernels to locally re-parametrise the surface. Each kernel has the effect of attracting points toward the point of application. The range of the effect depends on the size of the kernel. One step in the optimisation involved choosing a shape at random, optimising the objective function with respect to the pose, placing a kernel of random width (drawn from an interval) at a random point on the unit sphere, and finding the amplitude (size of effect) that optimised the objective function. This was repeated until convergence.
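One plausible form of such a Cauchy-kernel re-parameterisation is sketched below; the exact warp used by Davies et al. [7,8] differs in detail, so this is illustrative only.

```python
import numpy as np

def cauchy_perturb(points, center, width, amplitude):
    """Attract spherical parameterisation points toward 'center' with a
    Cauchy-shaped falloff.  points: (n, 3) unit vectors; center: (3,)
    unit vector; width controls the kernel's range of effect."""
    cos_d = np.clip(points @ center, -1.0, 1.0)
    ang = np.arccos(cos_d)                         # geodesic distance
    gain = amplitude / (1.0 + (ang / width) ** 2)  # Cauchy kernel
    # step toward the center along each point's tangent direction,
    # then renormalise so the points stay on the unit sphere
    step = center - cos_d[:, None] * points
    moved = points + gain[:, None] * step
    return moved / np.linalg.norm(moved, axis=1, keepdims=True)
```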
2.4 Measuring Cartilage Thickness from the Bone
Different measures of cartilage thickness have been proposed, all taking their initial reference points from the exosteal surface of the cartilage [10,14,23,5,17]. Our work differs in that the reference points for the measurements are taken from the endosteal surface of the cortical bone, along 3D normals to the bone surface at the correspondence points determined as described above. The directions of the normals are computed as the average normal directions of the triangles adjoining the measurement point, weighted by the angles each triangle makes at the vertex. The triangulation of the measurement points is determined when the equally spaced points are defined on the unit sphere. A thickness measurement along a 3D normal direction is favoured at the expense of other proposed thickness measuring methods, such as minimal spatial distance [21,11], since it ensures that consistent regions of the cartilage in relation to the bone surface are measured for each corresponding point and the dimensions of holes or lesions in the cartilage are accurately represented. On firing a normal out of the bone surface, the expected outcome is to either find no cartilage, as is the case around regions of the bone not covered by articular cartilage, or intersect the cartilage surface at two points, on its inner and outer surfaces. The thickness of the cartilage is recorded as the distance along the bone normal between its points of intersection with the inner and outer cartilage surface. By taking a cartilage thickness reading at each correspondence point a cartilage thickness map can be drawn onto the bone surface. Sets of cartilage thickness readings taken at the corresponding points defined by the MDL model can be combined for sets of patients and compared between different time-points.
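The angle-weighted normal computation described above can be sketched as follows (each vertex is assumed to belong to at least one triangle; the ray-surface intersection that produces the actual thickness readings is omitted):

```python
import numpy as np

def vertex_normals(verts, faces):
    """Normal at each vertex: average of adjoining face normals,
    weighted by the angle each triangle makes at that vertex."""
    normals = np.zeros_like(verts)
    for tri in faces:
        p = verts[tri]                              # (3, 3) corners
        fn = np.cross(p[1] - p[0], p[2] - p[0])
        fn /= np.linalg.norm(fn)
        for j in range(3):
            a = p[(j + 1) % 3] - p[j]
            b = p[(j + 2) % 3] - p[j]
            cos_t = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
            normals[tri[j]] += np.arccos(np.clip(cos_t, -1, 1)) * fn
    return normals / np.linalg.norm(normals, axis=1, keepdims=True)
```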
3 Results
18 sets of bone segmentations for 6 at-risk patients were processed. The data was equally divided between two time-points (0 and 6 months), with 3 of the patients segmented by two different segmentors independently. With this small set of data the intention was to demonstrate the feasibility of the approach rather
than deduce any characteristics of cartilage thickness change during arthritic disease progression. Surface construction from the bone segmentations yielded on average 4168 (range 3154–4989) vertices and 8332 (6304–9974) triangles. 4098 correspondence points were defined on the unit sphere and projected onto each bone surface, from which the statistical model was built and refined. Figure 2(a) shows the convergence of the model optimisation, and a proportion of the resultant correspondence points projected onto a sub-set of the population is shown in Fig. 3(a–d). It can be seen that the correspondences are anatomically plausible.
Fig. 2. (a) Convergence of the statistical shape model objective function as a function of the number of optimisation steps. (b) Distribution of all correspondence points on the reference shape, illustrating that choosing 4098 points provides sufficient coverage over the surface area. Due to area distortion during spherical mapping, correspondence points tend to be concentrated around regions of high curvature.
Only a proportion of the bone correspondence points reside on regions of the surface which are covered by cartilage. Typically, 950 of the 4098 corresponding measurement points resulted in cartilage thickness readings. For a cartilage endosteal surface area of 4727mm^2 this represents a coverage of 0.201 thickness readings per mm^2 and an average separation of 2.23mm between readings; sufficient coverage and number of points to perform statistical analysis of the data. Figure 3 illustrates how populations of results can be combined and compared. The mean thickness measurement for each corresponding point is displayed as a colour map on the mean bone shape. The results for the 0 and 6 month time-points are illustrated, together with the difference between these aggregate maps. The difference map demonstrates thinning of cartilage in the load-bearing regions such as the patellofemoral (middle left) and medial tibiofemoral (upper right) compartments, which is analogous to the finding reported in a diurnal study [24]. A larger study will be required to draw firm conclusions.
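The reported coverage figures are mutually consistent, as the following small check shows; the spacing estimate assumes the readings form a roughly uniform grid over the surface:

```python
readings, area = 950, 4727.0        # thickness readings, area in mm^2
density = readings / area           # ~0.201 readings per mm^2
spacing = (area / readings) ** 0.5  # ~2.23 mm between readings
print(round(density, 3), round(spacing, 2))
```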
Fig. 3. (colour) (a–d) A sub-set of the correspondence points shown on 4 of the population of bone surfaces. The objective is for the corresponding points to reside on the same anatomical regions of the bone across all the shapes. These plots illustrate that the model has been able to provide good correspondence across the population of shapes. (e–g) Mean cartilage thickness from the time-point 1 (e) and time-point 2 (f) (0 and 6 months) segmentations and the difference (g), all represented as cartilage thickness mapped onto the average bone shape. Regions where swelling of the cartilage occurs are coloured red while blue indicates thinning.
4 Conclusions and Further Work
We have demonstrated the feasibility of using the underlying bone as a reference for cartilage thickness measurements. The bone provides a stable reference for examining surfaces built from segmentations of cartilage scans taken at different time points. Inter-patient comparisons can be achieved by building and optimising a Statistical Shape Model of the femoral head. Cartilage thickness measurements are taken over all bone examples at the resultant corresponding locations, which allows for the aggregation of results from a population of patients and comparisons between sets of patients. The approach was illustrated by applying it to a small population of 18 bone segmentations divided between two time-points. Two sets of measurements were combined to produce mean thickness maps, which were then compared to each other to produce a comparative map illustrating regional cartilage thickness changes. The immediate requirement is to complete larger scale experiments and extend the approach to the other (tibial and patellar) articular surfaces of the knee joint. A larger data set would provide scope for more sophisticated statistical analysis in order to identify and quantify cartilage thickness changes during disease progression. Further refinement of the surface construction and image registration of the bone and cartilage scans could yield greater accuracy in cartilage thickness measurements. In order to gain an understanding of the effects of arthritis disease progression on cartilage thickness, corresponded measurements from a larger set of patients are required. Coupled with statistical analysis, this data should provide insights into how disease affects regional changes in cartilage dimensions and a tool to assess the efficacy of therapeutic interventions.
References
1. C. Brechbühler, G. Gerig, and O. Kübler. Parametrization of closed surfaces for 3-D shape description. Computer Vision and Image Understanding, 61(2):154–170, 1995.
2. J. A. Buckwalter, W. D. Stanish, R. N. Rosier, R. C. Schenck, D. A. Dennis, and R. D. Coutts. The increasing need for nonoperative treatment of patients with osteoarthritis. Clin. Orthop. Rel. Res., pages 36–45, 2001.
3. R. Burgkart, C. Glaser, A. Hyhlik-Durr, K. H. Englmeier, M. Reiser, and F. Eckstein. Magnetic resonance imaging-based assessment of cartilage loss in severe osteoarthritis — accuracy, precision, and diagnostic value. Arthritis Rheum., 44:2072–2077, 2001.
4. F. M. Cicuttini, A. E. Wluka, and S. L. Stuckey. Tibial and femoral cartilage changes in knee osteoarthritis. Ann. Rheum. Dis., 60:977–980, 2001.
5. Z. A. Cohen, D. M. McCarthy, S. D. Kwak, P. Legrand, F. Fogarasi, E. J. Ciaccio, and G. A. Ateshian. Knee cartilage topography, thickness, and contact areas from MRI: in-vitro calibration and in-vivo measurements. Osteoarthritis and Cartilage, 7:95–109, 1999.
6. Rhodri H Davies, Tim F Cootes, Carole J Twining, and Chris J Taylor. Constructing optimal 3D statistical shape models. In Medical Imaging Understanding and Analysis, pages 57–61, Portsmouth, U.K., July 2002.
7. Rhodri H Davies, Carole J Twining, Tim F Cootes, John C Waterton, and Chris J Taylor. A minimum description length approach to statistical shape modelling. IEEE Trans. on Medical Imaging, 21(5):525–537, May 2002.
8. Rhodri H Davies, Carole J Twining, Tim F Cootes, John C Waterton, and Chris J Taylor. 3D statistical shape models using direct optimisation of description length. In 7th European Conference on Computer Vision, pages 3–21, 2002.
9. F. Eckstein, M. Winzheimer, J. Hohe, K. H. Englmeier, and M. Reiser. Interindividual variability and correlation among morphological parameters of knee joint cartilage plates: analysis with three-dimensional MR imaging. Osteoarthritis Cartilage, 9:101–111, 2001.
10. Felix Eckstein, Maximillian Reiser, Karl-Hans Englmeier, and Reinhard Putz. In-vivo morphometry and functional analysis of human articular cartilage with quantitative magnetic resonance imaging — from image to data, from data to theory. Anatomy and Embryology, 203:147–173, 2001.
11. S. C. Faber, F. Eckstein, S. Lukasz, R. Muhlbauer, J. Hohe, K. H. Englmeier, and M. Reiser. Gender differences in knee joint cartilage thickness, volume and articular surface areas: assessment with quantitative three-dimensional MR imaging. Skeletal Radiol., 30:144–150, 2001.
12. Stephen J Gandy, Alan D Brett, Paul A Dieppe, Michael J Keen, Rose A Maciewicz, Chris J Taylor, and John C Waterton. No change in volume over three years in knee osteoarthritis. In Proc. Intl. Soc. Magnetic Resonance, page 79, 2001.
13. Bernhard Geiger. Three-dimensional modeling of human organs and its application to diagnosis and surgical planning. Thèse de doctorat en sciences, École Nationale Supérieure des Mines de Paris, France, 1993.
14. J Hohe, G Ateshian, M Reiser, KH Englmeier, and F Eckstein. Surface size, curvature analysis, and assessment of knee joint incongruity with MRI in-vivo. Magnetic Resonance in Medicine, 47(3):554–561, 2002.
15. M. Hudelmaier, C. Glaser, J. Hohe, K. H. Englmeier, M. Reiser, R. Putz, and F. Eckstein. Age-related changes in the morphology and deformational behavior of knee joint cartilage. Arthritis Rheum., 44:2556–2561, 2001.
16. J. A. Martin and J. A. Buckwalter. Aging, articular cartilage chondrocyte senescence and osteoarthritis. Biogerontology, 3:257–264, 2002.
17. C. A. McGibbon, D. E. Dupuy, W. E. Palmer, and D. E. Krebs. Cartilage and subchondral bone thickness distribution with MR imaging. Acad. Radiol., 5:20–25, 1998.
18. C. G. Peterfy, C. F. Vandijke, D. L. Janzen, C. C. Gluer, R. Namba, S. Majumdar, P. Lang, and H. K. Genant. Quantification of articular-cartilage in the knee with pulsed saturation-transfer subtraction and fat-suppressed MR-imaging – optimization and validation. Radiology, 192:485–491, 1994.
19. Charles G Peterfy. Magnetic resonance imaging in rheumatoid arthritis: Current status and future directions. Journal of Rheumatology, 28(5):1134–1142, May 2001.
20. S. P. Raya and J. K. Udupa. Shape-based interpolation of multidimensional objects. IEEE Trans. on Medical Imaging, 9(1):32–42, 1990.
21. T. Stammberger, F. Eckstein, K. H. Englmeier, and M. Reiser. Determination of 3D cartilage thickness data from MR imaging: Computational method and reproducibility in the living. Magn. Reson. Med., 41:529–536, 1999.
22. T. Stammberger, J. Hohe, K. H. Englmeier, M. Reiser, and F. Eckstein. Elastic registration of 3D cartilage surfaces from MR image data for detecting local changes in cartilage thickness. Magn. Reson. Med., 44(4):592–601, 2000. 23. S. K. Warfield, M. Kaus, F. A. Jolesz, and R. Kikinis. Adaptive, template moderated, spatially varying statistical classification. Med. Image Anal., 4(1):43–55, 2000. 24. John C Waterton, Stuart Solloway, John E Foster, Michael C Keen, Stephen Grady, Brian J Middleton, Rose A Maciewicz, Iain Watt, Paul A Dieppe, and Chris J Taylor. Diurnal variation in the femoral articular cartilage of the knee in young adult humans. Magnetic Resonance in Medicine, 43:126–132, 2000. 25. A. E. Wluka, S. Stuckey, J. Snaddon, and F. M. Cicuttini. The determinants of change in tibial cartilage volume in osteoarthritic knees. Arthritis Rheum., 46(8):2065–2072, August 2002.
Adapting Active Shape Models for 3D Segmentation of Tubular Structures in Medical Images Marleen de Bruijne, Bram van Ginneken, Max A. Viergever, and Wiro J. Niessen Image Sciences Institute, University Medical Center Utrecht, The Netherlands
Abstract. Active Shape Models (ASM) have proven to be an effective approach for image segmentation. In some applications, however, the linear model of gray level appearance around a contour that is used in ASM is not sufficient for accurate boundary localization. Furthermore, the statistical shape model may be too restricted if the training set is limited. This paper describes modifications to both the shape and the appearance model of the original ASM formulation. Shape model flexibility is increased, for tubular objects, by modeling the axis deformation independent of the cross-sectional deformation, and by adding supplementary cylindrical deformation modes. Furthermore, a novel appearance modeling scheme that effectively deals with a highly varying background is developed. In contrast with the conventional ASM approach, the new appearance model is trained on both boundary and non-boundary points, and the probability that a given point belongs to the boundary is estimated non-parametrically. The methods are evaluated on the complex task of segmenting thrombus in abdominal aortic aneurysms (AAA). Shape approximation errors were successfully reduced using the two shape model extensions. Segmentation using the new appearance model significantly outperformed the original ASM scheme; average volume errors are 5.1% and 45% respectively.
1
Introduction
Segmentation methods that are trained on examples are becoming increasingly popular in medical image analysis. The techniques that model both the shape and the gray level appearance of the object, such as Active Shape Models (ASM) [1], Active Appearance models [2], and M-Reps [3], can produce correct results even in the case of missing or confusing boundary evidence. In this paper we shall concentrate on the frequently used ASMs, which consist of a landmark-based linear shape model, linear gray value appearance models around the landmarks, and an iterative optimization scheme. ASMs have been applied to various segmentation tasks in medical imaging [4,5,6,7,8], most successfully in 2D segmentation of objects with fairly consistent shape and gray level appearance. However, many segmentation problems in medical imaging are 3D, and gray levels may be variable. Often not enough training data is available to build a correct 3D model. The model will be over-constrained and hence does not generalize well to new shapes of the same class. Furthermore, if the object to segment lies within variable anatomy, such that a given landmark can be next to different tissue types, boundary appearance
may vary greatly. In that case, a linear model of gray value appearance may produce unreliable results. We show how ASMs can be adapted to deal with these problems. We focus on the segmentation of tubular structures, but some of the adaptations presented are more generally applicable. We propose three main modifications to conventional ASMs. First, elongated structures are modeled more flexibly by modeling the axis and cross-sectional shape deformations separately, assuming the two types of variation to be uncorrelated. The two models are combined into one model describing both deformations, which is fitted using the regular ASM optimization scheme. Second, supplementary smooth deformation is introduced by adding synthetic covariance. Our approach is similar to that of Wang and Staib [9], but differs in that we decouple the smooth deformation in the x, y, and z directions, which makes the approach feasible in 3D. Third, the linear one-class gray value model that is used in ASM is replaced by a novel non-parametric multi-class model that can deal with arbitrary gray value distributions and makes more effective use of the prior information on gray level structure around the object contour. We have evaluated our method on segmentation of thrombus in abdominal aortic aneurysms (AAA) in CTA data. Most publications on computerized AAA segmentation have concentrated on segmentation of the contrast-filled lumen. Thrombus segmentation is a more difficult problem, complicated by regions of low boundary contrast and by many neighboring structures in close proximity to the aneurysm wall. Previously reported approaches yield inaccurate results [10] or need extensive user interaction [7]. Leave-one-out experiments were performed on 23 routinely acquired CTA scans of different patients, to compare the proposed modifications with the conventional ASM approach. All images were segmented manually by an expert.
2 Methods
The original ASM segmentation scheme is briefly described in Section 2.1. Section 2.2 presents several extensions to the shape model. The new appearance model is discussed in Section 2.3.

2.1 Active Shape Models

In ASMs [1], shape variations in a training set are described using a Point Distribution Model (PDM). The shape model is used to generate new shapes, similar to those found in the training set, which are fitted to the data using a model of local gray value structure.

Point distribution models. A statistical model of object shape and shape variation is derived from a set of s training examples. Each training example is described by a shape vector x containing the coordinates of n landmark points that correspond between shapes. Variations in the coordinates of these landmark points describe the variation in shape and pose across the training set. The shape vectors are aligned using Procrustes Analysis and transformed into the tangent space to the mean shape [1]. Principal Component Analysis (PCA) is applied to the aligned shape vectors. To this end, the mean shape x̄, the covariance matrix S, and the eigensystem of S are computed.
The eigenvectors φi of S provide the modes of shape variation present in the data. The eigenvectors corresponding to the largest eigenvalues λi account for the largest variation; a small number of modes usually explains most of the variation. Each shape x in the set can then be approximated using x ≈ x̄ + Φb, where Φ consists of the eigenvectors corresponding to the t largest eigenvalues, Φ = (φ1 | φ2 | … | φt), and b is the model parameter vector that weighs the contribution of each of the modes.

Appearance model. Fitting the shape model to a new image requires a measure of probability that an image point belongs to the boundary. In the original ASM formulation, a linear model is constructed from gray value profiles that are sampled around the landmarks from the training set, perpendicular to the object contour. The effect of global intensity changes is reduced by sampling the first derivative and normalizing the profile. The normalized samples are assumed to be distributed as a multivariate Gaussian, and the mean ḡ and covariance matrix S_g are computed. The measure of dissimilarity of a new profile g_s to the profiles in the distribution is given by the squared Mahalanobis distance f(g_s) from the sample to the model mean:

f(g_s) = (g_s − ḡ)ᵀ S_g⁻¹ (g_s − ḡ) .   (1)
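As a concrete illustration of this appearance model, the following minimal NumPy sketch fits the profile distribution and scores a candidate profile with the squared Mahalanobis distance of Eq. (1). Function and variable names are illustrative, not from the original; the pseudo-inverse is an added numerical-stability choice that the paper does not specify.

```python
import numpy as np

def fit_profile_model(profiles):
    """profiles: (num_samples, profile_len) array of normalized derivative profiles."""
    g_mean = profiles.mean(axis=0)           # mean profile g-bar
    S_g = np.cov(profiles, rowvar=False)     # profile covariance matrix S_g
    S_g_inv = np.linalg.pinv(S_g)            # pseudo-inverse for stability (assumption)
    return g_mean, S_g_inv

def mahalanobis_cost(g_s, g_mean, S_g_inv):
    """Squared Mahalanobis distance f(g_s) of a candidate profile, Eq. (1)."""
    d = g_s - g_mean
    return float(d @ S_g_inv @ d)
```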
Optimization. The shape model is fitted to new images using a fast deterministic optimization scheme. The process initializes with a plausible shape, usually the mean. The appearance model determines for each landmark its optimal new position among 2ns + 1 candidate positions along the line perpendicular to the contour, ns on both sides. Iteratively, the shape is translated, rotated, scaled, and deformed, such that the squared distance between the landmarks and the optimal positions is minimized. To ensure plausible shapes, the shape parameters bi are constrained to lie within ±3√λi. This process of adjusting landmark positions and shape parameters is repeated a fixed number of times, N, whereupon it is repeated at the next level of resolution.
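The shape-regularization step of this scheme can be sketched as follows: project the proposed landmark positions onto the model and clamp each parameter to the ±3√λi interval. This is a minimal sketch assuming Phi and lam come from the PCA described above; the surrounding translation, rotation, and scaling steps are omitted.

```python
import numpy as np

def constrain_shape(x_target, x_mean, Phi, lam):
    """Return the nearest plausible shape to the proposed landmark vector."""
    b = Phi.T @ (x_target - x_mean)       # project residual onto the modes
    limit = 3.0 * np.sqrt(lam)            # plausibility bounds +/- 3 sqrt(lambda_i)
    b = np.clip(b, -limit, limit)
    return x_mean + Phi @ b
```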
2.2 Extending PDMs
A common problem in statistical shape modeling is that the model can be too specific to fit to new shapes properly, owing to a limited amount of training data. This is often the case with three-dimensional models, where a large number of landmarks is needed to describe a shape properly. This section describes two different approaches for generalizing models of curvilinear structures. To this end, we first explain the automatic landmarking used in our experiments. Landmarking strategy. A fixed number of slices nz is interpolated between the beginning and end of the object. In AAA segmentation, the original CT slices are used since they are perpendicular to the body axis and give approximately perpendicular cross-sectional views of the aorta. Alternatively, reformatted slices perpendicular to the object axis could be used. An equal number of landmarks nxy is placed in each slice, equidistantly along contours that were drawn manually by an expert. The starting point of a contour is the posterior point with the same x-coordinate as the center of mass.
Fig. 1. Shape models built from the two input shapes on the top left. The axis of one of the input shapes is straight while the diameter of its cross-section increases towards the vertical center. The other input shape has a constant circular cross-section around a curved axis. A PDM built directly from these two input shapes contains one mode of shape variation, varying between thin shapes curved to the left and fat shapes curved to the right. The combined model (TPDM) finds two modes of variation; the first describes a curving of the object’s axis and the second describes an increase or decrease in diameter from the ends towards the center.
We model 3D cylindrical shape variations, restricting the deformation to in-slice landmark displacements. Before the model is fitted to a new image, the user indicates the beginning and end of the desired segmentation, thus removing the need for scaling in the z direction. As a consequence, the shape vectors contain only x and y coordinates. Modeling axis and cross-sections separately. The ability of the model to generalize to unseen shapes can be increased by modeling the axes and cross-sections separately, thus assuming that both types of shape variation are uncorrelated. Subsequently, the two models are combined into one model describing both deformations. To this end, s central axes and s straightened shapes are extracted from the s aligned training shapes. Each axis contains one landmark per slice, defined by the centroid of the contour in that slice. The straightened shapes are formed by translating each contour such that its centroid is in the origin. PDMs are derived for both shape distributions as described in Section 2.1. To combine the mode vectors of the two models they need to be of equal dimensions. However, while the axis modes have 2nz coordinates, the straightened modes are of dimension 2nz nxy . To extend a mode of axis variation to 2nz nxy , a shape vector is constructed which has the nxy landmarks in each slice positioned at the axis points. If this deformation is applied to a shape x, the landmarks in each slice of x are translated such that their centroid coincides with the deformed axis. In general, the two models will not be linearly independent. A second PCA is therefore performed to remove any correlation between the axis and the cross-sectional modes.
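A minimal sketch of this combination step is given below. It assumes the axis modes are stored as a (2nz × t) matrix and that each axis coordinate expands into nxy consecutive shape-vector entries; the actual coordinate ordering is not specified in the text, so the replication step is an assumption. The second PCA is performed via an SVD of the weighted, concatenated mode matrix.

```python
import numpy as np

def combine_models(Phi_cross, lam_cross, Phi_axis, lam_axis, n_xy):
    # Extend each axis mode to full dimension: every landmark in a slice
    # receives that slice's axis displacement (coordinate ordering is assumed).
    Phi_axis_full = np.repeat(Phi_axis, n_xy, axis=0)
    # Weight both mode sets by sqrt(lambda) and concatenate.
    M = np.hstack([Phi_cross * np.sqrt(lam_cross),
                   Phi_axis_full * np.sqrt(lam_axis)])
    # Second PCA (via SVD) removes correlation between the two mode sets.
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    keep = s > 1e-10 * s.max()
    return U[:, keep], s[keep] ** 2       # combined modes and their eigenvalues
```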
Fig. 2. Examples of synthetic deformation of a cylinder, showing the 1st, 2nd, and 4th x-modes combined with the z-modes ψ1, ψ2, and ψ5. The frequency of z-deformation increases from left to right, and that of x-deformation from top to bottom.
The modes of shape variation of the combined model are thus given by the principal components of (Φcross Wcross | Φaxis Waxis), where Φcross and Φaxis are concatenations of mode vectors and Wcross and Waxis are diagonal weight matrices of the corresponding √λi. The resulting model contains at most 2(s − 1) modes, provided that s − 1 < n, while a model built from all shapes directly would contain only s − 1 modes. Figure 1 illustrates the effect of this generalization.

Additional smooth variation. Several authors have investigated the combination of statistical deformation modes with synthetic smooth deformation obtained from finite element method (FEM) models of shape instances, or with smooth deformation independent of the object's shape. For instance, Wang and Staib [9] apply smoothness constraints in the form of a smoothness covariance matrix C that consists of positive numbers on the diagonal and off-diagonal elements representing neighboring points, so that each point is allowed more variation and neighboring points are more likely to vary together. C is added to the covariance matrix S obtained from the training data, and an extended shape model is obtained by computing the eigenvectors of the new covariance matrix. A disadvantage of this procedure is that the eigenvectors of the full D × D covariance matrix have to be computed, whereas in the case that the number of samples s is smaller than the dimensionality of the shape vectors D, PCA requires only the eigenvectors of an s × s matrix. Eigenvector decomposition is an O(D³) problem and becomes impractical for high dimensions. Our approach is similar to that of Wang and Staib [9], but we circumvent the computation of the eigenvectors of the full covariance matrix by decoupling the deformation in the x, y, and z directions. The 3D deformation modes of a cylindrical object are thus built up of smooth deformations of cyclic sequences of x and y coordinates and a non-cyclic sequence of z coordinates. For the cyclic sequences, C is circulant and therefore
has sinusoidal eigenvectors. The first eigenvector is a constant, corresponding to a translation of the entire object, and subsequent eigenvectors correspond to sine-cosine pairs with an increasing number of full periods. For the non-cyclic sequence, the first eigenvector approximates a half period of a sine, and subsequent eigenvectors correspond to approximate sines with an increasing number of half periods. We set the elements of the synthetic covariance matrix according to a Gaussian. The x and y deformation are then given by the eigensystem of the circulant nxy × nxy matrix with elements e^(−(d_{i,j}/2σ)²), where i and j are the matrix row and column indices, nxy is the number of landmarks in one slice, and d_{i,j} = min{|i − j|, |i − j + nxy|, |i − j − nxy|}. The z deformation is given by the eigensystem of a similar but non-circulant nz × nz matrix, with nz the number of slices in the model and d_{i,j} = |i − j|. In the following we denote the eigenvectors of the xy and z deformation by χi and ψi respectively. The deformations in the xy plane of the entire shape are now given by 2nxy shape vectors in which the elements corresponding to x-coordinates in each slice are set according to one of the smooth x deformation modes while the y-elements are zero, or the other way around. To include all possible variations along the z-axis, each of the xy-modes is combined with each of the z-modes by multiplying the elements in a slice of the xy-mode by the corresponding element of the z-mode:

x_{i,j} = ψ(i) · χ(j), y_{i,j} = 0   or   x_{i,j} = 0, y_{i,j} = ψ(i) · χ(j),   (2)

where i is the slice index and j is the number of the landmark in the slice. The resulting deformation vectors are centered around the origin and normalized to unit length. The eigenvalues, used for weighting of the modes, are obtained by multiplying the eigenvalues that correspond to the original xy and z modes. The result is an orthonormal set of 2n vectors describing smooth cylindrical deformations. In practice, a much smaller number of harmonics is chosen, such that only low-frequency deformations remain. The eigenvalues are multiplied by a weight factor α and the model is combined with the statistical model in the same way as the axis and cross-section models are combined in the previous subsection. Figure 2 shows several examples of smooth deformation modes applied to a cylinder. The parameters involved in this augmented model are the smoothing scale σ, the number of synthetic modes retained, and the weight factor α. The scale σ mainly weighs the modes of different frequencies; a larger σ increases the eigenvalues for low frequencies and decreases the eigenvalues of high frequency variation, thus favoring smoother deformation. The weight factor α weighs the synthetic model with respect to the statistical model and should decrease if more training shapes are added. These parameters can for instance be selected by defining a threshold on the maximum reconstruction error allowed in leave-one-out experiments on the training data.
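A short sketch of the synthetic basis construction, under the Gaussian covariance just described: build the distance matrix (circulant in-slice, non-circulant along z), form C, and take its eigenvectors as smooth deformation modes. Parameter values mirror the experiments, but the function name is illustrative.

```python
import numpy as np

def smooth_modes(n, sigma, cyclic=True):
    i = np.arange(n)
    d = np.abs(i[:, None] - i[None, :])    # index distances |i - j|
    if cyclic:
        d = np.minimum(d, n - d)           # wrap-around distance for cyclic x, y
    C = np.exp(-(d / (2.0 * sigma)) ** 2)  # Gaussian covariance elements
    lam, V = np.linalg.eigh(C)
    order = np.argsort(lam)[::-1]          # strongest (lowest-frequency) modes first
    return lam[order], V[:, order]

lam_xy, chi = smooth_modes(50, sigma=4.0, cyclic=True)   # in-slice modes chi_i
lam_z, psi = smooth_modes(30, sigma=4.0, cyclic=False)   # along-axis modes psi_i
# A full 3D mode per Eq. (2): x[i, j] = psi_a[i] * chi_b[j], y[i, j] = 0 (or vice
# versa), with eigenvalue lam_z[a] * lam_xy[b], then normalized to unit length.
```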
2.3 A Nonlinear Appearance Model

We previously showed that the Mahalanobis distance to the average training profile does not perform well in AAA boundary localization [7]. A shortcoming of this gray value model is that only the appearance of the correct boundary is learned from the training set. Furthermore, the underlying assumption of a normal profile distribution often does not hold. To deal with a non-linear profile distribution, Bosch et al. [11] performed a non-linear normalization to transform an asymmetric, but unimodal, distribution into a Gaussian. Brejl and Sonka [12] applied fuzzy c-means clustering to feature vectors derived from intensity profiles, allowing for a (known) number of separate normal distributions. Van Ginneken and co-authors [5] did not use intensity profiles, but applied local texture features and kNN classification to determine the boundary between object and background, hence allowing arbitrary distributions as long as the textures of object and background are different. In medical image segmentation tasks, surrounding structures are often similar to the object in gray value and texture, and the ordering of gray values along the profile can become important. We propose to treat the position evaluation step in the ASM optimization as a classification of boundary profiles. As in the original ASM formulation, gray value profiles are sampled from the training set, but now a classifier is trained on both correct and incorrect boundary profiles. Raw intensity profiles are used instead of the normalized derivative profiles of the linear model. For each landmark, one boundary profile is sampled around the landmark and perpendicular to the contour, and 2nshift non-boundary profiles are sampled in the same direction, nshift displaced outwards and nshift displaced inwards. In a new image, the probability that a given profile lies on the aneurysm boundary is given by the posterior probability from the classifier for that profile. In this work, a kNN classifier is used and the posterior probability is given by

P(boundary | g_s) = nboundary / k ,   (3)

where nboundary is the number of boundary samples among the k nearest neighbors. As in the original ASM formulation, separate models are built for different resolutions.
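A minimal sketch of this classification step, using scikit-learn as an assumed dependency (no implementation is named in the text): with uniform weights, the classifier's posterior is exactly the fraction of boundary samples among the k nearest neighbors, as in Eq. (3).

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def train_knn(boundary_profiles, nonboundary_profiles, k=80):
    X = np.vstack([boundary_profiles, nonboundary_profiles])
    y = np.r_[np.ones(len(boundary_profiles)),
              np.zeros(len(nonboundary_profiles))]   # 1 = boundary, 0 = shifted
    return KNeighborsClassifier(n_neighbors=k).fit(X, y)

def p_boundary(knn, g_s):
    """Posterior P(boundary | g_s) = n_boundary / k, Eq. (3)."""
    return knn.predict_proba(g_s.reshape(1, -1))[0, 1]
```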
3 Experiments and Results
A series of leave-one-out experiments is performed on 23 routinely acquired CTA images, including pre-operative as well as post-operative scans. The scan resolution is 0.488 × 0.488 × 2.0 mm. Parameter settings. We have selected a set of parameters on the basis of pilot experiments, and keep those settings fixed throughout the experiments. The shapes are described by nz = 30 slices, each containing nxy = 50 landmarks; a total of 3000 landmark coordinates. The number of modes of the axis and cross-section models is chosen such that both models describe at least 99% of the total variance across the training set. The smooth deformation modes are built from a smoothness matrix with scale σ = 4, and the 26 strongest xy deformation modes and 12 strongest z deformation modes are selected, thus allowing up to 6 sine periods in all directions. The weight factor α is set such that the contribution of the synthetic model to the total variance is 10% of that of the statistical model. The statistical shape model is applied to obtain an initial estimate,
Fig. 3. (a) Root mean squared reconstruction error for all 23 datasets using all modes of variation, for a normal three-dimensional PDM (white), the combined model of axis and cross-sections (gray), and the combined model with additional smooth deformation (black). (b) Root mean squared segmentation error for all 23 datasets, for the linear model (gray) and the kNN model (black).
up to the second highest resolution. The fit is then refined on the smallest scale using the model extended with synthetic deformation. The profiles of the gray value models consist of 7 samples. The kNN appearance model contains, in addition to a correct boundary profile for each landmark, examples shifted dshift = 2 voxels inwards and outwards. The number of neighbors considered in the kNN probability estimation, k, is 80. The fitting algorithm evaluates ns = 5 possible new positions on both sides of the present landmark position, and performs 5 iterations at each of 4 resolution levels. Shape model evaluation. The validity of the shape model is tested by fitting the model directly to the manual segmentations, which gives an upper bound for the accuracy that can be obtained when the model is fitted to new image data. Figure 3(a) shows the root mean squared landmark-to-contour error for all datasets. Modeling the axis and cross-section separately reduced the leave-one-out reconstruction error in all cases; the average error was reduced from 2.2 to 1.6 mm. The average error decreases to 0.74 mm if smooth deformation modes are added to the model. Appearance model evaluation. In vascular images, there is no true anatomical correspondence between the landmarks of different shapes. Therefore, we use one appearance model for all landmarks together, instead of building separate models as is more commonly done in ASM. Pilot experiments on image slices have shown that this approach gives slightly better results for both the linear and the kNN model, even if many training examples are available. Figure 4 shows the error in optimal position selection as a function of the size of the search region ns around the manual contour. The kNN model performs significantly better than the conventional ASM gray value model at all resolutions.
Fig. 4. Root mean squared error of landmark positioning, without fitting the shape model, as a function of the length of the search region on either side of the contour, for (a) the original ASM gray value model, and (b) the kNN gray value model. The dotted line corresponds to the expected error for random landmark selection.
Initialization and constrained optimization. The complexity of the images and the local nature of ASM optimization require an accurate initialization. In our segmentation system, the user draws the top and bottom contours of the aneurysm manually. To aid the model in establishing the correct object axis, an additional point is placed in the approximate aneurysm center of the central slice. An initial estimate is obtained by iteratively fitting the shape model to these points. After each iteration, the landmarks of the manually drawn slices are returned to their original positions and the landmarks of the central slice are translated such that their average position coincides with the manually identified center point. Alternatively, an automatic estimate of the luminal or aneurysmal axis or a (more easily automated) lumen segmentation could be used for initialization. Subsequently, a fixed number of slices is interpolated from the image, and the shape model is fitted at multiple resolutions to optimally match the local image structure, given the two manually drawn contours. The segmentation process is constrained by keeping the two manually drawn slices fixed. To make the fitting process more resistant to outliers, we have applied dynamic programming regularization [6] followed by a weighted least squares fit [8], in which the weights are given by the posterior probability obtained from the gray value model. Segmentation results. Given this initialization and the constrained optimization scheme, the segmentation method using the extended shape model and the kNN gray value model converged successfully in 21 out of 23 cases. Examples of segmented slices, randomly chosen from these 21 datasets, are shown in Figure 5. Figure 3(b) shows the segmentation errors obtained using the two gray value models. The kNN model yields significantly better results than the original ASM model (p < 0.00001 in a paired t-test). Average root mean squared errors are 1.9 and 8.1 mm (3.9 and 17 voxels), respectively.
Fig. 5. Image slices taken randomly from the 21 successful segmentations, with the manually drawn contour (dots), the segmentation obtained using original ASM (pluses) and the segmentation obtained with the kNN gray value model (continuous line). The kNN model obtains a segmentation near the manual contour in all four cases, while the original ASM gray value model finds a satisfactory segmentation only in the third image.
The relative volumes of overlap are 95% and 64%, and average volume errors are 5.1% and 45%. There are two datasets in which the error obtained using the kNN model is larger than half a centimeter. One of these combines an extremely wide aneurysm with calcifications inside the aneurysm, although calcifications are usually found only at the boundary; in the other dataset the aneurysm is embedded in other structures of similar gray value for over 10 adjacent slices, while the total region comprised by the aneurysm and its surrounding structures forms a plausible aneurysm shape. If these two problematic datasets are left out of consideration, the average error over the remaining 21 datasets is 1.4 mm. The corresponding volume of overlap is 96% and the relative volume error 2.8%. Wever et al. [13] reported an inter-observer reproducibility coefficient (RC) of 8.3% and intra-observer RCs of 3.2% and 5.7% for measurement of the total aneurysm volume in CTA. The RC is, following Bland and Altman [14], defined as 1.96 times the standard deviation of the differences in measurements. The RC of the automated measurements with respect to the expert measurements is 4.7%. Automated segmentations initialized by a second observer yield RC = 5.2% with respect to the manual segmentations, and RC = 1.7% as compared to the first set of automated measurements.
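For reference, the reproducibility coefficient used above reduces to a one-line computation over paired volume measurements; this small sketch (with illustrative names) follows the Bland-Altman definition cited in the text.

```python
import numpy as np

def reproducibility_coefficient(volumes_a, volumes_b):
    """RC = 1.96 times the standard deviation of the paired differences."""
    diffs = np.asarray(volumes_a) - np.asarray(volumes_b)
    return 1.96 * diffs.std(ddof=1)
```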
4 Discussion and Conclusions
Segmentation methods based on linear statistical models, such as ASM, obtain good results in many applications. In some cases, however, a shape model based on statistics alone is too specific, and a linear gray value model is not always able to find the correct contour. We have presented an application, AAA segmentation, in which conventional ASM cannot find a suitable segmentation. We have shown how shape models of elongated objects can be made more flexible by modeling the object axis and cross-sections independently. The idea of decoupling different types of shape variation and treating them as independent can be applied more generally. For instance, when modeling vascular trees, different segments could be modeled separately. In multiple object models, each object can be modeled separately, after which the objects are joined into one combined model. The general relations
between different objects are then retained, while the correlation between shape variation in different objects is removed. Such a model is more flexible, but also increases the risk of producing invalid shapes, such as overlapping objects. An orthonormal basis of smooth deformation modes was constructed using the eigenvectors of small matrices. The approach presented is valid for tubular objects, where the shape can be described by a stack of contours with an equal number of landmarks in each contour. For arbitrary shapes, decoupling the deformation in the x, y, and z directions would require computation of the eigenvectors of an n × n matrix instead of a 3n × 3n matrix, still greatly reducing computation time. In AAA segmentation, we used contours in the original CT slices to build the model, and deformation is restricted to in-slice landmark displacements. We believe this approach is valid in the case of CTA images, which are in general highly anisotropic (in the images used in this study the voxels are over 4 times larger in the z-direction). However, the presented methods can also be applied to reformatted slices perpendicular to the object axis. The improvement of the presented gray value model over the original ASM gray value model is twofold. First, not only the appearance of the boundary but also the appearance of points near the boundary is learned from the training set. Second, we do not assume a Gaussian intensity profile distribution but estimate the distribution non-parametrically with kNN probability density estimation. The latter is responsible for a dramatic increase in computation time; a full segmentation took on average 25 seconds on a 1.7 GHz Pentium PC when the original ASM gray value model was used, and 450 seconds using the kNN model. If computation time is an issue, the method could be sped up by using fewer shifted examples and pruning the kNN tree, or by using other classifiers [15]. For instance, a quadratic discriminant classifier could be used, which is equivalent to extending the original ASM gray value modeling scheme to more classes. In leave-one-out experiments on 23 datasets, the shape approximation error was successfully reduced by modeling axis and cross-section deformation independently, and by adding supplementary smooth deformation modes. The kNN appearance model significantly outperforms the original one-class linear gray value model (p < 0.00001). Obtained volume errors with respect to expert segmentations are comparable to inter-observer errors reported in the literature, while the inter-observer agreement for automated segmentation initialized by two different observers is better than for manual segmentation. Acknowledgments. This research was funded by the Netherlands Organization for Scientific Research (NWO). We would like to thank our colleagues M. Prinssen, M.J. van der Laan, and J.D. Blankensteijn from the Department of Vascular Surgery for providing the datasets and expert segmentations.
References 1. T. Cootes, C. Taylor, D. Cooper, and J. Graham, “Active shape models – their training and application,” Computer Vision and Image Understanding 61(1), pp. 38–59, 1995.
2. T. Cootes, G. Edwards, and C. Taylor, “Active appearance models,” IEEE Transactions on Pattern Analysis and Machine Intelligence 23(6), pp. 681–684, 2001. 3. S. Joshi, S. Pizer, P. Fletcher, P. Yushkevich, A. Thall, and J. Marron, “Multiscale deformable model segmentation and statistical shape analysis using medial descriptions,” IEEE Transactions on Medical Imaging 21(5), pp. 538–550, 2002. 4. N. Duta and M. Sonka, “Segmentation and interpretation of MR brain images: An improved active shape model,” IEEE Transactions on Medical Imaging 17(6), pp. 1049–1067, 1998. 5. B. van Ginneken, A. Frangi, J. Staal, B. ter Haar Romeny, and M. Viergever, “Active shape model segmentation with optimal features,” IEEE Transactions on Medical Imaging 21(8), pp. 924–933, 2002. 6. G. Behiels, F. Maes, D. Vandermeulen, and P. Suetens, “Evaluation of image features and search strategies for segmentation of bone structures in radiographs using active shape models,” Medical Image Analysis 6(1), pp. 47–62, 2002. 7. M. de Bruijne, B. van Ginneken, W. Niessen, J. Maintz, and M. Viergever, “Active shape model based segmentation of abdominal aortic aneurysms in CTA images,” in Medical Imaging: Image Processing, M. Sonka and M. Fitzpatrick, eds., Proceedings of SPIE 4684, pp. 463–474, SPIE Press, 2002. 8. M. Rogers and J. Graham, “Robust active shape model search,” in Proceedings of the European Conference on Computer Vision (ECCV’02), A. Heyden, G. Sparr, M. Nielsen, and P. Johansen, eds., Lecture Notes in Computer Science 2353, pp. 517–530, Springer, 2002. 9. Y. Wang and L. Staib, “Statistical shape and smoothness models for boundary finding with correspondence,” IEEE Transactions on Pattern Analysis and Machine Intelligence 22(7), pp. 738–743, 2000. 10. M. Subasic, S. Loncaric, and E. Sorantin, “3D image analysis of abdominal aortic aneurysm,” in Medical Imaging: Image Processing, M. Sonka and M. Fitzpatrick, eds., Proceedings of SPIE 4684, pp. 1681–1689, SPIE Press, 2002. 11. H. Bosch, S. Mitchell, B. Lelieveldt, F. Nijland, O. Kamp, M. Sonka, and J. Reiber, “Active appearance-motion models for endocardial contour detection in time sequences of echocardiograms,” in Medical Imaging: Image Processing, M. Sonka and K. Hanson, eds., Proceedings of SPIE 4322, SPIE Press, 2001. 12. M. Brejl and M. Sonka, “Object localization and border detection criteria design in edge-based image segmentation: automated learning from examples,” IEEE Transactions on Medical Imaging 19(10), pp. 973–985, 2000. 13. J. Wever, J. Blankensteijn, J. van Rijn, I. Broeders, B. Eikelboom, and W. Mali, “Inter- and intra-observer variability of CTA measurements obtained after endovascular repair of abdominal aortic aneurysms,” American Journal of Roentgenology 175(5), pp. 1279–1282, 2000. 14. J. Bland and D. Altman, “Statistical methods for assessing agreement between two methods of clinical measurement,” Lancet (1), pp. 307–310, 1986. 15. A. Jain, R. Duin, and J. Mao, “Statistical pattern recognition: a review,” IEEE Transactions on Pattern Analysis and Machine Intelligence 22(1), pp. 4–37, 2000.
A Unified Variational Approach to Denoising and Bias Correction in MR
Ayres Fan¹, William M. Wells²,³, John W. Fisher¹,³, Müjdat Çetin¹, Steven Haker², Robert Mulkern², Clare Tempany², and Alan S. Willsky¹
¹ Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA, USA
² Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
³ Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
[email protected]
Abstract. We propose a novel bias correction method for magnetic resonance (MR) imaging that uses complementary body coil and surface coil images. The former are spatially homogeneous but have low signal intensity; the latter provide excellent signal response but have large bias fields. We present a variational framework where we optimize an energy functional to estimate the bias field and the underlying image using both observed images. The energy functional contains smoothness-enforcing regularization for both the image and the bias field. We present extensions of our basic framework to a variety of imaging protocols. We solve the optimization problem using a computationally efficient numerical algorithm based on coordinate descent, preconditioned conjugate gradient, half-quadratic regularization, and multigrid techniques. We show qualitative and quantitative results demonstrating the effectiveness of the proposed method in producing debiased and denoised MR images.
1 Introduction
In magnetic resonance (MR) image acquisition, there is a fundamental trade-off between noise and spatially-homogeneous signal response. An uncorrupted image (which we refer to as the true image or the intrinsic image) would depend solely on the underlying tissue and the imaging parameters. Receiving with a body coil (BC) results in low signal-to-noise ratio (SNR) but good spatial homogeneity. Surface coils (SCs) have strong signal response near the coil, but the intensity rapidly diminishes with distance [1]. This variable response allows better visualization of the region of interest (ROI) but results in a systematic intensity inhomogeneity known as the bias field. The intensity distortions caused by the bias field can significantly impair both visual inspection and image processing tasks, and separating the bias field from the true underlying image is an underconstrained and ill-posed problem: there are half as many observations as there are free variables. The earliest bias correction techniques relied on phantoms [2] or homomorphic unsharp filtering [11], but both methods have severe limitations.
Dawant et al. [8] fit thin-plate splines to the bias field using a least-squares penalty. Likar et al. [15] compute a parameterized bias field estimate that minimizes the entropy of the reconstructed image. Wells et al. [21] exploit the duality behind the segmentation and bias correction problems by using the expectation-maximization (EM) algorithm [9] to alternately segment and debias brain images. Many have improved on this framework, including Zhang et al. [22], who use a Markov random field to model the bias field. Sled et al. [19] sharpen the histogram of the observed image using deconvolution and use the resulting a priori density to do Bayes least-squares estimation of the true image. A few techniques capture a BC image to help correct the SC image. Brey and Narayana [5] estimate the bias field as the ratio of the two low-pass filtered observation images. Lai and Fang [14] estimate the bias field by fitting a membrane model to the ratio of the SC and BC images. Pruessmann et al. [17] fit local polynomials at every point in the image. Our method is related to the imaging framework proposed by Brey and Narayana. We exploit the homogeneity of the BC and the high SNR of the SC to create a composite image that has higher SNR than either observation image and a minimal bias field. We construct a general variational framework which can be adapted to a number of different imaging setups. We introduce a computationally efficient approach to solve the variational problem, and we demonstrate our algorithm on a variety of MR imaging applications.
2 Problem Formulation
2.1 Observation Model
We formulate our observation model in a discrete manner. We place the BC and SC observation image pixels into column vectors y_B and y_S respectively. We assume the SC has a bias field b*, and the BC has a constant gain field¹. We stipulate that both observations have the same intrinsic image f*:

y_B = f* + n_B   (1)
y_S = b* ∘ f* + n_S .   (2)

In the above equations, ∘ represents the Hadamard product [12] (or Schur product or entrywise product). Each element of the noise vectors n_B and n_S is assumed to be independent and identically distributed (IID). This is justified by the thermal nature of the noise. In the ROI, b* tends to be significantly larger than 1, which results in a higher SNR for y_S than for y_B. We introduce two diagonal matrices B* and F* which have b* and f* respectively as their diagonal entries. We can then rewrite (2) as

y_S = B* f* + n_S = F* b* + n_S .   (3)

¹ We can only specify f* and b* up to a multiplicative constant. Generally, f* and 2f* are equivalent. Without loss of generality, we set the gain of the BC to be 1.
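As a minimal illustration of Eqs. (1)-(2), the sketch below simulates the two observations for a 1D signal, using the Hadamard (elementwise) product and a Gaussian stand-in for the Rician noise discussed next. All values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
f_true = rng.uniform(100.0, 400.0, size=1024)        # hypothetical intrinsic image f*
b_true = 1.0 + 3.0 * rng.random(1024)                # hypothetical bias field b* (> 1 in ROI)
sigma = 30.0
y_B = f_true + rng.normal(0.0, sigma, 1024)          # body coil: unit gain, low SNR
y_S = b_true * f_true + rng.normal(0.0, sigma, 1024) # surface coil: b* o f* plus noise
```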
The noise in magnitude MR images is accurately modeled by a Rician distribution [16]. Rician random variables are generated by taking the norm of a Gaussian random vector with arbitrary mean. As the SNR increases, the Rician probability density function (PDF) approaches the Gaussian PDF. The Rician PDF is unwieldy to work with, so we treat the noise as Gaussian and zero-mean in our algorithm. Rician noise has a positive mean, so this assumption results in a biased estimator. In most applications, the SNR in tissue regions is high enough that our Gaussian noise assumption is reasonable, and only a moderate upward bias is imparted.
2.2 Variational Formulation
We formulate a variational problem with a statistical interpretation. This results in an energy functional which we seek to minimize. We do not take the log transform of our observations, but instead pose our energy functional directly in the original multiplicative form. This leads to a cleaner formalism but imposes the need to do nonlinear estimation. We define an energy functional:

E(f, b) = λ_B ‖y_B − f‖² + λ_S ‖y_S − b ∘ f‖² + α ‖Lb‖² + γ ‖Df‖_p^p   (4)

and choose our optimal estimates f̂ and b̂ as the vectors that minimize E(f, b):

(f̂, b̂) = arg min_{f,b} E(f, b) .   (5)

λ_B, λ_S, γ, and α are positive weights. ‖·‖_p represents the ℓp norm, and ‖·‖ represents the ℓ2 norm. We design L and D to approximate derivative operators (generally either gradient or Laplacian operators) as finite differences. The ℓ2 norms for our data fidelity terms (the first two terms) in (4) imply a Gaussian noise assumption if the problem is formulated as a maximum a posteriori (MAP) estimation problem. From this perspective, we see that the scalar weights λ_B and λ_S should be proportional to the inverse noise variances for each observation image. We use Tikhonov-type regularization to make our intrinsic image and bias field estimates conform to our prior knowledge of the signals [10]. Specifically, we ensure that our bias field estimate is smooth and our intrinsic image estimate is piecewise constant. The regularization on f̂ is similar to putting an anisotropic edge-preserving filter into our method. It is well known that ℓ2 norms tend to overpenalize large derivative values associated with edges. Hence, using ℓ2 regularization in image reconstruction tends to oversmooth edges, and ℓp norms with p < 2 are said to be edge preserving.
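A direct transcription of Eq. (4) for 1D signals is sketched below, with forward first differences standing in for L and D; the results later use a Laplacian for L, so the operators here are illustrative choices, not the authors' exact stencils.

```python
import numpy as np

def energy(f, b, y_B, y_S, lam_B, lam_S, alpha, gamma, p=1.0):
    D = np.diff                                 # first-difference operator (assumption)
    data_B = lam_B * np.sum((y_B - f) ** 2)     # BC fidelity term
    data_S = lam_S * np.sum((y_S - b * f) ** 2) # SC fidelity term
    reg_b = alpha * np.sum(D(b) ** 2)           # smooth bias field (l2)
    reg_f = gamma * np.sum(np.abs(D(f)) ** p)   # edge-preserving lp prior on f
    return data_B + data_S + reg_b + reg_f
```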
2.3 Extension to Multiple SCs
Multiple SC images can be simultaneously captured using carefully crafted coil arrays without requiring additional image acquisition time [18]. Multiple coils are used due to the typically sharp drop-off in sensitivity far away from SCs. By distributing the coils spatially, we achieve better signal coverage. One way to process multiple SC images is to combine them into one composite SC image
using a method such as Roemer's sum-of-squares technique [18] and then use our formulation in (4). However, there are advantages to processing the SC measurements individually. We introduce a new measurement model where we receive one BC image and K SC images:

y_B = f* + n_B   (6)
y_S,k = b*_k ∘ f* + n_S,k   (1 ≤ k ≤ K) .   (7)

We can extend (4) to handle this more general case:

E = λ_B ‖y_B − f‖² + Σ_{k=1}^{K} λ_S,k ‖y_S,k − b_k ∘ f‖² + Σ_{k=1}^{K} α_k ‖L_k b_k‖² + γ ‖Df‖_p^p .   (8)

We obtain superior results minimizing (8) because we can optimally combine the SC observations by waiting until we have each b̂_k. Additionally, with the composite SC image, α and L are determined by the least homogeneous coil response. Processing the SC images individually allows us to choose α_k and L_k to individually tune the regularization for each coil.
2.4 Extension to Multiple Pulse Sequences
Multiple scans of the same location using different pulse sequences (e.g., T1-weighted and T2-weighted) are commonly acquired. The bias fields in all of the SC images are nearly identical, so we can achieve satisfactory results using only one BC image. Our measurement model for this case again involves one BC image and K SC images, but this time each SC image has the same bias field but different intrinsic images:

y_B = f*_1 + n_B   (9)
y_S,k = b* ∘ f*_k + n_S,k   (1 ≤ k ≤ K) .   (10)

Without loss of generality, we have assigned f*_1 to correspond to the intrinsic image in the BC image. We can again generalize (4) to handle this case:

E = λ_B ‖y_B − f_1‖² + Σ_{k=1}^{K} λ_S,k ‖y_S,k − b ∘ f_k‖² + α ‖Lb‖² + Σ_{k=1}^{K} γ_k ‖D_k f_k‖_p^p .   (11)

Additionally, more complex permutations beyond the two extensions we have presented can also be handled (e.g., M pulse sequences captured with N SCs).
3 Solution of the Optimization Problem
This section details the solution to the optimization problem defined in Sec. 2. We will only describe the solution to (4). Extensions for (8) and (11) as well as 3D volumes are straightforward. A closed-form solution for (4) does not exist, and
gradient descent on the full energy functional is slow and cumbersome. Therefore, we minimize (4) using coordinate descent. This is an iterative technique that alternately minimizes the energy over f and b, resulting in estimates f̂(i) and b̂(i) at each iteration i. Coordinate descent is useful in problems where computing solutions over all of the variables is difficult, but computing solutions over a subset is relatively easy. At each iteration, we refer to the computation of f̂(i) and b̂(i) as an f-step and a b-step, respectively. A stationary point obtained through coordinate descent is also a stationary point of the overall minimization problem: in order for coordinate descent to terminate, the derivative for each coordinate must be zero, and thus the gradient of the complete energy functional is zero.
3.1 Bias Field Solution
For a given f, (4) is quadratic in terms of b. Thus setting the gradient of E with respect to b equal to zero results in a simple linear equation:

(λ_S F² + α L_bᵀ L_b) b̂(i) = λ_S F y_S .   (12)

Although we could solve (12) by direct matrix inversion, we note that (λ_S F² + α L_bᵀ L_b) ≥ 0, so the subproblem is convex. Hence we can use an iterative algorithm such as preconditioned conjugate gradient [3] to efficiently compute solutions. We use as a preconditioner the tridiagonal matrix composed of the main diagonal and the adjacent subdiagonals of (λ_S F² + α L_bᵀ L_b), in order to make our preconditioners easy to construct and apply.
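A sketch of this b-step with SciPy, assuming L is supplied as a sparse derivative operator: assemble the normal matrix, factor its tridiagonal part as the preconditioner, and run conjugate gradient. Names and the sparse-matrix choices are illustrative.

```python
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def b_step(f, y_S, lam_S, alpha, L):
    """Solve (lam_S F^2 + alpha L^T L) b = lam_S F y_S, Eq. (12)."""
    F = sp.diags(f)
    A = (lam_S * F @ F + alpha * L.T @ L).tocsr()
    rhs = lam_S * (f * y_S)                           # F y_S, computed elementwise
    tri = sp.diags([A.diagonal(-1), A.diagonal(), A.diagonal(1)], [-1, 0, 1])
    solve_tri = spla.factorized(tri.tocsc())          # tridiagonal preconditioner
    M = spla.LinearOperator(A.shape, matvec=solve_tri)
    b_hat, info = spla.cg(A, rhs, M=M)
    return b_hat
```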
3.2 Intrinsic Image Solution

No Regularization on f̂. To provide some insight, we examine the minimization of (4) for a given b and γ = 0. We take the gradient of E with respect to f and set it equal to zero to obtain a pointwise solution at each pixel index n:

f̂(i)[n] = (λ_B y_B[n] + λ_S b[n] y_S[n]) / (λ_B + λ_S b²[n]) .   (13)
Because λ_B and λ_S are related to the inverse noise variances, f̂(i)[n] is the noise-weighted convex combination of y_B[n] and y_S[n]/b[n] with a spatially varying weighting factor. In contrast, Brey and Narayana [5] only use the data from y_S to construct f̂. This works well when b[n] ≫ 1, but in regions where the SC response is weak, using both observation images can be advantageous.
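This closed-form case is a one-line update; the sketch below (with illustrative names) implements Eq. (13) with NumPy broadcasting over all pixels at once.

```python
def f_step_unregularized(b, y_B, y_S, lam_B, lam_S):
    """Pointwise noise-weighted combination of y_B and the debiased y_S, Eq. (13)."""
    return (lam_B * y_B + lam_S * b * y_S) / (lam_B + lam_S * b ** 2)
```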
Half-Quadratic Solution. We now describe the f-step for a general γ. When p ≠ 2, the optimization problem for f with a given b is non-quadratic, and we obtain a nonlinear condition for the minimum. The ℓp norm for p ≤ 1 is non-differentiable at zero, so we use a smoothed approximation:

‖x‖_p^p ≈ Σ_n (x²[n] + ξ)^(p/2) .   (14)

As ξ → 0, the approximation approaches the unsmoothed norm.
Half-quadratic optimization is a fixed-point iterative scheme pioneered by Geman and Reynolds [10] that constructs a weighted-ℓ2 approximation at each sub-iteration j. It has been demonstrated [20] that half-quadratic optimization provides superior convergence rates compared with gradient descent. Using half-quadratic optimization results in a linear condition on f̂(i,j):

(λ_B I + λ_S B² + γ Dᵀ W(i,j) D) f̂(i,j) = λ_B y_B + λ_S B y_S   (15)

with the weighting matrix W(i,j) being diagonal with the following entries:

W(i,j)[n, n] = (p/2) (((D f̂(i,j−1))[n])² + ξ)^(p/2−1) .   (16)

This preserves edges by weighting the ℓ2 norm less in regions with large derivatives. Equation (15) is a positive definite linear system which we can again solve using preconditioned conjugate gradient. One of the key features of (15) is that the effective amount of regularization is spatially varying: less smoothing is performed in regions where B is large. This is superior to applying an anisotropic post-processing filter to the output of our algorithm. Depending on the regularization strength, post-processing will either oversmooth in high SNR regions or undersmooth in low SNR regions.
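The resulting f-step is an iteratively reweighted least-squares loop; a sketch under the same assumptions as the b-step (sparse D, plain CG in place of the preconditioned CG described above) follows.

```python
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def f_step(b, y_B, y_S, lam_B, lam_S, gamma, D, p=1.0, xi=1e-6, n_inner=10):
    n = y_B.size
    B2 = sp.diags(b ** 2)
    rhs = lam_B * y_B + lam_S * b * y_S               # right-hand side of Eq. (15)
    f = y_B.copy()                                    # simple initialization (assumption)
    for _ in range(n_inner):
        w = 0.5 * p * ((D @ f) ** 2 + xi) ** (0.5 * p - 1.0)   # weights, Eq. (16)
        A = lam_B * sp.eye(n) + lam_S * B2 + gamma * D.T @ sp.diags(w) @ D
        f, info = spla.cg(A.tocsr(), rhs, x0=f)       # half-quadratic sub-iteration
    return f
```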
3.3 Convergence and Speed
The energy functional E in (4) is non-convex due to the cross-multiplication between b and f. Our algorithm possesses convergence qualities similar to the EM algorithm [9]. Each f- and b-step decreases the energy, so our algorithm will at least find a local minimum of E. In practice, we have found excellent convergence properties, with the algorithm converging to identical, reasonable solutions from random initializations. Multigrid techniques [6] can help avoid local minima and improve computation speed for large problems. We use a basic form of multigrid with a single coarse-to-fine sweep. We downsample our data to the coarsest level we wish to process. We then run our coordinate descent solver at this level and upsample the results to the next finer level. This cycle repeats until we have a solution at the original scale. The key advantage of multigrid is that the low-frequency components of the solution can be computed more efficiently at the coarser scales.
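The coarse-to-fine sweep can be organized as below; `coordinate_descent`, `downsample`, and `upsample` are assumed helper functions (not specified in the text), and the per-level parameter rescaling described in Sec. 3.4 is omitted.

```python
def multigrid_solve(y_B, y_S, n_levels, coordinate_descent, downsample, upsample):
    pyramid = [(y_B, y_S)]
    for _ in range(n_levels - 1):
        pyramid.append(tuple(downsample(img) for img in pyramid[-1]))
    f = b = None
    for yB_s, yS_s in reversed(pyramid):      # solve at the coarsest level first
        if f is not None:
            f, b = upsample(f), upsample(b)   # warm-start the next finer level
        f, b = coordinate_descent(yB_s, yS_s, f_init=f, b_init=b)
    return f, b
```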
3.4 Parameter Selection and Initialization
There are a number of parameters that need to be set in our energy functional: λ_B, λ_S, α, γ, and p. We generally use p = 1 because it is the smallest value of p that allows the f-step to remain convex. Ideally, we would specify α and γ based on training data (e.g., phantom scans of the SC profiles, long acquisition-time BC images). In practice, we choose the parameters based on subjective visual assessment of the results. Because we use an iterative solver, we must specify initial values for both f̂(i) and b̂(i). The convergence speed of our solver can be
Fig. 1. Synthetic axial T1-weighted brain images. (a) True image (f*). (b) BC image (y_B). Estimated intrinsic image (f̂) computed with (c) Brey-Narayana and (d) proposed method using γ = 0.014. (e)–(h) SC images (y_S,1–y_S,4). (i)–(l) Estimated bias fields (b̂_1–b̂_4). α_k = 2000. Convergence in 63 sec.
greatly impacted by these choices. We use the bias correction method of Brey and Narayana to produce simple and effective initializations. We stated that λ_B and λ_S should be related to the inverse noise variances of y_B and y_S respectively. We can estimate the noise variances directly from the images using the method from Nowak [16]. The true signal should be uniformly zero in air-filled regions, so the second moment of y_B in these regions should then equal 2σ_B². We can approximate the expected value by taking the sample average over a large air-filled region to obtain σ_B². Note that the bias field has no effect in air-filled regions, so we can perform this same technique for y_S. When using a multigrid solver, we fix λ_B, λ_S, α, and γ at the original scale. We must also choose parameters at each scale s so that the solutions at the coarser and original scales are similar. The λ's should scale by 4^s (or 8^s in 3D) due to noise reduction from spatial averaging. For wavelet-based reconstruction, others have found that multiplicative scaling of the regularization parameters is effective [4]. Hence we multiply α and γ at each scale s by experimentally determined scalars k₁^s and k₂^s respectively.
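The background-based noise estimate amounts to one sample average; in the sketch below the air mask is assumed to be given (e.g., from thresholding), and the weight is the resulting inverse variance.

```python
import numpy as np

def estimate_lambda(image, air_mask):
    """In air the magnitude signal is noise only, so E[y^2] = 2 sigma^2."""
    sigma_sq = 0.5 * np.mean(image[air_mask] ** 2)
    return 1.0 / sigma_sq     # lambda proportional to the inverse noise variance

# lam_B = estimate_lambda(y_B, air_mask); lam_S analogously from y_S.
```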
4 Results
In this section, we demonstrate results on real and synthetic data. All real data in this section were captured on General Electric Signa 1.5-T machines. We computed results on a Pentium 4 1.8 GHz workstation using our multigrid solver and stopped the algorithm when the energy changed by less than 0.01%. Convergence times are indicated in the figure captions. For all results, we use Laplacian
Fig. 2. Performance provided by Brey-Narayana and proposed correction method with varying SNR levels. (a) SNR gain over the BC image (y_B). (b) Total GM and WM segmentation errors. Averaged over 10 Monte Carlo trials.

Table 1. Quantitative comparisons using the MNI brain phantom. Corrected images are generated using Brey-Narayana and the proposed method with γ = 0 and γ = 0.014. The first two rows are the mean squared error and mean absolute error (based on the true image f*) computed only in tissue regions. The last two rows are the percentage of misclassified points in GM and WM regions. Results averaged over 20 random trials.

              y_B      B-N     f̂, γ = 0   f̂, γ = 0.014
MSE (tissue)  196,542  20,820  20,428     10,901
MAE (tissue)  353.91   113.59  112.64     81.99
GM errors     64.4%    14.6%   14.3%      9.9%
WM errors     24.7%    3.6%    3.3%       2.4%
regularization on b̂ and gradient regularization on f̂. The numerical values of γ presented in this section are not very informative because of scaling variations in the examples. The α values are a measure of relative smoothness, because the bias field is unchanged if y_B and y_S are both scaled equally. We begin with synthetic results using the Montreal Neurological Institute (MNI) [7,13] BrainWeb simulator. We used the T1-weighted images with 1 mm slice thickness and constructed synthetic bias fields that simulate a four-coil phased array. We then added Rician noise to obtain our BC and SC images. For tissue regions of y_B, the noise resulted in an SNR of 13 dB and a bias of 2-3%. Estimates were computed within our multiple SC framework by minimizing (8). We present the observation and corrected images in Fig. 1. The bias field estimates are largely independent of the tissue, and our method produces an f̂ with noticeably better noise properties than Brey-Narayana. These visual impressions are confirmed by our quantitative results in Table 1, with mean
Fig. 3. Gated cardiac MR images. (a) BC image (y_B). Estimates of the intrinsic image (f̂) using (b) Brey-Narayana and proposed method with (c) γ = 0 and (d) γ = 1800. (e)–(h) SC images (y_S,1–y_S,4). (i)–(l) Estimated bias fields (b̂_1–b̂_4). α_k = 3000. Convergence in 71 sec.
squared error 48% lower than Brey-Narayana. Segmentation accuracy is another way to quantify the quality of the bias correction. We generated gray matter (GM) and white matter (WM) segmentation results using a thresholding scheme with manual skull peeling on f* and the corrected images. Compared with Brey-Narayana, we reduce the overall segmentation error by 33%. In Fig. 2, we show how the different bias correction schemes perform as the SNR is varied. Our method with γ = 0 consistently outperforms Brey-Narayana due to better bias field estimates. In high SNR regions, all methods provide similar results. As the SNR is decreased, our method with regularization on f̂ builds up a significant advantage over the other methods. At 0 dB, Brey-Narayana and our method with γ = 0 produce segmentation error rates of 51% (approximately equivalent to random guessing), while using regularization on f̂ reduces the error to 27%. Next, we apply our algorithm to one time step from a gated cardiac MR sequence in Fig. 3. For the SC images, a four-element phased array was used. The images have a field of view (FOV) of 32 cm × 32 cm, resolution of 160 × 192, and slice thickness of 8 mm. To obtain our results, we applied our multiple SC correction framework and minimized (8). The main differences between the Brey-Narayana estimate in Fig. 3(b) and our result in Fig. 3(c) are in regions where none of the SCs has a good response, such as the middle and the right-hand side of the image. This is because our method uses the BC information while Brey-Narayana does not. Fig. 3(d) (using f̂ regularization) is moderately better than Fig. 3(c), but the high SNR in y_B and y_S,k limits the benefits of filtering. In Fig. 4, we display the results of our algorithm on a real prostate image. The SCs used were an endorectal coil along with a four-element pelvic phased-array coil. We captured T2-weighted images using the BC and SCs and T1-weighted
Fig. 4. Prostate images. T2-weighted (a) BC (y_B) and (b) composite SC (y_S,1) images. (c) Composite SC T1-weighted image (y_S,2). (d) Estimated bias field (b̂). T2-weighted intrinsic image estimates (f̂_1) using (e) Brey-Narayana and (f) proposed method with γ_1 = 0.018. T1-weighted intrinsic image estimates (f̂_2) using (g) Brey-Narayana and (h) proposed method with γ_2 = 0.010. α = 125. Convergence in 24 sec.
images using just the SCs. The T1-weighted images do not show the internal structure of the prostate but are useful in finding the borders of the gland; the T2-weighted images are useful for differentiating regions of the prostate as well as for tumor detection. The FOV is 12 cm × 12 cm, resolution is 256 × 256, and slice thickness is 3 mm. Estimates were computed by minimizing (11) using composite SC images, because individual SC data were not available to us. The prostate is the most challenging example we consider here. The FOV is small, so y_B has very low SNR (about 7 dB in the prostate). To compensate, the endorectal coil produces a strong local response profile, which results in a severe intensity artifact. Because the reception profile of the endorectal coil is much less homogeneous than that of the pelvic phased-array coil, the prostate would probably benefit significantly from processing each coil separately. Fig. 4(d) shows that using a composite surface coil image causes b̂ to be under-regularized in regions where the endorectal coil does not dominate. Figs. 4(e)–(h) (when viewed at sufficiently high resolution) demonstrate that our method preserves edges while resulting in lower noise than Brey-Narayana. Fig. 4(h) shows that even without a BC image, we can obtain reasonable intrinsic image estimates for the T1-weighted sequence. We show axial brain images in Fig. 5. The SCs were a four-element phased array. We captured gradient-recalled echo (GRE) images using both the BC and SCs and fluid-attenuated inversion recovery (FLAIR) images using the SCs. FOV is 24 cm × 24 cm,
Fig. 5. Axial brain images. (a) GRE BC image (y_B). (b) GRE SC images (y_S,1–y_S,4). (c) FLAIR SC images (y_S,5–y_S,8). (d) Estimated bias fields (b̂_1–b̂_4). Estimated GRE intrinsic images (f̂_1) using (e) Brey-Narayana and (f) proposed method with γ_1 = 1000. Estimated FLAIR intrinsic images (f̂_2) using (g) Brey-Narayana and (h) proposed method with γ_2 = 1200. α_k = 1000. Convergence in 103 sec.
resolution is 192 × 256, and slice thickness is 3 mm. We minimize a hybrid of (8) and (11) to obtain our results. All of the SCs are weak in the middle of the brain, so our final estimate of the FLAIR image in Fig. 5(h) is still noisy in the middle, even with the ℓp reconstruction. This artifact is not present in our GRE estimate in Fig. 5(f) because the BC image ensures a minimum SNR level.

Acknowledgements. The authors would like to thank W. Kyriakos and S. Thibodeau at BWH for their help in acquiring data. Research was supported in part by a NDSEG graduate fellowship, ONR grant N00014-00-10089, NSF Johns Hopkins ERC 8810274, and NIH grants P41 RR13218, P01 CA67165, R33 CA99015, and R01 AG19513.
References

1. L. Axel. Surface coil magnetic resonance imaging. J. of Comp. Asst. Tomography, 8:381–384, 1984.
2. L. Axel, J. Costantini, and J. Listerud. Intensity correction in surface-coil MR imaging. Am. J. Roentgenology, 148:418–420, 1987.
3. D. P. Bertsekas. Nonlinear Programming. Athena Scientific, 2nd edition, 1999.
4. M. Bhatia, W. C. Karl, and A. S. Willsky. A wavelet-based method for multiscale tomographic reconstruction. IEEE Trans. Med. Imag., 15(1):92–101, 1996.
5. W. W. Brey and P. A. Narayana. Correction for intensity falloff in surface coil magnetic resonance imaging. Med. Phys., 15(2):241–245, 1988.
6. W. L. Briggs. A Multigrid Tutorial. Society for Industrial and Applied Mathematics, Philadelphia, 1987.
7. D. L. Collins, A. P. Zijdenbos, V. Kollokian, J. G. Sled, N. J. Kabani, C. J. Holmes, and A. C. Evans. Design and construction of a realistic digital brain phantom. IEEE Trans. Med. Imag., 17(3):463–468, June 1998.
8. B. M. Dawant, A. P. Zijdenbos, and R. A. Margolin. Correction of intensity variations in MR images for computer-aided tissue classification. IEEE Trans. Med. Imag., 12:770–781, 1993.
9. A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc., 39:1–38, 1977.
10. D. Geman and G. Reynolds. Constrained restoration and the recovery of discontinuities. IEEE Trans. Patt. Anal. Mach. Intell., 14(3):367–383, 1992.
11. J. Haselgrove and M. Prammer. An algorithm for compensation of surface-coil images for sensitivity of the surface coil. Mag. Res. Imag., 4:469–472, 1986.
12. R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 1985.
13. R. K.-S. Kwan, A. C. Evans, and G. B. Pike. MRI simulation-based evaluation of image-processing and classification methods. IEEE Trans. Med. Imag., 18(11):1085–1097, Nov. 1999.
14. S.-H. Lai and M. Fang. Intensity inhomogeneity correction for surface-coil MR images by image fusion. In Proceedings of the International Conference on Multisource-Multisensor Information Fusion, pages 880–887. CSREA Press, 1998.
15. B. Likar, M. A. Viergever, and F. Pernus. Retrospective correction of MR intensity inhomogeneity by information minimization. In MICCAI 2000, pages 375–384, 2000.
16. R. D. Nowak. Wavelet-based Rician noise removal for magnetic resonance imaging. IEEE Trans. Imag. Proc., 8(10):1408–1419, 1999.
17. K. P. Pruessmann, M. Weiger, M. B. Scheidegger, and P. Boesiger. SENSE: Sensitivity encoding for fast MRI. Mag. Res. Med., 42:952–962, 1999.
18. P. B. Roemer, W. A. Edelstein, C. E. Hayes, S. P. Souza, and O. M. Mueller. The NMR phased array. Mag. Res. Med., 16:192–225, 1990.
19. J. G. Sled, A. P. Zijdenbos, and A. C. Evans. A nonparametric method for automatic correction of intensity nonuniformity in MRI data. IEEE Trans. Med. Imag., 17:87–97, 1998.
20. C. R. Vogel and M. E. Oman. Iterative methods for total variation denoising. SIAM J. Sci. Comp., 17(1):227–238, 1996.
21. W. M. Wells, W. E. L. Grimson, R. Kikinis, and F. Jolesz. Adaptive segmentation of MRI data. IEEE Trans. Med. Imag., 15(4):429–442, 1996.
22. Y. Zhang, M. Brady, and S. Smith. Segmentation of brain MR images through a hidden Markov random field model and the Expectation-Maximization algorithm. IEEE Trans. Med. Imag., 20(1):45–57, 2001.
Object-Based Strategy for Morphometry of the Cerebral Cortex

J.-F. Mangin¹,²,⁴, D. Rivière¹,⁴, A. Cachia¹,²,³,⁴, D. Papadopoulos-Orfanos¹,⁴, D.L. Collins⁵, A.C. Evans⁵, and J. Régis⁶

1 Service Hospitalier Frédéric Joliot, CEA, 91401 Orsay, France
[email protected], http://anatomist.info
2 INSERM ERITM Psychiatrie et Imagerie, Orsay, France
3 Département Traitement du Signal et des Images, CNRS URA 820, ENST, Paris
4 Institut Fédératif de Recherche 49 (Imagerie Neurofonctionnelle), Paris
5 Montreal Neurological Institute, McGill University, Montreal
6 Service de Neurochirurgie Fonctionnelle et Stéréotaxique, CHU La Timone, Marseille

Abstract. Most of the approaches dedicated to automatic morphometry rely on a point-by-point strategy based on warping each brain towards a reference coordinate system. In this paper, we describe an alternative object-based strategy dedicated to the cortex. This strategy relies on an artificial neuroanatomist performing automatic recognition of the main cortical sulci and parcellation of the cortical surface into gyral patches. A set of shape descriptors, which can be compared across subjects, is then attached to the sulcus- and gyrus-related objects segmented by this process. The framework is used to perform a study of 142 brains of the ICBM database. This study reveals some correlates of handedness on the size of the sulci located in motor areas, which seem to be beyond the scope of standard voxel-based morphometry.
1 Introduction
Advances in neuroimaging have led to an increasing recognition that certain neuroanatomical structures may be preferentially modified by particular cognitive skills or diseases. For cognitive studies, this point of view relies on the supposition that specialized or preferred behaviour is associated with a commensurately greater allocation of neural circuitry in the corresponding brain centers. For neurodegenerative disorders, the differential patterns of atrophy are supposed to reflect the clinical phenomenology [3]. This recognition has mainly resulted from the recent design of automated morphometric methods, which have enabled large-scale population studies [38,4]. Therefore, brain morphometry is now one of the basic brain mapping tools, at the same level as the various functional imaging modalities. For most of the approaches, the automatic analysis relies on warping each brain towards a reference coordinate system, which plays the same role as the latitude and longitude system for localization of points on the Earth's surface [35,18,19,43,17,26]. The coordinate system is three-dimensional for the comparison of the local densities of grey and white matter (voxel-based morphometry, VBM [4]), or two-dimensional (spherical) for the comparison of cortical thickness [16,26]. Each new brain is endowed with one of these coordinate systems through spatial normalization, namely a deformation matching as far as possible the new brain's macroscopic anatomy as revealed by
magnetic resonance imaging (MRI) with a template anatomy [13,19,17]. The simplest approaches rely on affine transformations only, while modern registration techniques can now provide complex warpings relying on a large number of degrees of freedom, which are supposed to improve the normalization. In the following, we will call this kind of processing "iconic spatial normalization". The iconic spatial normalization paradigm, originally introduced to overcome the poor statistics of positron emission tomography (PET) data [18], has made a tremendous impact on brain mapping strategies [28]. The coordinate-based approach, indeed, is very versatile, since any datasets can be compared simply on a voxel-by-voxel basis. A disturbing fact, however, is that a number of different normalization algorithms are used throughout the world, each one potentially leading to different normalization results [22,14]. Even the SPM software proposes many alternatives related to the size of the warping function basis or to the choice of the template [19]. This observation means that what is called spatial normalization is far from clear, simply because nobody really knows the gold standard of brain matching. Furthermore, nobody knows today to what extent matching two different brains with a continuous deformation makes sense from a neuroscience point of view. The part of the brain leading to the main difficulties is the cortex, because the large variability of the folding patterns prevents the warping from attempting a perfect gyral matching across subjects [29,6]. Therefore, it seems rather difficult to perform reliable coordinate-based morphometric studies without either spatially blurring the data [4] or involving hundreds of subjects [44,21]. A number of teams try to overcome current difficulties via more sophisticated iconic normalization procedures [37]. In our opinion, without a better understanding of the inter-individual variability of the cortical folding patterns, the risk is a drift toward pure morphing techniques without consistent architectural justification. Spatial normalization, indeed, should try to match as far as possible the architectural parcellations of the cortical sheet [7]. Unfortunately, while some major sulci are usually considered as good indicators of architectonic or functional transitions, few people postulate that this property can be extrapolated to all cortical folds [45,30]. Anyway, the approaches imposing some sulcus-based constraints in the warping procedures [36,12,10] seem more reasonable than blind morphing procedures driven only by image grey levels, even if some progress has to be made with regard to the automatic identification and the choice of the sulci to be matched. In this paper, we propose an alternative to the coordinate-based point of view for the morphometry of the cortex. This alternative relies on a pattern recognition system performing first automatic detection of the sulci [33], and then parcellation of the cortical surface into gyri [9]. The definition of various elementary objects related to the cortical folding patterns allows the computation of a wide set of shape-based features that can be compared across subjects. Hence, this approach can be viewed as an automated version of a rather intuitive Region of Interest (ROI) based strategy. It should be noted that this ROI-based strategy is data-driven. Therefore, the ROIs actually fit individual anatomy.
In contrast, an ROI-based strategy which warps a segmentation of the template brain suffers from the weakness of iconic normalization with regard to sulco-gyral patterns. A first key point of the ROI-based strategy is that combining measurements gathered over a subset of voxels increases the statistical power. The combination of measurements can simply rely on some averaging process, for instance through the computation of the mean thickness in a surface patch, but the ROI definition also leads to the emergence of new morphometric opportunities provided by various ROI-shape features. For instance, the depth of a given sulcus may give some clues about the development of the surrounding functional areas, because of the tensegrity principle: the idea that the folding reaches its final pattern via stabilization of the sum of tensions and compressions stemming from the different parts of the cortex (axon bundles, cortical mantle, etc.) [30,42]. Hence, a second key point of the object-based strategy is the possibility to compare the various instances of the same anatomical entity without requiring a point-to-point warping, which may not exist. We do not claim that our approach provides miraculous solutions to the difficulties induced by the variability of the cortical folding patterns, but only a new window for comparing cortical shapes. The relevance of a sulcus-based parcellation system is supposed to stem from the links with the cortical architectony mentioned above. The first section of the paper describes the sulcus recognition system, and the second section focuses on the cortical parcellation into gyri. The last section describes a new meaningful neuroscience result obtained with our framework: the automatic comparison of the size of the sulci of 142 brains of the ICBM project reveals some correlates of handedness in the motor and premotor cortex.
2 An Artificial Neuroanatomist

The computer vision system in charge of the recognition of the cerebral sulci relies on a bottom-up strategy. This strategy aims mainly at using a scale of representation dedicated to the shape of the cortex. The transition towards a higher level stems from the conversion of each raw MR image into a structural representation supposed to embed all the information required for sulcus recognition. While general-purpose computer vision approaches build such representations from generic features like edges or corners, our approach relies on the building blocks of the cortical shape, namely the elementary folds [27]. Other similar methods have been proposed to break up the cortex into component parts related to the folds [23,41,34,11,49,25,31]. A rapid sketch of our conversion process is described in Fig. 1. The underlying set of image processing tools and a dedicated visualization platform can be downloaded from http://anatomist.info. The framework has been applied successfully to more than 500 brains stemming from various scanners. After the conversion process, the patterns of the cortex are represented by a relational graph, namely a set of elementary folds linked together according to their topographical organization relative to junctions and proximity on the cortical surface. Various attributes are attached to the nodes and to the links of this graph in order to keep a sketch of the fold shapes. These attributes are computed after affine spatial normalization towards the standard proportional system [13]. This normalization can either be applied to the MR scan before the image processing step, or remain virtual, namely applied only to coordinates in the native scan. Some of these attributes are:
fold: size; maximal depth; center of mass; average normal; length, extremities, and average direction of the intersection with the brain hull, etc.
Fig. 1. Computation of a structural representation of the cortical folds from a raw T1-weighted MR image [27,33]: A. raw MR slice; B. brain hemisphere segmentation; C. right hemisphere cortex external surface; D. right hemisphere cortex inner surface (interface between grey and white matter); E. skeleton of the (cortex + cerebrospinal fluid) object; F. segmentation of the skeleton using discrete topology and recognition of the main sulci; G. an example of the attributed subgraph representing a sulcus. Each node SS is a piece of the surface skeleton, while $S_{brain}$ represents the brain hull. Three kinds of relations are used: topological junction $\rho_T$, neighborhood geodesic to the brain hull $\rho_C$, and split induced by a buried gyrus $\rho_B$. Semantic attributes like size, length, depth, etc. are added to nodes and relations for recognition purposes.
link: direction between the centers of mass of the linked folds; length and average direction of the junction, etc.

At this stage, the pattern recognition problem amounts to giving a name (or a label) from the sulcus nomenclature to each of these elementary folds; a sketch of such an attributed graph structure is given below.
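To make this representation concrete, here is a minimal Python sketch of one possible encoding of the attributed relational graph. The class names and attribute choices are our own illustration, not the data structures of the actual system.

```python
# Hypothetical sketch of the attributed relational graph described above.
# Class and attribute names are illustrative only.
from dataclasses import dataclass, field

@dataclass
class Fold:                      # node: one elementary cortical fold
    size: float                  # surface area of the fold
    max_depth: float             # maximal geodesic depth
    center_of_mass: tuple        # (x, y, z) in the proportional system
    normal: tuple                # average normal vector
    hull_length: float           # length of intersection with the brain hull
    label: str = "unknown"       # sulcus name, assigned by the recognition

@dataclass
class Relation:                  # edge: junction or proximity between folds
    kind: str                    # "junction", "hull-neighbor", or "buried-gyrus"
    direction: tuple             # direction between the folds' mass centers
    junction_length: float = 0.0

@dataclass
class SulcusGraph:
    folds: list = field(default_factory=list)
    relations: dict = field(default_factory=dict)  # (i, j) -> Relation
```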
Fig. 2. The matching of the model of the cortical sulci with any new brain is performed according to a learning strategy. Left: 12 brains of the learning database with manual labelling of some sulci. This database is used to train a congregation of 500 multilayer perceptrons [33]. Each perceptron is in charge of a local anatomical feature, like the shape of a sulcus or the shape of a pair of neighboring sulci. Automatic sulcus recognition is performed via stochastic minimization of the sum of the expert outputs, weighted by their reliability on a test database. Right: 50 brains not used for learning, which have been automatically labelled by our method and aligned with the proportional system for visualization purposes.
Our strategy for this purpose relies on supervised learning from a database of 26 manually labelled brains. The system then learns to generalize the knowledge embedded in this learning database across large variations of localization, orientation, and shape of the sulci. These variations are large enough to prevent reliable recognition using only localization in the proportional system [24]. But the main computational difficulty is the structural variability of the sulci across individuals. A sulcus corresponding to one very long elementary fold in one given brain may be made up of several small elementary folds in a second brain. Furthermore, the junctions between sulci are also highly variable, which leads to difficulties similar to those related to the splitting of handwritten words into characters. The structural variability of the sulci across brains can also be seen as analogous to one of the difficult problems with which human vision is confronted: the variability of the spatial relationships among elementary parts of an object across orientation changes. An approach to tackling this difficulty consists in extracting a view-invariant structural description of the object that is then matched to stored object descriptions for recognition purposes [5]. Unfortunately, standard anatomical knowledge does not include any brain-invariant generic model of the sulci that could be used to overcome the ambiguities when trying to label the elementary folds. Most of the psychological and physiological data, however, support a concurrent view-based human vision model, in which multiple views of each object are stored in memory [32]. We have designed our artificial neuroanatomist on the multiple-brain-based analogue of this multiple-view-based approach: the shape of each sulcus is learned from a set of examples. Therefore, the resulting system is trained to mimic the interpretation of our human neuroanatomist, rather than to provide a putative gold standard of sulcus recognition.
Another neuroanatomist, championing a concurrent interpretation of the folding patterns, could provide a new manual labelling of the database that would lead to a rival artificial system.

The above discussion focuses on the notion of sulcus, which corresponds to a kind of character of the cortical fold alphabet inferred by the neuroanatomists of the last century. The emergence of these "characters" stemmed from the need to reduce the huge complexity of the cortical folding patterns to intermediate features whose variability could be tackled by the human brain. This scale of representation may have been automatically selected to maximize the information delivered by the building blocks corresponding to a few examples of each sulcus [40]. Thanks to this choice, the neuroanatomists gained the capacity to generalize broadly to new brains. A small number of sulci, in fact, have a shape stable enough to allow straightforward recognition. Usually, the other sulci are identified using a kind of vote collecting agreements about the presence of the surrounding sulci. This is a type of grouping process leading the neuroanatomist to recognize the patterns made up by a set of neighboring sulci. Furthermore, the neuroanatomist can only observe a subset of the sulci at the same time, which led us to design a Markovian system [20] relying only on local and contextual knowledge: the shapes of the individual sulci and the patterns made up by pairs of neighboring sulci. This choice relies on the assumption that a set of canonical local interactions between pairs of sulci is sufficient to model the patterns made up by more than two neighboring sulci. Each piece of local anatomical knowledge is learned by a multilayer perceptron, which acts as a hyper-specialized anatomical expert. Two families of such experts are attached respectively to the sulcus shapes and to the neighborhood patterns. Each sulcus expert of the first family has a field of view, which is learned from the database and corresponds to a domain of the standard proportional system. This field of view is simply the bounding box of all the instances of the corresponding sulcus. During the labelling step corresponding to the recognition of the sulci of any new brain, the label corresponding to a given sulcus can only be given to the elementary folds included in the associated expert's field of view. The set of pairs of sulci handled by the second kind of perceptron is also inferred automatically from the learning database. Each pair of sulci whose instances in the database are sometimes linked leads to the creation of a local expert dedicated to the resulting pattern. The contextual information driving the sulcus recognition stems from the sulcus neighborhood built by this second family of experts. The fields of view and this neighborhood endow the congregation of experts with a "corticotopic" organization, which may be related to the retinotopic or somatotopic organizations found in the cortex. During the recognition step, each expert is in charge of giving an evaluation for a small part of the whole labelling. This evaluation ranges from zero (good) to one (bad). Sulcus experts deal with a subgraph of folds defined by one label, while pair experts deal with a subgraph of folds defined by two labels. The expert's evaluation is a measure of the likelihood of the shape made up by the folds of this subgraph, considering the a priori knowledge embedded in the learning database.
In order to teach this knowledge to the perceptrons, each subgraph is compressed into a simple code made up of a fixed set of synthesized attributes. These attributes can be viewed as descriptors of the subgraph. Some are more syntactic, like the number of connected components; others are semantic, like the total size or the maximal depth of the folds included in the subgraph.
These synthesized semantic attributes are computed from the attributes attached to the elementary folds. Each perceptron is trained to give a good evaluation to the examples of the database, and a bad evaluation to random modifications of these examples. Each expert's reliability is assessed on a second learning base in order to weight the output of the expert before using it as a potential of the global Markov field [33]. Finally, the automatic labelling of the folds of any new brain is driven by stochastic minimization of a global function made up of the weighted sum of the perceptron outputs; a sketch of such a labelling loop is given below. The system is still at the beginning of its education. It has been trained from 26 manually labelled brains, including 10 brains used as a test base to prevent overlearning. A nomenclature of 58 sulcus names is used in each hemisphere. The automatic recognition results decrease from 85% accordance with the manual labelling on the learning base to 75% on a generalization base, which calls for increasing the size of the learning base. It should be noted, however, that these results do not mean 25% errors. Because of the large inter-individual variability of the folding patterns, indeed, no gold standard exists to evaluate the percentage of correct labellings. This accordance measure, moreover, is highly dependent on the sulcus. For instance, the generalization leads to only 3.8% disagreement for the central sulcus and 6% for the lateral fissure, while the disagreement may increase considerably for more variable sulci, which sometimes calls the manual identification into question. The training of the 500 multilayer perceptrons on this base of 26 brains lasts about 12 hours on a network of twenty recent Pentium processors. The stochastic minimization leading to the automatic labelling lasts one hour for one hemisphere with a 2 GHz processor and the default tuning of the temperature decrease. While the results are close to the manual ones, a lot of questions remain open in the most variable cortical areas. Therefore, a future direction of work will consist in trying to improve the manual labelling of the learning database thanks to a better understanding of the variability stemming from brain growth studies [10].
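As a rough illustration of this weighted-sum objective and its stochastic minimization, the following Python sketch implements a generic simulated-annealing label assignment. The expert functions, weights, and cooling schedule are placeholders, not the trained system described in [33].

```python
import math, random

def label_folds(folds, labels, experts, weights, t0=1.0, cooling=0.995, steps=20000):
    """Assign a sulcus label to each fold by minimizing the weighted sum of
    expert evaluations; each expert scores part of the labelling in [0, 1]."""
    state = {f: random.choice(labels) for f in folds}

    def energy(s):
        return sum(w * e(s) for e, w in zip(experts, weights))

    e_cur, t = energy(state), t0
    for _ in range(steps):
        f = random.choice(folds)                 # propose relabelling one fold
        old = state[f]
        state[f] = random.choice(labels)
        e_new = energy(state)
        # Metropolis acceptance: always accept improvements; accept
        # degradations with probability exp(-dE / t), which shrinks as
        # the temperature decreases.
        if e_new > e_cur and random.random() >= math.exp((e_cur - e_new) / t):
            state[f] = old                       # reject the move
        else:
            e_cur = e_new
        t *= cooling
    return state
```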
3 Parcellation of the Cortical Surface
Image analysis methods dedicated to the cortex almost always focus on cortical folds, because they can be defined simply using geometric properties. The usual neuroscience point of view about the segregation of the cortical surface, however, is gyrus-based. A gyrus, indeed, is usually considered to be a module of the cortex endowed with dense axonal connections through the local white matter [42]. Unfortunately, cortical gyri are relatively difficult to define from a purely geometrical point of view. Most of them, however, are assumed to be delimited by two parallel sulci. Therefore, our cortical surface parcellation is based on the previous sulcus recognition. The main problem complicating the morphological definition of the gyri is the interruption of the delimiting sulci. The idea developed in our framework overcomes this difficulty using the Voronoï diagram principle [9]. Such a diagram is a parcellation of space from a set of seeds. Each parcel is the influence zone of one of the seeds, namely the domain of space closer to this seed than to any other seed. If a set of lines approximately located at the level of the crowns of the gyri of interest can be provided as gyrus seeds, the whole gyral parcellation can be defined from a geodesic distance computed
Fig. 3. The definition of a gyrus from two parallel sulci using the Voronoï diagram principle. A] Two schematic parallel sulci. B] Definition of the Voronoï diagram of the sulcal lines, i.e. parcellation of the domain into the influence zones of the sulci. C] The boundary between the two influence zones provides the gyrus seed. D] The gyrus delimited by the two parallel sulci can be obtained as the influence zone of the previous boundary seed. The initial two sulcal lines must be removed from the domain to prevent the front propagation underlying the Voronoï diagram construction from crossing them (a gyrus should end at the bottom of the limiting sulci). The rest of the gyrus boundaries will be induced by competition with the other gyri.
conditionally on the cortical surface. Such seed lines can be extracted from the boundaries of a first Voronoï diagram computed using the projections of the sulcus bottoms on the cortical surface as seeds (see Fig. 3). In order to impose the sulcus bottoms as parts of the boundaries between the gyral influence zones, they are removed from the cortical surface during the computation of the second diagram, to prevent the distances from being propagated across these lines. Hence the resulting diagram is inferred from an iterative dilation of the gyrus seeds that is stopped either at the level of the sulcus bottoms, or when two zones of influence touch each other. A few results of parcellation are presented in Fig. 4. These results are qualitatively comparable to atlas descriptions, apart from the occipital lobe. The parcellation method assumes that a reliable identification of the main sulci can be performed first for any subject, which is far from being the case with the current pattern recognition system. We bet, however, that the current state of this system is sufficient to obtain interesting morphometry results if a large database of brains is processed, which can now be done without any user interaction.
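A geodesic Voronoï diagram of this kind can be sketched as a multi-source Dijkstra front propagation over the mesh edge graph. The sketch below is a minimal illustration under assumed inputs (an adjacency structure built from the cortical mesh); it is not the implementation of [9]. Removing the sulcal lines from the domain is modelled by blocking their vertices.

```python
import heapq

def geodesic_voronoi(adjacency, seeds, blocked=frozenset()):
    """adjacency: {vertex: [(neighbor, edge_length), ...]};
    seeds: {vertex: seed_id}; blocked: vertices the fronts may not cross
    (e.g. the projected sulcus bottoms). Returns {vertex: seed_id}."""
    dist, owner, heap = {}, {}, []
    for v, sid in seeds.items():
        dist[v], owner[v] = 0.0, sid
        heapq.heappush(heap, (0.0, v, sid))
    while heap:
        d, v, sid = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue                       # stale queue entry
        for w, length in adjacency[v]:
            if w in blocked:
                continue                   # fronts stop at sulcal lines
            nd = d + length
            if nd < dist.get(w, float("inf")):
                dist[w], owner[w] = nd, sid
                heapq.heappush(heap, (nd, w, sid))
    return owner
```

The two-pass construction of Fig. 3 then amounts to calling this function once with the sulcal projections as seeds, extracting the inter-zone boundaries as gyrus seeds, and calling it again with the sulcal vertices blocked.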
4 Correlates of Handedness
It is usually assumed that the loss of statistical power induced by the imperfect gyral matching of spatial normalization can be compensated by large population sizes. Two recent large-scale VBM studies with hundreds of subjects, however, have reported a surprising absence of results relative to the possible relationship between brain asymmetry and handedness [44,21], which may reveal some limits of the point-by-point paradigm. Since human handedness can be viewed as a model of proficient or preferred behaviors, several ROI-based morphometric studies have also addressed this issue for a few cortical structures. For instance, the central sulcus, which houses the primary motor cortex, was found to be deeper in the left hemisphere in right-handers, and vice versa in left-handers [47,2,1]. This result remains controversial, as other studies did not confirm this interaction [46] or found an inverse pattern [15].
Fig. 4. A typical parcellation obtained for the left hemisphere of four different brains. Each color corresponds to a different gyrus. The boundaries between gyri appear in white. For the external face of the brain, the set of pairs of sulci selected for this experiment aims at defining the three horizontal frontal gyri and the polar frontal face, the three horizontal temporal gyri, the pre- and postcentral vertical gyri corresponding to the motor (cyan) and somesthetic (green) areas, and the two parietal lobules. Some unsatisfying attempts have been made to parcellate the occipital lobe. The rightmost image shows a mix of six brains in the standard proportional system.
Fig. 5. (left) Automatic recognition of the sulci of 142 subjects. (right) Correlates of handedness on an asymmetry index corresponding to $2(L-R)/(L+R)$, where L and R denote the left and right sizes of a sulcus. For each sulcus, the Mann-Whitney U test was used to compare the asymmetry indices of the left-handed and right-handed groups. The resulting p-values were mapped on the sulci of one left hemisphere. Several of the detected sulci belong to the cortical area involved in motor control, which was the initial guess.
Methodological differences and age effects may explain these inconsistencies [39]. To investigate whether the new observation window provided by our framework could answer the kind of issues addressed by these manual ROI-based studies, 142 unselected normal volunteers of the ICBM database were processed without any manual interaction. These subjects correspond to the VBM study described in [44]. On a short handedness questionnaire, 14 subjects were dominant for left-hand use on a number of tasks; the remaining 128 subjects preferred to use their right hand.
The 142 T1-weighted brain volumes were stereotaxically transformed using nine parameters [4] to match the Montreal Neurological Institute 305 average template. A set of 58 cortical sulci was recognized in each hemisphere. The size of each sulcus was computed from its skeletonized representation. Then, left (L) and right (R) sizes were used to obtain a normalized asymmetry index, $2(L-R)/(L+R)$. For each sulcus, the Mann-Whitney U test was used to compare the asymmetry indices of the left-handed and right-handed groups. This test relies on rank order statistics, which are robust to potential outliers stemming from sulcus recognition errors. The resulting p-values were mapped on the sulci of one left hemisphere (see Fig. 5, right). Several significant differences were revealed by our analysis (p < 0.05, see Fig. 5). For each result, we report the p-value and the median asymmetry indices for the right-handed (RH) and left-handed (LH) groups. Most of the highlighted sulci show an asymmetry pattern that is left-right flipped between the two groups: parieto-occipital fissure (p=0.003, RH=-0.21, LH=0.06), inferior precentral sulcus (p=0.013, RH=0.09, LH=-0.34), intermediate precentral sulcus (p=0.019, RH=0.08, LH=0.24), anterior inferior temporal sulcus (p=0.024, RH=-0.22, LH=0.11), central sulcus (p=0.031, RH=0.04, LH=-0.05), while a few sulci present only an increased asymmetry from one group to the other: retrocentral branch of the lateral cerebral fissure (p=0.028, RH=0.26, LH=0.99), median frontal sulcus (p=0.034, RH=-0.32, LH=0), orbital sulcus (p=0.048, RH=0.03, LH=0.21). While it is too soon to discard the influence of various biases on these results, several of the detected sulci turn out to belong to the cortical area involved in motor control, which would have been the initial guess. Interestingly, the handedness correlates are weaker in the primary motor structures (central sulcus) than in the structures responsible for planning and coordinating movements (precentral sulcus). Moreover, the pattern of asymmetry obtained for the central sulcus matches the results obtained by the majority of the manual studies [47,2,1].
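The statistical test is standard and could be reproduced along the following lines; per-subject sulcus sizes are assumed to be available, and SciPy's Mann-Whitney U implementation is used. This is our illustration, not the authors' code.

```python
from scipy.stats import mannwhitneyu

def asymmetry_index(left_size, right_size):
    # Normalized asymmetry index 2(L - R) / (L + R), as in the text.
    return 2.0 * (left_size - right_size) / (left_size + right_size)

def handedness_test(sizes_lh, sizes_rh):
    """sizes_lh / sizes_rh: lists of (left, right) sizes of one sulcus for
    the left-handed and right-handed groups. Returns the two-sided p-value."""
    ai_lh = [asymmetry_index(l, r) for l, r in sizes_lh]
    ai_rh = [asymmetry_index(l, r) for l, r in sizes_rh]
    # Rank-based test: robust to outliers from sulcus recognition errors.
    _, p = mannwhitneyu(ai_lh, ai_rh, alternative="two-sided")
    return p
```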
5 Conclusion
A lot of other statistical issues can be addressed relative to handedness correlates. All the numerous shape descriptors used by the pattern recognition system, indeed, can be used directly for morphometry. Shape descriptors dedicated to the surface parcellation described in the previous section can also be used, the simplest one being the gyral patch areas. Considering that each of the 500 perceptrons involved in the sulcus recognition is fed by about 25 values, the description of the sulcal patterns includes about 12,500 numbers. Hence, some corrections will have to be developed to take into account the high risk of false positives implied by multiple testing, following the point of view developed for VBM using random field theory [48].
References

1. K. Amunts et al. Interhemispheric asymmetry of the human motor cortex related to handedness and gender. Neuropsychologia, 38:304–312, 2000.
2. K. Amunts et al. Asymmetry in the human motor cortex and handedness. Neuroimage, 4:216–222, 1996.
3. J. Ashburner et al. Computer-assisted imaging to assess brain structure in healthy and diseased brains. The Lancet Neurology, 2, 2003.
4. J. Ashburner and K. J. Friston. Voxel-based morphometry – the methods. NeuroImage, 11:805–821, 2000.
5. I. Biederman. Recognition by components: a theory of human image understanding. Psychological Review, 94:115–147, 1987.
6. M. Brett et al. The problem of functional localization in the human brain. Nat Rev Neurosci, 3(3):243–249, 2002.
7. K. Brodmann. Vergleichende Lokalisationslehre der Grosshirnrinde. Barth, Leipzig, 1909.
8. A. Cachia et al. A mean curvature based primal sketch to study the cortical folding process from antenatal to adult brain. In MICCAI, LNCS, 897–904, 2001. (to appear in IEEE TMI)
9. A. Cachia et al. Gyral parcellation of the cortical surface using geodesic Voronoï diagrams. In MICCAI, LNCS-2488, 427–434, 2002. (to appear in Med. Image Anal.)
10. P. Cachier et al. Multisubject non-rigid registration of brain MRI using intensity and geometric features. In MICCAI, LNCS-2208, 734–742, 2001.
11. A. Caunce and C. J. Taylor. Using local geometry to build 3D sulcal models. In IPMI'99, LNCS 1613, 196–209, 1999.
12. D. L. Collins et al. Non-linear cerebral registration with sulcal constraints. In MICCAI'98, LNCS-1496, 974–984, 1998.
13. D. L. Collins et al. Automatic 3D intersubject registration of MR volumetric data in standardized Talairach space. J Comput Assist Tomogr., 18(2):192–205, 1994.
14. F. Crivello et al. Comparison of spatial normalization procedures and their impact on functional maps. Human Brain Mapping, 16(4):228–250, 2002.
15. C. Davatzikos and R. N. Bryan. Morphometric analysis of cortical sulci using parametric ribbons: a study of the central sulcus. J Comput Assist Tomogr, 26(2):298–307, 2002.
16. B. Fischl and A. M. Dale. Measuring the thickness of the human cerebral cortex from magnetic resonance images. Proc Natl Acad Sci USA, 97(20):11050–5, 2000.
17. B. Fischl et al. High-resolution intersubject averaging and a coordinate system for the cortical surface. Hum Brain Mapp., 8(4):272–84, 1999.
18. P. Fox et al. A stereotactic method of anatomical localization for PET. J. Comput. Assist. Tomogr., 1985.
19. K. Friston et al. Spatial realignment and normalisation of images. Human Brain Mapping, 2:165–189, 1995.
20. S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Pattern Analysis and Machine Intelligence, 6(6):721–741, 1984.
21. C. D. Good et al. A voxel-based morphometric study of ageing in 465 normal adult human brains. Neuroimage, 14(1):21–36, 2001.
22. P. Hellier et al. Retrospective evaluation of inter-subject brain registration. In MICCAI'01, Utrecht, LNCS-2208, Springer Verlag, pages 258–265, 2001.
23. G. Le Goualher et al. Modeling cortical sulci using active ribbons. Int. J. Pattern Recognit. Artific. Intell., 11(8):1295–1315, 1997.
24. G. Le Goualher et al. Automated extraction and variability analysis of sulcal neuroanatomy. IEEE Medical Imaging, 18(3):206–217, 1999.
25. G. Lohmann and D. Y. von Cramon. Automatic labelling of the human cortical surface using sulcal basins. Medical Image Analysis, 4(3):179–188, 2000.
26. D. Mac Donald et al. Automated 3-D extraction of inner and outer surfaces of cerebral cortex from MRI. Neuroimage, 12(3):340–56, 2000.
27. J.-F. Mangin et al. From 3D magnetic resonance images to structural representations of the cortex topography using topology preserving deformations. Journal of Mathematical Imaging and Vision, 5(4):297–318, 1995.
28. J. Mazziotta et al. A probabilistic atlas and reference system for the human brain: International Consortium for Brain Mapping (ICBM). Philos Trans R Soc Lond B Biol Sci, 356(1412):1293–322, 2001.
29. M. Ono et al. Atlas of the cerebral sulci. Thieme Verlag, 1990.
30. J. Régis et al. Generic model for the localization of the cerebral cortex and preoperative multimodal integration in epilepsy surgery. Stereo. Funct. Neurosurgery, 65:72–80, 1995.
31. M. E. Rettmann et al. Automated sulcal segmentation using watersheds on the cortical surface. NeuroImage, 15(2):329–344, 2002.
32. M. Riesenhuber and T. Poggio. Neural mechanisms of object recognition. Current Opinion in Neurobiology, 12:162–168, 2002.
33. D. Rivière et al. Automatic recognition of cortical sulci of the human brain using a congregation of neural networks. Med Image Anal, 6(2):77–92, 2002.
34. N. Royackkers et al. Detection and statistical analysis of human cortical sulci. NeuroImage, 10:625–641, 1999.
35. J. Talairach et al. Atlas d'Anatomie Stéréotaxique du Télencéphale. Masson, Paris, 1967.
36. P. Thompson and A. W. Toga. A surface-based technique for warping three-dimensional images of the brain. IEEE Medical Imaging, 15:402–417, 1996.
37. P. M. Thompson et al. Mathematical / computational challenges in creating deformable and probabilistic atlases of the human brain. Hum Brain Mapp., 9(2):81–92, 2000.
38. A. W. Toga and P. M. Thompson. New approaches in brain morphometry. Am J Geriatr Psychiatry, 10(1):13–23, 2002.
39. A. W. Toga and P. M. Thompson. Mapping brain asymmetry. Nature Neuroscience Reviews, 4(1):37–48, 2003.
40. S. Ullman et al. Visual features of intermediate complexity and their use in classification. Nature Neuroscience, 5(7):682–687, 2002.
41. M. Vaillant and C. Davatzikos. Finding parametric representations of the cortical sulci using an active contour model. Medical Image Analysis, 1(4):295–315, 1997.
42. D. C. Van Essen. A tension-based theory of morphogenesis and compact wiring in the central nervous system. Nature, 385:313–318, 1997.
43. D. C. Van Essen et al. Functional and structural mapping of human cerebral cortex: solutions are in the surfaces. Proc Natl Acad Sci USA, 95(3):788–95, 1998.
44. K. E. Watkins et al. Structural asymmetries in the human brain: a voxel-based statistical analysis of 142 MRI scans. Cereb Cortex, 11(9):868–877, 2001.
45. W. Welker. Cerebral Cortex, volume 8B, chapter: Why does cerebral cortex fissure and fold?, pages 3–135. Plenum Press, 1988.
46. L. E. White et al. Structure of the human sensorimotor system. II: Lateral symmetry. Cerebral Cortex, 7:31–47, 1997.
47. L. E. White et al. Cerebral asymmetry and handedness. Nature, 368:197–198, 1994.
48. K. Worsley. The geometry of random images. Chance, 9(1):27–40, 1996.
49. X. Zeng et al. A new approach to 3D sulcal ribbon finding from MR images. In MICCAI'99, Cambridge, UK, LNCS-1679, Springer-Verlag, pages 148–157, 1999.
Genus Zero Surface Conformal Mapping and Its Application to Brain Surface Mapping

Xianfeng Gu¹, Yalin Wang², Tony F. Chan², Paul M. Thompson³, and Shing-Tung Yau⁴

1 Division of Engineering and Applied Science, Harvard University
[email protected]
2 Mathematics Department, UCLA
{ylwang,chan}@math.ucla.edu
3 Laboratory of Neuro Imaging and Brain Research Institute, UCLA School of Medicine
[email protected]
4 Department of Mathematics, Harvard University
[email protected]
Abstract. It is well known that any genus zero surface can be mapped conformally onto the sphere and any local portion thereof onto a disk. However, it is not trivial to find a general method which finds a conformal mapping between two general genus zero surfaces. We propose a new variational method which can find a unique mapping between any two genus zero manifolds by minimizing the harmonic energy of the map. We demonstrate the feasibility of our algorithm by applying it to the cortical surface matching problem. We use a mesh structure to represent the brain surface. Further constraints are added to ensure that the conformal map is unique. Empirical tests on MRI data show that the mappings preserve angular relationships, are stable across MRIs acquired at different times, and are robust to differences in data triangulation and resolution. Compared with other brain surface conformal mapping algorithms, our algorithm is more stable and has good extensibility.
1 Introduction
Recent developments in brain imaging have accelerated the collection and databasing of brain maps. Nonetheless, computational problems arise when integrating and comparing brain data. One way to analyze and compare brain data is to map them into a canonical space while retaining geometric information on the original structures as far as possible [1,2,3,4,5]. Fischl et al. [1] demonstrate that surface-based brain mapping can offer advantages over volume-based brain mapping, especially when localizing cortical deficits and functional activations. Thompson et al. [4,5] introduce a mathematical framework based on covariant partial differential equations, and pull-backs of mappings under harmonic flows, to help analyze signals localized on brain surfaces.
1.1 Previous Work
Conformal surface parameterizations have been studied intensively. Most work on conformal parameterization deals with surface patches homeomorphic to topological disks. For surfaces with arbitrary topologies, Gu and Yau [6] introduce a general method for global conformal parameterizations based on the structure of the cohomology group of holomorphic one-forms; they generalize the method to surfaces with boundaries in [7]. For genus zero surfaces, there are five basic approaches to achieving conformal parameterizations:

1. Harmonic energy minimization. Eck et al. [8] introduce the discrete harmonic map, which approximates the continuous harmonic map [9] by minimizing a metric dispersion criterion. Desbrun et al. [10,11] compute the discrete Dirichlet energy and apply conformal parameterization to interactive geometry remeshing. Pinkall and Polthier compute the discrete harmonic map and Hodge star operator for the purpose of creating a minimal surface [12]. Kanai et al. use a harmonic map for geometric metamorphosis in [13]. Gu and Yau in [6] introduce a non-linear optimization method to compute global conformal parameterizations for genus zero surfaces; the optimization is carried out in the tangent spaces of the sphere.
2. Cauchy-Riemann equation approximation. Levy et al. [14] compute a quasi-conformal parameterization of topological disks by approximating the Cauchy-Riemann equation using the least squares method. They show rigorously that the quasi-conformal parameterization exists uniquely, and is invariant to similarity transformations, independent of resolution, and orientation preserving.
3. Laplacian operator linearization. Haker et al. [3] use a method to compute a global conformal mapping from a genus zero surface to a sphere by representing the Laplace-Beltrami operator as a linear system.
4. Circle packing. Circle packing is introduced in [2]. Classical analytic functions can be approximated using circle packing. But for general surfaces in $\mathbb{R}^3$, the circle packing method considers only the connectivity and not the geometry, so it is not suitable for our parameterization purposes.
5. Spherical harmonics. Bakircioglu et al. use spherical harmonics to compute a flow on the sphere in [15] in order to match curves on the brain. Thompson and Toga use a similar approach in [16]. This flow field can be thought of as the variational minimizer of the integral over the sphere of Lu, with L some power of the Laplacian and u the deformation. This is very similar to the spherical harmonic map used in this paper.
1.2 Basic Idea
It is well known that any genus zero surface can be mapped conformally onto the sphere and any local portion thereof onto a disk. This mapping, a conformal equivalence, is one-to-one, onto, and angle-preserving.
Moreover, the elements of the first fundamental form remain unchanged, except for a scaling factor (the so-called conformal factor). For this reason, conformal mappings are often described as being similarities in the small. Since the cortical surface of the brain is a genus zero surface, conformal mapping offers a convenient method to retain local geometric information when mapping data between surfaces. Indeed, several groups have created flattened representations or visualizations of the cerebral cortex or cerebellum [2,3] using conformal mapping techniques. However, these approaches are either not strictly angle preserving [2], or there may be areas with large geometric distortions [3]. In this paper, we propose a new genus zero surface conformal mapping algorithm [6] and demonstrate its use in computing conformal mappings between brain surfaces. Our algorithm depends only on the surface geometry and is invariant to changes in image resolution and the specifics of data triangulation. Our experimental results show that our algorithm has advantageous properties for cortical surface matching. Suppose K is a simplicial complex, and $f : |K| \to \mathbb{R}^3$ embeds $|K|$ in $\mathbb{R}^3$; then (K, f) is called a mesh. Given two genus zero meshes $M_1$, $M_2$, there are many conformal mappings between them. Our algorithm for computing conformal mappings is based on the fact that for genus zero surfaces $S_1$, $S_2$, a map $f : S_1 \to S_2$ is conformal if and only if f is harmonic. All conformal mappings between $S_1$ and $S_2$ form a group, the so-called Möbius group. Our method is as follows: we first find a homeomorphism h between $M_1$ and $M_2$, then deform h such that h minimizes the harmonic energy. To ensure the convergence of the algorithm, constraints are added; this also ensures that there is a unique conformal map. This paper is organized as follows. In Section 2, we give the definitions of a piecewise linear function space, inner product, and piecewise Laplacian. In Section 3, we describe the steepest descent algorithm which is used to minimize the string energy. In Section 4, we detail our conformal spherical mapping algorithms. Experimental results on conformal mapping for brain surfaces are reported in Section 6. In Section 7, we compare our algorithm with other conformal mapping approaches used in neuroimaging. We conclude the paper in Section 8.
2 Piecewise Linear Function Space, Inner Product, and Laplacian
For diffeomorphisms between genus zero surfaces, if the map minimizes the harmonic energy, then it is conformal. Based on this fact, the algorithm is designed as a steepest descent method.

Definition 1. All piecewise linear functions defined on K form a linear space, denoted by $C^{PL}(K)$.

Definition 2. Suppose a set of string constants $k_{u,v}$ is assigned for each edge $\{u,v\}$; the inner product on $C^{PL}$ is defined as the quadratic form

$$\langle f, g \rangle = \frac{1}{2} \sum_{\{u,v\} \in K} k_{u,v} \, (f(u) - f(v))(g(u) - g(v)) \qquad (1)$$
The energy is defined as the norm on $C^{PL}$.

Definition 3. Suppose $f \in C^{PL}$; the string energy is defined as

$$E(f) = \langle f, f \rangle = \sum_{\{u,v\} \in K} k_{u,v} \, \|f(u) - f(v)\|^2 \qquad (2)$$
By changing the string constants $k_{u,v}$ in the energy formula, we can define different string energies.

Definition 4. If the string constants $k_{u,v} \equiv 1$, the string energy is known as the Tuette energy.

Definition 5. Suppose edge $\{u,v\}$ has two adjacent faces $T_\alpha$, $T_\beta$, with $T_\alpha = \{v_1, v_2, v_3\}$. Define the parameters

$$a^{\alpha}_{v_1,v_2} = \frac{1}{2} \, \frac{(v_1 - v_3) \cdot (v_2 - v_3)}{|(v_1 - v_3) \times (v_2 - v_3)|} \qquad (3)$$

$$a^{\alpha}_{v_2,v_3} = \frac{1}{2} \, \frac{(v_2 - v_1) \cdot (v_3 - v_1)}{|(v_2 - v_1) \times (v_3 - v_1)|} \qquad (4)$$

$$a^{\alpha}_{v_3,v_1} = \frac{1}{2} \, \frac{(v_3 - v_2) \cdot (v_1 - v_2)}{|(v_3 - v_2) \times (v_1 - v_2)|} \qquad (5)$$

with the parameters $a^{\beta}$ on $T_\beta$ defined similarly. If

$$k_{u,v} = a^{\alpha}_{u,v} + a^{\beta}_{u,v} \qquad (6)$$

the string energy obtained is called the harmonic energy.
The string energy is always a quadratic form. By carefully choosing the string coefficients, we make sure the quadratic form is positive definite. This guarantees the convergence of the steepest descent method.

Definition 6. The piecewise Laplacian is the linear operator $\Delta_{PL} : C^{PL} \to C^{PL}$ on the space of piecewise linear functions on K, defined by the formula

$$\Delta_{PL}(f) = \sum_{\{u,v\} \in K} k_{u,v} \, (f(v) - f(u)) \qquad (7)$$

If f minimizes the string energy, then f satisfies the condition $\Delta_{PL}(f) = 0$. Suppose $M_1$, $M_2$ are two meshes and $f : M_1 \to M_2$ is a map between them; f can also be treated as a map from $M_1$ to $\mathbb{R}^3$.

Definition 7. For a map $f : M_1 \to \mathbb{R}^3$, $f = (f_0, f_1, f_2)$, $f_i \in C^{PL}$, $i = 0, 1, 2$, we define the energy as the norm of f:

$$E(f) = \sum_{i=0}^{2} E(f_i) \qquad (8)$$

The Laplacian is defined in a similar way.
Definition 8. For a map $f : M_1 \to \mathbb{R}^3$, the piecewise Laplacian of f is

$$\Delta_{PL} f = (\Delta_{PL} f_0, \Delta_{PL} f_1, \Delta_{PL} f_2) \qquad (9)$$

A map $f : M_1 \to M_2$ is harmonic if and only if $\Delta_{PL} f$ has only a normal component, its tangential component being zero:

$$\Delta_{PL}(f) = (\Delta_{PL} f)^{\perp} \qquad (10)$$
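To make Definitions 5–8 concrete, the NumPy sketch below computes the harmonic string constants and the piecewise Laplacian for a triangle mesh; it is a minimal illustration under assumed array inputs, not the authors' implementation.

```python
import numpy as np

def harmonic_weights(verts, faces):
    """verts: (n, 3) array; faces: (m, 3) vertex-index array.
    Returns {(u, v): k_uv} with k_uv the sum over the faces adjacent
    to edge {u, v} of the parameters of Definition 5."""
    k = {}
    for tri in faces:
        for i in range(3):
            u, v, w = tri[i], tri[(i + 1) % 3], tri[(i + 2) % 3]
            e1, e2 = verts[u] - verts[w], verts[v] - verts[w]
            # Half the cotangent of the angle opposite edge {u, v}.
            a = 0.5 * np.dot(e1, e2) / np.linalg.norm(np.cross(e1, e2))
            key = (min(u, v), max(u, v))
            k[key] = k.get(key, 0.0) + a
    return k

def piecewise_laplacian(f, k):
    """f: (n, 3) map values; returns Delta_PL f per vertex (Eqs. 7 and 9)."""
    lap = np.zeros_like(f)
    for (u, v), kuv in k.items():
        lap[u] += kuv * (f[v] - f[u])
        lap[v] += kuv * (f[u] - f[v])
    return lap
```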
3 Steepest Descent Algorithm
Suppose we would like to compute a mapping $f : M_1 \to M_2$ such that f minimizes a string energy E(f). This can be solved easily by the steepest descent algorithm:

$$\frac{df(t)}{dt} = -\Delta f(t) \qquad (11)$$

$f(M_1)$ is constrained to lie on $M_2$, so $-\Delta f$ is a section of $M_2$'s tangent bundle. Specifically, suppose $f : M_1 \to M_2$, and denote the image of each vertex $v \in K_1$ as f(v). The normal on $M_2$ at f(v) is n(f(v)).

Definition 9. The normal component is

$$(\Delta f(v))^{\perp} = \langle \Delta f(v), n(f(v)) \rangle \, n(f(v)) \qquad (12)$$

where $\langle \cdot, \cdot \rangle$ is the inner product in $\mathbb{R}^3$.

Definition 10. The absolute derivative is defined as

$$Df(v) = \Delta f(v) - (\Delta f(v))^{\perp} \qquad (13)$$

Equation (11) is then discretized as $\delta f = -Df \cdot \delta t$.
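A sketch of one such descent step for a map into the unit sphere follows, reusing `piecewise_laplacian` from the sketch above. Note that with the Laplacian sign convention of Eq. (7), the energy-decreasing direction is $+\Delta_{PL} f$ (each vertex moves toward the weighted average of its neighbors), and that is the direction the sketch steps in; only the tangential component (Definition 10) is applied, and the result is reprojected onto the sphere.

```python
import numpy as np

def descent_step(f, k, dt):
    """One steepest-descent step for the string energy of a map into S^2.
    f: (n, 3) current positions on the unit sphere; k: string constants."""
    lap = piecewise_laplacian(f, k)
    n = f / np.linalg.norm(f, axis=1, keepdims=True)      # normals on S^2
    # Remove the normal component: this is the absolute derivative Df.
    tangential = lap - (lap * n).sum(axis=1, keepdims=True) * n
    f_new = f + dt * tangential
    # Renormalize so the image stays on the sphere.
    return f_new / np.linalg.norm(f_new, axis=1, keepdims=True)
```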
4 Conformal Spherical Mapping
Suppose $M_2$ is $S^2$; then a conformal mapping $f : M_1 \to S^2$ can be constructed using the steepest descent method. The major difficulty is that the solution is not unique but forms a Möbius group.

Definition 11. A mapping $f : \mathbb{C} \to \mathbb{C}$ is a Möbius transformation if and only if

$$f(z) = \frac{az + b}{cz + d}, \quad a, b, c, d \in \mathbb{C}, \ ad - bc \neq 0 \qquad (14)$$

All Möbius transformations form the Möbius transformation group. In order to determine a unique solution we can add different constraints. In practice we use the following two constraints: the zero mass-center constraint and a landmark constraint.
Definition 12. A mapping $f : M_1 \to M_2$ satisfies the zero mass-center condition if and only if

$$\int_{M_2} f \, d\sigma_{M_1} = 0 \qquad (15)$$

where $\sigma_{M_1}$ is the area element on $M_1$. All conformal maps from $M_1$ to $S^2$ satisfying the zero mass-center constraint are unique up to the Euclidean rotation group (which is 3-dimensional). We use the Gauss map as the initial condition.

Definition 13. The Gauss map $N : M_1 \to S^2$ is defined as

$$N(v) = n(v), \quad v \in M_1 \qquad (16)$$

where n(v) is the normal at v.
Compute Gauss map N : M → S 2 . Let t = N , compute Tuette energy E0 . For each vertex v ∈ M , compute Absolute derivative Dt. Update t(v) by δt(v) = −Dt(v)δt. Compute Tuette energy E. If E − E0 < δE, return t. Otherwise, assign E to E0 and repeat steps 2 through to 5.
Because the Tuette energy has a unique minimum, the algorithm converges rapidly and is stable. We use it as the initial condition for the conformal mapping. Algorithm 2 Spherical Conformal Mapping Input (mesh M ,step length δt, energy difference threshold δE), output(h : M → S 2 ). Here h minimizes the harmonic energy and satisfies the zero mass-center constraint. 1. 2. 3. 4.
1. Compute the Tuette embedding t.
2. Let h = t; compute the harmonic energy $E_0$.
3. For each vertex $v \in M$, compute the absolute derivative Dh.
4. Update h(v) by $\delta h(v) = -Dh(v)\,\delta t$. Compute the Möbius transformation $\varphi_0 : S^2 \to S^2$ such that

$$\Gamma(\varphi) = \int_{S^2} \varphi \circ h \, d\sigma_{M_1}, \quad \varphi \in \text{Möbius}(\mathbb{C}P^1) \qquad (17)$$

$$\varphi_0 = \arg\min_{\varphi} \|\Gamma(\varphi)\|^2 \qquad (18)$$

where $\sigma_{M_1}$ is the area element on $M_1$; $\Gamma(\varphi)$ is the mass center, and $\varphi_0$ minimizes the norm of the mass center.
5. Compute the harmonic energy E.
6. If $E - E_0 < \delta E$, return h. Otherwise, assign E to $E_0$ and repeat steps 3 through 6.

Step 4 is non-linear and expensive to compute. In practice we use the following procedure to replace it:

1. Compute the mass center $c = \int_{S^2} h \, d\sigma_{M_1}$;
2. For all $v \in M$, $h(v) = h(v) - c$;
3. For all $v \in M$, $h(v) = h(v)/\|h(v)\|$.

This approximation method is good enough for our purpose. By choosing the step length carefully, the energy can be decreased monotonically at each iteration.
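Putting the pieces together, here is a sketch of Algorithm 2's main loop with the approximate mass-center normalization. `tuette_embedding` is assumed to implement Algorithm 1, `descent_step` is the sketch from Section 3, and per-vertex barycentric areas approximate $d\sigma_{M_1}$; all of these names are our own.

```python
import numpy as np

def vertex_areas(verts, faces):
    # One third of each incident triangle's area, per vertex.
    areas = np.zeros(len(verts))
    for tri in faces:
        a = 0.5 * np.linalg.norm(
            np.cross(verts[tri[1]] - verts[tri[0]], verts[tri[2]] - verts[tri[0]]))
        areas[tri] += a / 3.0
    return areas

def string_energy(f, k):
    return sum(kuv * np.sum((f[u] - f[v]) ** 2) for (u, v), kuv in k.items())

def spherical_conformal_map(verts, faces, k, dt=1e-2, d_energy=1e-6, max_iter=10000):
    h = tuette_embedding(verts, faces, k)        # Algorithm 1 (assumed given)
    w = vertex_areas(verts, faces)[:, None]      # approximates d(sigma_M1)
    e0 = string_energy(h, k)
    for _ in range(max_iter):
        h = descent_step(h, k, dt)               # tangential descent update
        c = (h * w).sum(axis=0) / w.sum()        # approximate mass center
        h = h - c                                # steps 1-2: recenter...
        h = h / np.linalg.norm(h, axis=1, keepdims=True)  # step 3: reproject
        e = string_energy(h, k)
        if abs(e0 - e) < d_energy:
            break
        e0 = e
    return h
```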
5 Optimize the Conformal Parameterization by Landmarks
In order to compare two brain surfaces, it is desirable to adjust the conformal parameterization to match the geometric features on the brains as well as possible. We define an energy to measure the quality of the parameterization. Suppose two brain surfaces $S_1$, $S_2$ are given, with conformal parameterizations denoted as $f_1 : S^2 \to S_1$ and $f_2 : S^2 \to S_2$; the matching energy is defined as

$$E(f_1, f_2) = \int_{S^2} \|f_1(u, v) - f_2(u, v)\|^2 \, du \, dv \qquad (19)$$
We can compose a Möbius transformation τ with $f_2$, such that

$$E(f_1, f_2 \circ \tau) = \min_{\zeta \in \Omega} E(f_1, f_2 \circ \zeta) \qquad (20)$$
where Ω is the group of Möbius transformations. We use landmarks to obtain the optimal Möbius transformation. Landmarks are commonly used in brain mapping. We manually label the landmarks on the brain as a set of sulcal curves [4], as shown in Figure 5. First we conformally map two brains to the sphere; then we seek the best Möbius transformation to minimize the Euclidean distance between the corresponding landmarks on the spheres. Suppose the landmarks are represented as discrete point sets, denoted as $\{p_i \in S_1\}$ and $\{q_i \in S_2\}$, with $p_i$ matching $q_i$, $i = 1, 2, \ldots, n$. The landmark mismatch functional for $u \in \Omega$ is defined as

$$E(u) = \sum_{i=1}^{n} \|p_i - u(q_i)\|^2, \quad u \in \Omega, \ p_i, q_i \in S^2 \qquad (21)$$
In general, the above variational problem is a nonlinear one. In order to simplify it, we convert it to a least squares problem. First we project the sphere to the complex plane; then the Möbius transformation is represented as a complex linear rational formula, Equation (14). We add another constraint for u, so that u maps infinity to infinity. This means the north poles of the spheres are mapped
to each other. Then u can be represented as a linear form az + b, and the functional of u can be simplified to

$$E(u) = \sum_{i=1}^{n} g(z_i) \, |a z_i + b - \tau_i|^2 \qquad (22)$$

where $z_i$ is the stereographic projection of $p_i$, $\tau_i$ is the projection of $q_i$, and g is the conformal factor from the plane to the sphere, which can be simplified to

$$g(z) = \frac{4}{1 + z\bar{z}} \qquad (23)$$

So the problem is a least squares problem.
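This weighted least squares problem has a closed-form solution via the normal equations. A sketch is given below; the function name is ours, and the complex arrays z and tau are assumed to hold the stereographically projected landmarks.

```python
import numpy as np

def optimal_linear_moebius(z, tau):
    """Minimize sum_i g(z_i) |a z_i + b - tau_i|^2 over complex a, b (Eq. 22).
    z, tau: complex arrays of stereographically projected landmark points."""
    g = 4.0 / (1.0 + np.abs(z) ** 2)             # conformal factor, Eq. (23)
    A = np.stack([z, np.ones_like(z)], axis=1)   # design matrix for (a, b)
    W = np.diag(g)                               # per-landmark weights
    # Weighted normal equations: (A* W A) x = A* W tau, x = (a, b).
    lhs = A.conj().T @ W @ A
    rhs = A.conj().T @ W @ tau
    a, b = np.linalg.solve(lhs, rhs)
    return a, b
```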
6 Experimental Results
The 3D brain meshes are reconstructed from 3D 256 × 256 × 124 T1-weighted SPGR (spoiled gradient) MRI images, using an active surface algorithm that deforms a triangulated mesh onto the brain surface [5]. Figure 1(a) and (c) show the same brain scanned at different times [4]. Because of the inaccuracy introduced by scanner noise in the input data, as well as slight biological changes over time, the geometric information is not exactly the same. Figure 1(a) and (c) reveal minor differences.
Fig. 1. Reconstructed brain meshes and their spherical harmonic mappings. (a) and (c) are the reconstructed surfaces for the same brain scanned at different times. Due to scanner noise and inaccuracy in the reconstruction algorithm, there are visible geometric differences. (b) and (d) are the spherical conformal mappings of (a) and (c), respectively; the normal information is preserved. The shading information illustrates the correspondence.
The conformal mapping results are shown in Figure 1(b) and (d). From this example, we can see that although the brain meshes are slightly different, the mapping results look quite similar. The major features are mapped to the same position on the sphere. This suggests that the computed conformal mappings continuously depend on the geometry, and can match the major features consistently and reproducibly. In other words, conformal mapping may be a good candidate for a canonical parameterization in brain mapping.
Fig. 2. Conformal texture mapping: (a) texture mapping of the sphere; (b) texture mapping of the brain. The conformality is visualized by texture mapping of a checkerboard image. The sphere is mapped to the plane by stereographic projection, and the planar coordinates are used as texture coordinates. This texture parameter is assigned to the brain surface through the conformal mapping between the sphere and the brain surface. All the right angles in the texture are preserved on the brain surface.
Fig. 3. Conformal mappings of surfaces at different resolutions: (a) surface with 20,000 faces; (b) surface with 50,000 faces. The original brain surface has 50,000 faces and is conformally mapped to a sphere; the brain surface is then simplified to 20,000 faces and its spherical conformal mapping is computed again.
Fig. 4. Conformality measurement: (a) intersection angles; (b) angle distribution (frequency vs. angle, 0 to 180 degrees). The curves of iso-polar angle and iso-azimuthal angle are mapped to the brain, and the intersection angles are measured on the brain; the histogram is shown.
Fig. 5. Möbius transformation to minimize the deviations between landmarks. The blue curves are the landmarks; the correspondence between curves has been preassigned. The desired Möbius transformation is obtained by minimizing the matching error on the sphere.
Fig. 6. Spherical conformal mapping of genus zero surfaces. Protruding parts (such as fingers and toes) are mapped to denser regions on the sphere.
Figure 2 shows that the mapping is conformal by texture mapping a checkerboard onto both the brain surface mesh and a spherical mesh. Each black or white square in the texture is mapped to the sphere by stereographic projection and pulled back to the brain. Note that the right angles are preserved both on the sphere and on the brain. Conformal mappings are stable, depend continuously on the input geometry rather than on the triangulation, and are insensitive to mesh resolution. Figure 3 shows the same surface at different resolutions together with its conformal mappings. The mesh simplification is performed using a standard method; the refined model has 50,000 faces and the coarse one 20,000. The conformal mappings map the major features to the same positions on the spheres. In order to measure the conformality, we map the iso-polar angle curves and iso-azimuthal angle curves from the sphere to the brain by the inverse conformal mapping, and measure the intersection angles on the brain. The distribution of the angles for subject A is illustrated in Figure 4; the angles are concentrated about the right angle. Figure 5 shows the landmarks and the result of the optimization by a Möbius transformation. We also computed the matching energy of Equation 19. We tested on three subjects, whose information is shown in Table 1. We took subject A as the target brain. For each new subject model, we found a Möbius transformation that minimized the landmark mismatch energy on the
maximum intersection subsets of it and A. As shown in Table 1, the matching energies were reduced after the Möbius transformation. The method described in this work is very general. We tested the algorithm on other genus zero surfaces, including hand and foot surfaces; the results are illustrated in Figure 6.
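For concreteness, the matching energy of Equation 19 can be evaluated on a discretized sphere as a weighted sum over sample points. The sketch below is our own illustration (not the authors' code); f1 and f2 are assumed to be sampled at common parameter locations, with per-sample area elements dA:

```python
import numpy as np

def matching_energy(f1, f2, dA):
    """Discrete Eq. 19: sum_k ||f1_k - f2_k||^2 * dA_k.

    f1, f2: (N, 3) arrays of surface positions sampled at the same
    (u, v) parameter locations on the sphere
    dA: (N,) array of area elements of the spherical samples
    """
    return np.sum(np.sum((f1 - f2) ** 2, axis=1) * dA)
```

In this form, comparing candidate Möbius transformations amounts to re-sampling f2 ∘ τ and re-evaluating the sum.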
7 Comparison with Other Work
Several other studies of conformal mappings between brain surfaces are reported in [2,3]. In [2], Hurdal et al. used the circle packing theorem and the ring lemma to establish that, for a simply-connected triangulated surface, there is a unique circle packing in the plane (up to certain transformations) which is quasi-conformal, i.e. has bounded angular distortion. They demonstrated experimental results for the surface of the cerebellum. This method considers only the topology, not the brain's geometric structure: given two different mesh structures of the same brain, their method may generate two different mapping results. Compared with their work, our method genuinely preserves angles and establishes a good mapping between brains and a canonical space. Haker et al. [3] built a finite element approximation of the conformal mapping for brain surface parameterization. They selected a point as the north pole and conformally mapped the cortical surface to the complex plane. In the resulting mapping, the local shape is preserved, and distances and areas are changed only by a scaling factor. Since stereographic projection is involved, there is significant distortion around the north pole, which makes this approach unstable. Compared with their work, our method is more accurate, with no regions of large area distortion; it is also more stable and can readily be extended to compute maps between two general manifolds. Finally, we note that Memoli et al. [17] mention that they are developing implicit methods to compute harmonic maps between general source and target manifolds, using level sets to represent the brain surfaces. Due to the extensive folding of the human brain surface, such mappings have to be designed very carefully.
Table 1. Matching energy for three subjects. Subject A was used as the target brain. For subjects B and C, we found the Möbius transformations that minimized the landmark mismatch functionals.

Subject   Vertex #   Face #    Energy before   Energy after
A         65,538     131,072   -               -
B         65,538     131,072   604.134         506.665
C         65,538     131,072   414.803         365.325
8 Conclusion and Future Work
In this paper, we propose a general method that finds a unique conformal mapping between genus zero manifolds, and we demonstrate its feasibility for brain surface conformal mapping. Our method depends only on the surface geometry, not on the mesh structure (i.e. gridding) or resolution, and the algorithm is fast and stable in reaching a solution. There are numerous applications of these mapping algorithms, such as providing a canonical space for automated feature identification, brain-to-brain registration, brain structure segmentation, brain surface denoising, and convenient surface visualization, among others. We are working to generalize this approach to compute conformal mappings between surfaces of nonzero genus.
References
1. B. Fischl, M.I. Sereno, R.B.H. Tootell, and A.M. Dale. High-resolution intersubject averaging and a coordinate system for the cortical surface. Human Brain Mapping, 8:272–284, 1999.
2. M.K. Hurdal, K. Stephenson, P.L. Bowers, D.W.L. Sumners, and D.A. Rottenberg. Coordinate systems for conformal cerebellar flat maps. NeuroImage, 11:S467, 2000.
3. S. Haker, S. Angenent, A. Tannenbaum, R. Kikinis, G. Sapiro, and M. Halle. Conformal surface parameterization for texture mapping. IEEE Transactions on Visualization and Computer Graphics, 6(2):181–189, April–June 2000.
4. P.M. Thompson, M.S. Mega, C. Vidal, J.L. Rapoport, and A.W. Toga. Detecting disease-specific patterns of brain structure using cortical pattern matching and a population-based probabilistic brain atlas. In Proc. 17th International Conference on Information Processing in Medical Imaging (IPMI 2001), pages 488–501, Davis, CA, USA, June 18–22, 2001.
5. P.M. Thompson and A.W. Toga. A framework for computational anatomy. Computing and Visualization in Science, 5:1–12, 2002.
6. X. Gu and S. Yau. Computing conformal structures of surfaces. Communications in Information and Systems, 2(2):121–146, December 2002.
7. X. Gu and S. Yau. Global conformal surface parameterization. Preprint, December 2002.
8. M. Eck, T. DeRose, T. Duchamp, H. Hoppe, M. Lounsbery, and W. Stuetzle. Multiresolution analysis of arbitrary meshes. In Computer Graphics (Proceedings of SIGGRAPH 95), August 1995.
9. R. Schoen and S.T. Yau. Lectures on Harmonic Maps. International Press, Cambridge, MA, 1997.
10. P. Alliez, M. Meyer, and M. Desbrun. Interactive geometry remeshing. In Computer Graphics (Proceedings of SIGGRAPH 02), pages 347–354, 2002.
11. M. Desbrun, M. Meyer, and P. Alliez. Intrinsic parameterizations of surface meshes. In Proceedings of Eurographics, 2002.
12. U. Pinkall and K. Polthier. Computing discrete minimal surfaces and their conjugates. Experimental Mathematics, 2(1):15–36, 1993.
13. T. Kanai, H. Suzuki, and F. Kimura. Three-dimensional geometric metamorphosis based on harmonic maps. The Visual Computer, 14(4):166–176, 1998.
14. B. Levy, S. Petitjean, N. Ray, and J. Maillot. Least squares conformal maps for automatic texture atlas generation. In Computer Graphics (Proceedings of SIGGRAPH 02). Addison Wesley, 2002.
15. M. Bakircioglu, U. Grenander, N. Khaneja, and M.I. Miller. Curve matching on brain surfaces using Frenet distances. Human Brain Mapping, 6:329–333, 1998.
16. P. Thompson and A. Toga. A surface-based technique for warping 3-dimensional images of the brain. IEEE Transactions on Medical Imaging, 15(4):1–16, 1996.
17. F. Memoli, G. Sapiro, and S. Osher. Solving variational problems and partial differential equations mapping into general target manifolds. CAM Report 02-04, January 2002.
Coupled Multi-shape Model and Mutual Information for Medical Image Segmentation

A. Tsai¹, W. Wells²,³, C. Tempany³, E. Grimson², and A. Willsky¹

¹ Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA
² Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA
³ Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
Abstract. This paper presents extensions that improve the performance of the shape-based deformable active contour model presented earlier in [9]. In contrast to that work, the segmentation framework presented here allows multiple shapes to be segmented simultaneously in a seamless fashion. To achieve this, multiple signed distance functions are employed as the implicit representations of the multiple shape classes within the image. A parametric model for this new representation is derived by applying principal component analysis to the collection of these multiple signed distance functions. By deriving a parametric model in this manner, we obtain a coupling between the multiple shapes within the image and hence effectively capture the co-variations among the different shapes. The parameters of the multi-shape model are then calculated to minimize a single mutual information-based cost functional for image segmentation. The use of a single cost criterion further enhances the coupling between the multiple shapes, as the deformation of any given shape depends, at all times, upon every other shape, regardless of their proximity. We demonstrate the utility of this algorithm for the segmentation of the prostate gland, the rectum, and the internal obturator muscles for MR-guided prostate brachytherapy.
1 Introduction
A knowledge-based approach to medical image segmentation is proposed in this paper. The strength of such an approach is the incorporation of prior information into the segmentation algorithm to reduce the complexity of the segmentation process. To motivate this approach, we show, in Figure 1, an axial brain MR image depicting 3 subcortical brain structures. The dark ventricle is easy to distinguish from the rest of the brain structures. The boundaries of the other two subcortical brain structures, however, are more difficult to localize. Despite this apparent difficulty, the human vision system does not have trouble locating all 3 subcortical structures. First, based on prior knowledge of the spatial relationship
This work was supported by ONR grant N00014-00-1-0089, AFOSR grant F4962098-1-0349, NSF ERC grant under Johns Hopkins Agreement 8810274, NIH grants P41RR13218, P01CA67167, R33CA99015, R21CA89449, and R01 AG19513-01.
Fig. 1. Motivational example. (a) MR image showing an axial cross section of the brain. (b) Hand segmentation of the 3 subcortical brain structures within the image.
of the 3 subcortical brain structures, our vision system uses the easily identifiable ventricle as a spatial reference point to localize the other two subcortical brain structures. Next, based on prior knowledge of the variability of the individual shapes and their mutual shape variability, our vision system proceeds to identify the boundaries of the lenticular and caudate nuclei. This two-level usage of prior information, first to localize spatially and then to extract shape, is a powerful concept, and one that our vision system exploits. In this paper, we develop a segmentation algorithm sophisticated enough to mimic this particular characteristic of the human vision system. Our work is related to many shape-based active contour models. Cootes et al. [2] used linear combinations of eigenvectors that reflect shape variations to parametrize the segmenting curve. Staib and Duncan [7] used elliptic Fourier decompositions of various landmark points to parametrize their segmenting curve. Wang and Staib [10] proposed a shape parametrization scheme based on applying principal component analysis to covariance matrices that capture the variations of the shapes' control points. Leventon et al. [4] derived a parametric shape model based on applying principal component analysis to a collection of signed distance functions to restrict the flow of the geodesic active contour. Paragios and Rousson [5] used a prior level set shape model to restrict the flow of an evolving level set, with constraints imposed to force the evolving level set to remain a distance function. Our work also shares many common aspects with a number of coupled active contour models. Zeng et al. [12] introduced a coupled-surfaces propagation method where each evolving surface is driven by two forces: 1) an image-derived information force, and 2) a coupling force to keep the 2 evolving surfaces a certain distance apart. Chan and Vese [1] employed n level set functions to represent 2^n segments in the image, with the n level set functions coupled to one another through an energy functional. Yezzi et al. [11] derived a set of coupled curve evolution equations from a single global cost functional to evolve multiple contours simultaneously toward the region boundaries. The rest of the paper is organized as follows. Section 2 describes a variational approach to align all the example training shapes. Section 3 describes a new approach to represent multiple shapes. Section 4 illustrates how our multi-shape
model can be incorporated into a mutual information-based active contour model for image segmentation. In Section 5, we apply our technique to a medical problem. We conclude in Section 6 with a summary of the paper.
2 Multi-shape Alignment
Alignment removes shape differences within a particular shape class that might be due to differences in pose. Since multiple shape classes are involved in our framework, we seek an alignment procedure that is able to (1) jointly align the different shapes within a particular shape class, and (2) perform the alignment for all the shape classes simultaneously. Let m represent the number of known shape classes in an image that we would like to model, and let the training set consist of n such example images. One way to represent the multiple shapes in an image is to encode the shapes in an m-dimensional binary vector-valued image. Specifically, let the training set consist of n vector-valued images {I_1, ..., I_n}, where I_i = (I_i^1, ..., I_i^m) for i = 1, ..., n. Each I_i^k, for i = 1, ..., n and k = 1, ..., m, is a binary image with values of 1 inside and 0 outside the shape. The basic idea behind our approach is to calculate the set of pose parameters p_1, ..., p_n used to transform the n binary vector-valued images so as to jointly align them. We focus on rigid body and scaling transformations. In 2D, the pose parameter is p = [a\ b\ h\ \theta]^T, with a, b, h, and θ corresponding to x-translation, y-translation, scale, and rotation, respectively. The transformed image of I_i^k, based on the pose parameter p_i, is denoted by \tilde{I}_i^k and is defined by \tilde{I}_i^k(T(p_i)[x\ y\ 1]^T) = I_i^k(x, y), where T(p_i) is a transformation matrix that maps the coordinates of one image into the coordinates of another. Gradient descent is employed to minimize the following energy functional to align the set of n m-dimensional binary images:

E_{align} = \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} \sum_{k=1}^{m} \frac{\int_\Omega (\tilde{I}_i^k - \tilde{I}_j^k)^2 \, dA}{\int_\Omega (\tilde{I}_i^k + \tilde{I}_j^k)^2 \, dA}
where Ω denotes the image domain. The gradient of E_{align} with respect to p_i, for any i, is given by

\nabla_{p_i} E_{align} = \sum_{\substack{j=1 \\ j \neq i}}^{n} \sum_{k=1}^{m} \left[ \frac{\int_\Omega 2 (\tilde{I}_i^k - \tilde{I}_j^k) \nabla_{p_i} \tilde{I}_i^k \, dA}{\int_\Omega (\tilde{I}_i^k + \tilde{I}_j^k)^2 \, dA} - \frac{\int_\Omega (\tilde{I}_i^k - \tilde{I}_j^k)^2 \, dA \, \int_\Omega 2 (\tilde{I}_i^k + \tilde{I}_j^k) \nabla_{p_i} \tilde{I}_i^k \, dA}{\left( \int_\Omega (\tilde{I}_i^k + \tilde{I}_j^k)^2 \, dA \right)^2} \right]
where \nabla_{p_i} \tilde{I}_i^k is the gradient of the transformed image \tilde{I}_i^k with respect to p_i. This gradient descent is iterated until the alignment converges. To illustrate the alignment technique, we employ a training set of 30 synthetic examples, shown in Figure 2. Each example depicts a different axial cross section of the 3 subcortical brain structures: the ventricle, the lenticular nucleus, and the caudate nucleus. After employing our multi-shape alignment scheme, Figure 3 shows the post-aligned training database.
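To make the energy concrete, a discrete version of E_align can be computed directly from binary masks. The sketch below is our own illustration (numpy assumed; the array layout is our choice, not specified in the paper):

```python
import numpy as np

def align_energy(imgs):
    """Discrete E_align over pose-transformed binary vector-valued images.

    imgs: array of shape (n, m, H, W) -- n training examples, m shape
    classes, each channel a binary mask already warped by its pose p_i.
    """
    imgs = np.asarray(imgs, dtype=float)
    n = imgs.shape[0]
    E = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            num = ((imgs[i] - imgs[j]) ** 2).sum(axis=(1, 2))  # per class k
            den = ((imgs[i] + imgs[j]) ** 2).sum(axis=(1, 2))
            E += (num / den).sum()
    return E
```

In practice one would wrap this in a loop that perturbs each pose vector p_i (translation, scale, rotation), re-warps the masks, and steps downhill, either with the analytic gradient above or with finite differences.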
3 Implicit Parametric Multi-shape Model
Following the lead of [4] and [9], the signed distance function Ψ is employed as the representation of a particular shape class. Let m be the number of shape classes that we would like to represent simultaneously. The boundaries of each of the m shapes are embedded as the zero level sets of m distinct signed distance functions {Ψ^1, ..., Ψ^m}, with negative distances assigned to the inside and positive distances to the outside of the shape. Suppose there are n aligned example images, each with m shape classes, in the database. For each of the n example images, m signed distance functions are generated, giving rise to nm signed distance functions; Ψ_i^k denotes the signed distance function associated with the kth shape class in the ith example image of the training database. We compute m mean level set functions {Φ̄^1, ..., Φ̄^m}, one for each shape class, by averaging the n signed distance functions from that shape class. To extract the shape variabilities of each shape class, the mean level set function of each class is subtracted from the n signed distance functions belonging to that class. This gives rise to nm mean-offset functions, with Ψ̃_i^k denoting the mean-offset function associated with the kth shape class in the ith example image. To capture the shape variabilities, we form n column vectors {ψ̃_1, ..., ψ̃_n}. Each column vector ψ̃_i, of size mN, is made up of the m mean-offset functions {Ψ̃_i^1, ..., Ψ̃_i^m} stacked on top of one another, with each mean-offset function consisting of N samples (using identical sample locations for each mean-offset function). The most natural sampling strategy for the mean-offset functions is to utilize the N_1 × N_2 rectangular grid to generate N = N_1 N_2 lexicographically ordered samples (where the columns of the rectangular grid are sequentially stacked on top of one another to form one large column). Define the tall rectangular shape-variability matrix S = [ψ̃_1 ψ̃_2 ... ψ̃_n]. An eigenvalue decomposition is employed to factor (1/n) S S^T as U Σ U^T, where U is a tall rectangular mN × n matrix whose columns represent the n principal variational modes or eigenshapes of the m shape classes, and Σ is an n × n diagonal matrix whose diagonal elements, denoted σ_1^2, ..., σ_n^2, are the corresponding non-zero eigenvalues. Each non-zero eigenvalue reflects the variance of the shape variability associated with that eigenvalue's corresponding eigenshape. The mN elements of the ith column of U, denoted U_i, are arranged back into m rectangular structures of dimension N_1 × N_2 (by undoing the earlier stacking and lexicographical concatenation of the grid columns). This "unwrapping" process yields the ith principal modes or eigenshapes of all m shape classes, denoted
Fig. 2. Database of pre-aligned subcortical brain structures. The 3 brain structures depicted in different colors are: ventricles (black), lenticular nucleus (white), and caudate nucleus (gray).
Fig. 3. Database of post-aligned subcortical brain structures.
Fig. 4. Illustration of the shape variabilities in the subcortical brain structures based on our implicit multi-shape parametric model (rows: modes 1–6; columns: weights from −0.75σ to +0.75σ in steps of 0.25σ).
by {Φ_i^1, ..., Φ_i^m}. In the end, this approach generates a maximum of n different eigenshapes {Φ_1^k, Φ_2^k, ..., Φ_n^k} for each shape class k = 1, ..., m. We now introduce m new level set functions based on choosing q ≤ n modes:

\Phi^1[w] = \bar{\Phi}^1 + \sum_{i=1}^{q} w_i \Phi_i^1, \quad \cdots, \quad \Phi^m[w] = \bar{\Phi}^m + \sum_{i=1}^{q} w_i \Phi_i^m    (1)
where w = {w_1, w_2, ..., w_q} are the weights of the q eigenshapes in each of the m level set functions, with the variances of these weights {σ_1^2, σ_2^2, ..., σ_q^2} given by the eigenvalues calculated earlier. We propose to use the level set functions {Φ^1, ..., Φ^m} as our implicit representation of the m shape classes. Specifically, the zero level set of Φ^k describes the boundary of the kth shape class, with that shape's variability directly linked to the variability of its level set function. To illustrate this new parametric multi-shape representation, we show, in Figure 4, the shape variations of the subcortical brain structures obtained by varying the model's first six eigenshapes by different amounts. Each row of the figure demonstrates the effect of a particular principal mode in altering the shapes of the subcortical brain structures. Notice that by varying the first principal mode, the shape of the ventricle changes topology, going from 3 regions to one. This is an advantage of using the Eulerian framework for shape representation, as it handles topological changes in a seamless fashion. Further, because multiple level sets are employed to represent multiple curves in this framework, triple points
can be captured automatically. Triple points formed by the ventricle and the caudate nucleus can be seen throughout Figure 4. There are no restrictions placed on the range of values that w can take, so the different shape classes may overlap one another, especially for extreme values of w. In fact, this phenomenon can be seen starting to develop in a number of the image frames shown in Figure 4. Later in the paper, we show the method by which we avoid overlap of the different shapes during the segmentation process. At this point, our implicit representation of multiple shape classes cannot accommodate shape variabilities due to pose differences. To have this flexibility, the pose parameter p is added as another parameter of the level set functions of equation (1). With this addition, the implicit description of the segmenting curve C is given by the combined zero level sets of the following m level set functions:

\Phi^1[w, p](x, y) = \bar{\Phi}^1(\tilde{x}, \tilde{y}) + \sum_{i=1}^{q} w_i \Phi_i^1(\tilde{x}, \tilde{y})
\quad \vdots    (2)
\Phi^m[w, p](x, y) = \bar{\Phi}^m(\tilde{x}, \tilde{y}) + \sum_{i=1}^{q} w_i \Phi_i^m(\tilde{x}, \tilde{y})

where (\tilde{x}, \tilde{y}) denotes (x, y) under the pose transformation T(p).
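As a sketch of how such a model might be assembled in practice (our own illustration with numpy; the stacking and SVD layout follow the construction above, but all names are ours):

```python
import numpy as np

def build_multishape_model(sdfs):
    """PCA over stacked signed distance functions.

    sdfs: array (n, m, N1, N2) -- n training examples, m shape classes.
    Returns the mean level sets, the eigenshapes, and the eigenvalues
    used as weight variances in Eq. (1).
    """
    n, m, N1, N2 = sdfs.shape
    mean = sdfs.mean(axis=0)                 # \bar\Phi^k for each class
    S = (sdfs - mean).reshape(n, -1).T       # mN x n shape-variability matrix
    U, sing, _ = np.linalg.svd(S, full_matrices=False)
    eigenshapes = U.T.reshape(n, m, N1, N2)  # \Phi_i^k, one mode per row
    variances = sing ** 2 / n                # sigma_i^2 of (1/n) S S^T
    return mean, eigenshapes, variances

def synthesize(mean, eigenshapes, w):
    """Eq. (1): Phi^k[w] = mean^k + sum_i w_i Phi_i^k, for all k at once."""
    q = len(w)
    return mean + np.tensordot(w, eigenshapes[:q], axes=1)
```

Varying the entries of w within a few standard deviations sqrt(variances[i]) reproduces the kind of coupled shape variation shown in Figure 4.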
4 Mutual Information-Based Segmentation Model
Mutual information-based models view the segmentation problem as a region labeling process whose objective is to maximize the mutual information between the image pixel intensities and the segmentation labels. A generalization of the mutual information-based energy functional E_{MI}, proposed by Kim et al. [3] for image segmentation, to handle m + 1 regions is given by

E_{MI} = -\hat{I}(I; L) = -\hat{h}(I) + \sum_{i=1}^{m} P_{R_i} \hat{h}(I \,|\, L = R_i) + P_{R^c} \hat{h}(I \,|\, L = R^c)
where L is the segmentation label determined by the segmenting curve C, \hat{I} is the estimate of the mutual information I between the test image I and the segmentation label L, P_{R_i} denotes the prior probability of pixel values in the ith region, \hat{h}(\cdot) is the estimate of the differential entropy h(\cdot), and \hat{h}(\cdot|\cdot) is the estimate of the conditional differential entropy h(\cdot|\cdot). The estimate of the differential entropy \hat{h}(I) is dropped from E_{MI} because it is independent of the segmentation label L, and hence of the segmenting curve C. Each of the m + 1 conditional differential entropy estimates (i.e. \hat{h}(I|L = R_i) for i = 1, ..., m and \hat{h}(I|L = R^c)) quantifies the randomness of I conditioned on the label L. Let p_{R_i}(I) and p_{R^c}(I) denote the probability density functions (pdfs) of I in regions R_i and R^c, respectively. By using the weak law of large numbers to approximate entropy, the estimates of the conditional differential entropy terms are given by [3]:
\hat{h}(I \,|\, L = R_i) = -\frac{1}{A_{R_i}} \int_\Omega \log(\hat{p}_{R_i}(I)) \, H(-\Phi^i) \, dA
\hat{h}(I \,|\, L = R^c) = -\frac{1}{A_{R^c}} \int_\Omega \log(\hat{p}_{R^c}(I)) \prod_{j=1}^{m} H(\Phi^j) \, dA
where \hat{p}_{R_i}(I) and \hat{p}_{R^c}(I) are estimates of p_{R_i}(I) and p_{R^c}(I), respectively, and H(\cdot) denotes the Heaviside function. We apply the nonparametric Parzen window method [6] to estimate these probability densities from the training data. The gradients of E_{MI} with respect to w and p are given by:

\nabla_w E_{MI} = \sum_{i=1}^{m} P_{R_i} \nabla_w \hat{h}(I \,|\, L = R_i) + P_{R^c} \nabla_w \hat{h}(I \,|\, L = R^c)
\nabla_p E_{MI} = \sum_{i=1}^{m} P_{R_i} \nabla_p \hat{h}(I \,|\, L = R_i) + P_{R^c} \nabla_p \hat{h}(I \,|\, L = R^c)
where the lth components of the gradients \nabla_w \hat{h} and \nabla_p \hat{h} are given by:

\nabla_{w_l} \hat{h}(I \,|\, L = R_i) = \frac{1}{A_{R_i}} \oint_{C^i} \nabla_{w_l} \Phi^i \, \log(\hat{p}_{R_i}(I)) \, ds
\nabla_{p_l} \hat{h}(I \,|\, L = R_i) = \frac{1}{A_{R_i}} \oint_{C^i} \nabla_{p_l} \Phi^i \, \log(\hat{p}_{R_i}(I)) \, ds
\nabla_{w_l} \hat{h}(I \,|\, L = R^c) = \frac{-1}{A_{R^c}} \sum_{j=1}^{m} \oint_{C^j} \nabla_{w_l} \Phi^j \, \log(\hat{p}_{R^c}(I)) \, ds
\nabla_{p_l} \hat{h}(I \,|\, L = R^c) = \frac{-1}{A_{R^c}} \sum_{j=1}^{m} \oint_{C^j} \nabla_{p_l} \Phi^j \, \log(\hat{p}_{R^c}(I)) \, ds

with \nabla_{w_l} \Phi^i, \nabla_{p_l} \Phi^i, \nabla_{w_l} \Phi^j, and \nabla_{p_l} \Phi^j denoting the lth components of the gradients of Φ, for the ith or jth shape class, taken with respect to w and p. With these gradients, a coordinate descent approach can be employed to update w and p in an alternating fashion. To avoid overlap of the different shape classes, we perform a check at each gradient step used to update w: if updating w would cause an overlap, that particular update of w is not performed and the algorithm skips forward to the next coordinate descent step, which involves updating p. We employ texture images to illustrate this segmentation approach. Figure 5 shows a particular realization of the 4 different textures that we employ to represent the 3 subcortical brain regions and the background: vertical wood grain (background), fine fabric (ventricle), diagonal knit (lenticular nucleus), and rocky terrain (caudate nucleus). Using the subcortical brain images shown in Figure 3 as templates and 30 different realizations of each of the 4 textures, we construct 30 different subcortical brain texture images. These newly generated subcortical brain texture images serve as our training database. Using the nonparametric Parzen windowing method [6], we obtain pdf estimates of the pixel intensities in each of the subcortical brain regions and the background. These density estimates are shown in Figure 6. Figure 7 demonstrates the performance of the model described in equation (3). In Figure 7(a), a
Fig. 5. The 4 textures: vertical wood grain, fine fabric, diagonal knit, and rocky terrain.

Fig. 6. Parzen density estimates of the pixel intensities in the various regions (panels: ventricle (red), lenticular nucleus (green), caudate nucleus (blue), and background; axes: pixel intensity vs. probability).
Fig. 7. Performance of the mutual information-based model. (a) Subcortical brain template image; (b) subcortical brain texture image based on the template in (a); (c) test image I obtained after adding Gaussian noise to (b); (d) starting location of the segmenting curve; (e) final location of the segmenting curve.
synthetic image of subcortical brain structures is displayed. This image is similar to the subcortical brain images in the training database of Figure 2, but is not part of that database. Using this image as the template, we generate the subcortical brain texture image shown in Figure 7(b); Figure 7(c) shows this texture image contaminated by additive Gaussian noise. We employ the pose-aligned database shown in Figure 3 to derive an implicit parametric multi-shape model of the subcortical brain structures in the form of equation (2); in this example, we choose q = 15. The zero level sets of Φ^i[w, p] for i = 1, 2, 3 are employed as the starting curves, whose initial locations are shown in Figure 7(d). Figure 7(e) shows the final position of the curves.
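For reference, a minimal Parzen-window estimator in the spirit of [6] might look as follows. This is our own sketch; the Gaussian kernel and its width are assumptions, since the paper does not state its kernel parameters:

```python
import numpy as np

def parzen_pdf(samples, grid, sigma=0.05):
    """Nonparametric Parzen-window density estimate with Gaussian kernels.

    samples: 1D array of training pixel intensities from one region
    grid: intensities at which the pdf estimate is evaluated
    sigma: kernel width (an assumed value, to be tuned per application)
    """
    d = grid[:, None] - samples[None, :]
    k = np.exp(-0.5 * (d / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return k.mean(axis=1)      # average kernel response per grid point
```

Densities estimated this way for each region feed the log(p̂) terms in the entropy gradients above.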
Fig. 8. The 3D models of all 8 patients' pelvic structures after alignment.

Fig. 9. Parzen density estimates of the pixel intensities in the various regions (panels: prostate (red), rectum (green), muscle (yellow), and background; axes: pixel intensity vs. probability).
5 Application to Medical Imagery
Our strategy for segmenting the prostate gland from a pelvic MR volumetric data set for prostate brachytherapy is to use easily identifiable structures within the data set to help localize the prostate gland. The most prominent structure within the pelvic MR data set is the dark-colored rectum. The prostate gland is flanked on either side by the internal obturator muscles, which are also easy to find. The prostate gland, the rectum, and the internal obturator muscles form the 3 shape classes in our parametric multi-shape segmentation algorithm. We employ 8 hand-segmented 3D models of the prostate, the rectum, and the internal obturator muscles as our training set. The 3D version of the alignment
Fig. 10. Segmentation of a new patient’s pelvic MR image using the mutual information-based model. The segmentations of the prostate (red), the rectum (green), and the internal obturator muscles (yellow) are shown.
procedure is used to (1) jointly align the 8 example shapes from each shape class, and (2) perform this task on all 3 shape classes simultaneously. Figure 8 displays all 8 hand-segmented 3D models of the 3 pelvic structures after alignment. The shape classes are color coded as follows: the rectum (green), the prostate gland (red), and the internal obturator muscles (yellow). Next, we employ the 3D version of our shape modeling approach to obtain a 3D multi-shape parametric model of the 3 pelvic structures, and use the mutual information-based segmentation model described in Section 4. In order to implement this model, however, the probability density functions of the pixel intensities within the different regions need to be estimated. We apply the nonparametric Parzen window method to the 8 pelvic MR volumetric data sets to estimate these probability densities within the 3 pelvic structures and the
Fig. 11. The 3D representation of the above computer segmentation is shown on the left. The 3D representation of the hand segmentation is shown on the right.
background region (the 8 pelvic MR volumetric data sets used here are the same 8 data sets from which we derived the 3D models shown in Figure 8). The Parzen density estimates of the pixel intensities in the 3 pelvic structures and the background region are shown in Figure 9. Figure 10 shows twenty consecutive axial slices of the pelvic MR volumetric data set of a new patient, together with the segmentation results of our algorithm. The boundaries of the rectum, the prostate gland, and the internal obturator muscles, as determined by our algorithm, are shown as green, red, and yellow contours, respectively. For comparison, Figure 11 shows two 3D representations of this new patient's rectum, prostate gland, and internal obturator muscles: one based on the segmentation results of our algorithm, and the other based on a hand segmentation. This example demonstrates that, even with a very limited data set, our algorithm performs very well in segmenting the anatomical structures of interest; with a larger training set, it should perform even better.
6 Conclusion
We presented a unified analytical formulation that extends the work of [9] to multiple shapes. In particular, we described a new multi-shape modeling approach that (1) can capture important co-variations shared among the different shape classes, (2) allows triple points to be captured automatically, (3) does not require point correspondences during the training phase of the algorithm, (4) can handle topological changes of the shapes in a seamless fashion, and (5) can be extended from a 2D to a 3D framework in a straightforward manner. We then showed the utility of this parametric multi-shape model by incorporating it
within a mutual information-based framework for medical image segmentation. Of note, performance validations of this parametric multi-shape model, based on simulation studies, are shown in [8].
References
1. T. Chan and L. Vese, "An efficient variational multiphase motion for the Mumford-Shah segmentation model," Asilomar Conf. Sig., Sys., and Comp., 1:490–494, 2000.
2. T. Cootes, C. Taylor, D. Cooper, and J. Graham, "Active shape models – their training and application," Comp. Vision and Image Understanding, 61:38–59, 1995.
3. J. Kim, J. Fisher, A. Yezzi, M. Cetin, and A. Willsky, "Nonparametric methods for image segmentation using information theory," IEEE Int'l. Conf. Image Processing, 3:797–800, 2002.
4. M. Leventon, E. Grimson, and O. Faugeras, "Statistical shape influence in geodesic active contours," IEEE Conf. on Comp. Vision and Patt. Recog., 1:316–323, 2000.
5. N. Paragios and M. Rousson, "Shape priors for level set representation," European Conf. on Comp. Vision, 2002.
6. E. Parzen, "On estimation of a probability density function and mode," Annals of Mathematical Statistics, 33:1065–1076, 1962.
7. L. Staib and J. Duncan, "Boundary finding with parametrically deformable contour models," IEEE Trans. Patt. Analysis and Mach. Intell., 14:1061–1075, 1992.
8. A. Tsai, "Coupled multi-shape model for medical image segmentation: A general framework utilizing region statistics, edge information, and information-theoretic criteria," M.D. dissertation, Harvard Medical School, 2003.
9. A. Tsai, A. Yezzi, W. Wells, C. Tempany, D. Tucker, A. Fan, E. Grimson, and A. Willsky, "Model-based curve evolution technique for image segmentation," IEEE Conf. Comp. Vision and Patt. Recog., 1:463–468, 2001.
10. Y. Wang and L. Staib, "Boundary finding with correspondence using statistical shape models," IEEE Conf. Comp. Vision and Patt. Recog., 338–345, 1998.
11. A. Yezzi, A. Tsai, and A. Willsky, "A statistical approach to snakes for bimodal and trimodal imagery," Int'l Conf. on Comp. Vision, 2:898–903, 1999.
12. X. Zeng, L. Staib, R. Schultz, and J. Duncan, "Segmentation and measurement of the cortex from 3-D MR images using coupled-surfaces propagation," IEEE Trans. Medical Imaging, 18:927–937, 1999.
Neighbor-Constrained Segmentation with 3D Deformable Models

Jing Yang¹, Lawrence H. Staib¹,², and James S. Duncan¹,²

Departments of Electrical Engineering¹ and Diagnostic Radiology², Yale University, P.O. Box 208042, New Haven CT 06520-8042, USA
{j.yang,lawrence.staib,james.duncan}@yale.edu
Abstract. A novel method for the segmentation of multiple objects from 3D medical images using inter-object constraints is presented. Our method is motivated by the observation that neighboring structures have consistent locations and shapes that provide configurations and context that aid in segmentation. We define a Maximum A Posteriori (MAP) estimation framework using the constraining information provided by neighboring objects to segment several objects simultaneously. We introduce a representation for the joint density function of the neighboring objects, and define joint probability distributions over the variations of the neighboring positions and shapes of a set of training images. To estimate the MAP shapes of the objects, we formulate the model in terms of level set functions and compute the associated Euler-Lagrange equations. The contours evolve according to both the neighbor prior information and the image gray level information. We feel that this method is useful in situations where there is limited inter-object information, as opposed to robust global atlases. Results and validation from experiments on synthetic data and medical imagery in 2D and 3D are demonstrated.
1 Introduction
Segmentation and quantitative analysis of structures in an image have tremendous applications in medical imaging. In order to fully realize the value of medical imaging in both clinical and research settings, information about anatomical structures must be extracted and quantified with accuracy, efficiency, and reproducibility. Snakes or Active Contour Models (ACMs) (Kass et al. (1987)) [1] have been widely used for segmenting non-rigid objects in a wide range of applications. ACMs are energy-minimizing parametric contours with smoothness constraints, deformed according to the image data. Unlike level set implementations [2], the direct implementation of this energy model is not capable of handling topological changes of the evolving contour without explicit discrete pixel manipulations. Usually, ACMs can detect only objects with edges defined by the gradient. Recently, methods using level set methods and new energy terms have been reported to increase the capture range of deformable models. For example, Chan and Vese [3] have proposed an active contour model that can detect objects whose boundaries are not necessarily defined by gray level gradients.
In addition to the smoothness model, the incorporation of more specific prior information into deformable models has received a large amount of attention. Cootes et al. [4] find corresponding points across a set of training images and construct a statistical model of shape variation from the point positions. Staib and Duncan [5] incorporate global shape information into the segmentation process by using an elliptic Fourier decomposition of the boundary and placing a Gaussian prior on the Fourier coefficients. Zeng et al. [6] develop a coupled surfaces algorithm to segment the cortex by using a thickness prior constraint. Leventon et al. [7] extend Caselles' [8] geodesic active contours by incorporating shape information into the evolution process. In many cases, objects to be detected have one or more neighboring structures whose location and shape provide information about the local geometry that can aid in the delineation. The relative positions and shapes among these neighbors can be modeled based on statistical information from a training set. Though applicable in many domains, such models are particularly useful for medical applications: in anatomy, neighboring structures provide a strong constraint on the relative position and shape of a structure. Without a prior model to constrain the segmentation, algorithms often fail due to poor image contrast, noise, and missing or diffuse boundaries. Segmentation can be made easier if suitable neighbor prior models are available. Our model is based on a MAP framework using the neighbor prior constraint. We introduce a representation for the joint density function of the neighboring objects and define the corresponding probability distributions. Formulating the segmentation as a MAP estimation of the shapes of the objects, and modeling in terms of level set functions, we compute the associated Euler-Lagrange equations. The contours evolve according to both the neighbor prior information and the image gray level information, and the neighboring objects can be detected automatically and simultaneously.
2 Description of the Model

2.1 MAP Framework with Neighbor Prior
A probabilistic formulation is a powerful approach to deformable models. Deformable models can be fit to the image data by finding the model shape parameters that maximize the posterior probability. Consider an image I that contains M shapes of interest; a MAP framework can be used to realize image segmentation combining neighbor prior information and image information:

\hat{S}_i = \arg\max_{S_i} p(S_1, S_2, ..., S_i, ..., S_M \,|\, I) \propto \arg\max_{S_i} p(I \,|\, S_1, S_2, ..., S_M) \, p(S_1, S_2, ..., S_M), \quad i = 1, 2, ..., M    (1)

where S_1, S_2, ..., S_M are the evolving surfaces of the shapes of interest, and p(I | S_1, S_2, ..., S_M) is the probability of producing the image I given S_1, S_2, ..., S_M. In 3D, assuming gray level homogeneity within each object, we use the following imaging model [3]:
p(I \,|\, S_1, S_2, ..., S_M) = \prod_{i=1}^{M} \Big\{ \prod_{(p,q,r)\, inside(S_i)} \exp[-(I(p,q,r) - c_{1i})^2 / (2\sigma_{1i}^2)] \cdot \prod_{(p,q,r)\, outside(S_i),\, inside(\Omega_i)} \exp[-(I(p,q,r) - c_{2i})^2 / (2\sigma_{2i}^2)] \Big\}    (2)

where c_{1i} and σ_{1i} are the mean and variance of I inside S_i, and c_{2i} and σ_{2i} are the mean and variance of I outside S_i but inside a certain domain Ω_i that contains S_i. p(S_1, S_2, ..., S_M) is the joint density function of all M objects; it contains the neighbor prior information such as the relative positions and shapes among the objects. By the chain rule, we have:
p(S_1, S_2, ..., S_M) = p(S_M \,|\, S_{M-1}, S_{M-2}, ..., S_1) \, p(S_{M-1} \,|\, S_{M-2}, S_{M-3}, ..., S_1) \cdots p(S_3 \,|\, S_2, S_1) \, p(S_2 \,|\, S_1) \, p(S_1)    (3)

2.2 Neighbor Priors
Let us define a binary matrix R of size M × M, where each element r_{ij} describes the independence of S_i and S_j: r_{ij} is zero when S_i and S_j are independent and one otherwise. Obviously, the more ones there are in R, the more neighbor prior information is incorporated in the MAP segmentation model. When

R = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix}    (4)

all of the M objects are related to each other. The shape priors, as well as the neighbor priors of all the neighbors, are included. In this case, equation (3) cannot be simplified, and finding the joint density function of all M objects is complicated: we incorporate the most neighbor prior information (the full order prior), but at a corresponding loss of computational efficiency. If all M objects are independent of each other, i.e.,

R = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}    (5)

then equation (3) simplifies to:

p(S_1, S_2, ..., S_M) = p(S_M) \, p(S_{M-1}) \cdots p(S_2) \, p(S_1)    (6)
No neighboring information is included here, since all the objects are independent of each other. The only prior information contained in the MAP model for each object is the object's own shape prior p(S_i), which we designate the first order prior. In this case, equation (3) is maximally simplified; we achieve good computational efficiency but use no neighbor prior information. Each object in
the image can be segmented independently according to its shape prior and the image gray level information. This formulation corresponds to previous work [7][9]. In order to achieve a better segmentation result by using neighboring information, while retaining good computational efficiency, we can consider second order prior information, i.e. the neighboring information from only one neighbor, together with the first order (self shape) prior. Consider the simple case where each object is related to object 1, independently; the corresponding R is:

R = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & \cdots & 1 \end{pmatrix}    (7)
(8)
where ∆i,1 = Si − S1 is the difference between shape i and shape 1 (to be defined in the next section). The process of defining the joint density function p(S1 , S2 , ..., SM ) is simplified to building only the shape prior, p(S1 ), and the local neighbor prior p(∆i,1 ), i = 2, 3, ..., M . In our MAP model, we consider this case for the rest of the paper. 2.3
Neighbor Prior Model
To build a model for the neighbor prior and shape prior, we choose level sets as the representation of the shapes, and then define the joint probability density function in equation (8). Consider a training set of n aligned images, with M objects or structures in each image. Each shape in the training set is embedded as the zero level set of a higher dimensional level set Ψ . For object 1, the training set consists of a set of level set functions {Ψ1,1 , Ψ2,1 , ..., Ψn,1 }. We can use the difference between the two level sets, Ψi −Ψ1 , as the representation of the neighbor difference ∆i,1 , i = 2, 3, ..., M . Thus, the corresponding training set is {Ψ1,i − Ψ1,1 , Ψ2,i − Ψ2,1 , ..., Ψn,i − Ψn,1 }, i = 2, 3, ..., M . Our goal is to build the shape model and neighbor difference model over these distributions of the level set functions and level sets differences. The mean and variance of shape 1 can be computed n using Principal Component Analysis(PCA)[4]. The mean shape, Ψ¯1 = n1 l=1 Ψl,1 , is subtracted from each Ψl,1 to create the deviation from the mean. Each such deviation is placed as a column vector in a N d × n dimensional matrix Q where d is the number of spatial dimensions and N d is the number of samples of each level set function. Using Singular Value Decomposition(SVD), Q = U ΣV T . U is a matrix whose column vectors represent the set of orthogonal modes of shape variation and Σ is a diagonal matrix of corresponding singular values. An estimate of the shape
202
J. Yang, L.H. Staib, and J.S. Duncan
Ψ1 can be represented by k principal components and a k dimensional vector of coefficients(where k < n), α1 [7]: Ψ˜1 = Uk α1 + Ψ¯1
(9)
Under the assumption of a Gaussian distribution of shape represented by α1 , we can compute the probability of a certain shape: 1 1 p(α1 ) = exp[− α1T Σk−1 α1 ] k 2 (2π) |Σk |
(10)
Similarly, an estimate of the neighbor difference ∆i,1 can be represented from the mean neighbor difference ∆¯i,1 and k principal components Pik and a k dimensional vector of coefficients, βi,1 : ∆˜i,1 = Pik βi,1 + ∆¯i,1
(11)
The neighbor difference Δ_{i,1} can also be assumed to be Gaussian distributed over β_{i,1}:

p(\beta_{i,1}) = \frac{1}{\sqrt{(2\pi)^k |\Lambda_{i,1k}|}} \exp\left[ -\frac{1}{2} \beta_{i,1}^T \Lambda_{i,1k}^{-1} \beta_{i,1} \right]    (12)
Fig. 1. Training set: outlines of 4 shapes in 12 3D MR brain images.
Figure 1 shows a training set of four subcortical structures: the left and right amygdalas and hippocampi. We assume the left amygdala is related to each of the other three structures, independently. Using PCA, we build the shape model of the left amygdala and the neighbor difference models of the other three structures. Figure 2 shows the three primary modes of variance of the left amygdala; Figure 3 shows the three primary modes of variance of the neighbor difference between the left hippocampus and the left amygdala.
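As a small illustration of how the priors in Equations (10) and (12) can be evaluated once the PCA models are built (a sketch of our own; the names and the log-domain formulation are our choices):

```python
import numpy as np

def gaussian_prior_logpdf(coeffs, eigenvalues):
    """log p(alpha) for the zero-mean Gaussians of Eqs. (10)/(12).

    coeffs: k-vector of PCA coefficients (alpha_1 or beta_{i,1})
    eigenvalues: the k retained eigenvalues, i.e. the diagonal of
    Sigma_k (shape prior) or Lambda_{i,1k} (neighbor-difference prior)
    """
    k = len(coeffs)
    quad = np.sum(coeffs ** 2 / eigenvalues)      # alpha^T Sigma^-1 alpha
    log_norm = 0.5 * (k * np.log(2.0 * np.pi) + np.sum(np.log(eigenvalues)))
    return -0.5 * quad - log_norm
```

Working in the log domain avoids underflow and matches the -ln p terms used in the energy functional below.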
Fig. 2. The three primary modes of variance of the left amygdala.
Fig. 3. The three primary modes of variance of the left hippocampus relative to the left amygdala.
In our active contour model, we also add regularizing terms [10]: a general boundary smoothness prior, p_B(S_1, S_2, ..., S_M) = \prod_{i=1}^{M} e^{-\mu_i \oint_{S_i} ds}, and a prior on the size of each region, p_A(S_1, S_2, ..., S_M) = \prod_{i=1}^{M} e^{-\nu_i A_i^c}, where A_i is the size of the region of shape i, c is a constant, and μ_i and ν_i are scalar factors. Here we assume the boundary smoothness and the region sizes of all the objects are independent. Thus, the prior joint probability p(S_1, S_2, ..., S_M) in equation (8) can be approximated by the product:

p(S_1, S_2, ..., S_M) = \Big[ \prod_{i=2}^{M} p(\beta_{i,1}) \Big] \cdot p(\alpha_1) \cdot p_B(S_1, S_2, ..., S_M) \cdot p_A(S_1, S_2, ..., S_M)    (13)
i=2
Since: Sˆi = arg maxSi p(S1 , S2 , ..., Si , ..., SM /I) = arg minSi [− ln p(S1 , S2 , ..., Si , ..., SM /I)], i = 1, 2, ..., M
(14)
combining equation (1), (2), and (13), we introduce the energy functional E defined by E = − ln p(S1 , S2 , ..., Si , ..., SM /I) M ∝ i=1 {λ1i · (p,q,r)inside(Si ) |I(p, q, r) − c1i |2 dpdqdr +λ2i · (p,q,r)outside(Si ),inside(Ωi ) |I(p, q, r) − c2i |2 dpdqdr}
M M M 1 T −1 T Λ−1 + i=1 µi Si ds + i=1 νi Aci + i=2 12 βi,1 i,1k βi,1 + 2 α1 Σk α1
(15)
The MAP estimation of the shapes in equation (1), Sˆi (i = 1, 2, ..., M ), is also the minimizer of the above energy functional E. This minimization problem can be formulated and solved using the level set method and we can realize the segmentation of multiple objects simultaneously.
204
2.4
J. Yang, L.H. Staib, and J.S. Duncan
Level Set Formulation of the Model
In the level set method, Si is the zero level set of a higher dimensional level set ψi corresponding to the i th object being segmented, i.e., Si = {(x, y, z)|ψi (x, y, z) = 0}. The evolution of surface Si is given by the zero-level surface at time t of the function ψi (t, x, y, z). We define ψi to be positive outside Si and negative inside Si . Each of the M objects being segmented in the image has its own Si and ψi . For the level set formulation of our model, we replace Si with ψi in the energy functional in equation (15) using regularized versions of the Heaviside function H and the Dirac function δ, denoted by Hε and δε [3](described below):
δε (ψi (x, y, z))|∇ψi (x, y, z)|dxdydz E (c1i , c2i , ψi |i = 1, 2, ...M ) = µi Ω
+ νi (1 − Hε (ψi (x, y, z)))dxdydz
Ω + λ1i |I(x, y, z) − c1i |2 (1 − Hε (ψi (x, y, z)))dxdydz
Ω |I(x, y, z) − c2i |2 Hε (ψi (x, y, z))dxdydz + λ2i Ωi
+
M i=2
+
1 T ¯ [G(ψi − ψ1 ) − ∆¯i,1 ]T Pik Λ−1 i,1k Pik [G(ψi − ψ1 ) − ∆i,1 ] 2
1 [G(ψ1 − ψ¯1 )]T Uk Σk−1 UkT [G(ψ1 − ψ¯1 )] 2
(16)
where Ω denotes the image domain. G(·) is an operator to generate the vector representation(as in equation 9) of a matrix by column scanning. g(·) is the inverse operator of G(·). To compute the associated Euler-Lagrange equation for each unknown level set function ψi , we keep c1i and c2i fixed, and minimize E with respect to ψi (i = 1, 2, ...M ) respectively. Parameterizing the descent direction by artificial time t ≥ 0, the evolution equation in ψi (t, x, y, z) is: ∂ψi ∇ψi = δε (ψi )[µi · div[ ] + νi + λ1i |I − c1i |2 − λ2i |I − c2i |2 ] ∂t |∇ψi | T −H(i − 1.5) · g{Pik Λ−1 Pik [G(ψi − ψ1 ) − ∆¯i,1 )} i,1k
−[1 − H(i − 1.5)] · g{Uk Σk−1 UkT [G(ψ1 − ψ¯1 )]} 2.5
(17)
Evolving the Surface
We approximate Hε and δε as follows [11]: Hε (z) = 12 [1 + ε π(ε2 +z 2 ) .
c1i and c2i are defined by: c1i (ψi ) = I(x,y,z)·H(ψi (x,y,z))dxdydz . c2i (ψi ) = Ωi Ωi
H(ψi (x,y,z))dxdydz
Ω
2 z π arctan( ε )], δε (z) = I(x,y,z)·(1−H(ψi (x,y,z)))dxdydz
Ω
(1−H(ψi (x,y,z)))dxdydz
,
Given the surfaces ψ_i (i = 1, 2, ..., M) at time t, we seek to compute the evolution steps that bring all the zero level set curves to the correct final segmentation based on the neighbor prior information and the image information. We first build p(α_1) and p(β_{i,1}), i = 2, 3, ..., M, from the training set using PCA. At each stage of the algorithm, we recompute the constants c_{1i}(ψ_i^t) and c_{2i}(ψ_i^t) and update ψ_i^{t+1}; this is repeated until convergence. The parameters μ_i, ν_i, λ_{1i}, and λ_{2i} balance the influence of the neighbor prior model against the image information model. The tradeoff depends on how much faith one has in the neighbor prior model and in the imagery for a given application; we set these parameters empirically for particular segmentation tasks, given the general image quality and the neighbor prior information.
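To make the update loop concrete, the following 2D sketch implements one gradient step of the image-driven terms of Equation (17) for a single level set. This is our own illustration with numpy; the neighbor-prior terms, the coupling to other level sets, and all parameter values are omitted or assumed:

```python
import numpy as np

def evolve_step(psi, I, mu, nu, lam1, lam2, eps=1.0, dt=0.1):
    """One explicit update of the image terms of Eq. (17), in 2D.

    psi: level set array (negative inside the shape, as in the text)
    I: image array of the same shape
    """
    inside = psi < 0
    c1 = I[inside].mean() if inside.any() else 0.0      # mean inside S_i
    c2 = I[~inside].mean() if (~inside).any() else 0.0  # mean outside S_i
    gy, gx = np.gradient(psi)
    norm = np.sqrt(gx ** 2 + gy ** 2) + 1e-8
    div_y, _ = np.gradient(gy / norm)                   # curvature =
    _, div_x = np.gradient(gx / norm)                   # div(grad psi / |grad psi|)
    curvature = div_x + div_y
    delta = eps / (np.pi * (eps ** 2 + psi ** 2))       # regularized Dirac
    force = mu * curvature + nu + lam1 * (I - c1) ** 2 - lam2 * (I - c2) ** 2
    return psi + dt * delta * force
```

Calling this repeatedly until psi stops changing corresponds to the per-stage recomputation of c_{1i}, c_{2i} and the update of ψ_i described above.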
3 Experimental Results
We have applied our model to various synthetic and real images, with at least two different types of contours and shapes. In Figure 4 (top), we show the segmentation of the left and right ventricles using only image information, with which the curves cannot lock onto the shapes of the objects. In Figure 4 (bottom), we show the results obtained using our model: the curves converge on the desired boundaries even though some parts of the boundaries are too blurred to be detected using only gray level information. Both segmentations converged in several minutes on an SGI Octane with a 255 MHz R10000 processor.
Fig. 4. Three steps in the segmentation of 2 shapes in a 2D cardiac MR image without (top) and with (bottom) the neighbor prior. The right ventricle is the reference shape S_1. The training set consists of 16 images.
In Figure 5, we show that our model can detect multiple objects of different intensities and with blurred boundaries. Figure 5 (top) shows the results of using only gray level information; only the lower (posterior) portions of the lateral
ventricles can be segmented perfectly, since they have clearer boundaries. Figure 5 (bottom) shows the results obtained using our neighbor prior model. Segmenting all eight subcortical structures took approximately twenty minutes.
Fig. 5. Detection of 8 subcortical structures (the lateral ventricles, heads of the caudate nucleus, and putamen) in an MR brain image. Top: results with no prior information. Bottom: results with the neighbor prior. The left lateral ventricle is the reference shape S_1. The training set consists of 12 images.
Figure 6 shows the segmentation of the right amygdala and hippocampus in a 2D MR image. In Figure 6 (top), we show results using only gray level information; the segmentations are poor since both structures have very poorly defined boundaries. The middle row of Figure 6 shows the results of using the shape prior but no neighbor prior. The results are much better, but the boundaries of the amygdala and the hippocampus overlap where the two structures are connected, because the two structures are treated independently, without the constraint of the neighbor. In Figure 6 (bottom), we show results using our neighbor prior model: the two structures are clearly segmented, with no overlap of the boundaries. We also tested our method on 3D images. We generated a training set of 9 synthetic images of two uniform ellipsoids with added Gaussian noise; Figure 7 illustrates several steps in the segmentation of the two ellipsoids. Figure 8 shows initial, middle, and final steps in the segmentation of the left and right amygdalas and hippocampi in a 3D MR brain image, using the training set model shown in Figures 1, 2, and 3. Three orthogonal slices and the 3D surfaces are shown for each step. To validate the segmentation results, we compute the undirected distance between the boundary of the computed segmentation A (with N_A points) and the boundary of the manual segmentation B: H(A, B) = max(h(A, B), h(B, A)),
Fig. 6. Four steps in the segmentation of the right amygdala and hippocampus. Top: results with no prior information. Middle: results using individual shape priors. Bottom: results using our neighbor prior model. The right amygdala is the reference shape S_1. The training set consists of 12 brain images.
Fig. 7. Initial, middle, and final steps in the segmentation of 2 shapes in a synthetic image. Three orthogonal slices and the 3D surfaces are shown for each step.
h(A, B) = \frac{1}{N_A} \sum_{a \in A} \min_{b \in B} \| a - b \|. Table 1 shows the computed results for the synthetic image, the heart image, and the brain images; virtually all the boundary points lie within one or two voxels of the manual segmentation. We also test the robustness of our algorithm to noise and to the location of the initial seeds. First, we add Gaussian noise with different variances to the synthetic image (as in Figure 7) and run our algorithm to segment the two ellipsoids, with the initial seeds at the centers of the objects. Figure 9 shows the segmentation error in three cases: with no prior, with the shape prior, and with the neighbor prior. When the variance of the noise is small, the errors are small in all three cases. As the variance of the noise goes up, the error with no prior increases rapidly, since the objects are too noisy to be recognized using
208
J. Yang, L.H. Staib, and J.S. Duncan
Fig. 8. Initial, middle, and final steps in the segmentation of 4 shapes in a brain image. Three orthogonal slices and the 3D surfaces are shown for each step.

Table 1. Distance between the computed boundary and the manual boundary

                         Figure 4   Figure 5   Figure 6   Figure 7   Figure 8
Without neighbor prior      4.2        9.6        6.7       11.2        7.8
With neighbor prior         1.8        1.9        0.8        1.2        1.7
only gray level information. However, for the methods with the shape prior and with the neighbor prior, the segmentation errors are much lower and remain within a very small range even when the variance of the noise is very large. We also notice that our neighbor prior model achieves the lowest error of the three. Next, we fix the standard deviation of the noise to 40, but vary the location of the initial seed inside the right ellipsoid and run the segmentation for the same three cases again. The segmentation error for different seed locations with each method is shown in Figure 10. As the initial seed moves away from the center of the ellipsoid, the errors remain small for all cases because the models are based on level sets. Still, the method with the neighbor prior achieves the smallest error.
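To make the validation metric concrete, the following is a minimal sketch of the undirected boundary distance $H(A,B)$ defined above. It assumes the boundary voxel coordinates of the two segmentations have already been extracted into arrays, and is written in plain NumPy rather than taken from the authors' implementation.

```python
import numpy as np

def directed_distance(A, B):
    """h(A, B): mean over points a in A of the distance to the closest b in B.

    A, B: (N, 3) arrays of boundary-point coordinates (in voxels).
    """
    # Pairwise Euclidean distances; fine for moderate boundary sizes.
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return d.min(axis=1).mean()

def undirected_distance(A, B):
    """H(A, B) = max(h(A, B), h(B, A))."""
    return max(directed_distance(A, B), directed_distance(B, A))

# Toy example: two slightly shifted point sets.
A = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0]], dtype=float)
B = A + np.array([0.5, 0.0, 0.0])
print(undirected_distance(A, B))  # 0.5
```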
Fig. 9. Segmentation errors with different variances of the noise (error vs. standard deviation of the noise, for the cases without prior, with shape prior, and with neighbor prior).

Fig. 10. Segmentation errors with different locations of the initial seed (error vs. distance between the initial seed and the center of the object, for the same three cases).

4 Conclusions
We have presented a new model for the automated segmentation of images containing multiple objects, incorporating neighbor prior information into the segmentation process. Our goal was to capture the constraining information that neighboring objects provide and use it for segmentation. We define a MAP estimation framework using the prior information provided by neighboring objects to segment several objects simultaneously. We introduce a representation for the joint density function of the neighboring objects, and define joint probability distributions over the variations of the neighboring positions and shapes in a
set of training images. We estimate the MAP shapes of the objects using evolving level sets based on the associated Euler-Lagrange equations. The contours evolve according to both the neighbor prior information and the image gray level information. Multiple objects in an image can thus be detected automatically and simultaneously.
Expectation Maximization Strategies for Multi-atlas Multi-label Segmentation

Torsten Rohlfing1, Daniel B. Russakoff1,2, and Calvin R. Maurer1

1 Image Guidance Laboratories, Department of Neurosurgery, 2 Department of Computer Science, Stanford University, Stanford, CA, USA
{rohlfing,dbrussak}@stanford.edu, [email protected]
Abstract. It is well-known in the pattern recognition community that the accuracy of classifications obtained by combining decisions made by independent classifiers can be substantially higher than the accuracy of the individual classifiers. In order to combine multiple segmentations we introduce two extensions to an expectation maximization (EM) algorithm for ground truth estimation based on multiple experts (Warfield et al., MICCAI 2002). The first method repeatedly applies the Warfield algorithm with a subsequent integration step. The second method is a multi-label extension of the Warfield algorithm. Both extensions integrate multiple segmentations into one that is closer to the unknown ground truth than the individual segmentations. In atlas-based image segmentation, multiple classifiers arise naturally by applying different registration methods to the same atlas, or the same registration method to different atlases, or both. We perform a validation study designed to quantify the success of classifier combination methods in atlas-based segmentation. By applying random deformations, a given ground truth atlas is transformed into multiple segmentations that could result from imperfect registrations of an image to multiple atlas images. We demonstrate that a segmentation produced by combining multiple individual registration-based segmentations is more accurate for the two EM methods we propose than for simple label averaging.
1 Introduction
One way to automatically segment an image is to perform a non-rigid registration of the image to a labeled atlas image; the labels associated with the atlas image are mapped to the image being segmented using the resulting non-rigid transformation [1]. This approach has two important components that determine the quality of the segmentations, namely the registration method and the atlas. Just as human experts typically differ slightly in their labeling decisions, different registration methods produce different segmentations when applied to the same raw image and the same atlas. Likewise, different segmentations typically result from using different atlases. Therefore, each combination of a registration algorithm with an atlas effectively represents a unique classifier for the voxels in the target image.
The atlas can be an image of an individual or an average image of multiple individuals. Our group recently showed [2] that the choice of the atlas image has a substantial influence on the quality of a registration-based segmentation. Moreover, we demonstrated that by using multiple atlases, the segmentation accuracy can be improved over using a single atlas (either an image of an individual or an average of multiple individuals). Specifically, we showed that a segmentation produced by combining multiple individual segmentations is more accurate than the individual segmentations (each individual segmentation was produced by non-rigid registration of an image to a different atlas, i.e., a labeled image of a reference individual; the combination was performed by simple label averaging). This finding is consistent with the observation that a combination of classifiers is generally more accurate than an individual classifier in many pattern recognition applications. Typically, among the individual segmentations there are more accurate ones as well as less accurate ones. This is true for human experts, due to different levels of experience, as well as for automatic classifiers, due, for example, to differences in similarities between the image to be segmented and different atlases. In this paper we present and evaluate methods that automatically estimate the classifiers' segmentation qualities and take these into account when combining the individual segmentations into a final segmentation. For binary segmentations (object vs. background), Warfield et al. [3] recently introduced an expectation maximization (EM) algorithm that derives estimates of segmentation quality parameters (sensitivity and specificity) from segmentations of the same image performed by several experts. Their method also enables the generation of an estimate of the unknown ground truth segmentation. This ground truth estimate can provide a way of defining a combined segmentation that takes into account all experts, weighted by their individual reliability. We introduce two extensions of the Warfield method to non-binary segmentations with arbitrary numbers of labels. We also perform an evaluation study to quantitatively compare different methods of combining multiple segmentations into one. Our study is specifically designed to model situations where the segmentations are generated by non-rigid registration of an image to atlas images.
2 Binary Multi-expert Segmentation
This section briefly reviews the Warfield algorithm [3] and introduces the fundamental notation. Our notation differs slightly from that used by the original authors in order to simplify notation for the multi-label extension proposed below. In binary segmentation, every voxel in a segmented image is assigned either 0 or 1, denoting background and object, respectively. For any voxel $i$, let $T(i) \in \{0, 1\}$ be the unknown ground truth, i.e., the a priori correct labeling. It is assumed that the prior probability $g(T(i) = 1)$ of the ground truth segmentation of voxel $i$ being 1 is uniform (independent of $i$). During the course of the EM algorithm, weights $W(i)$ are estimated, which denote the likelihood that the ground truth for voxel $i$ is 1, i.e., $W(i) = P(T(i) = 1)$. Given segmentations by $K$ experts, we denote by $D_k(i)$ the decision of "expert" $k$ for voxel $i$, i.e., the binary value indicating whether voxel $i$ has been identified as an object voxel by expert $k$. (In the context of the present paper, we use the term "expert" for the combination of a non-rigid registration algorithm with an atlas image; however, the framework we propose is equally appropriate for human experts or any other kind of classifier.) Each expert's segmentation quality is represented by values $p_k$ and $q_k$. While $p_k$ denotes the likelihood that expert $k$ identifies an a priori object voxel as such (sensitivity), $q_k$ is the likelihood that the expert correctly identifies a background voxel (specificity).

2.1 Estimation Step
Given estimates of the sensitivity and specificity parameters for each expert, the weights for all voxels $i$ are calculated as

$$W(i) = \frac{g(T(i)=1)\,\alpha}{g(T(i)=1)\,\alpha + (1 - g(T(i)=1))\,\beta}, \tag{1}$$

where

$$\alpha = \prod_{k:D_k(i)=1} p_k \prod_{k:D_k(i)=0} (1 - p_k) \quad \text{and} \quad \beta = \prod_{k:D_k(i)=0} q_k \prod_{k:D_k(i)=1} (1 - q_k). \tag{2}$$

2.2 Maximization Step
From the previously calculated weights $W$, the new estimates $\hat{p}_k$ and $\hat{q}_k$ for each expert's parameters are calculated as follows:

$$\hat{p}_k = \frac{\sum_{i:D_k(i)=1} W(i)}{\sum_i W(i)} \quad \text{and} \quad \hat{q}_k = \frac{\sum_{i:D_k(i)=0} (1 - W(i))}{\sum_i (1 - W(i))}. \tag{3}$$
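As an illustration of Eqs. (1)-(3), the following is a minimal NumPy sketch of the binary EM iteration. The initialization values and the fixed iteration count are our own choices for the example, not prescriptions from the original algorithm.

```python
import numpy as np

def binary_em(D, n_iter=7, prior=0.5, p0=0.9, q0=0.9):
    """Estimate expert sensitivities p_k, specificities q_k, and ground-truth
    weights W(i) from binary expert decisions, following Eqs. (1)-(3).

    D: (K, N) array of 0/1 decisions, one row per expert.
    """
    K, N = D.shape
    p = np.full(K, p0)
    q = np.full(K, q0)
    for _ in range(n_iter):
        # E-step, Eqs. (1)-(2): per-voxel likelihood that the true label is 1.
        log_alpha = np.where(D == 1, np.log(p)[:, None], np.log(1 - p)[:, None]).sum(axis=0)
        log_beta = np.where(D == 0, np.log(q)[:, None], np.log(1 - q)[:, None]).sum(axis=0)
        a = prior * np.exp(log_alpha)
        b = (1 - prior) * np.exp(log_beta)
        W = a / (a + b)
        # M-step, Eq. (3): re-estimate each expert's parameters.
        for k in range(K):
            p[k] = W[D[k] == 1].sum() / W.sum()
            q[k] = (1 - W[D[k] == 0]).sum() / (1 - W).sum()
    return p, q, W

# Toy example: three noisy experts labeling ten voxels.
rng = np.random.default_rng(0)
truth = rng.integers(0, 2, size=10)
D = np.array([np.where(rng.random(10) < 0.9, truth, 1 - truth) for _ in range(3)])
p, q, W = binary_em(D)
print((W > 0.5).astype(int), truth)
```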
2.3 Application to Multi-label Segmentation
An obvious way to apply Warfield's algorithm (described above) to multi-label segmentation is to apply it repeatedly and separately for each label. In each run, one of the labels is considered as the object in the sense of the algorithm. This strategy, however, may lead to inconsistent results, i.e., some voxels can be assigned multiple labels (in other words, voxels can be classified as object voxels in more than one run of the algorithm). To address this issue, we propose to combine the results of all runs as follows: each application of the algorithm provides sensitivity and specificity estimates for all experts for one label (the label that is considered the object of interest in this run of the algorithm). These values are used to compute the weights $W(i)$ according to Eq. (1) separately for each label. The voxel $i$ is then assigned the label that has the highest weight $W$. One could instead use the weights $W$ calculated during the last EM iteration for each label, but this requires storing all weights. It is more memory efficient and only slightly more computationally expensive to compute the weights once more after all EM iterations have been completed.
3 Multi-label Multi-expert Segmentation
This section describes a multi-label extension to Warfield's EM algorithm that simultaneously estimates the expert parameters for all labels. This extension contains Warfield's algorithm as a special case for one label ($\mathcal{L} = \{0, 1\}$); this is easily proved by induction over the iterations of the algorithm. For a multi-label segmentation let $\mathcal{L} = \{0, \ldots, L\}$ be the set of (numerical) labels in the atlas. Each element in $\mathcal{L}$ represents a different anatomical structure. Every voxel in a segmented image is assigned exactly one of the elements of $\mathcal{L}$ (i.e., we disregard partial volume effects), which defines the anatomical structure that this voxel is part of. For every voxel $i$, let $T(i) \in \mathcal{L}$ be the unknown ground truth, i.e., the a priori correct labeling. We assume that the prior probability $g(T(i) = \ell)$ of the ground truth segmentation of voxel $i$ being $\ell \in \mathcal{L}$ is uniform (independent of $i$). During the course of the algorithm, we estimate weights $W(i, \ell)$ as the current estimate of the probability that the ground truth for voxel $i$ is $\ell$, i.e., $W(i, \ell) = P(T(i) = \ell)$. Given segmentations by $K$ experts, we denote by $D_k(i)$ the decision of "expert" $k$ for voxel $i$, i.e., the anatomical structure that, according to this expert, voxel $i$ is part of. Each expert's segmentation quality, separated by anatomical structures, is represented by an $(L + 1) \times (L + 1)$ matrix of coefficients $\lambda$. For expert $k$, we define

$$\lambda_k(m, \ell) := P(T(i) = \ell \mid D_k(i) = m), \tag{4}$$

i.e., the conditional probability that if the expert classifies voxel $i$ as part of structure $m$, it is in fact part of structure $\ell$. We note that this matrix is very similar to the normalized confusion matrix of a Bayesian classifier [9]. The diagonal entries of our matrix ($\ell = m$) represent the sensitivity of the respective expert when segmenting structures of label $\ell$, i.e.,

$$p_k^{(\ell)} = \lambda_k(\ell, \ell). \tag{5}$$

The off-diagonal elements quantify the crosstalk between the structures, i.e., the likelihoods that the respective expert will misclassify one voxel of a given structure as belonging to a certain different structure. The specificity of expert $k$ for structure $\ell$ is computed as

$$q_k^{(\ell)} = 1 - \sum_{m \neq \ell} \lambda_k(m, \ell). \tag{6}$$
3.1 Estimation Step
In the "E" step of our EM algorithm, the (usually unknown) ground truth segmentation is estimated. Given the current estimate for $\lambda$ and the known expert decisions $D$, the likelihood of the ground truth for voxel $i$ being label $\ell$ is

$$W(i, \ell) = \frac{g(T(i)=\ell) \prod_k \lambda_k(D_k(i), \ell)}{\sum_m \big[ g(T(i)=m) \prod_k \lambda_k(D_k(i), m) \big]}. \tag{7}$$

The likelihoods $W$ for each voxel $i$ are normalized and, over all labels, add up to unity:

$$\sum_\ell W(i, \ell) = \frac{\sum_\ell \big[ g(T(i)=\ell) \prod_k \lambda_k(D_k(i), \ell) \big]}{\sum_m \big[ g(T(i)=m) \prod_k \lambda_k(D_k(i), m) \big]} = 1. \tag{8, 9}$$

3.2 Maximization Step
The "M" step of our algorithm estimates the expert parameters $\lambda$ to maximize the likelihood of the current ground truth estimate determined in the preceding "E" step. Given the previous ground truth estimate $g$, the new estimates for the expert parameters are computed as follows:

$$\hat{\lambda}_k(\ell, m) = \frac{\sum_{i:D_k(i)=\ell} W(i, m)}{\sum_i W(i, m)}. \tag{10}$$

Obviously, since there is some label assigned to each voxel by each expert, the sum over all possible decisions is unity for each expert, i.e.,

$$\sum_\ell \hat{\lambda}_k(\ell, m) = \frac{\sum_\ell \sum_{i:D_k(i)=\ell} W(i, m)}{\sum_i W(i, m)} = \frac{\sum_i W(i, m)}{\sum_i W(i, m)} = 1. \tag{11}$$

The proof that the update rule in Eq. (10) indeed maximizes the likelihood of the current weights $W$ is tedious, but largely analogous to the proof in the binary case (see Ref. [3]).
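The multi-label iteration of Eqs. (7) and (10) can be sketched in the same style. This is an illustrative implementation under the assumption of a uniform label prior, not the authors' code; the 0.9 diagonal initialization follows the parameter choices reported in Sec. 6.2.

```python
import numpy as np

def multilabel_em(D, L, n_iter=7, diag=0.9):
    """Multi-label EM: estimate per-expert confusion coefficients
    lambda_k(m, l) = P(T=l | D_k=m) and weights W(i, l), per Eqs. (7)-(10).

    D: (K, N) integer decisions in {0, ..., L}.
    """
    K, N = D.shape
    prior = np.full(L + 1, 1.0 / (L + 1))
    lam = np.full((K, L + 1, L + 1), (1 - diag) / L if L > 0 else 0.0)
    for k in range(K):
        np.fill_diagonal(lam[k], diag)
    for _ in range(n_iter):
        # E-step, Eq. (7): W[i, l] proportional to g(l) * prod_k lambda_k(D_k(i), l).
        W = np.tile(prior, (N, 1))
        for k in range(K):
            W *= lam[k, D[k], :]           # rows indexed by each expert's decision
        W /= W.sum(axis=1, keepdims=True)  # normalization, Eqs. (8)-(9)
        # M-step, Eq. (10): lambda_k(l, m) = sum_{i: D_k(i)=l} W(i, m) / sum_i W(i, m).
        col = W.sum(axis=0)
        for k in range(K):
            for l in range(L + 1):
                lam[k, l, :] = W[D[k] == l, :].sum(axis=0) / col
    return lam, W

# Toy example: two experts, three labels, mostly agreeing.
D = np.array([[0, 1, 1, 2, 2, 0],
              [0, 1, 2, 2, 2, 0]])
lam, W = multilabel_em(D, L=2)
print(W.argmax(axis=1))
```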
4 Implementation
Incremental Computation. Warfield et al. state in their original work [3] that for each voxel they store the weight W , which expresses the current confidence estimate for that voxel being an object voxel. When considering 3-D instead of 2-D images, however, the memory required to store the (real-valued) weights W for each voxel becomes a problem. For the multi-label algorithm introduced in Section 3, the situation is even worse, since it would require storing as many weights per voxel as there are labels in the segmentation. Fortunately, it is possible to
perform the EM iteration without storing the weights, instead propagating the expert parameters estimated in the M-step of the previous iteration directly to the M-step of the next iteration. Inspection of Eq. (3) for the binary algorithm and Eq. (10) for the multi-label algorithm reveals that the computation of the next iteration's expert parameters requires only the sums of all weights $W$ over all voxels as well as over the subsets of voxels that are labeled the same by each expert. In other words, the value $W(i)$ (the values $W(i, j)$ for all $j$ in the multi-label case) is needed only for one fixed $i$ at any given time. The whole field $W(i)$ ($W(i, j)$ in the multi-label case) need not be present at any time, thus relieving the algorithm from having to store an array of $N$ floating point values ($N \cdot L$ in the multi-label case). The weights $W$ from Eq. (1) can instead be recursively substituted into Eq. (3), resulting in the incremental formulas

$$\hat{p}_k = \frac{\sum_{i:D_k(i)=1} \frac{g(T(i)=1)\,\alpha}{g(T(i)=1)\,\alpha + (1 - g(T(i)=1))\,\beta}}{\sum_i \frac{g(T(i)=1)\,\alpha}{g(T(i)=1)\,\alpha + (1 - g(T(i)=1))\,\beta}}, \tag{12}$$

$$\hat{q}_k = \frac{\sum_{i:D_k(i)=0} \left( 1 - \frac{g(T(i)=1)\,\alpha}{g(T(i)=1)\,\alpha + (1 - g(T(i)=1))\,\beta} \right)}{\sum_i \left( 1 - \frac{g(T(i)=1)\,\alpha}{g(T(i)=1)\,\alpha + (1 - g(T(i)=1))\,\beta} \right)}, \tag{13}$$

where $\alpha$ and $\beta$ are defined as in Eq. (2) and depend only on the parameters $p$ and $q$ from the previous iteration and the (invariant) expert decisions. Analogously, in the multi-label case the weights $W$ from Eq. (7) can be recursively substituted into Eq. (10), resulting in the incremental formula

$$\hat{\lambda}_k(\ell, m) = \frac{\sum_{i:D_k(i)=\ell} W(i, m)}{\sum_i W(i, m)}, \quad \text{with } W(i, m) \text{ expanded as in Eq. (7)}. \tag{14}$$

Restriction to Disputed Voxels. Consider Eqs. (1) and (7) and let us assume that for some voxel $i$, all experts have made the same labeling decision and assigned a label $\ell$. Let us further assume that the reliability of all experts for the assigned label is better than 50%, i.e., $p_k > 0.5$ for all $k$ during the $\ell$-th application of the repeated binary method, or $\lambda_k(\ell, \ell) > 0.5$ in the multi-label method. It is then easy to see that voxel $i$ will always be assigned label $\ell$. We refer to such voxels as undisputed. Conversely, we refer to all voxels where at least one expert disagrees with the others as disputed. Mostly in order to speed up computation, but also as a means of eliminating image background, we restrict the algorithm to the disputed voxels. In other words, where all experts agree on the labeling of a voxel, that voxel is assigned the respective label and is not considered during the iterative optimization procedure. In addition to the obvious performance benefit, it is our experience that this restriction actually improves the quality of the segmentation outcome. To understand this phenomenon, consider application of the binary EM algorithm to an image with a total of $N$ voxels that contains a structure $n$ voxels large. Take an expert who correctly labeled the $n$ foreground voxels, but mistakenly
labeled $m$ additional background voxels as foreground. This expert's specificity is therefore $q = \frac{(N-n)-m}{N-n}$. By increasing the field of view, thus adding peripheral background voxels, we can increase $N$ arbitrarily. As $N$ approaches infinity, $q$ approaches 1, regardless of $m$. Therefore, we lose the ability to distinguish between specific and unspecific experts as the amount of image background increases. Due to limited floating-point accuracy this is a very real danger, and it explains why, in our experience, it is beneficial to limit consideration to disputed voxels only.
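The following sketch combines the two implementation points above: it performs one binary EM iteration in a single streaming pass, folding each weight $W(i)$ into the running sums of the incremental formulas (12)-(13) without ever storing the weight field, and it skips undisputed voxels. It is a simplified illustration, not the authors' implementation.

```python
import numpy as np

def em_iteration_streaming(D, p, q, prior=0.5):
    """One binary EM iteration without storing the weight field W:
    W(i) is computed per voxel and immediately folded into the running
    sums needed by Eq. (3), cf. the incremental formulas (12)-(13).

    D: (K, N) array of 0/1 expert decisions.
    """
    K, N = D.shape
    num_p = np.zeros(K); num_q = np.zeros(K)
    sum_w = 0.0; sum_1mw = 0.0
    for i in range(N):
        d = D[:, i]
        # Skip undisputed voxels: all experts agree, so W(i) is forced anyway.
        if d.min() == d.max():
            continue
        alpha = np.prod(np.where(d == 1, p, 1 - p))
        beta = np.prod(np.where(d == 0, q, 1 - q))
        w = prior * alpha / (prior * alpha + (1 - prior) * beta)
        sum_w += w; sum_1mw += 1 - w
        num_p += np.where(d == 1, w, 0.0)
        num_q += np.where(d == 0, 1 - w, 0.0)
    return num_p / sum_w, num_q / sum_1mw

p, q = np.full(3, 0.9), np.full(3, 0.9)
D = np.array([[1, 1, 0, 0], [1, 0, 0, 1], [1, 1, 0, 0]])
print(em_iteration_streaming(D, p, q))
```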
5 Volume-Weighted Label Averaging
As a reference method for the two EM algorithms above, a non-iterative label averaging algorithm is implemented. The fundamental function of this method is to assign to each voxel in the final segmentation the label that was assigned to this voxel by the (relative) majority vote of the experts [4]. However, the situation we are interested in is slightly different. Instead of presenting an image to a human expert, each expert in our context is merely a non-rigid coordinate transformation from an image into an atlas. Since the transformation is continuous, while the atlas is discrete, more than one voxel in the atlas may contribute to the labeling of each image voxel. The contributing atlas voxels can (and will, near object boundaries) have different labels assigned to them. The simplest way to address this situation is to employ nearest-neighbor interpolation. However, it is our experience that it is a better idea to use Partial Volume Integration (PVI) as introduced by Maes et al. [5] in order to properly consider fractional contributions of differently labeled voxels. For a quick review of PVI, consider a voxel $i$ to be segmented. From each of the $k$ expert segmentations, looking up the label for this voxel under some coordinate transformation yields an 8-tuple of labels from a $2 \times 2 \times 2$ neighborhood of voxels in the atlas, numbered 0 through 7. Each voxel is also assigned a weight $w$ based on its distance from the continuous position described by the non-rigid image-to-atlas coordinate mapping. Therefore, each expert segmentation for each voxel produces an 8-tuple $X_k(i)$ of label-weight pairs:

$$X_k(i) = \big( (w_k^{(0)}, \ell_k^{(0)}), \ldots, (w_k^{(7)}, \ell_k^{(7)}) \big). \tag{15}$$

For each expert, all weights of atlas voxels with identical labels are added:

$$W_k(\ell) = \sum_{j=0 \ldots 7,\; \ell_k^{(j)} = \ell} w_k^{(j)}. \tag{16}$$

In what is commonly referred to as "Sum fusion" [4], the image voxel is finally assigned the label with the highest total weight summed over all experts, i.e.,

$$\arg\max_\ell \sum_k W_k(\ell). \tag{17}$$
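A compact sketch of Eqs. (15)-(17) for a single image voxel follows; the label-weight tuples are assumed to have been produced by the PVI lookup described above.

```python
import numpy as np

def pvi_sum_fusion(expert_tuples, n_labels):
    """Sum fusion over PVI label-weight tuples, Eqs. (15)-(17).

    expert_tuples: for each expert, a list of 8 (weight, label) pairs
    obtained from the 2x2x2 atlas neighborhood of one image voxel.
    Returns the label with the highest total weight over all experts.
    """
    total = np.zeros(n_labels)
    for pairs in expert_tuples:            # one 8-tuple X_k(i) per expert
        for w, lab in pairs:               # Eq. (16): accumulate W_k(l)
            total[lab] += w
    return int(total.argmax())             # Eq. (17)

# Toy example: two experts, three labels; trilinear-style weights sum to 1.
x1 = [(0.4, 1), (0.2, 1), (0.2, 0), (0.1, 0), (0.05, 2), (0.03, 2), (0.01, 0), (0.01, 1)]
x2 = [(0.5, 1), (0.3, 2), (0.1, 1), (0.05, 0), (0.02, 0), (0.02, 2), (0.005, 1), (0.005, 0)]
print(pvi_sum_fusion([x1, x2], n_labels=3))  # -> 1
```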
6 Validation Study
The goal of the algorithms described above is to improve the accuracy of segmentation results by taking into account estimates of all experts' segmentation qualities. We are particularly interested in the case where each expert is an instance of a non-rigid registration method combined with an atlas image. Unlike statistics-based methods, atlas-based segmentation is by nature capable of, and typically aims at, labeling anatomical structures rather than tissue types. As an atlas is usually comprised of continuously defined objects, multiple independent atlas-based segmentations differ by deformation of these objects, rather than by noise (sparse pixels of different labels within a structure). The validation study described below is designed accordingly. An increasingly popular non-rigid registration method was originally introduced by Rueckert et al. [6]. It applies free-form deformations [7] based on B-spline interpolation between uniform control points. We implemented this transformation model and simulate imperfect segmentations by applying random deformations to a known atlas. Each randomly deformed atlas serves as a model of an imperfect segmentation that approximates the original atlas. Several of these deformed atlases are combined into one segmentation using the methods described in the previous sections. Since the original (undeformed) atlas is known, it provides a valid ground truth for the results of all three methods.
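A minimal sketch of the random-deformation generator is given below; it only draws the Gaussian control-point displacements described in Sec. 6.1 and leaves the B-spline interpolation of the dense deformation field to an FFD implementation of the reader's choice.

```python
import numpy as np

def random_control_point_displacements(grid_shape, sigma, rng=None):
    """Random B-spline free-form deformation in the spirit of Sec. 6.1:
    independent Gaussian offsets (standard deviation sigma, in micrometers)
    added to every control-point coordinate of a uniform grid.

    grid_shape: (nx, ny, nz) number of control points per axis.
    Returns an (nx, ny, nz, 3) displacement array; interpolating these
    displacements with cubic B-splines yields the dense deformation.
    """
    rng = np.random.default_rng() if rng is None else rng
    return rng.normal(0.0, sigma, size=(*grid_shape, 3))

# One random deformation per phantom instance, e.g. sigma = 10, 20, or 30 um.
disp = random_control_point_displacements((10, 10, 8), sigma=10.0,
                                          rng=np.random.default_rng(42))
print(disp.shape, disp.std())
```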
6.1 Atlas Data
In order to ensure that the underlying undeformed atlas is meaningful and relevant, we did not generate a geometric phantom. Instead, we used real three-dimensional atlases derived from confocal microscopy images of the brains of 20 adult foraging honey bees (see Ref. [8] for details). Each volume contained 84–114 slices of thickness 8 µm, and each slice had 610–749 pixels in the x direction and 379–496 pixels in the y direction, with pixel size 3.8 µm. In each individual image, 22 anatomical structures were distinguished and labeled. For each ground truth, random B-spline-based free-form deformations were generated by adding independent Gaussian-distributed random numbers to the coordinates of all control points. The control point spacing was 120 µm, corresponding to approximately 30 voxels in the x and y directions and 15 voxels in the z direction. The variances of the Gaussian distributions were σ = 10, 20, and 30 µm, corresponding to approximately 2, 4, and 8 voxels in the x and y directions (1, 2, and 4 voxels in the z direction). Figure 1 shows examples of an atlas after application of several random deformations of different magnitudes. A total of 20 random deformations were generated for each individual and each σ. The randomly deformed atlases were combined into a final atlas once by label averaging, and once using each of our novel algorithms.
6.2 Algorithm Parameters
Initialization. The expert parameters were initialized as follows. In the binary case, $p$ and $q$ were set to 0.9 for all experts. In the multi-label case, $\lambda_k(\ell, \ell)$ was initialized as 0.9 for all $k$ and all $\ell$. The off-diagonal elements were set to $(1 - \lambda_k(\ell, \ell))/L$.

Convergence Criterion. We are interested in processing large amounts of image data with many labels. In order to keep computation times somewhat reasonable, we do not wait for actual convergence of the results. Instead, we perform a fixed number of iterations, typically 7. In the validation study described below, our experience was that in the final iteration typically only one out of 10,000 voxels changed its value.

Fig. 1. Examples of a randomly deformed atlas (rows: Warp #1–#3 and an isocontour overlay; columns: σ = 10, 20, 30 µm). Each image shows a central axial slice from the same original atlas after application of a different random deformation. Within each column, the magnitudes of the deformations (variance of the random distribution of control point motion) were constant. The images in the bottom row show overlays of the isocontours from the three images above to emphasize the subtle shape differences.

6.3 Evaluation
For every registration, the registration-based segmentation is compared with the manual segmentation. As one measure of segmentation quality we compute the global segmentation correctness measure $C$, which we define as the fraction of voxels for which the automatically generated registration-based segmentation matches the manually assigned labels:

$$C = \frac{\sum_s \left| V_{\mathrm{GT}}^{(s)} \cap V_{\mathrm{comb}}^{(s)} \right|}{\sum_s \left| V_{\mathrm{GT}}^{(s)} \right|}, \tag{18}$$

where $V_{\mathrm{GT}}^{(s)}$ and $V_{\mathrm{comb}}^{(s)}$ denote the sets of indices of the voxels labeled as belonging to structure $s$ in the undeformed ground truth (GT) and the combined estimated segmentation (comb), respectively.

Fig. 2. Mean correctness $C$ of combined segmentation over 20 individuals vs. number of random segmentations used (y-axis: percentage of correctly labeled voxels; x-axis: number of atlases). Results are shown for PVI label averaging (AVG), repeated application of the binary EM algorithm (EMbin), and the multi-label EM algorithm (EMmulti). Each method was applied to atlases after random deformations of magnitudes σ = 10 µm (left diagram), σ = 20 µm (center), and σ = 30 µm (right). The dashed line in each graph shows the average correctness achieved by the respective set of individual atlases with no combination method.
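A direct transcription of Eq. (18) might look as follows. Whether the background counts as a structure $s$ is not spelled out above, so this sketch restricts the sums to foreground structures.

```python
import numpy as np

def correctness(gt, seg):
    """Global segmentation correctness C of Eq. (18): the fraction of
    foreground ground-truth voxels whose labels are reproduced.

    gt, seg: integer label volumes of identical shape; 0 = background.
    """
    fg = gt > 0
    return np.count_nonzero(seg[fg] == gt[fg]) / np.count_nonzero(fg)

gt = np.array([[0, 1, 1], [2, 2, 0]])
seg = np.array([[0, 1, 2], [2, 2, 0]])
print(correctness(gt, seg))  # 0.75
```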
6.4 Results
Figure 2 shows a plot of the mean correctness over all 20 individuals versus the number of segmentations. Both EM algorithms performed consistently better, i.e., produced more accurate combined segmentations, than simple label averaging. The improvement achieved using the EM strategies was larger for greater magnitudes of the random atlas deformations. Between the two EM methods, repeated application of the binary algorithm outperformed the multi-label method. For all algorithms, adding additional segmentations increased the accuracy of the combined segmentation. The incremental improvement obtained by adding an additional segmentation decreased as the number of atlases increased. The figure also nicely illustrates the superiority of using multiple atlases over using just one: in all cases, the individual correctnesses are substantially lower than any of the combined results. Again, the difference increases as the magnitude of the random deformations is increased.
7 Discussion
This paper introduces several new ideas. First, based on a novel interpretation of the term "expert", we propose to combine multiple registration-based segmentations into one in order to improve segmentation accuracy. Second, we introduce
two multi-label extensions to an EM algorithm [3] for ground truth estimation in binary segmentation. Finally, we evaluate the segmentation quality of the two methods and a combined segmentation method based on simple label averaging. Effectively, this paper introduces the principle of combining multiple classifiers [4,9] to atlas-based image segmentation. In fact, the multi-label EM algorithm presented here can be understood as a learning method for the confusion matrix of a Bayesian classifier [9]. The quantitative evaluation of segmentation accuracy using random deformations of a known atlas demonstrated that both methods introduced in this paper produce better segmentations than simple label averaging. This is true despite the natural advantage that label averaging has by being able to consider fractional label contributions using PVI. Both EM algorithms described here more than make up for this inherent disadvantage. This finding is particularly significant as our previous research showed that combining multiple registration-based segmentations by label averaging already produces results that are better than the individual segmentations [2]. This finding, which corresponds to the experience of the pattern recognition community that multiple classifier systems are generally superior to single classifiers [4], was also confirmed by the validation study performed in this paper. Between the two EM methods, the repeated application of a binary EM algorithm was superior to a dedicated multi-label algorithm, but at substantially increased computation cost. However, this may be different for different atlas topologies. Assume, for example, that there is an adjacency relationship between two anatomical structures in the form that one encloses the other. In this case, the crosstalk between classifications of both structures may be beneficial to consider, which is precisely what our novel multi-label EM algorithm does. It should be mentioned that, like the original Warfield algorithm, our methods and their validation are based on several assumptions regarding the nature of the input data. Most notably, we assume that the errors of the individual segmentations are somewhat independent. In the presence of systematic errors made by all or at least a majority of the experts, the same error will very likely also appear in the final ground truth estimate. This problem, however, is not restricted to the machine experts that we focused on in this paper. In fact, since the individual training and experience of human experts are not mutually independent (indeed, similarity in training and expertise is what makes us consider someone an expert with respect to a certain problem), the same is true for manual segmentations. While seemingly similar, the situation we address with the validation study in this paper is fundamentally different from validation of non-rigid registration. A promising approach to validating non-rigid image registration involves simulating a known deformation using a biomechanical model. The simulated deformation is taken as the ground truth against which transformations computed using non-rigid registration can be validated. In that context, it is important that the simulated deformation be based on a different transformation model than the registration; for example, a B-spline-based registration should not be validated using simulated B-spline deformations.
In our context, however, the opposite is true. In this paper, we validated methods for combining different automatic segmentations generated by non-rigid registration. In this framework it makes sense (and is, in fact, necessary to correctly model the problem at hand) that the randomly deformed segmentations are generated by applying transformations from the class used by the registration algorithm. Only in this way can we expect to look at variations in the segmentations comparable to the ones resulting from imperfect non-rigid registration.

Acknowledgments. TR was supported by the National Science Foundation under Grant No. EIA-0104114. DBR was supported by the Interdisciplinary Initiatives Program, which is part of the Bio-X Program at Stanford University, under the grant "Image-Guided Radiosurgery for the Spine and Lungs."
References

1. BM Dawant, SL Hartmann, JP Thirion, et al. Automatic 3-D segmentation of internal structures of the head in MR images using a combination of similarity and free-form transformations: Part I, methodology and validation on normal subjects. IEEE Trans Med Imag, 18(10):909–916, 1999.
2. T Rohlfing, R Brandt, R Menzel, et al. Segmentation of three-dimensional images using non-rigid registration: Methods and validation with application to confocal microscopy images of bee brains. In Medical Imaging: Image Processing, Proceedings of SPIE, Feb. 2003. In print.
3. SK Warfield, KH Zou, WM Wells. Validation of image segmentation and expert quality with an expectation-maximization algorithm. In Proceedings of Fifth International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 298–306, Springer-Verlag, Berlin, 2002.
4. J Kittler, M Hatef, RPW Duin, et al. On combining classifiers. IEEE Trans Pattern Anal Machine Intell, 20(3):226–239, Mar. 1998.
5. F Maes, A Collignon, D Vandermeulen, et al. Multimodality image registration by maximisation of mutual information. IEEE Trans Med Imag, 16(2):187–198, 1997.
6. D Rueckert, LI Sonoda, C Hayes, et al. Nonrigid registration using free-form deformations: Application to breast MR images. IEEE Trans Med Imag, 18(8):712–721, 1999.
7. TW Sederberg, SR Parry. Free-form deformation of solid geometric models. Comput Graph (ACM), 20(4):151–160, 1986.
8. T Rohlfing, R Brandt, CR Maurer, Jr., et al. Bee brains, B-splines and computational democracy: Generating an average shape atlas. In Proceedings of IEEE Workshop on Mathematical Methods in Biomedical Image Analysis, pp. 187–194, IEEE Computer Society, Los Alamitos, CA, 2001.
9. L Xu, A Krzyzak, CY Suen. Methods of combining multiple classifiers and their applications to handwriting recognition. IEEE Trans Syst Man Cybern, 22(3):418–435, 1992.
Quantitative Analysis of Intrathoracic Airway Trees: Methods and Validation

Kálmán Palágyi, Juerg Tschirren, and Milan Sonka

Dept. of Electrical and Computer Engineering, The University of Iowa, 4016 Seamans Center, Iowa City, IA 52242-1595
{kalman-palagyi,juerg-tschirren,milan-sonka}@uiowa.edu
Abstract. A method for quantitative assessment of tree structures is reported, allowing evaluation of airway or vascular tree morphology and its associated function. Our skeletonization and branch-point identification method provides a basis for tree quantification or tree matching, tree-branch diameter measurement in any orientation, and labeling of individual branch segments. All main components of our method were specifically developed to deal with imaging artifacts typically present in volumetric medical image data. The proposed method was tested in 343 computer phantom instances subjected to changes of orientation, as well as in a repeatedly CT-scanned rubber plastic phantom, with sub-voxel accuracy and high reproducibility. Application to 35 human in vivo trees yielded reliable and well-positioned centerlines and branch-points.
1 Introduction
Tubular tree structures are common in human anatomy. Arterial, venous, or bronchial trees may serve as most frequent examples. Computed tomography (CT) or magnetic resonance (MR) imaging provides volumetric image data allowing identification of such tree structures. Frequently, the trees represented as contiguous sets of voxels must be quantitatively analyzed. The analysis may be substantially simplified if the voxel-level tree is represented in a formal tree structure consisting of a set of nodes and connecting arcs. To build such formal trees, the voxel-level tree object must be transformed into a set of interconnected single-voxel centerlines representing individual tree branches. Therefore, the aim of our work was to develop a robust method for identification of centerlines and bifurcation (trifurcation, etc.) points in segmented tubular tree structures acquired in vivo from humans and animals using volumetric CT or MR scanning.
2 Methods
The input of the proposed method is a 3D binary image representing a segmented voxel-level tree object. The entire tree analysis process consists of the following main steps: topological correction, root detection, centerline extraction by thinning, pruning, branch-point identification, generating formal tree structures representing centerlines and branches, and branch labeling.
2.1 Topological Correction of the Segmented Tree
When applied to clinical volumetric images, segmentation algorithms may produce imperfect results in which the segmented objects contain internal cavities (i.e., connected "0" voxels surrounded by "1" voxels), holes (i.e., "0" voxels forming a tunnel, as in a doughnut), and bays (i.e., boundary disturbances without a topological meaning). Some of these cause unwanted changes of the underlying topology; all of them disturb the skeletonization and consequently yield an incorrect set of centerlines, and thus an incorrect formal representation. To overcome the effects of artifactual cavities, the "0" voxels connected to the frame of the volume are labeled by sequential forward and backward scanning (instead of the conventional object labeling), and then all unlabeled "0" voxels are filled (i.e., changed to "1" voxels). The applied method is similar to the linear-time Chamfer distance mapping [2]. As a result, all cavities are filled with no connectivity alteration. Holes and bays are removed by applying morphological closing [4] (i.e., a dilation followed by an erosion with a suitable structuring element). Note that closing is a double-edged sword: it is suitable for filling small gaps, holes, and cavities, but new holes may be created. This side effect can be handled by the pruning process that follows the centerline extraction. A minimal sketch of the cavity-filling step is given below.
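This sketch assumes 6-connectivity for background voxels, matching the convention of Sec. 2.3, and uses component labeling from SciPy rather than the sequential forward/backward scanning of the original method.

```python
import numpy as np
from scipy import ndimage

def fill_cavities(volume):
    """Fill internal cavities of a binary volume: background voxels that
    are NOT connected to the frame of the volume become object voxels.

    volume: 3D array of 0/1.
    """
    background = volume == 0
    # Label 6-connected background components.
    structure = ndimage.generate_binary_structure(3, 1)
    labels, _ = ndimage.label(background, structure=structure)
    # Collect the labels of components touching the volume frame.
    frame_labels = np.unique(np.concatenate([
        labels[0].ravel(), labels[-1].ravel(),
        labels[:, 0].ravel(), labels[:, -1].ravel(),
        labels[:, :, 0].ravel(), labels[:, :, -1].ravel()]))
    outside = np.isin(labels, frame_labels[frame_labels > 0])
    filled = volume.copy()
    filled[background & ~outside] = 1  # cavities become object
    return filled

# Toy example: a cube with a single interior cavity voxel.
vol = np.zeros((5, 5, 5), dtype=np.uint8)
vol[1:4, 1:4, 1:4] = 1
vol[2, 2, 2] = 0
print(fill_cavities(vol)[2, 2, 2])  # 1
```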
2.2 Root Detection
The center of the topmost nonzero 2D slice in the z direction (detected by 2D shrinking) defines the root of the formal tree to be generated. In airway trees, the root point belongs to the trachea. The detected root voxel acts as an anchor point during the centerline extraction (i.e., it cannot be deleted by the forthcoming iterative peeling process). Root detection is not a critical phase of the process; it can be performed interactively or automatically [9,13].
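A minimal sketch of the root detection, assuming that "topmost" means the largest z index and substituting the object centroid for the 2D shrinking used above:

```python
import numpy as np

def detect_root(volume):
    """Find the topmost nonzero z-slice and return the centroid of its
    object voxels as the root point.

    volume: 3D binary array indexed as [x, y, z].
    """
    for z in range(volume.shape[2] - 1, -1, -1):
        xs, ys = np.nonzero(volume[:, :, z])
        if xs.size:
            return int(xs.mean()), int(ys.mean()), z
    return None

vol = np.zeros((5, 5, 5), dtype=np.uint8)
vol[2, 2, 3] = vol[2, 3, 3] = 1
print(detect_root(vol))  # (2, 2, 3)
```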
2.3 Centerline Extraction
One of the well-known approaches to centerline determination is to construct a 3D skeleton of the analyzed object. However, some of the properties of 3D skeletons in discrete grids are undesirable. Specifically, in the case of 3D tubular objects, we do not need the exact skeleton, since the skeleton generally contains surface patches. We need a skeletonization method that can suppress the creation of such skeleton surface patches. As a solution, a 3D curve-thinning algorithm was developed that preserves line-end points and can thus extract both geometrically and topologically correct centerlines. As part of this process, a novel method for endpoint re-checking was developed, based on comparisons between the centerline configuration at some stage of thinning and the previous object configuration. Thinning is a frequently used method for producing an approximation to the skeleton in a topology-preserving way [6]. Border points of a binary object that satisfy certain topological and geometric constraints are deleted in the iteration steps. In the case of tubular 3D objects, thinning has a major advantage over
other skeletonization methods since curve-thinning (i.e., iterative object reduction preserving line-end points) can produce one-voxel-wide centerlines directly [10]. In order to outline our thinning scheme, let us first define (26, 6) images, border points (corresponding to the six main directions in 3D), line-end points, and simple points. A binary image is a (26, 6) image if 26-connectivity (i.e., the reflexive and transitive closure of the 26-adjacency relation) is considered for "1" voxels forming the objects and 6-connectivity (i.e., the reflexive and transitive closure of the 6-adjacency) is considered for "0" voxels [6]. A "1" voxel in a (26, 6) image is called a U-border point if its 6-neighbor in direction U ("up") is "0". We can define N-, E-, S-, W-, and D-border ("down") points in the same way. A "1" voxel is called a line-end point if it has exactly one "1" 26-neighbor. A "1" voxel is called a simple point if its deletion does not alter the topology of the image [6]. It needs to be emphasized that simplicity in (26, 6) images is a local property that can be decided by investigating the 26-neighbors, i.e., the 3 × 3 × 3 neighborhood of any given point [8]. Our sequential thinning algorithm can be regarded as a modified version of the method proposed by Lee, Kashyap, and Chu [7]. It is sketched as follows:

repeat
  for each direction U, N, E, S, W, and D do
    mark all border points according to the actual direction
      that are simple points and not line-end points
    for each marked point p do
      if p is simple in the actual image then
        if p is not a line-end point then
          delete p
        else if #(deleted 6-neighbors of p) >= t then
          delete p
    endfor
  endfor
until changes occur

One iteration step of the sequential object reduction process (i.e., the kernel of the repeat cycle) is decomposed into six successive sub-iterations according to the six main directions in 3D. Each sub-iteration consists of two phases: first, the border points of the actual type that are simple and not line-end points are marked as potentially deletable points of the actual sub-iteration. This phase of the algorithm can be executed in parallel, but the forthcoming re-checking phase must be sequential. During the re-checking, a marked point is deleted if it remains simple and is not a line-end point after the deletion of some previously visited marked points. In addition, in one special case, a marked point is deleted even if it has become a line-end point: the algorithm uses an extra parameter t ∈ {0, 1, 2, 3, 4, 5, 6}, and a marked (simple and line-end) point can be deleted if at least t of its 6-neighbors have been deleted during the actual phase of the process. Note that in the case of t = 6, the novel endpoint re-checking has no effect (since a point is not simple if all of its 6-neighbors
belong to the object). In that case, the algorithm produces the same result as the method proposed by Lee, Kashyap, and Chu [7]. In our experience, setting t = 1 or t = 2 is suggested for human airway trees. See Fig. 1 for an example of the usefulness of the endpoint re-checking.
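For completeness, here is one standard way to implement the local simple-point test for (26, 6) images (the Malandain-Bertrand characterization). In the actual implementation described in Sec. 5 this decision is a precomputed look-up table addressed by a 26-bit neighborhood code, so the code below should be read as a reference oracle rather than the fast path.

```python
import numpy as np
from scipy import ndimage

S26 = ndimage.generate_binary_structure(3, 3)   # 26-connectivity
S6 = ndimage.generate_binary_structure(3, 1)    # 6-connectivity
N18 = ndimage.generate_binary_structure(3, 2)   # 18-neighborhood mask

def is_simple(nbhd):
    """Simple-point test on a 3x3x3 boolean neighborhood (center at [1,1,1]):
    the object must form exactly one 26-component around the center, and the
    background must form exactly one 6-component within the 18-neighborhood
    that is 6-adjacent to the center."""
    obj = nbhd.copy()
    obj[1, 1, 1] = False
    _, n = ndimage.label(obj, structure=S26)
    if n != 1:
        return False
    bg = (~nbhd) & N18
    bg[1, 1, 1] = False
    lab, _ = ndimage.label(bg, structure=S6)
    face = [lab[0, 1, 1], lab[2, 1, 1], lab[1, 0, 1],
            lab[1, 2, 1], lab[1, 1, 0], lab[1, 1, 2]]
    return len({l for l in face if l > 0}) == 1

nb = np.zeros((3, 3, 3), dtype=bool)
nb[1, 1, 1] = True
nb[0, 1, 1] = True                                # end of a line: deletable
print(is_simple(nb))                              # True
print(is_simple(np.ones((3, 3, 3), dtype=bool)))  # False: interior point
```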
Fig. 1. A part of a segmented tree and its raw centerline extracted by the proposed algorithm without endpoint re-checking (t = 6) (left), and the result with endpoint re-checking (t = 1) (right). The centerline extracted with endpoint re-checking contains only 125 (true) branch-points (junctions) and 128 line-end points. There are 167 branch-points and 176 line-end points in the centerline generated without endpoint re-checking. Several of the unwanted branches are marked by arrows.
2.4 Pruning
Unfortunately, every skeletonization algorithm (including ours) is rather sensitive to coarse object boundaries. As a result, the produced (approximation to the) skeleton generally includes false segments that must be removed by a pruning step. Applying a proper pruning method that yields reliable centerlines is critical in all tree-skeletonization applications. An unwanted branch causes false generation numbering and consequently false measurements for the individual segments of the tree (including length, volume, surface area, etc.). There are pruning approaches (e.g., morphological pruning [4]) capable of removing all side branches that are shorter than a predefined threshold [13]. Those methods necessarily fail in structures consisting of tubular segments of varying thickness. Therefore, we have developed a method capable of removing "long" parasitic branches from "thick" parts while preserving "short" correctly determined branches in "thin" segments. Our pruning process consists of the following two phases:

– cutting holes that remain after the morphological closing, and
– deleting side branches using both the length and depth information.

At first, the centerlines are converted into a graph structure (each voxel corresponds to a graph node/vertex, and there is an edge between two nodes if
the corresponding voxels are 26-adjacent). Then, Dijkstra's algorithm is applied to solve the single-source shortest-paths problem [3], maintaining a rooted tree from a source node (i.e., the root detected in the first nonzero 2D slice in the z direction). Since the result of Dijkstra's algorithm is always a (cycle-free) tree, we can detect and cut holes in the centerlines easily: a skeletal point is deleted if it is not a line-end point and is not the parent of any other point in the Dijkstra tree. This heuristic hole-cutting approach works well, although counter-examples can be constructed in which the heuristic does not apply. After the hole cutting, the parasitic side branches are removed. We have developed a centerline pruning that uses both the branch length and the distance-from-surface (depth) information for the identification of a pruning candidate. The following algorithm is applied to delete all branches whose lengths are shorter than a given threshold $t_l$ and whose branch-points are closer to the border/surface of the elongated tree (after topological correction) than a given threshold $t_d$:

1. Calculate the linear-time (3,4,5)-chamfer distance map [2] for the elongated tree (after topological correction), in which the feature points are formed by the "0" voxels 6-adjacent to a "1" voxel. The resulting "distance-from-surface" map $D$ is a non-binary array containing, at each voxel, the distance to the closest feature voxel.
2. Initialize the "skeletal distance map" $SD$:

$$SD(v) = \begin{cases} 0 & \text{if } v \text{ is a branch-point and } D(v) \ge t_d \\ B & \text{if } v \text{ is a branch-point and } D(v) < t_d \\ B & \text{if } v \text{ is the root of the tree} \\ m & \text{otherwise,} \end{cases}$$

   where the values $B$ and $m$ should be larger than the maximal length in the tree and have special meanings: "B" voxels are "bumpers" during the forthcoming distance-propagation step, while "m" voxels (assigned to line-points and end-points in the centerlines) are to be changed.
3. Propagate distances in $SD$ according to the (3,4,5)-chamfer distance; this can be performed similarly to the linear-time chamfer distance mapping. Note that "B" voxels remain the same during this step.
4. Branch deletion: a side branch with an associated end-point $v$ is deleted if $SD(v) \le t_l$. This can be done easily by using the Dijkstra tree, or in the following way:

   for i = tl downto 1 do
     for each end-point v in the centerlines do
       if SD(v) = i then delete v from the centerlines

Steps 2–4 of the above process can be repeated for different pairs of thresholds $(t_l, t_d)$. In our experience, 2 to 4 iterations typically provide satisfactory results for in vivo airway trees. The result of our pruning is demonstrated in Fig. 2.
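The following is a strongly simplified sketch of steps 2–4: it walks each side branch of the Dijkstra tree instead of propagating a chamfer distance map, and it measures branch length in voxel steps rather than (3,4,5) weights; the thresholds t_l and t_d play the roles described above.

```python
import numpy as np

def prune_side_branches(endpoints, parent, depth, branch_points, root, t_l, t_d):
    """One pruning pass in the spirit of steps 2-4: delete a side branch if
    it is short (<= t_l steps) and hangs off a deep branch-point (D >= t_d).

    endpoints: set of line-end voxels; parent: child -> parent map (Dijkstra
    tree); depth: distance-from-surface D at branch-points; branch_points:
    set of junction voxels; root: tree root.
    """
    keep = set(parent) | {root}
    for e in endpoints:
        path, v = [], e
        while v != root and v not in branch_points:
            path.append(v)
            v = parent[v]
        bp = v
        # Shallow branch-points (D < t_d) and the root act as bumpers
        # (SD = B in step 2): branches attached to them are preserved.
        if bp != root and depth[bp] >= t_d and len(path) <= t_l:
            keep -= set(path)
    return keep

# Tiny toy tree: r-0-1-2, where 2 is a junction with a long branch 2-3-4
# and a short spur 2-5; only the spur is pruned.
parent = {0: "r", 1: 0, 2: 1, 3: 2, 4: 3, 5: 2}
print(prune_side_branches({4, 5}, parent, {2: 5.0}, {2}, "r", t_l=1, t_d=3.0))
```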
Fig. 2. A part of a segmented tree and its centerline before pruning (left) and after pruning (right). The applied pruning technique can delete unwanted long branches from thick parts and unwanted shorter ones from thinner parts, while correct branches are typically preserved throughout the tree
2.5 Branch-Point Identification
In a skeleton, three types of points can be identified: end-points (which have only one 26-neighbor), line points (which have exactly two 26-neighbors), and branch-points (which have more than two 26-neighbors) that form junctions (bifurcations, trifurcations, etc.); see Fig. 3. Clearly, branch-point identification from maximally thinned centerlines of an elongated tree is trivial; the classification reduces to counting 26-neighbors, as in the short sketch below. One problem is that more than one branch-point may form a junction. In that case, the branch-point closest to the root of the tree is assigned the junction label.
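A minimal NumPy/SciPy sketch of the neighbor-count classification, assuming a one-voxel-wide centerline:

```python
import numpy as np
from scipy import ndimage

def classify_skeleton_points(skel):
    """Classify centerline voxels by their number of 26-neighbors:
    1 -> end-point, 2 -> line point, >2 -> branch-point."""
    kernel = np.ones((3, 3, 3), dtype=int)
    kernel[1, 1, 1] = 0
    counts = ndimage.convolve(skel.astype(int), kernel, mode="constant")
    counts *= skel  # only skeleton voxels are classified
    return counts == 1, counts == 2, counts > 2

skel = np.zeros((3, 5, 3), dtype=bool)
skel[1, :, 1] = True               # a straight 5-voxel line
ends, lines, branches = classify_skeleton_points(skel)
print(ends.sum(), lines.sum(), branches.sum())  # 2 3 0
```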
Fig. 3. The three types of voxels (root, line-point, branch-point, end-point) in a maximally thinned structure (left). Sometimes, a junction is formed by more than one branch-point (right); in that case, the branch-point closest to the root is the reference point of the junction.
2.6 Generating Formal Tree Structure
The formal tree structure assigned to the pruned centerlines is based on the updated Dijkstra’s tree (after pruning). It is stored in an array of n elements
for a centerline containing n voxels. Each element of that array stores the coordinates of a voxel, its depth in the elongated volume, and the index of the element that corresponds to the parent/predecessor voxel in the tree. This internal data structure is suitable for the forthcoming analysis and measurements, and provides an efficient encoding of the resulting binary image. A similar structure is assigned to the branch-points; in the branch-tree, a path between two branch-points is replaced by a single edge.
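A literal transcription of this array structure might look as follows (the field names are ours):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class CenterlineElement:
    """One element of the centerline array described above."""
    coords: Tuple[int, int, int]   # voxel coordinates
    depth: float                   # distance-from-surface at this voxel
    parent: int                    # index of the parent element (-1 at the root)

def path_to_root(elements, i):
    """Follow parent indices from element i up to the root."""
    path = [i]
    while elements[path[-1]].parent != -1:
        path.append(elements[path[-1]].parent)
    return path

# A three-voxel centerline: root -> middle -> tip.
cl = [CenterlineElement((10, 20, 5), 4.0, -1),
      CenterlineElement((10, 21, 5), 3.5, 0),
      CenterlineElement((10, 22, 5), 3.0, 1)]
print(path_to_root(cl, 2))  # [2, 1, 0]
```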
2.7 Labeling
The aim of the labeling procedure is to partition all voxels of the binary tree into branches — each voxel is assigned a branch-specific label. There are two inputs into the process — the binary image after topological correction, and the formal tree structure corresponding to the centerlines. The output is a gray-level image, in which value "0" corresponds to the background and different non-zero values are assigned to the voxels belonging to different tree branches. The automated labeling consists of two steps. First, only the voxels in the centerlines are labeled, so that each branch-centerline has a unique label. Non-skeletal tree voxels are then labeled by label propagation — each voxel in the tree gets the label of the closest skeletal point.
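A minimal sketch of the second labeling step is shown below; it substitutes a Euclidean distance transform with index return for the label propagation described above, which yields the same "closest skeletal point" assignment up to the choice of metric.

```python
import numpy as np
from scipy import ndimage

def propagate_labels(tree_mask, skeleton_labels):
    """Assign every tree voxel the label of the nearest labeled skeletal voxel.

    tree_mask: boolean volume of the (topology-corrected) tree.
    skeleton_labels: integer volume, non-zero on centerline voxels only.
    """
    # Distance transform of the non-skeletal region, returning for each
    # voxel the indices of the closest skeletal voxel.
    _, idx = ndimage.distance_transform_edt(skeleton_labels == 0,
                                            return_distances=True,
                                            return_indices=True)
    out = skeleton_labels[tuple(idx)]
    out[~tree_mask] = 0
    return out

tree = np.zeros((1, 5, 5), dtype=bool); tree[0, 2, :] = True
skel = np.zeros((1, 5, 5), dtype=int); skel[0, 2, 1] = 1; skel[0, 2, 4] = 2
print(propagate_labels(tree, skel)[0, 2])  # [1 1 1 2 2]
```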
3 Experimental Methods

3.1 Data
The reproducibility experiments were performed on 342 instances of a computer phantom and on a rubber plastic phantom CT-scanned in 9 orientations. The computer phantom [5] is a 3-dimensional structural model of the human airway tree. The model consists of 125 elongated branches, and its centerlines have 62 branch-points and 64 end-points (including the root of the tree). Note that the true positions of the branch-points are known. The generated object is embedded in a 300 × 300 × 300 binary array containing unit-cube voxels. The phantom was independently rotated in 5-degree steps between −15 and +15 degrees about all three axes. The second phantom is a hollow rubber plastic one, cast from a normal human bronchial tree. It was embedded in potato flakes (simulating lung tissue) and imaged in 9 orientations using multi-row detector computed tomography with voxel size 0.488 × 0.488 × 0.5 mm (4-slice spiral CT, Mx8000, Philips Medical Systems). The volume sizes were 512 × 512 × 500–600 voxels. The rotation angles defined 9 phantom orientations in the scanner, separated by 15° intervals in the x–z and y–z planes. From the 3D CT phantom images, segmentation was performed to separate the bronchial airways from the lung parenchyma. This phantom consists of about 400 branches and 200 branch-points (see Fig. 4).
3.2 Quantitative Indices
To evaluate the reproducibility of our airway tree skeletonization algorithm, the method described above was applied to the 3D binary images of airway trees. For each of the 342 + 9 = 351 trees, skeletonization was performed fully automatically, and the resulting skeletons were not interactively edited. For each instance of the computer phantom, the branch-point position error was determined, defined as the Euclidean distance between the skeletonization-determined and true coordinates of the corresponding branch-points. For a subset of 9 computer phantoms and the 9 rubber phantoms, the following quantitative indices were determined for the first 5 generations of the matched trees. Here, reproducibility was determined by assessing differences between the reference tree and the tree analyzed in different orientations, after registering the analyzed tree with the reference tree. The tree in the neutral position was used as the reference tree:

– branch length – defined as the Euclidean distance between the parent and child branch-points,
– branch volume – defined as the volume of all voxels belonging to the branch,
– branch surface area – defined as the surface area of all boundary voxels belonging to the branch,
– average branch diameter – calculated from the distance map.
Fig. 4. The rubber phantom (in the neutral orientation) and its centerlines
Fig. 5. A segmented in-vivo acquired airway tree with its (pruned) centerlines extracted by our method. The 512 × 512 × 570 human dataset, acquired close to total lung capacity, has a nearly isotropic resolution of 0.7 × 0.7 × 0.6 mm³. It was scanned using multi-row detector computed tomography.
3.3 Statistical Validation
The reproducibility results are reported separately for each of the two phantom studies. The average branch-point positioning errors are only calculated for the computer phantom, for which the true branch-point positions were known. These errors are presented as mean ± standard deviation and reported in voxels. All other reproducibility indices were compared using the Bland-Altman statistic, for which the average value of all corresponding measurements was used as the independent variable. The reproducibility results, showing 95% confidence intervals, are presented in the form of Bland-Altman agreement plots [1].
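For reference, the Bland-Altman quantities themselves are straightforward to compute; the 95% limits of agreement below use the conventional bias ± 1.96·SD form, which we assume is what the agreement plots report.

```python
import numpy as np

def bland_altman(ref, test):
    """Bland-Altman agreement statistics: bias (mean difference) and 95%
    limits of agreement, with the pairwise means as independent variable.

    ref, test: paired measurements (e.g., branch lengths in the reference
    orientation and in a rotated orientation, after tree matching).
    """
    ref, test = np.asarray(ref, float), np.asarray(test, float)
    diff = test - ref
    mean = (test + ref) / 2.0
    bias = diff.mean()
    loa = 1.96 * diff.std(ddof=1)
    return mean, diff, bias, (bias - loa, bias + loa)

rng = np.random.default_rng(1)
ref = rng.uniform(5, 40, 50)           # e.g., branch lengths (voxels)
test = ref + rng.normal(0, 1.0, 50)    # re-measured after rotation
_, _, bias, limits = bland_altman(ref, test)
print(round(bias, 2), np.round(limits, 2))
```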
4 Results
The experiment was performed to assess the reproducibility of branch-point location using our approach. First, the true branch-points in the neutral phantom orientation were rotated. Then, the phantom was rotated in the same way and the branch-points were identified in the new phantom position. Consequently, for each phantom orientation, a set of independent-standard branch-points was available. The average branch-point positioning error showed sub-voxel accuracy of 0.93 ± 0.41 voxels.
Fig. 6. Reproducibility in a computer phantom. a) branch length, b) branch volume, c) branch surface area, d) average branch diameter
The reproducibility of the quantitative tree morphology indices is given in Figs. 6–7. Note that the relatively large differences in the surface and volume indices are to be expected, due to the high sensitivity of these measures to minor partitioning errors, especially in short branches; compare with the high reproducibility of the branch diameter and length measures. The reported method was applied to 35 human datasets. In all cases, it produced reliable and well-positioned centerlines; see Fig. 5.
Fig. 7. Reproducibility in a rubber phantom. a) branch length, b) branch volume, c) branch surface area, d) average branch diameter.

5 Discussion and Conclusion
Our algorithm for extracting centerlines from tree structures has several advantageous properties: It produces geometrically correct centerlines due to the employed directional approach (i.e., the outmost layer of an object is peeled by 6 successive subiterations according to the 6 main directions). As a result, the centerline is in its correct position (i.e., in the middle of the object) and its location is fairly invariant under object orientation as discussed later. The produced centerline is topologically equivalent to the original elongated object, since simple points are deleted sequentially. Our algorithm is topologypreserving by definition of simple points, therefore, the proof is self-evident. The skeletonization algorithm can produce maximally thinned (i.e., 1-voxel wide) centerlines, since all simple points are deleted. Note, that some thinning
232
K. Pal´ agyi, J. Tschirren, and M. Sonka
Fig. 7. Reproducibility in a rubber phantom. a) branch length, b) branch volume, c) branch surface area, d) average branch diameter
algorithms may delete only a subset of simple points [10]. Therefore, the obtained structure is free from surface patches and any kinds of elongated parts. In comparison, the maximal thinness is not guaranteed by distance-based methods [11, 12]. Our skeletonization retains the shape of the original (elongated) object by preserving line-end points. The endpoint preserving thinning differs from shrinking (it is capable of extracting the topological kernel of an object). Our approach creates a substantially smaller number of unwanted centerline segments compared to competing methods [7] due to a novel endpoint re-checking step. The re-checking process statistically decreases the number of identified branch- and end-points without removing valid branches (p <<0.001). Additionally, our method allows an easy and efficient implementation. Two linked lists are used; the first one stores all border points and the second one stores all border points of the actual type that are simple and not line-end points. The simplicity of a point is decided by determining a 26-bit integer code corresponding to their 3 × 3 × 3 neighborhood and addressing a pre-calculated (unit time access) look-up-table containing the answers for all possible 3 × 3 × 3 configurations. The codes of the marked points are stored in the second list. Therefore, the endpoint re-checking can be performed easily. Whenever a point is deleted, it is removed from the first list and its 6-neighbors (that are not in that list) are added. By using linked lists, the consumptive scanning of the volume is avoided. The proposed thinning algorithm is fairly fast; our implementation takes only 7 seconds for a 512×512×512 image containing 250,000 object points
Quantitative Analysis of Intrathoracic Airway Trees
233
voxels (running on a 1.7 MHz AMD Athlon 2000+ PC) including reading the input volume and the 8 MB look-up-table, and writing the output image. It is much faster than the parallel thinning algorithm employed in [13]. Our labeling method uses a generation-by-generation tree-walk strategy (similar to breadth-first-search [3]). Therefore, it can handle any kinds of tree including trees containing trifurcations, fourfurcations, etc. In comparison, the labeling method proposed by Mori et al. may fail if trifurcation is present [9]. Note, that during the labeling process, an expanded branch-tree structure containing generation numbers can be created. Additional quantitative indices including local diameter of a tree branch, bifurcation angle, etc. can be calculated. The presented automated method for skeletonization, branch-point identification and quantitative analysis of tubular tree structures is robust, efficient, and highly reproducible. Acknowledgments. This work was supported by the NIH grant HL-064368.
References 1. Bland, J.M., Altman, D.G.: Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1(8476) (1986) 307–310 2. G. Borgefors, G.: Distance transformations in arbitrary dimensions. Computer Vision, Graphics, and Image Processing 27 (1984) 321–345 3. Cormen, T.H., Leiserson, C.E., Rivest, R.L.: Introduction to algorithms. The MIT Press (1990) 4. Gonzales, R.C., Woods, R.E.: Digital image processing. Addison-Wesley, Reading, Massachusetts (1992) 5. Kitaoka, H., Takaki, R., Suki, B.: A three-dimensional model of the human airway tree. Journal of Applied Physiology”, 87 (1999) 2207–2217 6. Kong, T.Y., Rosenfeld, A.: Digital topology: Introduction and survey. Computer Vision, Graphics, and Image Processing 48 (1989) 357–393 7. Lee, T., Kashyap, R.L., Chu, C.: Building skeleton models via 3-D medial surface/axis thinning algorithms. CVGIP: Graphical Models and Image Processing 56 (1994) 462–478 8. Malandain, G., Bertrand, G.: Fast characterization of 3D simple points. In: Proc. 11th IEEE Int. Conf. on Pattern Recognition (1992) 232–235 9. Mori, K., Hasegawa, J., Suenaga, Y., Toriwaki, J.: Automated anatomical labeling of the bronchial branch and its application to the virtual bronchoscopy system. IEEE Trans. Medical Imaging 19 (2000) 103–114 10. Pal´ agyi, K., Kuba, A.: A parallel 3D 12-subiteration thinning algorithm. Graphical Models and Image Processing 61 (1999) 199–221 11. Pudney, C.: Distance-ordered homotopic thinning, A skeletonization algorithm for 3D digital images. Computer Vision and Image Understanding 72 (1998) 404–413 12. Saito, T., Toriwaki, J.: A sequential thinning algorithm for three dimensional digital pictures using the Euclidean distance transformation. In: Proc. 9th Scandinavian Conf. Image Analysis, SCIA’95 (1995) 507–516 13. Wan, S.Y., Kiraly, A.P., Ritman, E.L., Higgins, W.E.: Extraction of the hepatic vasculature in rats using 3-D Micro-CT images. IEEE Trans. Medical Imaging 19 (2000) 964–971
Multi-view Active Appearance Models: Application to X-Ray LV Angiography and Cardiac MRI C.R. Oost1, B.P.F. Lelieveldt1, M. Üzümcü1, H. Lamb2, J.H.C. Reiber1, and M. Sonka3 1
Division of Image Processing, Dept of Radiology, Leiden University Medical Center 3 Dept of Electrical and Computer Engineering, the University of Iowa {C.R.Oost, B.P.F.Lelieveldt}@lumc.nl 2
Abstract. This paper describes a Multi-View Active Appearance Model (AAM) for coherent segmentation of multiple cardiac views. Cootes’ AAM framework was adapted by considering shapes and intensities from multiple views, while eliminating trivial difference in object pose in different views. This way, the coherence in organ shape and intensities between different views is modeled, and utilized during image search. The method is validated in two substantially different and novel applications: segmentation of combined enddiastolic and end-systolic left ventricular X-ray angiograms, and simultaneous segmentation of a combination of four chamber, two chamber and short-axis cardiac MR views.
1 Introduction In cardiac imaging, typically multiple acquisitions are acquired within one patient examination following fixed imaging protocols, where images may depict different geometrical or functional features of the heart. For instance, in cardiac MR imaging, the short-axis, long-axis, perfusion, rest-stress and delayed enhancement images provide complementary information about different aspects of geometry and function of the same heart. Also, in bi-plane Left-Ventricular (LV) X-ray angiography, different views are acquired of the LV, which are the left anterior oblique 60 and right anterior oblique 30, showing the left ventricle from different projection angles. Different time frames from an angiographic or echocardiographic image sequence are other examples of such interrelated views. To quantify cardiac function and morphology from such image sets, a (preferably automatic) segmentation of the heart is required. However, typically, automatic segmentation methods focus on one subpart of a patient examination. Segmentation is achieved for one view at a time, and the different parts of a patient examination are treated separately. As a result, not all available information is used to achieve a segmentation result, since additional shape information of the same organ may be available from a different view. The goal of this work was to develop a segmentation method that exploits existing shape- and intensity redundancies and correlations between different parts of a patient examination. Potentially, this increases robustness,
C.J. Taylor and J.A. Noble (Eds.): IPMI 2003, LNCS 2732, pp. 234–245, 2003. © Springer-Verlag Berlin Heidelberg 2003
Multi-view Active Appearance Models
235
and enforces segmentation consistency between views, therefore yielding a better segmentation. To realize this, we have developed the Multi-View Active Appearance Model (AAM): an extension of Cootes’ AAM framework [1-5] that captures the coherence and correlation between multiple parts of a patient examination. Model training and matching are performed on multiple 2D views simultaneously, combining information from all views to yield a segmentation result. To investigate the clinical potential, we validate the Multi-View AAM in two substantially different and largely unsolved segmentation problems: automatic definition of the LV contours in pairs of LV X-ray angiograms in ED and ES phase, and second, simultaneous LV contour detection in a combination of short-axis, four chamber and two chamber cardiac MR views. The remainder of the paper is structured as follows. Section 2 provides a brief background on AAMs and their application to medical image segmentation. In Section 3, the Multi-View AAM is described. Section 4 presents an experimental validation study on LV angiograms (70 patients) and cardiac MR images (29 patients). Section 5 concludes with a discussion.
2 Background An Active Appearance Model is a statistical model of object shape and texture. The construction of the AAM and the matching procedure are briefly introduced in this section. A detailed description can be found in [3]. 2.1 AAM Training An AAM is trained on a series of representative images, in which an expert manually segmented the object of interest. Contours are resampled in n corresponding points, and expressed as a vector of 2n elements: x = (x1 , y1 , x 2 , y 2 , x 3 , y 3 ,..., x n , y n )T
(1)
After Procrustes alignment of the shape vectors to eliminate trivial pose differences, a shape model is built by applying Principal Component Analysis (PCA) on the sample covariance matrix. Arranging the eigenvectors according to descending eigenvalues enables elimination of less significant eigenvectors. Similarly, a texture model is created by warping the training images onto the mean shape and creating a shape free patch, from which pixel intensity vectors g are extracted. Texture vectors are normalized to zero average and unit variance and PCA is performed on the sample covariance matrix, resulting in the statistical texture model. Using the shape and texture models, the sample shapes x and textures g can be approximated from the respective models:
x ≈ x + Ps bs and
g ≈ g + Pg bg
(2)
236
C.R. Oost et al.
where g and x represent the average texture and shape vectors, Pg and Ps the texture and shape eigenvector matrices, and bg and bs the texture and shape parameters characterizing each training sample. From the shape and texture models, an AAM is created by concatenating the shape and texture parameter vectors: Wbs WPsT (x − x ) = T b = P (g − g ) b g g
(3)
W denotes a weight factor coupling the shape and texture coefficients. After a final PCA over the set of appearance vectors b the resulting AAM can be written as b = Qc
(4)
in which Q is the matrix containing the eigenvectors and c denotes the appearance parameters for the combined model. Matching the model to an unseen image involves minimizing the difference between the model generated image and the target image, within the boundaries of statistically plausible model limits. To drive the model matching iterations, the parameter update steps are computed from the residual images δg 0 = g s − g m , where gs denotes the target image, and gm the model synthesized image. By applying known parameter perturbations on the model, pose and texture, gradient matrices Rc, Rp and Rt can be estimated for the model, pose and texture respectively. In our approach, we adopted Cootes’ direct gradient method [5]. 2.2 AAM Matching From the current estimate of the model parameters c0 and the parameter derivatives for the model, texture and pose parameters (matrices Rc, Rt & Rp respectively), Cootes describes an iterative matching algorithm, consisting of the following steps [2]: 1) Calculate the residual between target image and model patch δg 0 = g s − g m 2) Calculate the intensity error E 0 = δg 0
2
3) Using the pre-computed gradient matrices, determine the model parameter update δc = Rcδg 0 , pose update δp = R p δg 0 and texture update δt = Rt δg 0 , 4) Set k = 1 and determine a new estimate for the model parameters c1 = c 0 − kδc , pose parameters p1 = p 0 − kδp and texture parameters t1 = t 0 − kδt 5) Calculate a new model based on c1, p1 & t1, 6) Determine a new difference-vector and calculate error E1, 7) If E1 < E0, select c1, p1 & t1 as the new parameter vectors, else try k = 1.5, k = 0.5, k = 0.25, k = 0.125 etc and go to step 4. Repeat until convergence (either using a fixed number of iterations, or until no improvement is achieved).
Multi-view Active Appearance Models
237
2.3 Medical Applications of AAMs Since introduction, several successful medical applications of AAMs in medical image segmentation have been presented. Initially, Cootes has demonstrated the application of 2D AAMs on finding structures in brain MR images [2], and knee cartilage in MR images [3]. In 2D cardiac MR images, Mitchell et al. successfully applied AAMs to segment the left and right ventricle [6]. Thodberg[7] applies a 2D AAM to reconstruct bones in hand radiographs. Bosch et al. applied 2D + time AAM to segment endocardial borders in echocardiography [8], introducing a correction method to compensate for non-Gaussian intensity distributions in echocardiographic images. Beichel et al describe a semi-3D AAM extension applied to segmentation of the diaphragm dome in 3D CT data [9]. Mitchell et al. describe a full 3D AAM extension, and apply it to 3D cardiac MR data and 2D + time echocardiograms [10]. In many of the applications mentioned here, Active Appearance Models have been shown to outperform other segmentation approaches for two reasons: • They combine correlated intensity and shape knowledge, thus maximally integrating a-priori knowledge, resulting in highly robust performance, • They model the relationship between expert contours and underlying image data, and are therefore capable of reproducing expert contour drawing behavior.
3 Multi-view Active Appearance Models The Multi-View AAM presented here is designed to exploit the existing correlation between different views of the same object. It is derived from Cootes’ work on coupled view AAMs [4], where a frontal and a side view of a face are segmented simultaneously by building separate models for each view, and a combined model for both views. During model matching, segmentation is performed using single view models, however shape constraints are applied from a combined model. The approach presented here differs in that the organ shape is modeled simultaneously for all views from the start, contrary to only imposing model constraints from a combined model. The Multi-View model is constructed by aligning the training shapes for different views separately, and concatenating the aligned shape vectors xi for each of the N views. A shape vector for N frames is defined as:
(
T
T
x = x1 , x 2 ,K x N
)
T T
(5)
By applying a PCA on the sample covariance matrix of the combined shapes, a shape model is computed for all frames simultaneously. The principal model components represent shape variations, which are intrinsically coupled for all views. For the intensity model, the same applies: an image patch is warped on the average shape for view i and sampled into an intensity vector gi, the intensity vectors for each single frame are normalized to zero mean and unit variance, and concatenated:
(
g = g 1T , g 2T , K , g TN
)
T
(6)
238
C.R. Oost et al.
Analogous to the single frame AAM, a PCA is applied to the sample covariance matrices of the concatenated intensity sample vectors, and subsequently each training sample is expressed as a set of shape- and appearance coefficients. A combined model is computed from the combined shape-intensity sample vectors. In the combined model, the shape and appearance of both views are strongly interrelated, as is illustrated in Figure 1. -2σ
-σ
0
σ
2σ
Fig. 1. First mode of variation for a left ventricle Multi-View AAM, constructed from 70 EDES X-ray LV angiograms. Upper row = ED, lower row = ES. The correlation in shape between ED and ES is clearly visible. Also the texture variation, describing mainly the local contrast between the LV and its embedding around the mitral valve, shows clear similarities for ED and ES.
Estimation of the gradient matrices for computing parameter updates during image matching is performed by applying perturbations on the model, pose, and texture parameters, and measuring their effect on the residual images. Because of the correlations between views in the model, a disturbance in an individual model parameter yields residual images in all views simultaneously. The pose parameters however, are perturbed for each view separately: the model is trained to accommodate for trivial differences in object pose in each view, whereas the shape and intensity gradients are correlated for all views. In the matching procedure, the pose transformation for each view is also applied separately, whereas the model coefficients intrinsically influence multiple frames at once. Hence, the allowed shape and intensity deformations are coupled for all frames, whereas pose parameter vectors for each view are optimized independently. This is a significant difference as compared to Cootes’ coupled view AAMs, where separately trained 2D models are matched to each separate view, and subsequently only the appearance constraints are imposed from a combined appearance model [4].
4 Experimental Validation To determine the clinical utility of the Multi-View AAMs, we investigated two issues: • To what extent can information from different frames improve overall segmentation performance? To address this, we tested the Multi-View AAM on Xray LV angiography images in ED and ES. Though other segmentation methods
Multi-view Active Appearance Models
239
for LV angiograms have been reported [11,12], these images are notoriously difficult to segment, especially in ES. This is mainly due to the fact that in ES, much of the contrast agent is ejected, therefore border definition of the ventricle is rather poor. For this modality, we expect that the better LV shape definition in ED frames improves the segmentation of ES frames. • The potential of the Multi-View AAM to segment substantially different geometrical shapes in multiple views. To evaluate this, we selected a combination of cardiac MR short-axis and long-axis views. To our knowledge, this is the first report of an automatic contour detection for endo- and epicardial contours in longaxis cardiac MR views. 4.1 X-Ray LV Angiography The effectiveness of the Multi-View AAM was tested on ED-ES pairs of clinically representative LV angiograms from 70 infarct patients, 140 images in total. Apart from high quality images with good LV definition in both ED and ES, images were selected, in which frequently appearing acquisition artifacts were present (poor LV contrast, inhomogeneous distribution of the contrast agent, presence of a diaphragm overlapping the LV). Figure 2 shows representative examples. An expert manually defined contours in both frames, and point correspondence was defined based on three prominent landmarks: both aorta valve points and the apex. Every contour was equidistantly resampled to 60 points. 14 ‘leave five out’ models were trained on 65 out of 70 ED-ES image pairs, leaving out 5 sets for testing purposes. All models were constructed retaining 98% of available shape information and 95% of available intensity information, generally resulting in 27 modes of variation. To speed up the training and matching process and to reduce model dimensionality, all images were subsampled by a factor of 4.
Fig. 2. X-ray LV angiography example images for ED (upper row) and ES (lower row). From left to right: well defined LV, poor contrast, inhomogeneous distribution of the contrast agent (most apparent in ED) and presence of a diaphragm overlapping the LV.
240
C.R. Oost et al.
4.2 Cardiac MRI To assess the performance of the Multi-View AAM method for simultaneous segmentation of several different cardiac views with a different geometric definition, the method was evaluated on a commonly acquired combination of cardiac MR views. Usually, during acquisition of a routine cardiac MR patient exam, a two chamber view, a four chamber view and a short-axis stack are acquired following strictly defined acquisition protocols, allowing an optimal depiction of LV anatomy. Following this protocol, image data was acquired from 29 patients with various cardiac pathologies. The Multi-View AAM was constructed based on the ED two chamber view, the ED four chamber view and the ED mid-ventricular short-axis slice. Endo- and epicardial contours were drawn manually by an expert observer in all views, using the posterior LV-RV junction to define point correpondence. To maximize the amount of evaluation data, validation and training was performed using a leave-one-subject-out approach. The initial position for the model matching was manually set by indicating the apex and base in the long-axis views, and the LV midpoint in the short-axis views. 4.3 Evaluation Method Matching results for each patient study were first qualitatively scored to three categories: matching success for all views, failure in one view and failure in more than one view. Failures were reported and excluded from quantitative evaluation. On the successful matches, quantitative comparison with expert contours was performed on: • point-to-curve border positioning errors for the contours as compared to the manually defined expert contours, calculated separately for each view, • endocardial contour area for each frame separately, • for the LV angiography application, area ejection fraction. Linear regression was used to determine relationships between manually traced and computer determined values. A two-tailed paired samples t-test was applied to area measurements from automatic and manual contours to investigate systematic errors. A p-value smaller than 0.05 was considered significant.
4.4 Results For the LV angiographic study, the Multi-View AAM yielded borders that agreed closely with the expert defined outlines in both ED and ES in 56 out of 70 patients. In 10 cases, partial failure was observed, where the contour in one frame clearly failed. In 4 cases, neither ED nor ES contours were correctly detected. In total, 122 out of 140 images (87 %) were successfully segmented, whereas in the other 18 images, manual interaction was required. In general, for the successful matches, contours showed an excellent agreement with the manually defined contours, even in compelling images with artifacts such as LV
Multi-view Active Appearance Models
241
diaphragm overlap, and partial filling. In Figure 3, two representative examples of automatically detected contours are given. Border position errors were generally small, and are given in Table 1. Area and ejection fraction regressions are given in Figure 4. In both ED and ES phases, area errors showed a slight, but statistically significantly underestimation (p<0.001, relative error for ED: 3.5 %, for ES 9.4 %). The area ejection fraction was slightly overestimated (relative error 7%, p=0.003). ED
ED
ES
ES
Fig. 3. Two successful matches for ED and ES. The black dotted lines denote the manual contours, the white dotted lines represent the model contours. Note that even with inhomogeneous contrast agent distribution (ES image top, ED image below), contours are accurately determined.
The cardiac MR validation yielded 27 successful matches out of 29, and in 2 cases partial failure was observed, where the model drifted away from the LV boundaries in one of the three views. No total failures occurred. Examples of automatically detected contours in the cardiac MR views are given in Figure 6. For the contours from successful matches (87 out of 89 images in total, 98 %), area correlations between manually and automatically detected contours are given in Figure 5, and border positioning errors in Table 1. In a paired samples t-test, differences between manually and automatically determined endocardial contour areas were found statistically insignificant for all three views (p>0.7 for all views).
C.R. Oost et al.
ES Area
40 20 0 0
20
40
60
y = 0.84x + 1.24
50 30 20 10 0
80
0
y = 0.87x + 8.35
80
R2 = 0.85
40
R2 = 0.73
60 40 20 0
10 20 30 40 50
[cm2]
Manual
Area Ejection Fraction
Automatic
2
[cm ]
80 y = 0.92x + 1.72 R2 = 0.91 60
Automatic
Automatic
2
[cm ]
ED Area
[%]
242
0
[cm2]
Manual
20
40
60
Manual
80
[%]
Fig. 4. Area regression plots for ED (left) and ES (middle) and area ejection fraction (right).
3
3
R2 = 0.88
3
6y = 1.09x - 0.21 R2 = 0.93
2
4
2
1 0 0
1 Manual
2
3
4
[103 pixel]
Automatic
Automatic
2
y = 0.97x + 0.03
4
3
R2 = 0.89
[10 pixel]
y = 1.05x - 0.13
4
Two Chamber
Automatic
Short Axis [10 pixel]
3
[10 pixel]
Four Chamber
1 0 0
1 Manual
2
3
[103 pixel]
4
0 0
2 Manual
4
6
[103 pixel]
Fig. 5. Area regression plots for four-chamber (left), short-axis (middle) and two chamber (right) cardiac MR views.
Table 1. Point-to curve border positioning errors in pixels for the cardiac MR and LV angiography validation studies
MRI 2 CH MRI 4 CH MRI SA LV angio ED LV angio ES
Border positioning errors (pixels) 1.7 ± 0.8 1.5 ± 0.7 1.4 ± 0.7 6.5 ± 2.8 8.0 ± 3.7
5 Discussion and Conclusions In general, the presented Multi-View AAM yielded good results in two challenging clinical segmentation problems. Contours were detected with a minimal user interaction to initially position the model, and showed high agreement with manually defined contours. Especially in ES LV angiograms, segmentation results were very good compared to other segmentation methods reported for this modality [11,12].
Multi-view Active Appearance Models
243
This good performance in ES images can mainly be attributed to the coupling of information from both ED and ES.
Fig. 6. Automatically detected contours (white dotted lines) for two patients (top and bottom row) in a four-chamber (left), short-axis (middle) and two-chamber view
In LV angiography, a success rate of 87 % was achieved. Matching failure mainly occurred in cases where contrast was extremely low, when there was a significant overlap between the LV and the diaphragm or in cases of large dilated areas near the apex, as is illustrated in Figure 7. Comparison between manually and automatically derived area measurements showed a good correlation, though a slight underestimation of LV area in both ED and ES was present. This underestimation is mainly caused by the lack of dynamic information: a manual observer draws the contours in ED and ES after reviewing the whole dynamic sequence, whereas computer borders are only based on ED and ES views. When manually examining an entire image run, this motion is used to decide on the border location of the ventricle, especially in “problem areas”; therefore the manual borders are generally drawn slightly wider around the ventricle than visually apparent in only ED and ES. Also, since interpretation and contour drawing in LV angiograms is highly subjective, an assessment of intra- and inter-observer variation inherent to manual contour drawing is ongoing, to compare the accuracy and reproducibility of the automated method for different experts. The cardiac MR study showed a significantly higher success rate than the LV angiography study: in 98 % of the images, a successful match was achieved. This can mainly be attributed to the better definition of the ventricle in cardiac MR views. Though acquisition related artifacts were present in some patient studies (surface coil intensity gradients), overall LV endo- and epicardial contour definition is significantly
244
C.R. Oost et al.
stronger in the cardiac MR study. Area calculations, which serve as a basis for LV volume estimates, did not differ statistically significantly between manual and automatic analysis. Also for this application, border positioning errors were small (comparable to errors reported in [6]), and well within clinically acceptable margins.
Fig. 7. Examples of segmentation failures for ED (upper row) and ES (lower row), due to poor contrast (left), overlap between LV and diaphragm (middle) and large dilated areas near the apex (right). The black dotted lines denote the manual contours, the white dotted lines represent the model contours.
In this study we have tested the Multi-View AAM robustness and performance from a manually set initial position, yielding good results. However, we foresee a further increase in robustness by also coupling the scale of the object in all views, since this is correlated as well between views. This is a topic of current research. Moreover, future research will focus on analysis of Multi-View AAM shape parameters to distinguish between pathologies. We expect the coupling of shape information from different parts of a patient examination to enhance pathology identification. For cardiac MR, methods to automatically position the initial models based on a geometrical thorax template model [13] will be investigated. In summary, we conclude that the Multi-View AAM presented here combines a high robustness with clinically acceptable accuracy. It demonstrated good automatic segmentation results for two substantially different and novel clinical applications. A cardiac MR case study showed the utility to simultaneously segment different geometrical shapes, and a case study on LV X-ray angiography proving that poor ventricle definition in one view (ES) can be resolved by information from a corresponding ED view.
Multi-view Active Appearance Models
245
Acknowledgements. Financial support by the Dutch Technology Foundation STW (Project LGN 4508) is gratefully acknowledged. B.P.F. Lelieveldt is supported by an Innovational Research Incentive 2001 grant from the Netherlands Organization for Scientific Research.
References 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13.
T.F. Cootes, G.J. Edwards and C.J. Taylor, Active Appearance Models, Proc. ECCV Vol. 2: 484–98, 1998. T. F. Cootes, C. Beeston, G. J. Edwards, and C. J. Taylor, A Unified Framework for Atlas Matching Using Active Appearance Models, LNCS Vol. 1613, pp 322–333, 1999 T.F. Cootes and C.J. Taylor, Statistical Models of Appearance for Computer Vision,: http://www.wiau.man.ac.uk/~bim/Models/app_model.ps.gz, 2001. T.F. Cootes, G.V. Wheeler, K.N. Walker, C.J. Taylor, View-based active appearance models, Image and vision computing Vol. 20, pp 657–664, 2002. T.F.Cootes, P.Kittipanya-ngam, Comparing Variations on the Active Appearance Model Algorithm, Proc.BMVC, Vol.2, pp 837–846, 2002. S.C. Mitchell, B.P.F. Lelieveldt, R.J. van der Geest, H.G. Bosch, J.H.C. Reiber and M. Sonka, Multistage Hybrid Active Appearance Model Matching: Segmentation of Left and Right Ventricles in Cardiac MR Images, IEEE TMI, Vol. 20(5): 415–23, 2001. H.H. Thodberg, Hands on Experience with Active Appearance Models, Proc SPIE Medical Imaging, Vol. 4684, pp 495–506, 2002. J.G. Bosch, S.C. Mitchell, B.P.F. Lelieveldt, F. Nijland, O. Kamp, M. Sonka and J.H.C. Reiber, Automatic Segmentation of Echocardiographic Sequences by Active Appearance Motion Models, IEEE TMI Vol. 21(11), pp 1374–1383, 2002. R. Beichel, G. Gotschuli, E. Sorantin, F. Leberl, M. Sonka, Diaphragm Dome Surface Segmentation in CT Data Sets: A 3D Active Appearance Model Approach, Proc SPIE Medical Imaging, Vol. 4684, pp 475–484, 2002. S.C. Mitchell, J.G. Bosch, B.P.F. Lelieveldt, R.J. van der Geest, J.H.C. Reiber, M. Sonka, 3D Active Appearance Models: Segmentation of Cardiac MR and Ultrasound images, IEEE TMI, Vol. 21(9), pp 1167–1178, 2002. S. Tehrani, T.E. Weymouth, G.B.J. Mancini, Model Generation and Partial Matching of Left Ventricle Boundaries, Proc. SPIE, Vol. 1445, pp 434–45, 1991. P. Lilly, J. Jenkins, P. Bourdillon, Automatic Contour Definition on Left Ventriculograms by Image Evidence and a Multiple Template-Based Model, IEEE TMI, Vol. 8(2): 173–85, 1989. B. P. F. Lelieveldt, M. Sonka, L. Bolinger, T. D. Scholz, H. W. M. Kayser, R. J. van der Geest, and J. H. C. Reiber, “Anatomical Modeling with Fuzzy Implicit Surface Templates: Application to Automated Localization of the Heart and Lungs in Thoracic MR Volumes.” CVIU, Vol. 80, pp1–20, 2000.
Tunnelling Descent: A New Algorithm for Active Contour Segmentation of Ultrasound Images Zhong Tao1, C. Carl Jaffe 2, and Hemant D. Tagare1,3 1 3
Dept. of Electrical Engineering, Yale University, New Haven CT 06520. 2 Dept. of Internal Medicine, Yale University, New Haven CT 06520. Dept. of Diagnostic Radiology, Yale University, New Haven CT 06520.
Abstract. The presence of speckle in ultrasound images makes it hard to segment them using active contours. Speckle causes the energy function of the active contours to have many local minima, and the gradient descent procedure used for evolving the contour gets trapped in these minima. A new algorithm, called tunnelling descent, is proposed in this paper for evolving active contours. Tunnelling descent can jump out of many of the local minima that gradient descent gets trapped in. Experimental results with 70 short axis cardiac ultrasound images show that tunnelling descent has no trouble finding the blood-tissue boundary (the endocardium). This holds irrespective of whether tunnelling descent is initialized in blood or tissue.
1
Introduction
Ultrasound images are ubiquitous in cardiology, and their automated segmentation is very desirable, especially the segmentation of the endocardium (the blood-tissue boundary of the left ventricle). It is generally accepted that active contours perform poorly at this task. The problem is that active contours evolve by gradient descent of an energy function and get trapped in local minima. This behavior is exacerbated in ultrasound images because ultrasound signal contains speckle. Speckle causes the energy function to have many local minima. If gradient descent is initialized far from the global minimum, it gets trapped immediately, giving a grossly incorrect answer. In this paper, we suggest an alternative to pure gradient descent. For maximumlikelihood active contour segmentations, we propose a new minimization strategy called tunnelling descent. Tunnelling descent can tunnel out of most spurious minima and keep the energy minimization going. In practice, tunnelling descent reliably finds the endocardium without manual intervention or parameter tweaking. This happens even when the active contour is initialized far from the final answer. 1.1
Escaping from Local Minima
Two observations help us design tunnelling descent: C.J. Taylor and J.A. Noble (Eds.): IPMI 2003, LNCS 2732, pp. 246−257, 2003. Springer-Verlag Berlin Heidelberg 2003
Tunnelling Descent: A New Algorithm for Active Contour Segmentation
247
1. Because the probability density of speckle has a significant tail, there are pixels within a region - say blood - whose gray levels appear to come from a different region - tissue. 2. Gradient descent evolves the contour by only using image information that is local to it (see fig. 1). Hence, when the contour meets pixels described above, it gets trapped. Of course, the problem is that gradient descent uses only local information. Clearly the solution to detecting and escaping local minima is to look in a larger neighborhood of the contour and check whether the contour is truly at the boundary or is trapped inside a region. If we extend a region Ω outside the contour (fig. 1), then by using the gray levels in Ω it should be possible to decide whether or not Ω contains blood or tissue, i.e. whether the contour is at a local minimum or at the true boundary. 1.2
CUSUM and RSPRT
Because speckle is random, the decision using gray levels in Ω may not be perfect. It is subject to error. The error rate of a decision depends on the sample size and can be lowered only by adding more samples, i.e, by making Ω larger. On the other hand, there is not much point in using an arbitrarily large Ω since a large region might extend well beyond the true boundary giving rise to other errors. The trick is to adopt a strategy where, depending on the data, Ω is increased only to the degree that is necessary for making the decision with a specified error rate. There is a well-known statistical decision procedure called the sequential probability ratio test (SPRT) that does exactly this. Adding a modified SPRT to gradient descent gives tunnelling descent. Tunnelling descent can tunnel out of most local minima. It is based on CUSUM and RSPRT, which are one-dimensional change detection algorithms used in operations research [1]. We have implemented an active contour algorithm which evolves using tunnelling descent and processed 70 ultrasound cardiac images. Tunnelling descent reliably finds the endocardium when initialized in blood. What is even more convincing is that tunnelling descent works when initialized in the myocardial tissue. The energy function of an active contour is far more jagged when it is initialized in tissue. The results of myocardium initialization truly show the ability of tunnelling descent to escape from spurious local minima. In this paper we report experiments with a classical implementation of a 2D active contour as a curve with controllable knot points. Several extensions are possible, including a level set version, and the addition of shape priors. These will be forthcoming. The extension to 3D appears to be straightforward and is also being investigated (although it is available, 3D ultrasound is not yet common). Most of the segmentations in this paper are 2D. A terse mathematical description of tunnelling descent may not shed light on the key ideas behind it. In this paper, we choose to provide a more explanatory exposition, instead of providing the entire theory and equations. We hope that the reader will indulge us.
248
Z. Tao, C.C. Jaffe, and H.D. Tagare
Finally, although our exposition is in terms of segmenting ultrasound images, the algorithm is quite general and can be used to segment other texture images.
2 2.1
Background Speckle and Ultrasound Segmentation
Ultrasound images are formed by recording the reflections of a high frequency sound wave propagating through an object. Inhomogeneities in the propagating medium cause backscatter, and this backscatter appears as speckle in the image. Speckle manifests as a gray level random process in the image whose variance is roughly proportional to the mean [2]. Theoretical models for speckle statistics have been proposed [3], but their use in segmentation is limited. Simple empirical models have also been proposed. In particular, we follow the recommendation in [2] to use the Gamma distribution to model speckle. We assume that gray levels within blood or tissue are i.i.d. random variables with the Gamma distribution: p(I | α, β) =
xα−1 −x eβ , Γ (α)β α
(1)
where, α is the shape parameter, β is the scale parameter, and Γ is the gamma function. The empirical study [2] suggests that the shape parameter α can be set to 3.0 for blood and 6.5 for tissue, while the scale parameter needs to be estimated from the data. Given an area Ω within which gray levels follow the Gamma distribution, the maximum likelihood estimate of the scale parameter is given by the mean value of the gray levels divided by the shape parameter α: % 1 βˆ = I(x, y) dA. (2) α|Ω| Ω Speckle and Segmentation. In one approach, ultrasound images are preprocessed to reduce or eliminate speckle before segmentation. The image may be pre-processed using Markov Random Fields [5], motion-adaptive filters [6], multiscale wavelet analysis [7], and spatiotemporal analysis [8]. Direct segmentation (i.e. without speckle reduction) has also been attempted via Active Appearance-Motion Models [9] (AAMM), by using texture and gray level features [10], spatio-temporal processing [11], and combining motion and active contour information [12]. Our approach to segmentation is a maximum-likelihood approach to active contours. 2.2
Maximum-Likelihood Active Contours
Let O ∈ R2 be the rectangular region in the plane which contains the image I. Suppose that O contains two distinct regions separated by a curve. Further
Tunnelling Descent: A New Algorithm for Active Contour Segmentation
249
assume that pixel gray levels of the regions inside and outside the curve are random variables (all pixels inside are i.i.d. as are all pixels outside ) whose probability distribution is given by p0 (u) and p1 (u) resp. We are given the image I, and have to estimate the boundary. The log-likelihood that any curve C is the boundary is % % L(C) = log p0 (I(x, y))dA + log p1 (I(x, y))dA, C
O−C
where the first integral is over the inside of C and the second over the outside of C. The maximum-likelihood estimate of the boundary is the curve that maximizes L(C). It is easy to show that L(C) = −E(C)+const., where, % p1 (I(x, y)) dA, (3) E(C) = log p0 (I(x, y)) C is the “energy” of the curve C and “const.” is a constant that is independent of C. Thus the maximum-likelihood estimate of C is the curve that minimizes E(C). The usual technique of minimizing the energy is to initialize the curve C in the image and evolve it via gradient descent: ∂C = −λ δE(C), ∂t
(4)
where δE(C) is the first variation of E(C). This evolving curve becomes stationary when δE(C) = 0, i.e. when the curve is at a local minimum of E(C). 2.3
The Sequential Probability Ratio Test (SPRT)
The SPRT is a statistical test for deciding between two hypotheses, say H 0 and H1 [13]. Given a sequence of i.i.d. observations x1 , . . . , xn , which come from the distribution p0 under H0 or the distribution p1 from H1 , the test decides to either accept one of the hypotheses or to gather another data point (xn+1 ). The SPRT decision is to if L(x1 , . . .) ≤ B, accept H0 obtain the next observation xn+1 and repeat this test if B < L(x1 , . . .) < A, if L(x1 , . . .) ≥ A, accept H1 where, L(x1 , . . .) = log p1 (x1 , . . .)/p0 (x1 , . . .) =
& i
log p1 (xi )/p0 (xi ) =
&
si
(5)
i
is the log-likelihood ratio of all observations made so far, and A and B are thresholds, usually set to positive and negative numbers respectively. The SPRT has a number of interesting properties:
250
Z. Tao, C.C. Jaffe, and H.D. Tagare
1. The errors rates (false positives and false negatives) can be independently set for an SPRT by adjusting A and B. 2. SPRT requires the minimum number of observations to achieve the specified error rates [13]. 3. The SPRT terminates almost surely, i.e. it does not suspend decision and gather data for ever. An intuitive explanation of the last property will help further our understanding of tunnelling descent. In equation (5), si are i.i.d random variables. If the data come from H1 then si has a positive mean, else if the data comes from H0 , si has a negative mean. Thus as more data is considered the log likelihood ratio forms a random walk with constant drift, the drift being positive under H1 and negative under H0 (see fig. 2). Such a process crosses the limits A and B almost surely for any finite A and B. For theoretical details please consult [1].
3
The Idea behind Tunnelling Descent
We can now explain the idea behind tunnelling descent. To keep the explanation simple we will discuss a one-dimensional segmentation example. Figure 4 shows a scan line from a short axis ultrasound image and figure 5 shows the gray levels I(n) as a function of the location n along the line. We wish to find the blood tissue boundary (which is marked in the figure) in this one-dimensional “image.” For the sake of discussion assume that p0 (u) and p1 (u), the probability densities of gray levels of blood and tissue, are known. Then, the maximum likelihood'estimate of the location of the boundary is that n which minimizes n E(n) = i=1 log p1 (I(i))/p0 (I(i)). Figure 6 shows E(n) for the gray levels in figure 5. The jagged nature of the energy function is very apparent. Suppose we initialize gradient descent at n = 1. It will be trapped at the local minimum at n0 = 7. If this minimum was the global minimum then the gray levels to its right would come from tissue. To test this, we can conduct an SPRT with H0 saying that the data to the right comes from tissue, and H1 saying the data comes from blood, and A = 20, B = 0. Note, by comparing the form of E(n) to the log-likelihood of equation (5), that the log-likelihood function of the SPRT with k observations to the right of n0 is simply E(n0 + k) − E(n0 ), so that the SPRT decides H0 if for some k > 0, E(n0 + k) − E(n0 ) < B or it decides H1 if for some k > 0, E(n0 + k) − E(n0 ) > A, whichever condition occurs first. Since n0 is in the blood, by the argument given in the preceding section, the log-likelihood E(n0 + k) − E(n0 ) is a random walk with a negative drift and E(n0 + k) − E(n0 ) < B is obtained at n1 = n0 + k = 15. At this point we know two things: [1] the region from n0 to n1 is blood, not tissue, and [2] E(n1 ) = E(n0 + k) < E(n0 ) (recall that B = 0). So we may resume gradient descent from n1 . Clearly, the SPRT has helped gradient descent “tunnel” out of the local minima at n0 . By repeating this procedure at other local minima in blood, we can tunnel out of all of them. Now suppose that gradient stops at the global minima which occurs at n = ng = 37. Because the gray levels at the right now really do come from tissue,
Tunnelling Descent: A New Algorithm for Active Contour Segmentation
251
E(n) drifts upwards and crosses A at n = 44. Thus, the SPRT at ng informs us that the data at the right comes from tissue and we may stop here and take n g to be a good estimate of the boundary. There is one point that we have not clearly addressed. How were the values of A and B decided? The value of A can be simply decided by the error rate we need for detecting tissue. Deciding the value of B is more involved. If B is significantly negative then it is quite possible that the SPRT at a local minimum (which lies close to the global minimum) might erroneously decide the there is tissue on the right hand side. To avoid this, it can be shown that the appropriate value for B is 0 [1]. Also B = 0 lets us restart gradient descent as explained above. Thus, the algorithm is: 1. Initialize gradient descent till a local minimum is found at n∗ 2. From n∗ , run SPRT with a preset value of A and with B = 0. If after processing some samples, the SPRT decides that the data to the right is tissue, terminate. Else if it decides after sampling k points that the data to the right is blood, re-initialize gradient descent at n∗ + k. Go to step 1. The above algorithm is identical to the repeated SPRT and the CUSUM algorithms of operations research [1]. The description given above is a slightly non-standard interpretation in terms of gradient descent. We prefer this interpretation because it suggests an obvious extension to active contours. We hasten to add that the algorithm given above does not always give a global minimum because the SPRT does have a finite error rate. It may be wrong when it decides that there is tissue to the right and terminates. However, this error rate can be made very small by choosing an appropriate threshold A, and in practice the algorithm can reject most spurious local minima.
4
SPRT in 2D
We can now extend these ideas to active contours. The first task is to extend the SPRT to two-dimensions. Recall that the idea of SPRT is to monotonically increase the number of samples being used at a fixed rate until a reliable decision can be made. In the 1-D case this simply means that we add one data sample to the SPRT at a time and the number of samples being considered grows monotonically to the right. In 2d, it is more complicated. Suppose we want to test the region outside a curve C using SPRT. Then we need to generate a sequence of curve C1 , C2 , . . . (see fig. 3) such that 1. The sequence is monotonic. That is, each curve Cn+1 contains Cn , with C1 containing C, and 2. The areas of C, C1 , C2 , . . . grow by a constant increment, i.e. | Ci |=| C | +i%, for some % > 0. This condition ensures us that a fixed number of pixels are added at every step. However, in contrast to the 1-D case, this sequence is no longer unique. Consider C1 : there are many curves which contain C and which have the area
252
Z. Tao, C.C. Jaffe, and H.D. Tagare
| C | +% and it is not clear which of these curves we should choose as C1 . The same problem arises for C2 , C3 , . . .. To resolve this question, recall that the SPRT detects a spurious local minima at C if for some k > 0, E(Ck ) < E(C). Since we are interested in detecting as many spurious local minima as possible, it stands to reason that we choose that C1 which has the smallest energy amongst all possible candidate curves (so that E(C1 ) < E(C) has a greater chance of occurring). Having chosen such a C1 , we can choose C2 in exactly the same way - it is the curve with the smallest energy E(C2 ) among all curves of area | C | +2% containing C1 , and so on for C3 , C4 , . . .. Let us call this sequence the SPRT curve sequence from C. The 2D SPRT is the following. Given a curve C, the SPRT decision about the region outside of C is: if E(Cn ) − E (C) ≤ B, accept H0 obtain the next observation Cn+1 and repeat this test if B < E (Cn ) − E (C) < A, , if E(Cn ) − E (C) ≥ A, accept H1 where C, C1 , C2 , . . . is an SPRT curve sequence from C.
5
Tunnelling Descent for Active Contours
Tunnelling descent for active contours is simply a modified gradient descent algorithm that uses the 2d SPRT to reject spurious local minima: Tunnelling Descent: 1. Choose a % > 0. Initialize the contour at C in blood or tissue. 2. Proceed by gradient descent until a local minimum occurs, say at C0 . 3. Create the SPRT sequence C1 , C2 , . . . , Ck , . . . from C0 and carry out the SPRT. If at some k, the SPRT decides on H1 , terminate the algorithm with C0 as the estimate of the boundary. If the SPRT decides on H0 , set C0 to Ck and go to step 2. The SPRT sequence C0 , C1 , . . . can be created as follows. For n = 0, 1, . . ., dilate the curve Cn to obtain a curve Dn+1 of area | C0 | +(n + 1)%. Starting from Dn+1 conduct a gradient descent to minimize the energy E subject to the area of the curve being constant. The minimizer is Cn+1 . In practice, it is sufficient to use only a local minimum. 5.1
Tunnelling Descent with Parameter Estimates
So far we assumed that the distributions p0 and p1 were completely known. Of course, in reality they are usually known up to a few parameters. Let θ0 and θ1 be the parameters of p0 and p1 , so that the probability density that a gray level u belongs to them can be written as p0 (u | θ0 ) and p1 (u | θ1 ). At any stage of the contour evolution, the parameters θo and θ1 can be estimated by a maximum-likelihood procedure by using gray levels inside and outside the
Tunnelling Descent: A New Algorithm for Active Contour Segmentation
253
contour respectively. And the estimated parameters can be substituted in the SPRT. Thus, tunnelling descent with parameter estimates is: Tunnelling Descent with estimates: 1. Choose a % > 0. Initialize the contour at C in blood or tissue. 2. Proceed by gradient descent until a local minimum occurs, say at C0 . At each step of gradient descent calculate the maximum likelihood estimate of the parameters θ0 and θ1 using gray levels from inside and outside the contour. 3. Use the estimated parameters to create the SPRT sequence C1 , C2 , . . . , Ck , . . . from C0 and carry out the SPRT. If at some k, the SPRT decides on H1 , terminate the algorithm. C0 is the estimate of the boundary. If the SPRT decides on H0 , set C0 to Ck and go to step 2. 5.2
Shrinking Tunnelling Descent
Finally, another modification is also possible. So far we assume the contour was initialized within a region and it expanded to find the boundary. That is why the region Ω was taken to be outside C. However, there are no obstructions to creating an algorithm that shrinks the contour. We assume that the contour is initialized outside the boundary and that it shrinks towards the boundary. Two modifications are necessary for this. The SPRT curve sequence C, C1 , . . . now has to be a monotonically shrinking sequence of curves, and p0 is taken to be the gray levels outside of C and p1 taken to inside. Other than these modifications shrinking tunnelling descent is identical to expanding tunnelling descent as described above.
6
Experiments
We have successfully segmented the endocardium in 70 ultrasound images with tunnelling descent. These images were collected from the Acuson Sequoia C256 imaging system. Of the 70 images, the active contour was initialized with blood and propagated outwards in 40, and the active contour was initialized within myocardial tissue and propagated inwards in 30. 6.1
Implementation of the Algorithm
In the experiments, we added an internal energy term to the energy function to regularize the curve C: %
p1 (I(x, y)) E(C) = log dA + λ p0 (I(x, y)) C
$ C
ds.
(6)
The probability distributions p1 and p0 were set to tissue and blood distributions respectively for the outward growing active contour and to blood and tissue
254
Z. Tao, C.C. Jaffe, and H.D. Tagare
respectively for the inward growing curve. The gamma distribution of equation (1) was used to model blood and tissue. As mentioned before, the shape parameter α was set to 3.0 for blood and 6.5 for tissue. The scale parameter β was estimated from the data according to equation (2) with the region Ω in equation (2) set as follows: For blood all pixels inside the contour within a distance of D b were taken to be Ω, and for tissue all pixels within a distance of Dt were take to be Ω. The active contour was implemented as having N knot points connected by straight lines. The contours were evolved with tunnelling descent using the knot points. Additional knots were introduced during evolution so that the distance between any consecutive knot points never exceeded 20 pixels. The numerical values of all constants used in the experiment are given in Table 1. All outward propagating contours had the same constants, as did all inward propagating contours. Table 1. Values of constants used in experiments (N =num. of vertices) λ Db Dt B A Outward propagation 0.6/N 5 20 0 5000 Inward propagation 0.05 10 15 0 13000
6.2
Results
We have extensively tested tunnelling descent, but lack of space prevents us from presenting the results in detail. Figures 7 and 8 show two examples of the active contour initialized in blood. The initializations are the diamond shaped contours within the blood pool. The figures also show the final curves found by tunnelling descent. Clearly, tunnelling descent found the endocardium. Figure 9 shows the value of the energy function as the contour evolved for figure 7. There are 5 local minima marked by the vertical lines in figure 9. Tunnelling descent escape through all of them. Figures 10-11 show the contour initialized in myocardial tissue and evolving by the shrinking tunnelling descent. Again endocardium is easily found. Figure 12 shows the energy function of the contour in figure 10 as it evolved. Nine local minima are observable. Note that the energy function is more jagged than in figure 9. The ability of tunnelling descent to escape from local minima is clear in this figure. Figures 13-14 show robustness of tunnelling descent to initialization. The figures contain the same image, but the active contour is initialized at different location. Figure 15 shows two active contours, one initialized in blood and one in tissue. They both converge to the endocardium (the difference in the converged contours is due to the coarseness of the knot points of the contours).
Tunnelling Descent: A New Algorithm for Active Contour Segmentation
7
255
Conclusion and Future Work
We proposed a new algorithm - tunnelling descent - for evolving active contours that seek a maximum-likelihood estimate of the boundary. This algorithm is quite capable of rejecting spurious local minima and reliably finds the endocardium in ultrasound images. The capacity to reject spurious minima is evident in experiments - especially those in which the contour is initialized in tissue. We are not aware of any reported active contour algorithm that can find the endocardium when initialized in the tissue. Although the development of the algorithm is reported here in the context of ultrasound segmentation, it should be clear from the presentation that the algorithm is quite general only requiring the specification of probability distributions of gray levels. It may be used in any situation where such a distribution is available (or can be estimated). Much remains to be done. A level set formulation of the algorithm would be very useful especially a 3D version. The addition of prior anatomical information would also be useful for processing those ultrasound images that have significant signal drop out. These extensions are being developed.
Acknowledgements. This research was supported by the grant R01-LM06911 from the National Library of Medicine.
Fig. 1. Jumping out of a local minimum (true boundary, current contour, local neighborhood, the region W).
Fig. 2. The drift of log-likelihood (paths under H0 and H1; n: number of samples; thresholds A and B).
Fig. 3. The sequence of curves for 2D SPRT (C1, C2, C3).
Fig. 4. Cardiac Ultrasound Image. Fig. 5. The Gray Levels. Fig. 6. The Energy Function.
Fig. 7. Initial Contour in Blood 1. Fig. 8. Initial Contour in Blood 2. Fig. 9. Energy Function.
Fig. 10. Initial Contour in Tissue 1. Fig. 11. Initial Contour in Tissue 2. Fig. 12. Energy Function.
Fig. 13. Initialized to the Right. Fig. 14. Initialized at the Bottom. Fig. 15. Blood and Tissue Initializations.
Improving Appearance Model Matching Using Local Image Structure

I.M. Scott, T.F. Cootes, and C.J. Taylor

Imaging Science and Biomedical Engineering, University of Manchester, Manchester, M13 9PT, UK. [email protected]

Abstract. We show how non-linear representations of local image structure can be used to improve the performance of model matching algorithms in medical image analysis tasks. Rather than represent the image structure using intensity values or gradients, we use measures that indicate the reliability of a set of local image feature detector outputs. These features are image edges, corners, and gradients. Feature detector outputs in flat, noisy regions tend to be ignored whereas those near strong structure are favoured. We demonstrate that combinations of these features give more accurate and reliable matching between models and new images than modelling image intensity alone. We also show that the approach is robust to non-linear changes in contrast, such as those found in multi-modal imaging.
1 Introduction
This paper builds on the work of Cootes et al. [2, 5] on constructing statistical appearance models and matching them to new images using the Active Appearance Model (AAM) search algorithm. When building models of the appearance of objects it is advantageous to choose a representation of the image structure which can capture the features of interest in a way that allows a reliable comparison between model and image, and is invariant to the sorts of global transformation that may occur. For instance, when building statistical appearance models [2, 18] it is common to represent the image texture by a vector of intensity values sampled from the image, normalised by a linear transform so as to be invariant to global changes in brightness and contrast. By sampling across the whole region, all image structures are represented and all pixels treated equally (though the statistical analysis will then typically favour pixels in some regions over others, as dictated by the data). However, models based on raw intensity tend to be sensitive to changes in conditions such as imaging parameters or biological variability. Thus models built on one data set may not perform well on images taken under different conditions. Also, intensity models do not explicitly distinguish between areas of noisy flat texture and real structure, and thus may not lead to accurate fitting in AAM search. Edge based representations tend to be less sensitive to imaging conditions than raw intensity measures. Thus an obvious alternative to modelling the intensity values directly is to record the local image gradient in each direction
at each pixel. Although this yields more information at each pixel, and at first glance might seem to favour edge regions over flatter regions, it is only a linear transformation of the original intensity data. Where model building involves applying a linear Principal Component Analysis (PCA) to the samples, the resulting model will be almost identical to one built from raw intensities, apart from some effects around the border where computing the gradients incorporates some background information into the model. In this paper we present a new representation. Rather than just recording the intensities at each pixel, we record a local structure tuple. It is useful to think about the rest of this work as using texture preprocessors which take an input image, and produce an image of tuples representing various aspects of local structure. This local structure tuple can include such things as edge orientation, corner strength, etc. When sampling the image to produce a texture vector for a model, instead of sampling n image intensity values from the original image, we sample all the values from each m-tuple at n sample locations, to produce a texture vector of length nm. The local structure descriptors that we have used are gradient orientation (first discussed in a previous paper [4]), corner strength, and edge strength. We demonstrate that using all of these measures in the texture preprocessor gives improved AAM matching accuracy and reliability when compared to intensity texture AAMs alone. We demonstrate that these improvements are statistically significant. We also show that the new approach can deal with images subject to strong non-linear changes in contrast, as found in multi-modal imaging.
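As a rough sketch of this sampling scheme (names are illustrative, not from the paper): given a preprocessed image of m-tuples and n sample locations, the texture vector is simply the concatenation of the m values at each location:

```python
import numpy as np

def sample_texture_vector(tuple_image, sample_points):
    # tuple_image: (H, W, m) array of local structure tuples;
    # sample_points: (n, 2) integer array of (row, col) sample locations.
    rows, cols = sample_points[:, 0], sample_points[:, 1]
    samples = tuple_image[rows, cols, :]   # (n, m) sampled tuples
    return samples.reshape(-1)             # texture vector of length n*m
```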
2 Background
Eigen-faces [18] model the statistics of the intensities in a region of an image, and have been widely used for object location and recognition. Moghaddam and Pentland generalised this to include models of smoothed Canny edge images [14]. In the image registration community, edge maps are widely used [17]. However, they tend to use either linearly normalised gradients or squared gradients, or non-maximally suppressed edges (all pixels other than those thought to be exactly on the edge are set to zero). Edge orientation images have been used for face recognition by Hond and Spacek [7], who created histograms of edge orientation over regions and obtained good results. One of our structure descriptors can be thought of as a weighted version of edge orientation, in which strong edges are given more weight than weak edges. Rather than use a histogram we model the edge structure at every pixel. Bosch et al. [1] have used a non-linear normalisation step on intensities in echocardiograms as pre-processing before building an appearance model. The intention is to modify the strongly non-Gaussian noise statistics of such images into more normally shaped distributions. This gives significantly improved results. However, this was applied to intensities, not structure descriptors, and the approach described may not be optimal when applied to structure descriptors.
Several authors have attached feature detectors to points on a Point Distribution Model (PDM). This PDM can be automatically generated using elastic variation of a single image [10]. A manually trained, but statistically learnt, PDM can be used with profiles [3] and Gabor jets [11]. In all these approaches there is no dense model of texture, and the feature detector locations, and their effect on the shape model, have been set by humans rather than learnt. Kittler et al. [9] demonstrated that different types of image normalisation could have a significant effect on a face verification task. They found that histogram normalisation tended to perform well over a range of experiments.
3 Statistical Models of Appearance
An appearance model can represent both the shape and texture variability seen in a training set. The training set consists of labelled images, where key landmark points are marked on each example object. An appearance model can be thought of as a generalisation of eigen-patches or eigen-faces [14, 18] in which, rather than represent a rigid region, we model the shape of the region and allow it to deform. Given a training set we can generate statistical models of shape and texture variation using the AAM method developed by Cootes et al. [2]. The shape of an object can be represented as a vector s of the positions of the landmarks and the texture (grey-levels or colour values) as a vector t. This texture is sampled after the image has been warped to the mean shape. The texture preprocessing described in this paper also takes place after the texture has been warped to the mean shape. The appearance model has parameters, c, controlling the shape and texture according to

$$\mathbf{s} = \bar{\mathbf{s}} + \mathbf{Q}_s \mathbf{c}, \qquad \mathbf{t} = \bar{\mathbf{t}} + \mathbf{Q}_t \mathbf{c}$$

where $\bar{\mathbf{s}}$ is the mean shape, $\bar{\mathbf{t}}$ the mean texture, and $\mathbf{Q}_s$, $\mathbf{Q}_t$ are matrices describing the modes of variation derived from the training set. An example image can be synthesised for a given c by generating a texture image from the vector t and warping it using the control points described by s (see figure 1).

Fig. 1. Effect of varying the first two parameters of a spinal X-ray appearance model by ±3 standard deviations from the mean (panels c1 = −3, 0, +3 and c2 = −3, 0, +3).
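A minimal sketch of this generative step (assuming trained mean vectors and mode matrices of compatible sizes; not the authors' implementation):

```python
import numpy as np

def synthesise(c, s_bar, t_bar, Qs, Qt):
    # Generate a shape vector s and texture vector t from the appearance
    # parameters c, following s = s_bar + Qs c and t = t_bar + Qt c.
    s = s_bar + Qs @ c   # landmark positions
    t = t_bar + Qt @ c   # shape-normalised texture samples
    return s, t
```

Rendering the synthetic image then amounts to warping the texture image formed from t by the control points in s.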
Such a model can be matched to a new image, given an initial approximation to the position, using the AAM algorithm[2]. This uses a fast linear update scheme to modify the model parameters so as to minimise the difference between a synthesised image and the target image. Appearance models and AAMs have been shown to be powerful tools for medical image interpretation [13, 1] and face image interpretation [2].
4 Local Structure Detectors

4.1 Non-linear Transforms

As noted earlier, the texture preprocessor needs to be non-linear to make a significant difference to a linear PCA-based model. If we restrict the choice of preprocessor to those whose magnitude reflects the strength of response of a local feature detector, then it would be useful to transform this magnitude m into a reliability measure. We have chosen to use a sigmoid function for this non-linear transform:

$$f(m) = \frac{m}{m + \bar{m}} \qquad (1)$$

where $\bar{m}$ is the mean of the feature response magnitudes m over all samples. This function has the effect of limiting very large responses, preventing them from dominating the image. Any response significantly above the mean gives similar output. Also, any response significantly below the mean gives approximately zero output. This output behaves like the probability of there being a real local structure feature at that location.
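Equation (1) is a one-liner; a sketch assuming a numpy array of non-negative response magnitudes:

```python
import numpy as np

def sigmoid_normalise(m):
    # Map each feature response magnitude to m / (m + mean(m)):
    # strong responses saturate towards 1, weak ones fall towards 0.
    return m / (m + m.mean())
```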
4.2 Gradient Structure
The first local structure descriptor with which we have experimented is gradient orientation. Early work on non-linear gradient orientation is described in [4]. We calculate the image gradient $\mathbf{g} = (g_x\ g_y)^T$ at each point, giving a 2-tuple texture image for 2-d input images. The magnitude $|\mathbf{g}|$ can be transformed using equation 1, while preserving the direction. This is followed by the non-linear normalisation step to give

$$\mathbf{g}_n = \frac{(g_x\ g_y)^T}{|\mathbf{g}| + \overline{|\mathbf{g}|}} \qquad (2)$$
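A sketch of equation (2), using finite-difference gradients and a small epsilon to guard completely flat images (an assumption, not part of the paper):

```python
import numpy as np

def sigmoidal_gradient(image):
    # 2-tuple image g_n = (gx, gy)^T / (|g| + mean|g|): the magnitude is
    # normalised sigmoidally while the gradient direction is preserved.
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    denom = mag + mag.mean() + 1e-12
    return np.stack([gx / denom, gy / denom], axis=-1)   # (H, W, 2)
```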
4.3 Corner and Edge Structure
We had observed that image corners were sometimes badly matched by gradient and intensity AAMs. Corners are well known as reliable features for corresponding multiple images[17], and in applications such as morphometry[15] accurate corner location is important in diagnosis.
Harris and Stephens [6] describe how to build a corner detector. They construct a local texture descriptor by calculating the Euclidean distance, or sum of squared differences, between an image patch and itself as one is scanned over the other. This local image difference energy E is zero at the patch origin, and rises faster for stronger textures:

$$E(x, y) = \sum_{u,v} \left[I(u + x, v + y) - I(u, v)\right]^2$$

To enforce locality and the consideration of only small shifts, they added a Gaussian window w(u, v), and then made a first-order approximation:

$$E(x, y) = \sum_{u,v} w(u, v)\left[x\frac{\partial I}{\partial u} + y\frac{\partial I}{\partial v} + O(x^2, y^2)\right]^2$$

Expanding the squared term gives

$$E(x, y) = Ax^2 + 2Cxy + By^2 = (x\ y)\,\mathbf{M}\,(x\ y)^T$$

where

$$w(u, v) = \exp\!\left(-\frac{u^2 + v^2}{2\sigma^2}\right), \qquad \mathbf{M} = \begin{pmatrix} A & C \\ C & B \end{pmatrix}, \qquad A = \left[\frac{\partial I}{\partial u}\right]^2 \otimes w, \text{ etc.}$$
The eigenvalues α, β of M characterise the rate of change of the sum-of-squared-differences function as one moves from the origin. Since α and β are the principal rates of change, they are invariant to rotation. Without loss of generality, the eigenvalues can be arranged so that α ≥ β. The local texture at each point in the image can be described by these two values. As shown in figure 2, low values of α and β imply a flat image region. A high value of α and a low value of β imply an image region flat in one direction, but changing in another, i.e. an edge. High values of both α and β imply a region that is not flat in any direction, i.e. a corner. At this point Harris and Stephens identified individual points of interest by looking for local maxima in det M − k[tr M]². We leave their approach here, except to note that useful measures derived from α and β can be found without actually performing an eigenvector decomposition, e.g. det(M) = AB − C².
4.4 Developing Measures of Cornerness and Edgeness
It would be useful to have independent descriptors of edgeness and cornerness. To force α and β into an independent form, we take the vector (α β)^T and double the angle from the α axis, as in figure 3. It is possible to calculate the cornerness, r, and edgeness, e, defined this way, without explicitly having to calculate an eigenvector decomposition. Note that our edgeness measure is different from the gradient measure, by being independent of edge direction.

$$\tan\theta = \frac{\beta}{\alpha} \;\Rightarrow\; \sin\theta = \frac{\beta}{\sqrt{\alpha^2 + \beta^2}} \quad\text{and}\quad \cos\theta = \frac{\alpha}{\sqrt{\alpha^2 + \beta^2}}$$

Fig. 2. How α and β relate to cornerness and edgeness (flat areas, strong edges, strong corners).

Fig. 3. Making cornerness independent of edgeness by doubling the angle from the axis.

$$r = (\alpha^2 + \beta^2)\sin 2\theta = 2\det\mathbf{M} = 2AB - 2C^2 \qquad (3)$$

$$e = (\alpha^2 + \beta^2)\cos 2\theta = \operatorname{tr}\mathbf{M}\,\sqrt{\operatorname{tr}^2\mathbf{M} - 4\det\mathbf{M}} = (A + B)\sqrt{(A - B)^2 + 4C^2} \qquad (4)$$
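Equations (3) and (4) are cheap to evaluate over a whole image, since A, B, and C come directly from smoothed gradient products. A sketch (the Gaussian window width sigma is an assumed parameter, not taken from the paper):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def corner_edge_images(image, sigma=2.0):
    gy, gx = np.gradient(image.astype(float))
    A = gaussian_filter(gx * gx, sigma)    # [dI/du]^2 convolved with w
    B = gaussian_filter(gy * gy, sigma)
    C = gaussian_filter(gx * gy, sigma)
    r = 2.0 * (A * B - C * C)                           # eq. (3): 2 det M
    e = (A + B) * np.sqrt((A - B) ** 2 + 4.0 * C * C)   # eq. (4)
    return r, e
```

In the full preprocessor, the resulting r and e maps would then be passed through the sigmoidal normalisation of equation (1).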
5 Experiment with Spinal X-Rays
We took a previously described [16] data set of low-dose Dual X-ray Absorptiometry (DXA) lateral scans of the spines of 47 normal women. The vertebrae from T7 to L4 were marked up under the supervision of an experienced radiologist; figure 4 shows an example. The images are 8-bit greyscale and roughly 140×400 pixels in size. Each vertebra is about 20-25 pixels tall. Since we did not have a large data set, we performed leave-1-out experiments, by repeatedly training an AAM on 46 of the images and testing it on the remaining image. For each test image we performed 9 AAM searches starting with the mean shape learned during training, displaced by all combinations of [−10, 0, +10] pixels in x and y.

Fig. 4. A spinal DXA image with markup, and after multi-modal simulation.

After the AAM search had converged we measured the distance from each control point on the AAM to the nearest point on the curve through the equivalent marked-up points. We calculated the mean of these absolute errors for each AAM search. Because of the even spacing of control points around each vertebra, this error will be approximately proportional to the total pixel overlap error. This whole experiment was run for each of the following texture preprocessors:

Intensity: Original AAM.
Sigmoidal gradient: 2-tuple output g_n of sigmoidally normalised directed gradient (equation 2).
Sigmoidal edge: Sigmoidally normalised version of undirected edgeness e (equation 4).
Sigmoidal corner: Sigmoidally normalised version of cornerness r (equation 3).
Sigmoidal corner and edge: 2-tuple of the sigmoidally normalised cornerness and edgeness (r, e) (equations 3 and 4).
Sigmoidal corner and gradient: . . .
Sigmoidal edge and gradient: . . .
Sigmoidal corner, edge, and gradient: . . .

In another experiment to simulate performance on multi-modal images, roughly half of the set of images were transformed by applying a bitonic pixel-value transfer function (see figure 4 for an example). The two groups were then merged, to give a set of 47 images. A leave-1-out experiment, similar to the previously described one, was then performed on this simulated multi-modal data set.
5.1 Results

The distribution of mean absolute errors for the 47 × 9 = 423 searches of the normal data set for three of the preprocessors is shown in figure 5. Each search was considered a success if the mean absolute point-to-curve error was less than 2 pixels. (The estimated repeatability of expert annotation is 1 to 1.5 pixels on this data.) Figure 6 summarises the results for all of the preprocessors. The results from the simulated multi-modal data set for the original "Intensity" and the "Sigmoidal corner, edge and gradient" AAMs are summarised in figure 7.

Fig. 5. Error spread between spinal AAM control points and the marked-up curves (frequency against mean abs error for a single search result, in pixels; curves shown for Intensity, Sigmoidal undirected edge, and Sigm. corner, edge and gradient).
5.2 Statistics
It is not possible to show that the improvement is significant simply by comparing the means and standard deviations in figure 6, because the data is not normally distributed. Instead we use the percentage of successful results. If we classify the results as successes or failures according to the above test (section 5.1) and count the number of successes, we should expect the result to be a binomially distributed random variable. When comparing two experiments, we need to show that any improvement in the percentage of successful results is statistically significant. To do this we must assume that there is an underlying distribution based on a probability of a single success of θ. After performing one experiment with n trials we get ny successes, and so estimate a probability of success y.
Texture Preprocessor                     Searches <2 pixels   Point-Curve error (pixels)
                                                              mean   std   median   90%-ile
Intensity                                35%                  5.4    3.8   5.6      11.0
Sigmoidal gradient                       40%                  5.1    4.0   4.7      10.8
Sigmoidal edge                           82%                  2.4    3.1   1.4      6.5
Sigmoidal corner                         75%                  2.6    2.7   1.5      7.5
Sigmoidal corner and edge                81%                  2.2    2.6   1.3      4.8
Sigmoidal corner and gradient            80%                  2.1    2.2   1.2      1.9
Sigmoidal edge and gradient              85%                  1.9    2.1   1.2      4.6
Sigmoidal corner, edge, and gradient     92%                  1.5    1.4   1.2      1.8

Fig. 6. Comparing point-to-curve errors (in pixels) for different spinal AAM texture preprocessors.

Texture Preprocessor                     Searches <2 pixels   mean   std   median   90%-ile
Intensity                                7%                   9.5    6.1   8.9      16.0
Sigmoidal corner, edge, and gradient     60%                  3.4    3.8   1.6      9.3

Fig. 7. Comparing point-to-curve errors (in pixels) for simulated multi-modal spinal images.
We perform another experiment of m trials and get a probability of success x. We are interested in the probability of x being from the same distribution as y, having already measured y:

$$p(x|y) = \frac{p(x \cap y)}{p(y)}$$

Each of these probabilities depends on the parameter of the underlying binomial distribution p(x|θ), so we must marginalise θ away:

$$p(x|y) = \frac{\int_0^1 p(x \cap y|\theta)\,d\theta}{\int_0^1 p(y|\theta)\,d\theta} = \frac{\int_0^1 p(x|\theta)\,p(y|\theta)\,d\theta}{\int_0^1 p(y|\theta)\,d\theta}$$

where the binomial distribution is

$$p(x|\theta) = \binom{n}{x}\,\theta^x (1 - \theta)^{n-x}$$
It does not appear to be possible to find an analytic solution to these integrals; however, we can use numerical integration. Figure 8 gives the p-values for each result, given a null hypothesis that a poorer performing experiment could have produced that result. It should be noted that because the 9 search tests per image cannot be considered independent of each other, we based the significance calculation on a value n = 47.
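A sketch of that numerical integration (on a uniform grid over [0, 1] the integrals reduce to simple means; the counts in the usage comment are only illustrative):

```python
import numpy as np
from scipy.stats import binom

def p_x_given_y(x, m, y_successes, n, grid=20001):
    # p(x|y) = int p(x|theta) p(y|theta) dtheta / int p(y|theta) dtheta,
    # approximated on a uniform grid of theta values in [0, 1].
    theta = np.linspace(0.0, 1.0, grid)
    px = binom.pmf(x, m, theta)
    py = binom.pmf(y_successes, n, theta)
    return (px * py).mean() / py.mean()

# Example use: -log10 p for a result of 43/47 given a base result of 16/47.
# print(-np.log10(p_x_given_y(43, 47, 16, 47)))
```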
Texture Preprocessor                     Result   −log10 p-value given base result
                                                  35%   40%   75%   80%   81%   82%   85%
Intensity                                35%
Sigmoidal gradient                       40%      0.5
Sigmoidal corner                         75%      4.7   3.9
Sigmoidal corner and gradient            80%      5.6   4.8   0.6
Sigmoidal corner and edge †              81%      6.1   5.3   0.8   0.5
Sigmoidal edge †                         82%      6.1   5.3   0.8   0.5   0.4
Sigmoidal edge and gradient              85%      6.7   5.8   0.9   0.6   0.5   0.5
Sigmoidal corner, edge, and gradient     92%      9.5   8.5   2.2   1.7   1.4   1.4   1.2

† Note that the fraction of successful results is rounded down to the next lowest multiple of 1/n for p-value calculation, causing two rows with slightly dissimilar success rates to have identical p-values.

Fig. 8. Probabilities (p-values) that an experiment could be a random result of a worse performing spinal experiment.
We can see that the large improvements between the "intensity" AAM and the various texture preprocessor AAMs are certainly significant. With the exception of the "sigmoidal gradient" preprocessor, the differences between the various texture preprocessors are not significant at the α = 0.01 level. In the simulated multi-modal experiment, the improvement of the "Sigmoidal corner, edge, and gradient" preprocessor over the "intensity" AAM is significant with p = 5 × 10⁻⁷.
6 Experiment with Faces
To add to the confidence in our results, we repeated the experiments with a face data set. We have a much larger database of face images. This enabled us to build a face AAM using 100 images, and then test it on the independently collected extM2VTS [12] database. We have marked up 1817 images of 295 distinct subjects from this database. The remaining images in the database suffer from extreme motion blur and interlace artifacts. The raw "intensity" and "sigmoidal gradient" experiments are repetitions, respectively, of the NI and ES cases described in Cootes and Taylor's paper [4]. That paper only used a 100-image subset of extM2VTS for testing, and did no statistical analysis of the results. The results are tabulated in figure 9. The median and 90-percentile errors are given for all experiments. A search was considered successful when the mean absolute point-to-curve error fell below 5% of the inter-ocular distance, or 5 pixels in the Surrey dataset. As described before, we cannot use all the measurements because they are not independent. This is especially unfortunate with the faces experiments because we have a lot of data for which the tests should be largely if not completely
Texture Preprocessor                     Searches <2 pixels   Point-Curve error
                                                              mean   std   median   90%-ile
Intensity                                55.8%                5.4    2.9   4.6      9.0
Sigmoidal gradient                       72.5%                4.5    2.0   3.9      7.2
Sigmoidal edge                           68.8%                4.8    2.5   4.0      8.1
Sigmoidal corner                         68.0%                4.8    2.3   4.0      7.8
Sigmoidal corner and edge                73.9%                4.5    2.3   3.8      7.6
Sigmoidal corner and gradient            83.6%                3.9    1.4   3.5      5.7
Sigmoidal edge and gradient              80.3%                4.1    1.7   3.6      6.2
Sigmoidal corner, edge, and gradient     83.7%                3.9    1.7   3.8      5.8

Fig. 9. Comparing point-to-curve errors for different facial AAM texture preprocessors.
independent. In particular, AAM searches of different images of the same person should be mostly independent. However, we again conservatively chose n = 295 as the number of strictly independent measurements.
Texture Preprocessor                     Result   −log10 p-value given base % rate
                                                  55.8   68.0   68.8   72.5   73.9   80.3   83.6
Intensity                                55.8%
Sigmoidal corner                         68.0%    3.0
Sigmoidal edge                           68.8%    3.2    0.4
Sigmoidal gradient                       72.5%    5.0    1.0    0.8
Sigmoidal corner and edge                73.9%    5.7    1.2    1.1    0.5
Sigmoidal edge and gradient              80.3%    10.3   3.6    3.3    2.0    1.6
Sigmoidal corner and gradient †          83.6%    13.5   5.5    5.1    3.4    2.9    0.9
Sigmoidal corner, edge, and gradient †   83.7%    13.5   5.5    5.1    3.4    2.9    0.9    0.3

† See note in figure 8.

Fig. 10. Probabilities (p-values) that an experiment could be a random result of a worse performing facial experiment.
7 Discussion and Conclusion
We have shown that using descriptions of local structure for the texture model of an AAM significantly improves the accuracy of the AAM search. Furthermore, the right-hand mode of the distribution of "intensity" AAM results (figure 5) can be interpreted as convergence failures or false minima. The significant reduction in these failures using the various local structure preprocessors shows that we have also improved the reliability of AAMs. The local structure descriptors are less dependent on the global or sub-global contrast effects caused by differing imaging parameters. The simulated multi-
modal spinal image experiment shows that the "intensity" AAM needs to devote so much variance in its texture model to coping that it fails to learn any useful information about the images. Comparing the results for the "Sigmoidal corner, edge and gradient" preprocessor in figures 6 and 7 shows that the severe image corruption has a relatively small effect on a local structure AAM. Using all the sigmoidally-normalised local structure descriptors gives the best results on both the spine and face data. This suggests that it may be advantageous to add more local structure descriptors, including parameterised families of descriptors (provided that they all have magnitude-based outputs, and distributions that are compatible with the non-linear sigmoidal function). Potentially interesting families include the differential Gaussian invariants (which have been shown [19] to have high saliency in shape modelling applications) and complex wavelets [8]. Adding ever more local structure descriptors will inevitably lead to a case of diminishing returns. More work will be needed to determine if adding too many descriptors would significantly decrease performance, perhaps through increased numerical errors in the linear PCA learning. It may be possible to normalise the statistics of any structure description (perhaps using the Cumulative Distribution Function (CDF) of the magnitudes over all or part of the image) and correctly combine non-magnitude (e.g. pixel intensity) descriptors with the sigmoidally-normalised descriptors. Even without fully solving this problem, it should be straightforward to concatenate the sigmoidally-normalised descriptors from the different planes of co-registered multi-modal medical images. This work should extend straightforwardly to 3D images. The calculation of the gradient preprocessor extends trivially to 3D. The joint calculation of the corner/edge pair in 2D would extend to an analogous calculation of a plane, edge, corner triplet in 3D. However, it is difficult to do experiments with 3D volume AAMs due to the sheer size of such models. Picking one local structure descriptor that is responsible for the majority of the improvement is possible in either data set. However, the "sigmoidal gradient" descriptor, which is responsible for most of the performance improvement in the face data set, gives performance which is not significantly different from the "intensity" AAM in the spine data set. By providing the AAM training algorithm with all of the local structure descriptors, it can learn which descriptors are most useful, and adjust the importance of each descriptor on a sample-by-sample basis to get optimum performance.
Acknowledgements. We would like to thank Prof. Judith Adams and Martin Roberts for the data and useful discussions.
References

[1] H.G. Bosch, S.C. Mitchell, B.P.F. Lelieveldt, F. Nijland, O. Kamp, and M. Sonka. Active Appearance-Motion Models for endocardial contour detection in time sequences of echocardiograms. In SPIE Medical Imaging, 2001.
[2] T.F. Cootes, G.J. Edwards, and C.J. Taylor. Active Appearance Models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6):681-685, 2001.
[3] T.F. Cootes and C.J. Taylor. Active Shape Models - 'Smart Snakes'. In British Machine Vision Conference, pages 266-275. Springer-Verlag, 1992.
[4] T.F. Cootes and C.J. Taylor. On Representing Edge Structure for Model Matching. In CVPR, volume 1, pages 1114-1119, 2001.
[5] G. Edwards, C.J. Taylor, and T.F. Cootes. Interpreting face images using active appearance models. In International Conference on Automatic Face and Gesture Recognition, pages 300-305, 1998.
[6] C. Harris and M. Stephens. A Combined Corner and Edge Detector. In Alvey Vision Conference, pages 147-151, 1988.
[7] D. Hond and L. Spacek. Distinctive Descriptions for Face Processing. In British Machine Vision Conference, Colchester, 1997.
[8] N. Kingsbury. Image processing with complex wavelets. Philosophical Transactions of the Royal Society A, 357(1760):2543-2560, 1999.
[9] J. Kittler, Y.P. Li, and J. Matas. On matching scores for LDA-based face verification. In British Machine Vision Conference, volume 1, pages 42-51. BMVA Press, 2000.
[10] M. Lades, J.C. Vorbruggen, J. Buhmann, J. Lange, C. von der Malsburg, R.P. Würtz, and W. Konen. Distortion Invariant Object Recognition in the Dynamic Link Architecture. IEEE Transactions on Computers, 42(3):300-311, 1993.
[11] S.J. McKenna, S. Gong, R.P. Würtz, J. Tanner, and D. Banin. Tracking Facial Feature Points with Gabor Wavelets and Shape Models. In International Conference on Audio- and Video-Based Biometric Person Authentication, pages 35-43, 1997.
[12] K. Messer, J. Matas, J. Kittler, J. Luettin, and G. Maitre. XM2VTSdb: The Extended M2VTS Database. In 2nd Conf. on Audio and Video-based Biometric Personal Verification. Springer-Verlag, 1999.
[13] S.C. Mitchell, B.P.F. Lelieveldt, R.J. van der Geest, H.G. Bosch, J.H. Reiber, and M. Sonka. Time continuous segmentation of cardiac MR image sequences using Active Appearance Motion Models. In SPIE Medical Imaging, 2001.
[14] B. Moghaddam and A. Pentland. Probabilistic Visual Learning for Object Representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):696-710, 1997.
[15] National Osteoporosis Foundation Working Group on Vertebral Fractures. Assessing vertebral fractures. Journal of Bone and Mineral Research, 10(4):518-523, 1995.
[16] P.P. Smyth, C.J. Taylor, and J.E. Adams. Vertebral Shape: Automatic Measurement with Active Shape Models. Radiology, 211:571-578, 1999.
[17] E. Trucco and A. Verri. Introductory Techniques for 3-D Computer Vision. Prentice-Hall, 1998.
[18] M. Turk and A. Pentland. Eigenfaces for Recognition. Journal of Cognitive Neuroscience, 3(1):71-86, 1991.
[19] K. Walker, T.F. Cootes, and C.J. Taylor. Locating Salient Object Features. In British Machine Vision Conference, pages 463-472, 1998.
Knowledge-Driven Automated Extraction of the Human Cerebral Ventricular System from MR Images

Yan Xia, QingMao Hu, Aamer Aziz, and Wieslaw L. Nowinski

Biomedical Imaging Lab, Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613. [email protected]

Abstract. This work presents an efficient and automated method to extract the human cerebral ventricular system from MRI, driven by anatomic knowledge. The ventricular system is divided into six three-dimensional regions; six ROIs are defined based on the anatomy and literature studies regarding variability of the cerebral ventricular system. The distribution histogram of radiological properties is calculated in each ROI, and the intensity thresholds for extracting each region are automatically determined. Intensity inhomogeneities are accounted for by adjusting the intensity threshold to match the local situation. The extraction method is based on region-growing and anatomical knowledge, and is designed to include all ventricular parts, even if they appear unconnected on the image. The ventricle extraction method was implemented on the Windows platform using C++, and was validated qualitatively on 30 MRI studies with variable parameters.
1 Introduction
The human cerebral ventricular system consists of four intercommunicating chambers: the left and right lateral, third, and fourth ventricles. The ventricles contain cerebrospinal fluid (CSF). Changes in CSF volume and shape are associated with several neurological diseases, such as congenital anomalies, post-traumatic disorders, tumors, pseudo-tumors, neurodegenerative diseases, inflammatory diseases, headache, cognitive dysfunction / psychiatric diseases, and post-operative changes. Quantification of the degree of dilatation of the ventricles is important in the diagnosis of various diseases, in measuring the response to treatment, and in predicting the prognosis of the disease process. Extraction and quantification of the ventricular system from medical images is therefore of primary importance. Due to the great importance of the ventricular system, its extraction has attracted a lot of research work. Major representative methods for the extraction of ventricles have been region-growing assisted by morphological operations (Schnack et al., 2001 [1]), thresholding (Worth et al., 1998 [2]), template matching (Kaus et al., 2001 [3]), atlas warping (Holden et al., 2001 [4]), level sets (Baillard et al., 2000 [5]), active models (Wang et al., 1998 [6]), fuzzy sets and information fusion (Geraud, 1998 [7]), genetic algorithm (GA) optimization (Sonka et al., 1996 [8]), pure region-growing (Hahn et al., 1998 [9]), and knowledge-based methods (Fisher et al., 2002 [10]). Despite the numerous approaches proposed to solve the problem, there is no simple method
suitable for clinical use. This is due to the human intervention required, the inability to handle pathological and abnormal cases, and/or the inability to extract the complete ventricular system. Moreover, the existing methods are too slow to be clinically acceptable. Because of the anatomical complexity of the cerebral ventricular system and the lack of fast and reliable extraction procedures, a fast, automatic, accurate, and robust method to extract the complete ventricular system is desirable. In this paper, we present an efficient and automated method to extract the human cerebral ventricular system from MRI, driven by anatomic knowledge. The method is based on region-growing and anatomical knowledge, and is focused on automatically extracting all ventricular parts, even if they appear unconnected on the image.
2 Materials and Methods

The method is based on the following anatomical principles: (1) Ventricles are surrounded by white and gray matter. (2) All of the ventricles are interconnected in nature to enable circulation of cerebrospinal fluid and could, in principle, be extracted with one region-growing operation. However, owing to the partial volume effect, noise, and the spatial and contrast resolutions of the scan, parts of the system may appear not to be connected on the image, and additional anatomical knowledge of the shape and position of the ventricular system needs to be built into the method to find these unconnected parts and remove those connected to non-ventricular regions (cisterns, sulci). To facilitate this, the ventricular system is divided into six 3D sub-regions that can be extracted independently. The sub-regions are the body of the left lateral ventricle (including the anterior and posterior horns, VLL-B), the body of the right lateral ventricle (including the anterior and posterior horns, VLR-B), the inferior (temporal) horn of the left lateral ventricle (VLL-I), the inferior (temporal) horn of the right lateral ventricle (VLR-I), the third ventricle (V3), and the fourth ventricle (V4). Anatomic knowledge of the ventricular system guides the method in defining the sub-regions, drawing a region of interest (ROI) for each sub-region, calculating statistical characteristics of each ROI and locating a seed point in each ROI, extracting the sub-regions, and connecting them to get the complete ventricular system.

2.1 Preprocessing Steps

The brain MRIs are loaded initially. Next the location of the Midsagittal Plane (MSP) is extracted from the volume data [11], and the coordinates of the anterior commissure (AC) and posterior commissure (PC) are identified on the MSP image [12]. A coordinate system (xyz) is defined such that x runs from the subject's right to left, z runs from superior to inferior, and y runs from anterior to posterior. Thus the xz plane is coronal, the yz plane is sagittal, and the xy plane is axial, parallel to the AC-PC line. The axial, coronal, and sagittal orientations are thus available. The original volumetric dataset is reformatted into the above coordinate system (there is, however, no need to use an isotropic volume). The reformatted dataset facilitates using any orientation for setting the ROI and performing region growing. It also helps in defining the anatomical landmarks accurately.
2.2 Define Multiple Regions of Interest (ROIs)

Six regions of interest (ROIs) are defined corresponding to the respective ventricular sub-regions: VLL-B, VLR-B, VLL-I, VLR-I, V3, and V4. The initial ROIs are defined taking into account the worst-case assumption (the biggest ROI necessary) based on literature studies of variability [13] [14]. Fig. 1 shows the ROI of the left lateral ventricle. It is defined on the coronal plane passing through the AC (VAC). Taking the AC as base point, the ROI is set 25 mm laterally and 35 mm superiorly. Fig. 2 shows the ROI of the inferior horn of the right lateral ventricle, which is set on all the coronal slices between VAC and the coronal plane passing through the PC (VPC). The horizontal profile is drawn at the level of the inferior point of V3; a middle point (a) is marked at half the distance between the MSP and the temporal bone marrow fat signal point in the profile as reference. A 20 mm x 20 mm ROI is drawn in reference to this point so that the coordinates are (a, a−10), (a, a+10) laterally and (a, a−20) inferiorly. The initial ROI of V3 on the MSP extends antero-posteriorly between AC and PC, and superior-inferiorly between [AC − 35 mm, AC + 10 mm] (Fig. 3). For V4, initially a large ROI with a width of 45 mm and a height of 55 mm is used on the MSP (Fig. 4). The ROI shape is changed to quadrangular so as to exclude the clivus (bone), ambient cistern, quadrigeminal cistern, and cisterna magna.
Fig. 1. ROI for the left lateral ventricle.
Fig. 2. ROI for the inferior horn of the right lateral ventricle.
Fig. 3. ROI for the third ventricle.
Fig. 4. ROI for the fourth ventricle.
2.3 Calculate Statistics and Adapt Size of Each ROI

The thresholds for gray matter (GM), white matter (WM), and CSF in each sub-region have to be determined automatically before extracting the ventricular system. This is
done by finding the peaks in the histogram that characterize the distribution of the voxel intensities in the studied region of interest.

Calculate the Histogram of ROI. Each ROI is defined to facilitate analysis that will produce the statistically significant findings necessary for each ventricular sub-region's extraction. The histogram obtained from an ROI containing the lateral ventricle and adjacent structures on T1-weighted MRI is shown in Fig. 5. This produces a multimodal histogram in which the intensity peaks and ranges corresponding to WM, GM, CSF, and other classes can be chosen. The ventricles, filled with CSF, are defined by those spatially connected pixels with intensity between two thresholds.
Fig. 5. Original and smoothed histograms of ROI (frequency against gray scale).
Smooth the Histogram. Histograms of real data have substantial local variations that hamper determination of global peaks and valleys. Some initial smoothing has to be applied to reduce the local variations. In this work, a Fourier filter based on the frequency components of the signal is applied to the original histogram. It works by taking the Fourier transform of the signal, cutting off all frequencies above a specified limit, and inversely transforming the result. The best cut-off frequency of the Fourier filter is found automatically based on knowledge of anatomy, radiology, and image processing. An initial small cut-off frequency is assigned, the Fourier filter is applied to the signal, and the inverse transformation is performed. The inverse result is checked using the following constraints:

• The number of peaks in the smoothed histogram is equal to or less than 5, because there are CSF, CSF/GM, GM, GM/WM and WM classes in the ROI.
• A peak is ignored if its area is small relative to the cumulative area of the histogram (i.e., if its area is less than 1/30th of the total area).
• A peak is ignored if it is between the intensities at 5% and 95% of the cumulative area of the histogram.

If the result meets the above constraints, the cut-off frequency is the best one for the smoothing. If not, the cut-off frequency is adaptively reduced/increased and the smoothing repeated until a new inverse result fulfils these constraints. The original and smoothed histograms are shown in Fig. 5. The original histogram contains many noisy peaks. The smoothed histogram retains only the main peaks after Fourier-filter smoothing.
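The low-pass step itself is simple; the adaptive search over cut-off frequencies, with the constraint checks above, would wrap this function (a sketch, not the authors' code):

```python
import numpy as np

def fourier_smooth(hist, cutoff):
    # Smooth a 1-D histogram by zeroing all Fourier components above
    # `cutoff` and inverting the transform.
    spectrum = np.fft.rfft(hist)
    spectrum[cutoff:] = 0.0
    return np.fft.irfft(spectrum, n=len(hist))
```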
The noise is greatly reduced while the main peaks are hardly changed, without distorting the signal significantly. Smoothing increases the signal-to-noise ratio and allows the signal characteristics (peak position, height, width, area, etc.) to be measured more accurately.

Identify the Peaks in the Histogram. The histogram is modeled as the sum of five smoothed, modified normal distribution (Gaussian) functions, one distribution each for CSF, CSF/GM, GM, GM/WM and WM. In Equation (1), µ is the mean, σ is the standard deviation and h is the height of the Gaussian function.
$$f(x) = \sum_{i=1}^{5} h(i) \times \exp\!\left(-\frac{(x - \mu_i)^2}{2\sigma_i^2}\right) \qquad (1)$$
Fig. 6 shows the smoothed histogram distribution and the result of modeling each class separately by a Gaussian function describing the intensity distributions. After modeling, the peaks, ranges and intersections of the modeled distributions can be taken. Neuroanatomical and radiological knowledge is incorporated to locate the peaks corresponding to CSF, GM, and WM. It is assumed that the highest intensity peak corresponds to WM on T1-weighted/SPGR MRI while the lowest intensity peak corresponds to CSF, and that the lowest intensity peak corresponds to WM on T2-weighted MRI while the highest intensity peak corresponds to CSF. After identifying the peaks in the histogram, the upper and lower gray-value thresholds of each peak are computed on both sides of the peak (mean) of the Gaussian, at two standard deviations away (so that 95% of the data usually falls within the region), or at the cross points with the right/left neighbouring Gaussian functions (Fig. 6).
Fig. 6. The Result of Gaussian Fitting (smoothed histogram and modeling functions, with lower and upper thresholds marked; frequency against gray value).
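A sketch of the five-Gaussian fit (the paper does not name an optimiser; a generic least-squares fitter is assumed here, with initial guesses p0 taken from the detected peaks):

```python
import numpy as np
from scipy.optimize import curve_fit

def five_gaussians(x, *p):
    # p holds (h, mu, sigma) for each of the 5 classes, as in equation (1).
    return sum(p[3 * i] * np.exp(-(x - p[3 * i + 1]) ** 2
                                 / (2.0 * p[3 * i + 2] ** 2))
               for i in range(5))

def fit_histogram(gray_levels, counts, p0):
    params, _ = curve_fit(five_gaussians, gray_levels, counts,
                          p0=p0, maxfev=20000)
    # Class thresholds can then be read off at mu -/+ 2*sigma for each
    # fitted component (or at the crossings of neighbouring Gaussians).
    return params
```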
Reduce ROI Adaptively. As the initial ROIs are defined based on the worst-case literature variability studies, the distribution of CSF, GM, and WM in some cases may be unbalanced (some peaks corresponding to WM, GM or CSF are so small that they are not easily identifiable in the histogram), which may hinder accurate processing of the histogram. To improve the density distribution balance, the ROI is
reduced adaptively. When the ratio of the WM/GM peak to the CSF peak is greater than a given number, the ROI is reduced by a given percentage, the histogram is recalculated, and new thresholds are obtained. The adaptive reduction of the ROI is performed iteratively until the required distribution is achieved.

Adjust the Thresholds for Each Slice. The initial thresholds are changed adaptively during the growing of the ventricular system to cope with the partial volume effect and intensity inhomogeneity. For the partial volume effect, a two-compartment model is used: each voxel near the edge of the ventricular system is assumed to contain up to two classes, CSF and GM or WM. Using the class averages for each of these two compartments, the contribution of each class can be calculated from the grown voxel intensity. In Equation (2), I is the measured voxel intensity, µ is the mean, and α is the proportion of CSF in the measured voxel.
$$\alpha = \frac{\mu_{gm/wm} - I}{\mu_{gm/wm} - \mu_{csf}} \qquad (2)$$
When the α value is greater than 0.5, meaning that the proportion of CSF in the measured voxel is more than 50%, the measured voxel may belong to ventricular regions or cisternal regions. This is decided based on anatomical knowledge. It is important to use local thresholds to extract the ventricle region on the current slice to reduce the influence of intensity inhomogeneities. Since the intensity inhomogeneities are visually obvious on GM or WM for a radiologist, the ratio of the means of GM/WM is taken as the multiplicative correction factor for each slice:
$$\text{Correct}_{slice} = \frac{\text{global}(\mu_{gm/wm})}{\text{local}(\mu_{gm/wm})} \qquad (3)$$

where "local" means the current slice, "global" means the slice on which the ROI is defined (Fig. 1 - Fig. 4), and Correct_slice is the intensity inhomogeneity correction factor for the current slice (axial, sagittal, or coronal). The result of local(µ_csf) × Correct_slice is the threshold used to extract the ventricle region in the current slice.
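Equations (2) and (3) reduce to two one-liners (a sketch with illustrative names; the mu values are the class means estimated from the ROI histograms):

```python
def csf_fraction(I, mu_gm_wm, mu_csf):
    # alpha of equation (2); a value > 0.5 means the voxel is mostly CSF.
    return (mu_gm_wm - I) / (mu_gm_wm - mu_csf)

def slice_csf_threshold(mu_csf_local, mu_gm_wm_global, mu_gm_wm_local):
    # Per-slice CSF threshold corrected for intensity inhomogeneity:
    # local(mu_csf) * Correct_slice, with Correct_slice from equation (3).
    correct = mu_gm_wm_global / mu_gm_wm_local
    return mu_csf_local * correct
```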
2.4 Extract Ventricular Region

The ventricular regions are grown in 3D independently, starting from the defined seed points. Each region growing is directional, which allows for better control of growing in 3D space. Moreover, the extraction sequence is: first the third ventricle, next the fourth ventricle, and then the left and right lateral ventricles (VLL and VLR), for controlling leakage and connections.
Extract V3. The third ventricle is small but has a complex shape. V3 is subdivided into four subregions by the planes passing through the AC and PC points, namely VAC, VPC, and the AC-PC axial plane (Fig. 7). The seed point of V3 is the point on the AC-PC line segment whose gray level is closest to the CSF mean value. Subregion 2 is grown in the superior direction and subregion 4 in the inferior direction on all subsequent axial slices starting from the seed point. Subregion 1 is grown in the anterior direction on coronal slices starting from any pixel common with subregion 4. Subregion 3 is grown superiorly on axial slices starting from any pixel common with subregion 2.

Fig. 7. Sub-regions of V3 and V4.

The following leakages may occur in V3: 1) anteriorly through the lamina terminalis to the chiasmatic cistern, 2) ventrally through the mesencephalic tegmentum to the interpeduncular cistern, 3) posteriorly through the posterior commissure (stalk of the pineal body) to the cisterna ambiens. So, suitable spatial constraints are imposed in the method for leakage prevention:

• As the shape of the roof of V3 is oblique, the distance between the superior and inferior points of subregion 1 decreases, and the width of subregion 1 on a processed coronal slice is not allowed to increase by 50% over that of the previous slice along the anterior direction.
• To prevent leakage posteriorly through the PC (stalk of the pineal body) to the cisterna ambiens, two spatial constraints are imposed: 1) the maximum width of the foreground region of subregion 3 on a currently processed axial slice is on the PC line (the PC line is the intersection between VPC and the current axial slice); 2) the distance between the gravity centre of the foreground region of subregion 3 and the MSP should be less than 4 mm in the sagittal direction [13].

Extract V4. V4 is subdivided into two subregions by the axial plane passing through the seed point, located at the middle point of the longest CSF segment on the MSP (Fig. 7). V4 is grown on axial slices, superiorly in subregion 1 and inferiorly in subregion 2, starting from the axial slice containing the seed point. The aqueduct cannot be extracted directly as its diameter is quite small (about 1.2 mm on average) in most cases; it can be positioned at the lowest intensity in a small search area on a T1-weighted MR image (or the highest on T2-weighted images). The following leakages may occur in V4: 1) dorsoposteriorly through the superior medullary velum to the cisterna ambiens, 2) ventroposteriorly through the inferior medullary velum to the cisterna magna. To prevent leakage dorsoposteriorly through the superior medullary velum to the cisterna ambiens, two constraints are imposed:
• The number of grown pixels of the current axial slice should not increase by 50% over that in the previous slice in the inferior direction.
• The distance between the gravity centre of the grown region and the MSP should be less than 2 mm in the sagittal direction, and the gravity centre of the grown region should not deviate more than 2 mm from that of the previous slice in the sagittal direction [13].

To prevent the leakage ventroposteriorly through the inferior medullary velum to the cisterna magna, the width of the foreground region at the lateral recesses should be less than 2 mm in the sagittal direction [13].

Extract VLL-B and VLR-B. Each of the VLL-B and VLR-B regions is grown in 3D space on coronal slices, slice by slice. Growing is initiated in the anterior direction from the seed point located at the middle of the longest CSF segment on VAC. When this subregion growing is completed, it is continued posteriorly on all subsequent coronal slices. Eventually it is continued anteriorly when attempting to extract the posterior part of the inferior horn. The VLL-B/VLR-B growing stops when there are no more connected voxels left to add to the result. This extraction result includes at least the main bodies of the lateral ventricles. Leakage may occur in a posterior ventral part of the body of the lateral ventricles. Moreover, in some cases, a large cystic elongated space between the lateral ventricles may be present, representing the cavum septum pellucidum and cavum vergae. A potential leakage has to be checked when the extracting operation has finished at these spaces. On radiological images the cavum is relatively perpendicular and predictable, and the septal wall can be easily detected (it has higher signal intensity compared to CSF on T1-weighted images). For leakage into the quadrigeminal cistern, the grown area in the current coronal slice is compared with that of the previous slice (y+1); a sudden appearance of a large connected area of such voxels is possibly a leak into the quadrigeminal cistern. If such a region is below the PC in the inferior direction, in the area behind the VPC up to the seed point of V4, this region can be considered as leakage. When leakage occurs, the voxels of the leakage region are set to a value outside the CSF range. Also, the CSF range is narrowed and the last region-growing step is carried out again, followed by checking whether the leakage is avoided. This time the "wrong" region cannot be grown. This procedure is repeated until all growing is done.
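The growth step shared by all sub-regions can be sketched as ordinary 6-connected region growing within an ROI mask and a CSF intensity range; the per-slice anatomical constraints described above would be extra tests inside the loop (illustrative code, not the authors' implementation):

```python
from collections import deque
import numpy as np

def grow_region(volume, seed, lo, hi, roi_mask):
    # Breadth-first 6-connected growth from `seed`, accepting voxels whose
    # intensity lies in [lo, hi] and which fall inside the ROI mask.
    grown = np.zeros(volume.shape, dtype=bool)
    grown[seed] = True
    queue = deque([seed])
    offsets = ((1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1))
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in offsets:
            p = (z + dz, y + dy, x + dx)
            if (all(0 <= p[i] < volume.shape[i] for i in range(3))
                    and not grown[p] and roi_mask[p]
                    and lo <= volume[p] <= hi):
                grown[p] = True
                queue.append(p)
    return grown
```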
beginning and then by the white matter adjacent to the gray matter of hippocampus until no more gray matter is seen through posterior slices. The following leakages may occur in these regions: 1) inferomedially through the hippocampus to the interpeduncular cistern, 2) medially through white matter (optic radiation) and cortex (choroidal fissure) to the subarachnoid space. To prevent potential leakages, the foreground region(s) on a processed coronal slice is (are) not allowed to grow beyond the boundary of ROI for VLL-I or VLR-I.
3 Results
The ventricle extraction method was implemented on the Windows platform using C++. A first evaluation of the method was done on a T1-weighted MR brain study, with a volume size of 256*256*168 and a voxel size of 1.0*1.0*0.67 mm, of good image quality, with a known underlying extraction result of the ventricular system. The run time of the extraction method was less than 30 seconds on a Pentium 3 800 MHz PC with 128 MB of RAM. Fig. 8 shows the extraction results on some axial slices. The third ventricle, fourth ventricle, lateral ventricles, aqueduct, the temporal horns of the lateral ventricles, and the connection between VL-B and VL-I were extracted clearly. The "raw" automatic extraction result was not edited.
Fig. 8. Axial slices showing the ventricular system extracted
4 Discussion
To test the validity and applicability of the method, 30 real clinical MR volume datasets were collected from different sources. The volume data were acquired using T1-weighted spin echo and Spoiled Gradient Echo Recovery (SPGR) sequences. They include healthy controls and patients with various tumors (Fig. 9), with ages ranging from 12 to 60 years, both female and male. The volume sizes of the scans range from 192*256*192 to 256*181*256, the voxel size from 0.897 to 1 mm in the sagittal, from 0.879 to 2.0 mm in the coronal, and from 0.67 to 3.5 mm in the axial directions. Furthermore, some volume data have significant partial volume effects and inhomogeneity (Fig. 10).
Fig. 9. Pathological Data
Fig. 10. Partial volume and inhomogeneity
The run time of the extraction method was less than 1 min for all of the data on a Pentium 3 800 MHz, 128 MB RAM PC, regardless of the complexity of the ventricular system and the severity of the leakages. The method for automatic determination of the intensity model parameters using statistics worked well even for a wide variety of scan data sets obtained at different times from different scanners under different protocols. On one of the 30 datasets the method failed to run correctly: because the slice thickness was 3.5 mm in the axial direction, the method could not pass the seed point and grow the lateral ventricles correctly. The roof of the third ventricle could not be found in one of the data sets because the V3 was too thin for this case (12-year-old subject) and the partial volume effect caused the V3 roof to be invisible. On another dataset the connection between VLR-B and VLR-I could not be located entirely, also because the hippocampus was too narrow in some places (60-year-old subject). These tests show that the method gives qualitatively good results (visual check of the extraction) for the complete ventricular system; a more thorough validation is being carried out by the radiology experts. Our method assumes that the main body of the lateral ventricle can be extracted with one region-growing operation. Posterior or anterior parts of the lateral ventricle may not be connected to the main body on the image because the connection is too thin. Additional steps should be added to the method to automatically find and grow such piece(s) and discard the other nearby loose ends of sulci and cisterns. Our method assumes that the MSP has been generated and the AC and PC landmarks have been detected. The calculation of the MSP and the detection of the AC and PC have their own inaccuracies [11][12]. These inaccuracies propagate in our method,
and may have an influence on the extraction accuracy; the extraction result is particularly sensitive to the accuracy of the AC. Our method is based on region growing and anatomical knowledge. Slice thickness influences the accuracy more than the partial volume effect and inhomogeneity do. When the slice thickness, especially in the coronal and axial directions, is too large, the method cannot obtain a good extraction result and might even fail to run. The tests show that the method fails to run correctly when the slice thickness is greater than 3.5 mm in the axial direction or 2.0 mm in the coronal direction. Noise has a more notable effect than partial volume and RF inhomogeneity on the method, because the method is driven by anatomic knowledge and statistics. The method was not able to determine the location of the initial peaks (thresholds) when the data was very noisy with large intensity inhomogeneities, or in data where there was very little gray-white contrast (i.e., these two peaks merging together). The method should adequately handle parameter estimation, partial volume, and intensity inhomogeneities, although these problems may not fit well into the theoretical formalism. Our emphasis here was to automate a precise segmentation procedure for the ventricular system. The utility of the method for general image processing is that it deals directly with problems by using deep domain-specific knowledge. This method provides an example of the application of neuroanatomical and radiological knowledge (expectations about tissue response in MR, such as histogram peak sizes and knowing which peak to use in which situation; templates locating ROIs and seed points; a specific sequence of successive region extraction; and detection of leakages and connections) along with image processing and pattern recognition methods (locating peaks using Gaussian fitting; excluding edge voxels from influencing threshold determination; and directional region growing from a starting point).
5 Conclusion
We have proposed an efficient and automated method to extract the complete human cerebral ventricular system from MRI, driven by anatomic knowledge. The combination of neuroanatomy, radiological properties, variability of the ventricular system, and image processing techniques in the formulation of the method is advantageous, as it helps to identify the extraction thresholds, locate the starting points for the 3D region growing, localize the ventricular system structures, and guide the extraction, including the handling of leakages and connections, more reliably. The proposed method runs correctly on MR images acquired using different pulse sequences when the slice thickness is less than 3.0 mm in the axial direction or less than 2.0 mm in the coronal direction, and yields a good qualitative result for the complete ventricular system even in the presence of significant inhomogeneity and partial volume effect. Noise, slice thickness and the partial volume effect have a significant influence on the extraction. We continue to validate the method by quantifying the extraction results. We will also extend this concept to other neuroanatomical structures such as the hippocampus and corpus callosum.
References
1. Schnack HG, Hulshoff PHE, Baare WFC, Viergever MA, Kahn RS, "Automatic segmentation of the ventricular system from MR images of the human brain," NeuroImage 2001, vol. 14, pp. 95–104.
2. Worth AJ, Makris N, Patti MR, Goodman JM, Hoge EA, Caviness VS, Kennedy DN, "Precise segmentation of the lateral ventricles and caudate nucleus in MR brain images using anatomically driven histograms," IEEE Transactions on Medical Imaging 1998, vol. 17, no. 2, pp. 303–310.
3. Kaus MR, Warfield SK, Nabavi A, Black PM, Jolesz FA, Kikinis R, "Automated segmentation of MR images of brain tumors," Radiology 2001, vol. 218, no. 2, pp. 586–591.
4. Holden M, Schnable JA, Hill DLG, "Quantifying small changes in brain ventricular volume using non-rigid registration," MICCAI 2001, pp. 49–56.
5. Baillard C, Hellier P, Barillot C, "Segmentation of 3D brain structures using level sets and dense registration," IEEE Workshop on Mathematical Methods in Biomedical Image Analysis (MMBIA 2000), pp. 94–101.
6. Wang Y, Staib LH, "Boundary finding with correspondence using statistical shape models," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 1998, pp. 338–345.
7. http://www-sig.enst.fr/tsi/groups/TII/active
8. Sonka M, Tadikonda SK, Collins SM, "Knowledge-based interpretation of MR brain images," IEEE Transactions on Medical Imaging 1996, vol. 15, no. 4, pp. 443–452.
9. http://www.mevis.de/projects/volumetry/volumetry.html, Center for Medical Diagnostic Systems and Visualisation, University of Bremen.
10. Fisher E, Rudick RA, "Method and system for brain volume analysis," US patent US006366797B1, 2002.
11. Hu Q, Nowinski WL, "Method and apparatus for determining symmetry in 2D and 3D images," PCT patent application, Jan 2002 (PCT/SG02/00006).
12. Nowinski WL, "Modified Talairach landmarks," Acta Neurochirurgica 2001; 143(10): 1045–1057.
13. Newton TH, Potts DG (eds.), Radiology of the Skull and Brain: Ventricles and Cisterns. MediBooks, Great Neck, NY, pp. 3494–3537.
14. Robertson EG, Pneumoencephalography, 2nd ed., Springfield, IL: Charles C. Thomas, 1967.
Volumetric Texture Description and Discriminant Feature Selection for MRI

Constantino Carlos Reyes-Aldasoro and Abhir Bhalerao

Department of Computer Science, Warwick University, Coventry, UK. {creyes,abhir}@dcs.warwick.ac.uk

Abstract. This paper considers the problem of classification of Magnetic Resonance Images using 2D and 3D texture measures. Joint statistics such as co-occurrence matrices are common for analysing texture in 2D since they are simple and effective to implement, but their computational complexity can be prohibitive, especially in 3D. In this work, we develop a texture classification strategy based on a sub-band filtering technique that can be extended to 3D. We further propose a feature selection technique based on the Bhattacharyya distance measure that reduces the number of features required for classification by selecting a set of discriminant features conditioned on a set of training texture samples. We describe and illustrate the methodology by quantitatively analysing a series of images: a 2D synthetic phantom, 2D natural textures, and MRI of human knees.

Keywords: Image Segmentation, Texture classification, Sub-band filtering, Feature selection, Co-occurrence.
1 Introduction
There has been extensive research in texture analysis in 2D and, even if the concept of texture is intuitively obvious, it has proven difficult to give a satisfactory definition. Haralick [7] is a basic reference for statistical and structural approaches to texture description; contextual methods like Markov Random Fields are used by Cross and Jain [3], and fractal geometry methods by Keller [9]. The dependence of texture on resolution or scale has been recognised and exploited by workers in the past decade. Texture description and analysis using a frequency approach is not as common as the spatial-domain method of co-occurrence [4], but there has been renewed interest in the use of filtering methods akin to Gabor decomposition [10] and joint spatial/spatial-frequency representations like wavelet transforms [17]. Although easy to implement, co-occurrence measures are outperformed by such filtering techniques (see [13]) and have prohibitive costs when extended to 3D. The importance of texture in MRI has been the focus of some researchers, notably Lerski [5] and Schad [16], and a COST European group has been established for this purpose [2]. Texture analysis has been used with mixed success in MRI, such as for detection of microcalcification in breast imaging [4] and for knee segmentation [8], and in Central Nervous System (CNS) imaging to detect
macroscopic lesions and microscopic abnormalities, such as for quantifying contralateral differences in epilepsy subjects [12], to aid the automatic delineation of cerebellar volumes [15], and to characterise spinal cord pathology in Multiple Sclerosis [11]. Most of this reported work, however, has employed solely 2D measures, usually co-occurrence matrices, which are limited by computational cost. Furthermore, feature selection is often performed in an empirical way with little regard to training data which are usually available. Our contribution in this work is to implement a fully 3D texture description scheme using multiresolution sub-band filtering (based on the Wilson and Spann [18] FPSS) and to develop a strategy for selecting the most discriminant texture features conditioned on a set of training images containing examples of the tissue types of interest (a 2D version of the method was presented in [14], without the 3D extension or the feature selection). The ultimate goal is to select a compact and appropriate set of features, thus reducing the computational burden in both feature extraction and subsequent classification. We describe the 2D and 3D frequency-domain texture feature representation and the feature selection method, illustrating and quantitatively comparing results on 2D images and 3D MRI.
2 Materials and Methods
For this work three textured data sets were used:
1. a 2D synthetic phantom of artificial textures: random noise and oriented patterns with different frequencies and orientations;
2. sixteen 2D natural textures from the Brodatz album, arranged by Randen and Husøy [13]. This is a difficult image: the individual textures have been histogram equalised and, even to the human eye, some boundaries are not evident;
3. a 3D MRI of a human knee. The set is sagittal T1-weighted with dimensions 512 × 512 × 87; each pixel is 0.25 mm and the slice separation is 1.4 mm.
Figure 1 presents the data sets; in the case of the MRI only one slice (54) is shown. Throughout this work we consider that an image $I$ has dimensions $N_r \times N_c$ (rows and columns) and is quantised to $N_g$ grey levels. Let $L_c = \{1, 2, \ldots, N_c\}$ and $L_r = \{1, 2, \ldots, N_r\}$ be the horizontal and vertical spatial domains of the image, and $G = \{1, 2, \ldots, N_g\}$ the set of grey tones. The image $I$ can then be represented as a function that assigns a grey tone to each pair of coordinates:

$$I : L_r \times L_c \to G \qquad (1)$$
2.1 Multiresolution Sub-band Filtering: The Second Orientation Pyramid (SOP)
Textures can vary in their spectral distribution in the frequency domain, and therefore a set of sub-band filters can help in their discrimination: if the image contains textures that vary in orientation and frequency, then certain filter sub-bands will be more energetic than others, and 'roughness' will be characterised by more or less energy in broadly circular band-pass regions.
Fig. 1. Materials used for this work: three sets of images (a) Synthetic Phantom (2D), (b) Natural Textures (2D), (c) MRI of human knee (3D).
Wilson and Spann [18] proposed a set of operations that subdivide the frequency domain of an image into smaller regions by the use of compact and optimal (in spatial versus spatial-frequency energy) filter functions based on finite prolate spheroidal sequences (FPSS). The FPSS are real, band-limited functions which cover the Fourier half-plane. In our case we have approximated these functions with truncated Gaussians for ease of implementation, with satisfactory results (figure 3). These filter functions can be regarded as a band-limited Gabor basis which provides frequency localisation. Any given image $I$ whose centred Fourier transform is $I_\omega = \mathcal{F}\{I\}$ can be subdivided into a set of regions $L_r^i \times L_c^i$: $L_r^i = \{r, r+1, \ldots, r+N_r^i\}$, $1 \le r \le N_r - N_r^i$, and $L_c^i = \{c, c+1, \ldots, c+N_c^i\}$, $1 \le c \le N_c - N_c^i$, that follow the conditions: $L_r^i \subset L_r$, $L_c^i \subset L_c$, $\sum_i N_r^i = N_r$, $\sum_i N_c^i = N_c$, $(L_r^i \times L_c^i) \cap (L_r^j \times L_c^j) = \{\phi\}$, $i \ne j$.
Fig. 2. 2D and 3D Second Orientation Pyramid (SOP) tessellation. Solid lines indicate the filters added at the present order while dotted lines indicate filters added in lower orders. (a) 2D order 1, (b) 2D order 2, (c) 2D order 3, and (d) 3D order 1.
For this work, the Second Orientation Pyramid (SOP) tessellation presented in figure 2 was selected for the tessellation of the frequency domain. The SOP tessellation involves a set of 7 filters, one for the low-pass region and six for the high-pass region, and they are related to the $i$ subdivisions of the frequency domain as:
$$F^i : \begin{cases} L_r^i \times L_c^i \to N(\mu^i, \Sigma^i) & \forall i \in \text{SOP} \\ (L_r^i \times L_c^i)^c \to 0 \end{cases} \qquad (2)$$
where $\mu^i$ is the centre of region $i$ and $\Sigma^i$ is the variance of the Gaussian, which provides a cut-off of 0.5 at the limit of the band (figure 3).
Fig. 3. Band-limited Gaussian filter $F^i$: (a) frequency domain, (b) spatial domain.
The feature space $S^i_\omega$, in its frequency and spatial domains, is defined as:

$$S^i_\omega(k,l) = F^i(k,l)\, I_\omega(k,l) \quad \forall (k,l) \in (L_r \times L_c), \qquad S^i = \mathcal{F}^{-1}\{S^i_\omega\} \qquad (3)$$
Every order of the SOP pyramid consists of 7 filters. The same methodology used for the first order can be extended to the next orders. At every step, one of the filters will contain the low-pass region (i.e. the centre) of the region analysed, $I_\omega$ for the first order, and the six remaining filters will subdivide the high-pass bands, or surround, of the region. This is detailed in the following co-ordinate systems. Centre: $F^1$: $L_r^1 = \{\frac{N_r}{4}+1, \ldots, \frac{3N_r}{4}\}$, $L_c^1 = \{\frac{N_c}{4}+1, \ldots, \frac{3N_c}{4}\}$. Surround: $F^{2-7}$: $L_r^{3,4,5,6} = \{1, \ldots, \frac{N_r}{4}\}$, $L_r^{2,7} = \{\frac{N_r}{4}+1, \ldots, \frac{N_r}{2}\}$, $L_c^{2,3} = \{1, \ldots, \frac{N_c}{4}\}$, $L_c^4 = \{\frac{N_c}{4}+1, \ldots, \frac{N_c}{2}\}$, $L_c^5 = \{\frac{N_c}{2}+1, \ldots, \frac{3N_c}{4}\}$, $L_c^{6,7} = \{\frac{3N_c}{4}+1, \ldots, N_c\}$. For a pyramid of order 2, the region to be subdivided will be the first central region, described by $(L_r^1(1) \times L_c^1(1))$, which becomes $(L_r(2) \times L_c(2))$ with dimensions $N_r(2) = \frac{N_r(1)}{2}$, $N_c(2) = \frac{N_c(1)}{2}$ (or, in general, $N_{r,c}(o+1) = \frac{N_{r,c}(o)}{2}$ for any order $o$). It is assumed that $N_r(1) = 2^a$, $N_c(1) = 2^b$, so that the results of the divisions are always integer values. The horizontal and vertical frequency domains are expressed by $L_r(2) = \{\frac{N_r(1)}{4}+1, \ldots, \frac{3N_r(1)}{4}\}$, $L_c(2) = \{\frac{N_c(1)}{4}+1, \ldots, \frac{3N_c(1)}{4}\}$, and the next filters can be calculated recursively: $L_r^8(1) = L_r^1(2)$, $L_c^8(1) = L_c^1(2)$, $L_r^9(1) = L_r^2(2)$, etc. Figure 4 shows the feature space $S^i$ of the 2D synthetic phantom shown in figure 1(a). Figure 4(a) contains the features of orders 1 and 2, and figure 4(b) shows the features of orders 2 and 3. Note how in $S^{2-7}$, the features from the high-pass bands, only the central region, which is composed of noise, is present; the oriented patterns have been filtered out. $S^{10}$ and $S^{20}$ show the activation due to the oriented patterns. $S^8$ is a low-pass filter and still keeps a trace of one of the oriented patterns.
Fig. 4. Two sets of features $S^i$ from the phantom image: (a) features 2 to 14 (note $S^{10}$, which describes one oriented pattern), (b) features 9 to 21 (note $S^{20}$, which describes one oriented pattern). In each set, the feature $S^i$ is placed in the position corresponding to the filter $F^i$ in the frequency domain.
2.2 3D Multiresolution Sub-band Filtering
In order to filter a three-dimensional set, a 3D tessellation (figure 2(d)) is required. The filters are again formed by truncated 3D Gaussians, in an octave-wise tessellation that resembles a regular oct-tree configuration. In the case of MR data, these filters can be applied directly to the k-space. As in the 2D case, the low-pass region is covered by one filter, and for the high-pass region there are 28 filters. This tessellation yields 29 features per order. As in the two-dimensional case, half of the space is not used because of the symmetry properties of the Fourier transform. The definition of the filters follows from the extension of the space of rows and columns to $L_r \times L_c \times L_l$, with the new dimension $l$ denoting levels.
2.3 Discriminant Feature Selection: Bhattacharyya Space and Order Statistics
Feature selection is a critical step in classification, since not all features derived from sub-band filtering, co-occurrence matrices, wavelets, wavelet packets or any other methodology have the same discrimination power. In many cases, a large number of features are fed into classifiers, or reduced by principal components analysis (PCA) or other methods, without considering that some of those features will not help to improve the classification but will consume computational effort. As well as making each feature linearly independent, PCA allows the ranking of features according to the size of the global covariance along each principal axis, from which a 'subspace' of features can be presented to a classifier. Fisher linear discriminant analysis (LDA) diagonalises the feature space, constrained by maximising the ratio of between-class to within-class variance, and can be used together with PCA to rank the features by their 'spread' and select a discriminant subspace [6]. However, while these eigenspace methods are optimal and effective, they still require the computation of all the features for the given data.
We propose a feature selection methodology based on the discrimination power of the individual features taken independently; the ultimate goal is to select a reduced number $m$ of features or bands (in the 2D case $m \le 7o$, and in 3D $m \le 29o$, where $o$ is the order of the SOP tessellation). It is sub-optimal in the sense that there is no guarantee that the selected feature sub-space is the best, but our method does not exclude the use of PCA or LDA to diagonalise the result to aid the classification. A set of training classes is required, which makes this a supervised method. Four training classes of the human knee MRI have been manually segmented, and each class has been sub-band filtered in 3D. Figure 5 shows scatter plots of three badly and three well discriminating features, chosen arbitrarily.
Fig. 5. Scatter plots of three features $S^i$ from human knee MRI (3D, order 2): (a) badly discriminating features $S^{2,24,47}$, (b) well discriminating features $S^{5,39,54}$. Note that each feature corresponds to a filtered version of the data; the axis values therefore correspond to the grey levels of each feature.
In order to obtain a quantitative measure of how separable two classes are, a distance measure is required. We have studied a number of measures (Bhattacharyya, Euclidean, Kullback-Leibler, Fisher [1]) and concluded that the Bhattacharyya distance [6] is the most convenient. The variance and mean of each class are computed to calculate the distance in the following way:

$$B(a,b) = \frac{1}{4} \ln\left[\frac{1}{4}\left(\frac{\sigma_a^2}{\sigma_b^2} + \frac{\sigma_b^2}{\sigma_a^2} + 2\right)\right] + \frac{1}{4}\,\frac{(\mu_a - \mu_b)^2}{\sigma_a^2 + \sigma_b^2} \qquad (4)$$
where $B(a,b)$ is the Bhattacharyya distance between the $a$-th and $b$-th classes, $\sigma_a^2$ is the variance of the $a$-th class, $\mu_a$ is the mean of the $a$-th class, and $a$, $b$ are two different training classes. The Mahalanobis distance used in Fisher LDA is a particular case of the Bhattacharyya distance when the variances of the two classes are equal, which eliminates the first term of the distance. The second term, on the other hand, will be zero if the means are equal, and is inversely proportional to the variances. $B(a,b)$ was calculated for the four training classes (background, muscle, bone and tissue) of the human knee MRI (figure 1(c)), with the following results:
Class        µ     σ    Background  Muscle  Bone    Tissue
Background   91    49   0           4.36    12.51   11.70
Muscle       696   140  4.36        0       3.25    3.26
Bone         1605  212  12.51       3.25    0       0.0064
Tissue       1650  227  11.70       3.26    0.0064  0
It should be noted that the Bhattacharyya distance between the tissue and bone classes is small; these two classes have low discrimination power. For $n$ classes with features $S^i$, each class pair $p$ at feature $i$ will have a Bhattacharyya distance $B^i(a,b)$, producing a Bhattacharyya space of dimensions $N_p \times N_i$, with $N_p = \binom{n}{2}$ and $N_i = 7o$. The domains are $L_i = \{1, 2, \ldots, 7o\}$ and $L_p = \{(1,2), (1,3), \ldots, (a,b), \ldots, (n-1,n)\}$, where $o$ is the order of the pyramid. The Bhattacharyya space, $BS$, is then defined as:

$$BS : L_p \times L_i \to B^i(S_a^i, S_b^i) \qquad (5)$$

Its marginal $BS^i = \sum_{p=1}^{N_p} B^i(S_a^i, S_b^i)$ is of particular interest, since it sums the Bhattacharyya distances of every pair for a certain feature and thus indicates how discriminant a certain filter is over the whole combination of class pairs. Figure 6(a) shows the Bhattacharyya space for the 2D image of natural textures shown in figure 1(b), and figure 6(b) shows the marginal $BS^i$. The selection process of the most discriminant features that we propose uses the marginal of the Bhattacharyya space, $BS^i$, which indicates which filtered feature is the most discriminant. The marginal is a set

$$BS^i = \{BS^1, BS^2, \ldots, BS^{7o}\}, \qquad (6)$$

which can be sorted in decreasing order; its order statistic is:

$$BS^{(i)} = \{BS^{(1)}, BS^{(2)}, \ldots, BS^{(7o)}\}, \quad BS^{(1)} \ge BS^{(2)} \ge \ldots \ge BS^{(7o)}. \qquad (7)$$
This new set can be used in two different ways. First, it provides a particular order in which the feature space can be fed into a classifier; with a mask provided, the error rate can be measured to see the contribution of each feature to the final classification of the data. Second, it can provide a reduced set or sub-space: a group of training classes of reduced dimensions can show which filters are adequate for discrimination, so that only those filters need be used.
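A minimal sketch of this selection procedure follows, assuming the training data are supplied per class as arrays of filtered feature values; the function names are ours, not the authors' code.

```python
import numpy as np
from itertools import combinations

def bhattacharyya(a, b):
    """Eq. (4) for two 1D samples of one feature."""
    va, vb = a.var(), b.var()
    return (0.25 * np.log(0.25 * (va / vb + vb / va + 2))
            + 0.25 * (a.mean() - b.mean()) ** 2 / (va + vb))

def rank_features(training):
    """training: {class_name: array of shape (n_samples, n_features)}.
    Returns feature indices sorted by the marginal BS^i (Eqs. (5)-(7))."""
    classes = list(training.values())
    n_feat = classes[0].shape[1]
    marginal = np.zeros(n_feat)
    for i in range(n_feat):
        for a, b in combinations(classes, 2):     # every class pair p
            marginal[i] += bhattacharyya(a[:, i], b[:, i])
    return np.argsort(marginal)[::-1]             # order statistic BS_(i)
```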
3 Classification of the Feature Space
For every data set the feature space was classified with a K-means classifier, selected for simplicity and speed. The feature space was introduced to the classifier following the order statistic $BS^{(i)}$, one feature at a time (that is, the feature and the mean from the training data; $k = 1, 2, 3, \ldots$), and for each additional feature included, the misclassification error was calculated. Figures 9(c) and 10(c) show the misclassification as features are included and $k$ is increased.
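The sequential-inclusion protocol might look like the following sketch, in which nearest-class-mean assignment (seeded with the training means) stands in for the K-means step; the names and the use of SciPy's `vq` are our own choices.

```python
import numpy as np
from scipy.cluster.vq import vq   # nearest-centroid assignment

def incremental_error(features, labels, class_means, order):
    """features    : (n_pixels, n_features) array of sub-band features
    labels      : (n_pixels,) ground-truth class index per pixel
    class_means : (n_classes, n_features) means from the training data
    order       : feature indices sorted by the order statistic BS_(i)
    Returns the misclassification rate after each feature is included."""
    errors = []
    for k in range(1, len(order) + 1):
        subset = order[:k]
        codes, _ = vq(features[:, subset], class_means[:, subset])
        errors.append(np.mean(codes != labels))
    return errors
```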
Fig. 6. Natural textures: (a) Bhattacharyya space $BS$ (2D, order 5 = 35 features; $\binom{16}{2} = 120$ pairs), (b) corresponding marginal of the Bhattacharyya space, $BS^i$.
Fig. 7. Human knee MRI: (a) Bhattacharyya space $BS$ (3D, order 2; $\binom{4}{2} = 6$ pairs), (b) Bhattacharyya space $BS^i$(bone, tissue).
Fig. 8. Two sets of features $S^i$ from different images: (a) features 2 to 14 of the natural textures image, (b) features 2 to 14 from one slice of the human knee MRI.
Figures 4 and 8 show the feature spaces $S^i$ of the sub-band filtering process (for the MRI only one slice, 54, is shown). For the synthetic phantom, two features $S^{10,20}$ highlight the oriented patterns. For the natural textures the $S^i$ are more complex, but some of the textures are still highlighted in certain features. For instance, $S^{2,7}$ highlight one of the upper central textures, which is of high frequency. Note also that in $S^{3-6}$ the upper row has a low value for the circular regions, i.e. they have been filtered out since their nature is of lower frequencies. For the human knee $S^i$, the first observation is the high-pass nature of the background in $S^{2,3,6,7}$, which could be expected given the noisy nature of the background; $S^{4,5}$, however, do not describe the background but rather the bone of the knee. $S^8$ is a low-pass filtered version of the original slice. The Bhattacharyya spaces in figures 6 and 7 present very interesting information for the selection of features for classification. In the natural textures case a certain periodicity can be found: $BS^{1,7,14,21,28}$ have the lowest values. This implies that the low-pass features provide no discrimination at all. The human knee MRI Bhattacharyya space (figure 7(a)) was formed with four 32 × 32 × 32 training regions of background, muscle, bone and tissue. These training regions, which are small relative to the size of the data set, were manually segmented, and they remain part of the data to classify. It can be immediately noticed that two bands ($S^{22,54}$, low-pass) dominate the discrimination, while the distance of the pair bone-tissue is practically zero compared with the rest of the space. If the marginal were calculated as in the previous cases, the low-pass bands would dominate, and the discrimination of the bone and tissue classes, which are difficult to segment, would be lost. Figure 7(b) zooms into the Bhattacharyya space of this pair. Here we can see that some features (12, 5, 8, 38, ...) could provide discrimination between bone and tissue, while the low-pass bands could help discriminate the rest of the classes.
4 Discussion
Figure 9(a) shows the classification of the 2D synthetic phantom at 4.3% misclassification with 7 features (out of 35). Of particular importance were features 10 and 20, which can be seen in the marginal of the Bhattacharyya space in figure 9(b). The low-pass features 1 and 8 also have high values, but should not be included in this case since they contain the frequency energy that is disclosed in features 10 and 20, which give more discrimination power. The misclassification plot in figure 9(c) shows how the first two features manage to classify correctly more than 90% of the pixels, and how the next 5, which describe the central circular region, decrease the misclassification. If more features are added, the classification does not improve. The natural textures image presents a more difficult challenge. Randen and Husøy [13] used 9 techniques to classify this image; interestingly, they did not use FPSS filtering. Some of their misclassification results were: dyadic Gabor filter banks (60.1%), Gabor filters (54.8%), co-occurrence (49.6%), Laws filters (48.3%), wavelets (38.2%), quadrature mirror filters (36.4%). The misclassification of SOP filtering is 37.2%, placing it in second place. Figure 10(a) shows
the final classification and figure 10(b) shows the pixels that were correctly classified. The misclassification decreases as features are added, and requires almost all of them, in contrast with the synthetic phantom previously described. The most important of the materials is the human knee MRI. The original data set consisted of 87 slices of 512 × 512 pixels each. The classification was performed with the low-pass feature, 54, and the order statistics of the bone-tissue feature space: $S^{12,5,8,39,9,51,42,62}$. This reduced the computational burden significantly, since only these features were filtered. The misclassification obtained was 8.1%. Several slices in axial, coronal and sagittal planes, with their respective classifications, are presented in figure 11. To compare the discrimination power of the sub-band filtering technique with that of the co-occurrence matrix, one slice of the human knee MRI set was selected and classified with both methods. The major disadvantage of the co-occurrence matrix is that its dimensions depend on the number of grey levels. In many cases the grey levels are quantised to reduce the computational cost, and information is inevitably lost; otherwise, the computational burden just to obtain the original matrix is huge.
Fig. 9. Classification of figure 1(a): (a) classified 2D phantom at 4.13% misclassification, (b) marginal distribution of the Bhattacharyya space $BS^i$ (note the high values for features 10 and 20), (c) misclassification per features included.
The Bhattacharyya space was calculated with the same methodology, and the 10 most discriminant features were: contrast $f_2$ ($\theta = 0, \frac{\pi}{2}, \frac{3\pi}{4}$), inverse difference moment $f_5$ ($\theta = \frac{3\pi}{4}$), variance $f_{10}$ ($\theta = 0, \frac{\pi}{2}, \frac{3\pi}{4}$), and entropy $f_{11}$ ($\theta = 0, \frac{\pi}{2}, \frac{3\pi}{4}$). The misclassification obtained with these 10 features was 40.5%. To improve the classification, the grey-level original data was included as another feature; in this case, with the first 6 features the misclassification reached 17.0%. With the SOP, this slice had a misclassification of 7%.
5 Conclusions
Three data sets were classified with our methodology. The first was a simple combination of artificial textures, mainly for visualisation purposes.
Fig. 10. Classification of the natural textures image (figure 1(b)) with 16 different textures: (a) classification results at 37.2% misclassification, (b) pixels correctly classified, (c) misclassification error for the sequential inclusion of features into the classifier.
Fig. 11. Human knee MRI and their classification (misclassification 8.1%) (a) Sagittal slice 45 (b) Axial slice 200 (c) Coronal slice 250 and (d) Rendering of the bone.
The second was a combination of natural textures which are quite difficult to segment; these were included to show the power of the method. The third and most interesting was a 3D MRI set of a human knee, which was successfully segmented. Second Orientation Pyramid sub-band filtering is a powerful and simple technique to discriminate both natural and synthetic textures, and it extends well to 3D. The number of features can be drastically reduced by feature selection through the Bhattacharyya space to a most discriminant subset, derived either from the marginal or from individual class-pair distances. This feature selection technique can be applied to similar classification schemes, such as wavelets or co-occurrence, where a number of features are to be discarded before classifying. Our results were compared with the co-occurrence matrix and show that the misclassification for the sub-band filtering is almost half for the MRI, and as good as Randen's [13] for the
natural textures. While co-occurrence is not easily extended to three dimensions, we can employ our feature selection method for effectively selecting a compact set of discriminant features for this scheme. This method could be linked to contextual classification methods that can improve the misclassification rates.
References
1. A. Bhalerao and N. Rajpoot. Selecting Discriminant Subbands for Texture Classification. Submitted to BMVC 2003.
2. COST European Cooperation in the field of Scientific and Technical Research. COST B11: Quantitation of MRI Texture. http://www.uib.no/costb11/, 2002.
3. G.R. Cross and A.K. Jain. Markov Random Field Texture Models. IEEE Trans. on PAMI, PAMI-5(1):25–39, 1983.
4. D. James et al. Texture Detection of Simulated Microcalcification Susceptibility Effects in MRI of the Breasts. J. Mag. Res. Imaging, 13:876–881, 2002.
5. R.A. Lerski et al. MR Image Texture Analysis – An Approach to Tissue Characterization. Mag. Res. Imaging, 11(6):873–887, 1993.
6. K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, 1972.
7. R.M. Haralick. Statistical and Structural Approaches to Texture. Proceedings of the IEEE, 67(5):786–804, 1979.
8. T. Kapur. Model Based Three Dimensional Medical Image Segmentation. PhD thesis, AI Lab, Massachusetts Institute of Technology, May 1999.
9. J.M. Keller and S. Chen. Texture Description and Segmentation through Fractal Geometry. Computer Vision, Graphics and Image Processing, 45:150–166, 1989.
10. M. Unser and M. Eden. Multiresolution Feature Extraction and Selection for Texture Segmentation. IEEE Trans. on PAMI, 11(7):717–728, 1989.
11. J.M. Mathias, P.S. Tofts, and N.A. Losseff. Texture Analysis of Spinal Cord Pathology in Multiple Sclerosis. Mag. Res. in Medicine, 42:929–935, 1999.
12. O. Yu, Y. Mauss, I.J. Namer, and J. Chambron. Existence of contralateral abnormalities revealed by texture analysis in unilateral intractable hippocampal epilepsy. Magnetic Resonance Imaging, 19:1305–1310, 2001.
13. T. Randen and J. Håkon Husøy. Filtering for Texture Classification: A Comparative Study. IEEE Trans. on PAMI, 21(4):291–310, 1999.
14. C.C. Reyes-Aldasoro and A. Bhalerao. Sub-band Filtering for MR Texture Segmentation. In Proceedings of Medical Image Understanding and Analysis, pages 185–188, Portsmouth, UK, July 2002.
15. N. Saeed and B.K. Piri. Cerebellum Segmentation Employing Texture Properties and Knowledge Based Image Processing: Applied to Normal Adult Controls and Patients. Magnetic Resonance Imaging, 20:425–429, 2002.
16. L.R. Schad, S. Bluml, and I. Zuna. MR Tissue Characterization of Intracranial Tumors by Means of Texture Analysis. Mag. Res. Imaging, 11:889–896, 1993.
17. M. Unser. Texture Classification and Segmentation Using Wavelet Frames. IEEE Trans. on Image Processing, 4(11):1549–1560, 1995.
18. R. Wilson and M. Spann. Finite Prolate Spheroidal Sequences and Their Applications. IEEE Trans. on PAMI, 10(2):193–203, 1988.
CAD Tool for Burn Diagnosis

Begoña Acha¹, Carmen Serrano¹, José I. Acha¹, and Laura M. Roa²

¹ Área de Teoría de la Señal y Comunicaciones, Escuela Superior de Ingenieros, University of Seville, Camino de los Descubrimientos s/n, 41092 Seville, Spain. {bacha, cserrano, acha}@us.es
² Grupo de Ingeniería Biomédica, Escuela Superior de Ingenieros, University of Seville, Camino de los Descubrimientos s/n, 41092 Seville, Spain. [email protected]
Abstract. In this paper a new system for burn diagnosis is proposed. The aim of the system is to separate burn wounds from healthy skin, and the different types of burns (burn depths) from each other, identifying each one. The system is based on colour and texture information, as these are the characteristics observed by physicians in order to give a diagnosis. We use a perceptually uniform colour space (L*u*v*), since Euclidean distances calculated in this space correspond to perceptual colour differences. After the burn is segmented, a set of colour and texture descriptors is calculated; these are the inputs to a Fuzzy-ARTMAP neural network. The neural network classifies them into three types of burns: superficial dermal, deep dermal and full thickness. The clinical effectiveness of the method was demonstrated on 62 clinical burn wound images obtained from digital colour photographs, yielding an average classification success rate of 82% compared to expert-classified images.
1 Introduction

For a successful evolution of a burn injury it is essential to initiate the correct first treatment [1]. To choose an adequate one, it is necessary to know the depth of the burn, and a correct visual assessment of burn depth relies heavily on specialized dermatological expertise. As the cost of maintaining a Burn Unit is very high, it would be desirable to have an automatic system to give a first assessment in all local medical centres, where there is a lack of specialists [2], [3]. The World Health Organization demands that there must be at least one bed in a Burn Unit for every 500,000 inhabitants, so one Burn Unit normally covers a large geographic area. If a burn patient appears in a medical centre without a Burn Unit, a telephone communication is established between the local medical centre and the closest hospital with a Burn Unit, where the non-expert doctor subjectively describes the colour, shape and other aspects considered important for burn characterization. The result in many cases is the application of an incorrect first treatment (very important, on the other hand, for a correct evolution of the wound), or unnecessary displacements of the patient, involving high sanitary costs and psychological trauma for the patient and family.
With the fast advances in technology, Computer Aided Diagnosis (CAD) systems are becoming more popular. However, research in the field of colour skin images is developing slowly, due to the difficulty of translating human colour perception into objective rules analyzable by a computer. That is why the automation of burn wound diagnosis is still an almost unexplored field. While there is hardly any literature about burn depth determination by visual image analysis and processing [4], [5], one can find some research about the relationship between depth and superficial temperature [6], and other works trying to evaluate burn depth by using thermographic images [7], infrared and ultraviolet images [8], radioactive isotopes [9] and Doppler laser flux measurements [10]. These techniques have limitations not only in diagnostic accuracy but also in their unacceptable economic cost. Speaking more generally about colour skin image processing, one can find two main applications in the literature [11]: the assessment of the healing of skin wounds or ulcers [12-16], and the diagnosis of pigmented skin lesions such as melanomas [17-20]. The analysis of lesions involves more traditional image processing techniques such as edge detection and object identification, followed by an analysis of the size, shape, irregularity and colour of the segmented lesion. However, in wound analysis, although it is necessary to detect the wound border and to calculate its area, the analysis of the colours within the wound site is often more important. In particular, in the case of burn depth determination, we are not going to focus on the shape of the burn, because it is irrelevant for predicting its depth. The main characteristics for this purpose are the colour and texture information, as they are what physicians observe in order to give a diagnosis. The developed system consists of the following steps:
1. Image acquisition. We have developed a new protocol for standardizing the image acquisition [2], [3].
2. Segmentation. Many segmentation algorithms have been proposed in the literature, but none of them can be used as a standard, because most of them are highly application dependent [21]. In particular, when segmenting burn wounds, general-purpose segmentation algorithms are less effective because there are only slight differences between healthy and burnt skin, whereas there are other significant borders in the image. That is the reason why in this paper we propose a new segmentation algorithm, which has been proven effective in segmenting burn wound images. Section 3 is devoted to describing this new algorithm.
3. Classification. Once the burnt part is segmented, we extract from it representative colour and texture descriptors, which are the entries to a neural network classifier that gives the depth. We explain feature extraction and classification in Section 4.
2 Image Acquisition

The image acquisition is carried out by means of a digital photographic camera. The reasons for this choice are mainly its low cost, which makes a practical implementation feasible, and its ease of use. Any non-specialised person must be able to acquire data from the patient, because it is not possible to have an expert in each centre.
Once the acquisition system was selected, we had to specify a protocol to acquire the image; that is, we had to develop one protocol to homogenise the patient information that should accompany each photograph, and another one about the way of taking the photograph. Medical specialists produced the first one [2]. To determine the second one, a pilot study was done, in which an interdisciplinary group composed of burn specialists and non-specialists filled out questionnaires about image quality [2], [3]. The main points in the resulting protocol were: the distance between camera and patient should be approximately 40 cm; healthy skin should appear in the image; the background should be a green/blue sheet (the ones used in hospitals); the flash has to be on; and the camera should be placed parallel to the burn. As what we want to solve with this system is the problem of diagnosing when a patient arrives at the local medical centre, so as to apply an adequate first treatment, all the images used for validating our algorithms were taken by physicians within 24 hours of burn evolution. It should be noted that we have tried to make the procedure as easy as possible for the physicians, which may mean that the images we obtained are more difficult to analyse automatically. First of all, and trying to be closer to a hypothetical practical implementation of the system, the physicians, instead of us, took the photographs. Depending on the room where the physicians took the photographs, the illumination was different (some rooms had windows and some did not). All of them have typical fluorescent lights, and to homogenise the illumination as much as possible the flash had to be always on, so that the main quantity of light came from it. The camera was a digital Canon Power Shot 600.
3 Segmentation Algorithm

The proposed new colour segmentation algorithm consists of four main steps:
1. Preprocessing step.
2. Conversion to a single-channel image, where a pixel value is a measure of the similarity of its colour to the one to be segmented.
3. Automatic thresholding to achieve the segmented image.
4. Postprocessing step.
It must be emphasized that the colour to be segmented is obtained from a selection box that the user selects with the mouse. As was pointed out in [22], it is very difficult to develop a completely automatic system. The reason lies in the colour properties of normal, healthy skin: there is a large variability in healthy skin, even within the same human race. On the other hand, for a non-expert physician, and in fact for many people, it is easy to differentiate burnt skin from normal skin thanks to experience; the problem is to differentiate among the different depths of the burn. We have represented some colour descriptors (H, S, u*, v*, ...) of fifty 49×49 pixel images, belonging to both normal and burnt skin. In Figure 1a the chromatic coordinates of the L*u*v* colour space are represented, and in Figure 1b the saturation versus the hue coordinates are shown. It can be seen that there is a large variability in the colour coordinates for the 50 small images of healthy skin, as well as a strong overlap among
healthy skin, blisters (superficial dermal) and brown-coloured full thickness burns¹. Therefore, we can conclude that the user's help, in the form of selecting the colour of the burn to be segmented, is necessary.
Fig. 1. (a) Comparison of u* and v* colour coordinates for 50 burn images per depth and healthy skin, where (o) is superficial dermal (red), () is superficial dermal (blisters), (+) is deep dermal, (x) is full thickness (creamy), (*) is full thickness (brown) and (◊) is healthy skin. (b) The same for saturation and hue coordinates
3.1 Preprocessing Step

Before segmenting the image, it is preprocessed to make the regions more homogeneous; therefore a low-pass filter is required. But this filter must exhibit the property of preserving edges unaltered. A filter that fits these two requirements is the anisotropic diffusion filter [23]; other low-pass filters, such as the Gaussian, tend to blur the whole image, losing border locations. In order to perform the anisotropic diffusion, we follow the idea developed by Lucchese and Mitra of separating the diffusion of the chromatic and achromatic information [24], [22]. This approach calculates the hue and chroma components from the L*u*v* colour coordinate system. Once it has these two chromatic components, it forms a complex quantity $P = c \cdot \exp(jh)$, where $c$ and $h$ denote the chroma and hue components, respectively. The anisotropic diffusion is carried out by means of the partial differential equations explained in [24] for the $P$ and lightness (L*) components. We have shown experimentally that performing the diffusion separately for the chromatic and achromatic channels yields better results than diffusing in the R, G and B planes.
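One diffusion iteration of the Perona-Malik type is sketched below; it can be applied per channel to the lightness L* and to the complex chromatic image $P = c \cdot \exp(jh)$, in the spirit of the Lucchese-Mitra scheme followed here. The discretisation and parameter names are our own, not the PDEs of [24].

```python
import numpy as np

def diffusion_step(u, kappa=10.0, dt=0.2):
    """One anisotropic-diffusion step on a 2D channel (works for the
    complex chromatic image as well, since np.abs handles complex data)."""
    dN = np.roll(u, -1, axis=0) - u        # differences to the
    dS = np.roll(u,  1, axis=0) - u        # four neighbours
    dE = np.roll(u, -1, axis=1) - u
    dW = np.roll(u,  1, axis=1) - u
    g = lambda d: np.exp(-(np.abs(d) / kappa) ** 2)   # edge-stopping term
    return u + dt * (g(dN) * dN + g(dS) * dS + g(dE) * dE + g(dW) * dW)

# e.g. ten iterations, the setting used for the experiments in Sect. 5.1:
# for _ in range(10):
#     L = diffusion_step(L)
```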
¹ There are three main depths of burns, which can present five different appearances: superficial dermal (bright red colour, or presence of blisters (usually brown colour)), deep dermal (pink-whitish colour) and full thickness (beige-yellow colour or dark brown colour).
3.2 Single Channel Image Conversion

In this step a grey-scale image is obtained from the diffused colour image. To this aim, the diffused L*u*v* image must be transformed into a measure which, for each pixel, reflects the similarity of its colour to the colour to be segmented. The single-channel image is based on the Euclidean distance from a colour pixel to the centroid of the selection box chosen by the user. To take the texture information into account, instead of calculating the distance from a pixel to a particular colour, we calculate the distance from a group of pixels to a mask of size L×L. The mask represents a selection box in the area to be segmented, that is, a small selection made with the mouse by the user in the burnt part of the image. This selection box is slid as a mask along the image and, for the pixel at the centre of the mask position, the following operation is performed:
$$f(n,m) = \sum_{i=n-\Delta}^{n+\Delta} \; \sum_{j=m-\Delta}^{m+\Delta} d\big(p(i,j),\, w(i,j)\big) \qquad (1)$$
where $\Delta = (L-1)/2$ ($L$ odd), $p(i,j)$ represents a pixel of the image to be segmented in L*u*v* colour coordinates, $w(i,j)$ is a pixel of the mask of size L×L, and $d(\cdot)$ represents the Euclidean distance between pixels $p(i,j)$ and $w(i,j)$.
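Equation (1) amounts to a sliding-window sum of per-pixel Euclidean distances to the user's selection box. A direct, unoptimised sketch follows (the names are ours; border pixels are skipped for brevity):

```python
import numpy as np

def distance_map(image_luv, mask_luv):
    """image_luv: (rows, cols, 3) L*u*v* image; mask_luv: (L, L, 3), L odd.
    Returns f(n, m) of Eq. (1); low values mark colours close to the mask."""
    L = mask_luv.shape[0]
    delta = (L - 1) // 2
    rows, cols = image_luv.shape[:2]
    f = np.full((rows, cols), np.inf)
    for n in range(delta, rows - delta):
        for m in range(delta, cols - delta):
            patch = image_luv[n - delta:n + delta + 1,
                              m - delta:m + delta + 1]
            d = np.linalg.norm(patch - mask_luv, axis=2)  # per-pixel distance
            f[n, m] = d.sum()
    return f
```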
3.3 Thresholding Operation

After applying the former algorithm, we have a grey-scale image in which the pixels with the lowest values are those in the region to be segmented. As this image has been carefully designed to emphasize the burnt regions, a thresholding operation should suffice to get a good segmentation. Therefore, a thresholding process is applied to this grey-scale image in order to obtain the segmented area. There are two possibilities for carrying out the thresholding:
1. The user introduces the threshold manually. Most of the segmentation algorithms in the literature work this way; depending on the application, and normally on the image, the threshold varies.
2. Automatic threshold selection. This is the most desirable way to solve the thresholding problem, although the most difficult. But, as we are dealing with a particular application and with images following a specific protocol, finding an automatic threshold is easier. Actually, there are some general-purpose algorithms that do not need the introduction of a threshold. They usually give good results when the colours of the image are well differentiated. In our case, the colours of the different burn depths and healthy skin are very close to each other, so the application of this kind of algorithm does not yield good results.
In previous works we tested the images with manual thresholding. The results were very good, which is to be expected, because the best threshold can be chosen for each image [25]. In further studies we developed an automatic thresholding algorithm consisting of a modification of Otsu's method [22]. Results in these cases were also very good, but this algorithm is specific to the type of images we are working with, because it makes the assumption that the histogram has three main peaks: the rightmost one belonging to the background, the one in the middle belonging to the healthy skin, and the leftmost one belonging to the burnt skin. So, in case the image does not follow the protocol correctly, or there is more than one type of burn in the same photograph, the algorithm may fail. In this work we present a new thresholding algorithm that can be useful not only for this kind of image, but for any type of image. It uses Otsu's method, but with a previous step to find the peaks in the histogram. As the input image for thresholding is a grey-scale image where pixels with the lowest values belong to the region to be segmented, we want to determine the threshold which isolates the leftmost significant peak in the histogram. In order to carry out this task we first automatically find the most significant peaks in the histogram. The algorithm that finds these peaks is summarized in the following steps (see the sketch below):
1. Find all peaks in the histogram.
2. Select the peaks in the new curve formed by the peaks found in step 1.
3. Remove non-significant peaks. Peaks whose values are less than 1% of the maximum peak value are rejected.
4. Remove non-significant valleys. We calculate the minimum value of the pixels between two peaks; if this minimum is greater than 75% of the smaller of the two peak values, no significant valley is considered to exist.
Once we have found all the significant peaks in the histogram, we know that the threshold will lie between the two leftmost peaks. To find this threshold, Otsu's method is applied to the histogram between these two peaks. Otsu's method is an adaptive thresholding technique to split a histogram into two classes [26].
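The peak-finding and thresholding steps can be illustrated with the following sketch. It applies the 1% peak rule and the 75% valley rule once, then maximises Otsu's between-class criterion between the two leftmost surviving peaks; it omits the second peak-selection pass and assumes at least two significant peaks remain. All names are ours.

```python
import numpy as np

def threshold_between_peaks(grey, n_bins=256):
    hist, edges = np.histogram(grey, bins=n_bins)
    # local maxima of the histogram
    peaks = [i for i in range(1, n_bins - 1)
             if hist[i] >= hist[i - 1] and hist[i] >= hist[i + 1]]
    # 1% rule: drop non-significant peaks
    peaks = [i for i in peaks if hist[i] >= 0.01 * hist.max()]
    # 75% rule: merge peaks with no significant valley between them
    merged = [peaks[0]]
    for p in peaks[1:]:
        valley = hist[merged[-1]:p + 1].min()
        if valley > 0.75 * min(hist[merged[-1]], hist[p]):
            merged[-1] = max(merged[-1], p, key=lambda i: hist[i])
        else:
            merged.append(p)
    lo, hi = merged[0], merged[1]          # two leftmost significant peaks
    # Otsu's criterion on the histogram restricted to [lo, hi]
    best_t, best_var = lo + 1, -1.0
    for t in range(lo + 1, hi):
        w0, w1 = hist[lo:t].sum(), hist[t:hi + 1].sum()
        if w0 == 0 or w1 == 0:
            continue
        m0 = (np.arange(lo, t) * hist[lo:t]).sum() / w0
        m1 = (np.arange(t, hi + 1) * hist[t:hi + 1]).sum() / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return edges[best_t]
```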
3.4 Postprocessing Step

With this processing step, the segmentation result is smoothed by removing small points that differ from their surroundings. To this end, median filtering has been employed.

4 Classification Part

Once the burn is segmented, we have to classify it by its depth. It has been proven that physicians determine the depth of a burn based on color perception, as well as on some texture aspects. This implies that if a color metric in accordance with human perception is applied, we will get color features adequate to attain our goal of classifying burn wounds. One of the color representations based on human color
matching is the CIE L*u*v* color space, since it was designed so that intercolor distances computed using the $\|\cdot\|_2$ norm correspond to subjective color matching data.
In this study we have employed a set of descriptors formed by first-order texture parameters extracted from the three coordinates of the L*u*v* color space, as well as from the hue and chroma measurements derived from them. More specifically, the descriptors chosen are: mean of lightness (L*), mean of hue (h), mean of chroma (c), standard deviation of lightness (σL), standard deviation of hue (σh), standard deviation of chroma (σc), mean of u*, mean of v*, standard deviation of u* (σu), standard deviation of v* (σv), skewness of lightness (sL), kurtosis of lightness (kL), skewness of u* (su), kurtosis of u* (ku), skewness of v* (sv) and kurtosis of v*. Afterwards it was necessary to apply a descriptor selection method to obtain the optimum set for the subsequent classification.

4.1 Feature Selection

The discrimination power of these 16 features is analyzed using the Sequential Forward Selection (SFS) and Sequential Backward Selection (SBS) methods [27] via the Fuzzy-ARTMAP neural network, which is detailed in the following subsection. SFS is a bottom-up search procedure where one feature at a time is added to the current feature set. At each stage, the feature to be included is selected from among the remaining available features which have not yet been added, so that the newly enlarged feature set yields the minimum classification error compared to adding any other single feature. The algorithm stops when adding a new feature yields an increase of the classification error. SBS is the top-down counterpart of the SFS method: it starts from the complete set of features and, at each stage, the feature which shows the least discriminatory power is discarded. The algorithm stops when removing another feature implies an increase of the classification error. To apply these two methods we have fifty 49×49 pixel images per appearance (as there are five appearances, in all we have 250 49×49 pixel images)². The selection performance is evaluated by fivefold cross validation (XVAL) [28]. In this way, the sensitivity to the order of presentation of the training set, from which the SBS and SFS methods suffer [27], is diminished. To perform the XVAL method, the 50 images per burn appearance are split into five disjoint subsets. Four of these subsets (that is, 40 images per appearance) serve as the training set for the neural network, while the remaining one (10 images) is used as the validation set. The procedure is then repeated interchanging the validation subset with one of the training subsets, and so on until all five subsets have been used as validation sets. The final classification error is calculated as the mean of the errors over the XVAL runs. The results of applying the SFS and SBS methods are summarized in Table 1. The average error is calculated by counting the misclassifications and dividing by the total number of images used to validate.
² The 250 49×49 pixel images are small images, each showing only one burn appearance (no healthy skin or background). Each 49×49 pixel image has been validated by two physicians as belonging to a particular depth.
Looking at Table 1, we choose the SBS feature set (lightness, hue, standard deviation of the hue component, u* chrominance component, standard deviation of the v* component, and skewness of lightness) as the entries to the neural network, due to its smaller average error.

Table 1. Results of the SFS and SBS methods for feature selection
Method   Feature set                    Average error
SFS      L*, H, σC, u*, v*, σv, sL      2%
SBS      L*, H, σH, u*, σv, sL          1.6%
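To make the greedy search concrete, here is a minimal sketch of the SFS loop (SBS is the mirror image, removing features instead of adding them). The `classify_error` callable is assumed to train the Fuzzy-ARTMAP on the given feature subset and return the mean fivefold XVAL error; all names are ours.

```python
import numpy as np

def sequential_forward_selection(candidates, classify_error):
    """Greedily add the feature whose inclusion gives the lowest error;
    stop as soon as adding any feature would increase the error."""
    selected, best_err = [], np.inf
    remaining = list(candidates)
    while remaining:
        err, feat = min((classify_error(selected + [f]), f)
                        for f in remaining)
        if err >= best_err:
            break
        selected.append(feat)
        remaining.remove(feat)
        best_err = err
    return selected, best_err
```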
4.2 Fuzzy-ARTMAP Neural Network

The classifier used is a Fuzzy-ARTMAP neural network. This type of network is based on the Adaptive Resonance Theory developed by Grossberg and Carpenter. Fuzzy-ARTMAP is a supervised learning classification architecture for analog-valued input pairs of patterns [29]. The reasons for this choice are that Fuzzy-ARTMAP offers the advantages of well-understood theoretical properties, an efficient implementation, clustering properties that are consistent with human perception, and very fast convergence. It also has a track record of successful use in industrial and medical applications [30]. Other strong points of this type of neural network are the small number of design parameters (the vigilance parameter, ρa ∈ [0,1], and the selection parameter, α > 0) and the fact that the architecture and initial values are always the same, independent of the application. The input parameters are the features selected by the SBS method (L*, H, σH, u*, σv, sL). The network classifies them into five regions (the first and second belonging to superficial dermal depth, the third to deep dermal, and the fourth and fifth to full thickness). So the network has six neurons in the input layer, five neurons in the hidden layer and five neurons in the output layer.
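For illustration, the six selected inputs could be computed over the segmented burn pixels as in the sketch below, where hue is taken as the angle in the (u*, v*) plane; this is our own illustrative code, not the authors' implementation.

```python
import numpy as np
from scipy.stats import skew

def sbs_descriptors(L, u, v):
    """L, u, v: 1D arrays of L*, u*, v* values of the segmented burn pixels.
    Returns the six SBS-selected features (L*, H, sigma_H, u*, sigma_v, s_L)."""
    h = np.arctan2(v, u)                # hue angle h_uv in the (u*, v*) plane
    return np.array([L.mean(),          # mean lightness
                     h.mean(),          # mean hue
                     h.std(),           # standard deviation of hue
                     u.mean(),          # mean u*
                     v.std(),           # standard deviation of v*
                     skew(L)])          # skewness of lightness
```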
5 Experimental Results

This burn CAD tool was tested with 62 images (Caucasian race). The images are digital photographs taken by physicians following the acquisition protocol. All the images were diagnosed by a group of plastic surgeons affiliated to the Burn Unit of the Virgen del Rocío Hospital in Seville (Spain). The assessments were validated one week later, as is common practice when dealing with burned patients.
5.1 Segmentation Results

Figs. 2 to 4 show the segmentation results for some images of the three types of depth. Panels (a) show the original images and panels (b) the segmented ones, in which the segmented region is marked in yellow.
In the segmented images, the segmented region is marked in yellow. In all cases, the burn wound was segmented correctly from the normal skin. The number of diffusion iterations applied to the image was fixed at 10.

5.2 Classification Results
Classification results are summarized in Table 2. We used 22 images with superficial dermal burns, 18 with deep dermal burns and 22 with full-thickness burns. The average success percentage was 82.26%.

Table 2. Classification results
Burn depth           Success percentage
Superficial dermal   86.36%
Deep dermal          83.33%
Full-thickness       77.27%
Average              82.26%
All misclassified superficial dermal burns were classified by the network as deep dermal, and all misclassified deep dermal burns were classified as superficial dermal. In general, this confusion is also common among physicians; indeed, some burns are diagnosed as of "intermediate depth" when they are neither clearly superficial dermal nor deep dermal. For these burns it is necessary to wait one week to obtain the definitive assessment.
Fig. 2. Segmentation result for a superficial dermal burn. (a) Original image, with the selection box chosen by the user shown in red. (b) Segmented image
Fig. 3. Segmentation result for a deep dermal burn. (a) Original image, with the selection box chosen by the user shown in red. (b) Segmented image
Fig. 4. Segmentation result for a full-thickness burn. (a) Original image, which contains both a superficial dermal burn (the red part) and a full-thickness burn (the whitish part). (b) Segmented image. In this case the user selected a small box on one toe so that the algorithm would segment the full-thickness part of the burn. According to the physicians' assessment, it correctly segmented all the full-thickness parts of the image.
6 Discussion and Conclusions

In this paper a color image segmentation and classification method is proposed to determine the depth of a burn. We use digital color photographs taken by physicians following a fixed protocol. The system starts with a segmentation step, whose aim is to isolate the burn wound from the rest of the scene (healthy skin and background). This step begins with preprocessing (diffusion filtering and a change of color space). Then a transformation from the three-plane L*u*v* image to a one-plane image is performed, taking both color and texture information into account. From this gray-scale image a threshold is determined automatically to separate the burn from
the background. The last processing step consists of median filtering to homogenize regions. The segmentation algorithm works well for most of the database images, and it is useful not only for the images shown here, which follow a specific protocol, but also for other kinds of images. Once the burn is isolated, we extract from it six color and texture descriptors that form the inputs to the classifier. The features were selected with the Sequential Forward and Backward Selection methods: starting from 16 texture and color characteristics, the Sequential methods yield the six descriptors with the largest discrimination power. These six descriptors are the inputs to a Fuzzy-ARTMAP neural network, which classifies them into one of the possible burn depths. We tested 62 photographs, yielding an average classification success percentage of 82.26%. Based on these results, we conclude that our method performs very well in segmenting the images and classifying them by burn depth.

Acknowledgments. The authors thank Dr. Gómez-Cía, Dr. Torre and the Burn Unit of the Virgen del Rocío Hospital, Seville (Spain), for their invaluable help, for providing the burn wound photographs, and for their medical advice.
References

1. Clarke, J.A.: A Colour Atlas of Burn Injuries. Chapman & Hall Medical, London (1992)
2. Serrano, C., Roa, L.M., Acha, B.: Evaluation of a Telemedicine Platform in a Burn Unit. Proc. IEEE Int. Conf. on Information Technology Applications in Biomedicine, Washington (1998) 121–126
3. Roa, L.M., Gómez-Cía, T., Acha, B., Serrano, C.: Digital Imaging in Remote Diagnosis of Burns. Burns, 7 (1999) 617–624
4. Afromowitz, M.A., Van Liew, G.S., Heimbach, D.M.: Clinical Evaluation of Burn Injuries Using an Optical Reflectance Technique. IEEE Trans. on Biomedical Engineering, 2 (1987) 114–127
5. Afromowitz, M.A., Callis, J.B., Heimbach, D.M., DeSoto, L.A., Norton, M.K.: Multispectral Imaging of Burn Wounds: A New Clinical Instrument for Evaluating Burn Depth. IEEE Trans. on Biomedical Engineering, 10 (1988) 842–850
6. Wyllie, F.J., Sutherland, A.B.: Measurement of Surface Temperature as an Aid to the Diagnosis of Burn Depth. Burns, 2 (1991) 123–127
7. Cole, R.P., Jones, S.G., Shakespeare, P.G.: Thermographic Assessment of Hand Burns. Burns, 1 (1990) 60–63
8. Barsley, R.E., West, M.H., Fair, J.A.: Forensic Photography. Ultraviolet Imaging of Wounds on Skin. American Journal of Forensic Medical Pathology, 4 (1990) 300–308
9. Bennett, J.E., Kingman, R.O.: Evaluation of Burn Depth by the Use of Radioactive Isotopes – An Experimental Study. Plastic and Reconstructive Surgery, 4 (1957) 261–272
10. Niazi, Z.B.M., Essex, T.J.H., Papini, R., Scott, D., McLean, N.R., Black, J.M.: New Laser Doppler Scanner, a Valuable Adjunct in Burn Depth Assessment. Burns, 6 (1993) 485–489
11. Berriss, W.P., Sangwine, S.J.: Automatic Quantitative Analysis of Healing Skin Wounds using Colour Digital Image Processing. http://www.smtl.co.uk/World-Wide-Wounds/1997/july/Berris/Berris.html#perednia1 (1997)
12. Herbin, M., Bon, F.X., Venot, A., Jeanlouis, F., Dubertret, M.L., Dubertret, L., Strauch, G.: Assessment of Healing Kinetics through True Color Image Processing. IEEE Trans. on Medical Imaging, 1 (1993)
13. Arnqvist, J., Hellgren, L., Vincent, J.: Semiautomatic Classification of Secondary Healing Ulcers in Multispectral Images. Proc. of 9th International Conference on Pattern Recognition, Rome (1988) 459–461
14. Hansen, G.L., Sparrow, E.M., Kokate, J.Y., Leland, K.J., Iaizzo, P.A.: Wound Status Evaluation using Color Image Processing. IEEE Trans. on Medical Imaging, 1 (1997) 78–86
15. Liu, J., Bowyer, K., Goldgof, D., Sarkar, S.: A Comparative Study of Texture Measures for Human Skin Treatment. Proc. of Int. Conf. on Information, Communications and Signal Processing ICICS'97, Singapore (1997) 170–174
16. Mekkes, J.R., Westerhof, W.: Image Processing in the Study of Wound Healing. Clinics in Dermatology, 4 (1995) 401–407. Summarized in: http://www.ncbi.nlm.nih.gov/htbinpost/Entrez/query?uid=8665449&form=6&db=m&Dopt=r
17. Fiorini, R.A., Crivellini, M., Codagnone, G., Dacquino, G.F., Libertini, G., Morresi, A.: DELM Image Processing for Skin-Melanoma Early Diagnosis. Proc. SPIE – Int. Soc. Opt. Eng., Vol. 3164 (1997) 359–370
18. Thira, J.P., Macq, B.: Morphological Feature Extraction for the Classification of Digital Images of Cancerous Tissues. IEEE Trans. on Biomedical Engineering, 10 (1996) 1011–1020
19. Hance, G.A., Umbaugh, S.E., Moss, R.H., Stoecker, W.V.: Unsupervised Color Image Segmentation with Application to Skin Tumor Borders. IEEE Engineering in Medicine and Biology, Jan/Feb (1996) 104–111
20. Zhang, Z., Stoecker, W.V., Moss, R.H.: Border Detection on Digitized Skin Tumor Images. IEEE Trans. on Medical Imaging, 11 (2000) 1128–1143
21. Pratt, W.K.: Digital Image Processing. 3rd edn. Wiley, New York (2001)
22. Serrano, C., Acha, B., Acha, J.I.: Segmentation of Burn Images Based on Color and Texture Information. Proc. SPIE Int. Symposium on Medical Imaging, San Diego, CA, USA (2003), to be published
23. Perona, P., Malik, J.: Scale-Space and Edge Detection Using Anisotropic Diffusion. IEEE Trans. on Pattern Analysis and Machine Intelligence, 7 (1990) 629–639
24. Lucchese, L., Mitra, S.K.: Color Segmentation Based on Separate Anisotropic Diffusion of Chromatic and Achromatic Channels. IEE Proc. Vision, Image and Signal Processing, 3 (2001) 141–150
25. Acha, B., Serrano, C., Acha, J.I.: Segmentation of Burn Images Using the L*u*v* Space and Classification of their Depths by Color and Texture Information. SPIE Int. Symposium on Medical Imaging, San Diego, CA, USA, Vol. 4684, Part Three (2002) 1508–1515
26. Petrou, M., Bosdogianni, P.: Image Processing: The Fundamentals. Wiley, Chichester, UK (1999)
27. Fukunaga, K.: Introduction to Statistical Pattern Recognition. 2nd edn. Morgan Kaufmann (Academic Press), San Diego, CA (1990)
28. Ganster, H., Pinz, A., Röhrer, R., Wilding, E., Binder, M., Kittler, H.: Automated Melanoma Recognition. IEEE Trans. on Medical Imaging, 3 (2001) 233–239
29. Carpenter, G.A., Grossberg, S., Markuzon, S., Reynolds, J.H.: Fuzzy-ARTMAP: A Neural Network Architecture for Incremental Supervised Learning of Analog Multidimensional Maps. IEEE Trans. on Neural Networks, 5 (1992) 698–713
30. Donohoe, G.W., Nemeth, S., Soliz, P.: ART-Based Image Analysis for Pigmented Lesions of the Skin. 11th IEEE Symposium on Computer-Based Medical Systems (1998) 293–298
An Inverse Method for the Recovery of Tissue Parameters from Colour Images Ela Claridge and Steve J Preece School of Computer Science, The University of Birmingham, Birmingham B15 2TT, U.K. {e.claridge s.j.preece}@cs.bham.ac.uk
Abstract. The interpretation of colour images is presented as an inverse problem in which a mapping is sought between image colour vectors and the physiological parameters characterizing a tissue. To ensure the necessary one-to-one correspondence between the image colours and the parameters, the mapping must be unique. This can be established by testing the sign of the determinant of the Jacobian matrix, a multi-dimensional equivalent of a discrete derivative, over the space of all parameter values. Furthermore, an optimisation procedure is employed to find the set of filters for image capture which generate image vectors minimizing the mapping error. This methodology, applied to the interpretation of skin images, shows that the standard RGB system of filters provides a unique mapping between image values and the parameters characterizing normal skin. It is further shown that an optimal set of filters reduces the error of quantification by a factor of 2, on average.
1 Introduction

Recently Cotton and Claridge proposed a novel approach to the interpretation of colour medical images [1]. It is based on the hypothesis that, because colours seen at the surface of a tissue reflect its internal structure and composition, it should be possible to recover this information from colour images of the tissue. This approach was successfully applied to skin imaging [2] and early clinical trials show its value in melanoma diagnosis [3]. The method consists of a modelling step, which needs to be carried out only once, and an interpretation step. In the first step a predictive mathematical model of tissue coloration is constructed. It requires data on the tissue's laminar structure, the optical properties of the layers and an appropriate model of radiation transport. Given a specific set of tissue parameters, such as the thickness of the layers and the concentration of all the pigments, a radiation transport model computes a corresponding spectrum. Image vectors corresponding to a given image acquisition system are then obtained by convolving the spectrum with the appropriate RGB filter response curves. From the knowledge of physiology, the normal ranges for all the tissue parameters can be established and RGB image vectors computed for all combinations of the parameters. In this way a cross-reference between histology and colour, a model of colouration, is obtained. Provided that the mapping is unique, two-way predictions are possible: from tissue composition to its colour; and, importantly, from the tissue colour to its composition.
In the second step, the model of colouration is employed to interpret colour images of tissue. If tissue parameters are in the normal ranges, the corresponding colour will have its entry in the model and hence its histological parameters can be obtained. New images called parametric maps are constructed by reading the colours from the original image point by point, obtaining the tissue parameters from the model, and recording the magnitude of each parameter at the corresponding location of a respective parametric map. Instances of abnormal tissue constituents can be identified as they lie outside the range of colouration predicted by the model. Their parametric maps can also be constructed and be a subject of further interpretation. In the context of skin imaging, the method computes parametric maps depicting the concentration of epidermal melanin and dermal blood, the concentration of dermal melanin, if present, and the thickness of the papillary dermis. Figure 1 shows an image of a lesion (a melanoma) and the four parametric maps. Clinical features, related to physiology, are easy to detect in these maps. A stepwise diagnostic procedure using combinations of the features results in melanoma detection with 80.1% sensitivity and 82.7% specificity on a set of 348 lesions [3], which compares very well with other diagnostic methods.
Fig. 1. (a) A colour image of a melanoma together with parametric maps showing: (b) total melanin (darker = more); (c) dermal melanin, whose presence suggests abnormality (brighter = more); (d) papillary dermis showing the collagen hole and peripheral increase (brighter = more); (e) dermal blood showing the absence in the centre and increase on the periphery (darker = more). These features are typical for melanoma and can be easily seen in the maps
One of the key factors on which the success of this method depends is the uniqueness of mapping between the tissue parameters and the image values. Cotton and Claridge’s justification of this fact is informal and it relies on a particular geometric distribution of image vectors in RGB colour space [1]. This paper develops the theoretical underpinnings for that work. The original approach is re-presented in the context of inversion methods used in optical tomography. This paper proposes a formal generic method for determining the uniqueness of mapping between tissue parameters and image vectors, which is the necessary condition for any inversion method. It then demonstrates that the RGB image vectors used in Cotton and Claridge’s work indeed provide a unique inverse solution. The paper shows further how to determine the optimal set of filters which minimise the uncertainty in quantification of tissue parameters. The optimal set of filters is computed for the normal skin and shown to reduce the error of quantification by a factor of 2 on average in comparison to standard RGB filters.
2 Finding Tissue Parameters as an Inversion Process

Deriving tissue parameters from a model of tissue colouration is akin to the inverse solution approach used in optical modelling and reconstruction ([4], abstract): "Given a set of measurements of […] light […] on the surface of the object, there exists a unique three-dimensional distribution of internal scatterers and absorbers which would yield that set. Thus imaging becomes a task of solving an inverse problem using an appropriate model of photon transport." In the context of this work, light measurements are available in the form of image vectors (e.g. [r g b]) and the solution sought is reduced to finding a two-dimensional distribution of scatterers and absorbers projected onto the surface. Importantly, the inversion method must provide a solution which is unique. In addition, to be of practical use, the solution has to be accurate enough for the intended clinical application. Uniqueness is an objective criterion and its existence can be formulated mathematically. The notion of "sufficient accuracy" is highly subjective and depends on the needs of the application domain. However, it is possible to replace it with a weaker condition which is amenable to mathematical treatment, namely that the error of estimation is at its minimum according to a given objective criterion.
3 Image Formation as a Two-Step Mapping

For the purpose of this study the image formation process is presented as a sequence of two steps. In the first step the incident light interacts with an object. As a result of this interaction any remitted light has its spectrum altered depending on the object composition. In the second step, an image acquisition device captures and integrates specific portions of the remitted spectra and generates a single value corresponding to each spectral band. The result of collating all the bands is a pixel vector such as, for example, a typical [r g b] 3-dimensional vector. This process can be expressed mathematically as follows. Let

p = {pk}, k = 1, …, K,  p ∈ P    (1)

be a vector of K parameters which affect the object colouration, where P is the space of all possible parameter variations associated with a given object. A spectrum is represented by a set of values λm at a number of discrete wavelengths m

O = {λm}, m = 1, …, M,  O ∈ Λ    (2)
and the space Λ defines all possible spectra that can be remitted from a given object. Image values captured by a camera with N optical filters are represented by a vector

i = {in}, n = 1, …, N,  i ∈ I    (3)

where I describes the space of all possible image values corresponding to parameters p ∈ P. The mapping a, defined as

a : P → Λ    (4)
represents the first step in the imaging process, denoting a function from parameter space to wavelength space. This mapping produces the spectral reflectance O for a given parameter vector p. The second step, i.e. the digital image capture, is represented by the mapping function b

b : Λ → I    (5)

In real imaging b is a mapping from the continuous space of spectra and is implemented by optical filtering. In discrete form b is a convolution of O with a filter response function R. The function for the n-th filter can be defined as

in = Σ(m=1..M) Rmn λm    (6)
The matrix [Rmn] defines a set of filter response functions. For physically realisable filters all values Rmn are positive. The entire process of image formation can be represented by a two-step function f

f = a ∘ b,  f : P → I    (7)
which describes the correspondence between specific parameters characterising the tissue and the pixel vector captured through a given set of optical filters.
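As an illustration of the filtering step b, the following sketch evaluates equation (6) for a discretised spectrum. The Gaussian filter shapes are an assumption of this sketch (real response curves would be measured); the R, G, B centre wavelengths and FWHM are the values quoted later in Sect. 5.2.

    import numpy as np

    wavelengths = np.arange(400, 701, 2.0)           # lambda_m, M discrete wavelengths

    def gaussian_filter(cw, fwhm):
        sigma = fwhm / 2.355                         # FWHM -> standard deviation
        return np.exp(-0.5 * ((wavelengths - cw) / sigma) ** 2)

    # M x N response matrix [R_mn] for nominal R, G, B filters
    R = np.stack([gaussian_filter(cw, 60) for cw in (610, 550, 450)], axis=1)

    def b(spectrum):
        """i_n = sum over m of R_mn * lambda_m, one value per filter."""
        return R.T @ spectrum                        # image vector of length N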
4 Finding the Inverse Solution

In terms of the framework introduced above, the aim of estimating the parameters which characterise a tissue from its colour (or, more generally, its multi-spectral) image is to find an inverse function

f⁻¹ : I → P    (8)

such that f defines a unique, one-to-one mapping between the points in P and the points in I, and p = f⁻¹(i) exists for all p ∈ P. The solution will be sought by first specifying a forward mapping f and then testing its uniqueness. The first component of the mapping function, a, will take the form of a photon transport model, computing a spectrum corresponding to a given parameter vector. At this stage the function b, applying a set of filters to the spectra generated by a, will be assumed to be known. The uniqueness of mapping for f will be tested by examining the behaviour of the determinant of the Jacobian matrix over all p ∈ P. Next, the restriction on b will be removed and an optimization scheme will be applied to select a set of N filters such that f : P → Λ → I is unique and the error of mapping is at its minimum.

4.1 Generative Model of Colouration: Forward Mapping

In order to perform mapping a, it is necessary to have either a mathematical model which can predict spectral reflectance for a given set of parameter values, or some technique for measurement of the appropriate spectra. Mathematical models provide a
solution to the radiative transfer equation at a number of discrete wavelengths and can be implemented as either deterministic or stochastic algorithms. Deterministic algorithms include the Kubelka-Munk approximation [9] and the radiative transfer equation (in [4], p. 842); the most common stochastic method is Monte Carlo simulation [5]. The input parameters required by the algorithms of either class can be subdivided into those which characterise the entire tissue type and those which characterise a specific instance of the tissue. The first group typically includes absorption and scatter coefficients, and the second group includes variable factors such as the thickness of tissue layers and the concentrations of absorbing compounds. Function a, which effectively implements a solution to a radiative transfer equation, operates on the parameters of the second group, whereas parameters of the first group are treated as constants. To test the uniqueness of mapping of a and, later on, to seek the optimal b which minimises the mapping error, it is necessary to ensure that the parameter space P covers the entire range of histologically valid concentrations and thicknesses. P also has to be appropriately discretised. The outline of the algorithm is as follows:

given:
    incident light
    absorption and scatter coefficients for all the components
    the spatial arrangement of the components
for all concentrations and thicknesses of parameter p1
    for all concentrations and thicknesses of parameter p2
        . . .
            for all concentrations and thicknesses of parameter pK
                compute O = a(p1, p2, …, pK)
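A compact version of this loop, as a sketch only: here `a` stands for the photon transport model (e.g. a Kubelka-Munk solver) and `b` for the filtering step of equation (6); both are assumed to be given, and the normalised parameter ranges are illustrative placeholders rather than histological values.

    import itertools
    import numpy as np

    def forward_model_table(a, b, param_ranges):
        """Tabulate p -> i over a discretised parameter space P (the model of colouration)."""
        table = {}
        for p in itertools.product(*param_ranges):   # all (p1, ..., pK) combinations
            spectrum = a(*p)                         # O = a(p1, ..., pK), photon transport model
            table[p] = b(spectrum)                   # image vector through the filters, eq. (6)
        return table

    # e.g. a 10 x 10 x 4 grid of normalised melanin / blood / papillary dermis levels
    ranges = [np.linspace(0, 1, 10), np.linspace(0, 1, 10), np.linspace(0, 1, 4)]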
The image vectors i corresponding to the spectra O obtained above are computed by convolving each spectrum with a set of filter response functions, typically RGB.

4.2 Testing for Uniqueness of the Solution

Finding a unique vector of parameters p corresponding to a given image vector i corresponds to finding a unique solution to the equation

p = f⁻¹(i)    (9)
In the one-dimensional case, i.e. when f is a single-valued function of a single variable, one-to-one correspondence is ensured if and only if f is monotonic on the entire domain P of p. Mathematically this can be expressed as

|df/dp| > 0  ∀ p ∈ P    (10)
Drawing on differential geometry, the inverse function theorem [6] enables us to derive an equivalent condition for the case when f is a (discrete) vector-valued function of a vector variable. The Jacobian matrix, defined as

J = [ ∂f1/∂p1  ∂f1/∂p2  …  ∂f1/∂pk ]
    [    …        …     …     …    ]
    [ ∂fn/∂p1  ∂fn/∂p2  …  ∂fn/∂pk ]    (11)
is a multi-dimensional equivalent of the one-dimensional derivative. The inverse function theorem states that if the determinant of the Jacobian matrix, det(J), is non-zero at a point p = p0 then there exists a neighbourhood around p0 where f can be approximated linearly. It follows then that within this neighbourhood there is one-to-one mapping between p ∈ P and i = f(p) ∈ I. The condition

det(J) ≠ 0  ∀ p ∈ P    (12)
is necessary but insufficient if P is discrete, because there may be instances of p not in P where f has "turning points" – local extrema. Such points are characterised by a change of sign in det(J). If det(J) has the same sign over the whole domain P, f does not have any local extrema and thus is monotonic over the whole of P. Thus, to ensure the uniqueness of mapping over the whole of P, we require that the determinant of the Jacobian matrix is either strictly positive or strictly negative for all p ∈ P. Then there exists a neighbourhood around each p ∈ P where there is one-to-one mapping between parameters p and image vectors i. Within such a neighbourhood the inverse function f⁻¹ can be expressed as

dp = J⁻¹ di    (13)
where dp = p – p0 and di = i – i0, with i0 = f(p0).

4.3 Finding Optimal Filters

The uniqueness conditions formulated above ensure that for a given image vector i and a given function f there is a unique set of corresponding parameters p characterising the tissue at the specific image location. In practice mapping f is not error free, and even when in theory

f(pj) ≠ f(pk) for j ≠ k,    (14)

a camera digitisation error σcam can be such that |f(pj) − f(pk)| < σcam and it is impossible to distinguish pj and pk. In this situation only the requirements of a specific application domain can determine what level of uncertainty makes the parameter recovery clinically useful. Although it is not possible to find a generic criterion for the level of uncertainty, it is possible to find a criterion for a mapping which minimizes the mapping error. Drawing on the definition of f = a ∘ b, it can be seen that whereas a is determined by the physics of image formation, b can be decided by the user. Specifically, by varying the filters, the mapping error can be varied also. The two main sources of error are the uncertainties in the absorption and scatter coefficients (σspec) and the camera digitisation error (σcam). σspec captures a mainly
experimental error and is commonly available together with the coefficient data (e.g. as the standard deviation at each wavelength [10]). Spectral errors associated with each parameter vector p are then calculated using standard error propagation. Partial derivatives ∂p/∂i are obtained from J⁻¹. From statistical error analysis [7],

σspec = [ Σk ( Σn (∂p/∂in)² σin² ) ]^(1/2)    (15)
The second error, σcam, is derived from the camera SNR and is also wavelength dependent. Using statistical error analysis it is possible to estimate the error from each source for a given pk (for details see [8]). As the two sources are independent, the respective errors can be combined to give the overall error estimate

σpk = (σspec² + σcam²)^(1/2)    (16)
It is now possible to define an optimisation scheme which selects a set of N filters such that the mapping between p and i is unique for all p ∈ P and the error of mapping (16) is minimised. A typical optical filter can be defined, for example, by its central wavelength (CW) and full width at half maximum (FWHM). For N filters this defines an N × 2 search space. The use of the Jacobian to test the existence of a unique solution requires the computation of the image vector i = f(p) = b(a(p)); in practice filter response functions are represented in the form of a matrix [Rmn] (see equation (6)). The outline of the optimization algorithm is as follows:

until the stopping criterion is met:
    1. define a new set of filters Rmn (each defined by CW and FWHM)
    2. for the given set of filter response functions Rmn:
           for each point p within the discretised parameter space,
               compute the image vector i = b(Rmn, a(p))
    3. check that the Jacobian determinant is either strictly positive or strictly negative for ALL the points p;
       if true, compute the inverse Jacobian matrix; if false, return to step 1
    4. compute the error of parameter recovery and use it to compute a stopping criterion
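A minimal sketch of the uniqueness test in step 3, assuming a forward mapping f with as many filters as parameters (three of each, as in Sect. 5), so that the Jacobian is square; the finite-difference step eps and the function names are illustrative, not from the paper.

    import numpy as np

    def jacobian(f, p, eps=1e-4):
        """Finite-difference Jacobian of f at p (f maps K parameters to N image values)."""
        p = np.asarray(p, dtype=float)
        f0 = np.asarray(f(p))
        J = np.zeros((f0.size, p.size))
        for k in range(p.size):
            dp = np.zeros_like(p)
            dp[k] = eps
            J[:, k] = (np.asarray(f(p + dp)) - f0) / eps
        return J

    def mapping_is_unique(f, grid_points):
        """det(J) strictly positive, or strictly negative, over the whole grid."""
        dets = [np.linalg.det(jacobian(f, p)) for p in grid_points]
        return all(d > 0 for d in dets) or all(d < 0 for d in dets)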
5 Applying the Methodology to Skin Imaging – Procedure and Results

The above analysis is now applied to the problem of computing quantitative parameters characterising normal human skin from colour images. Earlier work by Cotton and Claridge showed the feasibility of deriving such parameters using a model of image formation [1], and also the clinical utility of the resulting parametric maps in the early diagnosis of melanoma [3]. In this paper we demonstrate that the derived histological parameters have a one-to-one correspondence to RGB image vectors. We show further that the error in recovered parameter values can be reduced by selecting filters optimised for the specific set of skin parameters.
5.1 Optical Model of the Normal Skin

The skin is modelled as a two-layer structure. The optical characteristics of the upper layer, the epidermis, are determined by the pigment melanin, which strongly absorbs light in the blue range of the spectrum. Any light which is not absorbed enters the next layer, the dermis. A proportion of the light is absorbed there by haemoglobin and oxyhaemoglobin in the blood, but most is scattered by collagen back towards the skin surface. On the way back, light is absorbed again by melanin in the epidermis and what remains is remitted and registered by a camera. Absorption coefficients for melanin and haemoglobin and scatter coefficients for dermal collagen are the constants in the modelling process. The variable quantities are the concentration of melanin, the concentration of blood pigments and the thickness of the papillary dermis, making the dimension of the parameter space P equal to 3. The spectra corresponding to the histologically valid parameter values characterising melanin and blood were computed using Kubelka-Munk theory [9]. Monte Carlo simulations return virtually the same spectra, but they are much more time consuming. The absorption and scatter coefficients were taken from published data [10]. Parameter ranges for melanin, blood and papillary dermis were discretised to define 10 × 10 × 4 spectra. This choice of discretisation was arbitrary, but additional experiments showed that it did not affect the results of the uniqueness test. Figures 2a-c show the changes in the remitted spectrum effected by changes in the levels of melanin, the blood pigments and the thickness of the papillary dermis.

5.2 Testing One-to-One Mapping for RGB Filters

Each spectrum generated above was convolved with the R, G and B filters (CW = 610 nm, 550 nm and 450 nm respectively; FWHM = 60 nm), yielding 10 × 10 × 4 image vectors i = f(p) corresponding to 10 × 10 × 4 parameters p = [pmel, phaem, ppapd]. det(J) was positive for every p, thus showing the uniqueness of the mapping between the parameters and the image vectors for the RGB filters. This result confirms an earlier insight based on a geometric interpretation of the mapping [1], namely that the parameters can be derived uniquely from RGB image values. Parametric maps based on RGB filters have already been validated on a large data set of 348 lesions [3].

5.3 Computing Optimum Filters

The standard RGB filters produce parametric images which provide the clinician with diagnostically important information about the state of the skin. However, there may exist combinations of filters which further decrease the uncertainty of the mapping. To define filters which minimise the mapping error, an optimisation procedure was implemented following the algorithm in section 4.3. A genetic algorithm (GA) provided in the MATLAB Statistics Toolbox was used to drive the optimisation. Three filters are required for the unique mapping of three parameters. Each filter is defined by two parameters, the central wavelength and the FWHM, thus defining a 6-dimensional search space. The search space was constrained so that a filter has to lie
entirely within the visible part of the spectrum (400-700nm) and its FWHM must be within the range 25-100nm. The fitness function for the GA computations was defined as a reciprocal of the sum of the camera and the spectral errors. The GA was initialised using random seeds and run 5 times to ensure that the results were not dependent on the starting values.
Fig. 2. Remittance spectra of normal human skin: (a) for varying melanin concentration; (b) for varying blood concentration (oxy- and de-oxyhaemoglobin in equal proportions); (c) for varying thickness of the papillary dermis
5.4 Results and Discussion

The results of the runs were fairly consistent and returned the filter parameters shown in Table 1. The combined spectral and camera error (equation 16) associated with the recovery of the melanin, blood and papillary dermis parameters was computed for several combinations of these parameters. Tables 2, 3 and 4 show typical errors for the optimal filters. The errors lie in the range 0.18–0.94 for melanin, 0.25–1.16 for blood and 0.07–0.14 for papillary dermis. Errors for the RGB filters (not shown here in full for lack of space) vary between 0.35 and 1.8 for melanin, 0.6 and 1.6 for blood, and 0.07 and 0.35 for papillary dermis.
Table 1. Filter parameters returned by the optimisation procedure.

Filter   CW (nm)        FWHM (nm)
1        485 ± 0.75     24 ± 0.90
2        560 ± 0.73     14 ± 0.55
3        700 ± 0.01     95 ± 0.72
Table 2. Error in recovery of melanin level for a fixed thickness of the papillary dermis (level 2) and varying levels of melanin and blood. Every other level is shown. The error is relative to the melanin level; e.g. for melanin level 4 and blood level 2, the recovered melanin level is 4 ± 0.23

              Melanin level
Blood level   2      4      6      8      10
2             0.21   0.23   0.25   0.28   0.30
4             0.28   0.31   0.34   0.37   0.40
6             0.37   0.41   0.45   0.49   0.53
8             0.49   0.54   0.59   0.64   0.70
10            0.66   0.72   0.79   0.86   0.94
Table 3. Error in recovery of blood level for a fixed thickness of the papillary dermis (level 2) and varying levels of melanin and blood. Every other level is shown. The error is relative to the blood level; e.g. for melanin level 4 and blood level 2, the recovered blood level is 2 ± 0.35

              Melanin level
Blood level   2      4      6      8      10
2             0.29   0.35   0.42   0.51   0.62
4             0.34   0.41   0.49   0.60   0.73
6             0.39   0.48   0.58   0.70   0.85
8             0.46   0.56   0.67   0.82   0.99
10            0.54   0.65   0.79   0.96   1.16
Table 4. Error in recovery of the papillary dermis thickness level for a fixed thickness of the papillary dermis (level 2) and varying levels of melanin and blood. Every other level is shown. The error is relative to the papillary dermis thickness level; e.g. for melanin level 4 and blood level 2, the recovered papillary dermis thickness level is 2 ± 0.074

              Melanin level
Blood level   2       4       6       8       10
2             0.073   0.074   0.075   0.075   0.076
4             0.079   0.080   0.081   0.082   0.082
6             0.086   0.087   0.088   0.089   0.089
8             0.094   0.095   0.096   0.098   0.107
10            0.102   0.109   0.119   0.130   0.141
It is interesting to analyse the choice of optimal filters by relating them to the graphs showing the variability of the individual parameters (figure 2). A fairly broad filter centered at 700nm is a clear choice for the papillary dermis thickness. At this wavelength variations related to blood and melanin are much smaller in comparison to variations related to papillary dermis. A filter centered at 560nm corresponds to a range of wavelengths where sensitivity to change in blood level is very high; e.g. the peak absorption for oxyhaemoglobin is at 558nm. The first filter, at 485nm, coincides
with the maximum variability related to melanin levels, and it carefully avoids encroaching on the higher range of wavelengths where variability in blood levels begins to show. The comparison of error levels for the optimal filters and the RGB filters shows clearly the advantage of carefully tuning the spectral parameters of the imaging system. It could be argued that the above comparison is somewhat unfair with respect to the actual working solution for skin imaging, which uses one near-infrared filter to estimate the thickness of the papillary dermis in addition to three RGB filters for quantification of the remaining two parameters. Bearing in mind that the optimization process constrained the filters to lie within the range between 300 and 700 nm, it is very likely that filter 3, currently at the top of this range at 700 nm, would be placed further towards the infra-red. In another study [11] the authors sought to find a set of filters which would optimise the recovery of melanin and blood only, assuming a known thickness of the papillary dermis. That work showed that with just two optimal filters an error reduction of up to 20% can be achieved in comparison to the standard RGB filters.
6 General Discussion

This paper makes several contributions in the area of medical image interpretation. In terms of methodology, it shows that an existing scheme [1] for derivation of tissue parameters from colour images can be viewed as a subset of optical reconstruction methods [4]. Within this framework the problem of finding a mapping between image colours and tissue parameters is formulated in terms of finding a unique inverse solution, given the forward mapping derived from the physical model of image formation. Specifically, the solution involves computing the Jacobian matrix of partial derivatives of image values with respect to parameters characterizing the tissue. A simple test is used to establish the uniqueness of the mapping. Once this condition is confirmed, the inverse mapping can be implemented in a number of ways. This approach is generic and is the main contribution of this paper. In addition, by formulating the mapping as a two-step process, it is possible to find the optimal set of filters which minimize the mapping error. Although the method is not tomographic, it is capable of producing parametric maps which show the surface distribution and the magnitudes of individual tissue components. The depth resolution is poor, being limited by both absorption and scatter. Within-the-surface resolution varies depending on scatter, so that small details are not always resolved. However, the maps showing gross (low-passed) distribution of parameters are still likely to give useful diagnostic insight into the tissue properties. The methodology involved and the nature of the results are similar to quantitative spectroscopy. The main difference is that information is shown in the form of an image, capturing not only quantitative data, but also spatial patterns and relationships. The recently developed fNIR imaging [12] is similar in this respect. Interpretation of hyperspectral images is also likely to benefit from the ability to find the spectral bands specifically optimized for a task in hand. The selection of optimal filters is carried out only once, prior to data acquisition, and from there on images can be
acquired using a standard camera equipped with a small number of optical filters. This reduces cost, storage and computational requirements.
7 Conclusions

This work has laid down the foundations for a generic image interpretation method which allows one to compute histologically based parameters characterizing epithelial tissues. Given the optical characteristics of the tissue components and their laminar structure, it is now possible to establish, through a formal analysis, whether a unique correspondence exists between the tissue parameters and its colours. When this condition is met, the method can estimate tissue composition from optical images captured through a small number of filters chosen to optimize the parameter recovery. A substantial body of experimental work on skin imaging showed that the method offers a unique in-vivo insight into the tissue structure and can improve diagnosis. The theory presented in this paper should enable this kind of analysis to be readily extended to other imaging domains, including other epithelial tissues. Indeed, ongoing work on ocular fundus and colon imaging shows that this is the case.
References

1. Cotton, S.D., Claridge, E., Hall, P.N.: Noninvasive skin imaging. Information Processing in Medical Imaging (LNCS 1230) (1997) 501–507
2. Claridge, E., Cotton, S., Hall, P., Moncrieff, M.: From colour to tissue histology: physics based interpretation of images of pigmented skin lesions. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2002, Dohi, T., Kikinis, R. (Eds.), LNCS 2488, Vol. I, Springer (2002) 730–738
3. Moncrieff, M., Cotton, S., Claridge, E., Hall, P.: Spectrophotometric intracutaneous analysis: a new technique for imaging pigmented skin lesions. Br J Derm 146(3) (2002) 448–457
4. Arridge, S., Hebden, J.: Optical imaging in medicine: II. Modelling and reconstruction. Phys Med Biol 42 (1997) 841–854
5. Prahl, S.A., et al.: A Monte Carlo model of light propagation in tissue. SPIE Institute Series IS 5 (1989) 102–111
6. Lipschutz, M.M.: Differential Geometry. McGraw-Hill (1969)
7. Kendall, M.G., Stuart, A.: The Advanced Theory of Statistics. Volume 1: Distribution Theory (3rd edn). Charles Griffin & Co (1969)
8. Preece, S.J., Claridge, E.: A technique for the geometry insensitive recovery of parameters which characterise human skin. Journal of the Optical Society of America (submitted)
9. Egan, W.G., Hilgeman, T.W.: Optical Properties of Inhomogeneous Materials. Academic Press (1979)
10. Anderson, R., Parrish, B.S., Parrish, J.: The optics of human skin. The Journal of Investigative Dermatology 77(1) (1981) 13–19
11. Preece, S.J., Claridge, E.: Physics-based quantitative image interpretation using material specific spectral characterisation models. IEEE Pattern Analysis and Machine Intelligence (submitted)
12. Hoshi, Y., Chen, S.J., Tamura, M.: Spatiotemporal imaging of human brain activity by functional near-infrared spectroscopy. American Laboratory 33(20) (2001) 35–39
Ideal Observer Model for Detection of Blood Perfusion and Flow Using Ultrasound Roger J. Zemp1, Craig K. Abbey, and Michael F. Insana Department of Biomedical Engineering, University of California, Davis Davis, CA 95616
Abstract. An ideal observer model is developed for the task of detecting blood perfusing or flowing through tissue. The ideal observer theory relies on a linear systems model that describes tissue and blood object functions and electronic noise as random processes. When aliasing is minimal, the system is characterized by a quantity similar to Noise-Equivalent Quanta used in photon imaging modalities. A simple 1-D model is used to illustrate the effect of the system and object parameters on task performance. Velocity and decorrelation are seen to be advantageous for detection. Aliasing can degrade performance. The ideal observer model provides a framework for assessing the performance of Power Doppler ultrasound systems, and may aid in their design.
1 Introduction

Estimation of blood velocity has been widely dealt with in the medical ultrasound literature [1]. Less has been done on detection of blood. Our motivation for studying detection is angiogenesis in tumors. Increased microvascular density and metabolic demand have been correlated with malignant growth. In many cases this means a greater flow rate through capillary networks. Various techniques have been used to quantify blood volume fraction and flow rates to monitor tumor progression [2]. Perfusion quantification could be an extension of this work, but we focus on early detection of neoplasm perfusion with ultrasonic techniques. Color Doppler ultrasound systems display mean velocity estimates of blood motion and have been effective for looking at larger vessels. These systems typically send a sequence of pulses and measure inter-pulse motion to estimate flow. The systems consist of a wall filter to reject stationary tissue clutter and a mean velocity estimator that often relies on lag-0 and lag-1 autocorrelation estimates [3]. Cross-correlation techniques also exist, some of which provide sensitivity to velocity in all directions [4]. Unfortunately, color Doppler systems are not very effective for studying perfusion in the microvasculature, where in a single sample volume blood may be moving in multiple directions, giving a zero mean velocity estimate. Slow flow in perfusion-based models is another problem, since clutter and blood spectra will often overlap. Power Doppler systems [5] do not attempt to estimate velocity but rather display the signal power from blood that is moving. Hence, Power Doppler systems are often
¹ Electronic Mail: [email protected]
much more effective for detection of blood than Color Doppler techniques. Power Doppler is approximately independent of blood-flow velocity and Doppler angle, provided the Doppler frequency shift is non-zero. Another important feature of Power Doppler estimates for microvascular flow imaging is that they are not subject to aliasing, enabling use of a lower pulse repetition frequency (PRF) than that necessary in color Doppler. Power Doppler ultrasonography is therefore more sensitive to slow flow than color Doppler. Spectral integration means that noise is also less of a problem than in color Doppler. High gain settings are therefore possible with power Doppler. Tissue motion is however a problem. Tissue clutter will tend to bias the result, dominating the total energy. Wall filters may be used to partially reject tissue clutter. Power Doppler systems can use lag-0 autocorrelation estimates already computed for Color Doppler mode imaging. The post-wall filtered lag-0 autocorrelation simply gives the power in the moving signal, by Parseval’s theorem. Despite wide commercial implementation, very little has been written in the literature about engineering approaches used in Power Doppler, and there are no first principle performance assessment techniques for Power Doppler systems. This is our contribution. We provide a framework for modeling the ideal observer for detection of blood. The ideal observer provides the upper bound on detection performance for a Power Doppler system. It can be applied to standard measurement approaches to evaluate system design and wall filters, and it provides a framework for improvements.
2 Modeling Perfusion and Echo Signals

2.1 Signal Models

In Doppler ultrasound systems, ensembles of L pulses are sent in sequence along axial lines of sight at a rate called the pulse repetition frequency (PRF). The echoes from each pulse return to the transducer following their propagation and scattering by microstructures in the body. The measured voltage trace from a single pulse is indicative of scattering strength as a function of depth. The voltage signal is sometimes called radio-frequency (RF) data because it has an underlying modulation (due to the transmitted pulse) in the MHz frequency range. The modulation of the signal allows phase-sensitive measurements to be made. Motion between pulses can be used to estimate velocity. With L pulses, there will be L voltage traces initiated at times t1, …, tL, where tk = (k−1)Δt, k = 1, ..., L, and Δt is the pulse repetition interval (PRI). These are called slow times. Voltage traces for each pulse are acquired in fast time. Fast-time acquisition rates fs are usually 10 times the modulation frequency or better, whereas slow-time pulse-repetition frequencies are more typically on the order of 1–100 kHz. A voltage trace for each pulse sent is collected before the next pulse is sent. After the entire ensemble of pulses is sent along one line of sight, the beam is translated mechanically or 'steered' electronically (using phased array transducers) to interrogate an adjacent line of sight with another ensemble of pulses (Fig. 1).
Fig. 1. Illustration of pulse sequence of a typical Doppler ultrasound system
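As a toy illustration of this data layout (not from the paper), the following sketch indexes a fast-time × slow-time RF array for one line of sight; all numeric values are arbitrary, and `rf` would normally come from the scanner.

    import numpy as np

    fs = 40e6          # fast-time sampling rate (Hz), ~10x a 4 MHz carrier
    prf = 5e3          # pulse repetition frequency (Hz)
    L = 8              # ensemble size: pulses per line of sight
    n_fast = 2048      # fast-time samples per voltage trace

    rf = np.zeros((n_fast, L))                 # one line of sight; columns are slow time
    depth_sample = 512                         # a fixed depth (fast-time index)
    slow_time_signal = rf[depth_sample, :]     # L slow-time samples at that depth
    t_slow = np.arange(L) / prf                # slow-time instants t_k = (k - 1) * PRI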
To investigate estimation and detection problems, we assume that echo signals due to Doppler pulse sequences may be modeled as a shift-invariant linear system. While ultrasound systems have spatially varying beam properties, many modern phased array systems use special techniques such as dynamic focusing to obtain a reasonably uniform resolution cell throughout much of the image. We assume that the ultrasound system can take a sequence of ‘snapshot images’ of the object, and that the object can vary over time.
r(x, tk) = h(x) ∗x [zb(x, tk) + zc(x, tk)] + n(x, tk)    (1)
where r is the measured voltage trace due to the echo signal and n is a signal-independent noise process. The shift-invariant function h(x) is the system pulse-echo impulse response, which defines the spatial resolution and represents acoustic transmission, scattering from a point source, and reception. The object function consists of two components: blood and tissue clutter acoustic impedances, represented as zb and zc respectively, defined continuously over times t and discretely over spatial locations x = [x1 x2 … xM]t ∈ Ωx of measure X, where each sub-vector xk defines a spatial coordinate point. The operation of convolution with h may alternatively be represented in operator notation as H. Hence (1) may be written as r = H zb + H zc + n, where r is an RF data vector for the experiment. The signal model is illustrated graphically in Fig. 2.
Spatial Fourier Domain. It will be convenient to study the problem in the spatial Fourier Domain:
R(u, tk) = H(u)[Zb(u, tk) + Zc(u, tk)] + N(u, tk) .    (2)
One advantage of this form is that for large image sizes, a shift-invariant system and WSS object functions, the Fourier transform is a Karhunen-Loeve transformation that decouples signals in different spatial frequency channels uk. We will show later that the blood may be treated as a non-stationary process in special cases. R will often be represented in the vector form
R = [R(u1, t1), ..., R(u1, tL), ..., R(uM, t1), ..., R(uM, tL)]t .
Fig. 2. Illustration of the linear systems model (1) neglecting clutter.
Assumptions. Our model makes a number of assumptions besides linear shift-invariance. We assume that spatial sampling is done sufficiently densely that the discrete object function over space is a reasonable model. We assume that echo signals have negligible interference from echoes due to previous pulses. This is reasonable because round-trip signal attenuation in tissue is typically sufficiently low compared to the signals of interest and compared to the dynamic range of the hardware. We also assume that the object moves negligibly during the signal acquisition time due to one pulse within an ensemble. Thus, we only consider translational velocities that are much less than the speed of sound c0. The examples in this paper use a 1-D model for simplicity. In this case the coordinate xk is simply a scalar position xk. The disadvantage of the 1-D model is that we must neglect the three-dimensional nature of the beam. Despite our focus on 1-D models, we will nevertheless keep the discussion general, and assume that xk may be a vector coordinate to allow for fast acquisition schemes and future applications. A more complete linear systems model that allows for the 3-D beam properties, shift variance, and more complex motion within an image acquisition sequence is discussed in [6]. The purpose of the simple models used in this paper is to gain intuition and analytically tractable results. We assume that the background clutter represented by the object function zc is a wide-sense stationary (WSS) stochastic process over space and over slow time, as is the noise n. In practice, objects heterogeneous on a scale larger than the transmitted pulse volume are partitioned into locally wide-sense stationary regions. Tumor vessels are small compared with the pulse volume. We also assume that the tissue motion is WSS on the time scale of the measurement, and hence that tissue motion can be characterized by autocorrelations or power spectra. This requires that tissue motion patterns remain approximately constant over the interrogation time L × PRI, and that accelerations are minimal. This is a reasonable assumption even for arterial flows with a 1 Hz cardiac frequency when PRF > 1 kHz and the ensemble size is L < 10. Finally, we assume that the blood, clutter, and noise processes are all statistically independent of each other.
2.2 Cross-Spectral Density Matrix

If X is the support region volume of the image, the cross-spectral density matrix Q is defined in a way similar to covariance matrices. Here we wish to consider the cross-spectral density due to clutter and noise:
Qc+n = X² ⟨Rc+n Rc+n†⟩ ,    (3)
where the elements of Rc+n are given as Rc+n(u, tk) = H(u) Zc(u, tk) + N(u, tk). For WSS clutter and noise processes, the cross-spectral density matrix is block diagonal:
Qc+n = [ Qc+n(u1)                       0 ]
       [           Qc+n(u2)               ]
       [                     ⋱            ]
       [ 0                      Qc+n(uM)  ]    (4)

where Qc+n(uk) is an L×L matrix with elements given by

Qc+n(tμ, tν | uk) = |H(uk)|² X² ⟨Zc(uk, tμ) Zc*(uk, tν)⟩ + Sn δ(tμ − tν) .    (5)
Here we have assumed that electronic noise can be modeled as a white Gaussian noise process, and that Sn is the (flat) noise power spectrum.
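As an illustrative sketch (not the authors' code), one L × L block of (5) can be assembled for WSS clutter from a slow-time autocorrelation model. The Gaussian decay ρ and all scalar values are assumptions of the sketch, and the factor X²⟨Zc Zc*⟩ is folded into the scalar Sc.

    import numpy as np

    def clutter_noise_block(L, prf, H2, Sc, Sn, tau_c=np.inf):
        """L x L block Qc+n(u_k) of eq. (5) for WSS clutter; tau_c = inf means
        clutter that stays correlated over the whole pulse train."""
        t = np.arange(L) / prf
        dt = t[:, None] - t[None, :]                       # t_mu - t_nu
        rho = np.exp(-(dt / tau_c) ** 2) if np.isfinite(tau_c) else np.ones_like(dt)
        return H2 * Sc * rho + Sn * np.eye(L)              # Toeplitz clutter + white noise

    Q_block = clutter_noise_block(L=8, prf=5e3, H2=1.0, Sc=2.0, Sn=0.1)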
3 Detection Performance

Equipped with the linear systems model and the cross-spectral density structures, we now call on the tools of statistical decision theory to address the task: detecting blood that is perfusing through tissue (the positive '+' hypothesis) versus a region with no blood or negligibly weak blood (the null '−' hypothesis). This task could be important for distinguishing perfused regions of a tumor from necrotic regions. The degree of perfusion heterogeneity may be correlated with metastatic potential. It is also one step towards the more complex task of discriminating two types of flow patterns. We also assume that the background clutter and the noise are random processes but that blood can be modeled as one deterministic realization of a random process. A similar model has been used for static target discrimination in ultrasound [7]. Likelihood models for the spatial frequency domain data are normally distributed under the assumptions of [8]:
pdf(R | +) = X^(2LM) / [(2π)^(LM) det(Qc+n)] · exp[ −(X²/2) (R − R̄+)† Qc+n⁻¹ (R − R̄+) ]    (6)
pdf(R | −) = X^(2LM) / [(2π)^(LM) det(Qc+n)] · exp[ −(X²/2) (R − R̄−)† Qc+n⁻¹ (R − R̄−) ]    (7)
Here the mean values for our task are:
R̄+(uk, tl) = ⟨R+(uk, tl)⟩n = B(uk, tl) ≡ H(uk) Zb(uk, tl)   and   R̄−(uk, tl) = 0 .

The ideal observer test statistic obtained from the log-likelihood ratio is thus:
λ = ΔR̄† Qc+n⁻¹ R ,    (8)
where ΔR̄ = R̄+ − R̄− = B. This test statistic is linear in the data and, for the situation at hand, is equivalent to a Hotelling observer [9]. The strategy is to perform matched filtering of the time-evolving signal with a pre-whitening step. Pre-whitening over the slow-time spectral domain means weighting the Doppler Fourier amplitudes R by the factor ΔR̄† Qc+n⁻¹, which emphasizes the coefficients least influenced by clutter and noise. Performance is predicted by
SNR_I² = ΔR̄† Qc+n⁻¹ ΔR̄ = B† Qc+n⁻¹ B .    (9)
Using the approach of [7], we average over random realizations of the blood to obtain
SNR_I² = tr{ ⟨BB†⟩ Qc+n⁻¹ } = tr{ Qb Qc+n⁻¹ } ,    (10)

where Qb = ⟨BB†⟩. When Qb and Qc+n⁻¹ are simultaneously diagonalizable, by
the same Karhunen-Loeve eigendecomposition, the SNR expression reduces to
SNR_I² = tr{ Λb [Λc + Λn]⁻¹ } ,

where the Λ's are diagonal eigenvalue matrices
corresponding to the blood, clutter and noise cross-spectral density matrices respectively.

3.1 Large Ensemble Size

When L is large, and adequate slow-time resolution exists, the Karhunen-Loeve transformation reduces to a temporal DFT. In this situation aliasing is largely avoided. The SNR can then be written as
SNR_GNEQ² ≡ ⟨SNR_I²⟩b = Σ(k=1..N) ∫ du ⟨|ΔZb(u, fk)|²⟩b |H(u)|² / [ |H(u)|² Sc(u, fk) + Sn(u, fk) ] .    (11)
This is a generalization of the Wagner-Brown theory of detectability [10] applied to Doppler ultrasound systems. Sc and Sn are the clutter and noise power spectra respectively. The power spectra are in the spatial frequency (u), temporal-frequency (f) domain. The object contrast or “task” is defined as
S_ΔB(u, fk) = ⟨|ΔZb(u, fk)|²⟩b / X    (12)
and represents the power spectrum of the difference between the blood object functions of the hypothesis (+) and hypothesis (−) signals. This is not the same thing as the difference of power spectral densities. SNR_GNEQ depends on the differential blood object power spectrum to make a decision about flow being present or absent (normal or abnormal). The quantity that weights the blood object power spectrum in the integration is like a generalized version of the Noise Equivalent Quanta (NEQ), a measure commonly used to characterize system performance in photon imaging modalities. In our situation, the generalized noise equivalent quanta (GNEQ) is defined as
GNEQ(u, fk) = |H(u)|² / [ |H(u)|² Sc(u, fk) + Sn(u, fk) ]    (13)
and characterizes the system, noise and clutter in a blood-flow-independent way. NEQ for photon imaging systems represents the fraction of photons that contribute information. For ultrasound, GNEQ represents the fraction of speckle energy that carries information. In photon imaging systems, the ideal observer detection signal-to-noise ratio is an integration of NEQ weighted by the Fourier magnitude of the target signal. Acoustic physics dictates that the mean of the object is not the important feature for detection using ultrasound, but rather the variance or spatial fluctuations in the object function. Importantly, GNEQ can be measured for a particular system, tissue type, and tissue motion pattern, since all of its components are measurable quantities. Curiously, H in this framework depends only on spatial frequencies and not on temporal frequencies: this is because all motion dependence is due to the object and not to the system.

3.2 Limited Ensemble Size

In general, simultaneous diagonalization of Qb and Qc+n⁻¹ is not possible. For small
ensemble sizes, and wide-sense stationary statistics, the cross-spectral density matrices are Toeplitz, but not well approximated by block-circulant matrices, and hence are not diagonalizable by a DFT. Since most Power and Color ultrasound systems use only 4-20 pulses, consideration of the limited ensemble size scenario is of great practical importance. We consider here specific analytically tractable models that may nevertheless be of considerable interest for tumor flow and perfusion imaging.
Clutter Model. Tissue motion is often minimal for tumor imaging. If we assume the clutter does not move, and remains correlated over the entire pulse-train interrogation, then it can be shown that the clutter cross-spectral density has a block-diagonal form, with block-diagonal elements given as

$Q_{c+n}(t_\mu, t_\nu \mid u_k) = |H(u_k)|^2 S_c(u_k) + S_n\, \delta(t_\mu - t_\nu)$.  (14)
$Q_{c+n}(u_k)$ is the sum of a uniform matrix (a matrix with constant values everywhere) and a diagonal matrix. It has an analytically tractable inverse

$Q_{c+n}^{-1}(u_k) = \dfrac{1}{L|H(u_k)|^2 S_c(u_k) + S_n} \begin{pmatrix} p & q & \cdots & q \\ q & p & & \vdots \\ \vdots & & \ddots & q \\ q & \cdots & q & p \end{pmatrix}$,  (15)

where $p = \dfrac{|H(u_k)|^2 S_c(u_k)}{S_n}[L-1] + 1$ and $q = -\dfrac{|H(u_k)|^2 S_c(u_k)}{S_n}$.
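As a quick numerical check (not part of the paper), the closed-form inverse in (15) can be compared against a direct matrix inversion; the parameter values below are illustrative.

```python
import numpy as np

# Numerical check of the closed-form inverse (15): for one spatial
# frequency u_k, Q_{c+n} = |H|^2 S_c J + S_n I, with J the LxL all-ones
# matrix. Values are illustrative, not from the paper.
L = 8
H2Sc = 2.0          # |H(u_k)|^2 S_c(u_k)
Sn = 0.5            # noise power

Q = H2Sc * np.ones((L, L)) + Sn * np.eye(L)

p = (H2Sc / Sn) * (L - 1) + 1.0
q = -H2Sc / Sn
M = q * np.ones((L, L)) + (p - q) * np.eye(L)   # p on diagonal, q off-diagonal
Q_inv_closed = M / (L * H2Sc + Sn)

assert np.allclose(Q_inv_closed, np.linalg.inv(Q))
```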
Block-diagonal matrices are small (only $L \times L$), so computational approaches to inversion would also be an attractive future direction.

Blood Model. As a first step toward understanding more complicated flow models, we consider an object model for blood that is a random process moving with translational velocity $v_b$:
$z_b(x, t) = z_b(x - v_b t)$.  (16)
In the spatial-frequency domain, this can be written as:
$Z_b(u, t) = e^{-i 2\pi t\, u \cdot v_b}\, Z_b(u)$.  (17)
If blood is a non-stationary process then the cross-spectral density matrix may not be of block-diagonal form because of spatial frequency correlations. However, only the block-diagonal elements of the cross-spectral density matrix are important for the trace (10). The block-diagonal elements are given as:
$Q_b(t_\mu, t_\nu \mid u_k) = X^2 \langle B(u_k, t_\mu)\, B^{*}(u_k, t_\nu) \rangle = |H(u_k)|^2 S_{Z_b}(u_k)\, e^{-i 2\pi (t_\mu - t_\nu)\, u_k \cdot v_b}$.  (18)
If the blood object function can be modeled as a wide-sense stationary white Gaussian noise process $\sim N(0, \sigma^2 I)$, windowed by a window $w(x)$, it can be shown that

$S_{Z_b}(u_k) = \sigma^2 \int dx\, |w(x)|^2 = \sigma^2 A$,  (19)

where $A$ is the area of the perfused region.
Another important aspect of the blood model is decorrelation over time. Microvascular networks can have tortuous and multidirectional flow patterns within an ultrasonic sample volume that make velocity estimation difficult. One way of modeling this complex situation is to assume that the collection of blood scatterers decorrelates over time, and that the decorrelation rate is proportional to the flow rate, and hence to metabolic demand. One model for temporal decorrelation is a complex exponential decay. As a phenomenological addendum to (18), we hence postulate the cross-spectral density model
$Q_b(t_\mu, t_\nu \mid u_k) = |H(u_k)|^2 S_{Z_b}(u_k)\, e^{-i 2\pi (t_\mu - t_\nu)\, u_k \cdot v_b}\, e^{-\alpha |t_\mu - t_\nu|}$,  (20)
where $\alpha$ is the decorrelation parameter. Gaussian decorrelation models and other diffusion-based models are also possible. Signal decorrelation may be due to decorrelation of the object itself or to scattering regions moving out of the ultrasonic sample volume. In the latter case the decorrelation parameter may depend on the velocity and beam properties.

SNR. Evaluating the trace (10), the ideal observer performance metric for stationary clutter and moving, decorrelating blood can be shown to be
$\mathrm{SNR}_I^2 = \int du\, \frac{|H(u)|^2 S_{Z_b}(u)}{L |H(u)|^2 S_c(u) + S_n} \left[ \frac{S_c(u)}{S_n}\, |H(u)|^2\, \Gamma(u) + L \right]$,  (21)
where

$\Gamma(u) = L(L-1) - 2\,\mathrm{Re} \sum_{m=1}^{L} (L - m)\, a^m$,  (22)

and $a = e^{-\alpha \Delta t}\, e^{-i 2\pi \Delta t\, u \cdot v_b}$. Here $\Delta t$ is the pulse repetition interval (PRI). The series can be evaluated as the sum of a geometric series and the derivative of a geometric series:
$\sum_{m=1}^{L} (L - m)\, a^m = L\, \dfrac{a - a^{L+1}}{1 - a} - a\, \dfrac{L a^{L+1} - (L+1) a^{L} + 1}{(1 - a)^2}$.  (23)
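The identity (23) is easy to verify numerically; the following sketch compares the direct sum with the closed form, with an illustrative value for the composite ratio $a$.

```python
import numpy as np

# Check of the geometric-series identity (23) for a complex ratio a
# (|a| < 1); a stands for exp(-alpha*dt) * exp(-i*2*pi*dt*u.v_b).
L = 8
a = 0.9 * np.exp(1j * 0.3)

direct = sum((L - m) * a**m for m in range(1, L + 1))
closed = L * (a - a**(L + 1)) / (1 - a) \
         - a * (L * a**(L + 1) - (L + 1) * a**L + 1) / (1 - a)**2

assert np.isclose(direct, closed)
```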
When there is no decorrelation, and zero blood velocity, $\Gamma(u)$ vanishes, leaving

$\mathrm{SNR}_I^2 = \int du\, \frac{|H(u)|^2 S_{Z_b}(u)}{|H(u)|^2 S_c(u) + S_n / L}$,  (24)
which is similar to expressions obtained previously, except that the noise power is lowered by a factor of L. This makes sense: the task is now simply to detect additive, unmoving blood against a random background, and there are L images averaged. Hence, the noise power should decrease as more images are averaged. For L = 1, the expression corresponds precisely to the lesion-detection SNR of Ref. [7], Eq. 23.
Clearly then, all the information about motion is contained in the function $\Gamma$, which depends on the PRI, decorrelation parameter, velocity, ensemble size, and spatial frequency vector. Aside from the PRI and the ensemble size, $\Gamma$ is independent of the system, and represents the information about blood flow and perfusion. The $\mathrm{SNR}_I^2$ of (21) can thus be interpreted as the sum of a time-dependent term and a time-independent term. The time-dependent term gives all the information about motion and decorrelation. The time-independent term describes static detection.
4 Results

To gain more intuition regarding the role of motion, we computationally evaluated (21) with various system and motion parameters for a 1-D system where the system psf is modeled as a Gaussian-weighted sinusoid; a sketch of this computation is given below. We modeled coherent motion as being along the direction of pulse propagation. Results are illustrated in Fig. 3. The system bandwidth was varied while maintaining the total pulse energy constant. Fig. 3(a) shows that bandwidth is relatively unimportant as long as the same amount of total energy is employed. Fig. 3(b) shows that detectability is lowest when there is no motion. The ideal observer uses motion information to help the detection process. There are repeating nulls every 250 mm/s. This is the point where $\mathrm{PRI} \times (2 f_o / c) \times v_b = 1$. Note that $2 f_o / c$ is the spatial frequency at which there is greatest system sensitivity. As velocity is increased, blood power shifts away from the clutter power until it is aliased back into the clutter spectrum, giving the nulls at multiples of 250 mm/s. As expected, the detectability improves with L, and Fig. 3(c) demonstrates that this relationship is nearly linear. As the PRI is increased, SNR is also improved (results not shown). Of course, transient flow conditions may prevent one from using long PRIs or large ensemble sizes if temporal resolution is desired. One interesting point is that decorrelation of blood is actually helpful to signal detection (Fig. 3(d)). The ideal observer is most interested in how the signal changes over time; hence, the more change in the observation period the better. Thus decorrelation and velocity shift spectral energy of blood away from the DC region where clutter obscures detection. Velocity shifts can, however, be aliased back over the clutter region, once again hampering the task. Large velocities are necessary to induce substantial aliasing, and these speeds are not physiological in tumors. Hence, if decorrelation is well sampled, the large-ensemble limit of Sect. 3.1 may be a reasonable framework for studying tumor perfusion.
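A minimal sketch of this 1-D evaluation follows, assuming a Gaussian spectral model for $|H(u)|^2$ and flat clutter and blood spectra scaled to the clutter-to-noise and blood-to-noise ratios quoted in the caption of Fig. 3; these spectral forms are illustrative assumptions rather than the exact models used for the figure.

```python
import numpy as np

# Sketch: evaluate the ideal-observer SNR of (21)-(23) for a 1-D system.
# Gaussian |H(u)|^2 and flat S_c, S_Zb are illustrative assumptions;
# parameter values follow the caption of Fig. 3.
fo, c = 3e6, 1500.0          # center frequency [Hz], sound speed [m/s]
bw = 0.80                    # fractional bandwidth
L, prf = 8, 1e3              # ensemble size, pulse repetition frequency [Hz]
dt = 1.0 / prf               # PRI [s]
vb, alpha = 5e-3, prf        # blood velocity [m/s], decorrelation rate [1/s]
Sn = 1.0
Sc = 38.8 * Sn               # clutter-to-noise ratio
SZb = 1.94 * Sn              # blood-to-noise ratio

u0 = 2 * fo / c              # spatial frequency of peak sensitivity [1/m]
u = np.linspace(0.2 * u0, 1.8 * u0, 2048)
H2 = np.exp(-0.5 * ((u - u0) / (bw * u0 / 2.355))**2)   # |H(u)|^2, FWHM = bw*u0

# Gamma(u) via the closed form (23), with a = exp(-alpha dt - i 2 pi dt u v_b)
a = np.exp(-alpha * dt) * np.exp(-1j * 2 * np.pi * dt * u * vb)
series = L * (a - a**(L + 1)) / (1 - a) \
         - a * (L * a**(L + 1) - (L + 1) * a**L + 1) / (1 - a)**2
gamma = L * (L - 1) - 2 * np.real(series)

integrand = H2 * SZb / (L * H2 * Sc + Sn) * ((Sc / Sn) * H2 * gamma + L)
snr2 = np.trapz(integrand, u)
print(f"SNR_I^2 = {snr2:.3g}")
```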
5 Conclusions

An ideal observer model for detecting perfusing blood in a tissue background has been developed. When decorrelation is adequately sampled (large enough ensemble size), and when velocities are low, the ideal observer signal-to-noise ratio depends on a quantity similar to the Noise Equivalent Quanta (NEQ) used in photon imaging modalities. Coherent motion and temporal decorrelation of blood were modeled in a simple 1-D ultrasound system. The system detection performance for a simple model
was shown to be the sum of motion-dependent and motion-independent terms. This model included the effects of aliasing. Decorrelation and translational velocity of blood were shown to be advantageous to the task. The ideal observer model assumed that velocities and decorrelation rates were known a priori. Task performance may change when the velocity is unknown. The theoretical framework here is a starting point for future work in the area of performance assessment of Power Doppler systems.
Fig. 3. Detection performance SNR as a function of (a) bandwidth, maintaining total pulse energy constant, (b) velocity, (c) ensemble length, and (d) decorrelation parameter. Parameters were fo = 3 MHz, BW = 80%, c = 1500 m/s, L = 8 pulses, PRF = 1 kHz, v = 5 mm/s, and α = 1/PRI unless otherwise specified. The clutter-to-noise and blood-to-noise ratios were 38.8 and 1.94 respectively.
References

1. J. A. Jensen, Estimation of Blood Velocities Using Ultrasound: A Signal Processing Approach. New York: Cambridge Univ. Press, 1996.
2. K. W. Ferrara, C. W. Merritt, P. N. Burns, F. S. Foster, R. F. Mattrey, S. A. Wickline, "Evaluation of Tumor Angiogenesis with US: Imaging, Doppler, and Contrast Agents," Acad. Radiol., Vol. 7, pp. 824–839, 2000.
3. C. Kasai, K. Namekawa, A. Koyano, R. Omoto, "Real-time two-dimensional blood flow imaging using an autocorrelation technique," IEEE Trans. Sonics Ultrason., SU-32, pp. 458–464, 1985.
4. G. E. Trahey, J. W. Allison, and O. T. von Ramm, "Angle independent ultrasonic detection of blood flow," IEEE Trans. Biomed. Eng., Vol. 34, pp. 965–967, 1987.
5. J. M. Rubin, R. O. Bude, P. L. Carson, R. L. Bree, R. S. Alder, "Power Doppler US: A Potentially Useful Alternative to Mean-Frequency-based Color Doppler US," Radiology, Vol. 190, pp. 853–856, 1994.
6. R. J. Zemp, C. K. Abbey, M. F. Insana, "Linear system models of ultrasound imaging: Application to signal statistics," IEEE Trans. Ultrason., Ferroelect., Freq. Contr., to be published.
7. R. J. Zemp, C. K. Abbey, M. F. Insana, "Generalized NEQ for assessment of ultrasound image quality," Proc. SPIE Medical Imaging, San Diego, CA, 2003.
8. R. F. Wagner, M. F. Insana, and D. G. Brown, "Statistical properties of radio-frequency and envelope-detected signals with applications to medical ultrasound," J. Opt. Soc. Am. A, Vol. 4, pp. 910–922, 1987.
9. H. H. Barrett, "Objective assessment of image quality: effects of quantum noise and object variability," J. Opt. Soc. Am. A, Vol. 7, pp. 1266–1278, 1990.
10. R. F. Wagner and D. G. Brown, "Unified SNR analysis of medical imaging systems," Phys. Med. Biol., Vol. 30, pp. 489–518, 1985.
Permutation Tests for Classification: Towards Statistical Significance in Image-Based Studies

Polina Golland1 and Bruce Fischl2

1 Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA. [email protected]
2 Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Harvard Medical School, Boston, MA. [email protected]

Abstract. Estimating statistical significance of detected differences between two groups of medical scans is a challenging problem due to the high dimensionality of the data and the relatively small number of training examples. In this paper, we demonstrate a non-parametric technique for estimation of statistical significance in the context of discriminative analysis (i.e., training a classifier function to label new examples into one of two groups). Our approach adopts permutation tests, first developed in classical statistics for hypothesis testing, to estimate how likely we are to obtain the observed classification performance, as measured by testing on a hold-out set or cross-validation, by chance. We demonstrate the method on examples of both structural and functional neuroimaging studies.
1 Introduction
Image-based statistical studies typically compare two or more sets of images with a goal of identifying differences between the populations represented by the subjects in the study. For example, the shape of subcortical structures or the cortical folding patterns in the scans of schizophrenia patients would be compared to a set of brain images of matched normal controls to test a particular hypothesis on the effects of the disease on the brain. The analysis starts with a feature extraction step that creates a numerical representation of the example images in a form of feature vectors, followed by statistical analysis that generates a description of differences between the two groups and an estimate on how well the detected differences will generalize to the entire population. The high dimensionality of the feature space and the limited number of examples place this problem outside the realm of classical statistics. The solutions used today in most studies simplify the analysis either by reducing the description of the anatomy to a single measurement, such as the volume of the hippocampus, or by considering every feature separately, for example, comparing cortical thickness at every location on the cortical surface independently of its neighbors. The former sacrifices localization power, while the latter ignores potential dependencies among the features, making integration of the individual statistical tests
very difficult. An alternative approach demonstrated recently in several studies of neuroanatomy and function is to train a classifier function for labeling new examples [2,9,10,13,14]. It is based on a presumption that if a classifier function can label new examples with better than random accuracy, the two populations are indeed different, and the classifier implicitly captures the differences between them. The training algorithm does not have to assume independence among features and therefore can discriminate between the two groups based on the entire ensemble of highly localized features.1 This approach to statistical analysis uses the classifier performance as a measure of robustness of the detected differences, or of dissimilarity of the populations in question. We can estimate the expected accuracy of the classifier by testing it on a separate hold-out set, as the average test error is an unbiased estimator of the expected performance, but the small size of the test set is typically insufficient for the variance of this estimator to be low enough to lead to a useful bound on how close we are to the true expected error [12]. Furthermore, the total number of available examples might be so small that we have to resort to cross-validation, such as bootstrap or jackknife procedures, in order to estimate the expected accuracy [4]. Unfortunately, the cross-validation trials are not independent and therefore do not allow variance estimation at all without extensive modeling of the dependence of errors in the cross-validation trials.2 Thus, neither testing nor cross-validation provides satisfactory estimates of how close the observed test error is to the true expected error of the trained classifier. Furthermore, the high dimensionality of the data often renders many theoretical bounds based on the complexity of the data or that of the classifier useless. In this paper, we demonstrate how permutation tests [11] can provide a weaker guarantee, namely that of statistical significance. This approach effectively reformulates the question on the differences between the populations, as measured by the classifier performance, in the traditionally used framework of hypothesis testing. Intuitively, it provides a guarantee on how likely we were to obtain the observed test accuracy by chance, only because the training algorithm identified some pattern in the high-dimensional data that happened to correlate with the membership labels as an artifact of a small data set size. Permutation tests were originally developed in statistics for testing whether the observed differences between two data sets, as measured by a particular statistic, are likely to occur under the null hypothesis that assumes that the two distributions that generated the data are identical. Since the test does not assume a generative model for the data to derive the distribution of the statistic under the null hypothesis, but rather estimates it empirically, it is applicable to a wide range of problems. We apply permutation tests to the classification setting by using the
2
1 Examples of training algorithms used in this domain include Fisher Linear Discriminant, its generalization to two arbitrary Gaussian distributions, Support Vector Machines, and others.
2 While the mean of the set of cross-validation error measurements is an unbiased estimate of the expected error, the variance of the set is an overly optimistic estimate of the unknown variance of the cross-validation estimator for most training algorithms.
estimated classifier accuracy as a statistic that measures how different the two classes are. The null hypothesis is that the selected family of classifiers cannot learn to predict labels based on the given training set. The test estimates the statistical significance of the classifier by estimating the probability of obtaining the observed classification performance under the null hypothesis. In the next section, we provide the necessary background on hypothesis testing and permutation tests. In Section 3, we explain how to estimate statistical significance of a classifier function, followed by several experimental examples in Section 4 and discussion and conclusions. Before we proceed, it is worth mentioning that permutation tests have been used previously in neuroimaging studies to estimate statistical significance of individual voxels or clusters of voxels, both in anatomical studies [1,16] and for signal detection in fMRI [15]. We explain the differences between such use of permutation tests and the work presented in this paper in Section 3.1. We believe permutation testing is a little-known statistical tool that many other researchers in this community will find useful in their work.
2 Background. Hypothesis Testing
In two-class comparison hypothesis testing, the differences between two data distributions are measured using a data set statistic, which is a function $T: (\mathbb{R}^n \times \{-1,1\})^* \to \mathbb{R}$, such that for a given data set $\{(x_k, y_k)\}_{k=1}^{l}$, where $x_k \in \mathbb{R}^n$ are observations and $y_k \in \{-1,1\}$ are the corresponding class labels, $T(x_1, y_1, \ldots, x_l, y_l)$ is a measure of how similar the subsets $\{x_k \mid y_k = 1\}$ and $\{x_k \mid y_k = -1\}$ are. The null hypothesis typically assumes that the two classes have identical data distributions, $\forall x: p(x \mid y = 1) = p(x \mid y = -1)$. The goal of the hypothesis testing is to reject the null hypothesis at a certain level of significance $\alpha$, which defines the maximal acceptable probability of a false positive (declaring that the classes are different when the null hypothesis is true). For any value of the statistic, the corresponding p-value is the highest level of significance at which the null hypothesis can still be rejected. In classical statistics, the data are often assumed to be one-dimensional ($n = 1$), but any multi-variate statistic $T$ can be used in this fairly general framework. For example, in the two-sample t-test, the data in the two classes are assumed to be generated by one-dimensional Gaussian distributions of equal variance. The null hypothesis furthermore assumes that the distributions have the same mean. The probability density of the so-called t-statistic, equal to the difference between the sample means normalized by the standard error, under the null hypothesis is the well known Student distribution [17]. If the integral of the Student distribution over the values higher than the observed t-statistic is smaller than the desired significance level $\alpha$, we reject the null hypothesis in favor of the alternative hypothesis that the means of the two distributions are different.
In order to perform hypothesis testing, we need to know the probability distribution of the selected statistic under the null hypothesis. Unfortunately, deriving a parametric distribution for a particular statistic requires making strong assumptions on the generative model of the data. Consequently, non-parametric techniques, such as permutation tests, can be of great value if the distribution of the data is unknown.
2.1 Permutation Tests
Suppose we have chosen an appropriate statistic $T$ and the acceptable significance level $\alpha$. Let $\{(x_k, y_k)\}_{k=1}^{l}$ be the set of examples, and $Z_l$ be a set of all permutations of indices $1, \ldots, l$. The permutation test procedure that consists of $M$ iterations is defined as follows:
– Repeat $M$ times (with index $m = 1$ to $M$):
  • sample a permutation $z^m$ from a uniform distribution over $Z_l$,
  • compute the statistic value $t_m = T(x_1, y_{z_1^m}, \ldots, x_l, y_{z_l^m})$.
– Construct an empirical cumulative distribution
  $\hat{P}(T \le t) = \frac{1}{M} \sum_{m=1}^{M} \Theta(t - t_m)$,
where $\Theta$ is a step-function ($\Theta(x) = 1$ if $x \ge 0$; $0$ otherwise).
– Compute the statistic value for the actual labels, $t_0 = T(x_1, y_1, \ldots, x_l, y_l)$, and its corresponding p-value $p_0$ under the empirical distribution $\hat{P}$.
– If $p_0 \le \alpha$, reject the null hypothesis.
The procedure computes an empirical estimate of the cumulative distribution of the statistic $T$ under the null hypothesis and uses it for hypothesis testing. Since the null hypothesis assumes that the two classes are indistinguishable with respect to the selected statistic, all the training data sets generated through permutations are equally likely to be observed under the null hypothesis, yielding the estimates of the statistic for the empirical distribution. An equivalent result is obtained if we choose to permute the data, rather than the labels, in the procedure above. Ideally, we would like to use the entire set of permutations $Z_l$ to construct the empirical distribution $\hat{P}$, but it might not be feasible for computational reasons. Instead, we resort to sampling from $Z_l$. It is therefore important to select the number of sampling iterations $M$ to be large enough to guarantee accurate estimation. One solution is to monitor the rate of change in the estimated distribution and stop when the changes are below an acceptable threshold. To better understand the difference between the parametric approach of the t-test and the permutation testing, observe that statistical significance does not provide an absolute measure of how well the observed differences between the sample groups will generalize, but it is rather contingent upon certain assumptions about the data distribution in each class $p(x|y)$ being true. The t-test
assumes that the distribution of data in each class is Gaussian, while the permutation test assumes that the data distribution is adequately represented by the sample data. Neither estimates how well the sample data describe the general population, which is one of the fundamental questions in statistical learning theory and is outside the scope of this paper.
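A minimal sketch of the permutation procedure of Sect. 2.1 follows, using a simple difference-of-means statistic on synthetic one-dimensional data; both the statistic and the data are illustrative choices, not from the paper.

```python
import numpy as np

# Permutation test with a difference-of-means statistic on synthetic data.
rng = np.random.default_rng(0)

def statistic(x, y):
    return abs(x[y == 1].mean() - x[y == -1].mean())

x = np.concatenate([rng.normal(0.0, 1.0, 30), rng.normal(0.8, 1.0, 30)])
y = np.concatenate([-np.ones(30), np.ones(30)])

t0 = statistic(x, y)
M = 10_000
t = np.array([statistic(x, rng.permutation(y)) for _ in range(M)])
p0 = (t >= t0).mean()          # empirical p-value for the observed labels
print(f"t0 = {t0:.3f}, p = {p0:.4f}")
```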
3 Statistical Significance in Classification
The permutation tests can be used to assess statistical significance of the classifier and its performance, using the test error as a statistic that measures dissimilarity between two populations. Depending on the amount of the available data, the test error can be estimated on a large hold-out set or using cross-validation in every iteration of the permutation procedure. The null hypothesis assumes that the relationship between the data and the labels cannot be learned reliably by the family of classifiers used in the training step. The alternative hypothesis is that we can train a classifier with small expected error. We use the permutations to estimate the empirical cumulative distribution of the classifier error under the null hypothesis. For any value of the estimated error $e$, the appropriate p-value is $\hat{P}(e)$ (i.e., the probability of observing classification error lower than $e$). We can reject the null hypothesis and declare that the classifier learned the (probabilistic) relationship between the data and the labels with a risk of being wrong with probability of at most $\hat{P}(e)$. To underscore the point made in the previous section, the test uses only the available data examples to evaluate the complexity of the classification problem, and is therefore valid only to the extent that the available data set represents the true distribution $p(x, y)$. Unlike standard convergence bounds, such as bounds based upon VC-dimension, the empirical probability distribution of the classification error under the null hypothesis says nothing about how well the estimated error rate will generalize. Thus permutation tests provide a weaker guarantee than the convergence bounds, but they can still be useful in testing if the observed classification results are likely to be obtained by random chance. Note that the estimated empirical distribution also depends on the classifier family used in the training step. It effectively estimates the expressive power of the classifier family with respect to the training data set. The variance of the empirical distribution $\hat{P}$ constructed by the permutation test is in inverse relationship with the difficulty of the classification problem. Consequently, the same accuracy value is more likely to be significant for more complex problems, because a classifier trained on the permuted data set is unlikely to perform well on the test set by chance.
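The following sketch illustrates this test with scikit-learn's linear SVM on synthetic data; the cross-validation scheme is a simplified, unstratified version of the jackknife procedure used in the paper, and all iteration counts are reduced for speed.

```python
import numpy as np
from sklearn.svm import SVC

# Cross-validated error of a linear SVM as the statistic; the null
# distribution is built by permuting the labels. Splits here are not
# stratified, unlike the balanced sets used in the paper.
rng = np.random.default_rng(0)

def cv_error(X, y, n_iter=100, n_test=10):
    errs = []
    for _ in range(n_iter):
        idx = rng.permutation(len(y))          # jackknife-style split
        tr, te = idx[:-n_test], idx[-n_test:]
        clf = SVC(kernel="linear").fit(X[tr], y[tr])
        errs.append(1.0 - clf.score(X[te], y[te]))
    return float(np.mean(errs))

X = np.vstack([rng.normal(0, 1, (30, 50)), rng.normal(0.3, 1, (30, 50))])
y = np.concatenate([-np.ones(30), np.ones(30)])

e0 = cv_error(X, y)
null = [cv_error(X, rng.permutation(y), n_iter=20) for _ in range(200)]
p0 = float(np.mean(np.array(null) <= e0))      # P-hat(e): error this low by chance
print(f"error = {e0:.3f}, p = {p0:.3f}")
```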
3.1 Related Work
As we mentioned in the introduction, permutation tests have been used before in statistical studies of neuroanatomy and function [1,15,16]. In such studies, the anatomical labels or the functional signal intensity in each voxel were tested
for significance and the permutation tests were used to replace the Student distribution with the non-parametric distribution that was constructed for each pixel separately from the images. Consequently, this approach addresses the concern that the value distribution at each voxel was not necessarily Gaussian under the null hypothesis. In addition, non-parametric tests were derived for identifying statistically significant spatially contiguous clusters of voxels that exceeded a particular threshold. Testing for clusters of voxels implicitly accounts for dependencies among the values in neighboring voxels and therefore represents an important advance towards global significance assessment. Our approach extends the non-parametric testing of statistical significance to the entire ensemble of the features extracted from the images by utilizing the classification framework to measure the predictive power of the feature set with respect to the given labels.
4 Example Applications
We illustrate the use of permutation testing in application to two different examples: a study of changes in the cortical thickness due to Alzheimer's disease and a comparison of brain activation, as measured by fMRI, in response to different visual stimuli. In all experiments reported in this section, we used linear Support Vector Machines [19] to train a classifier,3 and jackknifing (i.e., sampling without replacement) for cross-validation. The number of cross-validation iterations was 1,000, and the number of permutation iterations was 10,000. All data sets, both training and testing, were perfectly balanced, containing an equal number of examples from each class.
4.1 Cortical Thickness Study
In this study, we compare the thickness of the cortex in 50 patients diagnosed with dementia of the Alzheimer type (DAT) and 50 normal controls of matched age. The gray/white matter interface and the pial surface were automatically segmented from each MRI scan [3,7,8], followed by a registration step that brought the surfaces into correspondence by mapping them onto a unit sphere while minimizing distortions and then non-rigidly aligning the cortical folding patterns [5,6]. The cortical thickness was densely sampled at the corresponding locations for all subjects. And while visualization of the detected differences and understanding the physiological implications of the detected differences is of great interest and is the topic of our current research, we limit our report here to the statistical guarantees that are the focus of this paper. We first study the classification error with the goal of identifying a sufficient training set size for reliable detection of differences between the two groups.
3 A linear classifier function $f(x) = x \cdot w + b$, where $w$ is the normal to the separating hyperplane, also called a projection vector, and $b$ is the offset.
Fig. 1. Cross-validation error (left) and statistical significance (right) for the cortical thickness study for different training set size N and test set size K.
We gradually increase the training set size and estimate the test error and statistical significance, reported in Figure 1. Every point in the plots is characterized by the corresponding training set size N and test set size K. For each such pair, we ran 1,000 iterations of cross-validation, selecting randomly N + K examples from the original data set (N/2 + K/2 from each group), used N examples to train a classifier and the remaining K examples to test it. The graph on the left shows the average test error over 1,000 cross-validation iterations. By examining the plots, we conclude that at approximately N = 40, the performance saturates at 71% (e = 0.29). In addition, we ran 10,000 iterations of permutations. In each iteration, we randomly selected N + K examples, arbitrarily relabeled them, while maintaining an even distribution of labels, and performed 1,000 iterations of the cross-validation procedure (training on N examples and testing on the remaining K examples) on this newly labeled data set. The graph on the right shows p-values estimated for various training and test set sizes. It is not surprising that increasing the number of training examples improves the robustness of classification, as exhibited by both the accuracy and the significance estimates. Increasing the number of independent examples on which we test the classifier in each iteration does not significantly affect the estimated classification error, but substantially improves the statistical significance of the same error value, as can be seen in Figure 1. This is to be expected: as we increase the test set size, a classifier trained on a random labeling of the training data is less likely to maintain the same level of testing accuracy. Figure 2 illustrates this point for a particular training set size of N = 40. It shows the empirical distribution $\hat{P}(e)$ curves for the test set sizes K = 10, 20, 40. The right graph zooms in on the small framed area of the left graph. The filled circles represent classification performance on the true labels and the corresponding p-values. We note again that the three circles represent virtually the same accuracy, but substantially different p-values. For this training set size, if we set the significance threshold at α = 0.05, testing on K = 20 examples is sufficient
Fig. 2. Empirical distribution Pˆ (e) of the classifier error estimated using permutation tests in a cross-validation procedure (left); the small framed area of the graph is shown at higher resolution on the right. The size of the training set in all experiments is N = 40. Filled circles indicate the classifier performance on the true labels for the corresponding training and test set sizes (N = 40, K = 10: e = 0.30, Pˆ (e) = 0.17; N = 40, K = 20: e = 0.29, Pˆ (e) = 0.05; N = 40, K = 40: e = 0.29, Pˆ (e) = 0.007).
to establish significance. Testing on K = 40 independent examples leads to p < 0.007, achieving significance under a much stricter threshold of α = 0.01. To gain additional insight into these results, let's consider the classical situation of two-sided t-testing, with the data generated by two one-dimensional Gaussian densities. Our goal is to test the null hypothesis that postulates that the two distributions have identical means. There are two factors affecting statistical significance: the distance between the means and the amount of data we have to support it. We can achieve low p-values in a situation where the two classes are very far apart and we have a few data points from each group, or when they are much closer, but we have substantially more data. The p-value by itself does not indicate which of these two situations is true. Using discriminative analysis allows us to estimate how far apart the two classes are,4 in addition to establishing statistical significance of the detected differences.
4.2 Categorical Differences in fMRI Activation Patterns
This experiment compares the patterns of fMRI activations in response to different visual stimuli in a single subject. We present the results of comparing activations in response to face images to those induced by house images, as these categories are believed to have special representation in the cortex. The comparison used 15 example activations for each category (for details on data acquisition, see [18]). The fMRI scans were aligned to the structural MRI of the same subject using rigid registration.
4 In this particular study, the classification accuracy saturates at 71%, indicating there is a substantial overlap between the two classes.
Fig. 3. Cross-validation error (left) and statistical significance (right) for two different feature sets in the fMRI study.
For each scan, the cortical surface was extracted using the same techniques as in the study of cortical thickness, and the average activation values were sampled along the cortical surface. A surface-based representation models the connectivity of the cortex and is therefore well suited for studies of fMRI activation of cortical regions. First, we used all voxels in the fMRI scan to perform the comparison between the two categories. We then repeated the experiments with only the "visually active" region of the cortex. The mask for the visually active voxels was obtained using a separate visual task. The goal of using the mask was to test if removing irrelevant voxels from consideration improves the classification performance. When the training and the test set sizes were varied in this experiment, we observed similar trends in the test error and significance behavior to those discussed in the previous section. This study contains substantially fewer examples than the previous one, forcing us to work with smaller test sets. Here, we report the results for the test set size K = 6, as it was the smallest test set size for which we could demonstrate statistical significance, and it allows testing over a wider range of training set sizes. Figure 3 reports the accuracy and the p-value estimates for both feature sets. As expected, reducing the feature set to locations relevant to the visual pathways increases the classification accuracy and statistical significance of the detected differences. For example, for training size N = 20, using the entire cortex yields 93% accuracy (e = 0.07, p < 0.05), while using only visually active voxels improves the accuracy to 100% (e = 0, p < 0.02). In contrast to the cortical thickness study, we achieve statistical significance with substantially fewer training and testing examples in this experiment. The two classes in this study are so far apart in the feature space that the learning problem is much easier, requiring fewer examples to achieve robust detection. One way to visualize the detected differences is to display the projection vector w. Each original feature $x_i$ is assigned weight $w_i$ by the training algorithm. The magnitude of the weight is indicative of the predictive power of the corresponding feature. We can visualize w by displaying it in the same way we
Fig. 4. Weight maps for the face class (positive class) in comparison with the house class (negative class). The six views show (top-to-bottom) lateral, medial and inferior views of the left and the right hemispheres. The color is used to indicate the weight of each voxel, from light blue (negative) to yellow (positive).
visualize the original feature vectors $x_k$. Figure 4 and Figure 5 show the two feature sets in the fMRI study by painting the weights $w_i$ onto the inflated cortical surface. The grayscale pattern shows cortical folding, while the color is used to indicate the magnitude and the sign of the weights. The weights have been thresholded for visualization purposes, leaving only the areas of high weighting painted. If used on the "visual only" set of features, the method produces a map that is very similar to the corresponding subset of the one for the entire cortex, indicating robustness of the estimation. One of our current goals is to develop an automated feature selection mechanism capable of detecting relevant locations and reducing the feature set to include just those areas.
5 Conclusions
In this paper, we adapt the permutation tests for estimation of statistical significance of a classifier function. We demonstrate the tests on several examples of neuroimaging studies comparing two or more sets of anatomical or functional scans. The test is useful in experiments for which the standard convergence bounds fail to produce meaningful guarantees due to the high dimensionality of the input space and the extremely small sample size. We hope other researchers
Fig. 5. Weight maps for the face class (positive class) in comparison with the house class (negative class) using just the visually active area of the cortex (posterior). The boundary of the "visually active" mask is shown in yellow. The six views show (top-to-bottom) lateral, medial and inferior views of the left and the right hemispheres. The color is used to indicate the weight of each voxel, from light blue (negative) to yellow (positive).
in the community will find the technique useful in assessing statistical significance of observed results when the data are high dimensional and are not necessarily generated by a normal distribution. Acknowledgements. This research was supported in part by NSF IIS 9610249 grant, Athinoula A. Martinos Center for Biomedical Imaging collaborative research grant and NIH R01 RR16594-01A1 grant. The Human Brain Project/Neuroinformatics research is funded jointly by the NINDS, the NIMH and the NCI (R01-NS39581). Further support was provided by the NCRR (P41-RR14075 and R01-RR13609). The authors would like to acknowledge Dr. M. Spiridon and Dr. N. Kanwisher for providing the fMRI data, Dr. R. Buckner for providing the cortical thickness data and Dr. D. Greve for help with registration and feature extraction in the experiments discussed in this paper. Dr. Kanwisher would like to acknowledge EY 13455 and MH 59150 grants. Dr. Buckner would like to acknowledge the assistance of the Washington University ADRC, James S McDonnell Foundation, the Alzheimer’s Association, and NIA grants AG05682 and AG03991.
References

1. E.T. Bullmore, et al. Global, Voxel, and Cluster Tests, by Theory and Permutation, for a Difference Between Two Groups of Structural MR Images of the Brain. IEEE Transactions on Medical Imaging, 18(1):32–42, 1999.
2. J. G. Csernansky, et al. Hippocampal Morphometry in Schizophrenia by High Dimensional Brain Mapping. Proceedings of the National Academy of Science, 95(19):11406–11411, 1998.
3. A. M. Dale, et al. Cortical Surface-Based Analysis I: Segmentation and Surface Reconstruction. NeuroImage, 9:179–194, 1999.
4. B. Efron. The Jackknife, the Bootstrap, and Other Resampling Plans. SIAM, Philadelphia, PA, 1982.
5. B. Fischl, et al. Cortical Surface-Based Analysis II: Inflation, Flattening, a Surface-Based Coordinate System. NeuroImage, 9:195–207, 1999.
6. B. Fischl, et al. High-resolution intersubject averaging and a coordinate system for the cortical surface. Human Brain Mapping, 8:272–284, 1999.
7. B. Fischl, et al. Measuring the thickness of the human cerebral cortex from magnetic resonance images. PNAS, 26:11050–11055, 2000.
8. B. Fischl, A. Liu, A.M. Dale. Automated Manifold Surgery: Constructing Geometrically Accurate and Topologically Correct Models of the Human Cerebral Cortex. IEEE Transactions on Medical Imaging, 20(1):70–80, 2001.
9. G. Gerig, et al. Shape versus Size: Improved Understanding of the Morphology of Brain Structures. In Proc. MICCAI 2001, LNCS 2208, 24–32, 2001.
10. P. Golland, et al. Discriminative Analysis for Image-Based Studies. In Proc. MICCAI 2002, LNCS 2488:508–515, 2002.
11. P. Good. Permutation Tests: A Practical Guide to Resampling Methods for Testing Hypotheses. Springer-Verlag, 1994.
12. I. Guyon, et al. What Size Test Set Gives Good Error Rate Estimates? IEEE Trans. Pattern Analysis and Machine Intelligence, 20(1):52–64, 1998.
13. J. V. Haxby, et al. Distributed and Overlapping Representations of Faces and Objects in Ventral Temporal Cortex. Science, 293:2425–2430, 2001.
14. J. Martin, A. Pentland, and R. Kikinis. Shape Analysis of Brain Structures Using Physical and Experimental Models. In Proceedings of CVPR'94, 752–755, 1994.
15. T.E. Nichols and A.P. Holmes. Nonparametric Permutation Tests For Functional Neuroimaging: A Primer with Examples. Human Brain Mapping, 15:1–25, 2001.
16. P. M. Thompson, et al. Dynamics of Gray Matter Loss in Alzheimer's Disease. Journal of Neuroscience, 23(3), 2003.
17. L. Sachs. Applied Statistics: A Handbook of Techniques. Springer-Verlag, 1984.
18. M. Spiridon and N. Kanwisher. How distributed is visual category information in human occipito-temporal cortex? An fMRI study. Neuron, 35(6):1157–1165, 2002.
19. V. N. Vapnik. Statistical Learning Theory. John Wiley & Sons, 1998.
Ideal-Observer Performance under Signal and Background Uncertainty

S. Park1, M.A. Kupinski2,3, E. Clarkson1,2,3, and H.H. Barrett1,2,3

1 Program in Applied Mathematics, The University of Arizona at Tucson, [email protected]
2 Department of Radiology, The University of Arizona at Tucson
3 Optical Sciences Center, The University of Arizona at Tucson
Abstract. We use the performance of the Bayesian ideal observer as a figure of merit for hardware optimization because this observer makes optimal use of signal-detection information. Due to the high dimensionality of certain integrals that need to be evaluated, it is difficult to compute the ideal observer test statistic, the likelihood ratio, when background variability is taken into account. Methods have been developed in our laboratory for performing this computation for fixed signals in random backgrounds. In this work, we extend these computational methods to compute the likelihood ratio in the case where both the backgrounds and the signals are random with known statistical properties. We are able to write the likelihood ratio as an integral over possible backgrounds and signals, and we have developed Markov-chain Monte Carlo (MCMC) techniques to estimate these high-dimensional integrals. We can use these results to quantify the degradation of the ideal-observer performance when signal uncertainties are present in addition to the randomness of the backgrounds. For background uncertainty, we use lumpy backgrounds. We present the performance of the ideal observer under various signal-uncertainty paradigms with different parameters of simulated parallel-hole collimator imaging systems. We are interested in any change in the rankings between different imaging systems under signal and background uncertainty compared to the background-uncertainty case. We also compare psychophysical studies to the performance of the ideal observer.
1 Introduction
We take as fundamental the idea that image quality must be defined by the performance of an observer on a specified task. The tasks of interest in medical imaging can be categorized as detection tasks or estimation tasks. This work is focused on signal-detection tasks and the Bayesian ideal observer. The Bayesian ideal observer has all the statistical information about the data and makes optimal use of this information; it sets an absolute upper bound on task performance as measured by many common figures of merit derived from the ROC (receiver operating characteristic) curve. One such figure of merit is
the area under the curve (AUC). We use the AUC of the ideal observer as a measure of the quality of the image data produced by imaging hardware. Since the ideal observer requires full knowledge of the statistics of the image data, it is difficult to compute the ideal observer test statistic, the likelihood ratio. Instead, a great deal of work has been done on the performance of non-optimal observers in different kinds of backgrounds and signals. On the other hand, to make the ideal-observer computation tractable, unrealistic assumptions such as fixed backgrounds have been made in past work. Clarkson and Barrett [4] have developed mathematical methods to approximate the AUC of the ideal observer without having to assume such backgrounds. On the performance of the ideal observer in random backgrounds and fixed signals, Kupinski et al. [7] have developed computational methods to estimate the likelihood ratio using Markov chain Monte Carlo (MCMC) techniques. In this work, we extend these computational methods [7] to estimate the likelihood ratio in the case where both backgrounds and signals are random with known statistical properties. The computation has been done using MCMC techniques. We present the results of simulation studies in which we compare the ideal-observer performance under various signal-uncertainty paradigms for different parameters of simulated parallel-hole collimator imaging systems. We shall see the degradation of the ideal-observer performance and find out how the ranking changes between three different imaging systems under signal and background uncertainty compared to the fixed-signal cases. We also present psychophysical studies to compare the ideal observer and human observers under signal and background uncertainty.
2 Background

The imaging process can be represented mathematically by

$g = Hf + n$,  (1)
where $H$ is a continuous-to-discrete imaging operator which maps an object $f$ to an $M \times 1$ vector of image data $g$, where $f$ is a function of continuous variables, and $n$ is an $M \times 1$ vector of measurement noise. A linear continuous-to-discrete imaging operator $H$ can be mathematically represented by

$g_m = \int_S dr\, h_m(r)\, f(r) + n_m$,  (2)
where $r$ is a 2D spatial coordinate, $S$ is a field of view (FOV), $h_m$ is the $m$th sensitivity function of $H$, and $g_m$ and $n_m$ are elements of $g$ and $n$.

2.1 Problem Setting
The goal of a signal-detection task is to determine whether or not a signal, such as a tumor, is present. We would like to know how imaging systems perform on
signal-detection tasks, i.e., how well observers such as human observers, model observers, or the Bayesian ideal observer, can determine the presence of a tumor in images generated by imaging hardware. We consider the tumor to be a signal $f_s$ in a random background $f_b$, so imaging between two hypotheses can be represented mathematically by

$H_0: g = H f_b + n$,  (3)
$H_1: g = H(f_b + f_s) + n$.  (4)

For notational convenience, we define the background and signal image to be

$b = H f_b$,  (5)
$s = H f_s$.  (6)
In this work, we consider signal-known-statistically (SKS) tasks, where both the signal $s$ and background $b$ are random, in contrast to signal-known-exactly (SKE) tasks, where $s$ is fixed and just the background $b$ is random. Gaussian blur functions are used for $h_m(r)$ to simulate our simplified parallel-hole collimator imaging systems; a sketch of such a simulation is given below.
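```python
import numpy as np

# Sketch of the two hypotheses (3)-(4) on a pixel grid: a Gaussian
# system blur stands in for H, and measurement noise is Poisson.
# The background model and all parameter values are illustrative.
rng = np.random.default_rng(0)

def blur(f, w):
    """Crude Gaussian blur via Fourier multiplication (approximates H)."""
    n = f.shape[0]
    k = np.fft.fftfreq(n)
    kx, ky = np.meshgrid(k, k)
    otf = np.exp(-2 * (np.pi * w)**2 * (kx**2 + ky**2))
    return np.real(np.fft.ifft2(np.fft.fft2(f) * otf))

n, w = 64, 1.5
f_b = rng.gamma(2.0, 50.0, (n, n))             # stand-in random background
f_s = np.zeros((n, n)); f_s[n//2, n//2] = 500  # point-like signal

b, s = blur(f_b, w), blur(f_s, w)              # b = H f_b, s = H f_s  (5)-(6)
g0 = rng.poisson(np.clip(b, 0, None))          # H0  (3)
g1 = rng.poisson(np.clip(b + s, 0, None))      # H1  (4)
```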
2.2 Ideal Observer
The ideal observer computes a test statistic, the likelihood ratio, and compares it to a threshold to make a decision between the two hypotheses. The likelihood ratio is defined as

$\Lambda(g) = \dfrac{pr(g \mid H_1)}{pr(g \mid H_0)}$,  (7)

where $pr(g \mid H_j)$ is the probability density of image data $g$ under the hypothesis $H_j$. By varying the threshold and plotting TPF vs FPF, an ROC curve can be generated for the task. The AUC is a common scalar figure of merit that is maximized by the ideal observer. In this sense, the ideal observer measures the amount of detectable information of an imaging system. Therefore we employ the ideal observer as our observer and the AUC as our figure of merit for detection tasks. We assume that $s$ is statistically independent of $b$. We can rewrite (7) as a ratio of integrals over random backgrounds and signals [1], [11], [7],

$\Lambda(g) = \dfrac{\int db \int ds\, pr(b)\, pr(g \mid b, s, H_1)\, pr(s)}{\int db'\, pr(b')\, pr(g \mid b', H_0)}$,  (8)

where $b$ and $s$ are random backgrounds and signals, respectively. For an imaging system with Poisson noise $n$,

$pr(g \mid b, s, H_1) = \prod_{m=1}^{M} e^{-(b_m + s_m)}\, \dfrac{(b_m + s_m)^{g_m}}{g_m!}$,  (9)

and

$pr(g \mid b, H_0) = \prod_{m=1}^{M} e^{-b_m}\, \dfrac{(b_m)^{g_m}}{g_m!}$.  (10)
To make the computation of the likelihood ratio feasible, we define $\Lambda_{\mathrm{BSKE}}$ as

$\Lambda_{\mathrm{BSKE}}(g \mid b, s) = \dfrac{pr(g \mid b, s, H_1)}{pr(g \mid b, H_0)}$,  (11)

where $\Lambda_{\mathrm{BSKE}}$ is the background- and signal-known-exactly (BSKE) likelihood ratio in the case where the background and signal are fixed and not random. Substituting (9) and (10) into the BSKE likelihood ratio in (11), we get

$\Lambda_{\mathrm{BSKE}}(g \mid b, s) = \prod_{m=1}^{M} \left(1 + \dfrac{s_m}{b_m}\right)^{g_m} e^{-s_m}$.  (12)
We are now able to rewrite the likelihood ratio on random backgrounds and signals in terms of an integral of the BSKE likelihood ratio, i.e., the posterior mean of the BSKE likelihood ratio [7],

$\Lambda(g) = \int ds \int db\, \Lambda_{\mathrm{BSKE}}(g \mid b, s)\, pr(b \mid g, H_0)\, pr(s)$,  (13)

where

$pr(b \mid g, H_0) = \dfrac{pr(g \mid b, H_0)\, pr(b)}{\int db'\, pr(g \mid b', H_0)\, pr(b')}$.  (14)

2.3 Object Models
To describe background uncertainty (BU), we use lumpy backgrounds. These backgrounds were proposed by Rolland and Barrett [10] in an attempt to estimate human and model performances with more realistic looking backgrounds than flat or Gaussian backgrounds. Lumpy backgrounds are mathematically represented by

$f_b = f_b(r) = \sum_{n=1}^{N} L(r - c_n \mid a, s)$,  (15)

where $r$ is a 2D or 3D spatial coordinate, $N$ is the random number of lumps in the object (Poisson with mean $\bar{N}$), $L(\cdot)$ is the lump function, $c_n$, the center of the $n$th lump, is randomly chosen from a uniform distribution, and $a$ and $s$ are the fixed magnitude and width of the lump function. For our work, we use 2D lumpy backgrounds with circularly symmetric Gaussian profiles. From (2), (5), and (15), the $m$th element of the background vector $b$ can be written as

$b_m = \sum_{n=1}^{N} \int_S dr\, h_m(r)\, L(r - c_n \mid a, s)$.  (16)
We can take advantage of our knowledge of the background statistics to learn how the variability in the backgrounds affects the performance of the ideal observer with different parameters of simulated parallel-hole collimator imaging systems.

2.4 Signal Models
To model signal uncertainty, we use circularly symmetric Gaussian signals with random locations for location uncertainty (LU), elliptical Gaussian signals for shape uncertainty (SU), and both for shape and location uncertainty (SLU). Signals are mathematically represented by

$f_s = f_s(r) = a_s \exp\{-[R^{\dagger}(r - c)]^{\dagger} D^{-1} [R^{\dagger}(r - c)]\}$,  (17)

where $r$ is a 2D spatial coordinate, $a_s$ is the magnitude of the signal function, $c$ is the center of a signal, $R$ is a rotation matrix, and $D$ is a diagonal matrix whose diagonal entries are $2\sigma_1^2$ and $2\sigma_2^2$:

$R = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$  (18)

and

$D = \begin{pmatrix} 2\sigma_1^2 & 0 \\ 0 & 2\sigma_2^2 \end{pmatrix}$.  (19)
For the LU cases, we fix the shapes of the signals as circularly symmetric Gaussians, i.e., $\theta$ and $\sigma_1 (= \sigma_2)$ are fixed, and we choose the locations of the signals from a uniform distribution. For the SU cases, we fix signals at the center of the backgrounds and choose the shapes of the signals from uniform distributions, i.e., $c$ is fixed, and $\theta$, $\sigma_1$ and $\sigma_2$ are random. For the SLU cases, the locations and shapes of signals are randomly chosen. In all cases, there is either 0 or 1 signal in an image. Using the signal models as described above, we will find out how the randomness in signals changes the performance of the ideal observer in the case where backgrounds are random. We are also interested in any change in rankings of the imaging systems. A sketch of these object and signal models follows.
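```python
import numpy as np

# Sketch of the object models: a 2D lumpy background, Eq. (15), and an
# elliptical Gaussian signal, Eqs. (17)-(19). The lump-profile
# normalization and all numerical values are illustrative assumptions.
rng = np.random.default_rng(1)
n = 64
yy, xx = np.mgrid[0:n, 0:n]

def lumpy_background(nbar=25, a=1.0, s=3.0):
    N = rng.poisson(nbar)                       # random number of lumps
    f = np.zeros((n, n))
    for cx, cy in rng.uniform(0, n, (N, 2)):    # uniform lump centers
        f += a * np.exp(-((xx - cx)**2 + (yy - cy)**2) / (2 * s**2))
    return f

def gaussian_signal(a_s=1.0, c=(32, 32), theta=0.3, s1=2.0, s2=4.0):
    ct, st = np.cos(theta), np.sin(theta)
    x, y = xx - c[0], yy - c[1]
    xr, yr = ct * x + st * y, -st * x + ct * y  # R^T (r - c)
    return a_s * np.exp(-(xr**2 / (2 * s1**2) + yr**2 / (2 * s2**2)))

f_b = lumpy_background()
f_s = gaussian_signal()
```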
3 Methods
To explain our computational methods to estimate the likelihood ratio, let us start by considering (13). Due to the high dimensionality of the given integral, it is not computationally practical to estimate the integral as it is. We can reduce the dimension of the integral by using our knowledge of the background models $f_b$ and the signal models $f_s$ [7]. We define parameter vectors $\theta$ to be $\{N, c_1, c_2, \ldots, c_N\}$ and $\alpha$ to be $\{c, \theta, \sigma_1, \sigma_2\}$. We have assumed that $f_b$ and $f_s$ are statistically independent of each other. Then the lumpy backgrounds $f_b$ and the signals $f_s$ are completely characterized by $\theta$ and $\alpha$, respectively.
We write (13) as an integral over $\theta$ and $\alpha$:

$\Lambda(g) = \int d\alpha \int d\theta\, \Lambda_{\mathrm{BSKE}}(g \mid b(\theta), s(\alpha))\, pr(\theta \mid g, H_0)\, pr(\alpha)$,  (20)

where

$\Lambda_{\mathrm{BSKE}}(g \mid b(\theta), s(\alpha)) = \dfrac{pr(g \mid b(\theta), s(\alpha), H_1)}{pr(g \mid b(\theta), H_0)}$  (21)

and

$pr(\theta \mid g, H_0) = \dfrac{pr(g \mid b(\theta), H_0)\, pr(\theta)}{\int d\theta'\, pr(g \mid b(\theta'), H_0)\, pr(\theta')}$.  (22)

3.1 Markov Chain Monte Carlo
Ideally we would estimate this integral in (20) using Monte Carlo integration [6]:

$\hat{\Lambda}(g) = \dfrac{1}{J} \sum_{j=1}^{J} \Lambda_{\mathrm{BSKE}}[\, g \mid b(\theta^{(j)}), s(\alpha^{(j)})\,]$.  (23)
However, it is difficult to sample images directly from $pr(\theta \mid g, H_0)\, pr(\alpha)$ because $pr(\theta \mid g, H_0)\, pr(\alpha)$ is not usually known. To overcome this difficulty, we use MCMC techniques, in particular, the Metropolis-Hastings algorithm with appropriate proposal densities for $pr(\theta \mid g, H_0)\, pr(\alpha)$ [6], [7]. We construct a Markov chain with $pr(\theta \mid g, H_0)\, pr(\alpha)$ as the stationary density for the chain. Because $s$ and $b$ are statistically independent, we can use proposal densities of our choice independently for $pr(\theta \mid g, H_0)$ and $pr(\alpha)$. We choose a proposal density $q_b(\theta \mid \theta^{(i)})\, q_s(\alpha \mid \alpha^{(i)})$ for our Markov chain, where $q_b(\theta \mid \theta^{(i)})$ and $q_s(\alpha \mid \alpha^{(i)})$ are proposal densities, respectively, for $pr(\theta \mid g, H_0)$ and $pr(\alpha)$, and choose an initial parameter vector $(\theta^{(0)}, \alpha^{(0)})$. Given $(\theta^{(i)}, \alpha^{(i)})$, we draw a sample vector $(\tilde{\theta}, \tilde{\alpha})$ from the proposal densities and accept or reject it with acceptance probability
$\alpha\big((\theta^{(i)}, \alpha^{(i)}), (\tilde{\theta}, \tilde{\alpha})\big) = \min\left\{ 1,\ \dfrac{pr(\tilde{\theta} \mid g, H_0)\, pr(\tilde{\alpha})\, q_b(\theta^{(i)} \mid \tilde{\theta})\, q_s(\alpha^{(i)} \mid \tilde{\alpha})}{pr(\theta^{(i)} \mid g, H_0)\, pr(\alpha^{(i)})\, q_b(\tilde{\theta} \mid \theta^{(i)})\, q_s(\tilde{\alpha} \mid \alpha^{(i)})} \right\}$.  (24)
For $q_s(\alpha \mid \alpha^{(i)})$, we use uniform distributions for choosing angles of rotation $\theta$ from $0$ to $2\pi$, widths of signals $\sigma_1$ and $\sigma_2$ from $a$ to $b$, and locations $c$ in images $g$, i.e., $q_s(\alpha \mid \alpha^{(i)}) = \frac{1}{2\pi (b-a)^2 M}\ (= pr(\alpha))$. To make $q_b(\theta \mid \theta^{(i)})$ symmetric [7], $\Phi$ is defined as a matrix composed of a binary column vector $\beta$ of dimension $\tilde{N}\ (= 100 \times \bar{N})$ followed by a list of centers of dimension $\tilde{N} \times 2$, i.e.,

$\Phi = \begin{pmatrix} \beta_1 & \beta_2 & \cdots & \beta_{\tilde{N}} \\ c_1 & c_2 & \cdots & c_{\tilde{N}} \end{pmatrix}^{\dagger}$,  (25)

where the centers $c_n$ are row vectors. In generating the backgrounds to construct a Markov chain, the lumps are removed or added by flipping $\beta$s (i.e., a 1 to a
0 or a 0 to a 1) with probability $\eta$. We define a mapping function $\theta(\Phi)$ as $\theta(\Phi) = \{c_n : \beta_n = 1\}$, which consists of the centers of all the lumps in the background. The proposal density is given by
−f
j:β˜j =βj =1
1 ˜−j )G(˜ cj − cj ), δ(c−j − c C
(26)
˜ given by f = N |βi − where f is the number of βs that are flipped from Φ to Φ i=1 ˜ and G(·) is β˜i |, C is the number of terms in the sum in (26) given by C = β † β, ˜−j ) is a (2C − 2)a symmetric Gaussian. For 2D lumpy backgrounds, δ(c−j − c dimensional Dirac delta function where c−j is a concatenation of all the center ˜ vectors ci satisfying β˜i = βi = 1, except cj itself. The symmetry of qb (θ(Φ)|θ(Φ)) ˜−j ), and G(˜ follows from the symmetry of the f , C, δ(c−j − c cj − cj ) regarded ˜ as function of Φ and Φ. Finally we can rewrite the ratio in (24) into a computationally simpler one by canceling all qb (θ|θ (i) ), qs (α|α(i) ), and pr(α), i.e.,
˜ ˜ H0 )pr(θ) pr(g|b(θ), (i) pr(g|b(θ ), H0 )pr(θ (i) )
(27)
where pr(g|b(θ), H₀) pr(θ) = pr(g|b(θ), H₀) pr(N) pr({c_n}). We construct Markov chains corresponding to a given ensemble of images, estimate the corresponding likelihood ratios Λ̂(g), and compute an estimate of the AUC.
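As a concrete illustration of this sampling scheme, the following Python/NumPy sketch shows the generic Metropolis-Hastings loop implied by (23)-(27). It is not code from the paper; the callables `lambda_bske`, `pr_g_given_b`, `pr_theta`, `pr_alpha`, `propose_background`, and `propose_signal` are hypothetical placeholders for the densities and proposals defined above.

```python
import numpy as np

def estimate_lr_mcmc(g, theta0, alpha0, lambda_bske, pr_g_given_b, pr_theta,
                     pr_alpha, propose_background, propose_signal,
                     n_iter=150_000, burn_in=500, seed=0):
    """Estimate Lambda(g) as in (23) by averaging Lambda_BSKE over a chain
    whose stationary density is pr(theta|g,H0) pr(alpha).

    Each proposal callable returns (candidate, q_forward, q_backward).
    All callables are hypothetical placeholders, not the paper's API."""
    rng = np.random.default_rng(seed)
    theta, alpha = theta0, alpha0
    total = 0.0
    for i in range(n_iter):
        theta_new, qb_fwd, qb_bwd = propose_background(theta)
        alpha_new, qs_fwd, qs_bwd = propose_signal(alpha)
        # Acceptance ratio of (24).  The normalizing integral in (22) cancels,
        # so pr(theta|g,H0) is replaced by pr(g|b(theta),H0) pr(theta); with
        # the symmetric proposals of the paper, the q and pr(alpha) factors
        # cancel too, leaving the simpler ratio (27).
        num = (pr_g_given_b(g, theta_new) * pr_theta(theta_new)
               * pr_alpha(alpha_new) * qb_bwd * qs_bwd)
        den = (pr_g_given_b(g, theta) * pr_theta(theta)
               * pr_alpha(alpha) * qb_fwd * qs_fwd)
        if rng.random() < min(1.0, num / den):
            theta, alpha = theta_new, alpha_new
        if i >= burn_in:
            total += lambda_bske(g, theta, alpha)
    return total / (n_iter - burn_in)
```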
3.2 Consistency Check
A common way to check whether our MCMC technique generates consistent simulation results is, for a given ensemble of images, to run the experiment repeatedly with a number of different random-number seeds, estimate the resulting ensembles of likelihood ratios, and compute the corresponding AUC estimates and their sample variance. In this way, the Markov chains progress differently. The sample variance measures the variability in the AUC; if it is small enough, we know that the MCMC technique gives consistent AUC values for the ideal observer. Resampling methods such as the bootstrap can also be used to refine the variance estimates. Another way is to check whether the estimated AUCs satisfy the known bounds on the ideal-observer AUC [5]. First, the moment-generating function M₀(β) for Λ under the hypothesis H₀ and the likelihood-generating function G(β) are defined by [2]

$$M_0(\beta) = \int_0^{\infty} \Lambda^{\beta}\, {\rm pr}(\Lambda\,|\,H_0)\, d\Lambda = \left\langle \Lambda^{\beta} \right\rangle_0 \qquad (28)$$

and

$$G(\beta) = \frac{\ln M_0\!\left(\beta + \tfrac{1}{2}\right)}{\beta^2 - \tfrac{1}{4}}. \qquad (29)$$
It follows from the bounds on the AUC of the ideal observer [5] that

$$1 - \frac{1}{2}\exp\!\left[-\frac{1}{2}G(0)\right] \;\le\; {\rm AUC}_{\Lambda} \;\le\; 1 - \frac{1}{2}\exp\!\left[-\frac{1}{2}G(0) - \frac{1}{3}\!\left(G(0) - G'(0)\right)\right]. \qquad (30)$$

The plot of M₀(β) must pass through unity at β = 0 and β = 1, and it is convex [2]. If the simulated AUC values are consistent, we expect them to satisfy the conditions on M₀(β) and the bounds in (30). Then we may at least say that the MCMC technique generates consistent results in terms of these bounds.
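A minimal numerical version of this consistency check might look as follows (a sketch, not the authors' code). It estimates M₀(β) and G(β) from an ensemble of likelihood-ratio estimates obtained from signal-absent images, prints the endpoint conditions on M₀, and evaluates the bounds as written in (30), with G'(0) approximated by a central difference.

```python
import numpy as np

def consistency_check(lr_h0, eps=0.05):
    """lr_h0: array of likelihood-ratio estimates from signal-absent images."""
    def M0(beta):                      # (28): <Lambda^beta> under H0
        return np.mean(lr_h0 ** beta)

    def G(beta):                       # (29)
        return np.log(M0(beta + 0.5)) / (beta ** 2 - 0.25)

    # M0 must pass through unity at beta = 0 and beta = 1.
    print("M0(0) =", M0(0.0), " M0(1) =", M0(1.0))

    g0 = G(0.0)
    g0_prime = (G(eps) - G(-eps)) / (2 * eps)   # finite-difference G'(0)
    lower = 1 - 0.5 * np.exp(-0.5 * g0)
    upper = 1 - 0.5 * np.exp(-0.5 * g0 - (g0 - g0_prime) / 3.0)
    return lower, upper                # bounds of (30) on the AUC
```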
3.3 Psychophysical Studies
We have also completed two-alternative forced-choice (2AFC) experiments [2] to compare the performance of the ideal observer with that of the human observer on detection tasks in which both signal and background uncertainty are present. Five observers were presented with 100 pairs of signal-absent and signal-present images after 100 training trials for each imaging system. The lights were turned off in the room where the experiments were performed, and a black background was used for the computer screen so as not to distract the observers. For each detection task, three images were presented; the signal alone was shown in the middle image to indicate what the signal looks like. The other two images were random backgrounds with or without a signal, and the signal-present image appeared randomly on the left or the right. The observer selected the image containing the signal with a mouse and was allowed unlimited time to reach a decision.
4 Simulations
For the comparison of imaging systems using ideal-observer performance, we use three different simplified parallel-hole collimator imaging systems for nuclear medicine. We model the imaging system response functions h_m(r) as Gaussians centered on the mth pixel p_m,

$$h_m(\mathbf{r}) = \frac{h}{2\pi w^2} \exp\!\left[-\frac{(\mathbf{r} - \mathbf{p}_m)^{\dagger}(\mathbf{r} - \mathbf{p}_m)}{2 w^2}\right], \qquad (31)$$

where r is a 2D spatial coordinate. Each imaging system has a different resolution w and relative sensitivity h for the h_m(r) functions, as given in Table 1. These resolutions and relative sensitivities correspond to different collimator parameters and exposure times. For each imaging system, we generated 100 pairs of signal-absent and signal-present images. The mean number of lumps in a 64 × 64 image was N = 25, but we initially generated 106 × 106 images with N = 69 and took the central 64 × 64 region to avoid boundary problems.
For SU, we generated 64 × 64 images with N = 25. For the comparison of the 2AFC psychophysical studies and the ideal observer, we used N = 176 on 170 × 170 images and took the central 128 × 128 region. For all the backgrounds, we used amplitude 1 and s = 7, and a_s = 0.1, 0.6, and 1.2 for the signal model. For the BU(a) cases, a signal of width a is centered in each lumpy image. For LU(a), the locations of signals of width a are chosen from a uniform distribution. For SU(a, b) and SLU(a, b), uniform distributions are used to choose locations, widths in an interval (a, b), and orientations.

Table 1. Characteristics of the three imaging systems.

Imaging System | Resolution w | Relative Sensitivity h
A              | 0.5          | 40
B              | 2.5          | 100
C              | 5            | 200
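For orientation, the following sketch (ours, not the authors' code) generates a lumpy background of the type described above — a Poisson number of Gaussian lumps of width s with uniformly distributed centers — and images it with the Gaussian system response of (31); the exact lump normalization and grid conventions are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)

def lumpy_background(dim=64, nbar=25, amp=1.0, s=7.0):
    """Sample a lumpy background: a Poisson number of Gaussian lumps with
    uniformly distributed centers (Rolland-Barrett type model)."""
    yy, xx = np.mgrid[0:dim, 0:dim]
    b = np.zeros((dim, dim))
    for cx, cy in rng.uniform(0, dim, size=(rng.poisson(nbar), 2)):
        b += amp * np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / (2 * s ** 2))
    return b

def mean_image(obj, w, h):
    """Noise-free mean image under (31): a Gaussian blur of resolution w
    scaled by the relative sensitivity h."""
    return h * gaussian_filter(obj, sigma=w)

g_A = mean_image(lumpy_background(), w=0.5, h=40)   # system A of Table 1
```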
For each image, we ran 150,000 iterations of the Markov chain. For each calculation, the first 500 iterations were discarded as burn-in, and the Λ_BSKE(·) values of the remaining 149,500 iterations were used to compute the likelihood ratios. The likelihood ratios were computed with 5 different random-number seeds to estimate the sample variance of each AUC. For the consistency checks, we used 5 ensembles of 1000 signal-absent images and computed 5 ensembles of likelihood-ratio estimates to obtain their sample variance.
5 Simulation Results
We computed the ideal-observer likelihood ratios for 100 pairs of signal-absent and signal-present images for each imaging system under various paradigms of background and signal uncertainty. ROC analysis with the LABROC4 [8] and PROPROC [9] software was used to generate the AUCs used to rank the three imaging systems on ideal-observer performance. All the AUCs in the figures are shown with their sample standard deviations. The ideal-observer performances on LU(3) and LU(9) are compared to BU(3) and BU(9), respectively, in Fig. 1(a) and (b). The ideal-observer performance degrades so badly on LU that the ranking between the imaging systems cannot be discerned. We therefore increased the signal magnitude (a_s = 0.6 and a_s = 1.2) to examine the rankings between the three imaging systems on BU(3) and LU(3), as shown in Fig. 2(a) and (b). While Fig. 1(a) and (b) show that the ideal-observer performance degrades on LU(3) compared to BU(3), Fig. 2(a) and (b) show that the rankings remain the same. The 2AFC method [2] was used to compute the ideal-observer AUCs in Fig. 2(a) and (b).
Fig. 1. (a) AUCs for A, B, and C on BU(3) and LU(3) with a_s = 0.1. (b) AUCs for A, B, and C on BU(9) and LU(9) with a_s = 0.1. The solid lines correspond to BU and the dashed lines correspond to LU.
Fig. 2. (a) AUCs for A, B, and C on BU(3) with a_s = 0.6. (b) AUCs for A, B, and C on LU(3) with a_s = 1.2. The solid and dashed lines correspond to the ideal observer and the human observer, respectively.
The 2AFC psychophysical studies on BU(3) and LU(3) are also shown in Fig. 2(a) and (b) to compare ideal-observer and human-observer performance. The ideal observer greatly outperforms the human observer in these cases. The efficiency, defined as the square of the ratio d_a(human)/d_a(ideal), is taken as a measure of the perceptual efficiency of the human observer [3]. From our psychophysical studies, the efficiency for imaging system C is 0.0059 ± 0.0055 on BU(3). Figure 3(a) shows that the ideal-observer performance degrades on SU(6,8), when the size of the signal in each background is close to the size of the lumps,
Fig. 3. (a) AUCs for A, B, and C on SU(3, 5) and SU(6, 8) with a_s = 0.1. (b) AUCs for A, B, and C on SLU(3, 5) and SLU(6, 8) with a_s = 0.1. The solid lines correspond to SU(3,5) and SLU(3,5) and the dashed lines correspond to SU(6,8) and SLU(6,8).
Fig. 4. An M₀(β) curve for a consistency check for A on SU(3, 5).
compared to SU(3,5). As shown in Fig. 3(b), the ideal-observer performance also degrades significantly on SLU(3,5) and SLU(6,8) compared, respectively, to SU(3,5) and SU(6,8) in Fig. 3(a). A plot of M₀(β) for a consistency check on SU(3,5) for imaging system A is shown in Fig. 4. The mean M₀(1) is 1.0449 ± 0.1508. The mean AUC on SU(3,5) for A is 0.8401 ± 0.0161, as in Fig. 3(a), and lies between 0.7346 ± 0.0141 and 0.9092 ± 0.0051, the lower and upper bounds computed by the consistency check.
6 Discussion and Conclusion
We have shown that ideal-observer performance degrades under both background and signal uncertainty, but the rankings of the three imaging systems appear to remain the same under combined background and signal-location uncertainty as under background uncertainty alone. We have also shown quantitatively that the ideal observer performs better than the human observer on detection tasks under background and signal-location uncertainty. We consider this work another step toward more realistic signal-detection tasks; it can also be used for hardware optimization.
References

1. H. H. Barrett and C. K. Abbey: Bayesian Detection of Random Signals on Random Backgrounds. 4th International Conference on Information Processing in Medical Imaging, 1997.
2. H. H. Barrett, C. K. Abbey, and E. Clarkson: Objective assessment of image quality III: ROC metrics, ideal observers, and likelihood-generating functions. J. Opt. Soc. Am. A 15, 1520–1535 (1998).
3. H. H. Barrett, C. K. Abbey, and E. Clarkson: Some unlikely properties of the likelihood ratio and its logarithm. Proc. SPIE 3340 (1998).
4. E. Clarkson and H. H. Barrett: Approximation to ideal-observer performance on signal-detection tasks. Applied Optics 39, 1783–1794 (2000).
5. E. Clarkson: Bounds on the area under the receiver operating characteristic curve for the ideal observer. J. Opt. Soc. Am. A 19, 1963–1968 (2001).
6. W. R. Gilks, S. Richardson, and D. J. Spiegelhalter (Eds.): Markov Chain Monte Carlo in Practice. Chapman and Hall, Boca Raton (1996).
7. M. A. Kupinski, J. W. Hoppin, E. Clarkson, and H. H. Barrett: Ideal Observer Computation Using Markov-Chain Monte Carlo. J. Opt. Soc. Am. A (accepted) (2002).
8. C. E. Metz, B. A. Herman, and J. H. Shen: Maximum Likelihood Estimation of Receiver Operating Characteristic (ROC) Curves from Continuously-Distributed Data. Statistics in Medicine 17, 1033–1053 (1998).
9. C. E. Metz and X. Pan: Proper Binormal ROC Curves: Theory and Maximum-Likelihood Estimation. Journal of Mathematical Psychology 43, 1–33 (1999).
10. J. P. Rolland and H. H. Barrett: Effect of random background inhomogeneity on observer detection performance. J. Opt. Soc. Am. A 9, 649–658 (1992).
11. H. Zhang: Signal Detection in Medical Imaging. Ph.D. Dissertation, The University of Arizona (2001).
Theoretical Evaluation of the Detectability of Random Lesions in Bayesian Emission Reconstruction

Jinyi Qi

Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
Abstract. Detecting cancerous lesions is an important task in positron emission tomography (PET). Bayesian methods based on the maximum a posteriori principle (also called penalized maximum likelihood methods) have been developed to deal with the low signal-to-noise ratio in the emission data. Similar to the filter cut-off frequency in the filtered backprojection method, the prior parameters in Bayesian reconstruction control the resolution and noise trade-off and hence affect the detectability of lesions in reconstructed images. Bayesian reconstructions are difficult to analyze because their resolution and noise properties are nonlinear and object-dependent. Most research has therefore been based on Monte Carlo simulations, which are very time-consuming. Building on recent progress in the theoretical analysis of the image properties of statistical reconstructions and on the development of numerical observers, here we develop a theoretical approach for fast computation of lesion detectability in Bayesian reconstruction. The results can be used to choose the optimum hyperparameter for maximum lesion detectability. New in this work is the use of theoretical expressions that explicitly model the statistical variation of the lesion and background without assuming that the object variation is (locally) stationary. The theoretical results are validated using Monte Carlo simulations, and the comparisons show good agreement between the theoretical predictions and the Monte Carlo results.
1 Introduction
Task-specific evaluation of medical imaging methods has become increasingly important. Two major applications of PET are lesion detection and region of interest quantitation. Due to the low signal to noise ratio (SNR) in PET data, statistically based image reconstruction methods have been developed to improve image quality [1,2,3,4]. To explore the full potential of statistical reconstruction, the reconstruction methods need to be optimized for PET applications. Such optimization requires fast computation of the task-specific figures of merit for statistical reconstructions. Here we focus on the detection task.
This work is supported in part by the National Institute of Biomedical Imaging and Bioengineering under grants R01 EB00363, R01 EB00194, and by the Director, Office of Science, Office of Biological and Environmental Research, Medical Sciences Division, of the U.S. Department of Energy under contract no. DE-AC03-76SF00098.
A general methodology for studying lesion detectability is the human observer ROC (receiver operating characteristic) study, in which physicians read images and decide whether or not a lesion is present. An ROC curve is generated by plotting the true positive fraction vs. the false positive fraction. Using human observers can be time-consuming, so numerical observers based on signal detection theory have been developed [5]. While numerical observers reduce the time of ROC studies, Monte Carlo reconstructions are often needed to provide the sample images, so the total computation time is still long. While theoretical analysis is difficult because statistical algorithms are nonlinear, progress has been made in understanding the nonlinear properties of statistical reconstruction methods. Barrett et al. [6] derived approximate formulae for the mean and covariance of the maximum likelihood (ML) expectation maximization (EM) reconstruction as a function of the iteration number. The same approach was extended to the maximum a posteriori (MAP) EM algorithms by Wang and Gindi [7] and, most recently, to block-iterative algorithms by Soares et al. [8]. Using the results in [6] with numerical observer models, Abbey et al. [9] studied lesion detectability in ML EM reconstruction. This iteration-based approach is attractive for methods that are terminated before convergence, as is common practice for the EM algorithm and its ordered-subsets variants [10]. However, evaluation of the expressions for large numbers of iterations is time-consuming. In addition, this approach requires that the reconstruction algorithm have an explicit update equation; hence, it is not applicable to gradient-type algorithms that involve line searches. An alternative approach was proposed by Fessler and Rogers [11,12], who analyzed the mean, variance, and spatial resolution at a fixed point of the objective function. The resolution and noise properties are computed at the fixed point using partial derivatives and truncated Taylor series approximations. These results are independent of the particular optimization algorithm used and require only that the algorithm iterate to effective convergence. Qi and Leahy [13,14] extended this approach by deriving simplified expressions for the local impulse response function and covariance using Fourier transforms. These expressions allow fast evaluation of the resolution and noise properties of Bayesian reconstruction. The results have been used to choose the prior parameter to maximize the contrast-to-noise ratio [13] and to achieve uniform contrast recovery in fully 3D PET [14]. Similar approximations have been used by Stayman and Fessler [15] in designing a penalty function to achieve isotropic local impulse response functions. In combination with computer observer models, these theoretical results have been applied to the study of lesion detectability [16,17,18,19]. In [17,19] the signal-known-exactly, background-known-exactly (SKE-BKE) detection task was used, which is a highly simplified scenario compared to real situations. While some variability of the background and lesion was included in [16,18], both assumed that the covariances of the background and lesion are stationary (or at least locally stationary) for fast computation. Such a stationarity assumption is generally invalid for random lesions.
Here we study lesion detectability where both the lesion and background are described by statistical distributions. We derive simplified theoretical expressions that allow fast evaluation of lesion detectability for various linear observers without assuming the object covariance is locally stationary.
2 Theory

2.1 Data Model
Emission data are well modeled as a collection of independent Poisson random variables with (conditional) expectation ȳ ∈ ℝ^{M×1}, related to the unknown tracer distribution x ∈ ℝ^{N×1} through an affine transform:

$$\bar{\mathbf{y}} \equiv E[\mathbf{y}\,|\,\mathbf{x}] = \mathbf{P}\mathbf{x} + \mathbf{r}, \qquad (1)$$
where P ∈ ℝ^{M×N} is the detection probability matrix, whose (i, j)th element equals the probability of detecting an event from the jth voxel at the ith measurement, taking into account photon attenuation and detector efficiency, and r ∈ ℝ^{M×1} accounts for the presence of scattered and random events in the data. The Poisson likelihood function is

$$p(\mathbf{y}\,|\,\mathbf{x}) = \prod_i \frac{e^{-\bar{y}_i}\, \bar{y}_i^{\,y_i}}{y_i!}, \qquad (2)$$

and the log-likelihood function is given by

$$L(\mathbf{y}\,|\,\mathbf{x}) = \sum_i \left( y_i \log \bar{y}_i - \bar{y}_i - \log y_i! \right), \qquad (3)$$

where y ∈ ℝ^{M×1} is the measured sinogram data. For PET data that are precorrected for random events, a shifted Poisson model can be used [20].
2.2 Bayesian Image Reconstruction
Bayesian methods regularize the image through the use of a prior distribution on the unknown image. Most image priors have a Gibbs distribution of the form

$$p(\mathbf{x}) = \frac{1}{Z} e^{-\beta U(\mathbf{x})}, \qquad (4)$$

where U(x) is the energy function, β is the smoothing parameter that controls the resolution of the reconstructed image, and Z is a normalization constant. Here we focus on quadratic priors, for which the energy function can be expressed as

$$U(\mathbf{x}) = \frac{1}{2} \mathbf{x}' \mathbf{R} \mathbf{x}, \qquad (5)$$
where R is a positive definite (or semidefinite) matrix and ' denotes transpose. The commonly used pairwise quadratic priors and thin-plate priors [21,22] are just special cases of (5). Combining the likelihood function and the image prior, the MAP reconstruction is found as

$$\hat{\mathbf{x}}(\mathbf{y}) = \arg\max_{\mathbf{x} \ge 0}\, \left[ L(\mathbf{y}\,|\,\mathbf{x}) - \beta U(\mathbf{x}) \right]. \qquad (6)$$

Since L(y|x) is a concave function of x, (6) generally has a unique solution for convex priors. The smoothing parameter β has a strong effect on the image properties. If β is too small, the reconstructed image approaches the ML estimate and becomes very noisy; if β is too large, the reconstructed image will be very smooth and useful structural information can be lost. The goal of this paper is to derive theoretical expressions for fast evaluation of the detectability of lesions in each data set with different reconstruction parameters.
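The MAP objective of (6) with the quadratic energy (5) is then a one-liner; the sketch below (not the paper's implementation) also builds an R for a 1D pairwise quadratic prior as an example.

```python
import numpy as np
from scipy.sparse import diags

def map_objective(x, y, P, r, R, beta):
    """Objective of (6): Poisson log-likelihood minus beta * U(x), with
    U(x) = 0.5 x' R x as in (5).  Maximize over x >= 0 for the MAP estimate."""
    ybar = P @ x + r
    return float(np.sum(y * np.log(ybar) - ybar) - 0.5 * beta * x @ (R @ x))

def pairwise_R(n):
    """R approximating the 1D pairwise quadratic prior
    0.5 * sum_j (x_j - x_{j-1})^2 (exact up to boundary terms)."""
    return diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)).toarray()
```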
2.3 Lesion Detection with Numerical Observers
For a given reconstructed image x̂, a linear numerical observer computes a test statistic (a scalar-valued decision variable) η(x̂) by

$$\eta(\hat{\mathbf{x}}) = \mathbf{t}' \hat{\mathbf{x}}, \qquad (7)$$
where t is the observer template. A decision whether there is a lesion or not is then made by comparing this statistic to a preselected threshold. If η(x̂) exceeds the decision threshold, x̂ is determined to contain a lesion; otherwise, it is not. By varying the threshold, we can obtain a ROC curve. The area under the ROC curve (AUC) is often used to measure lesion detectability. This numerical observer model assumes that the location of the possible lesion is known a priori; one can use separate observer templates for different locations. The detection performance can also be measured by the SNR of η(x̂), defined as

$${\rm SNR}^2[\eta(\hat{\mathbf{x}})] = \frac{\left( E[\eta(\hat{\mathbf{x}})\,|\,H_1] - E[\eta(\hat{\mathbf{x}})\,|\,H_0] \right)^2}{\left( {\rm var}[\eta(\hat{\mathbf{x}})\,|\,H_1] + {\rm var}[\eta(\hat{\mathbf{x}})\,|\,H_0] \right)/2} \qquad (8)$$

$$= \frac{(\mathbf{t}'\mathbf{z})^2}{\left( \mathbf{t}' \Sigma_{\hat{\mathbf{x}}|H_1} \mathbf{t} + \mathbf{t}' \Sigma_{\hat{\mathbf{x}}|H_0} \mathbf{t} \right)/2}, \qquad (9)$$
where E denotes expectation, H₀ is the null hypothesis (lesion absent), H₁ is the hypothesis that a lesion is present, Σ_{x̂|H1} and Σ_{x̂|H0} are the covariance matrices of x̂ under H₁ and H₀, respectively, and z ≡ E[x̂|H₁] − E[x̂|H₀]. Without loss of generality, we assume that the probabilities of the two hypotheses are equal. When η(x̂) is normally distributed, the AUC is related to the SNR by [23]

$${\rm AUC} = \frac{1}{2}\left[ 1 + {\rm erf}\!\left(\frac{\rm SNR}{2}\right) \right],$$
where erf(x) is the error function. Here we will use the SNR to measure lesion detectability. One example of a linear observer is the non-prewhitening (NPW) observer, which uses a matched filter to compute the test statistic [5]

$$\eta_{\rm NPW}(\hat{\mathbf{x}}) = \left( E[\hat{\mathbf{x}}\,|\,H_1] - E[\hat{\mathbf{x}}\,|\,H_0] \right)' \hat{\mathbf{x}} \equiv \mathbf{z}'\hat{\mathbf{x}}. \qquad (10)$$
In some situations, the NPW observer has been found to correlate with human performance for lesion detection [24,25,26]. At this time, the most popular numerical observers are probably the channelized Hotelling observers (CHOs) [27,25]. They have gained much interest because many studies have shown that CHOs correlate well with human performance, although the degree of correlation depends on the properties of the lesion and background and on the channel functions [28,29,30]. The test statistic of the CHO is

$$\eta_{\rm CHO}(\hat{\mathbf{x}}) = \mathbf{z}' \mathbf{U} \mathbf{K}^{-1} \mathbf{U}' \hat{\mathbf{x}}, \qquad (11)$$
where U denotes the frequency-selective channels that mimic the human visual system and K is the covariance of the channel outputs, i.e.,

$$\mathbf{K} = \frac{1}{2} \mathbf{U}' \left( \Sigma_{\hat{\mathbf{x}}|H_1} + \Sigma_{\hat{\mathbf{x}}|H_0} \right) \mathbf{U} + \mathbf{K}_N, \qquad (12)$$

where K_N is the covariance of the internal noise in the channels, included to model the uncertainty in the human detection process [31,27].
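In matrix form, the CHO SNR of (8)-(12) (and of (18) below) reduces to a few lines. This sketch assumes dense covariance matrices and hypothetical channel/covariance inputs; it is an illustration, not the paper's code.

```python
import numpy as np
from scipy.special import erf

def cho_snr(z, U, cov_h0, cov_h1, K_N):
    """SNR of the channelized Hotelling observer.
    z:         E[xhat|H1] - E[xhat|H0], shape (npix,)
    U:         channel matrix, shape (npix, nchan)
    cov_h0/h1: covariances of xhat under H0/H1, shape (npix, npix)
    K_N:       internal channel noise covariance, shape (nchan, nchan)"""
    K = 0.5 * U.T @ (cov_h1 + cov_h0) @ U + K_N        # (12)
    t = U.T @ z                                        # channel outputs of z
    return float(np.sqrt(t @ np.linalg.solve(K, t)))   # sqrt of (18)

def auc_from_snr(snr):
    """AUC of a normally distributed test statistic."""
    return 0.5 * (1.0 + erf(snr / 2.0))
```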
2.4 Lesion Detectability in MAP Reconstruction
To compute the SNR for each numerical observer, we need expressions for z ≡ E[x̂|H₁] − E[x̂|H₀] and the covariance matrices Σ_{x̂|H1} and Σ_{x̂|H0}. Using the results in [12] and [11], the local impulse response of MAP reconstruction can be approximated by

$${\rm LIR}_{\hat{\mathbf{x}}} \approx [\mathbf{F} + \beta \mathbf{R}]^{-1} \mathbf{F}, \qquad (13)$$

and the covariance by

$$\Sigma_{\hat{\mathbf{x}}|H_k} \approx [\mathbf{F} + \beta \mathbf{R}]^{-1}\, \mathbf{P}'\, {\rm diag}\!\left[\frac{1}{\bar{y}_i}\right] \Sigma_{\mathbf{y}|H_k}\, {\rm diag}\!\left[\frac{1}{\bar{y}_i}\right] \mathbf{P}\, [\mathbf{F} + \beta \mathbf{R}]^{-1}, \qquad (14)$$

where ȳ ≡ P E[x] + r = ½P(E[x|H₀] + E[x|H₁]) + r, F ≡ P' diag[1/ȳ_i] P, and Σ_{y|Hk} is the covariance matrix of the measurement y under H_k, k = 0, 1. Note that (13) represents the spatially variant local impulse response function, with each column denoting the local impulse response at the corresponding voxel location. Without object variability, y consists of independent Poisson random variables, whose covariance is Σ_{y|x} = diag[ȳ_i] with ȳ = Px + r. When considering object variation, the overall covariance of the measurements is

$$\Sigma_{\mathbf{y}|H_k} = E\{\Sigma_{\mathbf{y}|\mathbf{x}}\,|\,H_k\} + \mathbf{P}\, \Sigma_{\mathbf{x}|H_k}\, \mathbf{P}', \qquad (15)$$

where Σ_{x|Hk} is the covariance of the object variation under H_k.
Since we are particularly interested in small lesions, we can assume that the presence of a lesion has almost no effect on the Poisson noise in the data, i.e., E{Σ_{y|x}|H_k} ≈ diag[ȳ_i], k = 0, 1. For small lesions, z can also be approximated by the convolution of the expected lesion profile f̄_l ≡ E[x|H₁] − E[x|H₀] with the local impulse response function at the lesion location. Therefore, we have

$$\mathbf{z} \approx [\mathbf{F} + \beta \mathbf{R}]^{-1} \mathbf{F}\, \bar{\mathbf{f}}_l \qquad (16)$$

$$\Sigma_{\hat{\mathbf{x}}|H_k} \approx [\mathbf{F} + \beta \mathbf{R}]^{-1} \left[ \mathbf{F} + \mathbf{F}\, \Sigma_{\mathbf{x}|H_k}\, \mathbf{F} \right] [\mathbf{F} + \beta \mathbf{R}]^{-1}. \qquad (17)$$
Substituting (16) and (17) into (9), we can obtain theoretical expressions for the SNR of any linear numerical observer of the form (7). In particular, for the CHO in (11), the SNR is

$${\rm SNR}^2_{\rm CHO} = \mathbf{z}' \mathbf{U} \mathbf{K}^{-1} \mathbf{U}' \mathbf{z}, \qquad (18)$$

where

$$\mathbf{K} \approx \mathbf{U}' [\mathbf{F} + \beta \mathbf{R}]^{-1} \left[ \mathbf{F} + \mathbf{F}\, \Sigma_{\mathbf{x}}\, \mathbf{F} \right] [\mathbf{F} + \beta \mathbf{R}]^{-1} \mathbf{U} + \mathbf{K}_N, \qquad (19)$$

and Σ_x is defined as ½Σ_{x|H0} + ½Σ_{x|H1}.

2.5 Fast Computation
Direct computation of (18) is very time-consuming because of the large size of the matrices. Often one assumes that the local impulse response function and the covariance are locally stationary and uses the fast Fourier transform to compute the expressions in the frequency domain (e.g., [16,18]). However, for random lesions that are independent of the background, the covariance of the object is not stationary around the lesion. This has been one major difficulty in computing the detectability of random lesions. Here we solve this problem by dividing the variance in (19) into two parts:

$$\mathbf{U}' [\mathbf{F} + \beta \mathbf{R}]^{-1} \mathbf{F} [\mathbf{F} + \beta \mathbf{R}]^{-1} \mathbf{U}, \qquad (20)$$

which is caused by the Poisson noise in the data, and

$$\mathbf{U}' [\mathbf{F} + \beta \mathbf{R}]^{-1} \mathbf{F}\, \Sigma_{\mathbf{x}}\, \mathbf{F} [\mathbf{F} + \beta \mathbf{R}]^{-1} \mathbf{U}, \qquad (21)$$

which is due to the lesion (and background) variation. Equation (20) is the same as the covariance matrix studied in [16,18] and can be computed approximately using the fast Fourier transform in frequency space, based on the fact that F and R are locally stationary [14], i.e.,

$$\mathbf{U}' [\mathbf{F} + \beta \mathbf{R}]^{-1} \mathbf{F} [\mathbf{F} + \beta \mathbf{R}]^{-1} \mathbf{U} \approx \tilde{\mathbf{U}}'\, {\rm diag}\!\left[ \frac{\lambda_i}{(\lambda_i + \beta \mu_i)^2} \right] \tilde{\mathbf{U}}, \qquad (22)$$

where {λ_i, i = 1, ..., N} and {μ_i, i = 1, ..., N} are the Fourier coefficients of the column vectors of F and R corresponding to the lesion location, respectively, and Ũ contains the Fourier coefficients of the channel functions. Details on the computation of λ and μ can be found in [14].
Since x is not stationary, Σ_x is not a block-Toeplitz matrix and hence (21) cannot be computed in frequency space. However, we found that we can compute F[F + βR]^{-1}U in Fourier space and then calculate the product with Σ_x in the spatial domain. Thus (21) can be approximated by

$$\mathbf{U}'[\mathbf{F} + \beta \mathbf{R}]^{-1}\mathbf{F}\, \Sigma_{\mathbf{x}}\, \mathbf{F}[\mathbf{F} + \beta \mathbf{R}]^{-1}\mathbf{U} \approx \left( \mathbf{Q}\, {\rm diag}\!\left[ \frac{\lambda_i}{\lambda_i + \beta \mu_i} \right] \tilde{\mathbf{U}} \right)' \Sigma_{\mathbf{x}} \left( \mathbf{Q}\, {\rm diag}\!\left[ \frac{\lambda_i}{\lambda_i + \beta \mu_i} \right] \tilde{\mathbf{U}} \right), \qquad (23)$$

where Q represents the Kronecker form of the Fourier transform. Because the number of channels is small (often less than 10), (23) can be computed very easily. Since we have used the locally stationary approximation on F[F + βR]^{-1}, (23) requires that the correlation length in Σ_x be relatively short and that the energy of the channel functions U concentrate around the lesion location. Using similar approximations, z can be computed in Fourier space as

$$\mathbf{U}'\mathbf{z} = \tilde{\mathbf{U}}'\, {\rm diag}\!\left[ \frac{\lambda_i}{\lambda_i + \beta \mu_i} \right] \xi = \left[ \sum_i \frac{\tilde{U}_{k,i}\, \lambda_i\, \xi_i}{\lambda_i + \beta \mu_i} \right], \qquad (24)$$

where {ξ_i, i = 1, ..., N} are the Fourier transform coefficients of f̄_l, and [c_k] denotes a column vector with kth element c_k. Substituting (22)–(24) into (18), we obtain the final expression for the SNR of the CHO:

$${\rm SNR}^2_{\rm CHO} = \left[ \sum_i \frac{\tilde{U}_{k,i}\, \lambda_i\, \xi_i}{\lambda_i + \beta \mu_i} \right]' \mathbf{K}^{-1} \left[ \sum_i \frac{\tilde{U}_{k,i}\, \lambda_i\, \xi_i}{\lambda_i + \beta \mu_i} \right], \qquad (25)$$

where

$$\mathbf{K} \approx \tilde{\mathbf{U}}'\, {\rm diag}\!\left[ \frac{\lambda_i}{(\lambda_i + \beta \mu_i)^2} \right] \tilde{\mathbf{U}} + \left( \mathbf{Q}\, {\rm diag}\!\left[ \frac{\lambda_i}{\lambda_i + \beta \mu_i} \right] \tilde{\mathbf{U}} \right)' \Sigma_{\mathbf{x}} \left( \mathbf{Q}\, {\rm diag}\!\left[ \frac{\lambda_i}{\lambda_i + \beta \mu_i} \right] \tilde{\mathbf{U}} \right) + \mathbf{K}_N. \qquad (26)$$
2.6 Lesion with Known Profile
As a special case, we study random lesions with a known profile. Here the tracer uptake in the lesion is modeled as a fixed profile times a random scaling factor c with mean one and variance σ_c² (variable contrast), i.e.,

$$\mathbf{f}_l = c\, \bar{\mathbf{f}}_l. \qquad (27)$$
This statistical model is especially useful in emission imaging when the tracer uptake in small lesions can be assumed to be homogeneous. The covariance of the lesion under this model is
$$\Sigma_{\mathbf{f}_l} = \sigma_c^2\, \bar{\mathbf{f}}_l\, \bar{\mathbf{f}}_l'. \qquad (28)$$
Fig. 1. The background phantom image and the locations of the simulated lesions.
Substituting this into (21) and assuming there is no background variation, we get

$$\mathbf{U}'[\mathbf{F} + \beta \mathbf{R}]^{-1}\mathbf{F}\, \Sigma_{\mathbf{x}}\, \mathbf{F}[\mathbf{F} + \beta \mathbf{R}]^{-1}\mathbf{U} = \sigma_c^2\, \mathbf{U}'\mathbf{z}\,\mathbf{z}'\mathbf{U}, \qquad (29)$$
and the covariance of the channel outputs K becomes

$$\mathbf{K} \approx \tilde{\mathbf{U}}'\, {\rm diag}\!\left[ \frac{\lambda_i}{(\lambda_i + \beta \mu_i)^2} \right] \tilde{\mathbf{U}} + \sigma_c^2 \left[ \sum_i \frac{\tilde{U}_{k,i}\, \lambda_i\, \xi_i}{\lambda_i + \beta \mu_i} \right] \left[ \sum_i \frac{\tilde{U}_{k,i}\, \lambda_i\, \xi_i}{\lambda_i + \beta \mu_i} \right]' + \mathbf{K}_N, \qquad (30)$$

which can be computed in frequency space without assuming that Σ_x is locally stationary!
3 Monte Carlo Simulations
We conducted computer simulations to validate our theoretical approximations. We simulated an ECAT HR+ whole-body PET scanner (CTI PET Systems, Knoxville, TN) operating in two-dimensional mode. The sinogram data have 288 angles of view and 288 lines of response in each view. The background phantom image (Figure 1) was obtained from a reconstructed image of a patient scan. Two circular lesions with variable contrast, one of 8mm diameter and one of 16mm diameter, were simulated in the liver region; the locations are shown in Figure 1. The data were generated by forward-projecting the phantom image with and without a lesion. Photon attenuation was modeled. Poisson noise was added to the sinogram data after scaling the expected total number of events to 200k. For each case, 500 independent noisy data sets were reconstructed using a preconditioned conjugate gradient method [32]. The SNRs for the CHO were calculated using both the Monte Carlo reconstructions and the theoretical expressions (25) and (30). Two sets of channel functions were studied: (i) five rotationally symmetric, non-overlapping square channels (SQR); and (ii) three difference-of-Gaussians channels (DOG). These channel functions are similar to those used in [25,33].
Fig. 2. Comparison of the Monte Carlo results (‘+’) with the theoretical predictions (solid line). (a) 16mm lesion with DOG channels; (b) 8mm lesion with DOG channels; (c) 16mm lesion with SQR channels; (d) 8mm lesion with SQR channels. The tumor-to-background activity ratio was uniformly distributed between 1.18 and 1.75.
Figure 2 shows the SNRs computed from independent Monte Carlo reconstructions and from the theoretical approximations. The tumor-to-background activity ratio was uniformly distributed between 1.18 and 1.75. The background activity was fixed, with a maximum of 5.5. White noise with a variance of 5 × 10⁻³ was used to model the internal channel noise in the human visual system. The error bars on the Monte Carlo results represent 68% confidence intervals estimated using a bootstrap method. In general, the theoretical predictions match the Monte Carlo results very well: all theoretical predictions lie inside the 68% confidence intervals of the Monte Carlo estimates. We can also see that the optimum smoothing parameter (maximum SNR) for the 16mm lesion is slightly larger than that for the 8mm lesion, indicating that a lower-resolution image is preferred for detecting larger lesions. Figure 3 shows another comparison of the CHO performance for the 8mm lesion, with the activity ratio varying from 1.75 to 3.2. With the higher contrast, the lesion detectability is significantly increased, but the shapes of the curves
Fig. 3. Comparison of the Monte Carlo results (‘+’) with the theoretical predictions (solid line) for the 8mm lesion. (a) DOG channels; (b) SQR channels. The tumor-to-background activity ratio was uniformly distributed between 1.75 and 3.2.
are similar to those shown in Figure 2. Again, a good match between the theoretical predictions and the Monte Carlo results is evident. Additional simulations show that the lesion detectability can also be reliably estimated by the theoretical expressions using noisy data.
4 Conclusion and Discussion
We have derived theoretical expressions for fast computation of lesion detectability in Bayesian reconstruction. Both the lesion and the background can contain variability, and no assumption of local stationarity of the object variation is used. The results are applicable to a wide range of linear numerical observers and can be used to find the optimum regularization for lesion detection. We have conducted Monte Carlo simulations, and the comparisons show good agreement between the theoretical predictions and the Monte Carlo results. In our experiments (results not shown) we have found that the lesion detectability depends on the channel parameters of the observer. In particular, the lesion detectability for large β (>100) depends heavily on the internal noise level in the low-frequency channels. Finding the channel parameters that best correlate with human performance is essential for optimizing image reconstruction for lesion detection. Research has already been conducted to estimate the observer template directly from human-observer studies (e.g., [34]). While the channel parameters used in this paper are somewhat arbitrary and are unlikely to be optimal, the theoretical results do not rely on the particular choice of channel parameters and can be applied to almost any channel parameters suitable for the detection task. Another issue is that the Bayesian reconstruction algorithm that we studied uses a Gibbs prior with a quadratic energy function. The statistical information about the lesion and the background is used only in the observer study. We
chose this simple form of Bayesian method because it is widely used in practice. In a true Bayesian paradigm, however, one may want to use all the prior information in the reconstruction process (including the distributions of the lesion and background). It would be interesting to see how such a true Bayesian approach would affect image quality. This is one of our future research directions.

Acknowledgments. The author would like to thank Drs. Richard Leahy, Ronald Huesman, and Grant Gullberg for various discussions on this topic, and Dr. Jeffrey Fessler and the anonymous reviewers for their valuable comments on the manuscript.
References

1. Fessler, J.A.: Penalized weighted least squares image reconstruction for PET. IEEE Trans. Med. Im. 13 (1994) 290–300
2. Mumcuoglu, E., Leahy, R., Cherry, S., Zhou, Z.: Fast gradient-based methods for Bayesian reconstruction of transmission and emission PET images. IEEE Trans. Med. Im. 13 (1994) 687–701
3. Fessler, J.A., Hero, A.O.: Penalized maximum-likelihood image reconstruction using space-alternating generalized EM algorithms. IEEE Trans. Im. Proc. 4 (1995) 1417–1429
4. Bouman, C., Sauer, K.: A unified approach to statistical tomography using coordinate descent optimization. IEEE Trans. Im. Proc. 5 (1996) 480–492
5. Barrett, H.H., Yao, J., Rolland, J., Myers, K.: Model observers for assessment of image quality. Proc. Natl. Acad. Sci. 90 (1993) 9758–9765
6. Barrett, H.H., Wilson, D.W., Tsui, B.M.W.: Noise properties of the EM algorithm: I. Theory. Phy. Med. Bio. 39 (1994) 833–846
7. Wang, W., Gindi, G.: Noise analysis of MAP-EM algorithms for emission tomography. Phy. Med. Bio. 42 (1997) 2215–2232
8. Soares, E.J., Byrne, C., Glick, S.: Noise characterization of block-iterative reconstruction algorithms: 1. Theory. IEEE Trans. Med. Im. 19 (2000) 261–270
9. Abbey, C.K., Barrett, H.H.: Observer signal-to-noise ratios for the ML-EM algorithm. In: Proceedings of SPIE. Volume 2712. (1996) 47–58
10. Hudson, H.M., Larkin, R.S.: Accelerated image reconstruction using ordered subsets of projection data. IEEE Trans. Med. Im. 13 (1994) 601–609
11. Fessler, J.: Mean and variance of implicitly defined biased estimators (such as penalized maximum likelihood): Applications to tomography. IEEE Trans. Im. Proc. 5 (1996) 493–506
12. Fessler, J.A., Rogers, W.L.: Spatial resolution properties of penalized-likelihood image reconstruction: Space-invariant tomographs. IEEE Trans. Im. Proc. 9 (1996) 1346–1358
13. Qi, J., Leahy, R.M.: A theoretical study of the contrast recovery and variance of MAP reconstructions from PET data. IEEE Trans. Med. Im. 18 (1999) 293–305
14. Qi, J., Leahy, R.M.: Resolution and noise properties of MAP reconstruction for fully 3D PET. IEEE Trans. Med. Im. 19 (2000) 493–506
15. Stayman, J.W., Fessler, J.A.: Regularization for uniform spatial resolution properties in penalized-likelihood image reconstruction. IEEE Trans. Med. Im. 19 (2000) 601–615
16. Bonetto, P., Qi, J., Leahy, R.M.: Covariance approximation for fast and accurate computation of channelized Hotelling observer statistics. IEEE Trans. Nucl. Sci. 47 (2000) 1567–1572
17. Qi, J., Huesman, R.H.: Theoretical study of lesion detectability of MAP reconstruction using computer observers. IEEE Trans. Med. Im. 20 (2001) 815–822
18. Fessler, J.A., Yendiki, A.: Channelized Hotelling observer performance for penalized-likelihood image reconstruction. In: Proc. IEEE NSS-MIC. (2002) to appear
19. Xing, Y., Gindi, G.: Rapid calculation of detectability in Bayesian SPECT. In: Proceedings of the IEEE International Symposium on Biomedical Imaging. (2002) CDROM
20. Yavuz, M., Fessler, J.A.: Statistical image reconstruction methods for randoms-precorrected PET scans. Medical Image Analysis 2 (1998) 369–378
21. Blake, A., Zisserman, A.: Visual Reconstruction. The MIT Press (1987)
22. Lee, S.J., Rangarajan, A., Gindi, G.: Bayesian image reconstruction in SPECT using higher order mechanical models as priors. IEEE Trans. Med. Im. 14 (1995) 669–680
23. Barrett, H.H., Gooley, T., Girodias, K., Rolland, J., White, T., Yao, J.: Linear discriminants and image quality. Image and Vision Computing 10 (1992) 451–460
24. Myers, K.J., Barrett, H.H., Borgstrom, M.C., Patton, D.D., Seeley, G.W.: Effect of noise correlation on detectability of disk signals in medical imaging. Journal of the Optical Society of America A 2 (1985) 1752–1759
25. Myers, K.J., Barrett, H.H.: Addition of a channel mechanism to the ideal-observer model. Journal of the Optical Society of America A 4 (1987) 2447–2457
26. de Vries, D., King, M., Soares, E., Tsui, B., Metz, C.: Effects of scatter subtraction on detection and quantitation in hepatic SPECT. J. Nucl. Med. 40 (1999) 1011–1023
27. Yao, J., Barrett, H.H.: Predicting human performance by a channelized Hotelling model. In: Proc. of SPIE. 1768 (1992) 161–168
28. Abbey, C.K., Barrett, H.H.: Observer signal-to-noise ratios for the ML-EM algorithm. In: Proc. of SPIE. 2712 (1996) 47–58
29. Narayan, T., Herman, G.: Prediction of human observer performance by numerical observers: an experimental study. Journal of the Optical Society of America A 16 (1999) 679–693
30. Gifford, H., King, M., de Vries, D., Soares, E.: Channelized Hotelling and human observer correlation for lesion detection in hepatic SPECT imaging. J. Nucl. Med. 41 (2000) 514–521
31. Burgess, A.E., Colborne, E.: Visual signal detection. IV. Observer inconsistency. Journal of the Optical Society of America A 5 (1988) 617–627
32. Qi, J., Leahy, R.M., Cherry, S.R., Chatziioannou, A., Farquhar, T.H.: High resolution 3D Bayesian image reconstruction using the microPET small animal scanner. Phy. Med. Bio. 43 (1998) 1001–1013
33. King, M., de Vries, D., Soares, E.: Comparison of the channelized Hotelling and human observers for lesion detection in hepatic SPECT imaging. In: Proc. of SPIE 3036 (1997) 14–20
34. Abbey, C.K., Eckstein, M.P.: Optimal shifted estimates of human-observer templates in two-alternative forced-choice experiments. IEEE Trans. Med. Im. 21 (2002) 429–440
A Unified Statistical and Information Theoretic Framework for Multi-modal Image Registration

Lilla Zöllei¹, John W. Fisher¹, and William M. Wells¹,²

¹ Massachusetts Institute of Technology, Artificial Intelligence Laboratory, Cambridge, MA 02139, USA
{lzollei, fisher, sw}@ai.mit.edu
² Department of Radiology, Harvard Medical School and Brigham and Women's Hospital, Boston, MA 02115, USA
[email protected]
Abstract. We formulate and interpret several registration methods in the context of a unified statistical and information theoretic framework. A unified interpretation clarifies the implicit assumptions of each method, yielding a better understanding of their relative strengths and weaknesses. Additionally, we discuss a generative statistical model from which we derive a novel analysis tool, the auto-information function, as a means of assessing and exploiting the common spatial dependencies inherent in multi-modal imagery. We analytically derive useful properties of the auto-information function and verify them empirically on multi-modal imagery. Among its useful aspects are that it can be computed from the imaging modalities independently and that it allows one to decompose the search space of registration problems.
1 Introduction
Registration of multiple data sets is the problem of identifying a geometric transformation (or a set of transformations) which maps the coordinate system of one data set to that of another (or others). There exist a variety of registration methods whose objective functions are based on sound statistical principles. These include maximum likelihood [4], maximum mutual information [6,9], minimum KL divergence [1], and minimum joint entropy [8] methods. However, the relationship of these approaches to each other from the standpoint of explicit/implicit assumptions, use of prior information, performance in a given context, and failure modes has not received a great deal of attention. Additionally, while the various objective criteria may be well understood, their relationship to an underlying generative statistical model is often left unspecified. Our motivation here is three-fold. First, we formulate and interpret several registration algorithms in the context of a unified statistical and information theoretic framework which illuminates the similarities and differences between the various methods. Second, a unified statistical interpretation clarifies the implicit assumptions of each method, yielding a better understanding of their relative strengths and weaknesses. Third, we discuss a generative statistical model from which we derive a novel analysis tool, the auto-information function, as a means of assessing and exploiting the common spatial dependencies inherent in multi-modal imagery. Currently, few if any of the commonly used registration algorithms exploit spatial dependencies, except perhaps in an indirect way. Consequently, we devote significant discussion to the auto-information function, providing both theoretical and empirical analysis.
2 Unified View of Maximum-Likelihood, Mutual Information, and Kullback-Leibler Divergence
For simplicity we consider the case of two registered data sets, u(x) and v(x), sampled on x ∈ ℝ^N. These data sets represent, for example, two imaging modalities of the same underlying anatomy. In practice, we observe u(x) and v_o(x), in which the latter is related to v(x) by

$$v_o(x) = v(T^*(x)) \qquad (1)$$
$$v(x) = v_o\!\left((T^*)^{-1}(x)\right), \qquad (2)$$

where T*: ℝ^N → ℝ^N is a bijective mapping. The goal of registration is to find T̂ ≈ T* (or equivalently its inverse) which maximizes some objective criterion of the observed data sets.¹ We now discuss four objective criteria within a common statistical framework. Spatial samples x_i are modeled as random draws of an independent and identically distributed (i.i.d.) random variable X. Consequently, observed pixel/voxel intensities v_o(x_i) and u(x_i) are modeled as i.i.d. random variables as well.

2.1 Maximum Likelihood
We begin with the classical maximum likelihood (ML) method of parameter estimation. In order to apply the method to image registration we must presume that we can model the joint densities of pixel intensities as a function of the transformation parameters. That is,

$$u(x_i), v_o(x_i) \sim p(U, V; T^*), \qquad (3)$$

and the ML estimate of the transformation on v(x) is

$$T_{\rm ML} = \arg\max_{T} \sum_{i=1}^{N} \log p(u(x_i), v(T^*(x_i)); T), \qquad (4)$$

¹ Technically speaking, u(x) may have undergone some transformation as well, but without loss of generality we assume it has not. If there were some canonical coordinate frame (e.g. an anatomical atlas) by which to register the data sets, one might consider transformations on u(x) as well.
where N is the number of samples. It is important to note, in contrast to the subsequent methods, that the joint observations remain static while the joint density under which we evaluate the observations is varied as a function of T. There is a fundamental link between ML estimation and information theoretic quantities. Specifically, under the i.i.d. assumption, for fixed T and T*,

$$\lim_{N\to\infty} \frac{1}{N} \sum_{i=1}^{N} \log p(u(x_i), v_o(x_i); T) = -\Big( H\!\left(p(u, v; T^*)\right) + D\!\left(p(u, v; T^*)\,\|\,p(u, v; T)\right) \Big), \qquad (5)$$
where H(p) is the entropy of the distribution p and D(p‖q) is the Kullback-Leibler (KL) divergence [3] between the distributions p and q. The KL divergence is a nonnegative quantity defined as D(p‖q) = E_p{log(p/q)} = ∫ p(x) log(p(x)/q(x)) dx. Eq. (5) follows from Eq. (3) (the observations are i.i.d. draws); the (normalized) summation of Eq. (4) is then equivalent to an expectation by the weak law of large numbers. Consequently, the ML estimate (when it is unique) is the one which minimizes the KL divergence between the true and hypothesized distributions. As a practical matter, one generally cannot model the joint density of observations as a function of all relative transformations T. Furthermore, even if such a model were available, as the relative transformation becomes "large" it is reasonable to assume that joint observations become independent (i.e., p(u, v) = p(u)p(v), which is an essential assumption exploited by mutual information approaches). The utility of classical ML decreases greatly in such situations as a large set of transformations become equally likely.

2.2 Approximate Maximum Likelihood
While obtaining a joint density model over all relative transformations is perhaps impractical, suppose we have a model of the joint density of our data sets when they are registered, which we will denote p°(u, v) = p(u, v; T_I), where T_I indicates the identity transformation. Such a density is utilized in the approximate maximum likelihood method (MLa) [4], which estimates T as

$$T_{\rm MLa} = \arg\max_{T} \sum_{i=1}^{N} \log p^{\circ}(u(x_i), v_o(T(x_i))) = \arg\max_{T} \sum_{i=1}^{N} \log p^{\circ}(u(x_i), v(T^* \circ T(x_i))).$$
For practical reasons (e.g. one might be able to obtain reasonable density models of joint pixel intensities from previously registered data), and in contrast to the classical ML method, the joint observations are varied as a function of T while the density under which they are evaluated is held static. Similar to the previous statements, one can show that

$$\lim_{N\to\infty} \frac{1}{N} \sum_{i=1}^{N} \log p^{\circ}(u(x_i), v(T^* \circ T(x_i))) = -\Big( H\!\left(p(u, v; T^* \circ T)\right) + D\!\left(p(u, v; T^* \circ T)\,\|\,p(u, v; T_I)\right) \Big). \qquad (6)$$
As compared to Eq. (5), we see that both terms vary as a function of T. In general, one cannot guarantee that the combination of terms will be minimized when T* ∘ T = T_I. This is related to the information theoretic notion of typicality [2]. Informally, typicality states that, with probability approaching unity, N independent draws from a density p with a corresponding entropy H(p) have a likelihood very close to −N H(p). Furthermore, N independent draws from a density q with corresponding entropy H(q), evaluated under p, have a likelihood very close to −N(H(q) + D(q‖p)), of which Eq. (6) is an application. Perhaps counter-intuitively, one can construct a density q such that typical draws from q are more likely under p than typical draws from p. The implicit assumption of the approximate maximum likelihood method is therefore that Eq. (6) is nondecreasing as T* ∘ T approaches T_I. In [1] it was shown empirically that this assumption does not always hold, which, in part, motivates the registration method suggested in that work.

2.3 Kullback-Leibler Divergence
While one cannot guarantee that Eq. (6) is nondecreasing as T* ∘ T approaches T_I, the second term of Eq. (6) is nondecreasing as T* ∘ T approaches T_I. Consequently, Chung et al. [1] suggest that one estimate T as

$$T_{\rm KL} = \arg\min_{T} \sum_{u,v} \hat{p}(u, v; T^* \circ T) \log \frac{\hat{p}(u, v; T^* \circ T)}{p^{\circ}(u, v)} \qquad (7)$$

$$\approx \arg\min_{T} D\!\left( \hat{p}(u, v; T^* \circ T)\,\|\,p(u, v; T_I) \right), \qquad (8)$$
where p(u, v; T_I) is estimated as in [4] from registered data sets and p̂(u, v; T* ∘ T) is estimated from transformed sets of observed joint pixel intensities {u(x_i), v_o(T(x_i))}. In relation to the previous methods, both the samples and the evaluation densities are varied as a function of the transformation T. In [1] it was demonstrated empirically that this objective criterion, as expected, did not exhibit some of the incorrect/undesirable local extrema encountered in the MLa method.

2.4 Maximum Mutual Information and Joint Entropy
As has been amply documented in the literature [6,7,9], mutual information (MI) is a popular information theoretic objective criterion which estimates the transformation parameter T as

$$T_{\rm MI} = \arg\max_{T} I(u; v_o(T)) = \arg\max_{T} I(u; v(T^* \circ T)), \qquad (9)$$
where MI is a function of marginal and joint entropy terms:

$$I(u; v_o(T)) = H(p(u)) + H(p(v_o(T))) - H(p(u, v_o(T))). \qquad (10)$$
Again by typicality (or by the weak law of large numbers), this expression can be approximated as

$$I(u; v_o(T)) \approx -\frac{1}{N} \sum_{i=1}^{N} \log \hat{p}(u(x_i)) - \frac{1}{N} \sum_{i=1}^{N} \log \hat{p}(v_o(T(x_i))) + \frac{1}{N} \sum_{i=1}^{N} \log \hat{p}(u(x_i), v_o(T(x_i))), \qquad (11)$$
where x_i are i.i.d. draws of the spatial variable X and p̂(·) are density estimates. There exist variants in the literature which approximate mutual information by other means, but for our purposes we will consider them all to be equivalent. If T is restricted to the class of symplectic transformations (i.e. volume-preserving), then H(u) and H(v_o(T)) are invariant to T. In that case, maximization of MI is equivalent to minimization of the joint entropy term H(u, v_o(T)), the presumption being that the joint entropy is minimized when T_MI = (T*)^{-1}. As in the KL divergence approach, both the samples and the evaluation densities are simultaneously varied as a function of the transformation T. MI can also be expressed as a KL divergence measure [3]:

$$I(u, v_o(T)) = D\!\left( p(u, v; T^* \circ T)\,\|\,p(u)\,p(v; T^* \circ T) \right), \qquad (12)$$
that is, mutual information is the KL divergence between the observed joint density and the product of its marginals. The implicit assumption of MI methods is that as T* ∘ T diverges from T_I, joint intensities look increasingly independent. Considering the collection of approaches discussed, we see that the MLa and KL divergence methods exploit prior information in the form of joint density estimates over previously registered data. Subsequently, both make similar implicit assumptions regarding the behavior of joint intensity statistics as T* ∘ T approaches T_I. In contrast, the MI approach makes no use of prior joint statistics, estimating these instead during the search process. On the other hand, MI approaches implicitly assume that as T* ∘ T approaches T_I, the joint intensity statistics become increasingly dependent, again as measured by a KL divergence term. In light of this, we now define the auto-information function as an empirical analysis tool for exploring aspects of these assumptions.
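As a point of reference for the discussion above, a standard plug-in estimate of (10)-(12) from a joint intensity histogram can be written as follows (a generic sketch, not the implementation used by any of the cited methods):

```python
import numpy as np

def mutual_information(u, v, bins=32):
    """Histogram (plug-in) estimate of I(u; v) for two intensity arrays
    sampled at corresponding spatial locations; equals the KL divergence
    between the joint histogram and the product of its marginals, cf. (12)."""
    p_uv, _, _ = np.histogram2d(u.ravel(), v.ravel(), bins=bins)
    p_uv /= p_uv.sum()
    p_u = p_uv.sum(axis=1, keepdims=True)
    p_v = p_uv.sum(axis=0, keepdims=True)
    nz = p_uv > 0
    return float(np.sum(p_uv[nz] * np.log(p_uv[nz] / (p_u @ p_v)[nz])))
```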
3 Auto-, Cross-Information Functions
We now define the auto- and cross-information functions. These functions measure statistical dependence, indexed over transformation parameters, much as the well-known auto-correlation function measures the degree of second-order correlation as a function of displacement. Given two different image modalities, u and v, we simply define the auto- and cross-information functions as

$$R^I_u(T) = I(u(x); u(T(x))) \quad {\rm and} \quad R^I_{u,v}(T) = I(u(x); v(T(x))),$$
where I(u; v) is the mutual information measure already defined in Eq. (10). Analysis of such functions, in particular the auto-information function, which can be computed prior to registration, may provide guidance for commonly used coarse-to-fine search strategies. Additionally, further spatial properties might be inferred from the auto-information function, leading to better and faster-converging registration algorithms. This new approach can be described in the context of the following latent variable model:

$$p(u, v, l) = p_l(l_1, \cdots, l_N) \prod_i p_{u|l}(u_i\,|\,l_i)\, p_{v|l}(v_i\,|\,l_i), \qquad (13)$$
where the sets {u_1, ..., u_N} and {v_1, ..., v_N} represent observations of two different image modalities and {l_1, ..., l_N} a set of latent variables which describe tissue properties (e.g. label types). The joint properties of {l_1, ..., l_N} may be only partially specified. Each of the algorithms cited in the previous sections corresponds to a hypothesis over this statistical model, differing only in which aspects of the graph are specified or assumed a priori. The model simply asserts the independence of the observations conditioned on the latent variables. An example is shown in the graphical model² of Fig. 1. The proposed formulation has two notable consequences. First, spatial dependencies in the observations arise directly from known or assumed spatial dependencies in the latent variables. Second, bounds on the spatial dependencies (modulo the unknown transformation) can be estimated from the individual imaging modalities. In particular, it is easily derived that

$$I(u_j; u_k),\; I(v_j; v_k) \le I(l_j; l_k) \quad {\rm and} \quad I(u_j; v_j) \ge I(u_j; v_k) \qquad \forall\, j, k = 1, ..., N. \qquad (14)$$

Consequently, the auto-information functions of the induced images lower-bound that of the underlying latent anatomy, and we can guarantee local extrema for the MI objective function given that the auto-information value for a pair of corresponding image elements is always greater than or equal to that of non-corresponding ones. (For proofs, see the Appendix.) More importantly, Eq. (14) shows that under the latent variable model, MI as an objective criterion is guaranteed to have a local maximum about the point of correct registration. To our knowledge, while this property has been empirically observed, no set of conditions had previously been established under which it could be rigorously proven.
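The first inequality in (14) is an instance of the data processing inequality, which a toy discrete example can verify directly (our illustrative numbers, not from the paper):

```python
import numpy as np

def mi_discrete(p):
    """Mutual information of a discrete joint distribution p(a, b)."""
    pa = p.sum(axis=1, keepdims=True)
    pb = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / (pa * pb)[nz])))

p_l = np.array([[0.4, 0.1],      # joint distribution of latent labels l_j, l_k
                [0.1, 0.4]])
chan = np.array([[0.9, 0.1],     # observation channel p(u | l), rows over l
                 [0.2, 0.8]])
p_u = chan.T @ p_l @ chan        # induced joint distribution of u_j, u_k
assert mi_discrete(p_u) <= mi_discrete(p_l)   # I(u_j; u_k) <= I(l_j; l_k)
```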
3.1 Auto-information Identity
We can derive the following identity between the auto-information functions of two datasets (v, v_o) that are related via a transformation T* as v_o(x) = v(T*(x)):

$$R^I_{v_0}(T) = I(v_0(x); v_0(T(x))) = I(v(T^*(x)); v(T^* \circ T(x))) = I(v(y); v(T^* \circ T \circ T^{-*}(y))) = R^I_v(T^* \circ T \circ T^{-*}) = R^I_v(\tilde{T}), \qquad (15)$$

² A similar representation incorporating voxel positions has recently been introduced for elastic image registration via conditional probability computations [5].
Fig. 1. Example of a latent anatomy model
where T̃ is the similarity transform of T by T* (i.e., T̃ = T* ∘ T ∘ T^{−*}). In other words, the auto-information function of a transformed image (v_o) can be computed from the auto-information function of the initial input image. This property is potentially very useful when examining how the auto-information function changes with respect to an initial transformation.
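For example, the auto-information function can be tabulated over a grid of integer displacements with the histogram MI estimator sketched in Sect. 2.4 (again an illustrative sketch; the interpolation order and grid choices are ours):

```python
import numpy as np
from scipy.ndimage import shift as nd_shift

def auto_information_map(u, max_disp=10, bins=32):
    """Sample R_u^I(T) = I(u(x); u(T(x))) over integer displacements
    (-max_disp..max_disp on each axis); rotations can be added analogously
    with scipy.ndimage.rotate.  Uses mutual_information() defined earlier."""
    out = np.zeros((2 * max_disp + 1, 2 * max_disp + 1))
    for i, dy in enumerate(range(-max_disp, max_disp + 1)):
        for j, dx in enumerate(range(-max_disp, max_disp + 1)):
            u_t = nd_shift(u, (dy, dx), order=1)   # bilinear resampling
            out[i, j] = mutual_information(u, u_t, bins=bins)
    return out
```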
3.2 Experiments
In this section, we describe several experiments constructed to demonstrate certain key properties of the auto-information function and to give some insight into the applications for which it might be useful. We carried out experiments using both simulated and medical image datasets. To date, the experiments have been carried out in 2D, and the transformations were restricted to rigid-body movements (displacement and rotation); rotation is carried out around the center point of the input image. Note also that, prior to running our experiments, we introduced a preprocessing step: we enlarged the background region of the images, padding with zeros, in order to ensure that no cropping takes place as a result of transformations. (This property is required to fully satisfy our assumptions underlying, for example, the identity relationship.) We used two pairs of medical images for our experiments. One pair consisted of a Proton Density and a T2-weighted acquisition, and the other of a corresponding MRI and CT image of the head. (See these images in Fig. 2.)

Identity. In order to experimentally verify the relationship established in Eq. (15), we compared the auto-information maps of initially transformed datasets to the same maps estimated via the identity. Up to numerical precision, the identity holds: the summed squared difference values are zero.

Smoothing. With another set of experiments we aimed to demonstrate how the smoothing operation affects the auto-information function maps. We computed the 3D auto-information map for both the image and a smoothed version of it (created by a Gaussian filter with window size 5). As expected, the auto-information map became significantly flatter and less peaky after the smoothing
Fig. 2. Medical input images used for our experiments. Left-to-right: Corresponding Proton Density and T2-weighted images; Corresponding CT and MRI acquisitions.
Smoothing. With another set of experiments we aimed to demonstrate how the smoothing operation affects the auto-information function maps. We computed the 3D auto-information map for both the image and a smoothed version of it (created by a Gaussian filter with window size 5). As expected, the auto-information map became significantly flatter and less peaked after the smoothing operator was applied to the data. While the initial map has a sharp peak at the zero-offset pose and quickly decreasing lobes, in the case of the smoothed image that transition is much more gradual. An example showing the auto-information map slices for the original and the smoothed PD images is given in Fig. 3.

Changes due to an Initial Pose Difference. Examining the auto-information map of the input images does not reveal much in the way of underlying structure embedded in the images (see Fig. 3 (a), (b)). Therefore, we also examined the changes in the auto-information function maps due to an initial transformation applied to the input image. We created a map of the input image and a map of its transformed version. (The transformation that we applied is referred to below as T3* and is comprised of both a displacement and a rotational component.) Comparing Fig. 3 (c), (d), (e) and (f), we note that there is a distinctive pattern of difference in the maps due to the initial transformation applied to the input (the effect of the rotation, for example, is clearly visible on the slices). However, these are difficult to interpret at first sight. Therefore, we displayed the difference images between the maps of the input with no initial transformation and those of the transformed image. The results (Fig. 3 (g) and (h)), computed on both the CT and MRI images, convey more information about the effects resulting from the transformation. We observe that the two difference maps are almost identical, which allows us to conclude that a fixed transformation applied to multi-modal images of the same underlying object results in the same type of changes in the auto-information surfaces. This empirical observation is encouraging in that it gives an indication of the utility of the auto-information function in the context of registration.

Decoupling the transformation components. In this section, we demonstrate a way to decouple the transformation components when searching for alignment (or the initial pose) in a registration scenario. It turns out that one can use the auto-information function to decouple the components of transformation T and search for them separately. (Compare the auto-information map slices in Fig. 3 (c) and (e), for example.) The decoupling observation is explained as follows. If T* is a composition of a displacement and a rotational component, then it can be written as a rotation operation followed by a displacement: T*(r*, d*) = D(d*) ∘ R(r*). Then consider the identity in Eq. (15); if we rewrite
Fig. 3. Auto-Information map slices of the (a) PD, (b) smoothed PD, (c) CT, (d) MR, (e) the transformed CT and (f) the transformed MR images. Squared difference maps between the auto-information map of the (g) CT and the T3∗ -transformed CT images and of the (h) MRI and the T3∗ -transformed MRI images. Note the similarities between the image slices of (g) and (h). The slices, each a map of translation, in all cases correspond to various rotational offsets in the auto-information map volume. (Top-to-bottom, left-to-right: the rotational offset is 0,2,...,30 degrees)
the transformation composition T_comp = T* ∘ T ∘ T^{-*} with the above expression for T*, we get: T_comp = D(d*) ∘ R(r*) ∘ T ∘ R(r^{-*}) ∘ D(d^{-*}). Also, after replacing T with T(r, d) = D(d) ∘ R(r): T_comp = D(d*) ∘ R(r*) ∘ D(d) ∘ R(r) ∘ R(r^{-*}) ∘ D(d^{-*}). Now, if we only examine the auto-information map in the displacement dimensions of T, i.e. T(r, d) = D(d), we would compute the transformation

\[
T_{comp} = D(d^*) \circ R(r^*) \circ D(d) \circ R(r^{-*}) \circ D(d^{-*}). \tag{16}
\]
As the composition of a rotation, a displacement and the inverse of the rotation is just another displacement, D(d′), and displacement operations commute, the D(d*) terms cancel out: T_comp = D(d*) ∘ D(d′) ∘ D(d^{-*}) = D(d′) = R(r*) ∘ D(d) ∘ R(r^{-*}). Thus the zero-rotation subspace of the auto-information function is invariant to displacement. Accordingly, we can search for the unknown rotational component by comparing subspace maps, without considering any potential displacement component of the aligning transformation. Such a reduction in search space reduces the computational cost of optimization. In a set of preliminary experiments, we looked at the zero-rotation subspace of the auto-information map and searched for the rotational component of T* in both a uni-modal and a multi-modal scenario. In Fig. 4, we show the results for these cases. In the former, we aligned a PD image to a transformed version of itself, while in the latter we aligned the MRI slice to the CT image. We optimized the sum of squared differences and the cross-correlation coefficient, respectively, of the auto-information subspace maps in order to estimate the best transformation component. We chose these simple similarity measures because the surfaces being compared were composed of the same type of measure, auto-information values (as opposed to intensities of different modalities, for example). Both of the registration results closely matched the ground truth rotation angle.
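As an illustration of the decoupling argument (again our own sketch, reusing the hypothetical auto_information_map helper from above), the rotational component can be recovered by comparing the translation-only (zero-rotation) subspace maps, here with the SSD criterion used in the uni-modal experiment; cross-correlation would replace it in the multi-modal case:

```python
def decoupled_rotation_search(src, tgt, candidate_angles, offsets_y, offsets_x):
    # The zero-rotation subspace of the auto-information map is invariant to
    # the displacement component of the aligning transformation, so rotation
    # can be searched for on its own.
    tgt_map = auto_information_map(tgt, [0.0], offsets_y, offsets_x)[0]
    ssd = []
    for r in candidate_angles:
        rotated = rotate(src, r, reshape=False, order=1)
        src_map = auto_information_map(rotated, [0.0], offsets_y, offsets_x)[0]
        ssd.append(np.sum((tgt_map - src_map) ** 2))
    return candidate_angles[int(np.argmin(ssd))]
```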
4 Conclusion
We provided a unified statistical and information theoretic framework for comparing several well-known multi-modal image registration methods, the consequence of which was to illustrate the underlying assumptions that distinguish them. Specifically, this served to clarify the assumed behavior of joint intensity statistics as a function of transformation parameters. This motivated the introduction of a latent variable generative model, from which we were able to derive several interesting properties of the statistical dependencies across modalities. Significantly, we provided the first rigorous proof, to our knowledge, of the existence of a local maximum of the mutual information criterion about the point of correct registration in the context of the latent variable model.
Fig. 4. Decoupled rotation angle search: (a) Unimodal search using the PD image – minimizing sum of squared errors (b) Multi-modal search using the MRI and CT images – maximizing cross-correlation coefficient. The ground truth solution in both images is indicated with the vertical line.
We also introduced the auto- and cross-information functions, which characterize the joint intensity statistics as a function of the relative transformations between images within and across modalities. Several properties of the auto-information function, which can be computed from each modality independently, were derived analytically and verified empirically. One aspect of the auto-information function is that it facilitates decoupling of the rotation and displacement parameters in the search space. Furthermore, our empirical results on anatomical data showed that the auto-information functions across modalities exhibit striking similarities, which we conjecture can be exploited in multi-modal registration methods currently in development. Further theoretical and empirical analysis of the properties of the auto- and cross-information functions is the subject of future research.

Acknowledgement. This work has been supported by NIH grant R21CA89449, by an NSF ERC grant (JHU Agreement #8810-274), by the Whiteman Fellowship and by The Harvard Center for Neurodegeneration and Repair.
References

1. A.C.S. Chung, W.M. Wells III, A. Norbash, and W.E.L. Grimson. Multi-modal image registration by minimizing Kullback-Leibler distance. In International Conference on Medical Image Computing and Computer-Assisted Intervention, volume 2 of Lecture Notes in Computer Science, pages 525-532. Springer, 2002.
2. T.M. Cover and J.A. Thomas. Elements of Information Theory. John Wiley & Sons, Inc., New York, 1991.
3. S. Kullback. Information Theory and Statistics. John Wiley and Sons, New York, 1959.
4. M. Leventon and W.E.L. Grimson. Multi-modal volume registration using joint intensity distributions. In First International Conference on Medical Image Computing and Computer-Assisted Intervention, 1998.
5. A.M.C. Machado, M.F.M. Campos, and J.C. Gee. Bayesian model for intensity mapping in magnetic resonance image registration. Journal of Electronic Imaging, 12(1):31-39, Jan 2003.
6. F. Maes, A. Collignon, D. Vandermeulen, G. Marchal, and P. Suetens. Multimodality image registration by maximization of mutual information. IEEE Transactions on Medical Imaging, 16(2):187-198, 1997.
7. J.P.W. Pluim, J.B.A. Maintz, and M.A. Viergever. Image registration by maximization of combined mutual information and gradient information. In Proceedings of MICCAI 2000, pages 567-578.
8. C. Studholme, D.L.G. Hill, and D.J. Hawkes. An overlap invariant entropy measure of 3D medical image alignment. Pattern Recognition, 32(1):71-86, 1999.
9. W.M. Wells III, P. Viola, and R. Kikinis. Multi-modal volume registration by maximization of mutual information. In Proceedings of the 2nd International Symposium on Medical Robotics and Computer Assisted Surgery, pages 55-62, 1995.
Appendix

Both of the relationships in Eq. (14) result from extending the Data Processing Inequality theorem [2]. Accordingly, if X, Y and Z are random variables forming a Markov chain (X → Y → Z), then I(X; Y) ≥ I(X; Z), i.e. no processing of Y can increase the information that Y contains about X.

Proof I. The relationship between the random variables in the first inequality of Eq. (14), v_j ← l_j - l_k → v_k, can be rewritten in two different forms using Bayes rule: v_j ← l_j ← l_k ← v_k and v_j → l_j → l_k → v_k. Given these and applying the Data Processing Inequality theorem, we arrive at the following:

\[
I(v_k; l_k) \geq I(v_k; l_j) \geq I(v_k; v_j) \quad\text{and}\quad I(l_k; l_j) \geq I(l_k; v_j) \tag{17}
\]
\[
I(v_j; l_j) \geq I(v_j; l_k) \geq I(v_j; v_k) \quad\text{and}\quad I(l_j; l_k) \geq I(l_j; v_k) \tag{18}
\]
Given I(X; Y) = I(Y; X), we can establish I(l_j; l_k) ≥ I(v_j; v_k) ∀ j, k.

Proof II. In a similar manner as above, we can obtain the following inequalities for u_j, v_j, l_j, l_k, v_k:

\[
I(v_j; l_j) \geq I(u_j; v_j) \quad\text{and}\quad I(v_k; l_k) \geq I(v_k; l_j) \geq I(v_k; u_j). \tag{19}
\]

Again, using Bayes rule, we can establish the following relationships: v_j ← l_j ← l_k ← u_k and v_j ← l_j ← u_j. As we assume that I(v_k; l_k) = I(v_j; l_j), we need to consider two scenarios: (a) if l_k → l_j indicates a lossless relationship, then I(u_j; v_k) = I(u_j; v_j); (b) if l_k → l_j indicates a lossy connection, then I(u_j; v_k) < I(u_j; v_j). Therefore, we can conclude that I(u_j; v_j) ≥ I(u_j; v_k).
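The inequalities of Eq. (14) are also easy to check numerically. The following self-contained sketch (our own construction; the channel probabilities are arbitrary) simulates the latent-variable graph with discrete labels, taking l_j → l_k as a noisy Markov step and u_j, v_j, v_k as independent noisy observations of the labels, and verifies both parts of Eq. (14) with plug-in MI estimates:

```python
import numpy as np

rng = np.random.default_rng(0)

def mi_discrete(a, b, n=8):
    # Plug-in MI estimate for two discrete sequences with values in {0..n-1}.
    p = np.zeros((n, n))
    np.add.at(p, (a, b), 1.0)
    p /= p.sum()
    pa, pb = p.sum(axis=1), p.sum(axis=0)
    nz = p > 0
    return np.sum(p[nz] * np.log(p[nz] / np.outer(pa, pb)[nz]))

N, n = 200_000, 8
lj = rng.integers(0, n, N)                                      # latent label at site j
lk = np.where(rng.random(N) < 0.8, lj, rng.integers(0, n, N))   # noisy Markov step to site k
noisy = lambda l: np.where(rng.random(N) < 0.7, l, rng.integers(0, n, N))
uj, vj, vk = noisy(lj), noisy(lj), noisy(lk)                    # observation channels

assert mi_discrete(vj, vk) <= mi_discrete(lj, lk)   # I(v_j;v_k) <= I(l_j;l_k)
assert mi_discrete(uj, vj) >= mi_discrete(uj, vk)   # I(u_j;v_j) >= I(u_j;v_k)
```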
Information Theoretic Similarity Measures in Non-rigid Registration

William R. Crum, Derek L.G. Hill, and David J. Hawkes

Division of Imaging Sciences, The Guy's, King's and St Thomas' School of Medicine, Guy's Hospital, London SE1 9RT, UK
[email protected]
Abstract. Mutual Information (MI) and Normalised Mutual Information (NMI) have enjoyed success as image similarity measures in medical image registration. More recently, they have been used for non-rigid registration, most often evaluated empirically as functions of a changing registration parameter. In this paper we present expressions, derived from intensity histogram representations of these measures, for their change in response to a local perturbation of a deformation field linking two images. These expressions give some insight into the operation of NMI in registration and are implemented as driving forces within a fluid registration framework. The performance of the measures is tested on publicly available simulated multi-spectral MR brain images.
1 Introduction
Mutual Information (MI) was proposed as an image similarity measure useful for image registration independently by Collignon et al. [1] and by Wells [2] and Viola [3]. It is now widely used in rigid and affine registration. Some modifications to MI have been suggested which make it more robust to large mis-alignment of images; the Normalised Mutual Information (NMI) [4] and the Entropy Correlation Coefficient [5] are the best known of these. The evolution of all of these measures has been rather ad hoc; they are used because they appear to work. Non-rigid registration is now being applied to more challenging tasks for which MI and NMI are currently the best available similarity measures. Although, under certain circumstances, a function of MI can be derived as the optimal similarity measure linking two images [6], the use of NMI in particular has little theoretical support. In registration, the response of a similarity measure to a change in registration parameters must be evaluated. This can be done in several ways, but an attractive approach is to compute derivatives of the similarity measure with respect to the transformation parameters. These derivatives can be used in an optimisation scheme to find the registration parameters which maximise the image-similarity measure. This maximum is assumed to correspond with the optimal image alignment. For similarity measures such as mean-square intensity difference or intensity cross-correlation, analytic expressions can be derived for these derivatives. However, for MI and NMI an empirical calculation is often performed to obtain numerical values for the derivatives,
perhaps because the most common way to compute these measures is from discrete intensity histograms. In this paper we derive analytic expressions for derivatives of Joint Entropy (JE), MI and NMI by considering a point perturbation of a displacement vector field. The expressions give a gradient vector at each voxel which indicates the direction in which perturbing the associated displacement vector results in the largest increase in the image similarity measure. We have used these gradient vectors as driving forces in a fluid registration framework [7] and present some initial experiments inspired by [8] to establish the behaviour of the technique. These experiments involve a comparison between fluid registration driven by JE, MI and NMI and fluid registration driven by intensity cross-correlation, using simulated image data where the ground truth transformation is known.
2 Methods
First, the analysis which leads to the derivatives of JE, MI and NMI is presented, together with a brief discussion of the operation of these measures as driving forces. In the second part of the methods, the experiments using these derivatives to drive non-rigid registration are described.

2.1 Theory: The Force at a Voxel in Non-rigid Registration
We consider a non-rigid registration scenario where a source image, (B), is being warped into the space of a target image, (A). The registration is an iterative process and a pseudo-time variable, t, defines the current state of the registration. The source image, (B), is assumed to be a function of t as it is updated to reflect the most up-to-date estimate of the transformation linking the two images. We assume the target image, (A), is static and defines a fixed region over which the similarity measures are computed. Writing H_AB(t) for the joint entropy and H_A and H_B(t) for the marginal entropies of the two images, the standard definitions of the Mutual Information and Normalised Mutual Information are:
\[
M_{AB}(t) = H_A + H_B(t) - H_{AB}(t) \tag{1}
\]
\[
N_{AB}(t) = \big(H_A + H_B(t)\big) \,/\, H_{AB}(t) \tag{2}
\]
The marginal entropies of the target and source image and the joint entropy are defined in the usual way, as functions of the marginal (Q_j and P_i) and joint (p_ij) intensity probabilities.
\[
H_A = -\sum_j Q_j \log[Q_j] \quad\text{and}\quad H_B = -\sum_i P_i \log[P_i] \tag{3}
\]
The force at a voxel can be obtained by writing down the effect of an infinitesimal change at that point on the global JE, MI and NMI. The voxel intensity of a point in the source image corresponding to the point in the target image can be obtained using the current estimate of the transformation field. The coordinate system is set up so that a voxel at coordinates X in the target image corresponds to a voxel at X − u(X) in the source image, where u(·) is a vector displacement field defined at each voxel. A change in the x-component of a particular displacement vector will generally have an effect on entries in the marginal and joint probability histograms and affect H_B(t) and H_AB(t). Fig. 1 shows the relationship between the spatial position of corresponding voxels, the immediate voxel neighbours in the source image, and the intensity bins they contribute to in the joint intensity histogram.
Fig. 1. The relationship between the spatial location of corresponding voxels in the target, (A), and source, (B), images and entries in the joint intensity histogram. Voxels are labelled with bins (m in the target image; r, s and t in the source image) corresponding to their intensities. p_mr, p_ms and p_mt are joint intensity probabilities for the specified pairs of bins, and P_r, P_s, P_t and Q_m are marginal intensity probabilities for the source (P) and target (Q) respectively.
2.1.1 The Joint-Entropy Force

From the definition of the JE and with reference to Fig. 1, an expression for the derivative of JE with respect to a small positive perturbation of the x-component of the displacement vector can be written:
\[
\frac{\partial H_{AB}(t)}{\partial u_x}\bigg|_{u_x^+} = -(1 + \log p_{ms})\,\frac{\partial p_{ms}}{\partial u_x} - (1 + \log p_{mr})\,\frac{\partial p_{mr}}{\partial u_x} \tag{4}
\]
The differential terms on the right-hand side of equation 4 can be written explicitly in terms of the total number of voxels in each image, T. This result can be obtained by examination of Fig. 2, where each voxel is considered to have unit dimensions and initially to contribute one unit to an intensity histogram bin. The effect of a small positive increase in the x-coordinate of the displacement vector can be modelled as a fractional decrease, −δx, in the bin corresponding to intensity s and a fractional increase, δx, in the bin corresponding to intensity r. This is equivalent to a Partial Volume interpolation scheme [5]. (The reason this approach is tractable is that an equivalence has been assumed between the infinitesimal spatial displacement and the infinitesimal change in bin contents; this assumption may not always be appropriate.) Therefore the changes in the contents of these bins are given by δn_s = −δx and δn_r = +δx respectively, and converting to probabilities and taking the limit as δx → 0 gives:
\[
\frac{\partial P_s}{\partial u_x} = -\frac{1}{T}, \quad \frac{\partial P_r}{\partial u_x} = +\frac{1}{T}, \quad \frac{\partial p_{ms}}{\partial u_x} = -\frac{1}{T}, \quad \frac{\partial p_{mr}}{\partial u_x} = +\frac{1}{T} \tag{5}
\]
In equation 5, T is the total number of voxels in the image. Inserting these results into equation 4 gives equation 6:
\[
\frac{\partial H_{AB}(t)}{\partial u_x}\bigg|_{u_x^+} = -\frac{1}{T}\,\log\!\left[\frac{p_{mr}}{p_{ms}}\right] \tag{6}
\]
By performing a similar construction for a small negative perturbation of the displacement vector and adding, an expression for the net JE force at the voxel is obtained:
\[
E_x = \frac{\partial H_{AB}(t)}{\partial u_x} = -\frac{1}{T}\,\log\!\left[\frac{p_{mr}}{p_{mt}}\right] \tag{7}
\]
Note that E_x is negative if p_mr > p_mt, and therefore an increase in u_x will increase the JE by tending to decrease p_mr and increase p_mt. This corresponds to an increase in the dispersion in the joint intensity histogram. If p_mr < p_mt, the sign of the force changes and is still in the direction that increases the JE. For image registration we usually want to minimise the JE, so we choose the negative of E_x as the driving force. Note that the force does not depend on the joint intensity probability at the voxel of interest; this is akin to a centred finite-difference representation of an intensity gradient. Components of this force in the y and z directions follow similarly, with probabilities associated with neighbouring voxels in the appropriate directions replacing those in equation 7.
Fig. 2. The effect of a small positive change in the displacement vector u on the contribution a voxel initially of intensity s makes to the intensity histogram.
2.1.2 The Mutual Information Force

A similar analysis can be performed starting with the definition of the MI in equation 1 to yield the expression for the MI force:

\[
F_x = \frac{\partial M_{AB}(t)}{\partial u_x} = \frac{1}{T}\,\log\!\left[\frac{p_{mr}/P_r}{p_{mt}/P_t}\right] \tag{8}
\]
This is equivalent to a special case of the expression for the gradient of the MI in [9]. This force is towards the voxel with the larger ratio of joint and marginal probabilities and acts to increase the larger ratio and reduce the smaller, i.e. if the logarithmic term is positive then the force is positive and towards voxel r. When the marginal probabilities are similar, the force acts to minimise the JE as before.

2.1.3 The Normalised Mutual Information Force

Finally, the NMI force can be obtained from the definition in equation 2 to yield equation 9 (where quantities with superscripts are raised to the power of the superscript).
\[
G_x = \frac{\partial N_{AB}(t)}{\partial u_x} = \frac{1}{T \cdot H_{AB}^2(t)}\,\log\!\left[\frac{p_{mr}^{H_A + H_B(t)} \,/\, P_r^{H_{AB}(t)}}{p_{mt}^{H_A + H_B(t)} \,/\, P_t^{H_{AB}(t)}}\right] \tag{9}
\]
It is instructive to use equation 1 to rewrite equation 9 as follows:
\[
G_x = \frac{1}{H_{AB}^2(t)}\,\big(H_{AB}(t)\,F_x - M_{AB}(t)\,E_x\big) \tag{10}
\]
Now it can be seen from equation 10 that the NMI force is composed of two competing terms: an MI term and a JE term. When the JE between the two images is large (which implies that the MI is small), the NMI force is dominated by a component that tries to maximise the MI. Conversely, when the JE is small, implying that the images are quite well matched, the MI will be large and the NMI force will be dominated by a term seeking to minimise the JE.
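A minimal sketch of how equations (7), (8) and (10) might be evaluated on an image pair (our own illustration, not the authors' implementation). Intensities are assumed pre-scaled to [0, 1), the neighbour bins r and t are obtained by shifting the source bin image along x (with wrap-around at the edges as a simplification), a small epsilon guards the logarithms, and the bin-boundary heuristic discussed in Sect. 4 is omitted:

```python
import numpy as np

def information_forces(target, source, bins=128):
    # Per-voxel x-components of the JE, MI and NMI driving forces, eqs. (7),
    # (8) and (10), from a joint intensity histogram.
    eps = 1e-12
    m = np.clip((target * bins).astype(int), 0, bins - 1)   # target bin m per voxel
    s = np.clip((source * bins).astype(int), 0, bins - 1)   # source bin per voxel
    joint = np.zeros((bins, bins))
    np.add.at(joint, (m.ravel(), s.ravel()), 1.0)
    T = target.size
    p = joint / T
    P = p.sum(axis=0)                                       # source marginals P_i
    Q = p.sum(axis=1)                                       # target marginals Q_j
    H_B = -np.sum(P[P > 0] * np.log(P[P > 0]))
    H_A = -np.sum(Q[Q > 0] * np.log(Q[Q > 0]))
    H_AB = -np.sum(p[p > 0] * np.log(p[p > 0]))
    M = H_A + H_B - H_AB
    r = np.roll(s, -1, axis=-1)                             # bin of +x neighbour
    t = np.roll(s, 1, axis=-1)                              # bin of -x neighbour
    p_mr, p_mt = p[m, r] + eps, p[m, t] + eps
    P_r, P_t = P[r] + eps, P[t] + eps
    E = -np.log(p_mr / p_mt) / T                            # eq. (7); negate to minimise JE
    F = np.log((p_mr / P_r) / (p_mt / P_t)) / T             # eq. (8)
    G = (H_AB * F - M * E) / H_AB**2                        # eq. (10)
    return E, F, G
```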
2.2 Experiments: JE, MI, and NMI Used to Drive Fluid Registration
The results of Sect. 2.1 were used to implement driving forces within a fluid registration framework. During each registration iteration the force was computed at each voxel in the source image. The fluid registration code solved the Navier-Stokes equation for a viscous compressible fluid using Successive Over-Relaxation and a Full Multi-Grid approach, as described in [10], to generate a diffeomorphic displacement field from the input force-field. The number of histogram bins was set as 128 after some experimentation; the registration was relatively insensitive to larger numbers of bins but was increasingly sensitive to smaller numbers of bins. The other parameters used in the fluid registration are in Table 1. The registration terminated when the maximum number of iterations had been reached, or earlier if the global similarity measure satisfied a run-test indicating that it hadn't changed significantly over the previous few iterations. In neither case will the driving forces be zero, but if the run-test is satisfied then significant further improvements in image matching are not possible under the constraints of the fluid model. Note that this will be the case even when images are "perfectly" matched, since measures like JE are only truly minimised for intensity probability distributions with zero dispersion.

We followed the approach in [8] and tested our algorithm on "gold-standard" warped images of the Montreal Neurological Institute BrainWeb digital brain phantom [11][12]. The T1-weighted BrainWeb image was non-rigidly registered and transformed to match a volunteer study using existing B-spline control-point based software [13]. Independent Rician noise was added to the original and transformed BrainWeb images to generate a pair of images with corresponding anatomy and tissue intensity characteristics but with uncorrelated noise, warped with respect to each other by a non-rigid deformation. The same transformation and noise process was applied to the T2- and PD-weighted BrainWeb images. Experiments to register warped T1, T2 and PD images back to a T1 standard could now be performed and the recovered transformation compared with the known transformation used to generate the warped images. Twelve registrations in total were performed: there were 3 registration scenarios (T1-T1, T2-T1 and PD-T1) and 4 similarity measures tested (JE, MI, NMI and intensity cross-correlation [10]). Transformations were compared by computing the root-mean-square difference between the fluid registration voxel-displacement vectors and the corresponding "gold-standard" voxel-displacement vectors over the brain volume.

Table 1. Parameters used in the fluid registration

| Isotropic Image Dimensions | 181×217×181   |
| Padded Image Dimensions    | 225×241×225   |
| Lamé Constants             | µ=0.01, λ=0.0 |
| Regridding Threshold       | 0.5           |
| Maximum Iterations         | 400           |
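For orientation only, a toy loop with the overall shape of the procedure above (this is emphatically not the fluid solver: Gaussian smoothing of the force stands in for the Navier-Stokes/SOR multi-grid step, only the x-component of the displacement is updated, and the step size is arbitrary; information_forces is the hypothetical helper sketched in Sect. 2.1):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def toy_fluid_loop(target, source, n_iter=100, step=1e3, sigma=2.0, bins=128):
    # Crude illustration of force-driven registration on 2D images in [0, 1).
    ny, nx = source.shape
    yy, xx = np.mgrid[0:ny, 0:nx].astype(float)
    ux = np.zeros_like(source, dtype=float)        # x-displacement field u_x(X)
    for _ in range(n_iter):
        warped = map_coordinates(source, [yy, xx - ux], order=1)  # sample at X - u(X)
        _, _, gx = information_forces(target, warped, bins)       # NMI force, eq. (10)
        ux += step * gaussian_filter(gx, sigma)    # smoothing as a stand-in regulariser
    return ux
```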
3 Results
For fluid registration using MI, Fig. 3 shows the joint intensity histogram before and after registration for each of the registration scenarios. It can be seen that in all cases the amount of dispersion in the histogram is very effectively reduced by the registration, although this does not in itself prove that the registration is correct. The corresponding plots for the JE and NMI cases appear visually very similar, as the registrations differ only in the degree to which the changing marginal entropy of the source image affects the image similarity measure.
Fig. 3. The change in the joint intensity histogram pre- (top row) and post- (bottom row) fluid registration using Mutual Information driving forces (columns: PD-T1, T1-T1, T2-T1). The bin contents are displayed on a log intensity scale. It can be seen that dispersion in the histograms is dramatically reduced post-registration and that structure evident in the histograms pre-registration is retained and enhanced post-registration.
Fig. 4 shows the evolution of JE and MI for the T1-T1, T2-T1 and PD-T1 cases. The convergence of registrations using NMI was very similar to MI and has not been shown in the interests of brevity. In all cases the registration acts to improve the image similarity measure; the PD-T1 case terminates earlier than the other two cases. Finally, in Fig. 5 the rms transformation error for each registration scenario and each similarity measure can be seen in relation to that obtained for the T1-T1 case registered using intensity cross-correlation driven fluid registration. The T1-T1 and PD-T1 cases have consistent and low rms errors, but there is more variation in the T2-T1 case. The intensity cross-correlation fluid registrations of T2-T1 and PD-T1 produced errors an order of magnitude larger than the scale in Fig. 5 and have been omitted.
4 Discussion
In this paper we have derived simple results, which enable JE, MI and NMI to be used as voxel-level similarity measures in image registration. These results have a pleasingly simple form and some insight has been gained by relating the result for NMI to those for MI and JE. Specifically, maximising NMI seems to balance the twin desires to maximise MI and minimise the dispersion in the joint intensity histogram.
Fig. 4. Convergence properties of fluid registration driven by JE and MI. (Two plots, each versus iteration number 0-160: joint entropy and mutual information as a function of iteration for histogram fluid registration, showing the T1-T1, T2-T1 and PD-T1 cases.)
Fig. 5. The RMS transformation error (mm) for each registration scenario (T1-T1, T2-T1, PD-T1) run with each similarity measure (NMI, MI, JE, correlation), compared with the gold-standard deformation field. The dashed line shows the rms error for the T1-T1 case run with intensity cross-correlation for comparison. For the T2-T1 and PD-T1 cases run with intensity cross-correlation the RMS errors were > 100mm. The imposed rms error was 0.8mm over the brain.
In terms of the methodology, we have had to choose the number of bins used to construct the intensity histograms. In addition, because the JE, MI and NMI forces are constructed at a voxel level, there is a potential source of bias in the positioning of bin boundaries. In this work we have attempted to reduce this bias by setting the force between voxels not separated in intensity by at least a bin width to zero. In future work we will pursue a similar analysis using a small number of voxel samples and Parzen windows techniques to build continuous representations of the intensity histograms, which may prove less biased. By adopting the validation experiment of [8] we have shown that fluid registration using these similarity measures achieves rms transformation errors far less than 1mm for an exemplar T1-T1 registration. For the exemplar PD-T1 registration the error remains ≤ 0.5mm, but for the T2-T1 registration the rms errors vary between ≅0.5mm (MI), ≅1.0mm (NMI) and ≅1.5mm (JE). The latter case warrants further investigation, although it seems probable that the registration is confounded by the widely different appearance of tissues surrounding the brain in T1- and T2-weighted images. Applying a brain mask to the images prior to registration might solve this problem. It is perhaps unrealistic to expect any registration scheme driven purely by voxel-intensity information to perform perfectly for image pairs where there is not a one-to-one correspondence between intensities, as can be seen in the examples in Fig. 3. In addition, noise, and the fact that voxels do not generally have a unique context within medical images, mean that even in our experiment the "gold-standard" transformation we attempted to recover is undoubtedly not the only one that could be defined to map the identifiable features in the images. Some detail of the "gold-standard" transformation is also down to the choice of transformation model, i.e. the registration software we used to generate the "gold-standard" transformation used a B-spline control-point interpolation scheme with 10.0mm spacing between nodes, whereas our fluid-registration software produces a voxel-level displacement vector field. The problem of validation of non-rigid registration remains unsolved, although in certain applications biomechanical models may be of use in producing known and realistic registration problem scenarios [14]. A different approach to the same problem is described in [8], where the MI driving force is obtained using variational calculus in conjunction with Parzen windows techniques and the fluid equations are solved using a convolution filter approach [15]. The number of histogram bins must still be chosen, as in our approach, and computation times appear comparable per iteration for registration of similarly sized volumes. Our analysis, while simple, is more easily interpretable in terms of the effect of driving forces on the joint intensity histogram.
Acknowledgements. The authors would like to thank Dr Lewis Griffin for useful discussions and the four referees for valuable critiques. Bill Crum is funded by the UK EPSRC/MRC Medical Images and Signals IRC. This paper would not have been possible without the opportunities afforded by this programme.
References

1. Collignon, A.: Multi-modality medical image registration by maximization of mutual information, PhD thesis, Catholic University of Leuven, Leuven, Belgium (1998)
2. Wells, W.M., Viola, P., Atsumi, H., Nakajima, S., Kikinis, R.: Multi-modal volume registration by maximization of mutual information, Medical Image Analysis 1(1) (1996) 35-51
3. Viola, P.: Alignment by maximization of mutual information, PhD Thesis, Massachusetts Institute of Technology, Boston, MA, USA (1995)
4. Studholme, C., Hill, D.L.G., Hawkes, D.J.: An overlap invariant entropy measure of 3D medical image alignment, Pattern Recognition 32(1) (1999) 71-86
5. Maes, F., Collignon, A., Vandermeulen, D., Marchal, G., Suetens, P.: Multi-modality image registration by maximization of mutual information, IEEE Transactions on Medical Imaging 16(2) (1997) 187-198
6. Roche, A., Malandain, G., Ayache, N., Prima, S.: Towards a better comprehension of similarity measures used in medical image registration, Proceedings of MICCAI 1999, LNCS 1679 (1999) 555-566
7. Christensen, G.E., Rabbitt, R.D., Miller, M.I.: Deformable Templates Using Large Deformation Kinematics, IEEE Transactions on Image Processing 5(10) (1996) 1435-1447
8. D'Agostino, E., Maes, F., Vandermeulen, D., Suetens, P.: A Viscous Fluid Model for Multimodal Non-rigid Image Registration Using Mutual Information, Proceedings of MICCAI 2002 Part II, LNCS 2489 (2002) 541-548
9. Maes, F., Vandermeulen, D., Suetens, P.: Comparative Evaluation of Multiresolution Optimization Strategies for Multimodality Image Registration by Maximization of Mutual Information, Medical Image Analysis 3(4) (1999) 373-386
10. Freeborough, P.A., Fox, N.C.: Modeling Brain Deformations in Alzheimer Disease by Fluid Registration of Serial 3D MR Images, Journal of Computer Assisted Tomography 22(5) (1998) 838-843
11. Kwan, R.K.-S., Evans, A.C., Pike, G.B.: MRI simulation-based evaluation of image-processing and classification methods, IEEE Transactions on Medical Imaging 18(11) (1999) 1085-97
12. Collins, D.L., Zijdenbos, A.P., Kollokian, V., Sled, J.G., Kabani, N.J., Holmes, C.J., Evans, A.C.: Design and Construction of a Realistic Digital Brain Phantom, IEEE Transactions on Medical Imaging 17(3) (1998) 463-468
13. Rueckert, D., Sonoda, L.I., Hayes, C., Hill, D.L.G., Leach, M.O., Hawkes, D.J.: Nonrigid Registration Using Free-Form Deformations: Application to Breast MR Images, IEEE Transactions on Medical Imaging 18(8) (1999) 712-721
14. Schnabel, J.A., Tanner, C., Castellano-Smith, A.D., Leach, M.O., Hayes, C., Degenhard, A., Hose, R., Hill, D.L.G., Hawkes, D.J.: Validation of Non-rigid Registration Using Finite Element Methods, IPMI 2001, LNCS 2082 (2001) 344-357
15. Bro-Nielsen, M., Gramkow, C.: Fast Fluid Registration of Medical Images, VBC'96, LNCS 1131 (1996) 267-276
A New & Robust Information Theoretic Measure and Its Application to Image Alignment

F. Wang(1), B.C. Vemuri(1), M. Rao(2), and Y. Chen(2)

(1) Department of Computer & Information Sciences & Engr.,
(2) Department of Mathematics,
University of Florida, Gainesville, FL 32611

This research was in part funded by the NIH grant NS42075.
Abstract. In this paper we develop a novel measure of information in a random variable based on its cumulative distribution that we dub cumulative residual entropy (CRE). This measure parallels the well-known Shannon entropy but has the following advantages: (1) it is more general than the Shannon entropy, as its definition is valid in the discrete and continuous domains; (2) it possesses more general mathematical properties; and (3) it can be easily computed from sample data and these computations asymptotically converge to the true values. Based on CRE, we define the cross-CRE (CCRE) between two random variables, and apply it to solve the image alignment problem for parameterized (3D rigid and affine) transformations. The key strengths of the CCRE over using the mutual information (based on Shannon's entropy) are that the former has significantly larger tolerance to noise and a much larger convergence range over the field of parameterized transformations. We demonstrate these strengths via experiments on synthesized and real image data.
1 Introduction
The concept of entropy is central to the field of information theory and was originally introduced by Shannon in his seminal paper [14], in the context of communication theory. Since then, this concept and variants thereof have been extensively utilized in numerous applications of science and engineering. To date, one of the most widely benefiting applications has been in data compression and transmission. Shannon's definition of entropy originated in the discrete domain, and its continuous counterpart, called the differential entropy, is not a direct consequence of the definition in the discrete case. Note that the Shannon definition of entropy in the discrete case does not converge to the continuous definition [6]. Moreover, the definition in the discrete case, which states that the entropy of a random variable X is H(X) = −∑_x p(x) log p(x), is based on the density p(x) of the random variable, which in general may or may not exist, e.g., for cases when the cumulative distribution function (cdf) is not differentiable.
It would not be possible to define the entropy of a random variable for which the density function is undefined. However, it seems plausible that entropy should exist for such cases as well. In this paper, we will define a new measure of information in a random variable that will overcome the aforementioned drawbacks of the Shannon entropy and has very general properties as a consequence. We will then derive some interesting properties and state some theorems which are proved elsewhere [4]. Following this, we will use this new measure to define the cross-CRE (CCRE) and use it in the image alignment problem, comparing it to methods that use the Shannon entropy in defining a matching criterion, specifically the mutual information.

1.1 Previous Work
There are several information theoretic measures that have been reported in the literature [7] since the inception of Shannon's entropy in 1948 [14]. Some of these are more general than others, but all of them, like Shannon's entropy, were defined based on the probability density function. This is the point of departure in our approach, i.e., our definition is based on the cumulative distribution instead and has some very interesting properties, some of which are discussed subsequently. In the context of the image alignment problem, information theoretic measures for comparing image pairs differing by an unknown coordinate transformation have been popular since the seminal works of Viola & Wells [18] and Collignon et al. [5]. There are numerous methods in the literature for solving the image alignment problem. Broadly speaking, these can be categorized as feature-based and direct methods. The former typically compute some distinguishing features and define a cost function whose optimization over the space of a known class of coordinate transforms leads to an optimal coordinate transformation. The latter set of methods involves defining a matching criterion directly on the intensity image pairs. We will briefly review the direct methods and refer the reader to the survey [11] for others. Sum of squared differences (SSD) has been a popular technique for image alignment [16,17]. Variants of the original formulation have been able to cope with deviations from the image brightness constancy assumption [8]. Other matching criteria use statistical information in the image, e.g., the correlation ratio [12] and maximum likelihood criteria based on data sets that are preregistered [10]. Image alignment is achieved by optimizing these criteria over a set of parameterized coordinate transformations. The statistical techniques can cope with image pairs that are not necessarily from the same imaging modality. Another direct approach is based on the concept of maximizing mutual information (MI), defined using the Shannon entropy, reported in Viola and Wells [18], Collignon et al. [5] and Studholme et al. [15]. MI between the source and the target images that are to be aligned is maximized using a stochastic analog of the gradient descent method in [18], and other optimization methods such as Powell's method in [5] and a multiresolution scheme in [15]. Reported registration experiments in these works are quite impressive for the case of rigid motion. In [15], Studholme et al. presented a normalized MI (NMI) scheme for
matching multi-modal image pairs misaligned by a rigid motion. Normalized MI was shown to be able to cope with image pairs not having the same field of view (FOV), an important and practical problem; for an alternative competing method, see Liu et al. [9], wherein local frequency representations of the images are matched in a statistically robust framework. In recent times, most of the effort on MI-based methods has been focused on coping with non-rigid deformations between the source and target multi-modal data sets [13,3].
2 Cumulative Residual Entropy: A New Measure of Information
In this section we define our new information theoretic measure and present some properties/theorems. We do not delve into the proofs but refer the reader to a more comprehensive mathematical (technical) report [4].

Definition: Let X be a random vector in R^N; we define the cumulative residual entropy (CRE) of X by:

\[
\mathcal{E}(X) = -\int_{\mathbb{R}^N_+} P(|X| > \lambda)\,\log P(|X| > \lambda)\,d\lambda \tag{1}
\]

where X = (X_1, X_2, ..., X_N), λ = (λ_1, ..., λ_N), |X| > λ means |X_i| > λ_i, and R^N_+ = {x ∈ R^N : x_i ≥ 0}.

Proposition 1. If the X_i are independent, then

\[
\mathcal{E}(X) = \sum_i \mathcal{E}(X_i) \prod_{j \neq i} E\big[|X_j|\big]
\]

Proposition 2 (Weak Convergence). Let the random vectors X_k converge in distribution to the random vector X; by this we mean

\[
\lim_{k \to \infty} E[\varphi(X_k)] = E[\varphi(X)] \tag{2}
\]

for all bounded continuous functions φ on R^N. If all the X_k are bounded in L^p for some p > N, then

\[
\lim_{k \to \infty} \mathcal{E}(X_k) = \mathcal{E}(X) \tag{3}
\]

Definition: Given random vectors X and Y ∈ R^N, we define the conditional CRE \(\mathcal{E}(X|Y)\) by:

\[
\mathcal{E}(X|Y) = -\int_{\mathbb{R}^N_+} P(|X| > x \mid Y)\,\log P(|X| > x \mid Y)\,dx \tag{4}
\]
Proposition 3. For any X and Y,

\[
E[\mathcal{E}(X|Y)] \leq \mathcal{E}(X) \tag{5}
\]
Equality holds iff X is independent of Y. This is a useful property and is analogous to the Shannon entropy case.

Definition: The continuous version of the Shannon entropy, called the differential entropy [6], of a random variable X with density f is defined as

\[
H(X) = -E[\log f] = -\int f(x)\,\log f(x)\,dx
\]

The following proposition describes the relationship between CRE and the differential entropy; we prove that the CRE is exponentially larger than the differential entropy. This in turn will influence the relations between quantities derived from \(\mathcal{E}(X)\) and H(X), such as mutual information, which will be used in the alignment process.

Proposition 4. Let X ≥ 0 have density f. Then

\[
\mathcal{E}(X) \geq C \exp(H(X)), \qquad C = \exp\!\Big(\int_0^1 \log\big(x\,|\log x|\big)\,dx\Big) \tag{6}
\]
Proof: Let G(x) = P[X > x] = ∫_x^∞ f(u) du. Using the Log-Sum inequality [6] we have

\[
\int_0^\infty f(x)\,\log \frac{f(x)}{G(x)\,|\log G(x)|}\,dx \;\geq\; \log \frac{1}{\int_0^\infty G(x)\,|\log G(x)|\,dx} \;=\; \log \frac{1}{\mathcal{E}(X)}
\]

The left-hand side of this inequality equals

\[
-H(X) - \int_0^\infty f(x)\,\log\big(G(x)\,|\log G(x)|\big)\,dx
\]

so that

\[
H(X) + \int_0^\infty f(x)\,\log\big(G(x)\,|\log G(x)|\big)\,dx \;\leq\; \log \mathcal{E}(X)
\]

Finally, a change of variable gives

\[
\int_0^\infty f(x)\,\log\big(G(x)\,|\log G(x)|\big)\,dx = \int_0^1 \log\big(x\,|\log x|\big)\,dx
\]

Using the above and exponentiating both sides, we get (6).
Definition: The mutual information I(X, Y) of two random variables X and Y using the Shannon entropy is defined as:

\[
I(X, Y) = H(X) - E[H(X|Y)] \tag{7}
\]
We define a quantity called the cross-CRE (CCRE) given by

\[
C(X, Y) = \mathcal{E}(X) - E[\mathcal{E}(Y|X)] \tag{8}
\]

Note that I(X, Y) is symmetric but C(X, Y) need not be. We can define a symmetrized version of CCRE by adding \(\mathcal{E}(Y) - E[\mathcal{E}(X|Y)]\) to C(X, Y) and premultiplying by a factor of 1/2. From Proposition 3, we know that the symmetrized CCRE is non-negative. We will, however, use the non-symmetric CCRE in all our image alignment experiments, as it was sufficient to yield the desired results. We empirically show its superior performance under low SNR and also depict its larger capture range with regard to convergence to the optimal parameterized transformation. The image alignment problem that we solve in this paper can now be stated as finding the coordinate transformation T, in our case represented as a parameterized rigid or affine transformation in 3D, that maximizes the CCRE measure C(I_1(T(x)), I_2(x)), x = (x, y)^T, over the appropriate (rigid, affine) class of transformations T.

2.1 Estimating Empirical CRE
In order to compute the CRE of an image, we use the histogram of the image to estimate P(X > λ), where X corresponds to the image intensity, which is considered a random variable. Note that, as a consequence of Proposition 2, empirical CRE computation based on samples will converge in the limit to the true value. This is not the case for the Shannon entropy computed using histograms to estimate probability density functions, as is usually done in the current literature. In the case of CRE, we have

\[
\mathcal{E}(X) = -\int_0^\infty P(X > \lambda)\,\log P(X > \lambda)\,d\lambda = -\sum_\lambda P(X > \lambda)\,\log P(X > \lambda) \tag{9}
\]
Hence, using a histogram to compute the CRE is well defined and justified theoretically. Note that estimating the conditional CRE \(\mathcal{E}(X|Y)\) is done using the joint histogram and then marginalizing it with respect to the conditioned variable.
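A small sketch of the histogram-based estimators (our own reading of Eq. (9) and of the CCRE in Eq. (8); the handling of bin widths and of the survival function at bin edges is an assumption):

```python
import numpy as np

def cre(x, bins=256):
    # Empirical CRE: E(X) = -sum_lambda P(X > lambda) log P(X > lambda), with
    # the survival function estimated from an intensity histogram (eq. 9).
    hist, edges = np.histogram(x.ravel(), bins=bins)
    p = hist / hist.sum()
    surv = 1.0 - np.cumsum(p)            # P(X > lambda) at the right bin edges
    widths = np.diff(edges)
    keep = surv > 0
    return -np.sum(widths[keep] * surv[keep] * np.log(surv[keep]))

def ccre(x, y, bins=64):
    # Cross-CRE, eq. (8): C(X, Y) = E(X) - E[E(Y|X)], conditioning on the bins
    # of X via the joint histogram and marginalizing over them.
    joint, _, ye = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    px = joint.sum(axis=1) / joint.sum()
    wy = np.diff(ye)
    cond = 0.0
    for i in range(bins):
        tot = joint[i].sum()
        if tot == 0:
            continue
        surv = 1.0 - np.cumsum(joint[i] / tot)
        keep = surv > 0
        cond += px[i] * (-np.sum(wy[keep] * surv[keep] * np.log(surv[keep])))
    return cre(x, bins) - cond
```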
3 Experiment Results
In this section we present four sets of experiments, two on simulated data sets and two on real data sets. The first set consists of several 3D registrations under varying noise conditions, used to depict the robustness of CCRE over MI and NMI in the presence of noise. The second set consists of 3D MR T1- and T2-weighted data sets obtained from the Montreal Neurological Institute (MNI) database. The data sets were artificially misaligned by known rigid transformations, and our algorithm as well as the MI and NMI
schemes were used to estimate the transformation and then compared. The third set consists of real data experiments involving rigid motion. The data we use in this set are CT-MR volume data sets obtained locally from the University Hospital. The fourth set consists of MR-T1 and DW images of a mouse brain, where the images differ by an affine motion in 3D (scaling between the images is due to voxel resolution differences).

3.1 Robust Property of the CCRE Measure
In this section, we demonstrate the robustness property of CCRE, thereby justifying the use of the CCRE measure in the registration problem. This is achieved by showing the noise immunity of CCRE over the MI and NMI algorithms. Noise, one of the most typical degradation effects in medical images, alters the intensity distribution of the image, which may affect the MI registration criterion. The data we use for this experiment is an MR T1 and T2 image pair from the BrainWeb site at the Montreal Neurological Institute [2]. They are originally aligned with each other. A single pair of corresponding slices is shown in Fig. 1. The two volumes are defined on a 1mm isotropic voxel grid in Talairach space, with dimension (181 × 217 × 181).
Fig. 1. Display of a single slice of the aligned (a) T1-weighted MR and (b) T2-weighted MR images used in the computation of C and I over the range of rotations.
The effect of noise in the input image pairs was evaluated by comparing the CCRE, MI and NMI traces obtained for the degraded (via addition of noise) T1 and T2 image pair. The traces are computed over the rotation angle that was applied to the T2 image to achieve the misalignment between the T1 and T2 pair. In each plot of Fig. 2 the X-axis shows the rotation angle, while the Y-axis shows the values of CCRE, MI and NMI computed between the misaligned (by a rotation) image pairs. The original MR T1 and T2 data intensities range from 0-255, with means 55.6 and 60.6 respectively. Zero-mean Gaussian noise of varying standard deviation 0, 20, 37 was added to the image pair, and for each level of variance CCRE, MI and NMI were computed between the two noisy images. Note that the SNR in these experiments was at a level where the Rician noise in the MRI magnitude data can be well approximated by a Gaussian. Fig. 2 shows that increasing the level of
noise results in a decrease in the magnitude of the CCRE, MI and NMI values respectively. The range of the traces of all three match measures also decreases. For each noise level, CCRE shows a significantly larger range of values compared to MI and NMI. This has a significant influence in shaping the robustness property of CCRE with respect to noise in the data, as well as the capture range in finding the optimal alignment.
Cross CRE
Normalized MI
35
25
0.8
20
0.6
15
0.4
1.1 1.08
10 −40
1.12
1
30
−20
0 20 Cross CRE
40
0.2 −40
1.06 1.04 −20 0 20 Traditional MI
40
15
1.02 −40
−20 0 20 Normalized MI
40
−20 0 20 Normalized MI
40
−20
40
1.08
10
0.4
5
0.2
1.06 1.04
0 −40
−20
0 20 Cross CRE
40
0 −40
1.02 −20 0 20 Traditional MI
40
1 −40
5 4
0.2
1.02
3
0.15
1.015
2
0.1
1.01
1 0 −40
0.05 −20
0
20
40
−40
1.005 −20
0
20
40
−40
0
20
Fig. 2. Effects of the presence of the noise on CCRE, MI and NMI. CCRE, MI and NMI traces plotted for the misaligned T1 & T2 image pair where misalignment is generated by a rotation of the MR T2 weighted image over the range −40◦ to 40◦ . First row: no noise; Second row: σ = 20; Third row: σ = 37.
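A hypothetical reconstruction of the trace computation (reusing the mutual_information, ccre and rotate helpers assumed in the earlier sketches; the random arrays below are placeholders for the aligned BrainWeb T1/T2 slices):

```python
import numpy as np
from scipy.ndimage import rotate

rng = np.random.default_rng(0)
t1 = rng.random((128, 128))          # placeholder: substitute the T1 slice
t2 = rng.random((128, 128))          # placeholder: substitute the T2 slice

angles = np.arange(-40, 41, 2)
traces = {}
for sigma in (0, 20, 37):
    t1n = t1 + rng.normal(0, sigma, t1.shape)
    t2n = t2 + rng.normal(0, sigma, t2.shape)
    traces[sigma] = [(a,
                      ccre(t1n, rotate(t2n, a, reshape=False, order=1)),
                      mutual_information(t1n, rotate(t2n, a, reshape=False, order=1)))
                     for a in angles]
```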
3.2 Synthetic Motion Experiments
In this section, we make the case for using CCRE over MI and NMI in the alignment problem, via experiments depicting superior performance in matching misaligned image pairs under noisy inputs and via the depiction of a larger capture range in the estimation of the motion parameters.
Rigid Motion. In this section, we show the algorithm's performance for inter-modality rigid registrations. All the examples contain synthesized misalignments applied to the same MR data sets as in the first experiment. With the MR T1-weighted image as the source, the target image is obtained by applying a known 3D rigid transformation to the MR T2 image. Next, we applied the CCRE, MI and NMI algorithms to estimate the motion parameters in 30 cases of misaligned MR T1 and T2 pairs. The 30 cases were generated using 30 randomly generated rigid transformations to misalign the T1 and T2 image pair. These 30 transformations are normally distributed around the values of (10°, 5mm), with standard deviations of (3°, 3mm) for rotation and translation respectively. Table 1 shows the statistics of the errors resulting from the three different methods (CCRE, MI and NMI). Six parameters are displayed in each cell: the first three are the rotation angles (in degrees), while the next three show the translations (in mm). Both the rotation and translation parameters are in (x, y, z) order. For most of the cases, the average error of the rotation angle is less than 0.5° and of the translation less than 0.5mm. Out of the 30 trials, the traditional MI and NMI failed 3 times and 4 times respectively, while our CCRE did not fail in any of the 30 trials ("failed" here and subsequently means that the numerical algorithm for optimizing the cost function did not converge within 500 iterations of the optimizer). If we only count the cases which resulted in acceptable results, as shown in the first (for CCRE), second (for MI) and third (for NMI) rows of Table 1, CCRE, MI and NMI have comparable performance, all being quite accurate.

Table 1. Statistics (computed from 30 cases) comparison for rigid motion estimation errors between CCRE, traditional MI and NMI. Each cell lists the (x, y, z) rotation errors in degrees and (x, y, z) translation errors in mm.
| Method | Mean rotation (°) | Mean translation (mm) | Std rotation (°) | Std translation (mm) |
| CCRE | 0.1255, 0.0905, 0.0654 | 0.0764, 0.0617, 0.0536 | 0.1254, 0.1266, 0.0492 | 0.0541, 0.0540, 0.0484 |
| Traditional MI | 0.1497, 0.1049, 0.1004 | 0.0728, 0.0499, 0.0413 | 0.1792, 0.0857, 0.0952 | 0.0616, 0.0410, 0.0374 |
| Normalized MI (NMI) | 0.2167, 0.1224, 0.1663 | 0.0629, 0.0881, 0.0335 | 0.1411, 0.1003, 0.0869 | 0.0574, 0.1261, 0.0372 |
In the second experiment, we compare the robustness of the three methods (CCRE, MI and NMI) in the presence of noise, keeping the misalignment fixed. Again choosing the MR T1 image from the previous experiment as our source image, we generate the target image by applying a known rigid motion to the T2-weighted MR image. We conduct this experiment by varying the amount of Gaussian noise added and then, for each instance of the added noise, registering the two images using the three techniques. We expect all the schemes to fail at some level of noise. By comparing the noise variance at the failure point, we can comment on the degree to which these methods are tolerant to noise. We choose the fixed motion to be a 10° rotation and a 5-pixel translation in the X and Y directions
respectively. The numerical schemes we used to implement these registrations are all based on the sequential quadratic programming (SQP) algorithm [1]. Table 2 shows the registration results for the three schemes. From the table, we observe that the traditional MI fails when the σ of the noise is increased to 13. It is slightly better for NMI, which fails at 16, while CCRE is tolerant until σ = 40, a significant difference when compared to the traditional MI and the NMI methods. This experiment conclusively depicts that CCRE has more noise immunity than both traditional MI and NMI.

Table 2. Comparison of the registration results between CCRE and other MI algorithms for a fixed synthetic motion and varying noise. The image intensity range before adding noise is 0-255. The true motion here is (10°, 10°, 10°, 7, 7, 7). Each cell lists the estimated (x, y, z) rotations in degrees and translations in mm.

| σ | Cross CRE | Traditional MI | Normalized MI |
| 10 | (9.9809°, 9.9924°, 10.0497°; 6.9833, 6.8536, 7.0635) | (9.9523°, 10.0038°, 10.0497°; 7.0067, 6.9599, 7.0213) | (10.1356°, 9.7861°, 10.5424°; 7.0156, 6.9311, 7.0976) |
| 12 | (9.9351°, 10.0325°, 10.0554°; 7.0134, 6.9272, 7.0341) | (10.1471°, 10.1643°, 10.2158°; 6.9985, 6.8495, 6.9777) | (10.1757°, 10.4617°, 10.4603°; 6.9515, 6.7816, 6.9114) |
| 15 | (9.9363°, 10.0327°, 10.0550°; 7.0130, 6.9271, 7.0343) | FAILED | (10.0268°, 9.9580°, 9.9064°; 6.8633, 6.7924, 7.1617) |
| 16 | (9.9679°, 9.9924°, 10.0140°; 6.9693, 7.0948, 7.0126) | FAILED | FAILED |
| 40 | (9.6314°, 9.8377°, 10.2273°; 6.9840, 7.0960, 7.0080) | FAILED | FAILED |
| 42 | FAILED | FAILED | FAILED |
Next, we fix the variance of the noise and vary the magnitude of the synthetic motion until all of the methods fail. With this experiment, we can compare the convergence range of each registration scheme. In all the methods, we use the sequential quadratic programming technique to estimate the optimal motion. From Table 3, we find that the convergence range of traditional MI and NMI is estimated at (12°, 12°, 14°, 12, 12, 12) and (15°, 15°, 15°, 15, 15, 15) respectively, while our new algorithm has a much larger capture range at (32°, 32°, 25°, 31, 31, 31). It is evident from this experiment that the capture range for reaching the optimum is significantly larger for CCRE when compared with MI and NMI in the presence of noise.

3.3 Real Data 3D Rigid Motion
In this section, we present the performance of our method on data containing real rigid misalignments. The results are compared to ground-truth registrations obtained semi-automatically by an "expert", whose registrations are currently used in clinical practice at the University Hospital. For the purpose of comparison, we also apply traditional MI, implemented as presented in Collignon et al. [5], to these data. We tested our algorithm and the MI-based technique on MR-CT data from eight different subjects.
Table 3. Comparison of the convergence range of the rigid registration between CCRE and other MI schemes for fixed noise standard deviation 7. Each cell lists the estimated (x, y, z) rotations in degrees and translations in mm.

| Ground truth | CCRE | Traditional MI | Normalized MI |
| (7°, 7°, 7°, 5, 5, 5) | (6.970°, 6.912°, 7.014°; 4.952, 5.001, 5.073) | (6.973°, 7.003°, 7.057°; 5.003, 4.942, 5.069) | (6.955°, 7.186°, 6.954°; 5.015, 4.931, 5.097) |
| (12°, 12°, 15°, 12, 12, 12) | (11.961°, 12.033°, 15.055°; 12.013, 11.921, 12.034) | FAILED | (12.025°, 12.980°, 11.990°; 11.843, 12.094, 11.916) |
| (15°, 15°, 15°, 15, 15, 15) | (15.031°, 15.008°, 15.012°; 14.963, 14.955, 15.020) | FAILED | FAILED |
| (32°, 32°, 25°, 31, 31, 31) | (32.086°, 32.034°, 25.092°; 32.037, 31.932, 31.967) | FAILED | FAILED |
| (33°, 33°, 33°, 33, 33, 33) | FAILED | FAILED | FAILED |
to the motion of the subject. The CT image is of size (512, 512, 120) while the MR image size is (512, 512, 142), and the voxel dimensions are (0.46, 0.46, 1.5) and (0.68, 0.68, 1.05) for CT and MR respectively. Table 4 summarizes the results of the comparison for the eight data sets. The row labeled ”True” in the table shows the ground truth rotation and translation parameters(as assessed semi-automatically by the local expert) of each data set. Both rotation and translation parameters are in (x,y,z) order. The row labeled, ”CCRE Err.” depicts the absolute error between the ground truth and the estimated parameters using our method, while ”MI Err.” indicates the corresponding errors for the MI method. As evident, our (CCRE-based) algorithm has achieved higher accuracy in the registration of these eight dataset. For most of the cases, the average error of rotation angle is less than 0.5◦ . and the translation error is less than 0.5mm for our algorithm. The maximum error of six parameters are (0.412◦ , 0.366◦ , 1.0188◦ , 0.2519mm, 0.5110mm, 0.5471mm), while for traditional MI, the maximum errors in the parameters are (0.795◦ , 1.177◦ , 2.681◦ , 1.060 mm, 1.525mm, 0.584mm), which shows that our algorithm is more reliable than the MI based alignment algorithm. In the large motion cases, we observed that the CCRE converges very fast as compared to the traditional MI scheme where we need to resort to a multi-resolution implementation to find the solution in a coarse-to-fine framework and this takes extra computational effort to get the two images aligned. 3.4
3.4 Real Data Affine Motion
In this section, we demonstrate the experimental results for affine motion estimation. The data used in our experiments is a pair of MR images of a mouse brain. The source image has (46.875 × 46.875 × 46.875) micron resolution with a field of view of (2.4 × 1.2 × 1.2 cm), while the target is a 3D diffusion-weighted image with (52.734 × 52.734 × 52.734) micron resolution and a field of view of (2.7 × 1.35 × 1.35 cm). Both images have the same acquisition matrix (256 × 512 × 256).
Table 4. 3D rigid motion estimates for eight MR-CT data sets.

Set   Item        Rotation (degree)             Translation (mm)
1     True        (5.455, -1.146, -14.003)      (3.822, 9.254, 3.1094)
      CCRE Err.   (0.257, 0.247, 0.606)         (0.076, 0.364, 0.124)
      MI Err.     (0.714, 0.755, 0.004)         (0.013, 0.223, 0.441)
2     True        (1.765, -2.023, -12.284)      (6.661, 2.340, 6.280)
      CCRE Err.   (0.297, 0.316, 1.018)         (0.252, 0.108, 0.053)
      MI Err.     (0.453, 0.025, 0.109)         (0.401, 0.002, 0.130)
3     True        (-5.099, 7.357, -18.472)      (9.790, -0.901, -0.228)
      CCRE Err.   (0.040, 0.272, 0.419)         (0.072, 0.168, 0.212)
      MI Err.     (0.137, 0.533, 0.218)         (0.127, 0.297, 0.012)
4     True        (-5.581, -2.865, -21.675)     (10.561, -4.306, 18.874)
      CCRE Err.   (0.317, 0.092, 0.359)         (0.165, 0.283, 0.145)
      MI Err.     (0.539, 0.254, 1.025)         (0.121, 0.253, 0.415)
5     True        (-2.498, 7.540, -23.737)      (3.249, 2.425, 3.734)
      CCRE Err.   (0.298, 0.238, 0.414)         (0.001, 0.0338, 0.485)
      MI Err.     (0.795, 1.313, 1.479)         (0.334, 0.115, 0.584)
6     True        (-1.381, -3.386, -30.028)     (0.6022, 7.4366, -7.125)
      CCRE Err.   (0.155, 0.060, 0.111)         (0.133, 0.148, 0.415)
      MI Err.     (0.031, 0.498, 0.195)         (0.173, 0.191, 0.313)
7     True        (13.922, -3.965, -18.719)     (8.700, 7.328, -22.421)
      CCRE Err.   (0.412, 0.1145, 0.472)        (0.251, 0.395, 0.371)
      MI Err.     (0.487, 1.461, 1.833)         (1.060, 0.818, 0.179)
8     True        (14.024, 3.569, -21.778)      (0.120, 12.970, -9.870)
      CCRE Err.   (0.063, 0.3662, 0.515)        (0.082, 0.5110, 0.547)
      MI Err.     (0.115, 1.177, 2.681)         (0.500, 1.525, 0.770)
Mean  CCRE Err.   (0.230, 0.213, 0.489)         (0.129, 0.251, 0.294)
      MI Err.     (0.443, 0.654, 0.886)         (0.322, 0.492, 0.331)
STD   CCRE Err.   (0.131, 0.111, 0.258)         (0.089, 0.164, 0.184)
      MI Err.     (0.269, 0.503, 0.880)         (0.296, 0.485, 0.241)

Fig. 3 shows the registration results for these data. As is visually evident, the misalignment has been fully compensated for by applying the estimated affine deformation.
4 Summary
In this paper, we presented a novel measure of information that we dub the cumulative residual entropy (CRE). This measure has several advantages over the traditional Shannon entropy, which is defined via probability density functions that need not exist for some random variables. In addition, CRE can easily be computed from sample data, and these computations converge asymptotically to the true value. Unlike the Shannon entropy, the same CRE definition is valid in both discrete and continuous domains. We defined a new measure of match called cross-CRE (CCRE) and applied it to estimate parameterized misalignments between 3D image pairs, testing it on synthetic as well as real data sets from multi-modality (MR T1 and T2 weighted, MR
Fig. 3. Affine registration of an MR-T1 & MR-DWI mouse brain scan. Left to right: an arbitrary slice from the source image; a slice of the transformed source overlaid with the corresponding slice of the edge map of the target image; and the target image slice.
& CT, MR T1 and DWI) imaging sources. Comparisons were made between CCRE and both traditional MI and normalized MI, each defined using the Shannon entropy. The experiments showed significantly better performance of CCRE over the MI- and NMI-based methods currently used in the literature. Our future work will focus on extending the class of transformations to non-rigid motions.

Acknowledgements. The authors would like to thank Dr. S.J. Blackband and Dr. S.C. Grant, both of the Neuroscience Dept., UFL, and Dr. H. Benveniste of SUNY, BNL, for providing us with the mouse DWI data set, and Dr. F. Bova of the Neurosurgery Dept. of UFL for providing us with the MR-CT pairs. SJB and SCG were supported by NIH-NCRR P41 RR16105 and HB was supported by NIH-RO1 EB00233-04.
References
1. D.P. Bertsekas, Nonlinear Programming, Athena Scientific.
2. Simulated brain database [Online]. http://www.bic.mni.mcgill.ca/brainweb/
3. C. Chefd'Hotel, G. Hermosillo and O. Faugeras, "A variational approach to multimodal image matching," in IEEE Workshop on VLSM, pp. 21–28, 2001, Vancouver, BC, Canada.
4. Y. Chen, M. Rao, B.C. Vemuri and F. Wang, "Cumulative residual entropy, a new measure of information," Institute of Fundamental Theory, Technical Report IFT-Math-01-02, University of Florida, Gainesville, Florida.
5. A. Collignon, F. Maes, D. Delaere, D. Vandermeulen, P. Suetens and G. Marchal, "Automated multimodality image registration using information theory," Proc. IPMI, Y.J.C. Bizais, Ed., pp. 263–274, 1995.
6. T.M. Cover and J.A. Thomas, Elements of Information Theory, John Wiley and Sons, 1991.
7. J.N. Kapur, Measures of Information and their Applications, John Wiley & Sons Inc.
8. S.H. Lai and M. Fang, "Robust and efficient image alignment with spatially varying illumination models," in IEEE CVPR 1999, pp. 167–172.
9. J. Liu, B.C. Vemuri and J.L. Marroquin, "Local frequency representations for robust multi-modal image registration," IEEE Trans. on Medical Imaging, Vol. 21, No. 5, 2002, pp. 462–469.
10. M. Leventon and W.E.L. Grimson, "Multi-modal volume registration using joint intensity distributions," in MICCAI 1999.
11. J.B. Maintz and M.A. Viergever, "A survey of medical image registration," Medical Image Analysis, Vol. 2, pp. 1–36, 1998.
12. A. Roche, G. Malandain, X. Pennec and N. Ayache, "The correlation ratio as a new similarity measure for multimodal image registration," in MICCAI'98.
13. D. Rueckert, C. Hayes, C. Studholme, M. Leach and D. Hawkes, "Non-rigid registration of breast MRI using mutual information," in MICCAI'98.
14. C.E. Shannon, "A mathematical theory of communication," Bell System Technical Journal, Vol. 27, pp. 379–423 and 623–656, July and October 1948.
15. C. Studholme, D.L.G. Hill and D.J. Hawkes, "An overlap invariant entropy measure of 3D medical image alignment," Pattern Recognition, Vol. 32, pp. 71–86, 1999.
16. R. Szeliski and J. Coughlan, "Spline-based image registration," IJCV, Vol. 22, No. 3, pp. 199–218, March/April 1997.
17. B.C. Vemuri, S. Huang, S. Sahni, C.M. Leonard, C. Mohr, R. Gilmore and J. Fitzsimmons, "An efficient motion estimator with application to medical image registration," Medical Image Analysis, Oxford University Press, Vol. 2, No. 1, pp. 79–98, 1998.
18. P.A. Viola and W.M. Wells, "Alignment by maximization of mutual information," in Fifth ICCV, MIT, Cambridge, MA, pp. 16–23, 1995.
Gray Scale Registration of Mammograms Using a Model of Image Acquisition Peter R. Snoeren and Nico Karssemeijer Department of Radiology, University Medical Center Nijmegen, PO Box 9101, 6500HB Nijmegen, The Netherlands. [email protected]
Abstract. A parametric technique is proposed to match the pixel-value distributions of two mammograms of the same woman. It can be applied to mammograms of the left and the right breast or, more effectively, to temporal mammograms, e.g., from two screening rounds. The main reason to match mammograms is to lessen irrelevant differences between images caused by acquisition: varying breast compression, different film types, et cetera. Firstly, such a technique might reduce the radiologist's effort in detecting relevant differences, like abnormal growth in breast tissue that signals breast cancer. Secondly, though not the aim of this study, applications might be found in subtraction radiology or in the computer-aided detection of abnormalities in temporal mammograms. Instead of arbitrarily shifting and/or scaling the pixel-values of one image to match the other, or directly mapping one histogram to the other, the proposed method is based on general aspects of acquisition. These encompass (1) breast compression; (2) exposure time; (3) incident radiation intensity; and (4a) film properties and digitization for screen-film mammograms, or (4b) detector response for unprocessed digital mammograms. The method does not require a priori knowledge about specific acquisition settings to match histograms; the degrees of freedom are estimated from the pixel-value distributions of the two mammograms themselves. The method can match digitized screen-film mammograms (in what follows also referred to as analog mammograms) as well as unprocessed digital mammograms in any of the four possible combinations: analog to analog, analog to digital, digital to analog, and digital to digital.
1 Introduction
In radiology, the comparison of images obtained in subsequent examinations of a patient is often an important part of the diagnostic procedure. These comparisons are made to detect interval changes indicating lesion growth, to monitor progression of a disease, or to estimate the effect of treatment. Sometimes, temporal images are subtracted to enhance areas where differences occur. In conventional radiology the review of temporal image pairs may be seriously hampered by differences in acquisition. To some extent, positioning changes can be dealt with by geometric registration algorithms, the development of which received a
lot of attention in recent years [1,2]. However, changes of exposure and detector systems may also reduce the effectiveness of temporal comparisons, because of the non-linear gray scale changes they may induce. One area where comparison with prior imaging plays a major role is breast cancer screening. Studies have shown that the use of prior mammograms in screening effectively reduces the number of false positive referrals [3,4]. This is because the use of priors allows radiologists to distinguish lesions that grow from normal dense structures in the breast that somehow look suspicious. Differences between screen-film systems and exposure may cause subsequent mammograms to appear dramatically different, which is annoying for the radiologists and may reduce their performance. These differences cannot be corrected during display as long as conventional alternators are used for reading. With the introduction of digital mammography and dedicated mammographic workstations, however, the problem of display optimization can be addressed properly. The scale of the problem is best understood from an example. In the Netherlands, where a nation-wide breast cancer screening program is carried out, about two million women in the age group 50 to 75 are invited once every two years for screening. With an attendance of 80%, this means that about 800,000 women have a screening mammogram every year. Within the next years, all 65 screening units in the country will convert to digital mammography. During a two-year transition period, digital mammograms will need to be read in combination with prior film-screen mammograms. The current plan is that all of the most recent priors will be digitized to allow soft-copy reading. Optimizing the display of these images to facilitate comparisons with digital and analog mammograms is the main focus of this paper. A model-based method will be described that allows matching unprocessed digital mammograms to digitized film-based mammograms and vice versa. Although the objective is image display, we remark that the technique will also be important in the development of computer-aided detection methods that make use of temporal information. Histogram matching is the generic term for all methods that match images with the same or similar content by adjusting look-up tables of pixel-values; hence, histogram matching is typically a one-dimensional process. In general, histogram matching methods can be divided into two classes: (1) non-parametric methods, which try to find tuples of gray values that match as closely as possible; and (2) parametric methods, which, as the word says, require the estimation of parameters and are often based on simple polynomial functions. The method proposed here is parametric, but it differs in an important way from most other parametric methods. The philosophy behind the new method is that by understanding why the value of a pixel is different in different images, one is able to match two images in a more 'natural' way, whereas the other techniques are of a more ad hoc nature. Understanding the differences is, however, not enough: one should also be able to model the differences mathematically. This is exactly what will be done. In this paper we will model: compression of the breast (Panel III in Fig. 1); exposure
Fig. 1. Modeled steps of acquisition for screen-film mammography, going from a pixel-value in Image A (Panel I) to the corresponding pixel-value in Image B (Panel IV). The dotted line shows the path of one exemplar pixel-value. Panels I and IV illustrate probability density functions (pdfs) of gray values; Panel II shows the characteristic curves of the two films (log(exposure) plotted against optical density, i.e., blackening of the film); Panel III gives an exaggerated transformation of log(exposure) (by compression and exposure-time differences). Panels II and III are short-wired in Fig. 2, where, instead of the separate steps, the gross transformation between pixel-values is given.
Fig. 2. Histogram transformation between the Images A and B from Fig. 1.
time (Panel III); and radiation intensity of the incident beam (Panel III). For analog images we also model the characteristic film curve (Panel II) and the process of digitization (Panel II to I and Panel II to IV); finally, for unprocessed digital images we model the relation between exposure and pixel-value. By combining the chain of events in Fig. 1 we get the objective transformation for two screen-film images, which is depicted in Fig. 2. The many unknown acquisition parameters in the model condense to fewer degrees of freedom at the end; the separate acquisition parameters of the sub-models can therefore not be retrieved from the (fitted) parameters.
2 Methods
Given two pixel-value distribution functions d_A(g) and d_B(g), we try to find the look-up table T_{A,B}(g) (or, equivalently, T_{B,A}(g)) that maps a pixel-value in the first image, labeled A, to a pixel-value in the second image, labeled B. A standard approach is to demand that the cumulative distribution functions D_A(g) and D_B(g) are similar after the transformation. For image pairs without a global contrast reversal (analog → analog and digital → digital) this means

D_A(g_A) = \int_{\gamma=0}^{g_A} d_A(\gamma) d\gamma = \int_{\gamma=0}^{g_B = T_{A,B}(g_A)} d_B(\gamma) d\gamma = D_B(g_B = T_{A,B}(g_A)).   (1)

For image pairs with a global contrast reversal (analog ↔ digital) the integration limits in the second integral run from T_{A,B}(g_A) to the maximum pixel-value, i.e., D_A(g_A) = 1 − D_B(g_B = T_{A,B}(g_A)). Several ad hoc methods exist that match two histograms, e.g., by shifting the overall brightness, T_{A,B}(g) = g − g_0, or by also changing the contrast, T_{A,B}(g) = c(g − g_0). The advantage of these is that they are largely independent of the way the images are obtained: they can be applied to normal photographs, infrared images, or any other kind of image pair of the same modality. But, in contrast to the proposed method, most ad hoc methods are not suited for histogram matching on a combination of a digital and an analog mammogram; the non-parametric methods are an exception to this. In the proposed method the transformation is based on a model of acquisition (Sect. 2.2). This model allows a more natural and accurate registration of pixel-values (Sect. 2.3) than ad hoc registration methods.
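For reference, a minimal sketch of the standard non-parametric matching implied by equation (1) is given below. It assumes non-negative integer images quantised to n bits and builds the look-up table by inverting the cumulative distribution of Image B.

# A sketch of non-parametric histogram matching via equation (1).
# Assumes non-negative integer images quantised to n bits. For
# contrast-reversed (analog <-> digital) pairs, one would match against
# the reversed cumulative distribution 1 - D_B instead.
import numpy as np

def match_histograms(a, b, n_bits=12):
    levels = 2 ** n_bits
    cdf_a = np.cumsum(np.bincount(a.ravel(), minlength=levels)) / a.size
    cdf_b = np.cumsum(np.bincount(b.ravel(), minlength=levels)) / b.size
    # For each g_A, the smallest g_B with D_B(g_B) >= D_A(g_A).
    lut = np.minimum(np.searchsorted(cdf_b, cdf_a), levels - 1)
    return lut[a]  # Image A expressed on the gray scale of Image B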
2.1 Pre-processing: Geometric Registration
It is essential that the two histograms that are to be matched are based on the same tissue as far as possible. If this is not true (for example, if in one image the pectoral muscle is visible and in the other image it is not), then the histograms can never be correctly matched. The same holds for the background of the image: if in one image a larger part of the background is visible than in the other, then matching has no significance. This is solved by first
spatially mapping one image onto the other (geometric registration) by global translation, rotation, scaling, and shearing, and subsequently selecting corresponding areas of tissue. Only the tissue that is simultaneously visible in both images is taken into account in the histograms. Once the pixel-value transformation between the images is determined (in Sect. 2.4 or Sect. 2.5), the entire image can be matched. In order to find corresponding tissue in the images it is not required that the geometric registration is very accurate; finding a rough outline is enough for that purpose. In Sect. 2.5, however, we also require a pixel-to-pixel correspondence, which makes geometric registration a somewhat more critical part of histogram matching. The demands would be even higher in subtraction radiology, but the aforementioned affine transformations are sufficient as a pre-processing step for gray scale registration; non-rigid registration goes further than necessary. Geometric registration is performed by optimization of the entropy correlation coefficient [5] (or, equivalently, the normalized mutual information). In the implementation, the pixel-value range is down-sampled to 6 bits and the pixel-value histograms are equalized before registration. The entropy correlation coefficient is then optimized on the 64 × 64 grid of pixel-value combinations. After a first pass on the (segmented) tissue area, a second pass is made on the corresponding regions of tissue for fine-tuning.
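A sketch of the entropy correlation coefficient used for this geometric registration follows; it assumes the two images have already been down-sampled and equalized as described, and scores one candidate alignment from the 64 × 64 joint histogram.

# A sketch of the entropy correlation coefficient of Maes et al. [5]:
# ECC = 2 I(A;B) / (H(A) + H(B)), computed from a 64 x 64 joint
# histogram of (already down-sampled, equalized) pixel values.
import numpy as np

def ecc(a, b, bins=64):
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p = joint / joint.sum()
    pa, pb = p.sum(axis=1), p.sum(axis=0)        # marginal distributions
    H = lambda q: -np.sum(q[q > 0] * np.log(q[q > 0]))
    mi = H(pa) + H(pb) - H(p.ravel())            # mutual information
    return 2.0 * mi / (H(pa) + H(pb))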
2.2 Acquisition
In the following, some of the most relevant steps in the acquisition that influence pixel-value differences are modeled. Combining the sequence of steps as visualized in Fig. 1 then gives the functional relation between a pixel-value in one image and the pixel-value in the other image (see Fig. 2).

Compression, Exposure Time, and Incident Radiation Intensity: For compression and incident radiation intensity, a simple approximation is used that takes exposure differences due to tissue-thickness differences into account. In a model for mono-energetic X-rays, the intensity attenuation due to the breast tissue is related to the breast thickness h(r) by i(r) = i_0 exp(−µ(r) h(r)), where i_0 is the radiation intensity of the incident beam and µ(r) is the mean attenuation per unit of length for X-rays at location r. This relation is known as Beer's law [6]. The exposure at location r on the film is obtained by multiplying the radiation intensity i(r) by the exposure time t, i.e.,

b(r) = t i_0 exp(−µ(r) h(r)).   (2)
The relation between the exposure in one image, b_A(r), and the exposure in the other image, b_B(r), is obtained by assuming that the attenuation coefficient remains unchanged at corresponding locations of the images. We furthermore
Fig. 3. Some empirically obtained sensitometric curves (for Kodak MIN-R2000 and Agfa HDR films) with their best fitting logistic functions (see (4)).
assume that the fraction h_B(r)/h_A(r) does not depend on the location r. If the projection-angle differences of the X-rays are small then, at least for the interior part of the breast, this is justified, because the breast is compressed between two parallel compression plates. Also for the region near the breast edge, where the breast bulges out, this assumption is reasonable, because a proportional decrease of thickness will occur in both images. After µ_A(r) is equated to µ_B(r), the following linear relation is obtained:

ln b_B(r) = Δh · ln b_A(r) + Δb_0,   (3)

where Δh = h_B/h_A and Δb_0 = ln(t_B i_{0,B}) − Δh · ln(t_A i_{0,A}) are unknown constants.
If the acquisition settings are similar then Δh ≈ 1 and Δb_0 ≈ 0, and thus b_A ≈ b_B. We do not actually need Δh and Δb_0 in what follows, because they will be absorbed by newly defined parameters of the transformation model.

Screen-Film Mammography: Characteristic Film Curve. The characteristic curve, also called the HD-curve after Hurter and Driffield [7], gives the relationship between log(exposure) = log b(r) and the corresponding optical density, od, after development of the film (see Panel II in Fig. 1). In this paper the logistic function is used as a model for the characteristic film curve:

od(b) = od_min + (od_max − od_min) / (1 + e^{−β(ln b − α)}).   (4)
The characteristic curves for two types of film are drawn in Fig. 3, along with their best fitting logistic functions (in the least-squares sense).
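Such a least-squares fit of the logistic model (4) can be sketched as follows; the measurement arrays here are hypothetical stand-ins for sensitometric data like that of Fig. 3.

# A sketch of fitting the logistic characteristic curve (4) in the
# least-squares sense. `ln_b` and `od` are hypothetical sensitometric
# measurements (log-exposure, optical density) for one film type.
import numpy as np
from scipy.optimize import curve_fit

def hd_curve(ln_b, od_min, od_max, alpha, beta):
    return od_min + (od_max - od_min) / (1.0 + np.exp(-beta * (ln_b - alpha)))

ln_b = np.linspace(0.0, 2.5, 15)
od = hd_curve(ln_b, 0.2, 4.0, 1.2, 3.0) + 0.02 * np.random.randn(15)
(od_min, od_max, alpha, beta), _ = curve_fit(hd_curve, ln_b, od,
                                             p0=[0.2, 4.0, 1.0, 2.0])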
Digitization. It is assumed that the relation between optical densities, od, and pixel-values, G, is linear (for n-bit digitization: 0 ≤ G ≤ 2^n − 1). Some digitizers produce pixel-values that have an exponential relation with optical density, i.e., a linear relation with luminance; however, after converting the pixel-values to a linear relation with optical densities, the following is still valid. For convenience, the digitization range is defined relative to the dynamic range of the characteristic curve, i.e., relative to the range od_min to od_max. We define od_min + ν(od_max − od_min) as the maximum optical density that is digitized, i.e., the one mapped to the minimum pixel-value, and od_min + (ν − φ)(od_max − od_min) as the minimum optical density that is digitized, i.e., the one mapped to the maximum pixel-value. Both new parameters, ν and φ, are larger than zero, and if the dynamic range of the characteristic curve is digitized exactly then both are one. The linear relation between normalized pixel-values, g, and optical densities, od, is given by

g(od) = G / (2^n − 1) = (1/φ) (ν − (od − od_min)/(od_max − od_min)) ∈ [0, 1].   (5)

The reason for the seemingly odd choice of ν and φ becomes clear after (4) is substituted into (5), which gives the combined film-digitizer characteristic curve

g(b) = (1/φ) (ν − 1/(1 + e^{−β(ln b − α)})).   (6)

Note that the dynamic-range parameters, od_min and od_max, cancel out. By incorporating digitization ranges, not all pixel-values are legal anymore in the model. All pixel-values with ν − φg ∈ [0, 1] correspond to optical densities in the dynamic range of the characteristic curve; all other pixel-values should not actually exist. If ν − φ > 0 or ν < 1 then not the whole dynamic range of the characteristic curve is digitized; in those cases we must reckon with clipping of large and small pixel-values, respectively.

Digital Mammography: For digital mammography, the relation between exposure and pixel-value is usually, to a good approximation, linear over a wide exposure range. Instead of (6) we have

g(b) = G / (2^n − 1) = γ b ∈ [0, 1].   (7)

2.3 Look-up Table for Histogram Matching
The subsequent steps of a histogram transformation between two screen-film mammograms are (see also Fig. 1): pixel-value A → (inverse digitization) → optical density A → (inverse film curve) → log(exposure) A → (compression, et cetera) → log(exposure) B → (film curve) → optical density B → (digitization) → pixel-value B. Combining (3) and (6) gives, for two screen-film Images A and B,
Fig. 4. Some example transformations for a few parameter settings. The full dynamic range of the characteristic curves is digitized for the analog images, i.e., ν = φ = 1. From left to right: transformations between two screen-film mammograms (see (8)), with curves (a) λ=5.0, σ=1.0; (b) λ=0.2, σ=1.0; (c) λ=1.0, σ=2.0; (d) λ=1.0, σ=0.5; (e) λ=1.0, σ=1.0; transformations between two digital mammograms (see (9)), with curves (a) λ=2.0, σ=1.0; (b) λ=0.5, σ=1.0; (c) λ=1.0, σ=2.0; (d) λ=1.0, σ=0.5; (e) λ=1.0, σ=1.0; and transformations between a digital mammogram and a screen-film mammogram (see (10)), with curves (a) λ=0.1, σ=−1.0; (b) λ=0.1, σ=−2.0; (c) λ=0.1, σ=−4.0.
g_B = T_{A,B}(g_A) = (1/φ_B) (ν_B − 1 / (1 + λ (1/(ν_A − φ_A g_A) − 1)^σ)),   (8)

where the new parameters λ and σ contain previously defined acquisition parameters. Similarly, for digital → digital histogram transformations we have, by (3) and (7),

g_B = T_{A,B}(g_A) = λ g_A^σ,   (9)

for digital → analog histogram transformations we have, by (3), (6) and (7),

g_B = T_{A,B}(g_A) = (1/φ_B) (ν_B − 1/(1 + λ g_A^σ)),   (10)

and finally, for analog → digital histogram transformations we have, by (3), (6) and (7),

g_B = T_{A,B}(g_A) = λ (1/(ν_A − φ_A g_A) − 1)^σ.   (11)

Although the same symbols λ and σ are used in the previous four equations, their meanings and valid ranges differ. For screen-film images these relations are valid when the pixel-values correspond to optical densities in the dynamic range of the characteristic curve, i.e., when 0 ≤ ν − φg ≤ 1; otherwise clipping of pixel-values takes place.
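The four look-up-table families can be coded directly. The sketch below follows (8) to (11); setting ν = φ = 1 recovers the restricted model of Sect. 2.4.

# A sketch of the four transformation families of equations (8)-(11),
# on normalized pixel values g in [0, 1]. lam and sigma absorb the
# acquisition constants; nu and phi are the digitization-range
# parameters of the analog images.
import numpy as np

def T_analog_analog(gA, lam, sigma, nuA=1.0, phiA=1.0, nuB=1.0, phiB=1.0):
    return (nuB - 1.0 / (1.0 + lam * (1.0 / (nuA - phiA * gA) - 1.0) ** sigma)) / phiB   # (8)

def T_digital_digital(gA, lam, sigma):
    return lam * gA ** sigma                                                             # (9)

def T_digital_analog(gA, lam, sigma, nuB=1.0, phiB=1.0):
    return (nuB - 1.0 / (1.0 + lam * gA ** sigma)) / phiB                                # (10)

def T_analog_digital(gA, lam, sigma, nuA=1.0, phiA=1.0):
    return lam * (1.0 / (nuA - phiA * gA) - 1.0) ** sigma                                # (11)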
2.4 Matching by Restricted Model
In this section we suppose that only the dynamic range of the characteristic curve is digitized. We therefore set ν_A = φ_A = ν_B = φ_B = 1 and estimate only the values of σ and λ. The estimate is used as a first guess for the iterative process described in Sect. 2.5. It is remarked that if the digitizer calibration and the minimum and maximum optical densities of the screen-film system are known, pixel-values can be rescaled before processing to meet this requirement. Equation (1) is used to estimate the two unknown parameters λ and σ by demanding that it is simultaneously fulfilled for two quantiles. For, let us say, the 25% and 75% percentiles, we have the following system of equations:
g_B^{(25%)} = T_{A,B}(g_A^{(25%)});
g_B^{(75%)} = T_{A,B}(g_A^{(75%)}).   (12)

The 25% and 75% percentiles are just exemplary and can be changed into any other combination of non-equal quantiles. In the case of analog ↔ digital transformations, the cumulative distribution function of one image should be inverted before the quantiles are determined, to take care of the opposite sign in the contrasts (see also the comment after equation (1)); hence g_B^{(P%)} → g_B^{(100%−P%)}. Note that all four transformations can be written in the same form:
g_B/(1 − g_B) = λ (g_A/(1 − g_A))^σ,   analog A → analog B (by (8));
g_B = λ g_A^σ,   digital A → digital B (by (9));
g_B/(1 − g_B) = λ g_A^σ,   digital A → analog B (by (10));
g_B = λ (g_A/(1 − g_A))^σ,   analog A → digital B (by (11)).
If we substitute the monotonically increasing functions

ξ = ln(g/(1 − g)) for analog images;   ξ = ln(g) for digital images,   (13)

(where we omit the subscript labels A and B and the superscript labels (25%) and (75%)), then the general solution of (12) for all four transformations is given by

λ = exp( (ξ_B^{(25%)} ξ_A^{(75%)} − ξ_A^{(25%)} ξ_B^{(75%)}) / (ξ_A^{(75%)} − ξ_A^{(25%)}) );
σ = (ξ_B^{(75%)} − ξ_B^{(25%)}) / (ξ_A^{(75%)} − ξ_A^{(25%)}).   (14)
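A sketch of this closed-form first guess is given below, assuming normalized pixel values strictly inside (0, 1).

# A sketch of the restricted-model estimate of equations (12)-(14).
# For contrast-reversed (analog <-> digital) pairs, the quantiles of
# one image must first be inverted (P -> 1 - P), as noted in the text.
import numpy as np

def xi(g, analog):
    return np.log(g / (1.0 - g)) if analog else np.log(g)

def restricted_fit(gA, gB, analog_A, analog_B, q=(0.25, 0.75)):
    xA = xi(np.quantile(gA, q), analog_A)
    xB = xi(np.quantile(gB, q), analog_B)
    sigma = (xB[1] - xB[0]) / (xA[1] - xA[0])
    lam = np.exp((xB[0] * xA[1] - xA[0] * xB[1]) / (xA[1] - xA[0]))
    return lam, sigma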
2.5 Matching by Entire Model
We stated before that histogram matching is typically a one-dimensional process. This is, however, only partly true here: it is true that a one-dimensional look-up table is developed, but its development is based on a two-dimensional distribution of joint pixel-values.
The joint distribution of pixel-value pairs should ideally have all its mass on a one-dimensional curve after geometric registration. Moreover, this curve should be a monotonically increasing (or decreasing) function; this is a direct consequence of (1). This curve in fact describes the transformation (g_A, T_{A,B}(g_A)) that we are trying to recover, so in the ideal case histogram matching is indeed a one-dimensional process. In practice, however, the joint distribution has mass scattered around the curve we would ideally expect. This is due to errors in geometric registration, but also to physical changes that occurred between the times at which the mammograms were taken. If the geometric registration is reliable then the models in (8) to (11) can be fitted to the distribution of pixel-value pairs. This imposes larger demands on the quality of geometric registration than in the previous section: there, geometric registration was only used to define the regions of overlap between the images, and it was assumed that within these regions tuples of pixels could in principle be registered; the tuples themselves were not explicitly needed. Now the joint distribution of pixel-value pairs is used to fit the model, and we require more of the registration. Although only affine transformations are used in the geometric registration (Sect. 2.1), we found that the joint distribution of pixel-values is usually precise enough for fitting our model to it; better results are expected, however, when non-rigid registration of the images is employed. The matching is implemented as follows. After geometric registration, a two-dimensional distribution, d_{A,B}(g_A, g_B), is constructed by sampling the pixel-values g_A and g_B of the registered pixels. The transformation T_{A,B}(g_A) (one of (8) to (11)) is subsequently fitted to this distribution by minimizing the sum of squared errors:

arg min_{parameters of T_{A,B}} \sum_{g_A} \sum_{g_B} d_{A,B}(g_A, g_B) (g_B − T_{A,B}(g_A))^2.   (15)
This requires an iterative optimization routine, in which we used the results of the previous section for a first guess of parameters.
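A sketch of such a fit, with the joint histogram as weights and the restricted-model estimate of Sect. 2.4 as the starting point, might look as follows.

# A sketch of the full-model fit of equation (15). `joint` is the 2D
# histogram d_{A,B}; `T` is one of the transformation families
# (8)-(11); `p0` is the first guess from the restricted model.
import numpy as np
from scipy.optimize import minimize

def fit_entire_model(joint, T, p0):
    nA, nB = joint.shape
    gA = (np.arange(nA) + 0.5) / nA   # bin centres, normalized pixel values
    gB = (np.arange(nB) + 0.5) / nB
    def sse(params):
        pred = T(gA, *params)                        # T_{A,B}(g_A) per bin
        return np.sum(joint * (gB[None, :] - pred[:, None]) ** 2)
    return minimize(sse, p0, method="Nelder-Mead").x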
3 Results
In Fig. 5, some typical results are given for two types of transformations: between an unprocessed digital image and a screen-film image, and between two screen-film images. Figure 6 shows the transformation curves and the cumulative distribution functions for these two examples. Given the simplicity of the model, it is striking how well the theoretical transformation curves are able to transform the pixel-values of one image to the other; this says something about the correctness of the underlying assumptions. On the other hand, it is worrying how closely the digitization ranges are fitted to extremely small and large pixel-values: for lack of many clipped pixel-values in the data, the merit function misuses all its freedom to scale pixel-values.
Fig. 5. Example transformations. The first row gives the transformation of an unprocessed digital image (left) to a screen-film image made on the same day (right); the second row gives the transformation between two screen-film images (made with an interval of two years in between). See also Fig. 6.
analog → analog
cdf(g )
A,B
T
B
cdf(g )
A,B
B
1
1
0
gB
1
gB
1
0 0
1
1
0 0
1
0
1
cdf(gA)
1
cdf(gA)
1
0 0
0
0 0
1 gA
0
1 gA
Fig. 6. Distribution functions of pixel-values for the two transformations in Fig. 5. The cumulative distribution functions (cdfs) of Images A are given in the lower-left quadrants; the cdfs of Images B are given in the upper-right quadrants. The dotted line is the cdf of the transformed Image A; the transformations T_{A,B} are given in the upper-left quadrants.
4 Conclusion
A method of parametric histogram matching for pairs of mammograms has been proposed. The technique is developed for combinations of unprocessed digital and screen-film mammograms. The method is completely based on a model of acquisition, which makes it rather insusceptible to differences between the images that are not due to acquisition; such differences do not greatly degrade the quality of the transformed images. It is as yet unclear what the impact of gray scale registration will be on the ability of radiologists to read and compare mammograms of different image modalities. Without gray scale registration it is likely that a trained radiologist will be able to make the necessary adjustments naturally, given an initial intensity inversion, windowing and gamma correction of the digital mammogram. The main advantage of histogram matching is that fewer manual adjustments are required to optimally display the mammograms: the same adjustments that optimally display one image will probably also display the other image in an optimal way.
References
1. Sallam, M.Y., Bowyer, K.W.: Registration and difference analysis of corresponding mammogram images. Medical Image Analysis 3(2) (1999) 103–118
2. Wirth, M.A., Narhan, J., Gray, D.: Non-rigid mammogram registration using mutual information. Proc. SPIE Medical Imaging 2002: Image Processing, Vol. 4684 (2002) 562–573
3. Thurfjell, M.G., Vitak, B., Azavedo, E., Svane, G., Thurfjell, E.: Effect on sensitivity and specificity of mammography screening with or without comparison of old mammograms. Acta Radiologica 41(1) (2000) 52–56
4. Burnside, E.S., Sickles, E.A., Sohlich, R.E., Dee, K.E.: Differential value of comparison with previous examinations in diagnostic versus screening mammography. American Journal of Roentgenology 179(5) (2002) 1173–1177
5. Maes, F., Collignon, A., Vandermeulen, D., Marchal, G., Suetens, P.: Multimodality image registration by maximisation of mutual information. IEEE Transactions on Medical Imaging 16(2) (1997) 187–198
6. Barrett, H.H., Swindell, W.: Radiological Imaging: the Theory of Image Formation, Detection, and Processing. Vols. I and II (1981), Academic Press, Inc.
7. Hurter, F., Driffield, V.C.: Photo-chemical investigations and a new method of determination of the sensitiveness of photographic plates. The Journal of the Society of Chemical Industry (1890) 455–469
Constructing Diffeomorphic Representations of Non-rigid Registrations of Medical Images Carole Twining and Stephen Marsland Imaging Science and Biomedical Engineering, University of Manchester Manchester M13 9PT, U.K.
Abstract. The analysis of deformation fields, such as those generated by non-rigid registration algorithms, is central to the quantification of normal and abnormal variation of structures in the registered images. The correct choice of representation is an integral part of this analysis. This paper presents methods for constructing a general class of multidimensional diffeomorphic representations of deformations. We demonstrate that these representations are suitable for the description of deformations of medical images in 2 and 3 dimensions. Furthermore, we show that the non-Euclidean metric inherent in this representation is superior to the usual ad hoc Euclidean metrics in that it enables more accurate classification of legal and illegal variations.
1 Introduction
Non-rigid registration algorithms [4,6,12,19] automatically generate dense (i.e., pixel-to-pixel or voxel-to-voxel) correspondences between pairs and sets of images, with the aim of aligning analogous 'structures'. The deformation fields implicit in this correspondence contain information about the variability of structures across the set. In order to analyse this variability quantitatively, we need to be able to analyse the set of deformation fields. Such analysis must be based (either implicitly or explicitly) on a particular mathematical representation of the deformation field. Previous work on the analysis of shape variability has used a range of representations; examples include polygonal [8] or spline [2] representations based on a small set of corresponding points (landmarks), Fourier representations [20] or spherical harmonics [3], medial-based representations [17], or combinations of these [21]. The importance of the choice of representation is demonstrated by the fact that explicitly optimizing the representation can lead to improved model performance [9]. Recent work on modelling dense 2D and 3D deformation fields has either used the densely-sampled deformation vectors directly (e.g., [13,15]) or employed a smooth, continuous representation of them (e.g., [18]). Neither of these methods guarantees that the deformation field is diffeomorphic (although B-splines can be guaranteed diffeomorphic given certain non-trivial constraints on the control-point displacements [7]).
Joint first authors. Email: {carole.twining,stephen.marsland}@man.ac.uk
We contend that the appropriate representation should be continuous and diffeomorphic, as only a diffeomorphic representation allows an unambiguous one-to-one correspondence between all points in any pair of images. Where such a correspondence is not actually physically meaningful (e.g., in the case where additional structures such as tumours appear), this should be indicated by the warp parameters assuming atypical values. When we are considering the correspondence between discrete and bounded objects such as brains, it is also desirable that the warps themselves should be discrete and bounded. This leads us to suggest that a suitable representation is that of the group of continuous diffeomorphisms with some appropriate set of boundary conditions. Such a representation can be constructed using an approach based on Geodesic Interpolating Splines (GIS) [5]. In previous work [5,16,23] it has been shown that this approach also allows the construction of a metric on the diffeomorphism group. In this paper, we demonstrate the construction of these diffeomorphic representations using a variety of spline bases. We show that these representations generate warps that are suitable for the task in hand, giving biologically 'plausible' warps in both two and three dimensions, whilst being of relatively low dimensionality. We further study the significance of the metric (geodesic) distances between warps, and show that using it provides a measure of atypical variation that has greater discriminatory power than naïve measures based on the ad hoc use of a Euclidean metric on the space of warp parameters.
2 The Geodesic Interpolating Spline

2.1 Interpolating Splines
Consider a vector-valued spline function f(x), x ∈ R^n, that interpolates between data values at a set of knotpoints {x_i : i = 1 to N}, where f(x_i) = f_i. We will restrict ourselves to the class of splines that can be expressed as the minimiser of a functional Lagrangian of the form:

E[f] = \int_{R^n} dx (Lf(x))^2 + \sum_{i=1}^{N} λ_i (f(x_i) − f_i),   (1)
where L is some scalar differential operator. The first term in the Lagrangian is the smoothing term; the second term, with the Lagrange multipliers {λ_i}, ensures that the spline fits the data at the knotpoints. The choice of operator L and boundary conditions defines a particular spline basis. The general solution can be written in the form:

f(x) = g(x) + \sum_{i=1}^{N} α_i G(x, x_i),   (2)
where the affine function g is a solution of

Lg(x) = 0,   (3)

the Green's function G is a solution of

L†L G(x, y) ∝ δ(x − y),   (4)

and L† is the Lagrange dual of L. The table below gives a selection of commonly-used Green's functions; D^n denotes the unit ball in R^n, and ∂D^n is its boundary.
The choice of Green’s function depends on the boundary conditions and smoothness appropriate to the problem considered. For example, the CPS Green’s function is useful for discrete objects such as brains, whereas an image of knee cartilage would require asymptotically linear boundary conditions. Name
Dim
thin-plate [10] even (TPS) thin-plate [10] odd (TPS) biharmonic 2 clamped plate [1,23] (CPS) triharmonic 3 clamped plate [1] (CPS) Gaussian
2.2
L† L
Boundary conditions on f (x) asymptotically linear asymptotically linear f =f =0 on and outside ∂D2
(∇2 )2 (∇2 )2
G(x, y) x − y4−n log x − y x − y4−n
x − y2 A2 − 1 − log A2 , √ 2 2 x y −2x·y+1 A(x, y) = x−y (∇2 )3 f =f =0 x − y A + A1 − 2 , √ 2 2 on and outside ∂D3 x y −2x·y+1 A(x, y) = x−y 2 n exp − ∇ asymptotically exp −βx − y2 4β linear (∇2 )2
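For concreteness, the clamped-plate and Gaussian Green's functions from the table can be coded directly; the sketch below assumes points strictly inside the unit ball, with x ≠ y.

# A sketch of Green's functions from the table: the biharmonic (2D) and
# triharmonic (3D) clamped-plate splines of Boggio [1], and the
# Gaussian. Points x, y must lie inside the unit ball, with x != y.
import numpy as np

def _A(x, y):
    r = np.linalg.norm(x - y)
    return np.sqrt(np.dot(x, x) * np.dot(y, y) - 2 * np.dot(x, y) + 1) / r

def green_cps_2d(x, y):
    A = _A(x, y)
    return np.dot(x - y, x - y) * (A**2 - 1 - np.log(A**2))

def green_cps_3d(x, y):
    A = _A(x, y)
    return np.linalg.norm(x - y) * (A + 1.0 / A - 2.0)

def green_gaussian(x, y, beta):
    return np.exp(-beta * np.dot(x - y, x - y))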
2.2 The Affine Function
Taking the general form of f(x) given above and substituting into equation (1), we obtain:

E[{α_k}] = \sum_{i,j} G(x_i, x_j) α_i · α_j = G_{ij} α_i · α_j.   (5)

The spline coefficients {α_i} and the affine function (see equation (3)) are then obtained by optimising this form of the energy function with the set of constraints g(x_i) = f_i − G_{ij} α_j. For operators where the boundary conditions are that the deformation is asymptotically linear, the affine function is the general linear function; using the notation of Camion and Younes [5], this can be written (in n dimensions) in the form:

Q = ( x_1^1  x_1^2  …  x_1^n  1
       ⋮                      ⋮
      x_N^1  x_N^2  …  x_N^n  1 ),   g^µ(x_i) = Q_{ia} γ_a^µ,   µ = 1, …, n;  a = 1, …, n + 1,   (6)

where {x_i^µ} are the coordinates of the i-th knotpoint, and the affine parameters {γ_a^µ} are combinations of the scaling parameter, the rotation angle(s), and the coordinate translations. Solving for {α_i} and {γ_a^µ}, we then obtain:

γ_a^µ = (Q^T G^{−1} Q)^{−1}_{ab} Q^T_{bi} G^{−1}_{ij} f_j^µ,   α_i^µ = G^{−1}_{ij} (f_j^µ − Q_{ja} γ_a^µ).   (7)

For the case of the clamped-plate spline, we take the affine function to be that defined in equation (3), with the boundary conditions being imposed after the affine alignment.
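A direct sketch of the solve in (7), for one coordinate µ of the displacements, is given below.

# A sketch of equation (7): solve for the affine parameters gamma and
# the spline coefficients alpha, given the Green's matrix G, the
# matrix Q of (6), and one coordinate of the knotpoint displacements f.
import numpy as np

def solve_spline(G, Q, f):
    """G: (N, N) Green's matrix; Q: (N, n+1); f: (N,) displacements."""
    Ginv_f = np.linalg.solve(G, f)
    Ginv_Q = np.linalg.solve(G, Q)
    gamma = np.linalg.solve(Q.T @ Ginv_Q, Q.T @ Ginv_f)
    alpha = np.linalg.solve(G, f - Q @ gamma)
    return gamma, alpha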
Let us now consider using the splines defined above to represent the densely-sampled deformation field of an image, where a point at original position x is warped to a position x + f(x). There will be some set of knotpoints (with associated initial and final positions) and knotpoint displacements for which the interpolated displacement field exactly matches our given displacement field at all sample points. We will call such a set of knotpoints and knotpoint displacements a representation of the deformation field f(x). However, such an interpolated deformation field is only guaranteed to be diffeomorphic in the limit of sufficiently small displacements (see Fig. 1, plots (iii) and (vi), for examples).
2.3 Geodesic Interpolating Splines
The usual approach to constructing a large-deformation diffeomorphism is to consider such a deformation as an infinite sequence of infinitesimal deformations [5,11,14,22]; that is, we have an infinite sequence of the spline part generated by the Green's function G, and an infinite sequence of infinitesimal affine transformations. We have a flow-time t, where our knotpoints now follow paths {x_i(t) : t ∈ [0, 1]}, with an associated energy (the generalisation of equation (5)):

E[α_k(t)] = \int_0^1 dt G(x_i(t), x_j(t)) (α_i(t) · α_j(t)).   (8)
The spline parameters {α_i(t)} and the affine parameters {γ_i^µ(t)} are now related by the obvious generalisation of equation (7), where the displacements {f_i^µ} are replaced by the velocities of the knotpoints v_i^µ(t) = dx_i^µ(t)/dt. Note that we no longer have an exact solution, since the knotpoint paths are only constrained at their end-points. We therefore have to numerically optimise the expression for the energy in equation (8), where the free variables are the knotpoint paths between their fixed end points. We use the optimisation scheme previously described in [23]. For an example of the curved optimised paths, see Fig. 1. As in Sect. 2.2, we take the initial and final knotpoint positions to be a representation of the diffeomorphism generated by the interpolation. We denote such a geodesic interpolating spline (GIS) diffeomorphism by ω({x_i(0)}, {x_i(1)}). As was shown by Camion and Younes [5], who considered using the GIS for inexact landmark matching, the optimised value of the energy has the important property that it can be considered as the square of a geodesic distance function d on the group of diffeomorphisms. That is, E_opt(ω) = d²(e, ω), where e is the identity element of the group. This distance function has the important property that it is invariant under (left-) multiplication by an arbitrary group element [5]. This means that, if we are considering the pairwise geodesic distances between a set of warps generated by non-rigid registration of a set of images, then the calculated distances are independent of the choice of reference image. We also note that this metric provides us with a principled way of defining warps that interpolate between any two given warps [23]; the flow in the space of diffeomorphisms gives a geodesic on the space of warps, and the geodesic distance allows us to calculate a warp on this geodesic halfway between the two initial warps.
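A sketch of the discretised path energy that such a numerical optimisation evaluates is given below; `G_of` and `Q_of` are hypothetical helpers that assemble the Green's matrix and the matrix Q of (6) at the current knotpoint positions, and `solve_spline` is the helper sketched after equation (7).

# A sketch of the discretised path energy of equation (8): knotpoint
# paths sampled at S time steps, velocities by finite differences, and
# the alphas recovered from the velocities via the generalisation of
# (7). Assumes solve_spline (sketched earlier) is in scope.
import numpy as np

def path_energy(paths, G_of, Q_of, S):
    """paths: array (S+1, N, n) of knotpoint positions over flow time."""
    E, dt = 0.0, 1.0 / S
    for s in range(S):
        x = paths[s]
        v = (paths[s + 1] - paths[s]) / dt          # knotpoint velocities
        G, Q = G_of(x), Q_of(x)
        alpha = np.column_stack([solve_spline(G, Q, v[:, m])[1]
                                 for m in range(v.shape[1])])
        E += dt * np.einsum("ij,id,jd->", G, alpha, alpha)
    return E

The free variables of the optimisation are then the interior samples paths[1], ..., paths[S-1], with the end-points held fixed.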
Fig. 1. Example random displacements with geodesic and non-geodesic spline interpolants. (i) Initial knotpoint positions (black circles), final positions (grey circles) and the knotpoint paths (grey lines). (ii) The GIS Gaussian warped image and knotpoints. (iii) The non-geodesic thin-plate spline warp. (iv) The rotated displacements. (v) The GIS Gaussian warp. (vi) The non-geodesic thin-plate spline warp.
As an example, we consider the diffeomorphisms generated by the motion of a small set of knotpoints with random displacements. We use the Gaussian Green's function with a fixed value of the parameter β. The initial knotpoint positions are shown in plot (i) of Fig. 1. For each random set of knotpoint displacements, we also consider the same displacements but with an added affine transformation: a rotation of π/8 about the central point (shown in plot (iv) of the figure). This additional part of the transformation should not affect the computed geodesic distance of the warp from the identity, as it is purely affine; it should instead appear in the flow in the space of affine functions. The GIS-warped versions of the grids are shown in plots (ii) and (v). For comparison, we also give the thin-plate spline interpolant (plots (iii) and (vi)) of the same deformation fields (i.e., NOT the geodesic interpolating spline). It can be seen clearly that both of these non-geodesic warps fold and are therefore not diffeomorphic, in contrast to both of the GIS warps (plots (ii) and (v)), which do not fold despite the large knotpoint displacements. We compared the computed geodesic distances for a set of 98 random displacements with and without a rotation. The computed geodesic distances ranged from 0.05 to 0.9 across the set of examples. The geodesic distances with and without rotation were almost identical for all 98 examples, with a mean fractional difference between the values of less than 2% and a correlation coefficient between the values of 0.9982; we would not expect the values to be absolutely identical, given the numerical errors inherent in the finite-dimensional representation of the knotpoint paths.
Fig. 2. An example 2-D brain slice with the bounding circle.
Fig. 3. Left: Annotation (white line) and knotpoints (white circles) on the original brain slice. Right: The same knots positioned on another brain slice.

3 Representing Diffeomorphisms

3.1 Representing 2D Diffeomorphisms
When considering warps of 2D biological images, it is obviously important that the generated warps are not only diffeomorphic, but also biologically plausible. To investigate this, we considered a set of 2D T1-weighted MR axial slices of brains, where the slices chosen show the lateral ventricles. For each image, the positions of the lateral ventricles and the skull were annotated by a radiologist using a set of 163 points. We took a subset of 66 of these points to be the positions of our knots (see Fig. 3). Given a pair of images, the knotpoint positions on the images give us the initial and final positions of our knotpoint paths. We then calculated the geodesic interpolating spline warp corresponding to these positions using the 2D clamped-plate spline as Green’s function. The bounding circle for the spline is as shown in Fig. 2. Note that we did not affinely align the knots before calculating the warp; hence the algorithm has to deal with a non-trivial pseudo-affine part. Including the affine part would have made the task easier. Example results are shown in Fig. 4. It can be seen that the warps are indeed diffeomorphic, and appear to be very smooth – each of the brain slices still looks biologically plausible, which would not be true had a simpler scheme been used. The warped images are not resampled – the images are instead plotted as coloured surfaces, so that the size and position of each warped pixel is retained. The pairs of images were chosen to illustrate cases where the required deformation was considerable, both in terms of the change in shape of the ventricles and skull, and in terms of the difference in scale and orientation of the slice as a whole. The resultant warped images do indeed appear to be biologically plausible, despite the relatively low dimensionality of the representation used – structures other than the labelled ones have been brought into approximate alignment. This suggests that a dense correspondence (for instance, one given by a non-rigid registration using maximisation of mutual information) could also be represented by these warps without an inordinate increase in the dimensionality of the representation.
Fig. 4. Three examples of warp interpolation using the clamped-plate spline. Pixel intensity is unchanged, but note that the image structures are approximately aligned. Left: Source image, Centre: Warped image, Right: Target image.
3.2 Representing 3D Diffeomorphisms
We have shown that this geodesic interpolating spline basis can generate biologically plausible warps in 2D; we now proceed to show that we can also do the same in 3D. Furthermore, we need to show that given a warp, we can
Fig. 5. Target (bottom) and source (top) hippocampi, with the knotpoints (black circles). The correspondence between the shapes is indicated by the shading.
Fig. 6. Distribution of point discrepancy between source and target (grey bars), and warped source and target (white bars). Units as in Fig. 5.
Fig. 7. The maximum, median and mean square discrepancies (in units of the initial discrepancy), for non-knot points only, as a function of the number of knots. Data is shown from 4 randomly selected pairs of hippocampi.
choose the knotpoints appropriately. To do this, we consider a set of segmented hippocampi. Each example consists of a triangulated surface with 268 vertices, where the vertices for each example have been manipulated to give the optimised correspondence [9]. Examples are shown in Fig. 5. Pairs of hippocampi were chosen at random, and the two shapes aligned using generalised Procrustes analysis. We used the triharmonic clamped-plate spline (see Sect. 2.1) as our GIS basis, each hippocampus being scaled to fit within the unit sphere. The required warp between source and target was calculated iteratively: the warp was optimised for a given set of knotpoints, then new knotpoints were added and the warp recalculated. New knotpoints are selected from the vertices using a greedy algorithm, as sketched below: the discrepancy between the vertices of the warped source and the target is calculated, and new knotpoints are then selected from those vertices that have the largest discrepancies. Fig. 6 shows the distribution of the discrepancies between the aligned source and target, and the final warped source and target, for a set of 70 knotpoints.
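The greedy selection can be sketched as follows; `fit_gis_warp` and `apply_warp` are hypothetical stand-ins for the GIS optimisation and its evaluation at the source vertices, which are not shown here.

# A sketch of the greedy knotpoint selection described above: warp,
# measure per-vertex discrepancy, promote the worst vertices to
# knotpoints, and re-optimise.
import numpy as np

def greedy_knots(source, target, n_knots, batch=10):
    # Seed with the vertices of largest initial discrepancy.
    knots = np.argsort(np.linalg.norm(source - target, axis=1))[-batch:]
    while len(knots) < n_knots:
        warp = fit_gis_warp(source[knots], target[knots])   # hypothetical
        err = np.linalg.norm(apply_warp(warp, source) - target, axis=1)
        err[knots] = -np.inf            # never re-select an existing knot
        knots = np.append(knots, np.argsort(err)[-batch:])
    return knots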
It can be seen that the distribution of discrepancies as a whole has been shifted towards smaller values. In Fig. 7, we show the maximum, median and mean-square discrepancies for non-knot points only, as a function of the number of knotpoints, for 4 random pairs of hippocampi. Note that the nature of our greedy algorithm for selecting knotpoints means that the maximum discrepancy is not guaranteed to decrease monotonically. However, all three graphs show that the algorithm quickly reaches a reasonable representation of the required warp, with a number of knotpoints that is approximately 25% of the number of vertices.
4 Modelling Discrete Structures

4.1 A Simple Example
There are numerous examples in biological and medical images of cases where a pair of structures remain discrete, although the spacing between them varies considerably across a population; for example, the lateral ventricles in the brain. We constructed a simple dataset to investigate the problems associated with modelling such a variation, the basic idea of which is shown in Fig. 8. Elements of the dataset comprise 6 points defining two triangles, with the separation s between the two triangles constrained to remain positive. A training set of 100 examples was generated, with the separation s being chosen at random.

Fig. 8. The distance s between the discrete structures varies.

A standard method for analysing such a dataset is to build a statistical shape model (SSM) [8]; the training set was Procrustes aligned, and a linear SSM constructed. The model correctly displayed only one mode of variation; example shapes generated by the model are shown in Fig. 10. As expected, this simple linear model can generate illegal shapes: when the model parameter went below a threshold of approximately −2.22 standard deviations, the triangles intersected, so that the structures were no longer discrete. This was not seen in the training set and therefore should not be allowed. We then calculated the CPS geodesic warps generated by taking the 6 points that define each pair of triangles as knotpoints, warping between examples generated by the model (for varying model parameter) and the model mean shape. The CPS was a suitable choice of Green's function because the triangles are discrete objects.

Fig. 9. Geodesic distance versus model parameter for the discrete structure dataset.

The variation of the geodesic distance from the mean shape, plotted against the model parameter (which is equivalent to the Mahalanobis distance, a distance defined by imposing a Euclidean metric on the space of point positions), is shown in Fig. 9. Positive values of the model parameter, which correspond
Fig. 10. Example shapes generated from the SSM model of the discrete structures dataset for varying model parameter (Mahalanobis distance). The change in scale as the model parameter varies is caused by the fact that the points have been Procrustes aligned.
to increasing separation of the triangles, show a relationship between model parameter and geodesic distance that is very close to linear; but for negative values of the parameter (decreasing separation), the geodesic distance diverges at precisely the value of the parameter that corresponds to zero separation. The geodesic distance therefore allows us to differentiate between physical and non-physical variations in a way that naïve linear models cannot.
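For concreteness, the linear-SSM side of this experiment can be sketched as a PCA of the aligned point vectors; a shape generated at model parameter b then has Mahalanobis distance |b| from the mean. Computing the corresponding geodesic distance requires the GIS machinery of Sect. 2.3 and is not shown; the function names below are illustrative only.

# A sketch of a linear statistical shape model [8]: PCA of Procrustes-
# aligned point vectors, and shape generation along the first mode.
import numpy as np

def build_ssm(shapes):                 # shapes: (n_examples, 2 * n_points)
    mean = shapes.mean(axis=0)
    U, s, Vt = np.linalg.svd(shapes - mean, full_matrices=False)
    var = s**2 / (len(shapes) - 1)     # per-mode variances
    return mean, Vt, var

def generate(mean, Vt, var, b):        # b: parameter in standard deviations
    return mean + b * np.sqrt(var[0]) * Vt[0]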
4.2 Using the Geodesic Distance to Classify Variations
We now consider the role of the geodesic distance in classifying legal and illegal variations in real biological data. We take as our dataset the annotated outlines of the anterior lateral ventricles in the axial brain slices, as used in Sect. 3.1. Each example consists of 40 knotpoints (see Fig. 11). The set of training examples was Procrustes aligned and then scaled to fit inside the unit circle. A linear SSM was built from this training set in the usual way. We then used this SSM to generate random example shapes. These examples were classified as legal if the outlines of the two ventricles intersected neither themselves nor each other, and illegal otherwise. The training set of shapes is, by definition, legal. We then calculated the GIS warps, using the biharmonic CPS basis, between the classified set of shapes and the mean shape from the model. The geodesic distance from the mean is compared with the Mahalanobis distance from the mean in Fig. 12. It is immediately obvious that we cannot separate the legal and illegal shapes using the Mahalanobis distance from the mean. However, using the geodesic distance, it is possible to construct a simple classifier (shown by the dotted grey line) that separates the two groups, with only one example shape being misclassified (the grey circle just below the line). Given that the Mahalanobis distance for the SSM is equivalent to a Euclidean metric on the space of point deformations, this again demonstrates the superiority of the GIS metric over an ad hoc metric. The correspondences that we have used in this example are a subset of the correspondences that we would expect to be generated by a successful non-rigid registration of the images. Increasing the density of points on the training shapes would have left the result for the Mahalanobis distance essentially unchanged. However, the result for the GIS warp would have improved, giving a greater separation between the two sets of shapes. This is because, in the limit where the lines become infinitely densely sampled, it is actually impossible to construct a diffeomorphism for which the lines cross, which would mean that the geodesic
Fig. 11. Top: Examples from the training set. Bottom: Legal (left) and illegal (right) examples generated by the SSM. Knotpoints are indicated by black circles; lines are for the purposes of illustration.
Fig. 12. Mahalanobis vs. geodesic distances from the mean shape. Grey circles: illegal shapes generated by the SSM; white triangles: legal shapes generated by the SSM; black triangles: the training set.
This means that the geodesic distance for the illegal shapes would approach infinity as the sampling density increased. We can now extend this result to the case of modelling the deformation fields for the non-rigid registration; a linear model of such deformation fields would suffer the same problem as the linear SSM, where now the overlapping structures would correspond to a folding of the warp. The GIS cannot, by definition, generate such a folding, since it is guaranteed to be diffeomorphic.
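The legality test used in section 4.2 (an outline is illegal if it intersects itself or the other outline) reduces to pairwise segment-intersection checks on the knotpoint polygons. A minimal sketch using the standard orientation test follows; it flags only proper crossings and assumes each closed outline is stored as an (N, 2) array.

```python
import numpy as np

def _ccw(a, b, c):
    # Twice the signed area of triangle (a, b, c); > 0 for counter-clockwise.
    return (b[0]-a[0])*(c[1]-a[1]) - (b[1]-a[1])*(c[0]-a[0])

def segments_cross(p1, p2, q1, q2):
    # Proper (strict) intersection test for segments p1p2 and q1q2.
    return (_ccw(p1, p2, q1) * _ccw(p1, p2, q2) < 0 and
            _ccw(q1, q2, p1) * _ccw(q1, q2, p2) < 0)

def edges(poly):
    # Closed polyline -> list of edges, wrapping around to the first point.
    return [(poly[i], poly[(i+1) % len(poly)]) for i in range(len(poly))]

def is_legal(outline_a, outline_b):
    # Legal iff neither outline self-intersects and the two do not cross.
    for poly in (outline_a, outline_b):
        E = edges(poly)
        for i in range(len(E)):
            for j in range(i + 2, len(E)):
                if i == 0 and j == len(E) - 1:
                    continue  # these two edges are adjacent via the wrap-around
                if segments_cross(*E[i], *E[j]):
                    return False
    return not any(segments_cross(*ea, *eb)
                   for ea in edges(outline_a) for eb in edges(outline_b))
```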
5 Conclusions
This paper has introduced a principled diffeomorphic representation of deformation fields with an inherent non-Euclidean metric. The spline basis of this representation is defined by the choice of Green's function and boundary conditions, which can be altered to suit the particular task at hand. We have demonstrated that this representation method can accurately represent real biological variations in both two and three dimensions. Conventional linear modelling strategies impose a Euclidean metric on the space of parameters (in our case, the knotpoint positions). The Mahalanobis distance that we have used for comparisons in this paper is derived from such a metric. The example in section 4.2 clearly shows the superiority of the non-Euclidean metric in quantifying variation.
Acknowledgements. Our thanks to G. Gerig and M. Steiner for the hippocampus dataset, and to Rh. Davies for providing us with his optimised correspondences for this dataset. This research was supported by the MIAS IRC project, EPSRC grant number GR/N14248/01.
References

1. T. Boggio. Sulle funzioni di Green d'ordine m. Rendiconti del Circolo Matematico di Palermo, 20:97–135, 1905.
2. F. L. Bookstein. Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(6):567–585, 1989.
3. Ch. Brechbuehler, G. Gerig, and O. Kuebler. Parametrization of closed surfaces for 3D shape description. Computer Vision and Image Understanding, 61(2):154–170, 1995.
4. M. Bro-Nielsen and C. Gramkow. Fast fluid registration of medical images. In Proceedings of Visualization in Biomedical Computing (VBC), pages 267–276, 1996.
5. V. Camion and L. Younes. Geodesic interpolating splines. In M. Figueiredo, J. Zerubia, and A. K. Jain, editors, Proceedings of EMMCVPR'01, volume 2134 of Lecture Notes in Computer Science, pages 513–527. Springer, 2001.
6. C. Chefd'Hotel, G. Hermosillo, and O. Faugeras. A variational approach to multimodal image matching. In Proceedings of the IEEE Workshop on Variational and Level Set Methods (VLSM'01), pages 21–28, 2001.
7. Y. Choi and S. Lee. Injectivity conditions of 2D and 3D uniform cubic B-spline functions. Graphical Models, 62(6):411–427, 2000.
8. T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham. Active shape models – their training and application. Computer Vision and Image Understanding, 61(1):38–59, 1995.
9. Rh. Davies, C. J. Twining, T. F. Cootes, J. C. Waterton, and C. J. Taylor. 3D statistical shape models using direct optimisation of description length. In Proceedings of the 7th European Conference on Computer Vision (ECCV), volume 2352 of Lecture Notes in Computer Science, pages 3–20. Springer, 2002.
10. J. Duchon. Interpolation des fonctions de deux variables suivant le principe de la flexion des plaques minces. Revue Française d'Automatique, Informatique, Recherche Opérationnelle (RAIRO) Analyse Numérique, 10:5–12, 1976.
11. P. Dupuis, U. Grenander, and M. I. Miller. Variational problems on flows of diffeomorphisms for image matching. Quarterly of Applied Mathematics, 56(3):587–600, 1998.
12. J. Gee, M. Reivich, and R. Bajcsy. Elastically deforming 3D atlas to match anatomical brain images. Journal of Computer Assisted Tomography, 17(2):225–236, 1993.
13. A. Guimond, J. Meunier, and J.-P. Thirion. Average brain models: A convergence study. Technical Report RR-3731, INRIA, Sophia Antipolis, 1999.
14. S. C. Joshi and M. I. Miller. Landmark matching via large deformation diffeomorphisms. IEEE Transactions on Image Processing, 9(8):1357–1370, 2000.
15. L. LeBriquer and J. Gee. Design of a statistical model of brain shape. In Proceedings of IPMI'97, volume 1230 of Lecture Notes in Computer Science, pages 477–482. Springer, 1997.
16. S. Marsland and C. J. Twining. Clamped-plate splines and the optimal flow of bounded diffeomorphisms. In Statistics of Large Datasets, Proceedings of the Leeds Annual Statistical Research Workshop, pages 91–95, 2002.
17. S. M. Pizer, D. S. Fritsch, P. Yushkevich, V. Johnson, and E. Chaney. Segmentation, registration, and measurement of shape variation via image object shape. IEEE Transactions on Medical Imaging, 18(10):851–865, 1999.
18. D. Rueckert, A. F. Frangi, and J. A. Schnabel. Automatic construction of 3D statistical deformation models using non-rigid registration. In Proceedings of MICCAI'01, volume 2208 of Lecture Notes in Computer Science, pages 77–84, 2001.
19. D. Rueckert, L. I. Sonoda, C. Hayes, D. L. G. Hill, M. O. Leach, and D. J. Hawkes. Non-rigid registration using free-form deformations: Application to breast MR images. IEEE Transactions on Medical Imaging, 18(8):712–721, 1999.
20. L. H. Staib and J. S. Duncan. Boundary finding with parametrically deformable models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(11):1061–1075, 1992.
21. M. Styner and G. Gerig. Hybrid boundary-medial shape description for biologically variable shapes. In Proceedings of the IEEE Workshop on Mathematical Methods in Biomedical Image Analysis (MMBIA) 2000, pages 235–242, June 2000.
22. A. Trouvé. Diffeomorphisms groups and pattern matching in image analysis. International Journal of Computer Vision, 28(3):213–221, 1998.
23. C. J. Twining, S. Marsland, and C. J. Taylor. Measuring geodesic distances on the space of bounded diffeomorphisms. In Proceedings of BMVC'02, volume 2, pages 847–856. BMVA Press, 2002.
Topology Preservation and Regularity in Estimated Deformation Fields

Bilge Karaçalı and Christos Davatzikos

Section of Biomedical Image Analysis, Department of Radiology, University of Pennsylvania, Philadelphia, PA 19104, USA
Abstract. A general formalism to impose topology preserving regularity on a given irregular deformation field is presented. The topology preservation conditions are derived with regard to the discrete approximations to the deformation field Jacobian in a two-dimensional image registration problem. The problem of enforcing topology preservation onto a given deformation field is formulated in terms of the deformation gradients, and solved using a cyclic projections approach. The generalization of the developed algorithm leads to control of deformation field regularity by limiting the per-voxel volumetric change to a prescribed interval. Extension of the topology preservation conditions to a three-dimensional registration problem is also presented, together with a comparative analysis of the proposed algorithm with respect to a Gaussian regularizer that enforces the same topology preservation conditions.
1 Introduction
The most popular image representation scheme in medical image analysis is to describe a given patient scan as a warped template. The deformation field that characterizes the warping is solved by minimizing the energy of the residual image between the warped template and the patient scan [1,2,3],

   ∫_{ω∈ΩT} R²(ω) dω = ∫_{ω∈ΩT} (T(ω) − S(h(ω)))² dω,   (1)
where T : ΩT → R+ denotes the template, S : ΩS → R+ denotes the patient scan (i.e., the subject), and the deformation field h is a mapping from the template domain ΩT onto the subject domain ΩS, both of which are subsets of a continuum. The underlying assumption in such a representation is that the template and the subjects are morphologically similar to each other; in other words, they consist of essentially the same structural components, deformed in various ways. The subject image can then be represented as

   S(h(ω)) = T(ω) + R(ω),   (2)
where h is determined by minimizing the residual image energy in Equation (1). This optimization problem is ill-posed, in the sense that the solution may not be
unique, or may be undesirable in terms of the induced correspondence between the template and the subject. A common remedy is to pair the residual energy with a regularization term that expresses the desired deformation field properties, to steer the optimization away from the unwanted solutions. Continuity is the most basic deformation field property sought, and has its roots in the presumption that the template must not disintegrate when matched onto the subject. Differentiability and continuity of higher differentials give rise to a large class of regularizers, characterizing the "acceptable" types of deformations suitable to a given application. The morphological similarity assumption of the template and the subject images suggests preservation of topology as another desired deformation field property, in order to maintain the connectivity between neighboring morphological structures. The topology preservation characteristics of a deformation field are determined by the determinant of its Jacobian. In a two-dimensional image registration setting where ΩT and ΩS are subsets of R², this determinant (also referred to as the Jacobian) has the form

            | ∂f/∂x(ω)  ∂f/∂y(ω) |
   |J(ω)| = |                    |   (3)
            | ∂g/∂x(ω)  ∂g/∂y(ω) |

where f and g are the Cartesian components of the deformation field h over the x and y axes. Note that the topology preservation condition automatically restricts the set of deformation field solutions to continuously differentiable deformation fields. The embedding of topology preservation into the optimization problem, however, is not as straightforward as various other smoothness constraints, which can easily be incorporated into a regularization term in the cost function. Instead, topology preservation is attained in an indirect way, by characterizing the search space by a model which guarantees the invertibility and the continuity of the generated deformation fields, so that topology is eventually preserved [1,4,5,2]. Other approximate ways that guide the optimization process towards a topology preserving deformation field also exist [3,6]. In the absence of a suitable model, the image registration problem is attempted on a discrete grid of points over which the images are acquired. The correspondences established by maximizing a similarity criterion between two discrete images then correspond to a discrete deformation field. Characterization of the topology preserving quality of a deformation field estimated as such, however, is not straightforward. For instance, positivity of a forward, backward, or symmetric finite difference approximation of the Jacobian does not guarantee positivity of its continuous counterpart found via interpolation, as is shown in this paper. Topology preservation is a property inherent to continuous domains where the deformation fields are continuously differentiable. There is no definite connection from a finite number of point correspondences to the induced correspondence between continuous domains, and the link between the deformation of a discrete grid and the topology preserving character of its correspondent over the
continuum is not palpable. It is therefore important to derive sufficient conditions for a topology preserving transformation in the discrete domain. In this paper, we derive the conditions to be satisfied by a given deformation field defined over a discrete grid to have a topology preserving continuous counterpart, obtained by a bilinear (or trilinear) interpolation. The next section is devoted to the analysis of discrete approximations to the Jacobian and their significance in terms of detecting topology preservation violations, which are then used as the basis in formulating the necessary conditions for topology preservation. In Section 3, we develop an iterative scheme to enforce these conditions onto a given deformation field as a hard constraint, and present illustrative examples.
2 Topology Preservation Conditions over a Discrete Domain
In order for a continuously differentiable deformation field to preserve topology, its Jacobian needs to be positive everywhere in its domain. Conversely, it can be shown [2] that if a deformation field is continuous and globally one-to-one, it also preserves topology. Neither of these, however, applies to deformation fields that are defined over a discrete grid of points.
2.1 Topology Preservation in Two Dimensions
When we speak of topology preservation conditions to be satisfied by a discrete deformation field h, we refer to the topology preserving nature of the continuous deformation field hc obtained by a bilinear interpolation of the discrete deformation field. Since such an interpolation is not guaranteed to be continuously differentiable everywhere, we approach the topology preservation issue with regard to the continuity and globally one-to-one properties. By construction, hc is continuous. The globally one-to-one property, on the other hand, is conditional on the local behavior of hc, which is governed by h. Specifically, if the continuous deformation field hc is one-to-one over all square patches that partition the continuous domain defined by the discrete grid, then hc is globally one-to-one. We elaborate on and prove this assertion below.

Lemma 1. Let p1 = (x0, y0), p2 = (x0 + 1, y0), p3 = (x0, y0 + 1), p4 = (x0 + 1, y0 + 1), with p1, p2, p3, p4 ∈ Ω = [0, . . . , M − 1] × [0, . . . , N − 1], and let h : Ω → R² be a discrete deformation field with h = (f, g). Define the forward and backward approximations to the partial derivatives of f (and similarly for g) at (x, y) as

   f_x^f(x, y) = f(x + 1, y) − f(x, y),   f_y^f(x, y) = f(x, y + 1) − f(x, y),
   f_x^b(x, y) = f(x, y) − f(x − 1, y),   f_y^b(x, y) = f(x, y) − f(x, y − 1).
Then, the following statements are equivalent.
1. The four possible approximations to the Jacobian of h in the square patch defined by p1, p2, p3, and p4, given by

   J^ff = f_x^f(p1) g_y^f(p1) − f_y^f(p1) g_x^f(p1),
   J^bf = f_x^b(p2) g_y^f(p2) − f_y^f(p2) g_x^b(p2),
   J^fb = f_x^f(p3) g_y^b(p3) − f_y^b(p3) g_x^f(p3),
   J^bb = f_x^b(p4) g_y^b(p4) − f_y^b(p4) g_x^b(p4),

are positive.
2. The angles defined by the triplets (h(p3), h(p1), h(p2)), (h(p1), h(p2), h(p4)), (h(p2), h(p4), h(p3)), and (h(p4), h(p3), h(p1)) are between 0 and π.
3. The signed areas of the triangles defined by the triplets (h(p3), h(p1), h(p2)), (h(p1), h(p2), h(p4)), (h(p2), h(p4), h(p3)), and (h(p4), h(p3), h(p1)), given by

                               | 1  f(p3)  g(p3) |
   A_{h(p3),h(p1),h(p2)} = (1/2) | 1  f(p1)  g(p1) |
                               | 1  f(p2)  g(p2) |

and similarly for A_{h(p1),h(p2),h(p4)}, A_{h(p2),h(p4),h(p3)}, and A_{h(p4),h(p3),h(p1)}, are positive.

The proof, which is solely based on algebraic manipulations, is omitted. The equivalence of these statements is illustrated in Figure 1. The significance of the sign information of the triangle areas lies in the counter-clockwise ordering of the points, which is violated when the corresponding angle goes beyond the interval (0, π) ⊂ [0, 2π).
Fig. 1. Illustration of the equivalent statements of Lemma 1. Convexity of the region is violated when the images of the points cross over the diagonal connecting their neighbors. The corresponding angle exceeds π, and the counter-clockwise order of the points in the triangle changes. Consequently, the area of the triangle computed by the determinant changes sign and becomes negative.
Definition 1. Let C be the class of deformation fields h = (f, g) defined over a discrete rectangular grid Ω = [0, . . . , M − 1] × [0, . . . , N − 1] ⊂ Z² for which J^ff, J^fb, J^bf, and J^bb are positive at all (x, y) ∈ Ω.
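Checking membership in C is a direct vectorized computation: each unit square of the grid contributes four one-sided Jacobian approximations built from the differences along its edges. A numpy sketch (an illustration, not the authors' code), with f and g stored as M × N arrays indexed as f[x, y]:

```python
import numpy as np

def in_class_C(f, g):
    # Differences of the deformation components along x and y; a forward
    # difference at the near corner of a patch edge equals the backward
    # difference at the far corner, so these serve both roles.
    fx = f[1:, :] - f[:-1, :];  fy = f[:, 1:] - f[:, :-1]
    gx = g[1:, :] - g[:-1, :];  gy = g[:, 1:] - g[:, :-1]
    # One-sided Jacobians at the four corners of each unit square (Lemma 1):
    # J^ff pairs the x-difference on the bottom edge with the y-difference
    # on the left edge; J^bb pairs those on the top and right edges.
    Jff = fx[:, :-1] * gy[:-1, :] - fy[:-1, :] * gx[:, :-1]   # at p1
    Jbf = fx[:, :-1] * gy[1:, :]  - fy[1:, :]  * gx[:, :-1]   # at p2
    Jfb = fx[:, 1:]  * gy[:-1, :] - fy[:-1, :] * gx[:, 1:]    # at p3
    Jbb = fx[:, 1:]  * gy[1:, :]  - fy[1:, :]  * gx[:, 1:]    # at p4
    return bool(np.all(Jff > 0) and np.all(Jbf > 0)
                and np.all(Jfb > 0) and np.all(Jbb > 0))
```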
Proposition 1. Let a deformation field h = (f, g) defined over a discrete rectangular grid Ω = [0, 1, . . . , M − 1] × [0, 1, . . . , N − 1] ⊂ Z² be an element of C. Then, its continuous counterpart determined via the interpolation of h over the domain Ωc = [0, M − 1] × [0, N − 1] ⊂ R², using the (bilinear) interpolant φ(x, y) given by

   φ(x, y) = φ(x)φ(y)   (4)

with

   φ(·) = { 1 + (·)  if −1 ≤ (·) < 0
          { 1 − (·)  if 0 ≤ (·) < 1   (5)
          { 0        otherwise,
preserves topology over Ωc.

Proof. Consider the behavior of hc over the square patch S defined by p1 = (x0, y0), p2 = (x0 + 1, y0), p3 = (x0, y0 + 1), p4 = (x0 + 1, y0 + 1), p1, p2, p3, p4 ∈ Ω = [0, . . . , M − 1] × [0, . . . , N − 1]. Positivity of J^ff, J^fb, J^bf, and J^bb guarantees that h(S) is convex. Continuity of the bilinear interpolant provides continuity of hc over S; hence hc is locally one-to-one over all such S, and globally one-to-one over Ωc. This completes the proof.
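The interpolant of Eqs. (4)–(5) is the tensor product of one-dimensional tent functions, so evaluating hc at a point is a convex combination of the four surrounding grid values. A small sketch for one component (interior points assumed; boundary handling is omitted):

```python
import numpy as np

def phi(t):
    # The tent function of Eq. (5): 1 - |t| on [-1, 1], zero elsewhere.
    return np.maximum(0.0, 1.0 - np.abs(t))

def interp_bilinear(f, x, y):
    # Continuous counterpart of the grid function f at (x, y): weight the
    # four surrounding grid values by phi(x - i) * phi(y - j).
    i0, j0 = int(np.floor(x)), int(np.floor(y))
    val = 0.0
    for i in (i0, i0 + 1):
        for j in (j0, j0 + 1):
            val += f[i, j] * phi(x - i) * phi(y - j)
    return val
```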
2.2 Topology Preservation in Three Dimensions
Generalizing the conditions derived above for a two-dimensional registration problem, under which a discrete deformation field has a topology preserving continuous correspondent, to a three-dimensional problem consists of enforcing positivity of the forward and backward Jacobians at all eight vertices of a cubic patch, for all cubic patches in the domain. The proof, however, cannot be carried out in terms of convexity arguments, since the bilinear interpolation of the faces of a given cubic patch does not necessarily correspond to a planar surface.

Proposition 2. Let h = (f, g, e) be a discrete deformation field defined as a mapping from ΩT ⊂ Z³ into R³. If the Jacobians defined by

                        | f_x^f(x0, y0, z0)  f_y^f(x0, y0, z0)  f_z^f(x0, y0, z0) |
   J^fff(x0, y0, z0) =  | g_x^f(x0, y0, z0)  g_y^f(x0, y0, z0)  g_z^f(x0, y0, z0) |
                        | e_x^f(x0, y0, z0)  e_y^f(x0, y0, z0)  e_z^f(x0, y0, z0) |

and similarly J^bff, J^fbf, J^bbf, J^ffb, J^bfb, J^fbb, and J^bbb are positive for all (x0, y0, z0) ∈ ΩT for which the corresponding forward and backward partial derivative approximations exist, then the continuous deformation field hc obtained by the trilinear interpolation of h preserves topology.

Proof. Consider a cubic patch C defined by the vertices {(x0, y0, z0), (x0 + 1, y0, z0), . . . , (x0 + 1, y0 + 1, z0 + 1)}, all of which are in the discrete domain ΩT. By virtue of trilinear interpolation, the partial derivatives of hc at a point
(x, y, z) in the interior of C are linear combinations of the approximations at the eight vertices, and as (x, y, z) approaches a particular vertex, the partial derivatives of hc converge to the discrete approximations at that vertex. The Jacobian of hc also exhibits this asymptotic behavior, and converges to one of J^fff, J^bff, J^fbf, J^bbf, J^ffb, J^bfb, J^fbb, and J^bbb, depending on the approached vertex. Furthermore, it can be shown that the Jacobian of hc inside C is a linear combination of these eight corner Jacobians. Consequently, positivity of these eight Jacobians is not only necessary, but also sufficient for the positivity of the Jacobian of hc inside C, which guarantees that hc is locally one-to-one inside C. The global one-to-one property of hc over the faces which separate such C, and at the vertices, can be established using the continuity and local one-to-one properties of hc. This completes the proof. We have thus established sufficient conditions for a trilinear interpolation of a discrete deformation field to preserve topology. In the next section, we develop an iterative algorithm to enforce positivity of the forward and backward Jacobians onto a given discrete deformation field.
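The eight corner Jacobians of Proposition 2 can be evaluated over all cubic patches at once, since the one-sided difference across a patch edge is the forward difference at the near corner and the backward difference at the far corner. A numpy sketch (again an illustration, not the authors' code), with f, g, e as M × N × P arrays:

```python
import numpy as np
from itertools import product

def corner_jacobians(f, g, e):
    # Per-component differences along each axis; shared by the forward and
    # backward approximations at the two ends of every patch edge.
    d = [[np.diff(c, axis=ax) for ax in range(3)] for c in (f, g, e)]
    M, N, P = f.shape
    jacs = np.empty((M - 1, N - 1, P - 1, 8))
    # For each corner (sx, sy, sz) of a patch, use the difference taken along
    # the edge that touches that corner.
    for idx, (sx, sy, sz) in enumerate(product((0, 1), repeat=3)):
        rows = []
        for c in range(3):
            dx = d[c][0][:, sy:N-1+sy, sz:P-1+sz]
            dy = d[c][1][sx:M-1+sx, :, sz:P-1+sz]
            dz = d[c][2][sx:M-1+sx, sy:N-1+sy, :]
            rows.append(np.stack([dx, dy, dz], axis=-1))
        J = np.stack(rows, axis=-2)        # (..., 3, 3) corner Jacobian matrices
        jacs[..., idx] = np.linalg.det(J)
    return jacs   # all entries positive <=> the trilinear hc preserves topology
```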
3 Approach
In the previous section, we identified the conditions to be satisfied by a discrete deformation map so that its continuous counterpart obtained through bilinear interpolation preserves topology. The goal, now, is to enforce these conditions on a given deformation field that does not necessarily adhere to them, with as small a distortion as possible. This corresponds to minimizing

   Σ_{ω∈ΩT} ||h(ω) − hr(ω)||²   (6)

with respect to h : ΩT → R² (or R³ in three dimensions), subject to

   min{J^ff, J^bf, J^fb, J^bb} > 0   (7)

for a two-dimensional setting, or

   min{J^fff, J^bff, J^fbf, J^bbf, J^ffb, J^bfb, J^fbb, J^bbb} > 0   (8)

for a three-dimensional setting, over all ω ∈ ΩT for which the corresponding partial derivative approximations exist, given an initial deformation field hr obtained possibly through some image registration process, where ||·|| denotes some measure of distance between two discrete deformation fields. Since the topology preservation conditions are expressed in terms of partial derivative approximations, we pose this optimization problem in terms of the partial derivatives of the deformation fields. In a three-dimensional problem where hr = (f^r, g^r, e^r), this corresponds to minimizing
   Σ_{ω∈ΩT} [ (f_x(ω) − f_x^r(ω))² + (g_x(ω) − g_x^r(ω))² + (e_x(ω) − e_x^r(ω))²
            + (f_y(ω) − f_y^r(ω))² + (g_y(ω) − g_y^r(ω))² + (e_y(ω) − e_y^r(ω))²
            + (f_z(ω) − f_z^r(ω))² + (g_z(ω) − g_z^r(ω))² + (e_z(ω) − e_z^r(ω))² ]
such that the topology preservation conditions are satisfied, and the partial derivative triplets (f_x, f_y, f_z), (g_x, g_y, g_z), and (e_x, e_y, e_z) are integrable so as to produce unique reconstructions for f, g, and e, respectively. We administer positivity of the corresponding Jacobians onto the partial derivative approximations of f, g, and e using the following observation. Suppose h = (f, g, e) is given such that

   f(x, y, z) = x + f′(x, y, z),   (9)
   g(x, y, z) = y + g′(x, y, z),   (10)
   e(x, y, z) = z + e′(x, y, z),   (11)

where f′, g′, and e′ are displacement fields in the x, y, and z directions. Define fα, gα, and eα as

   fα(x, y, z) = x + αf′(x, y, z),   (12)
   gα(x, y, z) = y + αg′(x, y, z),   (13)
   eα(x, y, z) = z + αe′(x, y, z),   (14)
where α ∈ [0, 1]. Now, consider the Jacobians corresponding to (f, g, e) and (fα, gα, eα) at a point (x0, y0, z0), denoted by J and Jα. Clearly,

   lim_{α→0} Jα = 1,   lim_{α→1} Jα = J.

From the continuity of Jα with respect to α, we conclude that if J < 0, then there exists an α ∈ [0, 1] such that Jα = ε > 0. Solving for that particular value of α for all forward and backward Jacobians at every point in the discrete domain provides updated values for the corresponding partial derivatives of a given deformation field. Incidentally, the same argument holds for bringing large Jacobians within some prescribed upper bound, pointing out the regularization aspect of the topology preservation approach. Integrability comes into play when the updates prescribed by the previous operation cause the partial derivative approximations to deviate from those of physically realizable three-dimensional functions. This is a concept extensively addressed in the computer vision literature in the context of surface reconstruction from not-necessarily integrable gradient fields. In this work, we adopt the solution developed by Karaçalı and Snyder [7], which consists of a vector space approach that characterizes the space of integrable gradient fields as a subspace spanned by an orthonormal basis of gradient fields. First, the gradient fields corresponding to the elementary functions that span the function space are computed. Orthonormalization of these gradient fields provides a basis for the space of integrable gradient fields, referred to as the feasible gradient subspace. The reconstruction of the unknown function is then performed via a projection onto the feasible subspace that globally enforces integrability and provides the minimum norm solution in the gradient space. The algorithm to enforce topology preservation conditions on a given discrete deformation field can then be summarized as follows.
1. Initialize (h_x, h_y, h_z)^(0) = (h_x^r, h_y^r, h_z^r).
2. While there exists ω ∈ ΩT with min{J^fff(ω), J^bff(ω), . . . , J^bbb(ω)} < 0, do for k = 1, 2, . . . :
   a) enforce the topology preservation conditions on (h_x, h_y, h_z)^(k−1) to obtain (h_x′, h_y′, h_z′)^(k−1), for some ε > 0 (and ε < ∞, if regularization is also desired);
   b) enforce integrability on (h_x′, h_y′, h_z′)^(k−1) to obtain (h_x, h_y, h_z)^(k).
This resembles a cyclic projections algorithm encountered in the context of set theoretic estimation, used essentially to obtain a solution at the intersection of all the property sets that characterize the conditions to be satisfied in the space of all possible solutions [8,9]. The convergence results associated with such a cyclic projections framework, however, do not apply in this case, since the technique to enforce topology preservation conditions does not correspond to a projection in the strict sense. Nevertheless, in the next section, we provide empirical evidence that the algorithm indeed converges to a desirable solution with arguably minimal correction on the initial deformation field. It should also be noted that the topology preservation and integrability conditions are not contradictory, since the set of topology preserving deformation fields is not empty. The set of integrable deformation gradients and the set of gradient fields that satisfy topology preservation therefore have a non-empty intersection, which is attained by the proposed algorithm.
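The two alternating steps can be sketched in two dimensions as follows. The Jacobian-bound step here shrinks the offending displacement gradients by repeated halving (a crude stand-in for solving for α exactly, as described above), and the integrability step is replaced by a standard periodic least-squares gradient integration in the Fourier domain rather than the orthonormal-basis projection of [7]; the array names and bounds are assumptions of the example.

```python
import numpy as np

def integrate_gradients(px, py):
    # Least-squares reconstruction of u from a possibly non-integrable
    # gradient field (px, py): solve  lap u = d(px)/dx + d(py)/dy  under
    # periodic boundary conditions in the Fourier domain.
    M, N = px.shape
    wx = 2j * np.pi * np.fft.fftfreq(M)[:, None]
    wy = 2j * np.pi * np.fft.fftfreq(N)[None, :]
    num = np.conj(wx) * np.fft.fft2(px) + np.conj(wy) * np.fft.fft2(py)
    den = (wx * np.conj(wx) + wy * np.conj(wy)).real
    den[0, 0] = 1.0      # the mean of u is unrecoverable; leave it at zero
    return np.fft.ifft2(num / den).real

def enforce_jacobian_bounds(fx, fy, gx, gy, lo=0.25, hi=4.0, n=50):
    # Shrink displacement gradients (in place) wherever the Jacobian of
    # h = identity + u leaves [lo, hi]; scaling toward zero drives J to 1.
    for _ in range(n):
        J = (1 + fx) * (1 + gy) - fy * gx
        bad = (J < lo) | (J > hi)
        if not bad.any():
            break
        for p in (fx, fy, gx, gy):
            p[bad] *= 0.5
    return fx, fy, gx, gy

# One cyclic pass: clamp the Jacobians, then restore integrability by
# re-differentiating the least-squares reconstructions (periodic differences):
#   fx, fy, gx, gy = enforce_jacobian_bounds(fx, fy, gx, gy)
#   f = integrate_gradients(fx, fy); g = integrate_gradients(gx, gy)
#   fx, fy = np.roll(f, -1, 0) - f, np.roll(f, -1, 1) - f
#   gx, gy = np.roll(g, -1, 0) - g, np.roll(g, -1, 1) - g
```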
4 Simulation Results
We first demonstrate the effect of regularity achieved by restricting the Jacobians to a prescribed range using the proposed algorithm. The template, which is a synthetically generated brain image from the BrainWeb database [10], is shown in Figure 2. The warped template according to a non-topology preserving deformation field is shown in Figure 3. The topology preserving warping results obtained by confining the Jacobians within the intervals [0, ∞), [0.25, 4.00], and [0.50, 2.00] are shown in Figure 4. The first interval corresponds to enforcing topology preservation in its most relaxed sense onto the given deformation field. The subsequent intervals increasingly restrict the allowed Jacobian ranges to provide more regularity in the warped template. The effect of stronger regularity is clearly visible in the improved clarity of the warped structures and their topological positioning. We have also compared the performance of the proposed algorithm against a Gaussian regularizer that achieves the same bounds on the Jacobians, over three-dimensional deformation fields defined on a 10 × 10 × 10 discrete grid denoted by Ω. The Cartesian components of the corresponding displacement fields f′, g′, and e′ are obtained by interpolating a single random displacement inflicted at the midpoint and the eight corners of the domain using a thin plate spline interpolation [11]. The distribution of the random displacements is Gaussian with mean zero and variance σ², which is treated as the control variable of the simulations, used to manipulate the deformation field strength.
Fig. 2. Synthetically generated brain template
Fig. 3. Warped template according to a non-topology preserving deformation field
We measure the amount of correction enforced onto a given deformation field h = (f, g, e) to obtain ĥ = (f̂, ĝ, ê) with Jacobians in the range [0.25, 4.00] using two metrics D(h, ĥ) and D′(h, ĥ), defined by

   D(h, ĥ) = Σ_{ω∈Ω} (f(ω) − f̂(ω))² + (g(ω) − ĝ(ω))² + (e(ω) − ê(ω))²   (15)

and

   D′(h, ĥ) = Σ_{ω∈Ω} (f_x(ω) − f̂_x(ω))² + (g_x(ω) − ĝ_x(ω))² + (e_x(ω) − ê_x(ω))²
             + (f_y(ω) − f̂_y(ω))² + (g_y(ω) − ĝ_y(ω))² + (e_y(ω) − ê_y(ω))²
             + (f_z(ω) − f̂_z(ω))² + (g_z(ω) − ĝ_z(ω))² + (e_z(ω) − ê_z(ω))²,   (16)

where the subscript denotes the forward difference approximation to the partial derivative in that direction. The average values of D and D′ for both the proposed technique and the Gaussian regularizer over 20 iterations, for increasing displacement field strength, are shown in Table 1.
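Both metrics are elementary to compute. A sketch, assuming each field is stored as a (3, M, N, P) array of its f, g, and e components:

```python
import numpy as np

def D(h, h_hat):
    # Sum of squared pointwise differences between the two fields (Eq. 15).
    return float(np.sum((h - h_hat) ** 2))

def D_prime(h, h_hat):
    # The same, applied to the forward-difference gradients of every
    # component along every spatial axis (Eq. 16).
    total = 0.0
    for axis in (1, 2, 3):   # x, y, z
        total += np.sum((np.diff(h, axis=axis) - np.diff(h_hat, axis=axis)) ** 2)
    return float(total)
```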
Fig. 4. Warped template according to topologically corrected deformation fields, (a) 0 < J < ∞, (b) 0.25 < J < 4.00, (c) 0.50 < J < 2.00
The correction obtained by the proposed adaptive regularization technique is much smaller than the correction enforced by a Gaussian regularizer that achieves the same bounds on the Jacobians: the proposed correction modifies only the deformation field gradients that violate the prescribed Jacobian bounds, and the remaining deformation field components are updated only through integrability. A Gaussian regularizer, on the other hand, not only smears the deformation field behavior that violates the discrete topology preserving conditions, but also removes high-frequency components from the deformation field on a global scale, just as any regularizer that is low-pass in character would do. Consequently, while the Gaussian regularization effectively distributes the displacement field strength over the whole domain, which corresponds to making the transformation more and more rigid until the desired Jacobian bounds are met, the proposed technique administers local corrections and updates the surrounding regions through integrability constraints. These observations are illustrated in Figure 5.

Table 1. Deformation field correction comparison between the proposed technique and a Gaussian regularizer that achieve deformation field Jacobians between 0.25 and 4.00.

         Proposed technique       Gaussian regularizer
  σ      D(h, ĥ)    D′(h, ĥ)      D(h, ĥ)     D′(h, ĥ)
  2.0      57.58      14.37        914.45      190.90
  2.5     166.19      38.73       2257.35      475.45
  3.0     658.25     147.62       5492.24      878.02
  3.5     701.88     148.30       7466.92     1080.81
  4.0    1146.35     237.44       9129.43     1376.19
Fig. 5. Topology preserving regularity enforced by the proposed technique and a Gaussian regularizer, (a) the initial deformation field that violates topology preservation conditions, (b) the deformation field obtained by the proposed technique, (c) the deformation field obtained by a Gaussian regularizer. The adaptive regularization preserves the aspects of the initial deformation field structure that do not violate topology preservation, which is not the case with the Gaussian regularization result that smears the deformation field uniformly.
5 Conclusion
We have investigated the correspondence between deformation fields defined over two- or three-dimensional discrete domains and their continuous counterparts, in terms of topology preservation and regularity of the Jacobians. Specifically, we have obtained the conditions to be satisfied by the discrete deformation fields so that the continuous correspondents obtained by bilinear (or trilinear) interpolation preserve topology and have Jacobians within certain prescribed limits. Based on these conditions, we have developed an iterative algorithm to enforce these prescribed bounds on the deformation field Jacobians selectively over the discrete domain. Simulation results indicate that the deformation fields obtained by the proposed algorithm to satisfy the topology preservation and regularity conditions are much closer to the initial deformation fields that violate these conditions than those obtained by a Gaussian regularizer. Characterization of deformation fields in terms of topology preservation and regularity constraints as described in this paper offers a means of establishing preferences in advancing in a typical image registration problem. This approach is certainly useful in a voxel-wise image registration framework where typically more than one possible point correspondence is determined, and the objective is to select one combination out of a combinatorially increasing number of possibilities. The algorithm addresses the Jacobian characterization problem in the space of deformation field gradients, which necessitates unique reconstruction of the deformation field components from their partial derivatives. A major drawback of the algorithm is the computational cost of enforcing integrability on the partial derivatives, which increases with image size and requires speedy block processing techniques.
References

1. Christensen, G.E., Rabbitt, R.D., Miller, M.I.: Deformable templates using large deformation kinematics. IEEE Trans. on Image Processing 5 (1996) 1435–1447
2. Musse, O., Heitz, F., Armspach, J.P.: Topology preserving deformable image matching using constrained hierarchical parametric models. IEEE Trans. on Image Processing 10 (2001) 1081–1093
3. Ashburner, J., Andersson, J.L.R., Friston, K.J.: High-dimensional image registration using symmetric priors. NeuroImage 9 (1999) 619–628
4. Joshi, S.C., Miller, M.I.: Landmark matching via large deformation diffeomorphisms. IEEE Trans. on Image Processing 9 (2000) 1357–1370
5. Johnson, H.J., Christensen, G.E.: Landmark and intensity based, consistent thin-plate spline image registration. In Insana, M.F., Leahy, R.M., eds.: International Conference on Information Processing in Medical Imaging. Volume LNCS 2082, Springer (2001) 329–343
6. Thirion, J.P.: Image matching as a diffusion process: an analogy with Maxwell's demons. Medical Image Analysis 2 (1998) 243–260
7. Karaçalı, B., Snyder, W.: Partial integrability in surface reconstruction from a given gradient field. In: IEEE International Conference on Image Processing. Volume 2 (2002) 525–528
8. Trussell, H.J., Civanlar, M.: The feasible solution in signal restoration. IEEE Trans. on Acoustics, Speech, and Signal Processing 32 (1984) 201–212
9. Combettes, P.L.: The foundations of set theoretic estimation. Proceedings of the IEEE 81 (1993) 182–208
10. Cocosco, C., Kollokian, V., Kwan, R.S., Evans, A.: BrainWeb: Online interface to a 3D MRI simulated brain database. In: 3rd International Conference on Functional Mapping of the Human Brain. Volume 5 (1997) 4
11. Bookstein, F.L.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Trans. on Pattern Analysis and Machine Intelligence 11 (1989) 567–585
Large Deformation Inverse Consistent Elastic Image Registration

Jianchun He and Gary E. Christensen

Department of Electrical and Computer Engineering, The University of Iowa, Iowa City, IA 52242
{jianchun-he,gary-christensen}@uiowa.edu
Abstract. This paper presents a new image registration algorithm that accommodates locally large nonlinear deformations. The algorithm concurrently estimates the forward and reverse transformations between a pair of images while minimizing the inverse consistency error between the transformations. It assumes that the two images to be registered contain topologically similar objects and were collected using the same imaging modality. The large deformation transformation from one image to the other is accommodated by concatenating a sequence of small deformation transformations. Each incremental transformation is regularized using a linear elastic continuum mechanical model. Results of ten 2D and twelve 3D MR image registration experiments are presented that tested the algorithm’s performance on real brain shapes. For these experiments, the inverse consistency error was reduced on average by 50 times in 2D and 30 times in 3D compared to the viscous fluid registration algorithm.
1 Introduction
Magnetic resonance images (MRI) of the head demonstrate that the macroscopic shape of the brain is complex and varies widely across normal individuals. Low-dimensional, small-deformation, and linear image registration algorithms [1,2,3,4,5,6,7] can only determine correspondences between brain images at a coarse global level. High-dimensional large deformation image registration algorithms [8,9,10] are needed to describe the complex shape differences between individuals at the local level. In this paper we present a new large-deformation, inverse-consistent, elastic image registration (LDCEIR) algorithm. This method accommodates large nonlinear deformations by concatenating a sequence of small incremental transformations. Inverse consistency between the forward and reverse transformations is achieved by jointly estimating the incremental transformations while enforcing inverse consistency constraints on each incremental transformation. The transformation estimation is regularized using a linear differential operator that penalizes second order derivatives in both the spatial and temporal dimensions. This regularization is most similar to a thin-plate spline or linear elastic regularization, with the difference that it is applied to both the spatial and temporal dimensions instead of just the spatial dimension.
Previous work on large deformation image registration includes the viscous fluid intensity registration (VFIR) algorithm [8,11,12], the viscous fluid landmark registration (VFLR) algorithm [9], and the hyperelastic intensity registration (HEIR) algorithm [10]. The viscous fluid intensity registration algorithm models the template image as a viscous fluid and each point in the image domain as a mass particle that moves in space. The method solves a modified Navier-Stokes equation for the velocities of the mass particles and finds the displacement field by integrating the velocity field over time. The LDCEIR method differs from the VFIR and VFLR algorithms in that it regularizes the displacement field of the transformation instead of the velocity field of the transformation. Thus, the LDCEIR model penalizes large nonlinear deformations, similar to the HEIR algorithm, while the VFIR and VFLR penalize the rate at which one image is deformed into another. These differences make the LDCEIR better suited for modeling anatomical structures that deform elastically, and the VFIR and VFLR better for modeling fluids in the anatomy when registering images collected of the same anatomy over time. The LDCEIR algorithm is similar to the VFLR algorithm in that both algorithms are solved in both space and time. But it is different from the VFIR algorithm, which solves the Navier-Stokes equation using a greedy strategy and only regularizes in the spatial domain. The LDCEIR algorithm also differs from the VFIR algorithm in that it is bi-directional (i.e., estimates the forward and reverse transformations) while the VFIR algorithm is unidirectional (only estimates the forward transformation). This difference allows the LDCEIR algorithm to estimate transformations with much less inverse consistency error than is possible using the VFIR algorithm, as demonstrated in this paper. The LDCEIR method generalizes the small-deformation, inverse-consistent, linear-elastic intensity registration (SDCEIR) algorithm [6] by including intermediate transformations so that it can accommodate large deformations. The LDCEIR method simplifies to the SDCEIR algorithm for the case of no intermediate transformations. The rest of the paper is organized as follows. The LDCEIR algorithm is described in Section 2. Registration results are presented in Section 3 that test the LDCEIR algorithm on 2D and 3D MRI brain images, and these results are compared to the viscous fluid intensity registration algorithm. Finally, Section 4 summarizes and gives conclusions of this work.
2 Methods

2.1 Notation
This section describes the notation and assumptions used throughout the paper. For convenience, it is assumed that an image is three dimensional and is defined both on a discrete domain Ωd = {(n1, n2, n3) : 0 ≤ n1 < N1, 0 ≤ n2 < N2, 0 ≤ n3 < N3} corresponding to the voxel lattice, and on a continuous domain Ωx that is extended from the voxel lattice to the continuum by linear interpolation. A point x = (x1, x2, x3) ∈ Ωx corresponds to a point in the continuous domain
of an image, while x = (x1, x2, x3) ∈ Ωd corresponds to a specific voxel in the image. The two images to be registered are denoted as I0(x) and I1(x). The notation T(x, i), for 0 ≤ i < N4, is used to denote a sequence of images, as shown in Figure 1 for N4 = 8. It will be assumed that the number of images N4 in the sequence is an even number.
Fig. 1. A periodic in time image sequence and associated incremental transformations.
An incremental transformation h(x, i) (see Figure 1) defines the pointwise correspondence between image T(x, i) and image T(x, i + 1). The incremental transformations h(x, i), for 0 ≤ i < N4, are related to the images in the image sequence by the equations

   T(x, 0) = I0(x),           T(x, 1) = T(h(x, 0), 0),
   T(x, 2) = T(h(x, 1), 1),   T(x, 3) = T(h(x, 2), 2),
   T(x, 4) = I1(x),           T(x, 5) = I1(h(x, 4)),
   T(x, 6) = T(h(x, 5), 5),   T(x, 7) = T(h(x, 6), 6)   (1)
for the case N4 = 8. Notice that the images T (x, 0) and T (x, 4) in the image sequence are set to equal the two images being registered. The transformation that deforms I0 (x) into I1 (x) is called the forward transformation and is computed by
h(h(h(h(x, 3), 2), 1), 0), which is the concatenation of the incremental transformations h(x, i) for i = 0, 1, 2, 3. The reverse transformation deforms I1(x) to I0(x) and is computed in a similar manner using the formula h(h(h(h(x, 7), 6), 5), 4). Thus,

   I0(h(h(h(h(x, 3), 2), 1), 0)) ∼ I1(x)  and  I1(h(h(h(h(x, 7), 6), 5), 4)) ∼ I0(x).   (2)
Let u(x, i) = h(x, i) − x denote the displacement field associated with the incremental transformation h(x, i).
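Concatenating incremental transformations is a resampling operation: each composition h(h(x, i+1), i) evaluates the displacement of step i at the points produced by step i+1. A 2D sketch using scipy's map_coordinates (one common choice of interpolator, not necessarily the authors'); u is assumed to be a list of (2, M, N) displacement arrays ordered as u[0], . . . , u[3]:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def forward_transform(u, shape):
    # Build h(h(h(h(x, 3), 2), 1), 0): apply the innermost incremental
    # transformation first, then resample each earlier displacement at the
    # current positions and accumulate.
    pos = np.indices(shape).astype(float)            # identity map
    for ui in reversed(u):                           # i = 3, 2, 1, 0
        disp = np.stack([map_coordinates(ui[c], pos, order=1, mode='nearest')
                         for c in range(2)])
        pos = pos + disp
    return pos     # output coordinates of the concatenated transformation
```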
2.2 Image Registration
This section describes how two 3D images I0(x) and I1(x) are registered by constructing an image sequence T(x, i) that is periodic in both the spatial and temporal dimensions. The registration problem is formulated as an optimization problem in which the displacement fields u(x, i) = h(x, i) − x, for 0 ≤ i < N4, are estimated instead of the incremental transformations h(x, i). The optimization problem is formulated to achieve several goals. The first goal is to estimate the incremental transformation functions h(x, i) such that h(x, i) deforms image T(x, i) into the shape of T(x, i + 1), for 0 ≤ i < N4. This is accomplished by minimizing the intensity similarity cost function

   CS(u) = Σ_{i=0}^{N4−1} ∫_{Ωd} (T(u(x, i) + x, i) − T(x, i + 1))² dx.   (3)
The second goal is to estimate a set of incremental transformations that gradually deform I0(x) into the shape of I1(x), such that the forward transformation is evenly distributed among the incremental transformations h(x, i), for 0 ≤ i < N4/2. Similarly, the reverse transformation that deforms I1(x) back to the shape of I0(x) should be evenly distributed among the incremental transformations h(x, i), for i = N4/2, . . . , N4 − 1. This condition produces a periodic image sequence T(x, i) that is symmetric about T(x, 0) and T(x, N4/2), such that the images T(x, i) and T(x, N4 − i) look similar to each other. This constraint is imposed on the optimization problem by minimizing the symmetric similarity cost function given by

   CM(u) = Σ_{i=0}^{N4−1} ∫_{Ωd} (T(u(x, i) + x, i) − T(x, N4 − 1 − i))² dx.   (4)
The third goal is to constrain each incremental transformation to be a smooth, small deformation transformation. This goal is accomplished by regularizing each incremental transformation with a linear elastic continuum mechanical model. This constraint is incorporated into the optimization problem by minimizing the regularization cost function

   CR(u) = Σ_{i=0}^{N4−1} ∫_{Ωd} ||Lu(x, i)||² dx,   (5)
where the linear elastic operator L is defined as L = −(αx∇x² + αt∇t²) − β∇x(∇x ·) + γ. The αt∇t² term is added to the regular 3D linear elastic operator to smooth the transition from one time step to the next, and to help uniformly distribute the total transformation among the incremental transformations. The fourth goal is to minimize the inverse consistency error between the forward and reverse transformations. By construction, the incremental transformations h(x, i) and h(x, N4 − 1 − i) should be inverses of each other, since the images T(x, i) and T(x, N4 − i) are constrained to look similar. The inverse consistency error is minimized by minimizing the inverse consistency cost function

   CI(u) = Σ_{i=0}^{N4−1} ∫_{Ωd} ||u(x, i) − ũ(x, N4 − 1 − i)||² dx,   (6)
where ũ(x, i) = h⁻¹(x, i) − x. Notice that imposing the inverse consistency constraint on the incremental transformations imposes the inverse consistency constraint on the forward and reverse transformations. Due to its symmetric form, Eq. (6) also helps to ensure the symmetry of the image sequence T(x, i) about i = 0 and i = N4/2. Note that it is possible to compute Eq. (6) without computing inverses using the approach of Cachier and Rey [13].
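When the inverses are computed explicitly, a standard way to obtain ũ(x, i) = h⁻¹(x, i) − x is the fixed-point iteration ũ(x) = −u(x + ũ(x)), which follows from h(h⁻¹(x)) = x. This is a generic technique shown here as a sketch, not necessarily how the authors computed their inverses; convergence is only expected for the small incremental deformations assumed by the model.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def invert_displacement(u, n_iter=20):
    # Fixed-point iteration for the inverse displacement field:
    #   u_inv(x) = -u(x + u_inv(x)),
    # since h(h_inv(x)) = x implies u_inv(x) + u(x + u_inv(x)) = 0.
    grid = np.indices(u.shape[1:]).astype(float)
    u_inv = np.zeros_like(u)
    for _ in range(n_iter):
        pos = grid + u_inv
        u_inv = -np.stack([map_coordinates(u[c], pos, order=1, mode='nearest')
                           for c in range(u.shape[0])])
    return u_inv   # h_inv(x) = x + u_inv(x), up to interpolation error
```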
(7)
where σS , σM , σR , and σI are weighting factors. Notice that the four terms in Eq. (7) compete with one another and the final solution is a trade off between the four constraints. The image sequence T (x, ·) is initialized by setting the first half of the images in the sequence equal to image I0 (x) and the second half equal to image I1 (x). For example when N4 = 8, the image sequence T (x, i) is initialized as T (x, 0) = T (x, 1) = T (x, 2) = T (x, 3) = I0 (x) T (x, 4) = T (x, 5) = T (x, 6) = T (x, 7) = I1 (x). 2.3
(8)
Estimation Procedure
Eq. (7) is minimized assuming that the displacement field u(x, ·) is parameterized by a 4D Fourier series given by
Eq. (7) is minimized assuming that the displacement field u(x, ·) is parameterized by a 4D Fourier series given by

   u(x, i) = Σ_{k=−N/2}^{N/2} µ[k] e^{j⟨(x,i), ωk⟩},   (9)

where N = [N1, N2, N3, N4] and ⟨·, ·⟩ represents the standard inner product. The coefficients µ[k] are (3 × 1), complex-valued vectors with complex conjugate symmetry, and ωk = [2πk1/N1, 2πk2/N2, 2πk3/N3, 2πk4/N4].
With the Fourier series parameterization, Eq. (7) is solved for µ[k] using the gradient descent method. Assume the images to be registered are d-dimensional and there are Np pixels in each image; then the total number of parameters to be estimated is approximately d × N4 × Np. But the high frequency coefficients are usually very small numbers and have very limited contribution to the total deformation. So they may be omitted to reduce the number of parameters required to represent the displacement fields. In practice, Eq. (9) is approximated by the following:

   u(x, i) = Σ_{k=−r}^{r} µ[k] e^{j⟨(x,i), ωk⟩},   (10)
where r = [r1, r2, r3, r4] ≤ N/2. The constants r1, r2, r3 and r4 represent the largest x1, x2, x3 and i harmonic components of the displacement fields. They are set to small numbers at the beginning and periodically increased throughout the iterative optimization procedure. In other words, the low frequency basis coefficients are estimated before the higher ones in our approach. The benefit of this approach is that the global image features are registered before the local details, so the registration is less likely to be trapped in local minima of the total cost function. Moreover, when the values of r1, r2, and r3 are small, the template and target images are down sampled to reduce the computational burden. Down sampling of the images in the spatial domain also helps to avoid some local minima of the cost function. In practice, each dimension of the images is down sampled by a factor of 4 at the beginning, and then increased to a factor of 2. The full-scale images are only used for the final iterations to fine tune the registration. The steps involved in estimating the basis coefficients µ are summarized in the following algorithm.

Algorithm
1. Initialize T(x, i) using Eq. (8). Set µ[k] = 0, u(x, i) = 0 and ũ(x, i) = 0. Set r = [1, 1, 1, N4/2].
2. Update the basis coefficients µ[k] using gradient descent on Eq. (7).
3. Compute the displacement field u(x, i) using Eq. (10).
4. Update T(x, i) using Eq. (1).
5. Compute h⁻¹(x, i) and set ũ(x, i) = h⁻¹(x, i) − x.
6. If the criterion is met to increase the number of basis functions, then set r = r + 1, and set the new coefficients in Eq. (10) to zero.
7. If the algorithm has not converged or reached the maximum number of iterations, go to step 2.
8. Use the displacement field u(x, i) to transform I0(x) and I1(x).
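The truncated series of Eq. (10) can be evaluated with an inverse FFT by embedding the retained low-frequency coefficients in a zero spectrum; raising r then just unmasks further harmonics. A single-component 2D sketch (the method itself uses 4D, vector-valued coefficients whose conjugate symmetry makes the result real):

```python
import numpy as np

def synthesize(mu, shape):
    # Evaluate u(x) = sum over retained harmonics of mu[k] * exp(j<x, omega_k>)
    # on a grid of the given shape; mu is a (2*r0+1, 2*r1+1) array of
    # coefficients centred on the zero-frequency term, all others being zero.
    spectrum = np.zeros(shape, dtype=complex)
    r0, r1 = mu.shape[0] // 2, mu.shape[1] // 2
    for k0 in range(-r0, r0 + 1):
        for k1 in range(-r1, r1 + 1):
            spectrum[k0 % shape[0], k1 % shape[1]] = mu[k0 + r0, k1 + r1]
    # numpy's ifft includes a 1/size factor, so undo it to get the plain sum;
    # take .real when mu has the conjugate symmetry mentioned above.
    return np.fft.ifft2(spectrum) * spectrum.size

# Coarse-to-fine: optimize a small mu first, then embed it in a larger
# zero-initialized coefficient array whenever r is increased.
```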
3 Results
The performance of the large-deformation, inverse-consistent, elastic image registration (LDCEIR) algorithm was tested by registering 10 pairs of 2D brain MR
images and 13 3D brain MR images. The 2D experiments consisted of 6 pairs of transverse slices, 2 pairs of coronal slices and 2 pairs of sagittal slices. All of the 2D slices were selected from the set of 3D images of dimension 256 × 320 × 256. For each pair of the 2D slices, both the forward and reverse transformations were estimated using the LDCEIR algorithm and the viscous fluid intensity registration (VFIR) algorithm. For the 3D experiments, one data set from a set of 13 MRI brain images was selected as the template image and registered with the other 12 images using both the LDCEIR and VFIR algorithms. The images were all the same size of 128 × 160 × 128.

Figures 2 – 5 show typical results from one of the 2D image registration experiments comparing the performance of the LDCEIR algorithm to the VFIR algorithm. Figure 2 shows the results of one of the ten 2D MRI brain image registration experiments. The left column shows the images I0 (top) and I1 (bottom) that were registered. The center and right columns contain the registration results of the LDCEIR and the VFIR algorithm, respectively. The top-center and top-right panels show the result of transforming image I1 into the shape of I0 using both algorithms. Similarly, the bottom-center and bottom-right panels show the result of transforming I0 into I1. In all four cases, the deformed images closely resemble the original target image that they were registered with. This figure shows that both the LDCEIR and the VFIR algorithms did a good job matching the outer contour of the brains and the ventricles, but there is still some mismatch in the cortex. The VFIR algorithm minimized the intensity difference better than the LDCEIR algorithm because it compressed some of the sulci and gyri that did not correspond between the brains into small thread-like structures. This behavior is not desirable in regions where the brain structures do not correspond.

Figure 3 shows the absolute intensity difference images for the experiment shown in Fig. 2. These images were computed by subtracting the intensities of each deformed image in Fig. 2 from its target image and then taking the absolute value. White in these images corresponds to no intensity difference while black corresponds to a large intensity difference. Figure 3 shows that the VFIR algorithm did a better job in minimizing the intensity difference than the LDCEIR algorithm. The average absolute intensity error for the LDCEIR algorithm is 15.1 on the range of 0-255 compared to 9.0 for the VFIR algorithm.

Figure 4 shows the natural logarithm of the Jacobian values of the forward and reverse transformations of the LDCEIR and the VFIR algorithms. The intensity range for these images was scaled to the same range of -2.5 – 2.5 for comparison. The results of the LDCEIR algorithm are shown in the left column and the results of the VFIR algorithm are shown in the right column. The log-Jacobian values ranged from -2.05 to 2.01 for the LDCEIR algorithm and from -2.55 to 3.16 for the VFIR algorithm. This figure shows that the log-Jacobian image of the VFIR algorithm has sharper details than that of the LDCEIR algorithm, meaning that the transformation of the VFIR algorithm is not as smooth as that of the LDCEIR algorithm. In all cases the log-Jacobian images show regions of expansion and contraction of the brain structures. This is particularly evident in the area of the ventricles.
Fig. 2. 2D MRI brain image registration experiment: Left column: images I0 (top) and I1 (bottom). Center column: Registration results of LDCEIR algorithm where I1 deformed into shape of I0 (top) and I0 deformed into shape of I1 (bottom). Right column: same as center column except for the VFIR algorithm.
Figure 5 shows the inverse consistency error of the LDCEIR and the VFIR algorithms for the experiment shown in Fig. 2. The inverse consistency error measures how far a point ends up from its original position after it is transformed by the forward and reverse transformations consecutively. These images were produced by applying the concatenated forward and reverse transformations to a rectangular grid image for each method. If the forward and reverse transformations are inverses of each other, i.e., have no inverse consistency error, then the concatenated forward and reverse transformations produce the identity mapping and the deformed grid remains undeformed. Thus, the less distortion of the grid image, the better the inverse consistency of the forward and reverse transformations. This figure shows that the inverse consistency of the LDCEIR algorithm is much better than that of the VFIR algorithm. For this experiment, the maximum inverse consistency error was 0.814 pixels for the LDCEIR algorithm and 15.1 pixels for the VFIR.
Fig. 3. Absolute intensity difference image between original images and deformed images shown in Fig. 2. White corresponds to no intensity difference while black corresponds to a large intensity difference. The average absolute intensity error for the LDCEIR algorithm is 15.1 on the range of 0-255 compared to 9.0 for the VFIR algorithm.
The results of the ten 2D and twelve 3D registration experiments are summarized in the boxplots shown in Fig. 6. The boxplots compare the results of the 2D Elastic, 2D Fluid, 3D Elastic, and 3D Fluid experiments. The top and the bottom of a box are the 75th and the 25th percentiles of the measurements, respectively. The line across the inside of the box shows the median of the measurements. The whiskers show the range of the measurements, excluding outliers. They extend up or down to the extreme values that are less than 1.5H away from the box, where H is the height of the box. Values outside this range are considered outliers and are indicated by the + signs. A small dot at the bottom of the lower whiskers indicates that there is no outlier in the measurements. Figure 6 shows that the RMS intensity error of the LDCEIR algorithm is approximately 50 percent larger than that of the VFIR algorithm for the 2D
Fig. 4. Log-Jacobian of forward and reverse transformations for the experiment shown in Fig. 2. The range of the log-Jacobian values is from -2.05 to 2.01 for the LDCEIR algorithm and from -2.55 to 3.16 for the VFIR algorithm.
and 3D experiments. However, the inverse consistency error of the LDCEIR algorithm was reduced on average by 50 times in 2D and 30 times in 3D compared to the VFIR algorithm for results with comparable RMS intensity error. In addition, the LDCEIR transformations are smoother than those of the VFIR algorithm, as indicated by the smaller minimum/maximum Jacobian values.
4 Summary and Conclusions
We presented a new large-deformation inverse consistent image registration (LDCEIR) algorithm that accommodates large, non-linear deformations. The LDCEIR algorithm was compared with the viscous fluid intensity registration (VFIR) algorithm in ten 2D and twelve 3D brain MR image registration experiments. The LDCEIR algorithm produced smoother transformation functions compared to the VFIR algorithm as indicated by the smaller maximum and
Fig. 5. Inverse consistency error visualized with a deformed grid image for the experiment shown in Fig. 2. The maximum inverse consistency error was 0.814 pixels for the LDCEIR algorithm and 15.1 pixels for the VFIR for this experiment.
RMS Int Err
Max Log Jac
Min Log Jac
Ave ICE
Max ICE 1.8
15
25
6
−1.5
1.6
14
5.5
13
−2
12
1.2
4.5
−2.5
11
15
4
10
−3
1
3.5
9
0.8 10
3
−3.5
8
1.4
20
5
0.6
2.5
7
−4
0.4 5
2
0.2
6 1.5
−4.5
5 2E 2F 3E 3F
0
0 2E 2F 3E 3F
2E 2F 3E 3F
2E 2F 3E 3F
2E 2F 3E 3F
Fig. 6. Box plot of RMS intensity error, minimum/maximum logarithm Jacobian and maximum/average inverse consistency error, where the x-labels 2E and 3E correspond to 2D/3D LDCEIR registration results, respectively, and the x-labels 2F, 3F correspond to 2D/3D VFIR registration results.
minimum log-Jacobian values of the transformations. For these experiments, the inverse consistency error was reduced on average by 50 times in 2D and 30 times in 3D compared to the viscous fluid registration algorithm for results with comparable RMS intensity error.
Acknowledgments. This work was supported in part by the NIH under grants NS35368 and DC03590.
Gaussian Distributions on Lie Groups and Their Application to Statistical Shape Analysis P. Thomas Fletcher, Sarang Joshi, Conglin Lu, and Stephen Pizer Medical Image Display and Analysis Group, University of North Carolina at Chapel Hill [email protected]
Abstract. The Gaussian distribution is the basis for many methods used in the statistical analysis of shape. One such method is principal component analysis, which has proven to be a powerful technique for describing the geometric variability of a population of objects. The Gaussian framework is well understood when the data being studied are elements of a Euclidean vector space. This is the case for geometric objects that are described by landmarks or dense collections of boundary points. We have been using medial representations, or m-reps, for modelling the geometry of anatomical objects. The medial parameters are not elements of a Euclidean space, and thus standard PCA is not applicable. In our previous work we have shown that the m-rep model parameters are instead elements of a Lie group. In this paper we develop the notion of a Gaussian distribution on this Lie group. We then derive the maximum likelihood estimates of the mean and the covariance of this distribution. Analogous to principal component analysis of covariance in Euclidean spaces, we define principal geodesic analysis on Lie groups for the study of anatomical variability in medially-defined objects. Results of applying this framework on a population of hippocampi in a schizophrenia study are presented.
1
Introduction
Shape analysis is emerging as an important area of image processing and computer vision. Model-based approaches [1,2,3] are popular due to their ability to robustly represent objects found in images. Principal component analysis (PCA) [4] is a prevalent technique for describing model variability. However, PCA is only applicable when model parameters are elements of a Euclidean vector space. The focus of our research has been the application of shape analysis for medical image processing to improve both the accuracy of medical diagnosis and the understanding of the processes behind growth and disease [5]. In our previous work [6] we have developed methodology based on medial descriptions called m-reps to quantify shape variability and explain it in intuitive terms such as local thickness, bending and widening. In this paper we show that m-rep models are elements of a Lie group. We develop Gaussian distributions on this Lie group and derive the maximum likelihood estimates (MLEs) of the mean and covariance. Using these distributions, we introduce principal
geodesic analysis (PGA), the extension of PCA to Lie groups. We apply this framework to the statistical analysis of shape using medial representations. As the medial representation is fundamental to our analysis, we describe it briefly.
Fig. 1. Medial atom with a cross-section of the boundary surface it implies (left). An m-rep model of a hippocampus and its boundary surface (right).
1.1
M-Rep Overview
The medial representation used is based on the medial axis of Blum [7]. In this framework, a 3D geometric object is represented as a set of connected continuous medial manifolds, which are formed by the centers of all spheres that are interior to the object and tangent to the object's boundary at two or more points. In this paper we focus on 3D objects that can be represented by a single medial figure. We sample the medial manifold M over a spatially regular lattice. Each sample point also includes first derivative information of the medial position and radius. The elements of this lattice are called medial atoms. A medial atom (Fig. 1) is defined as a 4-tuple $m = \{x, r, F, \theta\}$, consisting of: $x \in \mathbb{R}^3$, the center of the inscribed sphere; $r \in \mathbb{R}^+$, the local width defined as the radius of the sphere; $F \in SO(3)$, an orthonormal local frame parameterized by $(b, b^\perp, n)$, where $n$ is the normal to the medial manifold and $b$ is the direction in the tangent plane of the fastest narrowing of the implied boundary sections; and $\theta \in [0, \pi)$, the object angle determining the angulation of the implied sections of boundary relative to $b$. The medial atom implies two opposing boundary points, $y_0, y_1$, with respective boundary normals, $n_0, n_1$, which are given by

\[
n_0 = \cos(\theta)\,b - \sin(\theta)\,n, \qquad n_1 = \cos(\theta)\,b + \sin(\theta)\,n, \qquad
y_0 = x + r\,n_0, \qquad y_1 = x + r\,n_1. \tag{1}
\]
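Equation (1) is simple to evaluate; a minimal numpy sketch (illustrative names, not the authors' code):

```python
import numpy as np

def implied_boundary(x, r, b, n, theta):
    """Boundary points and normals implied by a medial atom, per Eq. (1).
    x: sphere center (3-vector), r: radius, b, n: frame vectors,
    theta: object angle in [0, pi)."""
    n0 = np.cos(theta) * b - np.sin(theta) * n
    n1 = np.cos(theta) * b + np.sin(theta) * n
    y0 = x + r * n0
    y1 = x + r * n1
    return (y0, n0), (y1, n1)
```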
For three-dimensional slab-like figures (Fig. 1) the lattice of medial atoms is a quadrilateral mesh $m_{ij}$, $(i, j) \in [1, m] \times [1, n]$. The sampling density of medial atoms in a lattice is inversely proportional to the radius of the medial description. Given an m-rep figure, we fit a smooth boundary surface to the model. We use a subdivision surface method [8] that interpolates the boundary positions and normals implied by each atom.
1.2
Lie Groups
Here we present a brief overview of Lie groups. For a detailed treatment see [9]. A Lie group $G$ is a differentiable manifold that also forms an algebraic group, where the two group operations,

\[
\mu : G \times G \to G, \quad \mu(x, y) = xy \qquad \text{(multiplication)},
\]
\[
\iota : G \to G, \quad \iota(x) = x^{-1} \qquad \text{(inverse)},
\]

are differentiable mappings. Let $e$ denote the identity element of a Lie group $G$. The tangent space at $e$, $T_e G$, forms a Lie algebra, which we will denote by $\mathfrak{g}$. The exponential map, $\exp : \mathfrak{g} \to G$, provides a method for mapping vectors in the tangent space $T_e G$ into $G$. Given a vector $v \in \mathfrak{g}$, the point $\exp(v) \in G$ is obtained by flowing to time 1 along the unique geodesic emanating from $e$ with initial velocity vector $v$. The exponential map is a diffeomorphism of a neighborhood of 0 in $\mathfrak{g}$ with a neighborhood of $e$ in $G$. The inverse of the exponential map is called the log map. The geodesic distance between two points $g, h \in G$ is given by $\|\log(g^{-1}h)\|$.

1.3
Discrete M-Rep as a Point on a Lie Group
We now show that a set of medial atoms defining an m-rep object can be represented as a point on a Lie group. A medial atom's position is an element of $\mathbb{R}^3$, which is a standard Lie group under vector addition. The radius parameter is an element of the multiplicative Lie group of positive reals. The medial atom's frame is a 3D rotation, and the object angle is a 2D rotation. Both SO(2) and SO(3) are Lie groups under the composition of rotations. Thus, the set of all medial atoms forms a group $M = \mathbb{R}^3 \times \mathbb{R}^+ \times SO(3) \times SO(2)$, which we call the medial group. Since $M$ is the direct product of four Lie groups, it also is a Lie group. Now consider the set of m-rep models that consist of an $m \times n$ grid of medial atoms. These models form the space $M^{mn}$. Since this is simply the direct product of $mn$ copies of $M$, it is a Lie group. Now, given the medial descriptions of a population of objects, we may consider each geometric model as a point on the Lie group $M^{mn}$.

1.4
Matrix Groups
The most common examples of Lie groups, and those which have the greatest application to computer vision, are the matrix groups [10]. These are all subgroups of the general linear group $GL(n, \mathbb{R})$, the group of nonsingular $n \times n$ real matrices. The Lie algebra associated with $GL(n, \mathbb{R})$ is $L(\mathbb{R}^n, \mathbb{R}^n)$, the set of all $n \times n$ real matrices. The exponential map of a matrix $X \in L(\mathbb{R}^n, \mathbb{R}^n)$ is the standard matrix exponential defined by the infinite series

\[
\exp(X) = \sum_{k=0}^{\infty} \frac{1}{k!} X^k. \tag{2}
\]
Gaussian Distributions on Lie Groups
453
It is well known that the rotation groups SO(2) and SO(3) are matrix subgroups of $GL(2, \mathbb{R})$ and $GL(3, \mathbb{R})$, respectively. Related work includes the statistical analysis of directional data [11], the study of shape spaces as complex projective spaces [12], and Monte Carlo estimation on Lie groups [13]. The 2D rotation group, SO(2), has corresponding Lie algebra $\mathfrak{so}(2)$, the set of $2 \times 2$ skew-symmetric matrices. Likewise, the Lie algebra for the 3D rotation group, SO(3), is the set of $3 \times 3$ skew-symmetric matrices, $\mathfrak{so}(3)$. We will use the notation

\[
A_\theta = \begin{pmatrix} 0 & -\theta \\ \theta & 0 \end{pmatrix}, \qquad
A_v = \begin{pmatrix} 0 & -v_1 & v_2 \\ v_1 & 0 & -v_3 \\ -v_2 & v_3 & 0 \end{pmatrix},
\]

for elements of $\mathfrak{so}(2)$ and $\mathfrak{so}(3)$, respectively, where $\theta \in [0, 2\pi)$ and $v = (v_1, v_2, v_3) \in \mathbb{R}^3$. Here, $\theta$ represents the angle of rotation in the plane. For 3D rotations the normalized vector $\bar{v} = v/\|v\|$ is an axis of rotation, and the angle of rotation about that axis is $\|v\|$. The exponential map for $\mathfrak{so}(2)$ takes the form $\exp(A_\theta) = R_\theta$, where $R_\theta$ is the matrix for a 2D rotation by $\theta$. The exponential map for $\mathfrak{so}(3)$ is given by Rodrigues' formula [14]

\[
\exp(A_v) = \begin{cases} I_3, & \theta = 0, \\[4pt] I_3 + \dfrac{\sin\theta}{\theta} A_v + \dfrac{1 - \cos\theta}{\theta^2} A_v^2, & \theta \in (0, \pi), \end{cases} \tag{3}
\]

where $\theta = \sqrt{\tfrac{1}{2}\,\mathrm{tr}(A_v^T A_v)} = \|v\|$ in $[0, \pi)$. Also, the logarithm for a matrix $R \in SO(3)$ is the matrix in $\mathfrak{so}(3)$ given by

\[
\log(R) = \begin{cases} 0, & \theta = 0, \\[4pt] \dfrac{\theta}{2\sin\theta}\,(R - R^T), & |\theta| \in (0, \pi), \end{cases} \tag{4}
\]

where $\theta$ satisfies $\mathrm{tr}(R) = 2\cos\theta + 1$.

1.5
The Exponential and Log Maps for M-Reps

Now we are ready to define the exponential and log maps for the medial group $M$. The Lie algebra of $M$ is the product space $\mathfrak{m} = \mathbb{R}^3 \times \mathbb{R} \times \mathfrak{so}(3) \times \mathfrak{so}(2)$. The exponential map for $\mathbb{R}^3$ is the identity map, and the exponential map for $\mathbb{R}$ is the familiar real exponential function. Combined with the exponential maps for the rotation groups given above, the exponential map for the medial group $M$ is

\[
\exp : \mathfrak{m} \to M : (x, \rho, A_v, A_\theta) \mapsto (x, e^\rho, \exp(A_v), \exp(A_\theta)),
\]

where we have abused notation by reusing exp, but it is clear which exponential map we mean from the context. The corresponding log map is

\[
\log : M \to \mathfrak{m} : (x, r, F, R_\theta) \mapsto (x, \log(r), \log(F), \log(R_\theta)).
\]
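Formulas (3) and (4) translate directly into code; a minimal numpy sketch using the paper's convention for $A_v$ (function names are ours, not from the paper):

```python
import numpy as np

def so3_exp(v):
    """Rodrigues' formula (3): exponential map from so(3) to SO(3).
    v is a 3-vector (v1, v2, v3); theta = ||v|| is assumed in [0, pi)."""
    theta = np.linalg.norm(v)
    A = np.array([[0.0,  -v[0],  v[1]],
                  [v[0],  0.0,  -v[2]],
                  [-v[1], v[2],  0.0]])
    if theta == 0.0:
        return np.eye(3)
    return np.eye(3) + (np.sin(theta) / theta) * A \
        + ((1.0 - np.cos(theta)) / theta ** 2) * (A @ A)

def so3_log(R):
    """Log map (4): the skew-symmetric matrix log(R) in so(3).
    Valid for |theta| in [0, pi); theta solves tr(R) = 2 cos(theta) + 1."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if theta == 0.0:
        return np.zeros((3, 3))
    return (theta / (2.0 * np.sin(theta))) * (R - R.T)
```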
2
Gaussian Distributions on Lie Groups
In this section we develop Gaussian distributions on $M^n$, the Lie group of m-rep figures with $n$ atoms. We begin by developing Gaussian distributions on each of the factors in the product space $M = \mathbb{R}^3 \times \mathbb{R}^+ \times SO(3) \times SO(2)$. We define Gaussian distributions on Lie groups following Grenander [15]. A Gaussian distribution on a Lie group with mean at the identity element is a solution to the heat equation defined in the local coordinates of the Lie group:

\[
\frac{\partial f}{\partial t} = \Delta f = \mathrm{div}(\mathrm{grad}\, f) = g^{ij}\!\left(\frac{\partial^2 f}{\partial x^i \partial x^j} - \Gamma_{ij}^k \frac{\partial f}{\partial x^k}\right),
\]

where $g^{ij}$ are the components of the inverse of the Riemannian metric, and $\Gamma_{ij}^k$ are the Christoffel symbols [16]. Indeed, the Gaussian distribution in $\mathbb{R}^n$, given by the density

\[
p(x) = \frac{1}{\sqrt{(2\pi)^n |\Sigma|}} \exp\!\left(-\frac{(x - \mu)^T \Sigma^{-1} (x - \mu)}{2}\right), \tag{5}
\]

is the solution of the heat equation in $\mathbb{R}^n$. The case $n = 3$ gives the Gaussian distribution for medial atom positions.

2.1
Gaussian Distributions on R+
For the Lie group of positive reals under multiplication, local coordinates are given by the logarithm. The solution to the heat equation on $\mathbb{R}^+$ is given by the lognormal density:

\[
p(x) = \frac{1}{\sqrt{2\pi}\,\sigma x} \exp\!\left(-\frac{(\log x - \log\mu)^2}{2\sigma^2}\right). \tag{6}
\]
Given samples $x_1, \ldots, x_N \in \mathbb{R}^+$ that are independently distributed by the lognormal distribution, the maximum likelihood estimates for the mean and variance are

\[
\hat\mu = \left(\prod_{i=1}^{N} x_i\right)^{\frac{1}{N}}, \qquad
\hat\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (\log x_i - \log\hat\mu)^2.
\]

Notice that $\hat\mu$, given by the geometric average, is the point that minimizes the sum-of-squared geodesic distances in $\mathbb{R}^+$, i.e., it minimizes $\sum_{i=1}^{N} \log(\mu^{-1} x_i)^2$.

2.2
Gaussian Distributions on SO(2)
Consider the parametrization of SO(2) by the rotation angle $\theta \in [0, 2\pi)$. Notice that SO(2) is isomorphic to the unit circle $S^1$ via the mapping $\theta \mapsto e^{i\theta}$. The Gaussian distribution on SO(2) with mean $\mu$ and standard deviation $\sigma$ is given by

\[
p(\theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \sum_{k=-\infty}^{\infty} \exp\!\left(-\frac{(\theta - \mu - 2\pi k)^2}{2\sigma^2}\right), \tag{7}
\]
Fig. 2. Solution to the heat equation, cyclic on [−π, π] (left). Wrapped Gaussian over the unit circle with σ = 1, 0.5, and 0.25 (right).
which solves the heat equation with cyclic boundary conditions on $[-\pi, \pi]$. This is sometimes known as the "wrapped Gaussian" on the circle (Fig. 2). We now derive the maximum likelihood estimates for the mean and covariance, given samples $\theta_i \in [0, 2\pi)$, $i = 1, \ldots, N$, that are independently distributed according to (7). To begin we assume that $\sigma = 1$, and we find the maximum likelihood estimate of the mean, which is given by

\[
\hat\mu = \arg\max_{\mu \in [0, 2\pi)} \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}} \sum_{k=-\infty}^{\infty} \exp\!\left(-\frac{(\theta_i - \mu - 2\pi k)^2}{2}\right). \tag{8}
\]

Notice that since the quadratic exponential is an even function, its derivative is odd, and

\[
\frac{\partial}{\partial\mu} \exp\!\left(-\frac{(\theta_i - \mu - 2\pi k)^2}{2}\right) = -\frac{\partial}{\partial\mu} \exp\!\left(-\frac{(\theta_i - \mu + 2\pi k)^2}{2}\right)
\]

for a fixed integer $k$. Thus the derivative of the summation in (8) reduces to just the $k = 0$ term, and the maximization problem becomes

\[
\hat\mu = \arg\max_{\mu \in [0, 2\pi)} \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{(\theta_i - \mu)^2}{2}\right). \tag{9}
\]

This is just the equation for the maximum likelihood estimate of the mean for the Gaussian distribution. Therefore, we have $\hat\mu = \frac{1}{N}\sum_{i=1}^{N} \theta_i$. This equation can lead to ambiguities (see [11]), due to the multiple possible representations for the angles $\theta_i$; e.g., we may take $\theta_i \in [0, 2\pi)$ or $\theta_i \in [-\pi, \pi)$. However, medial atom object angles always lie within $[0, \pi)$, and thus this ambiguity does not arise (see [17]). For deriving the MLE of variance, consider the log-likelihood

\[
\log l(\sigma; \hat\mu, \theta_1, \ldots, \theta_N) = \sum_{i=1}^{N} \left[\log\frac{1}{\sqrt{2\pi}\,\sigma} + \log \sum_{k=-\infty}^{\infty} \exp\!\left(-\frac{(\theta_i - \hat\mu - 2\pi k)^2}{2\sigma^2}\right)\right].
\]
Differentiation with respect to $\sigma$ gives

\[
\frac{\partial l}{\partial\sigma} = -\frac{N}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{N}(\theta_i - \hat\mu)^2
+ \frac{1}{\sigma^3}\sum_{i=1}^{N} \frac{\sum_{k=-\infty}^{\infty} (2\pi k)^2 \exp\!\left(-\frac{(\theta_i - \hat\mu - 2\pi k)^2}{2\sigma^2}\right)}{\sum_{k=-\infty}^{\infty} \exp\!\left(-\frac{(\theta_i - \hat\mu - 2\pi k)^2}{2\sigma^2}\right)}.
\]
The third term above, although it converges, does not yield a closed-form solution. However, according to Mardia [11], the wrapped Gaussian density (7) is well approximated by just the $k = 0$ term of the summation when $\sigma^2 \leq 2\pi$. This is certainly the case for medial atom object angles, which are tightly distributed. Thus we keep only the $k = 0$ term in the above summation, implying that

\[
\frac{\partial l}{\partial\sigma} \approx -\frac{N}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{N}(\theta_i - \hat\mu)^2.
\]
Setting the above equation to zero and solving for $\sigma$, we get the approximated MLE of the variance as

\[
\hat\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(\theta_i - \hat\mu)^2. \tag{10}
\]
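For concreteness, a small sketch of density (7) with the infinite sum truncated, together with the approximated estimates just derived (the truncation bound K and the function names are our assumptions):

```python
import numpy as np

def wrapped_gaussian_pdf(theta, mu, sigma, K=10):
    """Density (7), truncating the infinite sum to |k| <= K.
    For sigma^2 <= 2*pi the k = 0 term already dominates (Mardia [11])."""
    k = np.arange(-K, K + 1)
    terms = np.exp(-(theta - mu - 2.0 * np.pi * k) ** 2 / (2.0 * sigma ** 2))
    return terms.sum() / (np.sqrt(2.0 * np.pi) * sigma)

def wrapped_gaussian_mle(thetas):
    """Approximated MLEs for tightly distributed angles in [0, pi):
    the arithmetic mean and the variance of Eq. (10)."""
    thetas = np.asarray(thetas)
    mu_hat = thetas.mean()
    var_hat = np.mean((thetas - mu_hat) ** 2)
    return mu_hat, var_hat
```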
2.3
The Wrapped Gaussian Distribution on SO(3)

Analogous to SO(2), we use the log map to define a wrapped Gaussian distribution on SO(3) with mean $\mu$. We note that this density is not necessarily a solution of the heat equation on SO(3). Let $u(x) = \Phi(\log(\mu^{-1}x)) \in \mathbb{R}^3$, where $\Phi : \mathfrak{so}(3) \to \mathbb{R}^3$, $\Phi(A_v) = v$, is the canonical isomorphism. Let $\bar u(x) = u(x)/|u(x)|$. Following (7), the wrapped Gaussian density on SO(3) becomes

\[
p(x) = \frac{1}{\sqrt{(2\pi)^3 |\Sigma|}} \sum_{k=-\infty}^{\infty} \exp\!\left(-\frac{1}{2}\,(u(x) - 2\pi k\bar u(x))^T \Sigma^{-1} (u(x) - 2\pi k\bar u(x))\right). \tag{11}
\]

Here $\mu \in SO(3)$ is the mean rotation, and the covariance structure is defined as a quadratic form on the Lie algebra $\mathfrak{so}(3)$, represented as the $3 \times 3$ covariance matrix $\Sigma$. Given samples $x_1, \ldots, x_N \in SO(3)$ independently distributed according to the density (11), we derive the maximum likelihood estimate for the mean and covariance. Focusing on the MLE of the mean, we may assume, without loss of generality, that the covariance is identity. The joint density is given by the product density

\[
p(\mu; x_1, \ldots, x_N) = \prod_{i=1}^{N} \frac{1}{\sqrt{(2\pi)^3}} \sum_{k=-\infty}^{\infty} \exp\!\left(-\frac{1}{2}\,\|u(x_i) - 2\pi k\bar u(x_i)\|^2\right). \tag{12}
\]
Notice that geodesics of SO(3) are isomorphic to SO(2), and the density (11) restricted to a geodesic reduces to the wrapped Gaussian on SO(2). Now we can use the
same argument from the previous section to show that the derivatives in the summation cancel out. Therefore, maximizing $p(\mu; x_1, \ldots, x_N)$ is equivalent to finding

\[
\hat\mu = \arg\max_{\mu \in SO(3)} \prod_{i=1}^{N} \frac{1}{\sqrt{(2\pi)^3}} \exp\!\left(-\frac{1}{2}\,\|u(x_i)\|^2\right). \tag{13}
\]

Now, taking the log of the left-hand side of (13), the MLE of the mean becomes

\[
\hat\mu = \arg\min_{\mu \in SO(3)} \sum_{i=1}^{N} \|\Phi(\log(\mu^{-1} x_i))\|^2. \tag{14}
\]
Notice that $\|\Phi(\log(\mu^{-1} x_i))\|$ is the Riemannian distance from $\mu$ to $x_i$. Hence, the MLE of the mean is also the point that minimizes the sum-of-squared geodesic distances to the samples. This is also referred to as the intrinsic mean on SO(3) [14]. An iterative algorithm for computing the intrinsic mean is given in [17]. As in the case for SO(2), we assume that the variance is sufficiently small, that is, $\lambda \leq 2\pi$, where $\lambda$ is an eigenvalue of $\Sigma$. Using the same argument as in the previous section, the approximated maximum likelihood estimate of the covariance is

\[
\hat\Sigma = \frac{1}{N}\sum_{i=1}^{N} \Phi(\log(\hat\mu^{-1} x_i))\,\Phi(\log(\hat\mu^{-1} x_i))^T. \tag{15}
\]

2.4
Gaussian Distributions on the Medial Group
We now combine the distributions developed on the factors $\mathbb{R}^3$, $\mathbb{R}^+$, SO(3), and SO(2) to define a Gaussian distribution on the Lie group $M$. As $M$ is a direct product of these Lie groups, the Gaussian distribution on $M$ is the product distribution given by

\[
p(x) = \frac{1}{(2\pi)^4 |\Sigma|^{\frac{1}{2}}} \sum_{k=-\infty}^{\infty} \exp\!\left(-\frac{1}{2}\,(u(x) - 2\pi k\rho(\bar u(x)))^T \Sigma^{-1} (u(x) - 2\pi k\rho(\bar u(x)))\right).
\]

Here $u(x) = \log(\mu^{-1}x) \in \mathfrak{m}$ is represented as an 8-vector. The covariance $\Sigma$ is a quadratic form on the Lie algebra $\mathfrak{m}$, represented as an $8 \times 8$ matrix. As only the SO(3) and SO(2) distributions are cyclic, the operator $\rho : \mathfrak{m} \to \mathfrak{m}$, $\rho((x, \log r, A_v, A_\theta)) = (0, 0, A_v, A_\theta)$, causes wrapping to occur only in the rotation components. Having defined the Gaussian distribution on a single medial atom, the Gaussian distribution of a figure having $n$ medial atoms is the $n$-fold product distribution on $M^n$, defined by

\[
p(x) = \frac{1}{(2\pi)^{4n} |\Sigma|^{\frac{1}{2}}} \sum_{k=-\infty}^{\infty} \exp\!\left(-\frac{1}{2}\,(u(x) - 2\pi k\rho(\bar u(x)))^T \Sigma^{-1} (u(x) - 2\pi k\rho(\bar u(x)))\right).
\]
Now the vectors u(x) = log(µ−1 x) ∈ mn are 8n-dimensional, i.e., they are the concatenation of n vectors in m, representing n medial atoms. The covariance Σ is a quadratic form on mn , represented as an 8n × 8n matrix, and the operator ρ projects onto each of the n copies of the rotation groups.
The maximum likelihood estimates for the combined product distribution follow from our development of the maximum likelihood estimates for the individual factors. Recall that the MLE of the mean for each factor is the point that minimizes the sum-of-squared geodesic distances to the sample points. Therefore, the MLE of the mean for samples in the product space is also the minimizer of the sum-of-squared geodesic distances. The MLE of the covariance, with the discussed approximations in the rotation dimensions, (10), (15), is the sample covariance matrix in the Lie algebra $\mathfrak{m}^n$. Using an extension of the algorithm in [17] for the intrinsic mean on SO(3), the intrinsic mean of a collection of m-rep figures with $n$ atoms, $M_1, \ldots, M_N \in M^n$, is computed by

Algorithm: M-rep Mean
Input: $M_1, \ldots, M_N \in M^n$
Output: $\mu \in M^n$, the mean m-rep
  $\mu = M_1$
  Do
    $\Delta M_i = \mu^{-1} M_i$
    $\Delta\mu = \exp\!\left(\frac{1}{N}\sum_{i=1}^{N} \log(\Delta M_i)\right)$
    $\mu = \mu\,\Delta\mu$
  While $\|\log(\Delta\mu)\| > \epsilon$.
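The iteration above is generic; a sketch for an arbitrary Lie group, with the group operations passed in as callables (all names here are placeholders, not from the paper):

```python
import numpy as np

def intrinsic_mean(samples, compose, inverse, exp, log, eps=1e-6, max_iter=100):
    """Iterative mean of the M-rep Mean algorithm for a generic Lie group.
    'samples' is a list of group elements; 'compose', 'inverse', 'exp'
    and 'log' supply the group operations and the exp/log maps, with
    log returning a vector in the Lie algebra."""
    mu = samples[0]
    for _ in range(max_iter):
        # Average the samples in the Lie algebra at the current estimate.
        tangent = np.mean([log(compose(inverse(mu), m)) for m in samples], axis=0)
        mu = compose(mu, exp(tangent))          # mu <- mu * exp(mean log)
        if np.linalg.norm(tangent) <= eps:      # ||log(delta mu)|| <= eps
            break
    return mu
```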
3
Principal Geodesic Analysis
Principal component analysis in $\mathbb{R}^n$ is a powerful technique for analyzing population variation. Principal components of Gaussian data in $\mathbb{R}^n$ are defined as the projection onto the linear subspace spanned by the eigenvectors of the covariance matrix. If we consider a general manifold, the counterpart of a line is a geodesic curve, that is, a curve which minimizes length between two points. In the Lie group $M^n$ geodesics can be computed via the exponential map. Given a tangent vector $v$ in the Lie algebra $\mathfrak{m}^n$, the geodesic starting at the identity with initial velocity $v$ is given by $\gamma : \mathbb{R} \to M^n$, where $\gamma(t) = \exp(tv)$. Similarly, the curve $x \cdot \gamma(t) = x \cdot \exp(tv)$ is a geodesic starting at the point $x \in M^n$. Since the covariance matrix $\Sigma$ is a quadratic form on $\mathfrak{m}^n$, its eigenvectors are vectors in the Lie algebra $\mathfrak{m}^n$. These eigenvectors correspond via the exponential map to geodesics on $M^n$, called principal geodesics. The principal geodesic analysis (PGA) on a population of m-rep figures, $M_1, \ldots, M_N \in M^n$, is computed by an eigenanalysis of the MLE of the covariance developed above. Thus we have

Algorithm: M-rep PGA
Input: M-rep models, $M_1, \ldots, M_N \in M^n$
Output: Principal directions, $u^{(k)} \in \mathfrak{m}^n$; variances, $\lambda_k \in \mathbb{R}$
  $\mu$ = mean of $\{M_i\}$
  $x_i = \log(\mu^{-1} M_i)$
  $S = \frac{1}{N}\sum_{i=1}^{N} x_i x_i^T$
  $\{u^{(k)}, \lambda_k\}$ = eigenvectors/eigenvalues of $S$.
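A compact sketch of the PGA algorithm under the same generic-group assumptions as before (illustrative names only):

```python
import numpy as np

def pga(models, mean, compose, inverse, log):
    """M-rep PGA: eigenanalysis of the sample covariance in the Lie
    algebra at the mean. 'models' are group elements; 'log' returns a
    flat d-dimensional vector in the Lie algebra."""
    X = np.array([log(compose(inverse(mean), M)) for M in models])  # N x d
    S = (X.T @ X) / len(models)        # MLE covariance in the Lie algebra
    lam, U = np.linalg.eigh(S)         # eigenvalues in ascending order
    lam, U = lam[::-1], U[:, ::-1]     # principal directions first
    return U, lam                      # columns of U are the u^(k)
```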
Analogous to linear PCA models, we may choose a subset of the principal directions $u^{(k)} \in \mathfrak{m}^n$ that is sufficient to describe the variability of the m-rep shape space. New m-rep models may be generated within this subspace of typical objects. Given a set of coefficients $\{\alpha_1, \ldots, \alpha_l\}$, we generate a new m-rep model by

\[
M = \mu \exp\!\left(\sum_{k=1}^{l} \alpha_k u^{(k)}\right),
\]

where $\alpha_k$ is chosen to be within $[-3\sqrt{\lambda_k},\, 3\sqrt{\lambda_k}]$.
4
M-Rep Alignment
When applying the above theory for computing means and covariances of real anatomical objects, it is necessary to first globally align the shapes to a common position, orientation, and scale. For objects described by boundary points, the standard method for alignment is the Procrustes method [18]. Procrustes alignment minimizes the sum-of-squared distances between corresponding points. We now develop an analogous alignment procedure based on minimizing sum-of-squared geodesic distances on $M^n$, the Lie group of m-rep objects with $n$ atoms. Let $S = (s, R, w)$ denote a similarity transformation in $\mathbb{R}^3$ consisting of a scaling by $s \in \mathbb{R}^+$, a rotation by $R \in SO(3)$, and a translation by $w \in \mathbb{R}^3$. We define the action of $S$ on a medial atom $m = (x, r, F, \theta)$ by

\[
S \cdot m = S \cdot (x, r, F, \theta) = (sR \cdot x + w,\; sr,\; RF,\; \theta). \tag{16}
\]
Now the action of $S$ on an m-rep object $M = \{m_i : i = 1, \ldots, n\}$ is simply the application of $S$ to each of $M$'s medial atoms:

\[
S \cdot M = \{S \cdot m_i : i = 1, \ldots, n\}. \tag{17}
\]
It is easy to check from (1) that this action of $S$ on $M$ also transforms the implied boundary points of $M$ by the similarity transformation $S$. Consider a collection $M_1, \ldots, M_N \in M^n$ of m-rep objects to be aligned, each consisting of $n$ medial atoms. We write $m_{ij} = (x_{ij}, r_{ij}, F_{ij}, \theta_{ij})$ to denote the $j$th medial atom in the $i$th m-rep object. Notice that the m-rep parameters, which are positions, rotations, and scalings, are in different units. Before we apply PGA to the m-reps, it is necessary to make the various parameters commensurate. This is done in the Lie algebra by scaling the log rotations and log radii by the average radius value of the corresponding medial atoms. The squared-distance metric between two m-rep models $M_i$ and $M_j$ becomes

\[
d(M_i, M_j)^2 = \sum_{k=1}^{n} \left( |x_{jk} - x_{ik}|^2 + \bar r_k^2 (\log r_{jk} - \log r_{ik})^2 + \bar r_k^2\, |\log(F_{ik}^{-1} F_{jk})|^2 \right), \tag{18}
\]

where $\bar r_k$ is the radius of the $k$th atom in the mean m-rep. Notice in (16) that the object angle $\theta$ is unchanged by a similarity transformation. Thus, the object angles do not appear in the distance metric (18).
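A sketch of metric (18), assuming each model is a list of per-atom (x, r, F) tuples (the object angle is omitted, as noted above; all argument names are illustrative):

```python
import numpy as np

def mrep_sq_distance(Mi, Mj, r_bar, rot_log_vec):
    """Squared distance (18) between two m-rep models.

    Mi, Mj: lists of (x, r, F) per atom, with x a 3-vector, r > 0,
    F a 3x3 rotation matrix. r_bar: radii of the mean model's atoms.
    rot_log_vec: callable returning the 3-vector of log(F_ik^{-1} F_jk),
    e.g. built from the SO(3) log map of Eq. (4)."""
    total = 0.0
    for (xi, ri, Fi), (xj, rj, Fj), rb in zip(Mi, Mj, r_bar):
        total += np.sum((np.asarray(xj) - np.asarray(xi)) ** 2)
        total += rb ** 2 * (np.log(rj) - np.log(ri)) ** 2
        total += rb ** 2 * np.sum(rot_log_vec(Fi.T @ Fj) ** 2)  # F^{-1} = F^T
    return total
```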
The m-rep alignment algorithm finds the set of similarity transforms $S_1, \ldots, S_N$ that minimize the total sum-of-squared distances between the m-rep figures:

\[
d(S_1, \ldots, S_N; M_1, \ldots, M_N) = \sum_{i=1}^{N} \sum_{j=1}^{i} d(S_i \cdot M_i,\, S_j \cdot M_j)^2. \tag{19}
\]
Following the algorithm for generalized Procrustes analysis for objects in $\mathbb{R}^3$, minimization of (19) proceeds in stages:

Algorithm: M-rep Alignment
1. Translations. First, the translational part of each $S_i$ in (19) is minimized once and for all by centering each m-rep model. That is, each model is translated so that the average of its medial atoms' positions is the origin.
2. Rotations and Scaling. The $i$th model, $M_i$, is aligned to the mean of the remaining models, denoted $\mu_i$. The alignment is accomplished by a gradient descent algorithm on $SO(3) \times \mathbb{R}^+$ to minimize $d(\mu_i, S_i \cdot M_i)^2$. This is done for each of the $N$ models.
3. Iterate. Step 2 is repeated until the metric (19) cannot be further minimized.
5
Results
In this section we present the results of applying our PGA method to a population of 86 m-rep models of the hippocampus from a schizophrenia study. The m-rep models were automatically generated by the method described in [19], which chooses the medial topology and sampling that is sufficient to represent the population of objects. The models were fit to expert segmentations of the hippocampi from MRI data. The sampling on each m-rep was $3 \times 8$, making each model a point on the Lie group $M^{24}$. First, the m-rep figures were aligned by the algorithm in §4. The overlaid medial atom centers of the resulting aligned m-reps are shown in Fig. 3. Next, the intrinsic mean m-rep hippocampus was computed (Fig. 3). Finally, PGA was performed on the m-rep figures. The first three modes of variation are shown in Fig. 3.
6
Conclusions
We present a new approach to describing shape variability called principal geodesic analysis. This approach is based on the maximum likelihood estimates of the mean and covariance for Gaussian distributions on Lie groups. We expect that the methods presented in this paper will have application beyond m-reps. Lie group PGA is a promising technique for describing the variability of models that include nonlinear information, such as rotations and magnifications. Acknowledgements. We acknowledge Jeffrey Lieberman and Guido Gerig, UNC Psychiatry, for providing hippocampal data from a clinical schizophrenia study, which was supported by the Stanley Foundation and by the UNC-MHNCRC (MH33127). We thank Jeff Townsend for providing the m-rep segmentations of the data. We would like to thank Ulf Grenander, Anuj Srivastava, and Amirjit Budhiraja for useful discussions regarding statistics on Lie groups. This work was done with support from NCI grant P01 CA47982.
Fig. 3. The surface of the mean hippocampus m-rep (top left). The 86 aligned hippocampus m-reps, shown as overlaid medial atom centers (bottom left). The first three PGA modes of variation for the hippocampus m-reps (right). From left to right are the PGA deformations for $-3$, $-1.5$, $1.5$, and $3$ times $\sqrt{\lambda_i}$.
References
[1] Cootes, T.F., Taylor, C.J., Cooper, D.H., Graham, J.: Active shape models - their training and application. Computer Vision and Image Understanding 61 (1995) 38-59
[2] Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active appearance models. In: Fifth European Conference on Computer Vision. (1998) 484-498
[3] Kelemen, A., Székely, G., Gerig, G.: Three-dimensional model-based segmentation. IEEE Transactions on Medical Imaging 18 (1999) 828-839
[4] Jolliffe, I.T.: Principal Component Analysis. Springer-Verlag (1986)
[5] Csernansky, J., Joshi, S., Wang, L., Haller, J., Gado, M., Miller, J., Grenander, U., Miller, M.: Hippocampal morphometry in schizophrenia via high dimensional brain mapping. In: Proceedings of the National Academy of Sciences. (1998) 11406-11411
[6] Joshi, S., Pizer, S., Fletcher, P.T., Yushkevich, P., Thall, A., Marron, J.S.: Multiscale deformable model segmentation and statistical shape analysis using medial descriptions. IEEE Transactions on Medical Imaging 21 (2002)
[7] Blum, H., Nagel, R.: Shape description using weighted symmetric axis features. Pattern Recognition 10 (1978) 167-180
[8] Thall, A.: Fast C² interpolating subdivision surfaces using iterative inversion of stationary subdivision rules. (2002) http://midag.cs.unc.edu/pub/papers/Thall TR02-001.pdf
[9] Duistermaat, J.J., Kolk, J.A.C.: Lie Groups. Springer (2000)
[10] Curtis, M.L.: Matrix Groups. Springer-Verlag (1984)
[11] Mardia, K.V.: Directional Statistics. John Wiley and Sons (1999)
[12] Kendall, D.G.: Shape manifolds, Procrustean metrics, and complex projective spaces. Bulletin of the London Mathematical Society 16 (1984) 81-121
[13] Srivastava, A., Klassen, E.: Monte-Carlo extrinsic estimators of manifold-valued parameters. IEEE Transactions on Signal Processing 50 (2001) 299-308
[14] Moakher, M.: Means and averaging in the group of rotations. SIAM Journal on Matrix Analysis and Applications 24 (2002) 1-16
[15] Grenander, U.: Probabilities on Algebraic Structures. John Wiley and Sons (1963)
[16] Lee, J.M.: Riemannian Manifolds: An Introduction to Curvature. Springer-Verlag (1997)
[17] Buss, S.R., Fillmore, J.P.: Spherical averages and applications to spherical splines and interpolation. ACM Transactions on Graphics 20 (2001) 95-126
[18] Goodall, C.: Procrustes methods in the statistical analysis of shape. Journal of the Royal Statistical Society 53 (1991) 285-339
[19] Styner, M., Gerig, G.: Medial models incorporating object variability for 3D shape analysis. In: Information Processing in Medical Imaging. (2001) 502-516
Non-rigid Image Registration Using a Statistical Spline Deformation Model Dirk Loeckx, Frederik Maes , Dirk Vandermeulen, and Paul Suetens Medical Image Computing (Radiology–ESAT/PSI), Faculties of Medicine and Engineering, University Hospital Gasthuisberg, Herestraat 49, B-3000 Leuven, Belgium. [email protected]
Abstract. We propose a statistical spline deformation model (SSDM) as a method to solve non-rigid image registration. Within this model, the deformation is expressed using a statistically trained B-spline deformation mesh. The model is trained by principal component analysis of a training set. This approach reduces the number of degrees of freedom needed for non-rigid registration by retaining only the most significant modes of variation observed in the training set. User-defined transformation components, like affine modes, are merged with the principal components into a unified framework. Optimization proceeds along the transformation components rather than along the individual spline coefficients. The concept of SSDM's is applied to the temporal registration of thorax CR-images using pattern intensity as the registration measure. Our results show that, using 30 training pairs, a reduction of 33% is possible in the number of degrees of freedom without deterioration of the result. The same accuracy as without SSDM's is still achieved after a reduction of up to 66% of the degrees of freedom.
1
Introduction
Image registration involves finding a coordinate transformation that maps each point in one image onto its geometrically corresponding point in the other, such that information contained in each image about a particular object of interest can be compared or combined into a single representation. Retrospective registration aims at computing such a transformation from the image content itself by optimization of an appropriate similarity measure that evaluates the alignment of corresponding image features, typically landmark points, object surfaces or individual voxels. While rigid body or affine registration only compensates for overall differences in pose or size between corresponding objects in the images to be registered by global translation, rotation, scaling and skew, defined by a small number of registration parameters, non-rigid registration recovers local deformations between both images, involving a much larger number of degrees of
Frederik Maes is Postdoctoral Fellow of the Fund for Scientific Research - Flanders (FWO-Vlaanderen, Belgium).
freedom. In its most general form, non-rigid image registration models the registration transformation between both images as a vector field of displacement vectors from each voxel in the first image onto the corresponding location in the other. However, because not all deformations are physically feasible or realistic and because the image intensity information may be insufficient to unambiguously define the deformation field in each voxel (in homogeneous image regions or along object boundaries for instance), regularization of the deformation field is needed to impose local consistency or smoothness on the deformation and to propagate the registration result from areas with salient registration evidence into areas where registration features are largely absent. Such regularization approaches include: parametrization of the deformation field as a weighted sum of smooth basis functions with either local (e.g. B-spline [1]) or global (e.g. thin plate spline [2]) support; modelling of the image to be deformed as a physical medium with appropriate material properties (typically elastic [3] or viscous fluid [4]) whose deformation is governed by its equation of motion (typically elasticity or Navier-Stokes equations); or the use of a biomechanical model that allows tissue-specific deformation properties to be included [5]. In this paper, we propose a statistical deformation model that constrains the parameters of a spline-based deformation field based on their likelihood as derived from a training set of similar non-rigid registration cases. Each pair of images in the training set is registered first without constraining the spline parameters. Correlation between the various degrees of freedom is subsequently exposed by principal component analysis (PCA) of the spline parameters, and the major modes of deformation, explaining most of the deformation variability in the training set, are extracted and used to re-parameterize the deformation field as a linear combination thereof. This modelling approach is largely similar to that presented previously by Rueckert [6]. New to our approach, however, is that optimization proceeds directly in the re-parameterized field. This makes it possible to reduce the number of registration parameters while maintaining sufficient local variability (if it is assumed that the training set is sufficiently representative) by including only the most relevant deformation modes. It is also expected to increase robustness and to reduce the computation time of registering new similar cases not included in the training set. We have implemented and validated this approach for 2-D non-rigid matching of digital radiographs of the thorax of the same subject acquired at different time points. We evaluate the impact of the statistical model on the robustness of a spline-based deformation field whose parameters are optimized using pattern intensity as similarity measure [7] and investigate the influence of the number of deformation modes that is included in the model on registration quality. Our results show that, using 30 training pairs, a reduction of 33% is possible in the number of degrees of freedom without deterioration of the registration accuracy. The same accuracy as without SSDM's is still achieved after a reduction of up to 66% of the degrees of freedom.
2
Statistical Spline Deformation Model (SSDM)
2.1
Spline Registration
Our parametrization of the deformation field adopts the B-spline model introduced by Rueckert [1]. Because our application involves non-rigid matching of 2D radiographs, our description of the model is worked out in 2D, although extension to 3 or more dimensions is straightforward. A 2D transformation spline $f^{x,y}(x, y)$, where $f^x$ is the transformation in the $x$-direction and $f^y$ in the $y$-direction, of order $k$, with two $n \times n$ meshes of 2D control points $a_{ij}^x$, $a_{ij}^y$ and two $(n + k)$ knot vectors $T^x$, $T^y$, defining an $(n + k) \times (n + k)$ mesh of knots, is given by

\[
f^{x,y}(x, y) = \sum_{i=1}^{n} \sum_{j=1}^{n} B_{i,k}(x|T^x)\, B_{j,k}(y|T^y)\, a_{ij}^{x,y} \tag{1}
\]

where

\[
B_{i,1}(x) = \begin{cases} 1 & t_i \leq x < t_{i+1} \text{ and } t_i < t_{i+1} \\ 0 & \text{otherwise} \end{cases}
\qquad
B_{i,k}(x) = \frac{x - t_i}{t_{i+k-1} - t_i} B_{i,k-1}(x) + \frac{t_{i+k} - x}{t_{i+k} - t_{i+1}} B_{i+1,k-1}(x) \tag{2}
\]
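A direct, naive transcription of (1) and (2) for a single point can look as follows (an unoptimized sketch; names are ours):

```python
import numpy as np

def bspline_basis(i, k, x, T):
    """Cox-de Boor recursion (2) for the i-th B-spline of order k over knots T
    (0-based i; T has length n + k)."""
    if k == 1:
        return 1.0 if (T[i] <= x < T[i + 1] and T[i] < T[i + 1]) else 0.0
    out = 0.0
    if T[i + k - 1] > T[i]:
        out += (x - T[i]) / (T[i + k - 1] - T[i]) * bspline_basis(i, k - 1, x, T)
    if T[i + k] > T[i + 1]:
        out += (T[i + k] - x) / (T[i + k] - T[i + 1]) * bspline_basis(i + 1, k - 1, x, T)
    return out

def deform(x, y, k, Tx, Ty, ax, ay):
    """Tensor-product deformation (1); ax, ay are n x n control point meshes."""
    n = ax.shape[0]
    bx = np.array([bspline_basis(i, k, x, Tx) for i in range(n)])
    by = np.array([bspline_basis(j, k, y, Ty) for j in range(n)])
    return bx @ ax @ by, bx @ ay @ by   # (f^x, f^y) at the point (x, y)
```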
Increasing the order or the number of control points of the B-spline will increase the resolution and locality of the transformation. B-splines are easily scalable and have good multiresolution properties. A small number of knots allows the modelling of global deformations, while a large number of knots allows the modelling of local deformations. B-splines have local support: the range of one control point extends only over the $k \times k$ surrounding knots. To use tensor-product B-splines as a deformation model, the mesh is placed over an image and the transformation in each cell of the mesh is modelled by the surrounding control points. This way the resolution of the transformation can be chosen as necessary and the transformation model is independent of explicit landmarks. Yet a main disadvantage is that all control points move independently of each other. Therefore, in regions with a lack of registration features, the transformation might be undefined. Also, the spline control points are moved one by one, in a mathematical way rather than in a physical way. Therefore, not only is the probable correlation between neighbouring control points neglected, but a long registration time is also needed.

2.2
Statistical Modelling
To resolve these problems we perform a statistical analysis on the spline control points obtained from a training set of similar images that have been registered previously, to find the most prominent transformation components. These are subsequently used to guide the registration process in new cases to be registered.
This way, the transformation in underdetermined regions will be estimated from the trained correlation with the rest of the image. The principal transformation components model the transformation according to the actually occurring deformations, allowing for a more physical approach to the optimization. The initial registration of the training set can be performed manually, by interactive displacement of each spline control point with visual feedback of registration accuracy, or automatically, by optimizing the control point positions using a suitable intensity-based registration criterion without a priori constraining the spline's parameters. This way, $s$ instances of the spline control points $a$ are obtained. Those instances are vectorized as

\[
a_i = \mathrm{vect}\begin{pmatrix} (a_{11}^x, a_{11}^y) & \cdots & (a_{1n}^x, a_{1n}^y) \\ \vdots & \ddots & \vdots \\ (a_{n1}^x, a_{n1}^y) & \cdots & (a_{nn}^x, a_{nn}^y) \end{pmatrix}
= (a_{11}^x, a_{11}^y, \ldots, a_{1n}^x, a_{1n}^y, \ldots, a_{n1}^x, a_{n1}^y, \ldots, a_{nn}^x, a_{nn}^y)^T \tag{3}
\]

for $i = 1, 2, \ldots, s$. These vectors merge into $A = [a_1 | \ldots | a_s]$, and their principal components are computed as the eigenvectors of the covariance matrix $S$:

\[
\bar{a} = \frac{1}{s}\sum_{i=1}^{s} a_i, \qquad
S = \frac{1}{s-1}\sum_{i=1}^{s} (a_i - \bar{a})(a_i - \bar{a})^T \tag{4}
\]

with corresponding eigenvalues $\lambda_j \geq \lambda_{j+1}$. Each $a_i$ in the training set can be approximated by

\[
a_i \approx \bar{a} + \Phi b_i \tag{5}
\]

with $\Phi = [\phi_1 | \ldots | \phi_t]$ containing the first $t$ eigenvectors and $b_i = \Phi^T(a_i - \bar{a})$ a $t$-dimensional vector. Because the first eigenvectors correspond to the highest eigenvalues, most of the variation is contained therein and thus modelled by the first elements of $b$. If all eigenvectors are included in $\Phi$, i.e. if $t = 2n^2$, the approximation in (5) becomes an equality. A new transformation $T$ that mimics those observed in the training set can be constructed as

\[
a_{\mathrm{new}} = \mathrm{vect}(T) = (a_{11}^x, \ldots, a_{nn}^y)^T \approx \bar{a} + \Phi b_{\mathrm{new}} \tag{6}
\]

and thus will be defined by $b_{\mathrm{new}}$. Because the components in $\Phi$ are in decreasing order of importance, the first elements of $b$ will have a larger impact on the transformation than the later ones.

2.3
User-Defined Transformation Components

PCA decomposes the transformations in the training set into orthogonal modes of decreasing importance. Yet, sometimes it is preferable to impose some transformations manually, for instance to discriminate between pose and shape. To eliminate differences in overall pose (translation, scaling) from the statistical analysis, we introduce explicit basis vectors for the first-order approximation of the affine modes on the spline, assuming a square $3 \times 3$ mesh of control points:

\[
t_x = \mathrm{vect}\begin{pmatrix} (1,0) & (1,0) & (1,0) \\ (1,0) & (1,0) & (1,0) \\ (1,0) & (1,0) & (1,0) \end{pmatrix} = (1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0)^T
\]
\[
t_y = \mathrm{vect}\begin{pmatrix} (0,1) & (0,1) & (0,1) \\ (0,1) & (0,1) & (0,1) \\ (0,1) & (0,1) & (0,1) \end{pmatrix} = (0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1)^T \tag{7}
\]
\[
s = \mathrm{vect}\begin{pmatrix} (-1,-1) & (0,-1) & (1,-1) \\ (-1,0) & (0,0) & (1,0) \\ (-1,1) & (0,1) & (1,1) \end{pmatrix} = (-1,-1,0,-1,1,-1,-1,0,0,0,1,0,-1,1,0,1,1,1)^T
\]

These orthogonal vectors are normalized and combined to form an affine basis

\[
A_{\mathrm{aff}} = [t_x | t_y | s] \tag{8}
\]

which is subsequently projected out of the matrix $A$:

\[
A_{\mathrm{hand}} = A - A_{\mathrm{aff}} A_{\mathrm{aff}}^T A \tag{9}
\]

Finally, the principal components $\Phi$ of $A_{\mathrm{hand}}$ are calculated as before. The variance of the components obtained from the PCA analysis can be found in the eigenvalues, but to know the variance of all components, including the explicit basis vectors, the variation of $b_i = \Phi^T(x_i - \bar{x})$ over the training set is calculated directly:

\[
\bar{b} = \frac{1}{s}\sum_{j=1}^{s} b_j, \qquad
s_b^2 = \frac{1}{s-1}\sum_{j=1}^{s} \left(b_j - \bar{b}\right)^2 \tag{10}
\]

2.4
Optimization

To take full advantage of the statistical analysis performed in the previous section, we optimize over the coefficients for the principal components rather than over the spline parameters. That is, rather than expressing the similarity measure $C$ as

\[
C = f(I_1, I_2, a_{11}^x, \ldots, a_{nn}^y) \tag{11}
\]

and optimizing by varying $a_{11}^x, \ldots, a_{nn}^y$, we look for the optimum of

\[
C = f(I_1, I_2, b_1, \ldots, b_t) \tag{12}
\]

by optimizing $b_1, \ldots, b_t$. This way we can freely choose the number of degrees of freedom over which we will optimize by choosing $t$. Also, limits of $b$ can be specified either in absolute units or in units of standard deviations.
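Re-parameterizing the cost in terms of b is mechanically simple; a minimal sketch of (12), where the helper name and the use of a generic local optimizer are our own illustrative assumptions (the paper itself optimizes with fast simulated annealing, see Sect. 3.3):

```python
import numpy as np
from scipy.optimize import minimize

def make_ssdm_cost(I1, I2, a_bar, Phi, similarity):
    """Cost (12) as a function of the mode coefficients b: the spline
    control points are reconstructed as a = a_bar + Phi @ b (Eq. (6))
    before evaluating 'similarity', a placeholder for a criterion such
    as pattern intensity."""
    def cost(b):
        a = a_bar + Phi @ b          # back to spline control points
        return similarity(I1, I2, a)
    return cost

# Illustrative usage with the first t modes only:
# t = 12
# cost = make_ssdm_cost(I1, I2, a_bar, Phi[:, :t], similarity)
# res = minimize(cost, x0=np.zeros(t), method='Powell')
```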
3
Application: Temporal Registration of Thorax CR-Images
Temporal subtraction is a technique in which a previous recording and a current recording of one patient are subtracted from each other, after proper alignment and warping. The goal of temporal subtraction is to increase the visibility of interval changes. We applied the statistical spline deformation model to the temporal registration of thorax CR-images using a training set of 30 image pairs and a validation set of 46 other pairs of different subjects. The radiographs are extracted from the PACS (Picture Archiving and Communication System) of the University Hospital Gasthuisberg of the Katholieke Universiteit Leuven in Leuven, Belgium. They were recorded for different clinical studies on different X-ray imaging systems using normal recording settings. The resolution and gray level range of the CR images are about 2500×2500 pixels and 12 bits, respectively. The registration software is implemented and run on a Dell Precision 530 workstation with dual Xeon 2.2 GHz/512k processors in Matlab 6.5 release 13 for Linux [8], while the most time-critical procedures are coded in C++.

3.1
Spline Model
Our experiments with manual registration of temporal pairs of chest radiographs indicate that a 2D tensor product B-spline of the third order with 6 × 6 knots (2 knots of multiplicity 3 in each dimension) and thus 3 × 3 control points allows sufficient degrees of freedom to achieve an adequate registration. The knots are placed ten pixels outside of the corners of the rectangle enclosing both ROI’s. The spline control point configuration is illustrated in Fig. 2. As different control points for the horizontal and vertical transformation field are needed, 18
parameters are used. These are manually adjusted and interactively optimized for a training set of 30 image pairs. The SSDM as explained in section 2 is derived from the manual registration set of 30 image pairs, using Aaff = [tx |ty |s] as explicit basis vectors. Rotation was also considered, but it didn’t yield a better result. The five most significant transformation components and their contributions to the total variation are shown in Fig. 1. The explicit basis vectors cover 72.2% of the variation.
Fig. 1. (a) Percent variability explained by each component. X, Y and S stand for the horizontal and vertical translation and scale respectively. Numbers indicate the PCA components. (b) First five transformation components, from top to bottom: Horizontal translation, vertical translation, scaling, and the first and second PCA component.
3.2
Registration Criterion
Some anatomical features outside the lung fields, such as the clavicle, mediastinum or parts of the limbs, transform independently of the lung field, and their presence in the image may interfere with finding a proper registration for the lung field itself. Hence, the lung field is segmented prior to registration and non-lung regions are excluded when computing the registration criterion, which increases registration robustness and at the same time increases the calculation speed. Lung field segmentation is achieved using a variation of the Active Shape Model segmentation algorithm with optimal image features [9]. This results in a region of interest (ROI) containing both lung fields, including the lateral lung boundaries and excluding the mediastinum. Two example images with overlaid ROI’s can be seen in Fig. 2. The ROI of the first image, which is warped towards the second image, undergoes the same transformation as the first image and is
overlaid on the ROI of the second image. The registration criterion is computed from all pixels in the union of both ROI’s.
Fig. 2. Pair of digital thorax radiographs to be registered, with 3x3 spline control point mesh overlaid and automatically segmented lung field ROI’s.
Different registration criteria were tested, but pattern intensity [7]

\[
P_{r,\sigma}(I_{\mathrm{diff}}) = \sum_{x,y}\; \sum_{r \leq r_{\max}} \frac{\sigma^2}{\sigma^2 + \left(I_{\mathrm{diff}}(x, y) - I_{\mathrm{diff}}(v, w)\right)^2} \tag{13}
\]

with $I_{\mathrm{diff}}$ the difference image, radius $r = \sqrt{(x - v)^2 + (y - w)^2} \leq r_{\max} = 3.5$ pixels, and $\sigma = 128$, was found to give the best results.
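A straightforward, unoptimized sketch of criterion (13), assuming the difference image is given as a 2D array (the wrap-around at image borders introduced by np.roll is a simplification of our own):

```python
import numpy as np

def pattern_intensity(I_diff, r_max=3.5, sigma=128.0):
    """Pattern intensity (13): for each pixel, sum the kernel
    sigma^2 / (sigma^2 + (I(x,y) - I(v,w))^2) over neighbours (v,w)
    within radius r_max."""
    total = 0.0
    R = int(np.floor(r_max))
    for dr in range(-R, R + 1):
        for dc in range(-R, R + 1):
            # Skip the center term (it only adds a constant) and
            # offsets outside the disc of radius r_max.
            if (dr == 0 and dc == 0) or dr ** 2 + dc ** 2 > r_max ** 2:
                continue
            shifted = np.roll(np.roll(I_diff, dr, axis=0), dc, axis=1)
            diff2 = (I_diff - shifted) ** 2
            total += (sigma ** 2 / (sigma ** 2 + diff2)).sum()
    return total
```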
3.3
Optimization
Because of the nature of the lung images, the cost function is not smooth and has multiple optima. As standard optimization algorithms are found unreliable for our problem, a multiresolution fast simulated annealing [10] optimization strategy is chosen. As described before, the transformation is parameterized by the principal modes b instead of optimizing over the spline control points a. An example of the course of the cost function around the optimum can be seen in Fig. 3.
Non-rigid Image Registration
-0.66
-0.66
-0.68
-0.68
-0.7
471
-0.7
-0.72
-0.72 -0.74
-0.74
-0.76 -4
1
-3
-1
-2
0 -1
1 2
1
0 0
-1
-1 -2
-2
Fig. 3. Dependence of cost-function around the optimum on (left) translations (XY) and (right) the first two PCA modes.
perform affine registration using only the three explicit modes. In the second step, we allow t degrees of freedom, where t can be any number between 3 to 18, i.e. t varies from affine to full spline registration. For the first step the original images are resized to 64 × 64 pixels, the second step is performed on images of 128 × 128 pixels. The transformation is passed on from one resolution to another by scaling the transformation mesh rather than the transformed image. Images are resampled and morphed using B-spline interpolation [11,12,13]. B-spline interpolation yields a very good sub-pixel precision, allowing for the small image processing sizes. 3.4
Validation
The main advantage of SSDM's is the possibility to choose, by varying t, the number of degrees of freedom in the transformation model. In order to determine the optimal t we executed the algorithm over the validation set of 46 lung images for t varying from 3 to 18. To reduce the stochastic variation induced by the simulated annealing, we ran the algorithm three times for each pair and each t. We also ran the algorithm three times for each pair optimizing over the spline control points directly, i.e. using Eq. 11 without the statistical model. In all those runs, the number of iterations for the second step was held constant at 200, requiring about 10 s. Because both the initial and the final value of the pattern intensity vary with the image data, we normalized the obtained result per lung pair and thus over t by subtracting the mean and dividing by the standard deviation. Fig. 4(a) shows the average value for the pattern intensity as well as the values obtained in each run. For low t the pattern intensity is inversely proportional to t, yet the decrease of the criterion declines with increasing t. For t > 12 the decrease becomes negligible. The registration accuracy obtained by optimizing over the spline control points directly, shown in the same figure by a dashed line, is roughly equal to the accuracy achieved using 6 degrees of freedom. Fig. 4(b)
shows an example of a temporal subtraction image obtained after registration with 12 degrees of freedom, i.e. 3 affine and 9 non-rigid statistically learned transformations. The average mean pixel displacement, measured over the region of interest and averaged over all cases, between the affine registration and the registration using t = 12 is 1.5 pixels on the 128 × 128 images, which is equivalent to about 30 pixels or 4.4 mm in the full size images. The average maximal pixel distance is 11.5 mm. Comparison between the optimization over the spline control points directly and the registration using t = 12 yields an average pixel displacement of 4.0 mm and an average maximal pixel distance of 9.6 mm.
Fig. 4. (a) Influence of the number of degrees of freedom t on the pattern intensity. The horizontal lines are the results obtained by optimizing over the spline control points directly. (b) Example of a temporal subtraction obtained after registration with 12 degrees of freedom.
A clinical validation by a human observer was also performed; 96% of the registered images were rated as adequate for clinical use. A validation based on landmarks was not carried out, as the relative position of landmarks, e.g. with respect to the ribs, may vary with inhalation and projection parameters.
4
Discussion
We developed the concept of a statistical spline deformation model. With this model, a transformation can be modelled by its statistically determined principal modes and the number of modes can be optimized for a given problem. In our
example, the temporal subtraction of lung CR-images, the number of degrees of freedom could be reduced by 33% using a training set of 30 image pairs. The same accuracy as without SSDM's is still achieved after a reduction of up to 66% of the degrees of freedom. The method of SSDM's can be easily extended to less general transformations or to a higher accuracy by increasing the number of knots of the transformation spline. However, the number of training images should increase accordingly. The surplus value of SSDM's depends strongly on the representativeness of the training set. All degrees of freedom contained in the whole population should be represented in this set. The number of images required therefore varies from application to application and increases with an increasing number of spline control points and thus resolution, because more local transformations and more noise can be expected at higher resolution. The SSDM approach presented here assumes that registration transformations of different but similar registration cases are statistically related. This is likely to be the case for image deformations being caused by anatomical changes over time (e.g. breathing-induced motion or normal evolutionary processes such as growth) or by inter-subject variations (e.g. in brain morphology). In those cases, the use of SSDM's will probably yield a better or faster registration. On the other hand, the underlying assumption may be violated in case of specific pathological changes occurring at various, scattered locations (e.g. occurrence and disappearance of lesions). In some cases it might be preferable to use a multi-stage registration approach. Here the SSDM's can be used for a more global registration that is afterwards refined with a local optimization of the spline control points. To conclude, SSDM's are useful for registrations where a sufficiently representative training set can be found or as an intermediate step in a wide range of registration problems. Acknowledgment. The authors wish to thank prof. dr. J. Verschakelen of the University Hospital Gasthuisberg of the Katholieke Universiteit Leuven for providing the images.
References
1. Rueckert, D., Sonoda, L.I., Hayes, C., Hill, D.L.G., Leach, M.O., Hawkes, D.J.: Nonrigid registration using free-form deformations: Application to breast MR images. IEEE Trans. Med. Imag. 18 (1999) 712-721
2. Meyer, C., Boes, J., Kim, B., Bland, P., Wahl, R., Zasadny, K., Kison, P., Koral, K., Frey, K.: Demonstration of accuracy and clinical versatility of mutual information for automatic multimodality image fusion using affine and thin plate spline warped geometric deformations. Medical Image Analysis 1 (1997) 195-206
3. Gee, J.C., Bajcsy, R.K.: Elastic matching: Continuum mechanical and probabilistic analysis. In Toga, A.W., ed.: Brain Warping. Academic Press, San Diego (1999)
4. Christensen, G., Rabbitt, R., Miller, M.: Deformable templates using large deformation kinematics. IEEE Transactions on Image Processing 5 (1996) 1435-1447
5. Ferrant, M., Warfield, S., Nabavi, A., Jolesz, F., Kikinis, R.: Registration of 3D intraoperative MR images of the brain using a finite element biomechanical model. In Delp, S., DiGioia, A., Jaramaz, B., eds.: Medical Image Computing and Computer-Assisted Intervention (MICCAI '00). Volume 1935 of Lecture Notes in Computer Science., Springer-Verlag, Berlin (2000) 19-28
6. Rueckert, D., Frangi, A., Schnabel, J.: Automatic construction of 3D statistical deformation models using non-rigid registration. In Niessen, W., Viergever, M., eds.: Medical Image Computing and Computer-Assisted Intervention (MICCAI '01). Volume 2208 of Lecture Notes in Computer Science., Utrecht, The Netherlands, Springer, Berlin (2001) 77-94
7. Weese, J., Buzug, T.M., Lorenz, C., Fassnacht, C.: An approach to 2D/3D registration of a vertebra in 2D X-ray fluoroscopies with 3D CT images. In: Proc. CVRMed/MRCAS. (1997) 119-128
8. MATLAB. Version 6.5.0.180913a (R13) (2002)
9. van Ginneken, B., Frangi, A., Staal, J., ter Haar Romeny, B., Viergever, M.: Automatic detection of abnormalities in chest radiographs using local texture analysis. PhD thesis, Universiteit Utrecht (2001)
10. Ingber, L.: Very fast simulated annealing. Mathematical and Computer Modelling 12 (1989) 967-973
11. Unser, M., Aldroubi, A., Eden, M.: B-spline signal processing: Part I—theory. IEEE Trans. Signal Processing 41 (1993) 821-832
12. Unser, M., Aldroubi, A., Eden, M.: B-spline signal processing: Part II—efficient design and applications. IEEE Trans. Signal Processing 41 (1993) 834-848
13. Unser, M.: Splines: A perfect fit for signal and image processing. IEEE Signal Processing Mag. 16 (1999) 22-38
A View-Based Approach to Registration: Theory and Application to Vascular Image Registration

Charles V. Stewart, Chia-Ling Tsai, and Amitha Perera

Rensselaer Polytechnic Institute, Troy, New York 12180–3590
{stewart,tsaic,perera}@cs.rpi.edu
Abstract. This paper presents an approach to registration centered on the notion of a view — a combination of an image resolution, a transformation model, an image region over which the model currently applies, and a set of image primitives from this region. The registration process is divided into three stages: initialization, automatic view generation, and estimation. For a given initial estimate, the latter two alternate until convergence; several initial estimates may be explored. The estimation process uses a novel generalization of the Iterative Closest Point (ICP) technique that simultaneously considers multiple correspondences for each point. View-based registration is applied successfully to alignment of vascular and neuronal images in 2-d and 3-d using similarity, affine, and quadratic transformations.
1 Introduction
The view-based approach to registration is motivated by two common problems in medical image analysis, illustrated here in the applications of aligning 2-d images of the retina and aligning 3-d vascular and neuronal images [2] (Figs. 1–6):

Problem 1: A single landmark correspondence, specified manually or detected automatically, is established between two images. Aligning the images based on this correspondence is reasonably accurate locally, in the small region surrounding the landmarks, but quite poor image-wide (Fig. 3a). Without requiring more landmark correspondences, is it possible to automatically "grow" an accurate registration from the initial, local alignment?

Problem 2: During an image-guided intervention procedure, a substantial jump occurs between two successive images. Possible causes include patient movement (respiration), instrument movement, and time-delays between acquisition of good-quality images. As a result, incremental registration (or tracking) could converge to an incorrect local minimum, especially for images containing repeated structures such as blood vessels or bronchial tubes (Fig. 4).

This paper presents a unified set of techniques to solve these two registration problems. Two primary innovations are described. The first is the introduction of a view-based framework for registration. The second is a generalization of the widely-used ICP algorithm [5,9] to accommodate feature descriptions and multiple correspondences per point (feature).
Fig. 1. Retinal images taken 3.5 years apart of a patient having a branch vein occlusion. Substantial differences in the non-vascular regions are apparent due to hemorrhaging and pathologies.
Fig. 2. The vascular features used in registering retinal images [7]. The left shows landmarks detected at the branching and cross-over points of the blood vessels. The right shows the extracted vessel centerline points.
The main body of the paper describes these two innovations in detail and then combines them to form the algorithms solving Problems 1 and 2. The presentation assumes vascular or neuronal features have already been segmented from the images. Although many different techniques could be used, the algorithms we employ extract elongated structures using two-sided (or multi-sided in 3-d) boundary following [1,7]. The features are point locations along the vessel centerlines, akin to medial axis points. Each centerline point is described by a location, tangent direction, and width. Branching and cross-over points ("landmarks"), used in initialization, are extracted as well. The centerline points between a pair of landmarks are gathered into a "segment".
2 Views and View Generation
Abstractly, a view is a definition of the registration problem together with a current transformation estimate. A view poses a problem for an iterative minimization technique, such as Iterative Closest Point (ICP) [5,9] or Mutual Information [13,20], to solve, and provides the starting estimate for this minimization. A view specification includes an image resolution, a transformation model, a set of image primitives, and a minimization technique. By starting with a simplified view, such as a coarse image resolution or a simplified transformation model, aligning the images based on this view, and then proceeding to a more complete and more detailed view, many local minima in the alignment can be avoided.

In the view-based approach, registration is addressed as a three-stage process: initialization, view generation, and minimization. The latter two are alternated until convergence. Several different initial estimates may be evaluated. Thus far, our notion of a view is simply a gathering and synthesis of current techniques that use multiple resolutions [4,17], hierarchies of transformation models [8,15], and hierarchies of image primitives [16]. In addition to this synthesis, however, we introduce two novel ideas. First, the view includes the image region over which the transformation estimate is considered accurate (Fig. 3). Second, instead of pre-specifying the transition between views (e.g. when to switch to higher resolutions or to new transformation models), some parts of the view are automatically generated.

Including the image region in our definition of a view allows the registration algorithm to start in a small region of each image (a "key-hole view" — Problem 1), align the images only in this small region (Fig. 3), and gradually increase the region size (a change in view). If these steps are done correctly, this can avoid errors due to initial image-wide misalignment of complicated structures such as networks of blood vessels or neuronal fibers. It does require, however, that initialization in the small region be reasonably accurate. Ensuring this requires manual specification of the initial region, or matching of small-scale features such as landmarks and their immediate surroundings (Fig. 2). Automatic matching cannot be flawless, so several different initial matches and resulting initial views must be tested and evaluated based on their final alignment accuracies.

The second novel technique in our view-based approach to registration is automatic generation of views. For now, we only develop methods to generate the region ("region growth") and the transformation model ("model selection"). The need for automatic region growth and model selection is most easily seen in Problem 1 (Fig. 3). The initial region, and perhaps subsequent regions, contain too few constraints to reliably estimate the 12 parameters of a quadratic model (Table 1) needed for accurate image-wide alignment [8]. Thus, a lower-order model must be used, and the model should be chosen automatically based on the available constraints. A similar model selection problem arises in registering coarse resolution images. Region growth should also be data driven, with unstable estimates causing slow region growth, and stable estimates leading to rapid region growth.

The remainder of this section describes in detail the model selection and region growth techniques of automatic view generation. Both are driven by the uncertainty in the transformation estimates.
Fig. 3. View-based solution to Problem 1. Panel (a) shows an initial alignment of two images (Fig. 1) based on a similarity transformation computed by matching a single landmark and its immediate surroundings from each image (small white rectangular region in the center). Vascular centerline points from the two images are shown in black and in white. The global, image-wide alignment is quite poor. Panel (b) shows the alignment after three iterations of view generation (region growth and model selection) and estimation. No change has yet been made in the model, but in the next iteration (c) a reduced quadratic transformation is selected (Table 1). Panel (d) shows the final alignment using a quadratic transformation.
We assume the transformation estimation process produces a covariance matrix and a discrete set of alignment errors. The covariance matrix may be approximated by the inverse Hessian of the objective function used in estimation, evaluated at the estimate. In the description of these steps, $t$ denotes the iteration of the view selection and estimation loop, $R_t$ denotes the transformation region, and $M_t$ denotes the transformation model selected for the current view (Table 1). $\hat{\theta}_t$ is the vector of estimated parameters of the transformation model, and $\Sigma_t$ is the associated covariance matrix.
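To make the view generation / estimation alternation concrete, the following is a minimal sketch in Python. Every helper name (estimate_transform, converged, grow_region, select_model) is a hypothetical stand-in for a component described in this section, not part of any published implementation.

```python
# Minimal sketch of the view-based registration loop (hypothetical helpers).

def view_based_register(view, max_iters=50):
    """Alternate estimation and view generation until convergence."""
    theta, cov = None, None
    for t in range(max_iters):
        # Estimation: minimize the alignment error for the current view,
        # yielding parameters theta_t and their covariance Sigma_t.
        theta, cov = estimate_transform(view)
        if converged(view, theta):
            break
        # View generation: expand the region (Sec. 2.1) and possibly
        # switch to a higher-order transformation model (Sec. 2.2).
        view.region = grow_region(view, theta, cov)
        view.model = select_model(view, theta, cov)
    return theta
```

In practice, several such loops would be run, one per initial estimate, and the resulting alignments compared by their final accuracies.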
Fig. 4. Problem 2 — incremental registration. The left panel shows an initial alignment of two images of a healthy retina. As in Fig. 3, centerline points from the two different images are shown in black and in white. Some of the ICP correspondences are shown with white line segments between black and white points. The obvious mismatches lead to the poor alignment results shown on the right, even when using robust estimation.

Table 1. The set of transformation models used in retinal image registration [8]. To clarify notation in the equations, $p = (x, y)^T$ is an image location in $I_1$, $q = (u, v)^T$ is the transformed image location in $I_2$, and $p_0$ is the center of the registration region. In addition to the formulations, the table also shows the degrees of freedom (DoF) in each model and the average alignment error on 1024 × 1024 images.

| Model | Equation | DoF | Accuracy |
|---|---|---|---|
| Similarity | $q = \begin{pmatrix} \theta_{11} & \theta_{12} & \theta_{13} & 0 & 0 & 0 \\ \theta_{21} & -\theta_{13} & \theta_{12} & 0 & 0 & 0 \end{pmatrix} X(p - p_0)$ | 4 | 5.05 pixels |
| Affine | $q = \begin{pmatrix} \theta_{11} & \theta_{12} & \theta_{13} & 0 & 0 & 0 \\ \theta_{21} & \theta_{22} & \theta_{23} & 0 & 0 & 0 \end{pmatrix} X(p - p_0)$ | 6 | 4.58 pixels |
| Reduced quadratic | $q = \begin{pmatrix} \theta_{11} & \theta_{12} & \theta_{13} & \theta_{14} & 0 & \theta_{14} \\ \theta_{21} & -\theta_{13} & \theta_{12} & \theta_{24} & 0 & \theta_{24} \end{pmatrix} X(p - p_0)$ | 6 | 2.41 pixels |
| Quadratic | $q = \begin{pmatrix} \theta_{11} & \theta_{12} & \theta_{13} & \theta_{14} & \theta_{15} & \theta_{16} \\ \theta_{21} & \theta_{22} & \theta_{23} & \theta_{24} & \theta_{25} & \theta_{26} \end{pmatrix} X(p - p_0)$ | 12 | 0.64 pixels |

2.1 Region Expansion
For simplicity, $R_t$ is rectilinear (Fig. 3 shows the transformed $R_t$ in white). At each view generation step, each side of $R_t$ is shifted outward. Let $p_c$ be the point in the center of a given side. The covariance matrix of the transformation can be converted to the covariance matrix of the mapping of this point, $p_c' = M_t(\hat{\theta}_t; p_c)$, using standard covariance propagation techniques. This yields the transfer error covariance of $p_c'$ [11, Ch. 4]. Let the component of this $2 \times 2$ covariance in the outward direction from $R_t$ be $\sigma_c^2$. Point $p_c$ is expanded outward in direct proportion to $\beta / \max(\sigma_c, 1)$, with the $\max(\cdot, 1)$ bound preventing growth that is too fast and $\beta$ controlling the overall rate of growth. The rectangle bounding the expanded center points from each side is the new region, $R_{t+1}$.
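As an illustration, the growth rule for one side of $R_t$ might look as follows. This is a sketch only: transfer_error_covariance is a hypothetical helper implementing the covariance propagation of [11, Ch. 4], and the value of beta is an arbitrary assumption.

```python
import numpy as np

def expand_side_center(p_c, outward, theta, cov, beta=8.0):
    """Shift the center point of one side of R_t outward (sketch).

    outward is the unit normal of the side, pointing away from R_t.
    """
    C = transfer_error_covariance(p_c, theta, cov)  # 2x2 covariance of p_c'
    sigma_c = np.sqrt(outward @ C @ outward)        # std. dev. along normal
    # Growth is inversely proportional to the transfer uncertainty; the
    # max(., 1) keeps very certain sides from stepping farther than beta.
    return p_c + (beta / max(sigma_c, 1.0)) * outward
```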
2.2 Model Selection
Region growth makes new constraints on the transformation estimate available for the next view. Determining whether this warrants switching to a higher-order model requires the application of model selection techniques [6,19]. These techniques trade off the increased fitting accuracy of higher-order models against the increased stability of lower-order models. The stability is usually measured as a function of the covariance matrix $\Sigma_t$. The criterion we use is [6]:

$$\frac{d_t}{2} \log 2\pi \; - \; \sum_i w_i r_i^2 \; + \; \log \det(\Sigma_t), \qquad (1)$$

where $d_t$ is the degrees of freedom in the model $M_t$, $\sum_i w_i r_i^2$ is the sum of the robustly-weighted alignment errors (based on the estimate $\hat{\theta}_t$), and $\det(\Sigma_t)$ is the determinant of the parameter estimate covariance matrix. This equation is evaluated for the current model $M_t$ and for other candidate models. For each other model $M$, the estimate $\hat{\theta}_t$ for the current model $M_t$ serves as the starting point to estimate a new set of parameters (for model $M$). For simplicity, at each iteration $t$ we usually just evaluate the model $M$ which is the next more complicated model than $M_t$. Overall, the model that results in the greatest value of (1) is chosen as $M_{t+1}$.
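A direct transcription of this selection rule, under the reading of Eq. (1) given above, could look like the following sketch; refit_model is a hypothetical helper that re-estimates a candidate model's parameters, weights, residuals, and covariance starting from $\hat{\theta}_t$.

```python
import numpy as np

def selection_score(dof, weights, residuals, cov):
    """Evaluate Eq. (1) for one model; larger values are preferred."""
    weighted_error = np.sum(weights * residuals ** 2)
    _, logdet = np.linalg.slogdet(cov)   # log det(Sigma), Sigma pos. definite
    return 0.5 * dof * np.log(2.0 * np.pi) - weighted_error + logdet

def choose_next_model(current_model, next_model, view, theta):
    """Keep the current model or switch to the next more complex one."""
    scores = {}
    for model in (current_model, next_model):
        dof, w, r, cov = refit_model(model, view, theta)  # start from theta
        scores[model] = selection_score(dof, w, r, cov)
    return max(scores, key=scores.get)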
3 Estimation Engine: ICP / IMCF
Although MI or ICP could be used as the estimation engine in view-based registration, here we introduce a generalization of ICP that allows multiple matches per point and uses a novel, robust weighting technique to combine the match contributions. We'll start with a summary discussion of robust ICP.

Let $P = \{p_i\}$ be the extracted set of blood (neuron) vessel centerline points in image $I_1$. Robust ICP (1) takes the point locations from $P$ that are in the registration region $R_t$ of the current view, (2) applies the current transformation estimate to each, and (3) finds the closest centerline point location $q_i$ in $I_2$. This generates the correspondence set $C_t = \{(p_i, q_i)\}$. The new transformation estimate is computed by minimizing the robust objective function

$$E(\theta_t) = \sum_{(p_i, q_i) \in C_t} \rho\big( d(M_t(\theta_t; p_i), q_i) / \hat{\sigma} \big). \qquad (2)$$

Here, $M_t(\cdot;\cdot)$ maps $p_i$ into $I_2$, and $d(\cdot,\cdot)$ is the distance of the mapped point to $q_i$, measured as the perpendicular distance from $M_t(\theta_t; p_i)$ to the vessel contour tangent line through $q_i$.
Fig. 5. IMCF. The left panel illustrates the generation of multiple correspondences in IMCF. For a transformed point p′, the closest centerline point (q₁) is found and additional matches are sought from each centerline segment intersecting a circular region surrounding p′. Using correspondences such as these, IMCF is able to correct the incorrect alignment shown in Fig. 4 and produce the alignment shown on the right (white centerline points aren't seen where they exactly overlap the black ones).
$\hat{\sigma}$ is an estimate of the standard deviation of the alignment errors which is robust to outliers, and $\rho(\cdot)$ is a robust loss function. Equation (2) is minimized using iteratively-reweighted least squares [12], with weight function $w(u) = \rho'(u)/u$. We use the Beaton-Tukey biweight function [3], which goes to 0 for errors above about $4\hat{\sigma}$, eliminating the effects of mismatches due to missing vessel structures. The ICP steps of matching and transformation estimation are repeated until the estimate converges for the current view.
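For reference, a sketch of the Beaton-Tukey biweight used here, with errors already normalized by $\hat{\sigma}$ and the cut-off c = 4 matching the roughly $4\hat{\sigma}$ cut-off quoted above:

```python
import numpy as np

def biweight(u, c=4.0):
    """Beaton-Tukey weight w(u) = rho'(u)/u: ~1 near 0, exactly 0 beyond c."""
    u = np.asarray(u, dtype=float)
    w = (1.0 - (u / c) ** 2) ** 2
    return np.where(np.abs(u) < c, w, 0.0)
```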
3.1 IMCF
Even with a robust error norm, ICP fails when a significant fraction of the closest points are not correct correspondences (Fig. 4). To overcome this, we allow multiple matches per centerline point and then adjust the minimization to exploit the extra information provided by these matches. We also introduce a similarity measure between matches, based on comparison of feature properties. The resulting algorithm is called Iterative Multiple Close Features (IMCF).

In each matching iteration, IMCF looks for multiple matches for each $p \in P \cap R_t$ (Fig. 5). Let $p' = M(\hat{\theta}; p)$ be the transformation of $p$ into $I_2$. Just as in ICP, the closest $I_2$ centerline point to $p'$ is found ($q_1$ in Fig. 5). In addition, all other vessel segments intersecting a circular region surrounding $p'$ are identified. The closest point on each segment is used to form an additional match with $p$, added to $C_t$. The search radius is a small multiple of the alignment error $\hat{\sigma}$.

Like other multiple-match techniques [10], the influence of these correspondences is controlled through the weight function, but our weight function is novel. As background, in robust ICP the weight values are $w_i = w(u_i) = \rho'(u_i)/u_i$, where $u_i = d(M(\theta; p_i), q_i)/\hat{\sigma}$. Thus, the ICP weighted least-squares objective function is

$$\sum_{(p_i, q_i) \in C} w_i \cdot \big( d(M(\theta; p_i), q_i) / \hat{\sigma} \big)^2. \qquad (3)$$
The steps of weight calculation and updates to $\theta$ are alternated. For each match in IMCF, we define three separate weights and then combine them into the cumulative weight $w_i$ in (3). The "geometric-error" weight is $w_{i,r} = \rho'(u_i)/u_i$, as above. The "signature-error" weight, based on comparison of other properties of the correspondence besides distance, is $w_{i,s}$; in aligning vascular images, these properties are vessel orientations and widths. The third weight is called the competitive weight. To define it for match $(p_i, q_i)$ we need all other matches for feature $p_i$. This is the match subset $C_{t,i} = \{(p_j, q_j) \in C_t \mid p_i = p_j\}$. Using this subset, the competitive weight is

$$w_{i,c} = \frac{w_{i,r}\, w_{i,s}}{\sum_{(p_j, q_j) \in C_{t,i}} w_{j,r}\, w_{j,s}}.$$
Intuitively, $w_{i,c}$ is the fraction of the total matching weight for feature $p_i$ that accrues to this match. The final weight is $w_i = w_{i,r}\, w_{i,s}\, w_{i,c}$.

The main advantage of this technique is simplicity: no separate "outlier process" and associated tuning parameters are needed [10]. The use of competitive weights cannot increase the final weight (over the combined geometric and similarity weights), so outliers remain outliers. In addition, the weights of unambiguous matches remain unchanged. Hence, the competitive weight is mostly intended to downgrade the influence of ambiguously matched features, allowing less ambiguous matches to have greater influence on the alignment.

As a final comment, the robust, competitive weighting scheme just described requires accurate scale estimation. As IMCF iterates, $\hat{\sigma}$ is the robustly weighted rms error, $\hat{\sigma}^2 = \sum_i w_i \cdot d(M(\hat{\theta}; p_i), q_i)^2 / \sum_i w_i$. Initially, however, it is a robust estimate called MUSE [14], based on the matches that are most similar (in orientation and width) rather than closest.
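The weight combination can be summarized in a few lines. This sketch assumes the geometric-error and signature-error weights have already been computed for every match, and groups matches by their feature $p_i$:

```python
from collections import defaultdict

def imcf_weights(matches):
    """matches: list of (feature_id, w_r, w_s) triples.

    Returns the final weights w_i = w_r * w_s * w_c, where w_c is the
    competitive weight normalizing over all matches of the same feature.
    """
    totals = defaultdict(float)
    for fid, w_r, w_s in matches:
        totals[fid] += w_r * w_s              # denominator of w_{i,c}
    final = []
    for fid, w_r, w_s in matches:
        w_c = (w_r * w_s) / totals[fid] if totals[fid] > 0.0 else 0.0
        final.append(w_r * w_s * w_c)
    return final
```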
4 Two Algorithms
Given the foregoing descriptions of the view-based approach, automatic view generation, and the ICP and IMCF algorithms, the descriptions of the solutions to Problems 1 and 2 are straightforward. In fact, at a software level there is very little difference between the implementations (one of the goals of the work). For each algorithm, we briefly describe the initialization, view generation, and estimation stages. Results are presented in the next section.

For Problem 1, initial estimates and initial views are generated by placing one automatically detected landmark from each image in correspondence and computing an initial transformation by aligning the immediately surrounding vessels (Fig. 3). The correspondences may be generated manually, or they may be generated automatically by matching signatures computed from the widths and orientations of vessels meeting at the landmark. Multiple initial estimates are generated in the latter case. The view includes the region and the transformation model; these are grown (bootstrapped) from the small starting regions.
For estimation, we've used both ICP and IMCF. The more sophisticated IMCF algorithm does not substantially improve the results, because the region growth and model selection reduce matching ambiguities; in practice we use ICP. A transformation is accepted as correct when the algorithm converges to an estimate having an image-wide median error in the alignment of centerline pixels less than an empirical threshold (1.5 pixels in retina images [8]); when no initial estimate leads to such an alignment, the algorithm indicates that the images can't be registered. The overall algorithm is called the "Dual-Bootstrap ICP" (see [18] for more details).

For Problem 2 (Fig. 4), the initial estimate is just the estimate for the previous image in the sequence. Alignment is against an image giving a map of the surgical region. The view includes the image resolution and the transformation model. Resolution changes follow a standard coarse-to-fine progression, switching image resolutions when estimation converges at a given resolution. Transformation models are determined automatically for each resolution. IMCF is used to estimate the transformation once the view (resolution and model) is fixed. The overall algorithm is called M-IMCF (Multiresolution IMCF).
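Putting the pieces together, the Dual-Bootstrap ICP driver reduces to a short loop over initial estimates. All helper names in this sketch are hypothetical, and 1.5 pixels is the empirical threshold quoted above for 1024 × 1024 retinal images:

```python
def dual_bootstrap_icp(image1, image2, threshold=1.5):
    """Try each landmark-based initial view; accept the first that passes."""
    for view in initial_views_from_landmarks(image1, image2):
        theta = view_based_register(view)   # region growth + model selection
        if median_centerline_error(theta, image1, image2) < threshold:
            return theta                    # accepted, image-wide alignment
    return None                             # images cannot be registered
```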
5 Results
The algorithms have been applied in four contexts: (1) registering pairs of 2-d retinal images, (2) incremental registration of sequences of retinal images, with each registration starting from the previous estimate, (3) registering confocal stacks of neuronal images, and (4) registering confocal stacks of vascular (rat brain) images. Dual-Bootstrap ICP is used for (1), (3) and (4), which correspond to Problem 1 from the introduction, while M-IMCF is used for (2). In the retinal registration problems, the final transformation model is a 12-parameter quadratic (Table 1), and the experimental results are extensive. For the confocal images, the final transformation is affine, and the results are preliminary.

Example alignments using Dual-Bootstrap ICP are shown in Fig. 3 for 2-d retinal images and in Fig. 6 for 3-d vascular images. In retinal image registration, tests were applied to over 4000 image pairs taken from healthy and diseased eyes (with some inter-image time intervals of several years). For image pairs that (a) overlapped by at least 35% of the image area and (b) had at least one common landmark detected by feature extraction, the algorithm always aligned the images to within a 1.5 pixel error threshold (1024 × 1024 images). Performance gradually degraded with less overlap. Moreover, no incorrect alignments were accepted as correct (i.e. none aligned to within the error threshold). The median number of initial estimates tried was 2, and the average registration time was about 2 seconds. Registration in 3-d takes about a minute, with most time occupied by feature extraction prior to registration. The pixel dimensions were 0.7 µm within each plane and 5 µm between planes. Median alignment errors were less than a pixel in each dimension.

For M-IMCF, our primary experimental goal is evaluating the domain of convergence of initialized registration. We explore this with synthetic shifts of real image pairs, using a subset of the image pairs used to study Dual-Bootstrap ICP. For each pair, we have the "correct" (validated) quadratic transformation.
Fig. 6. The Dual-Bootstrap ICP registration algorithm for 3-d vascular images of a rat brain taken using confocal microscopy. Panel (a) shows a max-intensity projection, with super-imposed 3-d vessel centerlines. Panels (b)–(d) show alignment of centerlines (solid lines for one image, dashed lines for the other) taken at two different times. Panel (b) gives the initial alignment in a small region. Panels (c) and (d) show intermediate and final alignments. Even though the figures are 2-d projections, the registration occurs in 3-d.
We perturb this transformation by different translations and rotations to provide initial offsets, run the algorithm for each, and determine whether it converges to within 1.5 pixels of the validated transformation. If it does, we count the initial transformation as a success. Fig. 7 summarizes the success rates for four algorithms: ICP, Multiresolution ICP (M-ICP), IMCF, and M-IMCF. Clearly, both multiresolution processing and IMCF are important in increasing the domain of convergence.

To summarize the experimental results, the Dual-Bootstrap ICP algorithm is, overall, extremely successful at registering retinal image pairs, including pairs with relatively low overlap. M-IMCF works well for initialized registration, substantially outperforming ICP and multiresolution ICP. The difference between Dual-Bootstrap ICP and M-IMCF in an application context is that the much faster M-IMCF algorithm will be used primarily for on-line registration during surgical procedures. Finally, the preliminary 3-d results demonstrate the applicability of the view-based approach to other, related problems.
Fig. 7. Comparing the domains of convergence of ICP, Multiresolution ICP, IMCF, and Multiresolution IMCF on a selection of image pairs of both healthy and unhealthy eyes. The horizontal axis of each plot is the radial initial offset from the correct transformation, in pixels (all shift directions are combined); the vertical axis is the success rate. All tested images are 1024 × 1024. The left plot shows results for no rotation, while the right plot shows results for 10° rotation.
6 Summary and Conclusions
This paper has introduced a view-based approach to registration. Each view includes an image resolution, a set of image primitives, a transformation model, an image region, a current transformation estimate, and an estimation technique. While the estimation technique could be a standard technique such as ICP or Mutual Information, this paper has introduced a new core estimation technique called IMCF, a generalization of ICP that allows multiple matches to be considered simultaneously. These techniques were used to solve two problems in retinal image registration, and the algorithms have been extended to 3-d registration of vascular and neuronal images as well. A C++ registration library is being built around the view-based approach.

Ongoing work is extending this research in a number of directions. Clearly, the fundamental theoretical extension to the view-based approach is incorporating deformable registration. In doing this, the algorithm must start with small-scale deformations for simple views and allow increasingly large-scale deformations as the view expands (especially in the image region). In addition to the theory, a variety of applications of view-based registration are being investigated. Currently, the most important of these is the use of Dual-Bootstrap ICP and M-IMCF in the diagnosis and treatment of retinal diseases.
References

[1] K. Al-Kofahi, S. Lasek, D. Szarowski, C. Pace, G. Nagy, J. N. Turner, and B. Roysam. Rapid automated three-dimensional tracing of neurons from confocal image stacks. IEEE Trans. on Inf. Tech. in Biomedicine, 6(2):171–187, 2002.
[2] S. Aylward, J. Jomier, S. Weeks, and E. Bullitt. Registration of vascular images. Int. J. of Computer Vision, (to appear) 2003.
[3] A. E. Beaton and J. W. Tukey. The fitting of power series, meaning polynomials, illustrated on band-spectroscopic data. Technometrics, 16:147–185, 1974.
[4] J. Bergen, P. Anandan, K. Hanna, and R. Hingorani. Hierarchical model-based motion estimation. In Proc. 2nd ECCV, pages 237–252, 1992.
[5] P. Besl and N. McKay. A method for registration of 3-d shapes. IEEE Trans. on PAMI, 14(2):239–256, 1992.
[6] K. Bubna and C. V. Stewart. Model selection techniques and merging rules for range data segmentation algorithms. Computer Vision and Image Understanding, 80:215–245, 2000.
[7] A. Can, H. Shen, J. N. Turner, H. L. Tanenbaum, and B. Roysam. Rapid automated tracing and feature extraction from live high-resolution retinal fundus images using direct exploratory algorithms. IEEE Trans. on Info. Tech. for Biomedicine, 3(2):125–138, 1999.
[8] A. Can, C. Stewart, B. Roysam, and H. Tanenbaum. A feature-based, robust, hierarchical algorithm for registering pairs of images of the curved human retina. IEEE Trans. on PAMI, 24(3):347–364, 2002.
[9] Y. Chen and G. Medioni. Object modeling by registration of multiple range images. Image and Vision Computing, 10(3):145–155, 1992.
[10] H. Chui and A. Rangarajan. A new algorithm for non-rigid point matching. In Proc. CVPR, pages II:44–51, 2000.
[11] R. Hartley and A. Zisserman. Multiple View Geometry. Cambridge University Press, 2000.
[12] P. W. Holland and R. E. Welsch. Robust regression using iteratively reweighted least-squares. Commun. Statist.-Theor. Meth., A6:813–827, 1977.
[13] F. Maes, A. Collignon, D. Vandermeulen, G. Marchal, and P. Suetens. Multimodality image registration by maximization of mutual information. IEEE Trans. on Medical Imaging, 16(2):187–198, 1997.
[14] J. V. Miller. Regression-Based Surface Reconstruction: Coping with Noise, Outliers, and Discontinuities. PhD thesis, Rensselaer Polytechnic Institute, 1997.
[15] H. Sawhney, S. Hsu, and R. Kumar. Robust video mosaicing through topology inference and local to global alignment. In Proc. 5th ECCV, volume II, pages 103–119, 1998.
[16] G. Sharp, S. Lee, and D. Wehe. ICP registration using invariant features. IEEE Trans. on PAMI, 24(1):90–102, 2002.
[17] D. Shen and C. Davatzikos. HAMMER: Hierarchical attribute matching mechanism for elastic registration. IEEE Trans. on Medical Imaging, 21(11):1421–1439, 2002.
[18] C. Stewart, C.-L. Tsai, and B. Roysam. The dual bootstrap iterative closest point algorithm with application to retinal image registration. Technical Report RPI-CS-TR 02-9, Department of Computer Science, Rensselaer Polytechnic Institute, 2002.
[19] P. Torr. An assessment of information criteria for motion model selection. In Proc. CVPR, pages 47–52, 1997.
[20] P. Viola and W. M. Wells III. Alignment by maximization of mutual information. Int. J. of Computer Vision, 24(2):137–154, 1997.
Fusion of Autoradiographies with an MR Volume Using 2-D and 3-D Linear Transformations

Grégoire Malandain¹ and Eric Bardinet¹,²

¹ Epidaure, INRIA, 2004 route des lucioles BP 93, 06 902 Sophia-Antipolis cedex, France, {eric.bardinet,gregoire.malandain}@sophia.inria.fr
² CNRS UPR 640 - LENA, 47 Boulevard de l'Hôpital, 75 651 Paris cedex 13, France [email protected]
Abstract. The recent development of 3-D medical imaging devices has given access to the 3-D imaging of in vivo tissues, from an anatomical (MR, CT) or even functional point of view (fMRI, PET, SPECT). However, the resolution of these images is still not sufficient to image anatomical or functional details, which can only be revealed by in vitro imaging (e.g. histology, autoradiography). The underlying motivation of this work is the comparison of activations detected by fMRI series analysis to the ones that can be observed in autoradiographic images. The aim of the presented work is to fuse the autoradiographic data with the pre-mortem anatomical MR image, to facilitate the above-mentioned comparison. We show that this fusion can be achieved by using only simple global transformations (rigid and affine), yielding a very satisfactory result.
1 Introduction
In the past years, the development of 3-D medical imaging devices has given access to the 3-D imaging of in vivo tissues, from an anatomical (MR, CT) or even functional point of view (fMRI, PET, SPECT). However, despite huge technological advances, the resolution of these images is still not sufficient to image anatomical or functional details, which can only be revealed by in vitro imaging (e.g. histology, autoradiography), possibly enhanced by staining [1].

The underlying motivation of this work is the comparison of activations detected by fMRI series analysis to the ones that can be observed in autoradiographic (AR) images, which can be considered as ground truth. Data are acquired on awake behaving animals (rhesus monkeys). In addition to the fMRI series, an anatomical MR image is also acquired, against which the fMRI images are registered. The aim of the presented work is to fuse the autoradiographic data with the pre-mortem anatomical MR image, so that future comparison of detected activations (in terms of location) will be straightforward.

The purpose of fusing a set of 2-D autoradiographies, or more generally a set of contiguous thin 2-D sections, with an MR volume of the same individual is to find the most exact correspondence in the MR volume for each 2-D section. Fusion can be done following different strategies. Among these, the most intuitive
consists of a twofold approach: first align the thin 2-D sections with respect to a chosen reference section, yielding a reconstructed 3-D volume; then co-register this 3-D volume to the MR volume.

Reconstruction of a 3-D volume from a stack of 2-D images (histological slices or autoradiographies) has already been widely studied. It is done by registering each pair of consecutive images in the stack to recover a geometrically coherent 3-D alignment of the 2-D slices. The main differences with image matching, as it is classically understood, are: 1) the slices to be registered are not images of the same object, but of similar objects (i.e. two consecutive sections) that can exhibit differences in shape and/or in intensity; 2) non-coherent distortions may occur from one slice to the next.

The most common method is manual registration [2]. Despite its simplicity, it has a number of drawbacks (it is not reproducible, it is user-dependent, and it is very time consuming) which make it unsuitable for a large number of data. Fiducial markers (e.g. needles stuck in the material before slicing) can be tracked over the whole stack to recover the original geometry [3,4]. However, since this is usually done by least-squares minimization, bias may appear if the needles are not orthogonal to the cutting planes. Moreover, tracking can be awkward, particularly if needle holes collapse. Finally, the needles may destroy part of the tissues of interest in the material.

More classical registration methods have also been investigated [5]. These methods can be divided into two classes: geometrical ones, which require the segmentation of some features (points, lines, or even objects of interest), and iconic ones, which are based solely on the image intensities. Considering the geometrical methods, registration can first be achieved with global descriptors, e.g. centers of mass and principal axes. This has been proven to be of limited precision [6], but may be used as an initialization [7]. More precise features have been used, i.e. contours [8], edges [9], or points [10]. Because of the difficulty of designing a completely automatic and reliable segmentation method, or of manually extracting features of interest, iconic methods have also been investigated. They are based on the optimization of a given similarity measure of the image intensities: cross-correlation in [11] or mutual information in [12]. From our point of view, the work presented in [11] can be considered as a hybrid ICP-like approach [13] between geometrical and iconic methods: blocks can be considered as geometrical features, while co-registration of blocks is achieved by optimization of a similarity measure. Hence we consider it particularly suitable for the purpose of 3-D volume reconstruction.

The above addresses only the problem of spatial alignment of the 2-D slices in order to reconstruct a geometrically coherent 3-D volume. It should be stressed that the problem of alignment in intensity has rarely been discussed. Indeed, 2-D autoradiographies may exhibit intensity inhomogeneities from one image to the other for several reasons (e.g. variations in section thickness). These can easily be compensated for if appropriate standards (microscales) of known radioactivities are also imaged (and further scanned) on each piece of film [14]. If they are not present, alternative approaches have to be proposed.

The previously cited literature addresses only the problem of reconstruction of a 3-D volume from 2-D slices.
This is of course of interest for many purposes, but it does not allow an easy comparison with more classical 3-D modalities (e.g. MRI, CT). The fusion of such histological or autoradiographic data with other 3-D data has been addressed more rarely. In this case, the specific transformations due to the acquisition protocol (e.g. cutting, manipulation, chemical treatment, etc.) have to be compensated for. To our knowledge, histology has been co-registered with MRI [15] and PET [16]. In both cases, photographs are acquired during the cutting and used as an intermediate modality. A 3-D linear transformation is used to map the photographic volume onto the MR volume, while 2-D highly non-linear transformations compensate residual in-plane misalignments between the histological sections and the photographs.

In this paper, we address the fusion of 2-D autoradiographic slices with a 3-D anatomical MR image when no intermediate modality, such as photographs, is available. We will demonstrate that it is possible to achieve a satisfactory correspondence by using only linear transformations.
2 Data Acquisition and Preprocessing
MR image. Several anatomical T1-weighted MR acquisitions are performed on anesthetized monkeys to avoid motion artifacts. They are averaged, resulting in an MR image of size 240 × 256 × 80 with a voxel of 1 mm³. A sub-image is then extracted, containing only the right hemisphere of the brain, and oversampled by a factor of 4 with cubic splines, to end up with an image of size 264 × 192 × 356 with a voxel of 0.25 mm³.

Autoradiographies (AR). For the autoradiographic study, a double-label deoxyglucose (DG) technique is used to distinguish between two activation paradigms (details can be found in [17]). [³H]DG is injected during the presentation of the first stimulus; then the second stimulus is presented and [¹⁴C]DG is injected. After a short delay, the monkey is sacrificed and the brain undergoes some treatments.
Fig. 1. Some of the original AR images (out of 818) of the posterior block of the monkey’s brain.
The hemisphere dedicated to our protocol is cut into two pieces and frozen as fast as possible, each block lying on the cutting section so that it will still be plane after freezing. Each block (posterior and anterior) is finally sectioned using a cryostat microtome (slice thickness of 40 µm). Sections are mounted on coverslips and dried. They are first exposed against ³H-sensitive films and then against ¹⁴C-sensitive films. These films are subsequently scanned manually, yielding 818 slices for the posterior block and 887 slices for the anterior one. For economic reasons, each autoradiographic film may contain several brain sections, so several sections may be seen in one image. With very simple operators (thresholding and mathematical morphology), the section of interest in each image has been extracted. All the sections have then been put into a stack, resulting in a 3-D image of size 320 × 256 × 818 (for the posterior block) with a voxel size of 0.16 × 0.16 × 0.04 mm³ (see top row of figure 2).
3 Framework
The proposed fusion of a 3-D MR image with the autoradiographs consists of three steps:

1. an initial reconstruction of an AR volume, coherent both in geometry and intensity, without the help of the MR image;
2. the 3-D registration of the reconstructed AR volume against the MR image;
3. fusion loops, which alternate between updating the reconstruction of the AR volume with the help of the MR image and 3-D registration of the AR volume against the MR image.

3-D reconstruction of an autoradiographic volume. Following [11], we realign the 2-D slices to build a geometrically coherent volume. Each couple of consecutive slices, $S_i$ and $S_{i+1}$, is rigidly co-registered with the method described below, yielding a 2-D transformation $T_{i \leftarrow i+1}$ (or equivalently $T_{i+1 \leftarrow i}$). A reference slice, of index $ref_1$, is chosen, and the 2-D transformations $T_{i \leftarrow ref_1}$ are computed by composition of the $T_{i \leftarrow i+1}$ or $T_{i+1 \leftarrow i}$ transformations. All the resampled slices, $S_i \circ T_{i \leftarrow ref_1}$, can now be superposed to build a geometrically coherent 3-D volume.

Because the overall illumination may change from one autoradiographic slice to another, we compensate for these changes with a dedicated histogram matching from slice to slice, yielding an affine intensity transformation $f_{i \leftarrow i+1}$ (or equivalently $f_{i+1 \leftarrow i}$) between two consecutive slices. A reference slice, of index $ref_2$, is chosen, and the intensity transformations $f_{ref_2 \leftarrow i}$ are computed by composition of the $f_{i \leftarrow i+1}$ or $f_{i+1 \leftarrow i}$ intensity transformations. To summarize, the 3-D reconstruction of an autoradiographic volume, coherent in geometry and in intensity, is achieved by the superposition of the resampled slices $f_{ref_2 \leftarrow i} \circ S_i \circ T_{i \leftarrow ref_1}$.

Initial 3-D registration MRI / AR volume. The reconstructed AR volume is co-registered with the MR volume, providing an initial solution to the fusion problem. At this point, only a 3-D rigid transformation is considered.

Fusion loop. This loop aims to fuse the 2-D autoradiographic slices with the 3-D MR image, given a first 3-D reconstruction of the autoradiographic volume, $AV^{(0)}$, and the initial 3-D transformation $T_{MR \leftarrow AV^{(0)}}$ (as obtained above). Each iteration $k$ of the loop, which alternates between updating the reconstruction of the autoradiographic volume and the 3-D registration of the autoradiographic volume against the MR image, is divided into four steps.
1. This first step is twofold:
   a) Resampling of the MR volume into the geometry of $AV^{(k)}$, yielding the volume $MR^{(k)} = MR \circ T_{MR \leftarrow AV^{(k)}}$. This allows MR slices $MR_i^{(k)}$, $i = 1 \ldots N$, that match the autoradiographic slices to be extracted.
   b) 2-D registration of each autoradiographic slice $AV_i^{(k)}$ against the corresponding MR slice $MR_i^{(k)}$, resulting in $N$ 2-D transformations $T_{AV_i^{(k)} \leftarrow MR_i^{(k)}}$.
2. Filtering of the 2-D transformations $T_{AV_i^{(k)} \leftarrow MR_i^{(k)}}$, resulting in the 2-D transformations $\hat{T}_{AV_i^{(k)} \leftarrow MR_i^{(k)}}$.
3. Building of a new autoradiographic volume $AV^{(k+1)}$ by superposing the slices $AV_i^{(k)} \circ \hat{T}_{AV_i^{(k)} \leftarrow MR_i^{(k)}}$.
4. 3-D registration of the autoradiographic volume $AV^{(k+1)}$ against the 3-D MR image, yielding a 3-D transformation $T_{MR \leftarrow AV^{(k+1)}}$.

The number of iterations, as well as the transformation classes (i.e. rigid transformations, similarities, or affine transformations), are specified by the user.
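As a summary, one pass of the fusion loop can be sketched as follows; all helper names (resample, register_2d, filter_transforms, register_3d, slices, apply_2d, stack) are hypothetical stand-ins for the tools of Sect. 4.

```python
def fusion_loops(av, mr, T_mr_av, n_loops=3):
    """Alternate AR reconstruction update and 3-D registration (sketch)."""
    for k in range(n_loops):
        mr_k = resample(mr, T_mr_av)                  # MR in AV geometry
        T2d = [register_2d(a, m)                      # step 1: per-slice 2-D
               for a, m in zip(slices(av), slices(mr_k))]
        T2d_hat = filter_transforms(T2d)              # step 2: filtering
        av = stack([apply_2d(a, t)                    # step 3: new AR volume
                    for a, t in zip(slices(av), T2d_hat)])
        T_mr_av = register_3d(mr, av)                 # step 4: 3-D registration
    return av, T_mr_av
```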
4 Tools
We now briefly present the tools necessary to implement the proposed methodology.

2-D and 3-D Registration. To register two images, which can be either two 2-D autoradiographies or a 3-D AR volume against a 3-D MR volume, any registration method could in principle be used [5]. We will not discuss the method we chose, namely block matching, which we consider particularly suited for 3-D reconstruction, since it has already been described and discussed in [11]. It can be viewed as a hybrid ICP-like approach [13] between geometrical and iconic methods: blocks can be considered as geometrical features, while co-registration of blocks is achieved by maximization of a similarity measure.

This algorithm takes as input a reference image $I$ and a floating image $J$, and aims to estimate a transformation $T$ such that $J \circ T$ can be superimposed on $I$. This is done through an iterative scheme: at each iteration $k$, correspondences between $I$ and $J \circ T_k$ are computed with a block matching algorithm; from them an incremental transformation $\delta T_k$ is estimated, which is used to update the sought transformation by $T_{k+1} = \delta T_k \circ T_k$. This iterative procedure stops as soon as no significant change occurs in the transformation evaluation; a sketch of this scheme is given below.

As recalled above, the correspondences are computed with a block matching strategy. A block $B(x, y)$ in image $I$ is defined as the sub-image of $I$ with upper left corner $(x, y)$ and dimensions $d_x \times d_y$. We define the set $B_{s_x, s_y}$ of blocks by $B_{s_x, s_y} = \{B(x, y) \subset I \text{ such that } x = a s_x,\ y = b s_y \text{ with } a, b \in \mathbb{N}\}$. It follows that $B_{1,1}$ contains all the possible blocks of size $d_x \times d_y$ included in $I$, while $B_{s_x, s_y}$ with $s_x > 1$ or $s_y > 1$ contains only a subset of them.
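The sketch announced above is a hypothetical rendering of this iterative scheme in Python; block_match is assumed to return point correspondences, and robust_fit to perform the weighted Least Trimmed Squares estimation described in the following paragraph [19].

```python
def block_matching_register(I, J, T0, max_iters=20, tol=1e-4):
    """Estimate T such that J o T superimposes on I (sketch)."""
    T = T0
    for _ in range(max_iters):
        pairs = block_match(I, resample(J, T))  # correspondences C <-> C'
        dT = robust_fit(pairs)                  # incremental transformation
        T = compose(dT, T)                      # T_{k+1} = dT_k o T_k
        if transform_change(dT) < tol:          # no significant change: stop
            break
    return T
```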
Once the blocks to be considered in $I$ are defined, for each of them the best correspondence in $J \circ T_k$ (i.e. $J$ resampled with $T_k$) is computed. To achieve this, we look for the block $B'(x', y')$ of upper left corner $(x', y')$ and size $d_x \times d_y$ in $J \circ T_k$ that optimizes the correlation coefficient as the similarity measure [18]. From each pair of corresponding blocks, $B(x, y)$ and $B'(x', y')$, we deduce a pair of corresponding points, $C(x, y)$ and $C'(x', y')$, being respectively the centers of blocks $B(x, y)$ and $B'(x', y')$. The optimal transformation $\delta T_k$ between the images $I$ and $J \circ T_k$ would be the one minimizing the residuals $\| C(x, y) - \delta T(C'(x', y')) \|^2$ in a least-squares sense. To reject outliers, a robust estimation with a weighted Least Trimmed Squares (LTS) is preferred [19].

Intensity compensation. Consider two discrete histograms, or equivalently two discrete probability density functions (PDFs), $p(x_i)$ and $q(y_j)$, with $x_i, y_j \in \mathbb{Z}$. The aim of this section is to estimate an intensity transformation $f$ such that the distribution $p$ is similar to $q \circ f$. This problem is known as histogram matching [20]. Autoradiographies are mostly made of only two tissues, white and grey matter; thus, we are looking for an affine function $f$. To avoid discretization artifacts, we use the continuous PDFs $P(x)$ and $Q(y)$ obtained via Parzen windowing, and the sought transformation is found by minimizing the sum of squared differences between $P$ and $Q \circ f$.

Filtering of transformations. The problem is the following: given a set of transformations $T(i)$ belonging to a transformation class $\mathcal{T}$, estimate a filtered transformation $\hat{T}(i) \in \mathcal{T}$ such that

$$\hat{T}(i) = \arg\min_{T \in \mathcal{T}} \sum_j g(i - j)\, \mathrm{dist}(T(j), T)^2,$$
where $g$ is some low-pass filter, e.g. a Gaussian function, and $\mathrm{dist}(\cdot,\cdot)$ represents a distance for $\mathcal{T}$. Such a computation is not straightforward [21], but can be achieved with the Fréchet expectation. Since this expectation can be approximated by the standard expectation near the origin (i.e. the identity), we use this property to compute the Fréchet expectation with an iterative procedure that needs a first estimate of $\hat{T}(i)$, e.g. $\hat{T}^{(0)}(i) = T(i)$:

$$\hat{T}^{(k+1)}(i) = \hat{T}^{(k)}(i) \circ \Big[ \sum_j g(i - j) \big( \hat{T}^{(k)}(i)^{-1} \circ T(j) \big) \Big].$$

It stops when $\sum_j g(i - j) \big( \hat{T}^{(k)}(i)^{-1} \circ T(j) \big)$ is close enough to the identity. A weighted sum of transformations $\sum_j w_j T(j)$ within a given transformation class $\mathcal{T}$ is computed by performing the weighted sum on their parameters, i.e. the 12 parameters for affine transformations, and the rotation and translation vectors for rigid transformations.
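In the simplified case where each transformation is represented by its parameter vector and the weighted mean is taken directly on the parameters, as the last remark suggests, the filtering step reduces to a Gaussian-weighted average along the slice index. This is a sketch only; the iterative Fréchet refinement described above is omitted.

```python
import numpy as np

def filter_transforms(params, sigma=2.0):
    """Gaussian-weighted filtering of per-slice transformation parameters.

    params: (N, d) array, one parameter vector per slice index i.
    """
    params = np.asarray(params, dtype=float)
    idx = np.arange(len(params))
    out = np.empty_like(params)
    for i in range(len(params)):
        g = np.exp(-0.5 * ((idx - i) / sigma) ** 2)  # low-pass kernel g(i-j)
        out[i] = (g[:, None] * params).sum(axis=0) / g.sum()
    return out
```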
5 Results
Reconstruction of the autoradiographic volume. A first reconstruction of the autoradiographic volume has been performed by registering each couple of consecutive sections with the 2-D block matching algorithm.
Fig. 2. From left to right: the AR volume before correction; after geometric alignment of the slices, and before intensity correction; after intensity correction; corresponding slice of the resampled MR volume after 3-D rigid registration. Top row shows an axial slice while bottom row displays a sagittal slice.
The composition of the computed transformations makes it possible to register all sections against a reference section (taken at the middle of the stack). Visual inspection of the reconstructed volume allows any remaining registration errors to be detected; these are subsequently corrected by changing the registration parameters. This control step is repeated until the result is satisfactory, i.e. until the reconstructed volume appears geometrically consistent. The first column of figure 2 shows the stack of autoradiographies, to be compared to the obtained reconstruction (second column of the same figure). Intensity consistency is also obtained by performing histogram matching for each couple of consecutive sections. Note that this computation is independent from the above geometrical registration. The third column of figure 2 depicts the final reconstruction.

Initial registration with MRI. After reconstruction of the autoradiographic volume, it can be registered (here rigidly) against the MR sub-volume of interest, that is, the corresponding hemisphere. The third column of figure 2 shows cross-sections of the reconstructed AR volume, while the last column shows the corresponding cross-sections of the resampled MR volume after registration. The two volumes are roughly similar, but large differences can be seen, especially in posterior areas (top of the images).

Fusion loops. After the AR volume has been rigidly registered against the MR volume, the fusion loops consist in iterating four successive steps: 1. independent 2-D registrations of each AR section against the corresponding slice of the resampled MR volume (rigid for the first two loops, and affine for the third one); 2. filtering of the 2-D transformations; 3. reconstruction of a new 3-D AR volume; 4. affine 3-D registration of the newly reconstructed AR volume against the MR volume.
We stop after the fourth registration between the MR and AR volumes, since no changes occur at this step. Intermediary results for loop #1 are given in figure 3, in the geometry of the reconstructed AR volume.
Fig. 3. Fusion loop # 1. From left to right: the AR volume after the independent 2-D slice co-registrations; the AR volume after the 2-D transformations filtering; the MR image resampled after affine registration against the above reconstructed AR volume. Top and bottom rows show respectively an axial and a sagittal slice. The effect of this first loop can be appreciated by comparing the middle column to the third column of figure 2.
Figure 4 shows the final alignment of the histological slices with respect to the MR volume, and highlights the benefits of the proposed fusion methodology by comparing the AR reconstructed volume before and after the fusion, with and without superimposition of the MR contours. We have also fused the anterior part of the same monkey's brain to the MR volume (see figure 5). As we observed only minor changes between loops #1 and #2, we did not run a third loop.

Visual inspection. The underlying motivation of the presented work is the comparison of activations detected from autoradiographies to the ones detected in an fMRI series study. More precisely, we are interested in the visual cortex [17]. It follows that this work may serve that purpose if the sulci involved in the visual cortex are well registered. This has been visually checked: e.g. the two volumes can be visually superimposed in a 3-D viewer displaying axial, sagittal and coronal cross-sections (see figure 5) with tunable transparency. Navigation in the fused 3-D volumes allows the correctness of the fusion to be checked. At the level of the visual cortex the correspondence is very accurate, as it is for most of the brain. Some small mismatch errors can be seen at the medial side of the caudate nucleus (due to collapsing ventricles).
6 Discussion
Reconstruction. This step consists of two parts, a geometry-based and an intensity-based reconstruction.
Fig. 4. From left to right: the original MR image (resampled with cubic spline); the first reconstructed autoradiographic volume (before any fusion loop) rigidly registered against the MR image, without and with superimposed MR contours; the last reconstructed autoradiographic volume (after all fusion loops) registered against the MR image, without and with superimposed MR contours. Top and bottom rows show respectively a sagittal and an axial slice.
Concerning the geometrical reconstruction, visual inspection is necessary in case of a mismatch between two consecutive sections. It should be pointed out that a mismatch does not mean poor accuracy: for such errors [11], there is a huge discrepancy between the obtained transformation and the expected one, so that the failure can be seen very easily. We correct these errors by changing the registration parameters; this modifies the shape of the convergence basin so that it contains the initial position.

Visual inspection of the intensity-based reconstruction is facilitated by the same procedure. A single error (a mismatch in matching the histograms of two consecutive sections) results, after reconstruction, in a separation between two intensity-consistent 3-D sub-volumes.

Fig. 5. Fused images can be presented in a single viewer with different color tables.

It can be questioned whether this intensity-based reconstruction is necessary or not. The 3-D reconstructed AR volume will be registered against the MR volume. To do this, 3-D sub-images of the AR volume will be compared to 3-D blocks of the MR volume with a similarity measure that assumes some relationship between intensities, and thus needs the blocks to be consistent in intensity. Thus the answer is clearly yes.

Fusion. The acquisition procedure of the 2-D autoradiographs causes 3-D deformations that occur before cutting (e.g. shrinkage), 2-D in-plane random transformations (due to positioning on films), and 2-D residual deformations due to chemical treatment and section manipulation. By fusing a set of 2-D autoradiographies
with an MR volume of the same individual, one expects to find the most exact correspondence in the MR volume for each 2-D section. The purpose of the fusion is then to build a set of 2-D transformations that makes the sections slide over each other (this ensures that they stay parallel to each other), together with a 3-D transformation towards the MR volume. Such a fusion can be done following a twofold approach: first align the 2-D sections with respect to a chosen reference section, yielding a reconstructed 3-D volume; then co-register this 3-D volume to the MR volume.

However, one of the main difficulties of the reconstruction of a 3-D volume from 2-D sections (either histological or autoradiographic) comes from the fact that a 3-D curved object cannot be reconstructed from cross-sections without additional information. The acquisition protocol can be designed so that it includes such additional information. One way is to stick needles into the material before slicing, as they provide correspondences for a geometry-based reconstruction [3,4]. However, this can be awkward, e.g. if the needle holes collapse. Another approach consists in taking photographs of the surface of the material during the slicing process, including a reference system fixed on the cryomicrotome [15,16]. Alignment of the photographs using the reference system then provides a photographic 3-D volume with the real geometry of the object under study, and makes simpler both the reconstruction from the 2-D sections and the registration with a 3-D modality. Nevertheless, it is not always possible, for practical reasons, to have this information available. In that case, the only information about the geometry of the brain before slicing is given by the anatomical MR volume.

Here, one could have thought that a direct and strict twofold approach, namely reconstruction of a 3-D volume by alignment of the 2-D sections followed by the registration of this volume with the anatomical MR volume, would directly provide a satisfactory result. This implicitly assumes either that the computed 2-D transformations will compensate the random in-plane transformations due to the acquisition procedure, which is not realistic for curved objects, or that further 3-D elastic registration may compensate for the residual distortions. However, the elastic transformations implemented in existing algorithms are obviously not adapted. First, they treat the three directions of space in the same manner; in our particular problem, one direction (the one orthogonal to the cutting plane) clearly plays a different role. Moreover, from a methodological point of view, this simply ignores the acquisition reality. Precisely, the transformations that have occurred during the acquisition are clearly of different types: 3-D, rather smooth and applied to the whole brain; or 2-D and independent from section to section.

The proposed fusion methodology mimics the acquisition procedure by considering both a stack of 2-D transformations (corresponding to the displacements of the AR sections) and a 3-D transformation corresponding to the registration against the MR volume. More precisely, after an initial reconstruction from the 2-D sections, we alternate between the correction of this reconstructed AR volume (by recomputing the 2-D transformations) and a 3-D registration with the MR volume. By doing this, we expect to better estimate the random in-plane transformations due to the acquisition procedure. The choice of the transformation search spaces during
these fusion loops, namely starting with strongly constrained ones (i.e. rigid) and slightly relaxing them (ending with affine ones both in 2-D and 3-D), is made so as to preserve the integrity of the AR sections. In other words, we do not allow the AR sections to be strongly deformed before their corresponding MR slices have been localized for a given transformation search space. This way, we expect to recover slight residual deformations and at the same time stay close to the acquisition procedure. It should be pointed out that the obtained result is much more than satisfactory, although it only involves a stack of 2-D and 3-D linear transformations.
7 Conclusion
In this paper, we have described a methodology that allows fusing 2-D sections (autoradiographies) with a 3-D volume (MR). This is done by iterating a so-called fusion loop that alternates between the correction of the reconstructed AR volume (by recomputing the 2-D transformations) and a 3-D registration. Visual inspection of the obtained results is very satisfactory, and much more than sufficient for our purposes in the framework of the planned application (comparison of activations detected in autoradiographies to those detected in fMRI series). From a methodological point of view, it should be pointed out that the fusion only involves a stack of 2-D affine transformations and a 3-D affine transformation (no complex 3-D or 2-D elastic transformations are required): we argue that this is appropriate since it faithfully models the acquisition reality. However, we still observe some small mismatch areas in our results. Following our strategy, we could perform a last 2-D registration step between AR and corresponding MR sections using transformations with more degrees of freedom. One way would be to use the transformations proposed by [16,15]; another would be to design such a transformation by modeling the specific 2-D distortions which deform the 2-D sections, namely piecewise transformations. Finally, thanks to the acquisition procedure (the brain and the AR sections are almost always manipulated when frozen), geometrical distortions within an AR section are somewhat minimized. It will be challenging to process histologically stained data with the same method since, in the latter case, strong distortions will certainly occur because of the staining procedure.
Acknowledgments. This work was partially funded by European project MAPAWAMO (ref. QLG3-CT-2000-30161; coordinator: Pr. Guy Orban, K.U. Leuven, Belgium). We thank W. Vanduffel, K. Nelissen, D. Fize and G. Orban for providing us with the autoradiographs and for stimulating discussions.
References

1. Muthuswamy, M., Roberson, P., Haken, R.T., Buchsbaum, D.: A quantitative study of radionuclide characteristics for radioimmunotherapy from 3D reconstructions using serial autoradiography. Int J Radiat Oncol Biol Phys 35 (1996) 165–172
2. Deverell, M., Salisbury, J., Cookson, M., Holman, J., Dykes, E., Whimster, F.: Three-dimensional reconstruction: methods of improving image registration and interpretation. Analytical Cellular Pathology 5 (1993) 253–263
3. Goldszal, A., Tretiak, O., Liu, D., Hand, P.: Multimodality multidimensional image analysis of cortical and subcortical plasticity in the rat brain. Ann Biomed Eng 24 (1996) 430–439
4. Humm, J., Macklis, R., Lu, X., Yang, Y., Bump, K., Beresford, B., Chin, L.: The spatial accuracy of cellular dose estimates obtained from 3D reconstructed serial tissue autoradiographs. Phys Med Biol 40 (1995) 163–180
5. Maintz, J., Viergever, M.: A survey of medical image registration. Medical Image Analysis 2 (1998) 1–36
6. Schormann, T., Zilles, K.: Limitation of the principal axes theory. IEEE Transactions on Medical Imaging 16 (1997) 942–947
7. Hess, A., Lohmann, K., Gundelfinger, E., Scheich, H.: A new method for reliable and efficient reconstruction of 3-dimensional images from autoradiographs of brain sections. Journal of Neuroscience Methods 84 (1998) 77–86
8. Cohen, F., Yang, Z., Huang, Z., Nissanov, J.: Automatic matching of homologous histological sections. IEEE Trans. on Biomedical Engineering 45 (1998) 642–649
9. Kim, B., Frey, K., Mukhopadhyay, S., Ross, B., Meyer, C.: Co-registration of MRI and autoradiography of rat brain in three-dimensions following automatic reconstruction of 2D data set. In: Proc. of CVRMed. Volume 905 of LNCS, Springer (1995) 262–266
10. Rangarajan, A., Chui, H., Mjolsness, E., Pappu, S., Davachi, L., Goldman-Rakic, P., Duncan, J.: A robust point-matching algorithm for autoradiograph alignment. Medical Image Analysis 1 (1997) 379–398
11. Ourselin, S., Roche, A., Subsol, G., Pennec, X., Ayache, N.: Reconstructing a 3D structure from serial histological sections. Image and Vision Computing 19 (2001) 25–31
12. Kim, B., Boes, J.L., Frey, K.A., Meyer, C.R.: Mutual information for automated unwarping of rat brain autoradiographs. NeuroImage 5 (1997) 31–40
13. Besl, P., McKay, N.: A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 14 (1992) 239–256
14. Reisner, A., Bucholtz, C., Bell, G., Tsui, K., Rosenfeld, D., Herman, G.: Two- and three-dimensional image reconstructions from stained and autoradiographed histological sections. Comput Appl Biosci 6 (1990) 253–261
15. Schormann, T., Zilles, K.: Three-dimensional linear and nonlinear transformations: an integration of light microscopical and MRI data. Human Brain Mapping 6 (1998) 339–347
16. Mega, M.S., Chen, S.S., Thompson, P.M., Woods, R.P., Karaca, T.J., Tiwari, A., Vinters, H.V., Small, G.W., Toga, A.W.: Mapping histology to metabolism: Coregistration of stained whole-brain sections to premortem PET in Alzheimer's disease. NeuroImage 5 (1997) 147–153
17. Vanduffel, W., Tootell, R., Orban, G.: Attention-dependent suppression of metabolic activity in the early stages of the macaque visual system. Cerebral Cortex 10 (2000) 109–126
18. Roche, A., Malandain, G., Ayache, N.: Unifying maximum likelihood approaches in medical image registration. International Journal of Imaging Systems and Technology: Special Issue on 3D Imaging 11 (2000) 71–80
19. Rousseeuw, P., Leroy, A.: Robust Regression and Outlier Detection. Wiley (1987)
20. Castleman, K.R.: Point Operations. In: Digital Image Processing. Prentice Hall International Editions (1996) 83–97
21. Pennec, X., Ayache, N.: Uniform distribution, distance and expectation problems for geometric features processing. Journal of Mathematical Imaging and Vision 9 (1998) 49–67
Bayesian Multimodality Non-rigid Image Registration via Conditional Density Estimation

Jie Zhang and Anand Rangarajan

Department of Computer and Information Science & Engineering, University of Florida, Gainesville, FL, USA
Abstract. We present a Bayesian multimodality non-rigid image registration method. Since the likelihood is unknown in the general multimodality setting, we use a density estimator as a drop-in replacement for the true likelihood. The prior is a standard small deformation penalty on the displacement field. Since mutual information-based methods are in widespread use for multimodality registration, we attempt to relate the Bayesian approach to mutual information-based approaches. To this end, we derive a new criterion which, when satisfied, guarantees that the displacement field which minimizes the Bayesian maximum a posteriori (MAP) objective also maximizes the true mutual information (with a small deformation penalty) as the number of pixels tends to infinity. The criterion imposes an upper bound on the number of configurations of the displacement field. Finally, we compare the results of the Bayesian approach with mutual information, joint entropy and joint probability approaches on synthetic data and simulated T1 and T2 2D MR images.
1 Introduction
Non-rigid multimodality image registration is an impossibly general term covering that aspect of the registration problem wherein the data are from different modalities or imaging protocols and where there is a need to recover deformations. Potentially, the data could arise from diverse domains such as MR, fMRI, CT, PET, SPECT, ultrasound, portal films etc. and the registration may need to be performed across subjects, between pre-operative and post-operative situations and so on. Despite the overarching generality of the term, the emerging consensus in medical imaging is that there is a need to have a general information processing vocabulary to deal with situations where a) imaging data are from different modalities, b) a flexible (non-rigid as opposed to rigid, similarity or affine) registration is required and c) there appears to be a reasonable similarity (usually unspecified) in the intensity patterns of the two modalities to warrant embarking upon this task in the first place. One particular general methodology that has arisen in recent years—first for rigid registration and increasingly for non-rigid registration—is maximization of mutual information (MI) [7], [12]. Since the main stumbling block in multimodality registration is the inability (or unwillingness?) to model the joint probability
between the two imaging sources directly using imaging physics, most recent approaches take the nonparametric density estimation route and model the joint probability using histogramming, Parzen windows and the like. Once the joint probability is available, the mutual information can be computed under the assumption that the density estimator is the true probability distribution. Since this is not true in general, henceforth we refer to these approaches as maximizing the empirical mutual information (EMI). The Bayesian approach taken in this paper has much in common with EMI and with maximizing the true mutual information (both with an added deformation penalty). Instead of computing the empirical mutual information from the joint probability, we compute the conditional probability using a density estimator and use it as a drop-in replacement for the true likelihood, which, following tradition, we assume to be unknown. We then ask the following question: What is the relationship between the minimizer of the Bayesian maximum a posteriori (MAP) objective function and the maximizer of the true mutual information (with an added deformation penalty)? To answer this question, we derive a criterion—a sufficient condition—which, when satisfied, guarantees that the minimizer of the Bayesian MAP objective function also maximizes the true mutual information in the limit as the number of pixels tends to infinity. The criterion imposes an $O(N^{N^c})$ (with $c < 1$) upper bound on the number of allowed configurations of the displacement field which is trivially satisfied in rigid registration provided the parameters are quantized. The case in the general non-rigid setting is less clear. Since the number of allowed configurations is only implicitly restricted via a deformation penalty rather than being limited by fiat, the implications of this upper bound for Bayesian MAP non-rigid registration vis-a-vis the true mutual information remain to be seen.
Since the pioneering contributions of [7], [12], mutual information-based methods have become the de facto standard for multimodality rigid registration. Beginning with [4] and [8], mutual information-based methods have slowly started seeing traction for non-rigid registration as well [9], [2], [3]. The work in [6] was one of the first to point out a connection between maximizing the joint probability and mutual information. Also, the work in [3] uses sample averages in place of the true expectation, which is similar to this work. Recently, attempts have been made to relate likelihood ratios (with joint densities pitted against products of marginals in a hypothesis testing paradigm) and mutual information [5]. We have not found any previous work which begins with the conditional probability estimate and uses it as a likelihood in an overall Bayesian estimation framework.
2 Bayesian and MI-Based Registration: Deriving a Relationship

2.1 Conditional Density Estimation
Assume that we have two images $I^{(1)}$ and $I^{(2)}$, and let $I_i^{(k)}$ be the intensity value of image $I^{(k)}$ at location $i$ [pixel position $(x, y)$], $k = 1, 2$, $i = 1, \ldots, N$, where $N$ is the total number of pixels in each image. (While our development is mainly for 2D, extensions to 3D appear straightforward.) The conditional probability of $I^{(1)}$ given $I^{(2)}$ at location $i$ is denoted by $\Pr(I_i^{(1)}|I_i^{(2)})$. Also, for the remainder of the paper, non-rigid warps will be applied solely to $I^{(2)}$ with $I^{(1)}$ held fixed. Instead of using the original (possibly real-valued) intensity, we will use the following binned intensity value. This is done for a technical reason which will become more obvious as we proceed. The binned intensity for each image is

$$B_i^{(k)} = \operatorname{round}\left[(K^{(k)} - 1) \times \frac{I_i^{(k)}}{\max_{1 \le j \le N}\{I_j^{(k)}\}} + 1\right], \qquad k = 1, 2. \tag{1}$$

From (1), we see that the binned intensity values are integers in $\{1, \ldots, K^{(k)}\}$. We use (2) to compute the conditional probability at location $i$:

$$\Pr(B_i^{(1)}|B_i^{(2)}) = \frac{\Pr(B_i^{(1)}, B_i^{(2)})}{\Pr(B_i^{(2)})}. \tag{2}$$
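As a minimal illustration of (1) and (2), the following sketch (ours, not the authors') bins two images and forms the conditional distribution from a plain joint histogram; the analytical estimator actually used in the paper appears later in (18).

```python
import numpy as np

def bin_intensities(img, K):
    """Equation (1): map intensities to integer bins in {1, ..., K}."""
    return np.rint((K - 1) * img / img.max() + 1).astype(int)

def conditional_table(b1, b2, K1, K2):
    """Equation (2): Pr(B1 = a | B2 = b) from the joint histogram."""
    joint = np.zeros((K1, K2))
    for a, b in zip(b1.ravel(), b2.ravel()):
        joint[a - 1, b - 1] += 1
    joint /= joint.sum()                          # joint distribution Pr(B1, B2)
    marginal = joint.sum(axis=0, keepdims=True)   # marginal Pr(B2)
    return joint / np.maximum(marginal, 1e-12)    # conditional Pr(B1 | B2)
```

Each column of the returned table sums to one wherever the marginal is non-zero.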
2.2 Bayesian Non-rigid Registration
As mentioned previously, we seek to register image $I^{(2)}$ to image $I^{(1)}$. To achieve this, we set up a displacement vector $u_i$ at each location $i$. We denote an entire configuration of $\{u_i, i \in 1, \ldots, N\}$ by $u$ and the set of allowed configurations of $u$ by $\Lambda$. A non-rigid warping of $I^{(2)}$ exists for each configuration $u$ (with the understanding that interpolation may be necessary to generate a valid warped $I^{(2)}$). From Bayes' rule, we get

$$\Pr(u|B^{(1)}, B^{(2)}) = \frac{\Pr(B^{(1)}|B^{(2)}, u)\,\Pr(u)}{\Pr(B^{(1)}|B^{(2)})} \tag{3}$$

from which we get

$$\log \Pr(u|B^{(1)}, B^{(2)}) = \log \Pr(B^{(1)}|B^{(2)}, u) + \log \Pr(u) - \log \Pr(B^{(1)}|B^{(2)}). \tag{4}$$

Since the probability $\Pr(B^{(1)}|B^{(2)}) = \sum_{u \in \Lambda} \Pr(B^{(1)}|B^{(2)}, u)\,\Pr(u)$ is independent of $u$, we have

$$\log \Pr(u|B^{(1)}, B^{(2)}) \propto \log \Pr(B^{(1)}|B^{(2)}, u) + \log \Pr(u). \tag{5}$$

Consequently, from a Bayesian perspective, the non-rigid registration problem becomes

$$\hat{u} = \arg\min_u E(u) = \arg\min_u \big[-\log \Pr(B^{(1)}|B^{(2)}, u) - \log \Pr(u)\big]. \tag{6}$$
We use a standard small deformation smoothness constraint on $u$ which can be written as $-\log \Pr(u) \propto \|Lu\|^2$. We assume conditional independence of $B^{(1)}$ given $B^{(2)}$ over the pixel locations. This assumption is clearly not necessary since the image processing/computer vision literature is replete with correlated
random field image models (ranging from simplistic to baroque) [13]. However, most random field models require the estimation of further parameters which increases the estimation burden. In addition, EMI-based registration methods have traditionally used very simple density estimation procedures such as histogramming [7] and Parzen windows [12] which sets a clear precedent for us. With these simplifications in place, we obtain the following Bayesian maximum a posteriori (MAP) objective function

$$E_{\mathrm{MAP}}(u) = -\frac{1}{N} \sum_{i=1}^{N} \log \Pr(B_i^{(1)}|B_i^{(2)}, u) + \lambda \|Lu\|^2 \tag{7}$$

where we have normalized the negative log-likelihood in the first term of (7). The parameter $\lambda$ is a regularization parameter and $L$ is the regularization operator. In the 2D case, we choose

$$\|Lu\|^2 = \int\!\!\int \left[\left(\frac{\partial^2 u}{\partial x^2}\right)^2 + 2\left(\frac{\partial^2 u}{\partial x \partial y}\right)^2 + \left(\frac{\partial^2 u}{\partial y^2}\right)^2\right] dx\, dy \tag{8}$$

which is a standard thin-plate spline [11] small deformation cost.
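On a pixel grid, (8) can be approximated with second-order central differences; the sketch below is our own illustration for a single displacement component on a grid of spacing h, not code from the paper.

```python
import numpy as np

def bending_energy(u, h=1.0):
    """Discrete thin-plate penalty of (8) for one displacement component u(x, y)."""
    uxx = (u[2:, 1:-1] - 2.0 * u[1:-1, 1:-1] + u[:-2, 1:-1]) / h**2
    uyy = (u[1:-1, 2:] - 2.0 * u[1:-1, 1:-1] + u[1:-1, :-2]) / h**2
    uxy = (u[2:, 2:] - u[2:, :-2] - u[:-2, 2:] + u[:-2, :-2]) / (4.0 * h**2)
    # Riemann-sum approximation of the integral in (8)
    return float(np.sum(uxx**2 + 2.0 * uxy**2 + uyy**2) * h**2)
```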
2.3 Convergence of the Bayesian MAP Minimizer in the General Setting
In this section, we examine the convergence properties of the minimizer of the Bayesian MAP objective function in (7) as the number of pixels $N$ tends to infinity. In the non-rigid setting, the cardinality of $u$ scales linearly with $N$ and, as we shall see, this complicates the convergence proof. We begin by assuming that the chosen density estimator converges to the true density as $N$ tends to infinity. This is usually true of histogram and Parzen window estimators. Denote the true densities of $B^{(1)}$, $B^{(2)}$ and the pair $(B^{(1)}, B^{(2)})$ by $\bar{\Pr}(B^{(1)})$, $\bar{\Pr}(B^{(2)})$ and $\bar{\Pr}(B^{(1)}, B^{(2)})$ respectively, and the corresponding estimated densities by $\hat{\Pr}(B^{(1)})$, $\hat{\Pr}(B^{(2)})$ and $\hat{\Pr}(B^{(1)}, B^{(2)})$. We assume that

$$\lim_{N \to \infty} \hat{\Pr}(B^{(1)}) = \bar{\Pr}(B^{(1)}), \qquad \lim_{N \to \infty} \hat{\Pr}(B^{(2)}) = \bar{\Pr}(B^{(2)}) \tag{9}$$

and

$$\lim_{N \to \infty} \hat{\Pr}(B^{(1)}, B^{(2)}) = \bar{\Pr}(B^{(1)}, B^{(2)}). \tag{10}$$

With this notation in place, we can write the true mutual information as

$$MI(u) = \sum_{a=1}^{K^{(1)}} \sum_{b=1}^{K^{(2)}} \bar{\Pr}(a, b|u) \log \bar{\Pr}(a|b, u) + \text{terms independent of } u \tag{11}$$

and the empirical mutual information as

$$EMI(u) = \sum_{a=1}^{K^{(1)}} \sum_{b=1}^{K^{(2)}} \hat{\Pr}(a, b|u) \log \hat{\Pr}(a|b, u) + \text{terms independent of } u. \tag{12}$$
The objective function we would like to minimize is

$$E_{MI}(u) = -MI(u) + \lambda \|Lu\|^2. \tag{13}$$

Since we use the framework of statistical learning theory [10] throughout this paper, we call $E_{MI}(u)$ the expected risk. This objective function is not computable since the true distribution $\bar{\Pr}$ is unknown and is only approached by our density estimator $\hat{\Pr}$ as $N$ tends to infinity. Instead of minimizing the expected risk, we minimize the Bayesian MAP objective function, a.k.a. the empirical risk, which is the same as (7):

$$E_{\mathrm{MAP}}(u) = -LL(u) + \lambda \|Lu\|^2 \tag{14}$$

where the log-likelihood $LL(u)$ is defined as

$$LL(u) \stackrel{\mathrm{def}}{=} \frac{1}{N} \sum_{i=1}^{N} \log \hat{\Pr}(B_i^{(1)}|B_i^{(2)}, u). \tag{15}$$

In (15), $\hat{\Pr}(B_i^{(1)}|B_i^{(2)}, u)$ is the estimated distribution from $N$ samples. We are interested in the relationship between the minimizers of (14) and (13). Let the minimum of the expected risk in (13) be achieved by the displacement field $u_r$ and the minimum of the empirical risk in (14) be achieved by the displacement field $u_e$. The following question is our main concern in this paper: What is $E_{MI}(u_e) - E_{MI}(u_r)$, or how close is the value of the expected risk attained by the minimizer $u_e$ of the Bayesian MAP objective function (empirical risk) to the maximum value of the true mutual information (expected risk) which is attained by $u_r$? We answer this question by proving the following theorem. It turns out that the theorem requires a specific choice of the density estimator. We pick the estimator used in [13] which is closely related to histogramming. Assuming an i.i.d. density for each pixel, our chosen density estimator is

$$p(I) = \frac{1}{N} \sum_{j=1}^{N} \delta(I - I_j) \tag{16}$$

where $\delta(\cdot)$ is the Dirac delta function with $\int_{-\infty}^{+\infty} \delta(x)\,dx = 1$. Note that $p(I)$ in (16) is a density function and not a discrete distribution as required. Since the Dirac delta function is not differentiable, we switch to a continuous approximation

$$\delta(x) \approx \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{x^2}{2\sigma^2}\right\}. \tag{17}$$

This approximation is increasingly exact as $\sigma \to 0$ and is continuous and differentiable for $\sigma > 0$. Finally, since we use binned intensities $B$, we normalize
the above delta function approximation to get the final density estimator used in this paper. The joint probability distribution is

$$\hat{\Pr}(B_i^{(1)}, B_i^{(2)}) = \frac{\displaystyle\sum_{j=1}^{N} \exp\left\{-\frac{(B_i^{(1)} - B_j^{(1)})^2 + (B_i^{(2)} - B_j^{(2)})^2}{2\sigma^2}\right\}}{\displaystyle\sum_{\beta^{(1)}=1}^{K^{(1)}} \sum_{\beta^{(2)}=1}^{K^{(2)}} \sum_{j=1}^{N} \exp\left\{-\frac{(\beta^{(1)} - B_j^{(1)})^2 + (\beta^{(2)} - B_j^{(2)})^2}{2\sigma^2}\right\}} \tag{18}$$

with similar expressions for the marginals and with the understanding that we are dealing with i.i.d. pixels.
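Read literally, (18) is a Gaussian-smoothed two-dimensional histogram normalized over all bin pairs. The sketch below is our direct (memory-hungry) transcription, meant only to clarify the estimator; variable names are ours.

```python
import numpy as np

def smoothed_joint_table(b1, b2, K1, K2, sigma=0.1):
    """Our transcription of (18): a table whose (a, b) entry estimates the
    joint probability Pr(B1 = a, B2 = b)."""
    s1 = b1.ravel().astype(float)                   # samples B_j^(1)
    s2 = b2.ravel().astype(float)                   # samples B_j^(2)
    beta1 = np.arange(1, K1 + 1)[:, None, None]     # bin values for B^(1)
    beta2 = np.arange(1, K2 + 1)[None, :, None]     # bin values for B^(2)
    # Gaussian weight of every sample j at every (beta1, beta2) pair;
    # this builds a (K1, K2, N) array, so it is only suitable for small images
    w = np.exp(-((beta1 - s1) ** 2 + (beta2 - s2) ** 2) / (2.0 * sigma ** 2))
    table = w.sum(axis=2)          # numerator of (18), up to the common scale
    return table / table.sum()     # denominator of (18): normalize over all bins
```

Evaluating $\hat{\Pr}(B_i^{(1)}, B_i^{(2)})$ then amounts to indexing the table at $(B_i^{(1)}, B_i^{(2)})$.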
As mentioned previously, $\Lambda = \{u\}$ is the set of all possible configurations of $u$. For the non-rigid registration problem with $N$-pixel images, $|\Lambda|$—the cardinality of $\Lambda$—is bounded from above by $N^N$, since each pixel can potentially move to any other location. Since the upper bound for $|\Lambda|$ scales with $N$, we let $|\Lambda|$ be a function of $N$:

$$|\Lambda| = g(N). \tag{19}$$

Theorem 1: With the probability distributions estimated as in (18), and with the total number of configurations of $u$ as in (19), the inequality

$$\Pr\left(\sup_{u \in \Lambda} \left[\sum_{ab} \bar{\Pr}(a, b|u)[-\log \hat{\Pr}(a|b, u)] + \frac{1}{N} \sum_i \log \hat{\Pr}(B_i^{(1)}|B_i^{(2)}, u)\right] > \epsilon\right) \le g(N)\, e^{-\frac{2\epsilon^2 N}{\tau(K^{(1)}, K^{(2)}, \sigma)^2}} \tag{20}$$

is valid, where $\tau(K^{(1)}, K^{(2)}, \sigma) \stackrel{\mathrm{def}}{=} \frac{(K^{(1)})^2 + (K^{(2)})^2}{2\sigma^2} + \log(K^{(1)} K^{(2)})$ and $\epsilon$ is any given positive small number.

Proof: An abbreviated proof of Theorem 1 follows. The basic idea is to calculate lower and upper bounds of the estimated distribution $-\log \hat{\Pr}(B_i^{(1)}|B_i^{(2)}, u)$ and then apply Hoeffding's inequality [10] for real-valued bounded functions to get
$$\Pr\left(\sup_{u \in \Lambda} \left[\sum_{ab} \bar{\Pr}(a, b|u)[-\log \hat{\Pr}(a|b, u)] + \frac{1}{N} \sum_i \log \hat{\Pr}(B_i^{(1)}|B_i^{(2)}, u)\right] > \epsilon\right) \le \sum_{u \in \Lambda} \Pr\left[\sum_{ab} \bar{\Pr}(a, b|u)[-\log \hat{\Pr}(a|b, u)] + \frac{1}{N} \sum_i \log \hat{\Pr}(B_i^{(1)}|B_i^{(2)}, u) > \epsilon\right] \le g(N)\, e^{-\frac{2\epsilon^2 N}{\tau(K^{(1)}, K^{(2)}, \sigma)^2}} \tag{21}$$
which is the desired result. From (21), we obtain with probability $1 - \eta$ (where $\eta \stackrel{\mathrm{def}}{=} \exp\left\{-\frac{2\epsilon^2 N}{\left[\frac{(K^{(1)})^2 + (K^{(2)})^2}{2\sigma^2} + \log(K^{(1)} K^{(2)})\right]^2}\right\}$) the inequality

$$\sum_{a=1}^{K^{(1)}} \sum_{b=1}^{K^{(2)}} \bar{\Pr}(a, b|u_e)[-\log \hat{\Pr}(a|b, u_e)] - \left[-\frac{1}{N} \sum_{i=1}^{N} \log \hat{\Pr}(B_i^{(1)}|B_i^{(2)}, u_e)\right] \le \left[\frac{(K^{(1)})^2 + (K^{(2)})^2}{2\sigma^2} + \log(K^{(1)} K^{(2)})\right] \sqrt{\frac{\log g(N) - \log \eta}{2N}} \tag{22}$$
Let

$$E_{\mathrm{EMAP}}(u) \stackrel{\mathrm{def}}{=} \sum_{a=1}^{K^{(1)}} \sum_{b=1}^{K^{(2)}} \bar{\Pr}(a, b|u)[-\log \hat{\Pr}(a|b, u)] + \lambda \|Lu\|^2. \tag{23}$$
Adding and subtracting $\lambda \|Lu_e\|^2$ to both terms on the left side of (22), we get

$$E_{\mathrm{EMAP}}(u_e) - E_{\mathrm{MAP}}(u_e) \le \tau(K^{(1)}, K^{(2)}, \sigma) \sqrt{\frac{\log g(N) - \log \eta}{2N}}. \tag{24}$$

Once again, from Hoeffding's inequality, we have

$$\sum_{a=1}^{K^{(1)}} \sum_{b=1}^{K^{(2)}} \bar{\Pr}(a, b|u_r)[-\log \hat{\Pr}(a|b, u_r)] \ge \left[-\frac{1}{N} \sum_{i=1}^{N} \log \hat{\Pr}(B_i^{(1)}|B_i^{(2)}, u_r)\right] - \left[\frac{(K^{(1)})^2 + (K^{(2)})^2}{2\sigma^2} + \log(K^{(1)} K^{(2)})\right] \sqrt{\frac{-\log \eta}{2N}}. \tag{25}$$

Adding $\lambda \|Lu_r\|^2$ to both sides of (25) we can state more simply that

$$E_{\mathrm{EMAP}}(u_r) - E_{\mathrm{MAP}}(u_r) \ge -\left[\frac{(K^{(1)})^2 + (K^{(2)})^2}{2\sigma^2} + \log(K^{(1)} K^{(2)})\right] \sqrt{\frac{-\log \eta}{2N}}. \tag{26}$$

Since $u_e$ is the minimizer of the empirical risk (Bayesian MAP objective function), $E_{\mathrm{MAP}}(u_r) \ge E_{\mathrm{MAP}}(u_e)$ and hence

$$0 \le E_{\mathrm{EMAP}}(u_e) - E_{\mathrm{EMAP}}(u_r) \le [E_{\mathrm{EMAP}}(u_e) - E_{\mathrm{MAP}}(u_e)] + [E_{\mathrm{MAP}}(u_r) - E_{\mathrm{EMAP}}(u_r)]. \tag{27}$$

Substituting (24) and (26) in (27), we see that

$$0 \le E_{\mathrm{EMAP}}(u_e) - E_{\mathrm{EMAP}}(u_r) \le \tau(K^{(1)}, K^{(2)}, \sigma) \left[\sqrt{\frac{\log g(N) - \log \eta}{2N}} + \sqrt{\frac{-\log \eta}{2N}}\right]. \tag{28}$$

If we assume that

$$\lim_{N \to \infty} \frac{\log g(N)}{N} = 0, \tag{29}$$

then we have

$$\lim_{N \to \infty} |E_{\mathrm{EMAP}}(u_e) - E_{\mathrm{EMAP}}(u_r)| = 0. \tag{30}$$
Using the above, we determine $E_{MI}(u_e) - E_{MI}(u_r)$:

$$E_{MI}(u_e) - E_{MI}(u_r) = [E_{MI}(u_e) - E_{\mathrm{EMAP}}(u_e)] + [E_{\mathrm{EMAP}}(u_e) - E_{\mathrm{EMAP}}(u_r)] + [E_{\mathrm{EMAP}}(u_r) - E_{MI}(u_r)] \tag{31}$$

where

$$E_{MI}(u_e) - E_{\mathrm{EMAP}}(u_e) = \sum_{a=1}^{K^{(1)}} \sum_{b=1}^{K^{(2)}} \bar{\Pr}(a, b|u_e) \log\left[\frac{\hat{\Pr}(a|b, u_e)}{\bar{\Pr}(a|b, u_e)}\right] \tag{32}$$

and

$$E_{MI}(u_r) - E_{\mathrm{EMAP}}(u_r) = \sum_{a=1}^{K^{(1)}} \sum_{b=1}^{K^{(2)}} \bar{\Pr}(a, b|u_r) \log\left[\frac{\hat{\Pr}(a|b, u_r)}{\bar{\Pr}(a|b, u_r)}\right]. \tag{33}$$

From our assumptions (9) and (10), we know that

$$\lim_{N \to \infty} \hat{\Pr}(a|b, u) = \bar{\Pr}(a|b, u). \tag{34}$$

Hence

$$\lim_{N \to \infty} |E_{MI}(u_e) - E_{\mathrm{EMAP}}(u_e)| = 0, \qquad \lim_{N \to \infty} |E_{MI}(u_r) - E_{\mathrm{EMAP}}(u_r)| = 0. \tag{35}$$

With (30), (31) and (35) we have

$$\lim_{N \to \infty} |E_{MI}(u_e) - E_{MI}(u_r)| = 0. \tag{36}$$
We have shown that the minimizers of the expected risk (true mutual information) and the empirical risk (Bayesian MAP objective function) coincide as the number of samples approaches infinity provided that a sufficient condition (29) is met. It is time that we took a closer look at $g(N)$, which is the cardinality of the number of allowed configurations of $u$. In non-rigid registration, the upper bound of $g(N)$ is $N^N$ if we allow each pixel to move to any other location. In sharp contrast, in rigid registration, the upper bound of $g(N)$ is $C^6$ in 2D where we have assumed 6 free parameters (affine) with each parameter quantized into $C$ (independent of $N$) bins. It should be obvious that if $g(N) = N^N$, we cannot conclude that $\lim_{N \to \infty} \frac{\log g(N)}{N} = 0$ and therefore that $\lim_{N \to \infty} |E_{MI}(u_e) - E_{MI}(u_r)| = 0$. But if we impose an extremely minor restriction on $u$ to the effect that $g(N) = N^{N^c}$, where $0 \le c < 1$, then we have $\lim_{N \to \infty} \frac{\log g(N)}{N} = 0$ and the proof goes through, resulting in $\lim_{N \to \infty} |E_{MI}(u_e) - E_{MI}(u_r)| = 0$. Clearly, from this proof's viewpoint, allowing every pixel to potentially visit any other location would not allow us to state that the Bayesian and the true MI measure minimizers coincide. To satisfy this criterion, we would need to place a small restriction on the set of allowed configurations.
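To spell out the sufficiency check: with $g(N) = N^{N^c}$ and $0 \le c < 1$,

$$\frac{\log g(N)}{N} = \frac{N^c \log N}{N} = \frac{\log N}{N^{1-c}} \longrightarrow 0 \quad \text{as } N \to \infty,$$

whereas $g(N) = N^N$ gives $\log g(N)/N = \log N \to \infty$, so (29) fails.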
3 Experiments and Results
In this section, we compare four different multimodality registration methods on synthetic data and simulated T1 and T2 2D MR images. The four methods used are i) Bayesian MAP, ii) empirical mutual information, iii) empirical joint probability and iv) empirical joint entropy with a small deformation penalty added to each one. The objective functions corresponding to the four methods are
1. Bayesian MAP: $E_{\mathrm{MAP}}(u) = -\frac{1}{N} \sum_{i=1}^{N} \log \hat{\Pr}(B_i^{(1)}|B_i^{(2)}, u) + \lambda \|Lu\|^2$.
2. Mutual information: $E_{\mathrm{EMI}}(u) = -\sum_{a=1}^{K^{(1)}} \sum_{b=1}^{K^{(2)}} \hat{\Pr}(a, b|u) \log \hat{\Pr}(a|b, u) + \lambda \|Lu\|^2$.
3. Joint probability: $E_{\mathrm{EJP}}(u) = -\frac{1}{N} \sum_{i=1}^{N} \log \hat{\Pr}(B_i^{(1)}, B_i^{(2)}|u) + \lambda \|Lu\|^2$.
4. Joint entropy: $E_{\mathrm{EJE}}(u) = -\sum_{a=1}^{K^{(1)}} \sum_{b=1}^{K^{(2)}} \hat{\Pr}(a, b|u) \log \hat{\Pr}(a, b|u) + \lambda \|Lu\|^2$.

For each method, we use essentially the same optimization approach:

Multimodality Non-rigid Registration
  Set $n = 0$. Perform Gaussian smoothing (with $\sigma_{\mathrm{Smooth}}$) on the two images.
  Initialize $u$ to zero. This is equivalent to an identity transformation.
  Begin A: Do A until $|\Delta E| \le \delta$ or $n \ge T$.
    $u^{(n+1)} = u^{(n)} - \alpha^{(n+1)} \frac{\Delta E}{\Delta u}$. Choose $\alpha^{(n+1)} > 0$ such that $\Delta E^{(n+1)} < 0$.
    Perform bilinear interpolation.
    $n \leftarrow n + 1$
  End A

In the above, $T$ is an iteration cap, $\delta$ is a convergence threshold and $\alpha$ is a standard step-size parameter. We use numerical differentiation (with step size 1) for computing $\frac{\Delta E}{\Delta u}$. It should be understood that the objective function $E$ in the algorithm is a placeholder for any of the four methods mentioned above. In all the experiments below, $T = 20$, $\delta = 0.01$, $\sigma = 0.1$, $\sigma_{\mathrm{Smooth}} = 0.5$, $K^{(1)} = K^{(2)} = 8$, and $\lambda = 0.05$.
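A skeletal Python transcription of this descent loop follows; it is our sketch, not the authors' code. The objective E is any callable implementing one of the four objectives above, with warping and bilinear interpolation assumed to happen inside E.

```python
import numpy as np

def descend(E, u0, alpha0=1.0, delta=0.01, T=20, eps=1.0):
    """Gradient descent with a forward-difference gradient (step eps) and a
    step size alpha halved until the objective actually decreases."""
    u = u0.astype(float)
    E_prev = E(u)
    for n in range(T):
        grad = np.zeros_like(u)
        for idx in np.ndindex(u.shape):              # numerical differentiation
            du = u.copy()
            du[idx] += eps
            grad[idx] = (E(du) - E_prev) / eps
        alpha = alpha0
        while alpha > 1e-8 and E(u - alpha * grad) >= E_prev:
            alpha *= 0.5                             # enforce Delta E < 0
        u = u - alpha * grad
        E_new = E(u)
        if abs(E_new - E_prev) <= delta:             # |Delta E| <= delta
            break
        E_prev = E_new
    return u
```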
3.1 Experiments on Simple Multimodality Shape Images
We first create two images as shown in Figure 1. The circle and square shapes are swapped and the intensities of each shape differ between the two images.
Fig. 1. Left and middle: Two simple shape images. Right: Final registration result.
The Bayesian MAP approach was used to register the right image to the left image. A small amount of isotropic Gaussian smoothing was performed prior to registration. At each iteration, we also observe the values of the empirical mutual information (EMI), empirical joint probability (EJP) and empirical joint entropy (EJE). From Figure 2, we see that the likelihood and mutual information plots are very similar, whereas the joint probability and negative joint entropy actually decrease, which is counterintuitive. (We also observe that the likelihood does not increase monotonically, which is perhaps due to a) bilinear interpolation factors and b) the deformation penalty.) The algorithm was able to convert the circle and the square to approximately a square and a circle in about 6 iterations.
Fig. 2. The changes in i) log-likelihood, ii) mutual information, iii) joint probability and iv) negative joint entropy when minimizing the Bayesian MAP objective function.
3.2 Experiments on Simulated T1 and T2 2D MR Images
In our next experiment, we chose two frames generated by the powerful Brainweb MR simulator [1]. The 2D T1 and T2 axial images are shown in Figure 3. We used a 3% noise level for the simulation which uses the ICBM protocol. The advantage of using the Brainweb simulator is that the ground truth is known. Any non-rigid deformation applied to a T1 image can be simultaneously applied to its T2 counterpart.
Fig. 3. Leftmost: transverse T2 image. Left middle: transverse T1 image. Middle: Deformed T1. Right middle: Intensity difference between original T1 and deformed T1 prior to registration. Right: Unwarped final T1 image
Fig. 4. The changes in i) log-likelihood, ii) mutual information, iii) joint probability and iv) negative joint entropy when minimizing the Bayesian MAP objective function.
In our experiments, we used a Gaussian radial basis function (GRBF) spline as the non-rigid parameterization. The deformed T1 image and the intensity difference between the original T1 and the deformed T1 image are shown in Figure 3. The registration algorithm attempts to register the deformed T1 image to the original T2 image. The deformed T1 image is gradually unwarped during registration. During the execution of the Bayesian MAP algorithm, we also observe the values of the empirical mutual information (EMI), empirical joint probability (EJP) and empirical joint entropy (EJE). The results are shown in Figure 4. In this case, all four curves mostly show an increase which is somewhat different from the behavior in Figure 2. Once again, the MAP and EMI curves are in lockstep as are EJP and EJE. As a comparison between the different algorithms, we executed all four approaches on the same data. The difference images (between original T1 and unwarped T1) shown in Figure 5 clearly indicate that the MAP and EMI algorithms are superior to the EJP and EJE algorithms.
Fig. 5. Difference images between original T1 and unwarped T1. Left: MAP. Left middle: EMI. Right middle: EJP. Right: EJE. The SSDs were 609 before registration, and 20.38 (MAP), 20.74 (EMI), 52.49 (EJP) and 52.47 (EJE) after registration.
We performed another experiment on the pair of T1 and T2 simulated 2D MR images. This time, the deformation on the original T1 image was much larger. The original and deformed T1 images are shown on the left in Figure 6. The results of executing the Bayesian MAP algorithm on the deformed T1 and T2 images are shown on the right in Figure 6. Clearly, despite errors near high gradient boundaries (which are mostly caused by interpolation artifacts), the MAP algorithm is up to the task of recovering a reasonably large global deformation.
4 Conclusion
The main contributions of this paper are i) a Bayesian approach to multimodality non-rigid registration, ii) use of an analytical, smoothed histogram-like density, iii) the derivation of an upper bound for the number of configurations of the displacement field which guarantees that the minimizer of the MAP objective approaches that of the true mutual information as the number of pixels tends to infinity and iv) experimental confirmation of the Bayesian approach and its close relationship to the empirical mutual information (EMI). We think this is the first
Fig. 6. Leftmost: Original T1 image, Left middle: deformed T1 image. Middle: Intensity difference between original T1 and deformed T1 before registration. Right middle: Intensity difference between original T1 and unwarped T1 after registration. Right: Unwarped final T1 image. The before and after SSDs were 647 and 59 respectively.
time that such a quantitative criterion has been derived to help assess the validity of non-parametric density estimation approaches to multimodality non-rigid registration. The criterion—a sufficient condition—requires that the number of allowed configurations of the displacement field be restricted to $O(N^{N^c})$. The criterion is easily satisfied in the case of quantized, affine parameter configurations and is usually satisfied in practice in the general non-rigid setting. However, more work is needed to fully understand the impact of this criterion on Bayesian MAP non-rigid registration. While we have elected to use a nonparametric (histogram-like) drop-in replacement for the true likelihood, there is no reason why parametric alternatives (such as mixture models [6], Gibbs-Markov models [13] and the like) cannot also be considered. The principal drawback of using a parametric density is that additional parameters have to be estimated. However, if the additional parameters are $O(1)$, then the added estimation burden does not appear to be formidable. These alternatives present appealing avenues for future research.
Acknowledgements. We acknowledge support from NSF IIS 0196457. This paper was inspired by a conversation with Sarang Joshi at IPMI 2001, Davis, CA.
References

1. D. L. Collins, A. P. Zijdenbos, V. Kollokian, J. G. Sled, N. J. Kabani, C. J. Holmes, and A. C. Evans. Design and construction of a realistic digital brain phantom. IEEE Trans. Med. Imag., 17(3):463–468, 1998.
2. T. Gaens, F. Maes, D. Vandermeulen, and P. Suetens. Non-rigid multimodal image registration using mutual information. In W. Wells, A. Colchester, and S. Delp, editors, Medical Image Computing and Computer-Assisted Intervention (MICCAI), pages 1099–1106. Springer, 1998.
3. N. Hata, T. Dohi, S. Warfield, W. Wells, R. Kikinis, and F. A. Jolesz. Multimodality deformable registration of pre- and intraoperative images for MRI-guided brain surgery. In W. Wells, A. Colchester, and S. Delp, editors, Medical Image Computing and Computer-Assisted Intervention (MICCAI), pages 1067–1074. 1998.
4. B. Kim, J. L. Boes, K. A. Frey, and C. R. Meyer. Mutual information for automated unwarping of rat brain autoradiographs. NeuroImage, 5:31–40, 1997.
5. J. Kim, J. W. Fisher, A. Tsai, C. Wible, A. S. Willsky, and W. Wells. Incorporating spatial priors into an information theoretic approach for fMRI data analysis. In W. Wells, A. Colchester, and S. Delp, editors, Medical Image Computing and Computer-Assisted Intervention (MICCAI), pages 62–71. Springer, 2000.
6. M. E. Leventon and W. E. L. Grimson. Multi-modal volume registration using joint intensity distributions. In W. Wells, A. Colchester, and S. Delp, editors, Medical Image Computing and Computer-Assisted Intervention (MICCAI), pages 1057–1066. Springer, 1998.
7. F. Maes, A. Collignon, D. Vandermeulen, G. Marchal, and P. Suetens. Multimodality image registration by maximization of mutual information. IEEE Trans. Med. Imag., 16(2):187–198, 1997.
8. J. B. A. Maintz, H. W. Meijering, and M. A. Viergever. General multimodal elastic registration based on mutual information. In Medical Imaging—Image Processing (SPIE 3338), volume 3338, pages 144–154. SPIE Press, 1998.
9. D. Rueckert, L. I. Sonoda, C. Hayes, D. L. G. Hill, M. O. Leach, and D. J. Hawkes. Non-rigid registration using free-form deformations: Application to breast MR images. IEEE Trans. Med. Imag., 18(8):712–721, 1999.
10. V. N. Vapnik. Statistical learning theory. John Wiley, New York, 1998.
11. G. Wahba. Spline models for observational data. SIAM, Philadelphia, PA, 1990.
12. W. Wells III, P. Viola, H. Atsumi, S. Nakajima, and R. Kikinis. Multi-modal volume registration by maximization of mutual information. Medical Image Analysis, 1(1):35–52, 1996.
13. S. C. Zhu, Y. N. Wu, and D. B. Mumford. Minimax entropy principle and its applications to texture modeling. Neural Computation, 9(8):1627–1660, 1997.
Spatiotemporal Localization of Significant Activation in MEG Using Permutation Tests¹

Dimitrios Pantazis¹, Thomas E. Nichols², Sylvain Baillet³, and Richard M. Leahy¹

¹ Signal & Image Processing Institute, University of Southern California, Los Angeles, CA 90089-2564, USA, {pantazis, leahy}@sipi.usc.edu
² Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109-2029, USA, [email protected]
³ Neurosciences Cognitives & Imagerie Cerebrale, CNRS UPR640-LENA, Hospital de la Salpetriere, Paris, France, [email protected]
Abstract. We describe the use of non-parametric permutation tests to detect activation in cortically-constrained maps of current density computed from MEG data. The methods are applicable to any inverse imaging method that maps event-related MEG to a coregistered cortical surface. To determine an appropriate threshold to apply to statistics computed from these maps, it is important to control for the multiple testing problem associated with testing tens of thousands of hypotheses (one per surface element). By randomly permuting pre- and post-stimulus data from the collection of individual epochs in an event related study, we develop thresholds that control the familywise (type 1) error rate. These thresholds are based on the distribution of the maximum intensity, which implicitly accounts for spatial and temporal correlation in the cortical maps. We demonstrate the method in application to simulated data and experimental data from a somatosensory evoked response study.
1 Introduction

Cortically constrained spatio-temporal maps of neural activity can be computed from event related MEG data using linear inverse methods to estimate source current densities within pyramidal cells in the cortex. One of the most commonly used approaches extracts a representation of the cerebral cortex from a coregistered MR image, tessellates the result, and solves a linear inverse problem for elemental sources located at each of the vertices of the tessellated surface. The problem is hugely underdetermined, so that regularization methods are typically used [1,2]. The resulting current density maps (CDMs) are in general low resolution; interpretation is further confounded by the presence of additive noise exhibiting strong spatial correlation. As with fMRI
This work was supported by grant R01 EB002010 from the National Institute of Biomedical Imaging and Bioengineering.
images, objective assessment of CDMs requires a principled approach to identifying regions of significant activation. Dale et al. [2] normalize the CDMs using an estimate of the background noise variance at each cortical element. These normalized images follow a t-distribution under the null hypothesis of Gaussian background noise. Thresholding the images will produce maps of significant activation. However, testing at each surface element gives rise to the multiple comparisons problem: if significance is set at the p = .05 level, for example, then statistically 5% of the surface elements will give false positives. We wish to determine a threshold to achieve the desired family-wise error rate (FWER). The simplest solution is the Bonferroni correction, which scales the p-value by the number of tests performed (i.e. the number of surface elements). This is of little practical value in neuroimaging experiments since, due to strong spatial dependence, it is very conservative. The most widely used methods in analysis of neuroimaging data use random field theory and make inferences based on the maximum distribution. The maximum plays an essential role in controlling FWER. Consider a statistic image $T_i$, thresholded at $u$; if the null hypothesis is true everywhere, then the FWER is
$$P(\mathrm{FWE}) = P\big(\cup_i \{T_i > u\}\big) = P\big(\max_i T_i > u\big) = 1 - F_{\max T}(u). \tag{1}$$

That is, a familywise error occurs when one or more $T_i$ are above the threshold $u$, but this can only occur when the maximum of the $T_i$ is above $u$. Hence, to control the FWER at level $\alpha$, one needs to find the $(1-\alpha)100$th percentile of the maximum distribution,

$$u = F_{\max T}^{-1}(1 - \alpha). \tag{2}$$
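In code, (2) is simply an empirical quantile of the permutation maxima; a two-line sketch of ours:

```python
import numpy as np

def fwer_threshold(max_stats, alpha=0.05):
    """Equation (2): the (1 - alpha) quantile of the empirical max distribution."""
    return float(np.quantile(np.asarray(max_stats), 1.0 - alpha))
```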
The random field methods proceed by fitting a general linear model to the data. The parameters of this model are estimated and then contrasted (using t-tests, F-tests, paired t-tests, ANOVA or others) to produce a statistic image. In this framework, a closed form approximation for the tail of $F_{\max T}$ is available, based on the expected value of the Euler characteristic of the thresholded image. The parametric framework is valid for PET and smoothed fMRI data. However, the assumptions for the p-value of local maxima and the size of the suprathreshold clusters do not hold directly for MEG data because of spatially variant noise correlation on the cortical surface. One solution to this problem is to use a transformation that warps or flattens the image into a space where the data are isotropic [3]. This approach can be applied directly to MEG, for example to determine an appropriate threshold for the noise-weighted maps described in Dale et al. [2]. An application of this framework to MEG is described in [4] but the method is specifically tailored to beamforming methods rather than the linear inverse methods of interest here. These parametric random field methods require the usual parametric assumption of normality at each spatial location, in addition to random field assumptions of a point spread function with two derivatives at the origin, sufficient smoothness to justify the application of the continuous random field theory, and a sufficiently high threshold for the asymptotic results to be accurate.
Non-parametric methods rely on minimal assumptions, deal with the multiple comparisons problem and can be applied when the assumptions of the parametric approach are untenable. They have also outperformed the parametric approaches in the case of low degrees-of-freedom t images [5]. Non-parametric permutation tests have been applied in a range of functional imaging applications [5,6,7,8]. Permutation tests are attractive for the application to MEG data since they are exact, distribution free and adaptive to underlying correlation patterns in the data. Further, they are conceptually straightforward and, with recent improvements in desktop computing power, are computationally tractable. Blair et al. [6] describe an application of this approach to analysis of EEG data as recorded at an array of electrodes; in contrast, the work presented here is applied to inverse solutions in which the maps are estimates of cortical activation.
2 Method

Our goal is to detect spatial and temporal regions of significant activity in MEG-based cortical maps while controlling for the risk of any false positives. We find global or local thresholds on statistics computed from the cortical maps that control the FWER. The method is introduced in a general framework to demonstrate its flexibility and adaptability to different experiments; we then describe the specific tests used in our experimental studies.

2.1 Permutation Approach

We assume that MEG data are collected as a set of $N$ stimulus-locked event-related epochs (one per stimulus repetition), each consisting of a pre- and post-stim interval of equal length. Each epoch consists of an array of data representing the measured magnetic field at each sensor as a function of time. A cortical map is computed by averaging over all $N$ epochs and applying a linear inverse method to produce an estimate of the temporal activity at each surface element in cortex. Our goal is to detect the locations and times at which activity during the post-stim experiment period differs significantly from the background pre-stim period. The method, as described below, can be readily extended to address more complex questions involving multiple factors. To apply the permutation test, we must find permutations of the data that satisfy an exchangeability condition, i.e. permutations that leave the distribution of the statistic of interest unaltered under the null hypothesis. Permutations in space and time are not useful for these applications because of spatio-temporal dependence of the noise. Instead we rely on the exchangeability of the pre- and post-stimulus data for each epoch. Given $N$ original epochs, we can create $M \le 2^N$ permutation samples, each consisting of $N$ new epochs. Since the inverse operator is linear, we can equivalently apply the inverse before or after averaging the permuted epochs. Consequently, we
describe the permutation tests in terms of permutation of the images formed from individual epochs, although in practice it is more computationally efficient to average the permuted data before applying the inverse operator. Our modeling proceeds by successively summarizing the information contained in the current density maps as illustrated in Figure 1. Current density maps for each epoch are denoted $Y_{ijk}(t)$ with $t$ the time index, $i$ the spatial index, $j$ the permutation index, and $k$ the epoch index, with $j = 0$ representing the original non-permuted data. We first summarize the data over epochs, finding the average effect $E_{ij}(t)$ of all epochs at each time point and spatial location. Then we summarize the data over time, creating an image $T_{ij}$ of the effect of interest. Finally we summarize over space to gauge the overall effect of the experiment, $S_j$:
$$E_{ij}(t) = \text{summary statistic}_k \{Y_{ijk}(t)\} \tag{2}$$

$$T_{ij} = \text{summary statistic}_t \{E_{ij}(t)\} \tag{3}$$

$$S_j = \text{summary statistic}_i \{T_{ij}\} \tag{4}$$

Appropriate summary statistics include mean, mean absolute value, mean squared value, and absolute maximum value. Due to the nonparametric nature of the test, any test statistic can be used. However, as noted above, the maximum statistic captures the necessary information to control the FWER. Put another way, using the maximum statistic, we can return and make inferences in this dimension using the empirical maximum distribution.
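These reductions map directly onto array operations. Below is a hedged numpy sketch of ours for method-1-style statistics (mean over epochs, maximum absolute value over time, maximum over space); the noise normalization of Sect. 2.2 is omitted for brevity, and exchangeability is realized by randomly swapping each epoch's pre- and post-stimulus halves.

```python
import numpy as np

def perm_max_distribution(pre, post, M=1000, seed=0):
    """pre, post: arrays of shape (epochs, sources, times).
    Returns M samples of the space-time maximum statistic S_j."""
    rng = np.random.default_rng(seed)
    S = np.empty(M)
    for j in range(M):
        swap = rng.integers(0, 2, size=pre.shape[0]).astype(bool)
        # exchange pre- and post-stimulus data for the selected epochs
        post_j = np.where(swap[:, None, None], pre, post)
        E = post_j.mean(axis=0)       # summarize epochs, equ (2)
        T = np.abs(E).max(axis=1)     # summarize time, equ (3)
        S[j] = T.max()                # summarize space, equ (4)
    return S
```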
Fig. 1. Illustration of the summarizing procedure used to construct empirical distributions from the permuted data: $M$ permutation samples $Y_{ijk}(t)$ are produced from the original data $Y_{i0k}(t)$. The data are then summarized successively in epochs, time and space according to equ (2)-(4) respectively, to produce $S_j$. The empirical distribution of $S_j$ can be used to draw statistical inferences for the original data.
If we were interested in making inferences among epochs, such as looking for habituation effects, we would have to use a maximum statistic in equ (2). However, in our case we will assume no structured experimental variation among epochs and use the average, which is also consistent with the standard procedure for analyzing event related MEG data. For the time-summarizing statistic in equ (3) we use the maximum over all post-stimulus data. This allows us to maintain resolution in the time domain and later check the temporal activation profile of the sources. Finally, using a maximum summarizing statistic in equ (4) to compute $S_j$ allows us to retain spatial as well as temporal resolution. After summarizing all data with respect to epochs, time and space, we can use the distribution of the $S_j$ statistic to define a global threshold that controls the FWER, i.e. if we pick a global threshold with p-value equal to 0.05 with respect to the distribution of $S_j$, we have a 5% probability of one or more false positives throughout the entire spatio-temporal data set. We can then use this value to threshold the image at each point in time at each surface element to determine those regions for which we can reject the null hypothesis and hence detect significant activation.

2.2 Achieving Uniform Sensitivity

Permutation tests are always valid given the assumption of exchangeability under the null hypothesis. However, if the null distribution varies across space or time, there will be uneven sensitivity in that dimension. For example, with a maximum statistic over space, surface elements for which background noise variance is high will contribute more to the maximum distribution than others with low noise; the impact is a relatively generous threshold for the high-noise-variance locations and a stringent threshold for the other locations. We can overcome this problem by including some form of normalization in the summary statistic. Thus, before computing the maximum statistic $T_{ij}$, we first normalize the data at each surface element by the sample standard deviation at that element computed from the pre-stim data (this is equivalent to the noise normalization performed in [2], except we measure our noise in the surface element domain instead of the detector domain). We assume homogeneous variance over time, so do not perform any normalization in this dimension. Under the assumption that the data are Gaussian, the noise normalization converts the data, under the null hypothesis, to a t-distribution. In this case, the permutation test will yield uniform spatial sensitivity. However, if the data are non-Gaussian, then simply normalizing by the standard deviation may not be sufficient for this purpose (Fig. 2). An alternative is to normalize based on the p-values themselves, i.e. at each spatial location we compute the empirical distribution across permutations and then replace the statistic $T_{ij}$ for each permutation sample with its p-value. The p-value at surface element $i$ for permutation $j$, called $T_{ij}^p$, is defined by:
$$T_{ij}^p = p_i(T_{ij}), \qquad p_i(t) = \frac{1}{M} \sum_j H(T_{ij} - t), \qquad H(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases} \tag{5}$$
where $p_i(t)$ is the p-value function for surface element $i$, the proportion of permutations as large or larger than $t$. For each $i$, $\{T_{ij}^p\}$ has a uniform distribution under the null hypothesis, and hence is normalized. We next compute the summary statistic as the distribution of the minimum of $T_{ij}^p$ for each surface element over the entire cortical surface (the minimum p-value plays the same role as the maximum statistic in FWER control). From this we compute the threshold on the $T_{ij}^p$ values to achieve the desired FWER, and from this compute the corresponding threshold to apply at each individual spatial location.
Fig. 2. Illustration of the impact of heterogeneous voxel null distributions on a 5% FWER threshold. Shown are null distributions of 5 surface elements in three cases: all sharing the same normal distribution, each having different variances, and each having different skewed distributions. The first case (left) shows that with homogeneous nulls the false positive rate at each surface element is homogeneous. The second case (middle) demonstrates the variable false positive rate when test statistics are not normalized (e.g. raw CDM values, $E_{ij}(t)$). The last case demonstrates the impact of non-Gaussianity, even when variance is normalized, and motivates the use of p-values to normalize $T_{ij}$. Note that in all cases FWER is controlled at 5%.
One practical problem with this approach is the discreteness of the p-values $T_{ij}^p$, which in turn causes $S_j$ to be discrete. If many $T_{ij}^p$ have the smallest possible value ($1/M$), then small $\alpha$-levels for $S_j$ may be unattainable. For example, one Monte Carlo experiment with $M = 1{,}000$ found that 30% of the permutations had a minimum $T_{ij}^p$ of value 0.001 and hence the smallest possible FWER threshold corresponded to $\alpha = 0.3$. Therefore this p-value normalization approach, while it makes no assumptions on the differing shapes of the local distributions, requires many permutations.
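The p-value normalization itself is a per-source rank computation; a small sketch of ours, with T holding one row per permutation and one column per source:

```python
import numpy as np

def pvalues(T):
    """equ (5): replace each statistic by the proportion of permutations in its
    own column that are as large or larger. T has shape (M, n_sources)."""
    M, n = T.shape
    P = np.empty_like(T, dtype=float)
    for i in range(n):
        col = T[:, i]
        P[:, i] = [(col >= t).mean() for t in col]
    return P

def min_p_distribution(P):
    """Space summary for method 2: the minimum p-value over sources."""
    return P.min(axis=1)
```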
2.3 Two Detection Methods

We have described above the procedure we use for generating the summary statistics from which we compute thresholds to detect significant activation, as well as the available normalization procedures. The two methods we will examine further in our simulations are summarized in Table 1. Both methods use the mean statistic to summarize epochs, as well as maximal statistics to summarize in time and space. However, method 1 does not normalize the time-summarizing $T_{ij}$, while method 2 transforms $T_{ij}$ into p-values, essentially normalizing $T_{ij}$ with the local permutation distribution. They subsequently use the maximum (method 1) or minimum (method 2) to summarize space. We can then use the empirical distribution of the space-summarizing statistic $S_j$ to define a global threshold $S^{th}$ that achieves a 5% FWER.

Table 1. Summary statistics and normalization schemes for the detection methods
|          | Epoch-Summarizing $E_{ij}(t)$      | Normalized $E_{ij}^n(t)$ | Time-Summarizing $T_{ij}$     | Normalized $T_{ij}^n$ | Space-Summarizing $S_j$  |
| Method 1 | $\mathrm{mean}_k\{Y_{ijk}(t)\}$    | $E_{ij}(t)/s_{ij}$       | $\max_t\{|E_{ij}^n(t)|\}$     | $T_{ij}$              | $\max_i\{T_{ij}^n\}$     |
| Method 2 | $\mathrm{mean}_k\{Y_{ijk}(t)\}$    | $E_{ij}(t)/s_{ij}$       | $\max_t\{|E_{ij}^n(t)|\}$     | $p_i(T_{ij})$         | $\min_i\{T_{ij}^n\}$     |
The process for testing the original data against $S^{th}$ is as follows. For method 1 the threshold can be applied directly to the normalized CDMs, $E_{i0}^n(t)$; any source with $E_{i0}^n(t) \ge S^{th}$ at any time can be declared significant. For method 2 the threshold $S^{th}$ has units of p-values and cannot be directly applied to $E_{i0}^n(t)$. Moreover, the p-values were computed separately for each source $i$, so the same p-value at different sources will correspond to different values of $E_{i0}^n(t)$. Method 2's variable thresholds are found with the inverse p-value transformation, where source $i$ at time $t$ is significant if $E_{i0}^n(t) \ge p_i^{-1}(S^{th})$.
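The inverse p-value transformation $p_i^{-1}(S^{th})$ reduces to a per-source quantile of the permutation distribution; under the same array conventions as the earlier sketch, one possible reading:

```python
import numpy as np

def per_source_thresholds(T, s_th):
    """Per-source statistic thresholds for method 2: the smallest value t with
    p_i(t) <= s_th, approximated by the (1 - s_th) quantile of each column."""
    return np.quantile(T, 1.0 - s_th, axis=0)
```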
3 Simulation Studies

In this section we present simulation results to evaluate the two methods summarized in Table 1. A cortical surface was extracted from an MRI scan using BrainSuite, a brain surface extraction tool [9], and coregistered to the MEG sensor arrangement of a CTF Systems Inc. Omega 151 system. The original surface contained approximately 520,000 faces and was down-sampled to produce a 15,000 face (7481 vertices) surface suitable for reconstruction purposes. Further, the original surface was smoothed to assist easy visualization of CDMs. An orientation constraint was applied to the reconstruction method using surface normals estimated from the original dense cortical surface. The forward model was calculated using overlapping spheres. The inverse matrix $H$ was regularized using the Tikhonov method with $\lambda = 4 \cdot 10^{-7}$.
3.1 Source Simulation Experiment

We simulated two sources on the left and right hemispheres of the brain, as shown in Figure 3. The timecourses represent an early and a delayed response to a stimulus (Fig. 4).
Fig. 3. Source 1 (left) and source 2 (right) are shown on the original and smoothed version of a cortical surface.
Fig. 4. Timecourse of simulated sources and points of source identification at α = 0.123. Triangles for both methods, circles for method 1 only.
A total of 100 epochs were generated, each consisting of 100 pre-stimulus and 100 post-stimulus time points. Gaussian i.i.d. noise with power 2000 times the average signal power was added to the channel measurements. The epochs were resampled producing $M = 1000$ permutation samples for method 1 and $M = 10000$ for method 2 (the larger number of resamples required for method 2 was due to the discreteness of the p-values as discussed in Section 2.2). We then applied the inverse operator $H$ to all data, to produce CDMs $Y_{ijk}(t)$. All further processing for the extraction of the empirical distribution of $S_j$ is summarized in Table 1. The global $\alpha = 0.05$ threshold $S^{th}$ for method 1 was $S_1^{th} = 5.236$. Due to the discreteness problem, the smallest possible threshold with method 2 was $\alpha = 0.123$; for this level, $S_1^{th} = 5.0503$ and $S_2^{th} = 0.0001$; note that the first is a threshold on maximum statistics while the second is a p-value threshold. We applied these thresholds to the original data as described in Sect. 2.3. For $\alpha = 0.05$, method 1 identified 6
time points as containing significant activations. For $\alpha = 0.123$, method 1 identified 9 and method 2 identified 8 time instances as containing significant activations (Fig. 4). Note that both methods successfully identify regions with either source 1 or source 2 active. Importantly, neither gives any false positives in regions where there is no source.
Fig. 5. Examples of significant activation maps for methods 1 and 2 for two time instances. Reconstruction appears spread on the smooth cortical surface, but active sources are in neighboring sulci in the original cortical surface. The lowest achieved FWER for method 2 is α = 0.123.
Fig. 6. Thresholded and unthresholded maps of the current density ($E_{i0}$), the noise-normalized current density ($E_{i0}^n$) and the (1−p)-value map ($1 - P_i$) at $t = 113$. The $E_{i0}$ and $E_{i0}^n$ maps are thresholded subjectively while the (1−p)-value map is thresholded at p = .05 for each source.
Figure 5 shows that method 1 and method 2 produce very similar results. In simulation, this is expected since the noise is Gaussian. We should comment here that permutation tests do not address the limited resolution of MEG reconstruction.
All inverse methods are ill-posed and CDMs tend to mislocalize source activation. If the inverse method demonstrates experimental variation in some regions, permutation tests will identify these regions regardless of the presence of an actual source there. We can display the unthresholded p-value maps of method 2, transforming the CDMs of the original data into p-values. Even though this does not address the multiple comparisons problem, it is interesting to compare the achieved localization of CDMs, noise-normalized CDMs and p-value maps. Such a result is given in Figure 6.

3.2 Noise Simulation Experiment

In order to test both methods for specificity, we applied permutation tests using noise-only data. We estimated the thresholds for method 1 using standard Gaussian noise; we did not evaluate method 2 due to the discreteness problem. Then, we created 100 measurements, each consisting of 100 epochs. The epochs had 100 pre- and 100 post-stimulus time points. We tested these data for significant activation, keeping in mind that the approximate Monte Carlo standard error for a true 0.05 rejection rate is 2.2; hence we expect 5 +/- 2 false positives. Method 1 exhibited false positives only 6 out of 100 times, consistent with being an exact test.
Fig. 7. Examples of false positives for methods 1 and 2. Due to high correlation, false positive sources lie in neighboring areas on the cortical surface.
4 Real Data Experiment

The effectiveness of the proposed algorithm was evaluated using data from a real somatosensory experiment. Data acquisition was performed with a CTF Systems Inc. Omega 151 system. The somatosensory stimulation was an electrical square-wave pulse delivered randomly to the thumb, index, middle and little fingers of each hand of a healthy right-handed subject. For the purposes of the current experiment, only data from the right thumb were tested for reconstruction. This experiment demonstrated that method 2 is more sensitive than method 1. Also, the discrepancies in the significant activation maps indicate that the data are not Gaussian, as they were in the simulation experiments. As shown in Fig. 8, at
t = 22 ms, only method 2 detected significant activity. Furthermore, it appears to correct the CDM, which shows the main activity in the ipsilateral hemisphere. Significant activation in the left somatosensory cortex is expected, since the experiment involved stimulation of the right thumb, so method 2 produces reasonable results. For t = 28 ms, the same remarks on sensitivity hold. Figure 9 shows the thresholds applied by each method. Again, due to discreteness, the lowest FWER achievable by method 2 is α = 0.086.
Fig. 8. Reconstruction and significance maps from methods 1 and 2 for two time instances. All maps are scaled by 10^12.
Fig. 9. Global threshold applied by method 1 (S_1^th = 8.69) at level α = 0.086, compared with the histogram of the thresholds applied to each source by method 2. A map of the thresholds on the cortical surface is given on the right. Most of the individual thresholds are below S_1^th.
5 Conclusion

We have presented a method for applying permutation tests to MEG data in order to extract maps of significant brain activation. It can be combined with any inverse imaging method and is flexible in terms of the available statistics and normalization
procedures. The method is exact (i.e., it achieves the specified FWER), providing confidence that activation is present in the cortical regions that do test as significant. One limitation of the method is that the pre- and post-stimulus portions of the data must be the same size for the permutation scheme to work; we will be considering bootstrap alternatives to avoid this requirement. Also, this work does not address the limited resolution of the inverse methods in MEG. If the CDMs exhibit experimental variation in some regions, permutation tests, or indeed any other tests based on the rejection of H0, will identify these regions regardless of the presence of an actual source at that location. Thus significant brain activation may be detected, but the sites may be misplaced relative to the true activation area. It is important to take this effect into account when interpreting maps of cortical activation derived from MEG data.

Acknowledgements. We thank Sabine Meunier for providing the experimental data.
References

1. Phillips, J.W., Leahy, R.M., Mosher, J.C.: MEG-Based Imaging of Focal Neuronal Current Sources. IEEE Transactions on Medical Imaging, Vol. 16(3), June (1997) 338–348
2. Dale, A.M., Liu, A.K., Fischl, B.R., Buckner, R.L., Belliveau, J.W., Lewine, J.D., Halgren, E.: Dynamic Statistical Parametric Mapping: Combining fMRI and MEG for High-Resolution Imaging of Cortical Activity. Neuron, Vol. 26 (2000) 55–67
3. Worsley, K.J., Andermann, M., Koulis, T., MacDonald, D., Evans, A.C.: Detecting Changes in Nonisotropic Images. Human Brain Mapping 8 (1999) 98–101
4. Barnes, G.R., Hillebrand, A.: Statistical Flattening of MEG Beamformer Images. Human Brain Mapping 18 (2003) 1–12
5. Nichols, T.E., Holmes, A.P.: Nonparametric Permutation Tests for Functional Neuroimaging: A Primer with Examples. Human Brain Mapping 15 (2001) 1–25
6. Blair, R.C., Karniski, W.: Distribution-Free Statistical Analyses of Surface and Volumetric Maps. In: Thatcher, R.W., Hallett, M., Roy, J.E., Huerta, M. (eds.): Functional Neuroimaging: Technical Foundations. Academic Press, San Diego, California (1994)
7. Arndt, S., Cizadlo, T., Andreasen, N.C., Heckel, D., Gold, S., O'Leary, D.S.: Tests for Comparing Images Based on Randomization and Permutation Methods. Journal of Cerebral Blood Flow and Metabolism 16 (1996) 1271–1279
8. Holmes, A.P., Blair, R.C., Watson, J.D.G., Ford, I.: Nonparametric Analysis of Statistic Images from Functional Mapping Experiments. Journal of Cerebral Blood Flow and Metabolism 16 (1996) 7–22
9. Shattuck, D.W., Leahy, R.M.: BrainSuite: An Automated Cortical Surface Identification Tool. Medical Image Analysis 6(2) (2002) 129–142
Symmetric BEM Formulation for the M/EEG Forward Problem

Geoffray Adde, Maureen Clerc, Olivier Faugeras, Renaud Keriven, Jan Kybic, and Théodore Papadopoulo

Odyssée Laboratory – ENPC – ENS Ulm – INRIA, France
[email protected], [email protected], http://www-sop.inria.fr/odyssee

Abstract. The forward M/EEG problem consists in simulating the electric potential and the magnetic field produced outside the head by currents in the brain related to neural activity. All previously proposed solutions using the Boundary Element Method (BEM) were based on a double-layer integral formulation. We have developed an alternative symmetric BEM formulation, achieving a significantly higher accuracy for sources close to tissue interfaces, namely in the cortex. Numerical experiments using a spherical semi-realistic multilayer head model with a known analytical solution are presented, showing that the new BEM performs better than the formulations used in our earlier comparisons, and in most cases outperforms the Finite Element Method (FEM) as far as accuracy is concerned, thus making the BEM a viable choice.
1 Introduction
The so-called forward problem of electro-encephalography (EEG) addresses the calculation of the electric potential V on the scalp for a known configuration of sources, provided that the physical properties of the head tissues (conductivities) are also known. Note that the same forward model as for EEG [1] can be used for magneto-encephalography (MEG) [2,3] as well, since the magnetic field can be calculated from the potential V by simple integration [4]; see also Sect. 3.3. An accurate solution of the forward problem is a necessary prerequisite for solving the inverse problem, and is extremely important to take maximum benefit from EEG and the costly MEG machines, which have the advantages of noninvasiveness and excellent time resolution but presently lag seriously behind alternative technologies such as fMRI in terms of spatial resolution. The potential V is related to the primary current sources J^p through the generalized Poisson equation

∇ · (σ∇V) = f = ∇ · J^p  in ℝ³   (1)

which derives directly from the Maxwell equations in the quasi-static (low-frequency) regime. We shall concentrate here on a head model consisting of regions with homogeneous conductivities (Fig. 1), the conductivity of air σ_{N+1} being zero.
Fig. 1. The head is modeled as a set of nested regions Ω1 , . . . , ΩN +1 with constant conductivities σ1 , . . . , σN +1 , separated by interfaces S1 , . . . , SN . Arrows indicate the outward normal n.
Because of the piecewise-constant conductivity assumption, problem (1) can be decomposed as

σ_i ΔV = f  in Ω_i, for all i = 1, …, N+1   (2)
[V]_j = [σ ∂_n V]_j = 0  on S_j, for all j = 1, …, N   (3)

where ∂_n V = ∂V/∂n is the normal derivative on S_j and where [·]_j denotes the jump across an oriented interface S_j. For example, for r ∈ S_j,

[V]_j(r) := lim_{α→0⁺} (V(r − αn) − V(r + αn)) = V⁻(r) − V⁺(r).
The Boundary Element Method (BEM) [5] is based on integral equations involving unknowns on the interfaces, whereas the FDM (finite difference method) and FEM (finite element method) consider the entire volume. The BEM thus greatly reduces the number of unknowns, and has the advantage of requiring only surface meshes instead of volume meshes. A disadvantage often reported for the BEM, and much improved by the present approach, is the drop in precision when the distance d of the source to one of the surfaces becomes comparable to the size h of the triangles in the mesh [6,7,8,9]. This is a problem because, for physiological reasons, the primary currents are mainly considered to lie in the cortex, a layer only a few millimeters thick. Therefore, to obtain a satisfactory precision, the surface of the cortex must be discretized extremely finely. In Sect. 2, we recall the Representation Theorem, which allows us to consider unknowns on a surface instead of volume quantities. We establish the classical BEM formulation using a double-layer potential in Sect. 3.1. This formulation, introduced by Geselowitz [10] in 1967, has been the basis of all previous BEM implementations. In Sect. 3.2, we present the new method, which involves an additional unknown ∂_n V and has the advantage of leading to a symmetric system matrix. Finally, we present numerical results in Sect. 4, showing mainly that the new symmetric integral formulation performs much better than the classical BEM in terms of precision.
2 Integral Representations
The fundamental Representation Theorem of potential theory, which we recall below, shows that a harmonic function¹ u is determined everywhere in ℝ³ by its jump and the jump of its derivative across a boundary ∂Ω, whether ∂Ω is composed of a single surface, or of two surfaces as on the left of Fig. 2 – the latter case being proved in [11]. We start by defining the four integral operators S, D, D* and N involved in the Representation Theorem. Given the Green function G(r) = 1/(4π‖r‖), which satisfies −ΔG = δ₀ where δ₀ is a Dirac mass centered at position 0, the double-layer and single-layer integral operators are defined by

Df(r) = ∫_{∂Ω} ∂_{n'} G(r − r') f(r') ds(r'),   (4)
Sf(r) = ∫_{∂Ω} G(r − r') f(r') ds(r'),   (5)

where r ∈ ℝ³ and n' denotes the outward normal vector at r' ∈ ∂Ω. The double-layer potential Df is discontinuous across ∂Ω, whereas its normal derivative is continuous:

[Df]_{∂Ω} = −f,   [∂Df/∂n]_{∂Ω} = 0.

The single-layer potential Sf defined by (5) has the opposite properties:

[Sf]_{∂Ω} = 0,   [∂Sf/∂n]_{∂Ω} = f.

Both the double-layer and the single-layer potentials are harmonic everywhere except on ∂Ω. We next define two integral operators, obtained through differentiation of (4) and (5) in a direction n:

Nf(r) = ∫_{∂Ω} ∂²_{n,n'} G(r − r') f(r') ds(r'),
D*f(r) = ∫_{∂Ω} ∂_n G(r − r') f(r') ds(r').

Finally, we state the Representation Theorem:

Theorem 1 (Representation Theorem). Let Ω ⊆ ℝ³ be a bounded open set with a regular boundary ∂Ω. Let u : ℝ³\∂Ω → ℝ be a harmonic function¹
A function u is harmonic if its Laplacian ∆u vanishes.
(Δu = 0 in ℝ³\∂Ω), decreasing at least as ‖r‖⁻¹ at infinity², and let p(r) := ∂_n u(r). Then

−p = N[u] − D*[p]   and   u = −D[u] + S[p]   in ℝ³\∂Ω,
−p^± = N[u] + (±I/2 − D*)[p]   and   u^± = (∓I/2 − D)[u] + S[p]   on ∂Ω,

where I denotes the identity operator.
Fig. 2. Left: two-dimensional slice through a volume Ω2 with a hollow ball topology. Arrows denote the normal orientation. Right: detail of a nested volume model. Normal vectors are oriented globally outward, as shown. However, when considering for example the surface Si as the boundary of Ωi+1 , the displayed orientation becomes locally inward.
3 Integral Formulations for M/EEG

3.1 Double-Layer Potential
The Boundary Element Method classically used for M/EEG reconstruction can be expressed using the double-layer potential. Let us examine the case of a conductor Ω with homogeneous conductivity σ₀, placed inside a non-conductive medium. The potential V which satisfies (1), with a source f supported inside Ω, is decomposed as σV = V₀ + u, where σ = σ₀ inside Ω, σ = 0 outside Ω, and V₀ is the homogeneous domain solution: σ₀ ΔV₀ = f.²
This is called the "radiation condition". See [5] or [11] for details.
Considering the Green function G introduced in Sect. 2, the homogeneous domain solution is simply V₀ = −(f ⋆ G)/σ₀. By definition, the function u is harmonic inside and outside Ω, and also satisfies [∂_n u] = 0 across the boundary ∂Ω. It can therefore be represented as a double-layer potential D[u]. The Representation Theorem shows that the exterior limit of u on ∂Ω is

u⁺ = −[u]/2 − D[u],

and since u⁺ = −V₀ and [u] = σ₀V, we obtain the integral relation on ∂Ω:

V₀ = (σ₀/2) V + σ₀ D V.

The extension to multiple interfaces (Fig. 1) yields

(V₀)_{S_j} = ((σ_{j−1} + σ_j)/2) V_{S_j} + Σ_i (σ_{i−1} − σ_i) D_{ji} V_{S_i},

where V_{S_i} denotes the potential considered on interface S_i and D_{ji} is the integral operator which couples interfaces S_i and S_j. Writing down the operator D_{ji} explicitly, we obtain the well-known relation between V₀ and V: for r ∈ S_j,

V₀(r) = ((σ_{j−1} + σ_j)/2) V(r) + Σ_i (σ_{i−1} − σ_i) ∫_{S_i} ∂_{n'} G(r − r') V(r') ds(r').
3.2 Symmetric Method
The symmetric approach, based on the theory of Nédélec [5] and related to the algorithms in [12,13], has, to the best of our knowledge, never been applied to the E/MEG problem. We consider the nested volume case depicted in Fig. 1, and decompose the source as f = f₁ + ⋯ + f_N such that supp f_i ⊂ Ω_i. For a given i ∈ {1, …, N} (Fig. 2, right), we consider the function

u_{Ω_i} = V − v_{Ω_i}/σ_i  in Ω_i,   u_{Ω_i} = −v_{Ω_i}/σ_i  in ℝ³\Ω̄_i,   (6)

where v_{Ω_i} = −f_i ⋆ G is the homogeneous space solution of Δv_{Ω_i} = f_i. Note that u_{Ω_i} is harmonic in ℝ³\∂Ω_i. The Representation Theorem provides the internal limit of u_{Ω_i} on S_i:

(u_{Ω_i})⁻_{S_i} = [u_{Ω_i}]_{∂Ω_i}/2 − D_{∂Ω_i}[u_{Ω_i}]_{∂Ω_i} + S_{∂Ω_i}[∂_n u_{Ω_i}]_{∂Ω_i}.

Defining p_{S_i} = σ_i [∂_n u_{Ω_i}]_{S_i} = σ_i (∂_n V)⁻_{S_i} and breaking down the integral operators on ∂Ω_i = S_{i−1} ∪ S_i yields

(u_{Ω_i})⁻_{S_i} = (V − v_{Ω_i}/σ_i)⁻_{S_i}
  = V_{S_i}/2 + D_{i,i−1} V_{S_{i−1}} − D_{ii} V_{S_i} − σ_i⁻¹ S_{i,i−1} p_{S_{i−1}} + σ_i⁻¹ S_{ii} p_{S_i}.   (7)
We next consider the function u_{Ω_{i+1}} defined as in (6), and again apply the Representation Theorem to calculate its external limit on S_i:

(u_{Ω_{i+1}})⁺_{S_i} = −[u_{Ω_{i+1}}]_{∂Ω_{i+1}}/2 − D_{∂Ω_{i+1}}[u_{Ω_{i+1}}]_{∂Ω_{i+1}} + S_{∂Ω_{i+1}}[∂_n u_{Ω_{i+1}}]_{∂Ω_{i+1}},

and breaking down the integral operators once again,

(u_{Ω_{i+1}})⁺_{S_i} = (V − v_{Ω_{i+1}}/σ_{i+1})⁺_{S_i}
  = V_{S_i}/2 + D_{ii} V_{S_i} − D_{i,i+1} V_{S_{i+1}} − σ_{i+1}⁻¹ S_{ii} p_{S_i} + σ_{i+1}⁻¹ S_{i,i+1} p_{S_{i+1}}.   (8)
Finally, subtracting (8) from (7) gives

σ_{i+1}⁻¹ (v_{Ω_{i+1}})_{S_i} − σ_i⁻¹ (v_{Ω_i})_{S_i} = D_{i,i−1} V_{S_{i−1}} − 2 D_{ii} V_{S_i} + D_{i,i+1} V_{S_{i+1}}
  − σ_i⁻¹ S_{i,i−1} p_{S_{i−1}} + (σ_i⁻¹ + σ_{i+1}⁻¹) S_{ii} p_{S_i} − σ_{i+1}⁻¹ S_{i,i+1} p_{S_{i+1}}
  for i = 1, …, N.   (9)
Using the same approach, we evaluate the quantities σ_i (∂_n u_{Ω_i})⁻_{S_i} = p − (∂_n v_{Ω_i})⁻_{S_i} and σ_{i+1} (∂_n u_{Ω_{i+1}})⁺_{S_i} = p − (∂_n v_{Ω_{i+1}})⁺_{S_i} with the Representation Theorem, subtract the resulting expressions and obtain

(∂_n v_{Ω_{i+1}})_{S_i} − (∂_n v_{Ω_i})_{S_i} = σ_i N_{i,i−1} V_{S_{i−1}} − (σ_i + σ_{i+1}) N_{ii} V_{S_i} + σ_{i+1} N_{i,i+1} V_{S_{i+1}}
  − D*_{i,i−1} p_{S_{i−1}} + 2 D*_{ii} p_{S_i} − D*_{i,i+1} p_{S_{i+1}}
  for i = 1, …, N.   (10)
Observe that each surface S_i only interacts with its two neighbors S_{i−1} and S_{i+1}. This leads to an operator matrix which is not only symmetric, but also block tridiagonal. The assumption σ_{N+1} = 0 has the consequence of effectively chopping off the last line and column of the matrix – see the explicit system detailed in (11).

3.3 Magnetic Field Computation
It is well known that the magnetic field B is entirely determined by the knowledge of V on the interfaces [4]. More precisely,

B(r) = B₀(r) + (μ₀/4π) Σ_i (σ_i − σ_{i+1}) ∫_{S_i} V(r') n(r') × (r − r')/‖r − r'‖³ ds(r'),

where B₀ represents the magnetic field due to the primary currents in a homogeneous medium:

B₀(r) = (μ₀/4π) ∫_{ℝ³} J^p(r') × (r − r')/‖r − r'‖³ dv(r').
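For concreteness, the homogeneous-medium term B₀ reduces, for point current dipoles q_k at positions r_k, to a finite sum. The helper below is a hypothetical sketch of that special case (names and array shapes are our own), not code from the paper:

```python
import numpy as np

MU0 = 4e-7 * np.pi  # vacuum permeability

def primary_b_field(r, dip_pos, dip_mom):
    """B0 at sensor location r (shape (3,)) for K current dipoles:
    dip_pos (K,3) positions, dip_mom (K,3) moments q_k."""
    d = r - dip_pos                          # (K, 3) displacement vectors
    dist3 = np.linalg.norm(d, axis=1) ** 3   # |r - r_k|^3
    return MU0 / (4 * np.pi) * (np.cross(dip_mom, d) / dist3[:, None]).sum(0)
```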
4 Discretization and Numerical Experiments

4.1 Galerkin Approach
The surfaces are represented by triangular meshes. The potential V is approximated using P1 (piecewise-linear) basis functions as V_{S_k}(r) = Σ_i x_i^{(k)} φ_i^{(k)}(r), while p is represented in the space of P0 (piecewise-constant) basis functions, p_{S_k}(r) = Σ_i y_i^{(k)} ψ_i^{(k)}(r). This choice guarantees the asymptotic equivalence between two sources of error: the approximation of smooth surfaces by triangulated meshes, and the approximation of V and p. Equations (9) and (10) are discretized using the Galerkin method. Equation (9), corresponding to the potential V, is integrated against P0 test functions, and equation (10), corresponding to the flow p, is integrated against P1 test functions. This again has the consequence of balancing the approximation errors. We obtain a symmetric block tridiagonal matrix system, given here in more detail:
For i = 1, …, N — with the blocks involving S₀ and S_{N+1} absent and, since σ_{N+1} = 0, the last line and column (those associated with p_{S_N}) removed — the unknowns x_i = V_{S_i} and y_i = p_{S_i} satisfy

−σ_i N_{i,i−1} x_{i−1} + D*_{i,i−1} y_{i−1} + (σ_i + σ_{i+1}) N_{ii} x_i − 2 D*_{ii} y_i − σ_{i+1} N_{i,i+1} x_{i+1} + D*_{i,i+1} y_{i+1} = b_i,
D_{i,i−1} x_{i−1} − σ_i⁻¹ S_{i,i−1} y_{i−1} − 2 D_{ii} x_i + (σ_i⁻¹ + σ_{i+1}⁻¹) S_{ii} y_i + D_{i,i+1} x_{i+1} − σ_{i+1}⁻¹ S_{i,i+1} y_{i+1} = c_i.   (11)

Stacked over i, these equations form a symmetric block tridiagonal system whose diagonal blocks are (σ₁+σ₂)N₁₁, (σ₁⁻¹+σ₂⁻¹)S₁₁, (σ₂+σ₃)N₂₂, (σ₂⁻¹+σ₃⁻¹)S₂₂, …, ending with σ_N N_{N,N}.
The indices k, l of the operators refer to interfaces S_k and S_l. Each block is a submatrix representing the interactions between all test functions supported on a given pair of surfaces. Denoting by ψ_i^{(k)} the P0 function associated with element i on surface S_k, and by φ_j^{(l)} the P1 function associated with node j on surface S_l, we have

(S_{kl})_{ij} = ⟨S_{kl} ψ_j^{(l)}, ψ_i^{(k)}⟩,   (N_{kl})_{ij} = ⟨N_{kl} φ_j^{(l)}, φ_i^{(k)}⟩,
(D_{kl})_{ij} = (D*_{lk})_{ji} = ⟨D_{kl} φ_j^{(l)}, ψ_i^{(k)}⟩,
(b_k)_i = ⟨∂_n v_k − ∂_n v_{k+1}, φ_i^{(k)}⟩,   (c_k)_i = ⟨σ_{k+1}⁻¹ v_{Ω_{k+1}} − σ_k⁻¹ v_{Ω_k}, ψ_i^{(k)}⟩,
(x_k)_i = x_i^{(k)},   (y_k)_i = y_i^{(k)}.
The potential V being defined only up to a constant, the above system has an indeterminacy, which is lifted by deflating the last block σ_N N_{N,N} [14,15,16].
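The deflation step can be illustrated generically — the following is a standard device for systems that are singular up to an additive constant, not necessarily the specific scheme of [14,15,16]. Since the null space is spanned by the constant vector, a rank-one correction makes the system uniquely solvable:

```python
import numpy as np

def solve_deflated(A, b, alpha=1.0):
    """Solve A x = b when A is singular with constant vectors in its null
    space: penalize the mean of x via a rank-one term, which removes the
    indeterminacy without changing the component of x orthogonal to it."""
    n = A.shape[0]
    u = np.ones(n) / np.sqrt(n)   # unit vector along the constant null space
    return np.linalg.solve(A + alpha * np.outer(u, u), b)
```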
4.2 Experiments
We compare the symmetric BEM described above with the classical (double-layer) BEM for several discretization choices, and with the Finite Element Method (FEM). The tests are performed on a simplified head model for which an analytical solution exists [17,18,19]. It consists of three spherical surfaces with radii 0.87, 0.92, and 1.0, delimiting volumes with conductivities 1.0, 0.0125, 1.0, and 0.0, from inside towards outside. The sources are unitary current dipoles oriented as [1 0 1]/√2 and placed at distances from the center equal to 49%, 78%, 88%, 93%, or 97% of the radius of the innermost sphere. The spherical surfaces are triangulated with progressively finer meshes of 42, 162, 642, and 2562 vertices. The relative error measure is ‖V − V_anal‖₂ / ‖V_anal‖₂, where the L2 norm is integrated over the entire outer sphere (representing the scalp). The first set of experiments (Fig. 3) shows how the relative error increases when the current dipole source approaches the surface of discontinuity. We observe that the symmetric approach behaves better than the classical BEM.
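Assuming the potentials are sampled at the vertices of the outer sphere, so that the integral reduces to a discrete 2-norm, the error measure is simply:

```python
import numpy as np

def relative_error(V, V_anal):
    """Relative L2 error between computed and analytical scalp potentials."""
    return np.linalg.norm(V - V_anal) / np.linalg.norm(V_anal)
```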
Fig. 3. The relative L2 error versus the relative dipole position for meshes with 642 vertices per sphere. The label 2 refers to the double-layer potential formulation; a, b, c are respectively the P0 collocation [19], P0-P0 Galerkin, and P1-P1 Galerkin discretizations. The label S refers to the symmetric formulation.
The second set of experiments shows the evolution of the relative error as a function of the ratio of the conductivity of the middle layer of our three-sphere model with respect to the other two layers. Here again we observe a
better behavior of the symmetric method compared to the classical double-layer BEM, the accuracy of which is generally improved by using an Isolated Problem Approach [20].
Fig. 4. The relative error versus the ratio of conductivities between neighboring layers, for the dipole at 97% of the inner sphere radius and a mesh of 3 × 642 vertices. (Labels are the same as in Fig. 3.)
Finally, we compare the new symmetric BEM, so far the best BEM known to us, with the FEM which we used in a previous comparison [21]. The accuracy results are given in Table 1. We observe that unless the dipole source is extremely close to the surface, the new BEM provides better precision than the FEM. Equivalent meshes were used in both cases, coinciding on the triangles of all surfaces. We also compare the time requirements of the BEM and the FEM. Table 2 details the time needed to assemble the BEM system matrix and to solve it using either a direct or an iterative method, compared with the time required by the FEM.
5 Conclusions
We have presented a symmetric BEM formulation, which is new to the field of M/EEG. We have shown that it outperforms the previously used formulations and we have found that unlike these [21], the new BEM is in most cases more accurate than the FEM as well.
Table 1. The relative L2 error of the symmetric BEM and FEM, for all three head models and 5 dipole positions. Values marked in bold show the more accurate of the two methods for a particular head/dipole combination.

Symmetric BEM
Dipole pos.   49%      78%      88%      93%      97%
Head 1        0.1536   0.1658   0.1900   0.2273   0.3172
Head 2        0.0387   0.0469   0.0539   0.0661   0.0892
Head 3        0.0099   0.0134   0.0164   0.0196   0.0289

FEM
Dipole pos.   49%      78%      88%      93%      97%
Head 1        0.2468   0.2107   0.2852   0.2250   0.1398
Head 2        0.0784   0.1738   0.0660   0.1415   0.1172
Head 3        0.0142   0.0323   0.0194   0.0608   0.0791
Table 2. Typical execution times for the symmetric BEM and for the FEM for varying model sizes.

Triangles per sphere        80        320       1280      5120

BEM  Unknowns               286       1126      4486      17926
     Assembly               0.42 s    7.28 s    147 s     72 min
     Direct solution        0.03 s    1.55 s    93.21 s   ≈ 100 min
     Iterative solution³    ≈ 0.01 s  ≈ 0.3 s   ≈ 6 s     ≈ 5 min

FEM  Unknowns               156       807       4881      32053
     Iterative solution     0.1 s     0.3 s     2.8 s     31.9 s
There are various ways of accelerating the BEM implementation and bringing it on par with the FEM, among which are iterative solvers [22,7,6], the fast multipole method (FMM) [9], precorrected-FFT [23,24] and SVD-based methods. Future work includes a better understanding of the accuracy improvements and the consequent development of the symmetric method, as well as its application in the inverse problem context.
Acknowledgements. We are grateful to Toufic Abboud, Alain Dervieux and Guillaume Sylvand for fruitful discussions.
³ A preconditioned GMRES method was used. The reported times should be taken as a coarse indication only, as the number of iterations, and thus the total elapsed time, varies enormously with the precision demanded and the source configuration.
References

1. J. W. Phillips, R. M. Leahy, J. C. Mosher, and B. Timsari, "Imaging neural activity using MEG and EEG," IEEE Eng. Med. Biol., pp. 34–41, May 1997.
2. J. Sarvas, "Basic mathematical and electromagnetic concepts of the biomagnetic inverse problem," Phys. Med. Biol., vol. 32, no. 1, pp. 11–22, 1987.
3. M. Hämäläinen, R. Hari, R. J. Ilmoniemi, J. Knuutila, and O. V. Lounasmaa, "Magnetoencephalography — theory, instrumentation, and applications to noninvasive studies of the working human brain," Reviews of Modern Physics, vol. 65, pp. 413–497, Apr. 1993.
4. D. B. Geselowitz, "On the magnetic field generated outside an inhomogeneous volume conductor by internal volume currents," IEEE Trans. Magn., vol. 6, pp. 346–347, 1970.
5. J.-C. Nédélec, Acoustic and Electromagnetic Equations. Springer Verlag, 2001.
6. J. Rahola and S. Tissari, "Iterative solution of dense linear systems arising from the electrostatic integral equation," Phys. Med. Biol., no. 47, pp. 961–975, 2002.
7. J. Rahola and S. Tissari, "Iterative solution of dense linear systems arising from boundary element formulations of the biomagnetic inverse problem," Tech. Rep. TR/PA/98/40, CERFACS, Toulouse, France, 1998.
8. A. S. Ferguson and G. Stroink, "Factors affecting the accuracy of the boundary element method in the forward problem — I: Calculating surface potentials," IEEE Trans. Biomed. Eng., vol. 44, pp. 1139–1155, Nov. 1997.
9. M. Clerc, R. Keriven, O. Faugeras, J. Kybic, and T. Papadopoulo, "The fast multipole method for the direct E/MEG problem," in Proceedings of ISBI, Washington, D.C., IEEE, NIH, July 2002.
10. D. B. Geselowitz, "On bioelectric potentials in an inhomogeneous volume conductor," Biophysical Journal, vol. 7, pp. 1–11, 1967.
11. J. Kybic, M. Clerc, T. Abboud, O. Faugeras, R. Keriven, and T. Papadopoulo, "Integral formulations for the EEG problem," Tech. Rep. 4735, INRIA, Feb. 2003.
12. L. J. Gray and G. H. Paulino, "Symmetric Galerkin boundary integral formulation for interface and multi-zone problems," Internat. J. Numer. Methods Eng., vol. 40, no. 16, pp. 3085–3103, 1997.
13. J. B. Layton, S. Ganguly, C. Balakrishna, and J. H. Kane, "A symmetric Galerkin multi-zone boundary element formulation," Internat. J. Numer. Methods Eng., vol. 40, no. 16, pp. 2913–2931, 1997.
14. T. F. Chan, "Deflated decomposition of solutions of nearly singular systems," SIAM J. Numer. Anal., vol. 21, pp. 739–754, 1984.
15. S. Tissari and J. Rahola, "Error analysis of a new Galerkin method to solve the forward problem in MEG and EEG using the boundary element method," Tech. Rep. TR/PA/98/39, CERFACS, Toulouse, France, 1998.
16. G. Fischer, B. Tilg, R. Modre, F. Hanser, B. Messnarz, and P. Wach, "On modeling the Wilson terminal in the boundary and finite element method," IEEE Trans. Biomed. Eng., vol. 49, pp. 217–224, Mar. 2002.
17. J. C. De Munck, "The potential distribution in a layered anisotropic spheroidal volume conductor," J. Appl. Phys., vol. 64, no. 2, pp. 464–470, July 1988.
18. Z. Zhang, "A fast method to compute surface potentials generated by dipoles within multilayer anisotropic spheres," Phys. Med. Biol., vol. 40, pp. 335–349, 1995.
19. J. C. Mosher, R. M. Leahy, and P. S. Lewis, "EEG and MEG: Forward solutions for inverse methods," IEEE Transactions on Biomedical Engineering, vol. 46, pp. 245–259, Mar. 1999.
20. M. S. Hämäläinen and J. Sarvas, "Realistic conductivity geometry model of the human head for interpretation of neuromagnetic data," IEEE Trans. Biomed. Eng., vol. 36, pp. 165–171, Feb. 1989.
21. M. Clerc, A. Dervieux, O. Faugeras, R. Keriven, J. Kybic, and T. Papadopoulo, "Comparison of BEM and FEM methods for the E/MEG problem," in Proceedings of BIOMAG 2002, Aug. 2002.
22. R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst, Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. Philadelphia: SIAM, 1994. Available from netlib.
23. S. Tissari and J. Rahola, "A precorrected-FFT method to accelerate the solution of the forward problem in MEG," in Proceedings of BIOMAG, 2002.
24. J. R. Phillips and J. K. White, "A precorrected-FFT method for electrostatic analysis of complicated 3-D structures," IEEE Trans. CAD Int. Circ. Syst., vol. 16, Oct. 1997.
Localization Estimation Algorithm (LEA): A Supervised Prior-Based Approach for Solving the EEG/MEG Inverse Problem

Jérémie Mattout¹,²,³, Mélanie Pélégrini-Issac⁴, Anne Bellio⁵, Jean Daunizeau³,⁶, and Habib Benali³

1 Institute of Cognitive Neuroscience, London, UK
2 Functional Imaging Laboratory, London, UK
3 U494 INSERM, Paris, France
4 U483 INSERM, Paris, France
5 Département de Neuropsychologie, Université de Montréal, Canada
6 Centre de Recherches Mathématiques, Université de Montréal, Canada
Abstract. Localizing and quantifying the sources of ElectroEncephaloGraphy (EEG) and MagnetoEncephaloGraphy (MEG) measurements is an ill-posed inverse problem whose solution requires a spatial regularization involving both anatomical and functional priors. The distributed source model enables the introduction of such constraints. However, the resulting solution is unstable, since the equation system one has to solve is badly conditioned and under-determined. We propose an original approach for solving the inverse problem that allows us to deal with a better-determined system and to temper the influence of priors according to their consistency with the measured EEG/MEG data. This Localization Estimation Algorithm (LEA) estimates the amplitudes of a selected subset of sources, which are localized based on a prior distribution of activation probability. LEA is evaluated through numerical simulations and compared to a classical Weighted Minimum Norm estimation.
1 Introduction
Mapping, characterizing and quantifying the brain electromagnetic activity from EEG/MEG data requires solving an ill-posed inverse problem that does not admit a unique solution. Many methods have been proposed to better condition this problem [1], following the principle that the more prior knowledge we are able to introduce, the closer we get to the solution. At first, equivalent current dipole approaches were used, based on simple spherical head models, but involving non-linear estimations that are difficult to deal with and requiring strong prior assumptions such as the number of activated sources [2]. Methods relying on the so-called distributed source model were then developed [3], among which weighted minimum norm solutions [4] and Bayesian non-quadratic approaches [5] proved quite promising. Interestingly, these approaches enable the use of realistic head models, a linear formulation of the inverse problem and a better estimation of the spatial extent of the activated areas.
Unfortunately, such benefits are obtained at the expense of a highly under-determined formulation of the problem. Indeed, since dipoles are spread all over the solution space (usually restricted to the cortical strip), the number of unknowns of the system (the dipole amplitudes) is much larger than the number of equations given by the measurements (the number of sensors). Consequently, algorithms based on the distributed source model often remain too sensitive to the noise level, initialization parameters and modeling errors, and may lead to biased solutions by focusing on local minima. Thus, while this latter approach seems the most suitable both for yielding a realistic solution and for introducing various explicit constraints, critical issues still remain. How can the under-determined inverse problem be dealt with? Since introducing both structural and functional knowledge appears to be the only consistent way of better conditioning the ill-posed inverse problem, how far should such priors be trusted and taken into account within source reconstruction algorithms? In this note, we propose an original inverse method based upon the distributed source model and exploiting a prior distribution of activation probability over the dipoles of the model. This so-called Localization Estimation Algorithm (LEA) involves an iterative localization procedure followed by an estimation step. Based upon the activation probabilities, whose consistency with the EEG/MEG data is checked along the iterative process, the localization step makes it possible to focus on the few regions that are most likely to be activated. Finally, source amplitudes are estimated for the few selected sources only, which makes up a much better determined equation system. The proposed approach is detailed in Sect. 2. In Sect. 3, we present a simulation-based evaluation of LEA, which is compared with a Weighted Minimum Norm (WMN) estimation. The method and results are finally discussed in Sect. 4.
2 Method

2.1 Notations
Within the distributed model framework, the cortical surface is covered by dipoles whose positions and orientations are fixed. We denote by n, t and q respectively the number of sensors, time samples and dipolar sources. Only the amplitudes of these sources remain to be estimated, by solving the following linear matrix equation:

M = GJ + E,   (1)

where M is the n × t EEG/MEG data matrix, G is the n × q forward operator, J is the q × t unknown amplitude matrix and E is an n × t additive noise matrix. The rows (resp. columns) of G are called the "lead fields" (resp. the "forward fields") and describe the flow of current for a given sensor through each dipole location (resp. the measurements observed across all sensors, induced by a particular dipole) [6].
Among existing methods for solving Eq. (1), the well-known WMN solution corresponds to the unique minimum of the quadratic criterion

U(J) = ‖M − GJ‖² + λ‖WJ‖²,   (2)

where W is a q × q weighting matrix related to a prior probability distribution of activation p, such that the closer to 1 the probability p_i, the more likely the corresponding dipole i is to be activated. λ is a regularization parameter (or hyperparameter) that quantifies the global weight of the priors embodied by W, and ‖A‖ denotes the L2-norm of matrix A. The solution J_WMN is given by

J_WMN = (GᵀG + λWᵀW)⁻¹ GᵀM,   (3)

where the superscript ᵀ denotes the transpose operator. According to the Matrix Inversion Lemma, J_WMN can equivalently be obtained from

J_WMN = (WᵀW)⁻¹Gᵀ (G(WᵀW)⁻¹Gᵀ + λI_n)⁻¹ M,   (4)

where I_n is the n × n identity matrix. Equation (4) involves an n × n instead of a q × q matrix inverse, which reduces the computational load compared to Eq. (3), since the number of sensors is usually much smaller than the number of dipoles.
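A short numerical sketch of the two algebraically equivalent forms (3) and (4) follows; the shapes and random test data are illustrative only, and the diagonal weighting w_ii = 1 − p_i anticipates the choice made in Sect. 3.2:

```python
import numpy as np

rng = np.random.default_rng(1)
n, q, t = 32, 200, 10                        # sensors, dipoles, time samples
G = rng.standard_normal((n, q))              # toy forward operator
M = rng.standard_normal((n, t))              # toy data
W = np.diag(1.0 - rng.uniform(0.0, 0.9, q))  # w_ii = 1 - p_i
lam = 0.1

WtW = W.T @ W
# Eq. (3): one q x q linear solve
J3 = np.linalg.solve(G.T @ G + lam * WtW, G.T @ M)
# Eq. (4): Matrix Inversion Lemma form, one n x n linear solve
J4 = np.linalg.inv(WtW) @ G.T @ np.linalg.solve(
    G @ np.linalg.inv(WtW) @ G.T + lam * np.eye(n), M)
assert np.allclose(J3, J4)   # the two forms agree
```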
2.2 Localization Estimation Algorithm (LEA)
The basic idea supporting LEA is the step-by-step introduction of elementary brain regions within a regularized estimation process. This iterative localization process aims at focusing on the regions that are most significantly needed to explain the observed data. Once the solution space is reduced to the few dipoles that are most likely to be activated, the amplitudes of these selected dipoles can be estimated by solving a much better determined equation system. LEA thus involves two consecutive steps:
– a localization procedure, which aims at selecting the few regions that are most likely to be activated;
– a final regularized procedure, which estimates the amplitudes of the selected dipoles.

Localization step. This step consists of an iterative regularized pre-estimation process, followed by a focalization procedure. Provided that they present close activation probabilities, neighboring dipoles are preliminarily aggregated in order to build elementary regions. The activation probability of a resulting cluster is set equal to the highest dipole activation probability estimated within that cluster. The clusters are successively introduced into the pre-estimation process, in decreasing order of this activation probability.
The iterative procedure begins (iteration 1) by considering the dipole that is most likely to be activated (i.e., with the highest p_i) and its nearest neighbors. For this cluster, the matrix Ĵ₁ of amplitudes for all the samples of the time window of interest is estimated using the classical weighted minimum norm criterion (2), namely

U₁(J₁) = ‖M − G₁J₁‖² + λ₁‖W₁J₁‖²,   (5)

where G₁ (resp. W₁) denotes the forward matrix (resp. the weighting matrix) restricted to the considered cluster. At iteration k (k > 1), the k clusters that present the highest activation probability are considered and the corresponding amplitude matrix Ĵ_k is estimated by minimizing the criterion

U_k(J_k) = ‖M − G_kJ_k‖² + λ_k‖W_kJ_k‖² + µ_k‖Ĵ_{k−1} − S_kJ_k‖²,   (6)

where µ_k is a hyperparameter and S_k a diagonal mask operator that assigns the value 0 to the dipoles belonging to the cluster introduced at step k and the value 1 to those belonging to all clusters introduced up to step k−1. The third term enforces the solution Ĵ_k at iteration k to be close to the solution Ĵ_{k−1} obtained at iteration k−1. We indeed trust the previous estimations more, since they involve fewer clusters, which are moreover more likely to be activated. Besides, the gap ∆_k between consecutive residuals R_{k−1} and R_k is calculated as

∆_k = R_{k−1} − R_k,   (7)

where R_k is the residual at iteration k, namely the goodness of the data fit:

R_k = ‖M − G_kĴ_k‖².   (8)

The pre-estimation process ends when all the clusters have been considered. The focalization procedure then consists of retaining only those clusters that induced a positive and significantly large residual gap. Those clusters indeed correspond to sources that explain most of the observed EEG/MEG data. In practice, the cluster introduced at iteration k is selected if and only if the corresponding ∆_k value is higher than a given strictly positive threshold value ∆_T. Note that there is no residual gap ∆₁ associated with the first cluster. However, since this cluster is the most likely to be activated according to the prior probability distribution, we always retain it for the final estimation process. The focalization procedure aims both at reducing the solution space and at balancing the influence of the prior probability distribution p. Indeed, it enables the selection of clusters that may present a low activation probability, but which have been found significant for explaining the measurements M.

Estimation step. The final step consists of estimating the amplitudes of the few remaining sources by applying the weighted minimum norm criterion (2). We denote by J_LEA the final solution. Since the number of sources should now be smaller than the number of sensors, the solution is obtained by simple computation of (3).
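A condensed sketch of the localization step is given below. It is a simplified reading of the procedure, not the authors' implementation: the µ_k memory term of Eq. (6) is omitted, and the WMN sub-solver, cluster format and default threshold are our own assumptions.

```python
import numpy as np

def wmn(G, M, W, lam):
    # weighted minimum norm estimate, Eq. (3) restricted to a cluster set
    return np.linalg.solve(G.T @ G + lam * W.T @ W, G.T @ M)

def lea_localize(M, G, W, clusters, lam=0.1, delta_T=None):
    """clusters: list of integer index arrays, sorted by decreasing prior
    activation probability. Returns positions of retained clusters."""
    residuals, idx = [], np.array([], dtype=int)
    for c in clusters:
        idx = np.concatenate([idx, np.asarray(c, dtype=int)])
        Gk, Wk = G[:, idx], W[np.ix_(idx, idx)]
        Jk = wmn(Gk, M, Wk, lam)                  # mu_k term of Eq. (6) omitted
        residuals.append(np.linalg.norm(M - Gk @ Jk) ** 2)   # Eq. (8)
    deltas = -np.diff(residuals)                  # Eq. (7): R_{k-1} - R_k
    if delta_T is None:
        delta_T = 0.01 * deltas.max()             # 1% of max gap (Sect. 3.2)
    # cluster 1 is always retained; others need a large positive residual gap
    return [0] + [k for k in range(1, len(clusters)) if deltas[k - 1] > delta_T]
```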
2.3 Deriving a Prior Distribution of Activation Probability from the EEG/MEG Data
The required prior distribution p of activation probability may be derived from any functional knowledge, such as functional Magnetic Resonance Imaging (fMRI) activation maps [7], but may also be inferred from the EEG/MEG data themselves. We have recently proposed a Multivariate Source Preselection (MSP) approach [8,9] for inferring the probability of location of cortical activations, using information derived from the EEG/MEG data only. This method consists of comparing the normalized observed potential or magnetic field scalp topographies (denoted by M̄) with all possible linear combinations of the normalized forward fields (gathered in matrix Ḡ). An activation probability is estimated that quantifies the affinity of each normalized forward field with the observed normalized potential or field maps. The MSP method involves three steps.

1. The orthogonal decomposition of Ḡ leads to a forward field basis B made of n orthonormal eigenvectors. These eigenvectors may be seen as functionally Informed Basis Functions (fIBF) that are subject-specific, since they rely only on the forward operator Ḡ. The fIBF matrix B satisfies

Ḡ = CΛ^{1/2}Bᵀ,   (9)

where C is the q × n matrix containing the coordinates of the various forward fields in the fIBF basis and Λ is the n × n diagonal matrix of the eigenvalues associated with the fIBF.

2. As the fIBF are mutually orthogonal, the affinity of each eigenvector with the normalized data M̄ can be estimated independently, through a correlation calculation. The matrix Γ embodies these correlations, being defined by

Γ = BᵀM̄.   (10)

At this stage, one may select the submatrix B_s of B made of the few fIBF that are significantly correlated with the data M̄. Such a selection enables a data filtering by projecting M̄ onto B_s, leading to M̄_s = B_sB_sᵀM̄, where M̄_s represents the filtered normalized data.

3. Since the linear relation between each forward field and each fIBF is unique (Eq. (9)), an individual activation probability associated with each dipole can be derived, taking into account the correlation between the filtered data and the fIBF. This probability corresponds to the norm of the projection of each forward field onto the normalized and filtered data space defined by M̄_s. The associated projection operator P_s is given by

P_s = M̄_s(M̄_sᵀM̄_s)⁻¹M̄_sᵀ.   (11)

The closer to one the activation probability, the higher the affinity between a given forward field (related to a particular dipole) and the filtered EEG/MEG data. These estimated activation probabilities may be introduced as quantitative priors into a regularization criterion. They can also lead to a substantial reduction of the inverse solution space, by considering within the reconstruction process only the dipoles that are most likely to be activated.
3 Application

3.1 MEG Data Simulation
A 3D high-resolution (voxel size: 0.9375 mm × 0.9375 mm × 1.5 mm) magnetic resonance image from a healthy volunteer was segmented. The cortical surface was approximated with small triangles whose vertices provided 7,081 initial dipole positions for the distributed source model. The orientation of each dipole was set perpendicular to the cortical surface [3]. The forward operator G associated with this dipole set was calculated using a simple single-shell spherical head model [10]. MEG data were simulated over 130 sensors uniformly spread over the head, by artificially activating either one or two extended cortical sources. An extended source was defined as a randomly chosen dipole and its four nearest neighbors, and was about 5 mm in radius. We considered unit amplitudes for each activated dipole, and their contributions to the measurements were obtained from the corresponding forward fields of the G matrix. The time course of activation was modeled as the half-period of a sine function. Each forward field was convolved with its associated time course and the resulting waveforms were added together to constitute the data. Finally, white Gaussian noise (SNR = 20 dB) was added to each simulated data set. First, a simulation study for evaluating the reconstruction of a single extended activated source was conducted, with five hundred extended sources considered successively. Then, we investigated the performance of LEA when activating a pair of extended sources (S1 and S2). The two clusters were constrained to be at least 5 cm apart, to test spatially well-distinct activated sites. A high temporal overlap between the two activities was considered. We describe here an example of such a source configuration that assesses the ability of the proposed approach to improve the reconstruction, compared to a classical WMN estimation, when inaccurate priors are taken into account.
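A minimal sketch of this data-generation recipe follows; the forward matrix here is a random placeholder, and the "cluster" is drawn at random rather than from true mesh neighborhoods:

```python
import numpy as np

rng = np.random.default_rng(2)
n_sensors, n_dipoles, n_time = 130, 7081, 120
G = rng.standard_normal((n_sensors, n_dipoles))   # placeholder forward operator

center = rng.integers(n_dipoles)
# paper: center dipole + its 4 nearest mesh neighbors; random here for brevity
cluster = [center] + list(rng.choice(n_dipoles, 4, replace=False))
tc = np.sin(np.pi * np.arange(n_time) / n_time)   # half-period sine time course

M = G[:, cluster].sum(axis=1, keepdims=True) * tc # unit-amplitude dipoles

snr_db = 20.0                                     # as in the text
sig_pow = (M ** 2).mean()
noise = rng.standard_normal(M.shape) * np.sqrt(sig_pow / 10 ** (snr_db / 10))
M_noisy = M + noise
```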
3.2 Data Processing
First, whatever the data set, the Multivariate Source Preselection (MSP) was applied to the whole data window of interest and yielded the prior activation probability p. Based on MSP performance, the solution space was reduced to the dipoles presenting the highest activation probability. Indeed, for a single simulated extended source, the solution space could be restricted to the 800 most probably activated sources, ensuring with a confidence of 99% that the five activated dipoles belonged to the solution space. For two activated extended sources, MSP allowed us to restrict the solution space down to 1450 sources, ensuring with a confidence of 91% (resp. 77%) that at least one dipole (resp. two) of each simulated cluster was included in the solution space. Then, in order to decompose the cortical surface into elementary regions, we aggregated neighboring dipoles. To this end, we considered all the dipoles in decreasing order of their prior activation probability p_i. Each dipole i was
clustered with its eight nearest neighbors, provided that the difference in activation probability between the considered dipole and each of its neighbors was less than 5%. We thus obtained a set of structurally and functionally coherent brain areas whose size ranged from 1 (a single dipole) to 9 (a dipole and all its nearest neighbors). Finally, for each data set, LEA and WMN were applied at the single time sample corresponding to the signal peak. For both approaches, the square weighting matrix W was defined as a diagonal matrix with elements w_ii = 1 − p_i: the higher the probability p_i, the higher the chance of estimating a high amplitude for dipole i. For LEA and WMN, we calculated a single hyperparameter value using the "L-curve" approach [11]; λ and µ were thus set equal and remained unchanged. The focalization threshold ∆_T was set to 1% of the maximum residual gap calculated along the iterative process.
3.3 Evaluation Criteria
We applied two quantitative criteria for evaluating and comparing LEA and WMN source reconstructions [12]:
– the Localization Error (LE), defined as the distance between the center of a simulated cluster and the dipole of maximum estimated amplitude in the corresponding reconstructed source;
– the Root Mean Square Error (RMSE), the L2-norm of the difference between the scaled distributions J̄ and J̄_ref, where J̄ = Ĵ/max(Ĵ) is an estimated amplitude distribution (J_LEA or J_WMN) at a particular time sample, scaled by its maximum, and J̄_ref is the corresponding normalized simulated amplitude distribution.

The closer to zero the LE value, the more precise the localization. However, in order to compare two methods (M1 and M2) using these criteria, one should note that if RMSE_M1 ≤ RMSE_M2, one still requires LE_M1 ≤ LE_M2 to conclude that M1 gives a better amplitude estimation than M2 [12].
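Both criteria are easy to state as hypothetical helpers (Euclidean distances between dipole positions and normalization by the maximum amplitude, as in the text):

```python
import numpy as np

def localization_error(pos, j_est, center):
    """Distance from the simulated cluster center to the dipole of maximum
    estimated amplitude; pos is a (q, 3) array of dipole positions."""
    return np.linalg.norm(pos[np.argmax(np.abs(j_est))] - center)

def rmse(j_est, j_ref):
    """L2-norm of the difference of max-scaled amplitude distributions."""
    jn = j_est / np.abs(j_est).max()
    jr = j_ref / np.abs(j_ref).max()
    return np.linalg.norm(jn - jr)
```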
3.4 Results
One extended activated source. Both LEA and WMN were applied to the restricted solution space given by MSP. While the WMN estimation was instantaneous, each LEA estimation took about 90 s on a 2 GHz processor. For each of the 500 simulated data sets, the focalization process of LEA selected two clusters, leading to an over-determined system. The LEA solution was thus much more focal than the smooth WMN one. Table 1 presents the mean values of the RMSE and the maximum LE value allowing recovery of the simulated source in at least 80% of the configurations. These quantitative results demonstrate both a much better localization and a better estimation with LEA than with a conventional WMN.
Table 1. Compared results of the LEA and WMN estimations: mean values of the RMSE and maximum LE value allowing recovery of the simulated source in at least 80% of the configurations.

       LE (mm)   RMSE
LEA    11        2.3
WMN    81        4.7
Two extended activated sources. We present here an example of two extended activated sources (S1 and S2), one of them (namely S2) having weak prior activation probabilities as estimated by MSP. Table 2 gives the ranks of the dipoles belonging to S1 and S2 among the 7,081 initial ones, according to these activation probabilities. While the five dipoles of S1 are among the 1450 dipoles most likely to be activated, only two dipoles of source S2 are selected. Moreover, the ranks of these two dipoles are much higher than those of the dipoles belonging to source S1, whose activation probability is suitably estimated.

Table 2. Respective ranks of the simulated dipoles according to the prior activation probabilities estimated by MSP.

source S1: (4, 5, 6, 7, 9)
source S2: (198, 487, 4298, 5403, 6067)
While the WMN estimation was still instantaneous, the LEA process took about 5 min 30 s; indeed, a larger number of dipoles dramatically increases the computation time. Figures 1 and 2 show the two simulated sources, the 1450 dipoles selected by MSP and the two patterns reconstructed using LEA. Table 3 gives the LE and RMSE values obtained with LEA and WMN, respectively. The LEA solution is again much more focal than the WMN solution, which smooths the activity over all 1450 dipoles. Moreover, even though the prior activation probability of S2 was very weak, LEA did find a second activated source close to it. Finally, for both simulated sources, the quantitative results demonstrate that LEA performs much better than WMN in terms of localization and amplitude estimation.
4 Discussion
The outcome of the distributed source model provoked a renewal of interest in solving the ill-posed EEG/MEG source reconstruction problem. Indeed, this framework enables the explicit introduction of several constraints of different types. However, this model suffers from a lack of determination. To overcome
Table 3. Two simulated sources: compared LE and RMSE associated with the LEA and WMN solutions.

       LE (mm)           RMSE
LEA    S1: 13, S2: 36    3.8
WMN    S1: 81, S2: 75    5.3
Fig. 1. Rear view: (a) position of simulated source S1 , (b) location of the 1450 dipoles considered initially and (c) pattern closest to S1 reconstructed by using LEA.
this drawback of under-determination, one may attempt to reduce the solution space by focusing on a priori activated regions. We therefore need accurate and reliable constraints. Recent efforts have been made to introduce fMRI priors in order to constrain the EEG/MEG inverse problem [7]. Nevertheless, even if fMRI and EEG/MEG may reveal common patterns of activity [13], important differences between the physiological and physical processes underlying these neuroimaging techniques urge caution [14]. This is one reason why we may need a supervised prior-based algorithm, able to balance the influence of potentially biased priors according to their consistency with the observed EEG/MEG data. The method we propose consists of an iterative focalization process, which then leads to a well-determined estimation of the amplitudes of the few sources that are most likely to be activated. As shown by the results on simulated data, the obtained solution is thus focal and much easier to interpret than the one given by a classical weighted minimum norm approach. As described, LEA exploits a prior distribution of activation probability. We chose to derive this distribution from the EEG/MEG measurements, as performed by the MSP approach, so that the whole reconstruction process relies only on the EEG/MEG data themselves. Moreover, further priors, such as those de-
Fig. 2. Front view: (a) position of simulated source S2 , (b) location of the 1450 dipoles considered initially and (c) pattern closest to S2 reconstructed by using LEA.
rived from fMRI activation maps, could be introduced as soft constraints within an additional regularization term in Eq. (6). LEA has been compared to a classical WMN involving the same priors and applied to the same restricted solution space, according to the prior distribution of activation probability given by MSP. For both the Monte Carlo simulations and the example of one pair of activated sources, the results indicated a much better localization and estimation with LEA than with WMN. The latter is the simplest and most popular distributed approach and is known to smooth the activity over all the sources of the model. Even after a substantial reduction of the solution space, the WMN solution may lead to a huge mislocation of the activation spots. Conversely, LEA leads to a much more precise and much more focal localization of the activated patterns. Since the localization is better and detects only a very few activated sources, the estimation as evaluated by the RMSE criterion is logically also improved. However, the more activated sources there are, the harder the source reconstruction. Although LEA did prove better than WMN at balancing a biased activation prior over a true active source, the localization error of source S2 in our example was still greater than 3 cm. Such a result has to be improved, and this study has pointed out several possibilities. Indeed, the number of activated sources selected by the iterative focalization process strongly depends on the arbitrary threshold value ∆_T. This selection criterion could profitably be replaced by a measure of the mutual information between the observed data and the part of the signal explained by a given elementary region. Such a measure might then be thresholded under objective statistical assumptions.
We also simply fixed the hyperparameters to a common and constant value. Recent approaches based upon Expectation Maximization algorithms [15] enable both the optimal estimation of these hyperparameters and the estimation of the associated solution J_LEA. Moreover, the way we cluster neighboring dipoles in order to generate elementary functional regions might be crucial. Indeed, neighboring dipoles are defined using the Euclidean distance. Consequently, two dipoles may be considered close neighbors while they are actually several centimeters apart when following the convolutions of the cortical surface. Using a better definition of cortical distances might lead to more consistent functional regions and to a better selection of activated patterns. Finally, the drawback of LEA is its computational cost, which dramatically increases with the number of successive clusters one has to take into account during the iterative localization procedure. An optimal trade-off has to be found between the optimal size of these clusters and the total number of initial sources one has to consider, given the prior probability distribution p. All these issues are currently under investigation in order to fully exploit the wide opportunities offered by this original approach and to further improve these promising results.

Acknowledgments. Jérémie Mattout is supported by a European Marie Curie Grant.
References

1. Baillet, S., Mosher, J.C., Leahy, R.M.: Electromagnetic brain mapping. IEEE Sign. Proc. Mag. 18 (2001) 14–30
2. Koles, Z.J.: Trends in EEG source localization. Electroenceph. Clin. Neurophysiol. 106 (1998) 219–230
3. Dale, A.M., Sereno, M.: Improved localization of cortical activity by combining EEG and MEG with MRI surface reconstruction: a linear approach. J. Cognit. Neurosci. 5 (1993) 162–176
4. Pascual-Marqui, R.D.: Review of methods for solving the EEG inverse problem. Int. J. Bioelectromagnetism (http://www.tut.fi/ijbem) 1 (1999) 75–86
5. Baillet, S., Garnero, L.: A Bayesian approach to introducing anatomo-functional priors in the EEG/MEG inverse problem. IEEE Trans. Biomed. Eng. 44 (1997) 374–385
6. Ermer, J.J., Mosher, J.C., Baillet, S., Leahy, R.M.: Rapidly recomputable EEG forward models for realistic head shapes. Phys. Med. Biol. 46 (2001) 1265–1281
7. Liu, A.K., Belliveau, J.W., Dale, A.M.: Spatiotemporal imaging of human brain activity using functional MRI constrained magnetoencephalography data: Monte Carlo simulations. Proc. Natl. Acad. Sci. U.S.A. 95 (1998) 8945–8950
8. Mattout, J., Pélégrini-Issac, M., Garnero, L., Benali, H.: Multivariate source localization approach for MEG/EEG inverse problem. Neuroimage 13 (2001) S196
9. Mattout, J., Pélégrini-Issac, M., Garnero, L., Benali, H.: Statistical method for source localization in MEG/EEG tomographic reconstruction. Proceedings of the IEEE International Conference on Image Processing 1 (2001) 714–717
10. Sarvas, J.: Basic mathematical and electromagnetic concepts of the biomagnetic inverse problem. Phys. Med. Biol. 32 (1987) 11–22
11. Gorodnitsky, I.F., George, J.S., Rao, B.D.: Neuromagnetic source imaging with FOCUSS: a recursive weighted minimum norm algorithm. Electroenceph. Clin. Neurophysiol. 95 (1995) 231–251
12. Phillips, C., Rugg, M.D., Friston, K.J.: Anatomically informed basis functions for EEG source localization: combining functional and anatomical constraints. Neuroimage 16 (2002) 678–695
13. Logothetis, N.K., Pauls, J., Augath, M., Trinath, T., Oeltermann, A.: Neurophysiological investigation of the basis of the fMRI signal. Nature 412 (2001) 150–157
14. Nunez, P.L., Silberstein, R.B.: On the relationship of synaptic activity to macroscopic measurements: does co-registration of EEG with fMRI make sense? Brain Topogr. 13 (2000) 79–96
15. Phillips, C., Rugg, M.D., Friston, K.J.: Systematic regularisation of linear inverse solutions of the EEG source localisation problem. Neuroimage 17 (2002) 287–301
Multivariate Group Effect Analysis in Functional Magnetic Resonance Imaging

Habib Benali¹,⁴, Jérémie Mattout¹,²,⁴, and Mélanie Pélégrini-Issac³,⁴

1 Inserm U494, Paris, France
2 Institute of Cognitive Neuroscience and Functional Imaging Laboratory, London, United Kingdom
3 Inserm U483, Paris, France
4 IFR49 de Neuroimagerie Fonctionnelle, Orsay, France
Abstract. In functional MRI (fMRI), analysis of multisubject data typically involves spatially normalizing (i.e., co-registering in a common standard space) all data sets and summarizing the results in a single group activation map. This widely used approach does not explicitly account for between-subject anatomo-functional variability. We therefore propose a group effect analysis method which makes use of a multivariate model to select the main signal variations that are common to all subjects, while allowing final statistical inference on the individual scale. The normalization step is thus avoided and individual anatomo-functional features are preserved. The approach is evaluated using simulated data, and it is shown that sensitivity is drastically improved compared to a more conventional individual analysis.
1 Introduction
This paper concerns the analysis of multisubject functional Magnetic Resonance Imaging (fMRI) experiments. Several statistical parametric approaches based on the univariate general linear model (GLM) have been proposed to analyze multisubject experiments, e.g. [1,2,3,4,5], which can be divided into two main categories. On the one hand, random- and fixed-effect methods process the statistical parametric map (or activation map) corresponding to each subject and test the null hypothesis that there is no activation in the population mean activation map, defined as the mean of all individual statistical maps [3]. The fixed-effect approach considers the inter-scan variance (within-subject error) as the only source of variability when estimating the population mean, whereas the random-effect approach also takes the between-subject error into account. On the other hand, conjunction analysis [4] and variance ratio analysis [5] have been proposed to ensure statistical robustness when very small samples are considered. However, all these methods assume that the spatial distribution of activation areas is the same for all subjects, whose data must be spatially normalized, i.e. co-registered in a common standard space, typically that of Talairach and Tournoux [6]. Therefore, none of these approaches explicitly accounts for between-subject anatomo-functional variability, and co-registration errors are confounded with the between- and within-subject variations.
More recently, methods based on Independent Component Analysis (ICA) have been proposed [7,8]. These techniques extract a single set of time courses (so-called sources) common to the whole group, accompanied by a set of individual spatial response patterns for the subjects in the group. One interesting feature of ICA-based analysis is that it can be applied to a group without any normalization step, as in [8]. However, it remains an exploratory technique in that, as opposed to the aforementioned GLM-based approaches, no model is assumed a priori to decompose the observed BOLD signal in terms of expected effects. Therefore, a quantitative comparison of the resulting independent images and the more conventional activation maps obtained using statistical parametric approaches is difficult.

In this paper, we propose a hybrid multivariate method to analyze multisubject data. The proposed approach, only briefly introduced in [9] and detailed in Sect. 2, focuses on group effects in that it specifically selects the signal variations which are common to all subjects, but allows us to infer statistical parametric maps on the individual scale. No preliminary spatial normalization of the subjects is required, thus avoiding co-registration errors of the functional runs and yielding final activation maps which are specific to the individual anatomy. The method is multivariate in that, like ICA-based analyses, it deals simultaneously with both the spatial and temporal variations of activation. It is also hybrid between exploratory methods and conventional model-driven approaches. Indeed, the generalized fixed-effect model [10] ("exploratory" step) allows us to extract the temporal variations which are common to the whole group, independently of any activation model. Then, we propose an extension of the multivariate GLM (first presented in [11]) to select, among these common time series, those which are most correlated to the expected activation effects, using an a priori model of the BOLD response to the experimental paradigm. For the first time, the proposed group effect analysis method is thoroughly evaluated on a set of simulated fMRI data (Sect. 3) and discussed in Sect. 4.
2 Theory

2.1 Common Time Series in Multisubject fMRI Data
Notations. Denote by Y the (T, N) matrix of fMRI data collected for S subjects participating in the same protocol, with T the number of time samples (number of scans) for each subject and N the total number of voxels for all subjects. Y is thus a "meta-subject", a set of N time series which are usually preprocessed (typically, corrected for between-scan motion). Denote by X the (T, P) design matrix, where each of the P columns of X is called a regressor, which is either determined by the experimental design (a regressor of interest, typically a stimulus function which may have been convolved with a hypothesized hemodynamic response function) or represents confounds.
reasonable to assume that the S data sets share a set of basis temporal components. Then, the acquired time series can be decomposed as follows:

Y = Z + R + ε ,    (1)
where Z (fixed-effect component) represents a set of time series common to all subjects, expected to be mostly induced by the experimental stimulus but also by unknown common physiological processes. R is a random-effect component considered as the between-subject error, i.e. temporal variations specific to a given subject. ε is the sampling error, or within-subject error. To identify the fixed-effect component Z or, more generally, the basis of time series underlying the fixed-effect part of the data, the generalized fixed-effect model is used.

Generalized fixed-effect model. The generalized fixed-effect model [10] is a multivariate statistical method which assumes that the set of the time series Y can be written as the sum of the fixed-effect component Z defined in (1) and a random-effect component E which is independent of (i.e., orthogonal to) Z:

Y = Z + E .    (2)
E is the sum of the between-subject error term R and the within-subject error term ε, defined in (1). Model (2) makes the following assumptions:

1. Y is an unbiased estimator of Z: E[Y] = Z (i.e., E[R] = E[ε] = 0), where E[.] denotes mathematical expectation.
2. The covariance var[vec(E)] of the vectorized errors E is separable in time and space:

   var[vec(E)] = Θ ⊗ Υ ,    (3)

   where ⊗ denotes the Kronecker product and Θ (resp. Υ) is the T × T (resp. N × N) symmetric positive matrix of temporal (resp. spatial) covariance of E. Both Θ and Υ are assumed to be known.
3. There exists a Q-dimensional linear subspace S_Q of the data space (Q < T), such that Z belongs to S_Q.

Solving (2) consists of estimating the fixed-effect component Z or, equivalently, the subspace S_Q (indeed, once S_Q is determined, Z is directly obtained as the projection of Y onto S_Q). It can be shown [10,12] that S_Q is spanned by the Q eigenvectors u_1, …, u_Q associated with the Q largest eigenvalues of the matrix C = Y Υ⁻¹ Yᵀ Θ⁻¹, where "ᵀ" denotes the transpose. The eigenvectors can be obtained using Singular Value Decomposition, provided the metrics Υ⁻¹ and Θ⁻¹, which are not known a priori, can be estimated.

Spatial and temporal covariance matrices. In multisubject studies, spatial correlations between voxels are negligible compared with random between-subject variations [4]. As subjects can be considered independent from each other, Υ can be written as a block diagonal matrix, where each block is an N_s × N_s
matrix of the form (σ_s² + σ_ε²) I_{N_s} (with N_s the number of voxels for subject s and I_n the n × n identity matrix). σ_s² is the random between-subject variance, σ_ε² is the scan-to-scan variance assumed to be constant across subjects, and σ_s² + σ_ε² = trace(var[Y_s]), where Y_s is the T × N_s matrix of data for subject s. As R is orthogonal to ε, Θ can be decomposed as the sum of a random-effect covariance matrix Θ_R and an error covariance matrix Θ_ε. Statistical studies have shown that sampling noise in fMRI is uncorrelated in the time domain [13], which implies Θ_ε = σ₁² I_T (σ₁²: within-subject variance) and suggests that temporal variability is mainly induced by the random-effect components. To a first approximation, we assume that temporal correlations are negligible compared with the random between-subject variability, which implies Θ_R = σ₂² I_T (σ₂²: between-subject variance). It is reasonable to consider that both σ₁ and σ₂ are constant across subjects. Since we further consider orthonormal eigenvectors u_i, the eigendecomposition of matrix C is finally performed using Θ⁻¹ = I_T.

Optimal number of basis time series. A preliminary step consists of determining, among the T eigenvectors u_i, i = 1, …, T resulting from the eigendecomposition of C, an optimal number Q of time series u_k, k = 1, …, Q representing the fixed-effect Z of (2). Velicer's criterion V(ℓ) [14], which measures the strength of the linear relationship between scans i and j given the first ℓ factors, can be used for determining Q: V(Q) = min_{1≤ℓ≤T} V(ℓ), with

V(ℓ) = [1 / (T(T − 1))] Σ_{i,j=1, i≠j}^{T} r_ℓ²(i, j) ,    (4)
where r_ℓ²(i, j) is the correlation between scan i and scan j of the residual errors Y − Ẑ_ℓ, after removing the effect of the first ℓ time series. Ẑ_ℓ is obtained by projecting Y onto the first ℓ eigenvectors:

Ẑ_ℓ = U Λ Vᵀ ,    (5)
where U is the T × ℓ matrix of selected eigenvectors u_k, k = 1, …, ℓ, Λ is the ℓ × ℓ diagonal matrix whose diagonal elements are the square roots of the eigenvalues λ_k of the covariance matrix C, and V = Yᵀ Θ⁻¹ U Λ⁻¹ is the N × ℓ matrix of associated eigenimages v_k. Now, we wish to focus only on the Q₀ < Q eigenvectors which are most correlated to the regressors of interest described in model X, while discarding the eigenvectors which are likely to reflect non-relevant physiological variations. Since eigenvectors are mutually orthogonal, the multiple correlation coefficient [15]

R²(u_k) = [u_kᵀ X (Xᵀ X)⁻¹ Xᵀ u_k] / [u_kᵀ u_k]    (6)

can be used to quantify how much eigenvector u_k correlates with the a priori activation model. The Q₀ eigenvectors whose R² is significantly different
from the mean of the R² values associated with the remaining T − Q eigenvectors (which represent the random-effect component) are selected using Student's t test (p < 0.05).
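To make this selection pipeline concrete, here is a minimal NumPy sketch of Sect. 2.1 under the simplifications derived above (Θ⁻¹ proportional to I_T, per-subject Υ scaling omitted); the function names are ours, and a simple z-style cutoff stands in for the Student t test:

```python
import numpy as np

def select_common_time_series(Y, X, z=2.0):
    """Sketch of Sect. 2.1: Y is the T x N 'meta-subject' data matrix,
    X the T x P design matrix.  With Theta^-1 proportional to I_T, the
    eigenvectors of C are obtained from the SVD of Y (up to scaling)."""
    T = Y.shape[0]
    U, s, _ = np.linalg.svd(Y, full_matrices=False)   # u_k = columns of U

    def velicer(l):
        # V(l): mean squared scan-to-scan correlation of Y - Z_l  (Eq. 4)
        R = Y - U[:, :l] @ (U[:, :l].T @ Y)
        r = np.corrcoef(R)
        off = ~np.eye(T, dtype=bool)
        return np.mean(r[off] ** 2)

    Vl = [velicer(l) for l in range(1, T)]
    Q = int(np.argmin(Vl)) + 1                        # V(Q) = min V(l)

    # Multiple correlation of each eigenvector with the design  (Eq. 6)
    P_X = X @ np.linalg.solve(X.T @ X, X.T)           # projector on span(X)
    R2 = np.array([u @ P_X @ u / (u @ u) for u in U.T])

    # The paper tests the first Q values against the remaining T - Q ones
    # with a Student t test (p < 0.05); a z-style threshold keeps this short.
    rest = R2[Q:]
    keep = np.where(R2[:Q] > rest.mean() + z * rest.std())[0]
    return U[:, keep], s[keep], keep
```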
2.2 Statistical Inference on the Individual Scale
Multivariate general linear model for multisubject analysis. The Q₀ time series determined by solving (2) are an estimation of the temporal variations common to all subjects of the group. Now, they can be seen as a random observation of the temporal variations shared by the whole population this particular sample of subjects was drawn from. Therefore, it makes sense to apply a multivariate GLM to this set of time series. This provides parameters of the experimental model X at the group level. Let us assume that the errors are restricted to the time series u_k, k = 1, …, Q₀, i.e., there are no errors in X. Stacking U = (u_1ᵀ, …, u_{Q₀}ᵀ)ᵀ, B = (β_1ᵀ, …, β_{Q₀}ᵀ)ᵀ, and Φ = (φ_1ᵀ, …, φ_{Q₀}ᵀ)ᵀ, so that each u_k satisfies u_k = X β_k + φ_k, the multivariate GLM consists of

U = (I_{Q₀} ⊗ X) B + Φ ,    (7)

where Φ is an i.i.d. multidimensional stationary random process with E[Φ] = 0 and var[Φ] = σ_Φ² I_{TQ₀} (σ_Φ²: variance corresponding to the errors). Solving (7) consists of estimating the parameters B by using ordinary least squares:

B̂ = [(I_{Q₀} ⊗ X)ᵀ (I_{Q₀} ⊗ X)]⁻¹ (I_{Q₀} ⊗ X)ᵀ U .    (8)
Regression coefficients at the voxel level. From (5) and (7), the multivariate GLM can be written at the voxel level:

Ẑ = (I_{Q₀} ⊗ X) Γ + Ψ ,

where Γ = B Λ Vᵀ and Ψ = Φ Λ Vᵀ. The parameters associated with each subject, at the voxel level, are thus derived as follows:

Γ̂ = B̂ Λ Vᵀ .    (9)
Statistical Parametric Map on the individual scale. Finally, to assess activation effects, F or T statistical tests may be used on a voxelwise basis to determine whether q ≤ P regression coefficients contribute significantly to predict signal Z. The underlying null hypothesis of no activation follows the general form H₀: A_q Γ = 0, where A_q is a known q × NT "contrast" matrix of rank q.
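Because I_{Q₀} ⊗ X is block diagonal, the OLS problem (8) decouples into one ordinary GLM fit per retained eigenvector, and (9) then maps the group-level coefficients back to every voxel. A minimal sketch (names are ours, not from the paper):

```python
import numpy as np

def individual_parameter_maps(U0, lam, V0, X):
    """U0: T x Q0 selected eigenvectors; lam: their singular values
    (square roots of the eigenvalues of C); V0: N x Q0 eigenimages;
    X: T x P design matrix."""
    B_hat = np.linalg.solve(X.T @ X, X.T @ U0)   # Eq. (8): one OLS per u_k
    Gamma_hat = B_hat @ np.diag(lam) @ V0.T      # Eq. (9): P x N voxel maps
    return B_hat, Gamma_hat
```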
3 Evaluation on Simulated fMRI Data

3.1 Materials and Methods
Simulated fMRI data. In a different setting (e.g. [8]), a blocked-design fMRI experiment using a visual checkerboard stimulation was run using an EPI protocol on a Bruker 3 T Medspec 100 system (64×64 matrix, resolution 3.8×3.8 mm, 5 mm slice thickness, 2 mm gap, TR = 1333 ms). From four subjects, a one-slice-thick patch was selected in which no significant activation was detected using standard methods. Each patch consisted of 20×20 voxels and T = 240 time steps. In each of the four quadrants of a given patch, a 5×5-voxel area was randomly chosen to be an "activated area" (Fig. 1). BOLD signal was simulated in each voxel i of this area as follows:

y_i(t) = b_i(t) + α_i h_i(t) + η_i(t) .    (10)
b_i(t), t = 1, …, T, is the background (i.e., measured) signal in voxel i. α_i controls the signal-to-noise ratio (SNR) as follows:

α_i = κ √(var[b_i(t)]) ,    (11)

where κ was set to 0.1 in one of the four areas and to a random value between κ = 0.025 and κ = 0.05 in the other three areas for a given data set (Fig. 1). h_i(t) is the hemodynamic signal simulated by convolving a box-car waveform (10 periods of 16 seconds "ON" state and 16 seconds "OFF" state) and a hemodynamic kernel obtained by randomly modifying the parameters of the prototypical hemodynamic response function of the SPM99 software package¹. η_i is an additive gaussian noise of variance 0.1 var[b_i(t)].

Data processing. All data were high-pass filtered (cut-off period: 80 seconds). On the one hand, conventional individual statistical analysis was performed for each data set by using the univariate linear model implemented in SPM99. Activation model X comprised a single regressor of interest x₁ consisting of the box-car waveform convolved by the prototypical hemodynamic response function of the package. Statistical inference about the parameter β₁ associated with this regressor was performed and the corresponding null hypothesis (H₀: β₁ = 0) was tested by using T-statistics, which yielded individual statistical parametric maps SPM{T}. On the other hand, the proposed multivariate group effect analysis was conducted on the four data sets simultaneously using the same activation model X. Statistical inference was led on the individual scale by using T-statistics, which yielded SPM{T}s.

Evaluation of results. As ground truth was available, Receiver-Operating Characteristic (ROC) curves [16] were used to compare individual and multivariate group effect analyses.

¹ http://www.fil.ion.ucl.ac.uk/spm/spm99.html
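A hedged sketch of this voxel simulation, Eqs. (10)–(11): the hemodynamic kernel is taken as an input, since the paper perturbs SPM99's canonical response in a way not reproduced here, and the TR value and random seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_voxel(b, kappa, hrf_kernel, tr=1.333):
    """b: measured background time course b_i(t), length T;
    kappa: SNR coefficient; hrf_kernel: sampled hemodynamic kernel."""
    T = b.size
    period = int(round(32.0 / tr))                  # 16 s ON + 16 s OFF
    boxcar = ((np.arange(T) % period) < period // 2).astype(float)
    h = np.convolve(boxcar, hrf_kernel)[:T]         # simulated BOLD shape
    alpha = kappa * np.sqrt(b.var())                # Eq. (11)
    eta = rng.normal(0.0, np.sqrt(0.1 * b.var()), T)  # variance 0.1 var[b]
    return b + alpha * h + eta                      # Eq. (10)
```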
Fig. 1. Location of simulated activation clusters and κ value for each data set
Fig. 2. Simulated hemodynamic response functions for first (solid line), second (dashed line), third (dotted line) and fourth (dot-dashed line) activated area
For a given T threshold, the true positive ratio TPR (resp. the false positive ratio FPR) or, equivalently, the sensitivity (resp. one minus the specificity) of the analysis, can be defined as the number of voxel locations correctly declared (resp. falsely declared) to be activated, divided by the total number of voxels. The ROC curve for a given analysis method was thus constructed by plotting TPR values versus FPR values, considering each T value in the corresponding SPM{T} as a threshold. The performances of both analysis methods in terms of activation detection were compared by using a z-score to test whether their areas under the ROC curve were significantly different. To further investigate the practical value of the results obtained from the multivariate analysis, SPM{T}s were also thresholded by using the conventional correction for multiple comparisons implemented in SPM99 and the TPR was plotted versus different threshold values ranging from p < 10⁻⁸ to p < 0.075.
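The ROC construction just described can be sketched directly; note that, following the text, both ratios are normalized by the total voxel count rather than by the class sizes:

```python
import numpy as np

def roc_points(t_map, truth):
    """t_map: voxelwise T statistics (SPM{T}); truth: boolean
    ground-truth activation mask of the same shape."""
    t, g = t_map.ravel(), truth.ravel().astype(bool)
    n = g.size
    fpr, tpr = [], []
    for thr in np.sort(np.unique(t))[::-1]:   # each T value as a threshold
        det = t >= thr
        tpr.append(np.count_nonzero(det & g) / n)
        fpr.append(np.count_nonzero(det & ~g) / n)
    return np.asarray(fpr), np.asarray(tpr)

# Areas under the two curves can then be compared with a z test, e.g.:
# fpr, tpr = roc_points(t_map, truth); auc = np.trapz(tpr, fpr)
```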
Results
When using the multivariate method, Q0 = 9 basis eigenvectors were retained from the Q = 14 eigenvectors selected by Velicer’s criterion. Figure 3 shows the ROC curves obtained for both analyses. On this voxelwise thresholding basis, both methods proved equally performant in terms of detection (comparison of areas under ROC curves yielded a non-significant z-test regardless of the data set). Nevertheless, when thresholding the maps by using conventional correction for multiple comparisons, the multivariate group effect analysis method proved drastically more sensitive, on the individual scale, than the univariate individual analysis (see Fig. 4). This is illustrated on Fig. 5, which shows SPM{T }s thresholded with a widely used value of p < 0.05. The improvement in sensitivity obtained by using the multivariate method is clear, since activation was detected in nearly all areas, whereas the individual analysis failed to detect activation in areas whose SNR coefficient κ was lower than 0.1. This improvement was achieved at the expense of specificity (true negative ratio), since more voxels were incorrectly classified as activated with the multivariate method. However, the accuracy (proportion of voxels correctly classified, either activated or not) of the proposed approach was significantly higher than that of the univariate analysis (see Fig. 6).
4 Discussion and Conclusion
In this work, we have proposed a new method for dealing with multisubject data analysis in fMRI, while accounting for within- and between-subject variability, yielding quantifiable results and avoiding co-registration errors since no normalization step is required. The method involves two model-guided steps. The first step determines a set of basis time series common to all subjects by using the multivariate fixed-effect model. Although this step could be configured as the widely used Principal Component Analysis (PCA), and would thus be a mere exploratory method, the multivariate fixed-effect model is more general in that it allows for the use of prior information, through the matrices Υ and Θ in the spatiotemporal domain, to avoid certain undesired (irrelevant) data subspaces.
Fig. 3. ROC curves for individual analysis (dashed line) and multivariate group effect analysis (solid line)
This makes it possible to account for within- and between-subject variability, as opposed to [7], where PCA was used only to reduce the amount of data as a preliminary step to an ICA approach. Incidentally, we considered a particular block-diagonal spatial matrix Υ, following previous findings that the scan-to-scan variability is negligible compared with the subject response variability [4]. Scan-to-scan variability may be accounted for by using an isotropic spatial covariance matrix (e.g. [17]).

The second step estimates individual activation maps by solving the multivariate linear model using the fixed-effect estimates. Indeed, while some of the common components reflect the variations of interest (i.e., related to activation model X), others may reflect physiological variations or other confounds. Therefore, the multivariate regression analysis and the reconstruction formula (5) can be used to derive the statistical parametric map for each subject. Solving the multivariate GLM consists of estimating the parameters B using ordinary least squares, which can be achieved without extra computational cost compared to univariate analysis. The resulting individual activation maps can be processed (thresholded directly, or submitted to a group analysis method, e.g. [5]) just as conventional statistical parametric maps, which allows direct comparison with
Fig. 4. Sensitivity (thresholds corrected for multiple comparisons) for individual analysis (squares) and multivariate group effect analysis (diamonds). In all cases, sensitivity of multivariate group effect analysis was significantly higher than that of individual analysis (Mann-Whitney U test, p < 0.00001)
other methods. This is another key aspect of our method, compared with the ICA approach where, to our knowledge, no asymptotic theory is available for quantifying activation areas, allowing only qualitative comparisons.

ROC analysis of results obtained on simulated data shows that the proposed approach performs as efficiently as a conventional univariate GLM, on a voxelwise basis. More interestingly, when multiple comparisons are accounted for when thresholding individual SPMs, the multivariate approach yields drastically enhanced sensitivity. Indeed, only the time series that are most correlated to the prior activation model are selected, by Velicer's criterion and by the analysis of the R² coefficient performed on the estimated fixed-effect component, and then submitted to the multivariate GLM.
Fig. 5. For data sets #1 (left) to #4 (right): activation maps (SPM{T}, p < 0.05, corrected for multiple comparisons) for individual analysis (top row) and multivariate group effect analysis (bottom row). Gray-scaled T-values range: 0 (black) to 25 (white)
Fig. 6. For data set #1, specificity and accuracy (thresholds corrected for multiple comparisons) for individual analysis (squares) and multivariate group effect analysis (diamonds). Specificity (resp. accuracy) of multivariate group effect analysis was significantly lower (resp. higher) than that of individual analysis (Mann-Whitney U test, p < 0.00001) for all data sets (other results not shown)
We believe that this multivariate approach could help reconcile the necessity for simultaneously analyzing different data sets to exhibit temporal activation features shared by the group, and the understandable desire for obtaining results on the individual scale to preserve between-subject anatomical variability and individual characteristics.

Acknowledgments. The authors are very grateful to Dr. F. Kruggel (Department of Computer Science, University of Leipzig, Leipzig, Germany) for providing us with the visual fMRI data. J. Mattout is supported by a European Marie Curie Grant.
References
1. Büchel, C., Turner, R., Friston, K.: Lateral geniculate activations can be detected using intersubject averaging and fMRI. Magn. Reson. Med. 38 (1997) 691–694
2. Price, C.J., Friston, K.: Cognitive conjunction: A new approach to brain activation experiments. NeuroImage 5 (1997) 261–270
3. Holmes, A.P., Friston, K.J.: Generalisability, random effects & population inference. NeuroImage 7 (1998) S754
4. Friston, K.J., Holmes, A.P., Price, C.J., Büchel, C., Worsley, K.J.: Multisubject fMRI studies and conjunction analysis. NeuroImage 10 (1999) 385–396
5. Worsley, K.J., Liao, C.H., Aston, J., Petre, V., Duncan, G.H., Morales, F., Evans, A.C.: A general statistical analysis for fMRI data. NeuroImage 15 (2002) 1–15
6. Talairach, J., Tournoux, P.: Co-planar Stereotaxic Atlas of the Human Brain. 3-Dimensional Proportional System: An Approach to Cerebral Imaging. Thieme, New York (1988)
7. Calhoun, V.D., Adali, T., Pearlson, G.D., Pekar, J.J.: A method for making group inferences from functional MRI data using Independent Component Analysis. Hum. Brain Mapp. 14 (2001) 140–151
8. Svensén, M., Kruggel, F., Benali, H.: ICA of fMRI group study data. NeuroImage 16 (2002) 551–563
9. Benali, H., Mattout, J., Pélégrini-Issac, M., Meusburger, F., Derpierre, O., Kherif, F., Poline, J.B., Burnod, Y.: Hierarchical multivariate group analysis of functional MRI data. Proceedings of the IEEE International Symposium on Biomedical Imaging, ISBI'02 (2002) 843–846
10. Caussinus, H.: Models and uses of principal components analysis. In de Leeuw, J., ed.: Multidimensional Data Analysis. DSWO Press, Leiden (1986) 149–178
11. Mattout, J., Pélégrini-Issac, M., Garnero, L., Burnod, Y., Benali, H.: Multivariate PCA-based regression analysis of fMRI time series. NeuroImage 11 (2000) S586
12. Fine, J., Pousse, A.: Asymptotic study of the multivariate functional model. Application to metric choice in principal component analysis. Statistics 23 (1992) 63–83
13. Sijbers, J., den Dekker, A.J., Van Audekerke, J., Verhoye, M., Van Dyck, D.: Estimation of the noise in magnitude MR images. Magn. Reson. Imaging 16 (1998) 87–90
14. Velicer, W.F.: Determining the number of components from the matrix of partial correlations. Psychometrika 41 (1976) 321–327
15. Mardia, K.V., Kent, J.T., Bibby, J.M.: Multivariate Analysis. Academic Press, London (1979)
16. Metz, C.E.: Basic principles of ROC analysis. Semin. Nucl. Med. 8 (1978) 283–298
17. Kruggel, F., Zysset, S., von Cramon, D.Y.: Nonlinear regression of functional MRI data: an item-recognition task study. NeuroImage 11 (2000) 173–183
Meshfree Representation and Computation: Applications to Cardiac Motion Analysis

Huafeng Liu¹,² and Pengcheng Shi¹

¹ Department of Electrical and Electronic Engineering, Hong Kong University of Science and Technology, Hong Kong
² State Key Laboratory of Modern Optical Instrumentation, Zhejiang University, China
{eeliuhf,eeship}@ust.hk
Abstract. For medical image analysis problems where the domain mappings between images involve large geometrical shape changes, such as the cases of nonrigid motion recovery and inter-object image registration, the finite element methods exhibit considerable loss of accuracy when the elements in the mesh become extremely skewed or compressed. Therefore, algorithmically difficult and computationally expensive remeshing procedures must be performed in order to alleviate the problem. We present a general representation and computation framework which is purely based on the sampling nodal points and does not require the construction of a mesh structure of the analysis domain. This meshfree strategy can more naturally handle very large object deformation and domain discontinuity problems. Because of its intrinsic h-p adaptivity, the meshfree framework can achieve desired numerical accuracy through adaptive node and polynomial shape function refinement with minimum extra computational expense. We focus on one of the more robust meshfree efforts, the element free Galerkin method, through the moving least square approximation and the Galerkin weak form formulation, and demonstrate its relevance to medical image analysis problems. Specifically, we show the results of applying this strategy to physically motivated multiframe motion analysis, using synthetic data for accuracy assessment and for comparison to finite element results, and using canine magnetic resonance tagging and phase contrast images for cardiac kinematics recovery.
1 Introduction

1.1 Finite Element Methods

Finite element methods (FEM) have been used extensively as an efficient computational strategy in computer vision and medical image analysis, covering a wide range of problems including segmentation [3], shape representation and characterization [14], correspondence and motion estimation [12], image registration [5], tissue state assessment [13], and image guided surgery [2]. In these efforts, the domains of interest are discretized into elements which provide spatial relationships between the sampling nodes. The main computational power of the FEM
results from the fundamental idea of replacing a continuous function defined over the entire domain by piecewise approximations over a finite number of geometrically simple domains. Applying proper formulation principles, system differential equations can then be approximated by a set of algebraic equations.

Although the idea of domain division is ingenious, especially for complex or irregular shapes, proper meshing of the analysis domain in FEM can be difficult and laborious. Furthermore, while numerically robust and efficient in general, FEM exhibits difficulties whenever mesh and/or node refinement of the domain must be performed, such as cases where large geometrical shape changes of the objects cause considerable loss in accuracy because of extremely skewed or compressed elements. While remeshing prevents the severe distortion of elements and allows mesh lines to remain coincident with any discontinuities throughout the evolution of the problem and to maintain reasonable numerical accuracy, it requires the projection of field variables between meshes in successive stages of the problem, which often leads to logistical problems and possible degradation of accuracy. Furthermore, the computational cost of remeshing at each step of the problem becomes prohibitively expensive for large three-dimensional problems.
1.2 Meshfree Methods
The recently developed meshfree methods offer computationally efficacious alternatives that circumvent the aforementioned problems of FEM by representing the spatial domains of interest with only a set of nodal points, without any mesh constraints [10]. With roots in the seminal smooth particle hydrodynamics method [11], and having flourished with the element free Galerkin (EFG) method, the reproducing kernel particle method, and others, the main objective for adopting meshfree methods is to eliminate at least part of the mesh structure by constructing the approximation of the field function and the discrete system equations entirely in terms of the nodes [9].

Compared to the finite element strategies, the meshfree methods offer a variety of advantages [9]. The principal attractions are the possibilities of simplifying spatial adaptivity (node addition or elimination) and shape function polynomial order adaptivity (approximation/interpolation types), and handling problems with moving boundaries and discontinuities. The FEM remeshing procedure can now be treated much more simply as a node refinement problem. In areas where more refinement is needed, nodes can be added easily to achieve the desired numerical accuracy. Since there is no need to generate a mesh representation, and the connectivity between nodes is generated as part of the computation and can change over time, meshfree methods can more naturally handle very large deformations and discontinuities. Several recent meshfree efforts also incorporate the multi-scale concept for problems involving widely varying scales through wavelet basis function enhancement.
1.3 Meshfree Representation and Computation for Image Analysis
We present a general medical image analysis framework where the object representation and the computational strategy are based on meshfree concepts. Specifically, we focus on the EFG method using the moving least square approximation and the Galerkin weak form formulation. We demonstrate the relevance of the framework to medical image analysis problems, and show the application results of this method to the biomechanically constrained multi-frame analysis of the heart motion through optimal state-space estimation using two kinds of data constraints: the magnetic resonance (MR) tagging images (the Dirichlet condition) and the MR phase contrast images (the Robin condition). Using synthetic data, validation and comparison to FEM are also presented.
2 Element-Free Galerkin Framework
Because of its straightforwardness and its obvious analogy to FEM, we focus on adopting the element free Galerkin method as the computational tool for medical image analysis. The EFG method requires only the definition of a set of nodes distributed over the analysis region and the definition of the boundary conditions [1]. The connectivity between the nodes and the approximation functions are completely constructed by the method, using the moving least square (MLS) approximation functions generated at each nodal point. The method has a good convergence rate and is efficient in modelling moving interfaces. In this section, we develop an EFG framework for analyzing mechanically motivated domain evolution (i.e. segmentation and tracking) or domain mapping (i.e. non-rigid motion and registration) problems, where conceptually we are estimating the displacement fields between iterations or between images. However, EFG is actually applicable to any differential-equation-based problem.

2.1 Galerkin Weak Form Formulation
Similar to FEM, the formulations of the meshfree methods can be obtained using weighted residual strategies such as the popular Galerkin weak form, which incorporates differential equations in their weak form so that they are satisfied over a domain in an integral or average sense rather than at every point [4]. With σ the stress tensor, ε the strain tensor, and c the material matrix, assuming the material constitutive equation σ = cε and the strain-displacement equation ε = Lu, with L a differential operator dependent on the strain type (infinitesimal or finite), the Galerkin weak form explicitly expressed in terms of the displacement vector u can be derived from Hamilton's principle [10]:

∫_Ω δ(Lu)ᵀ c (Lu) dΩ − ∫_Ω δuᵀ b dΩ − ∫_{Γt} δuᵀ t dΓ + ∫_Ω ρ δuᵀ ü dΩ = 0    (1)
where Ω is the volume of the domain, ρ the material density, b the body force, Γt the surface where external forces are prescribed, and t the traction force. In
order to solve a problem represented by Equation 1 with the EFG method, the following basic procedures are needed:

– Construction of the meshfree nodal representation of the analysis domain.
– Construction of the shape functions to approximate the field function (displacement in our case) using their values at the sampling nodes.
– Imposition of the essential boundary condition on Equation 1.
– Derivation of the differential system equations.
– Solving the system equations.

2.2 Meshfree Domain Representation
The object of interest is represented by a set of sampling nodes scattered in the analysis domain and on its boundaries; Figure 2 shows examples of meshfree representations of left ventricular slices acquired from segmented MR phase contrast and tagging images. The density of the nodes depends on the accuracy requirement of the analysis and the available computational resources. The nodal distribution can be non-uniform, with a denser distribution in areas with large shape variation or large displacement gradient. Because of the intrinsic spatial adaptivity of the EFG method, the node density can be adaptively controlled and the initial nodal distribution quality should not be of major concern.

2.3 MLS Approximation for Shape Function Construction
The central issue in all meshfree methods is to construct the shape functions using only the sampling nodes in the domain, without any predefined node connectivity. The desired requirements include arbitrary node distribution for h-adaptivity, consistency of the shape functions to ensure numerical convergence, compact support of the influence domain for good computational efficiency, and the Kronecker delta function property for easy imposition of essential boundary conditions [10]. In the EFG method, the moving least squares approximation [8] has been used to construct the shape functions because of its two desirable features: 1) the approximated field function is continuous and smooth in the entire problem domain; and 2) it is capable of producing an approximation with the desired order of consistency.

MLS Procedure. Let u(x) be the field variable function in domain Ω, and u^h(x) be the approximation of u(x) at point x. In the MLS approximation,

u^h(x) = Σ_{j=1}^{m} p_j(x) a_j(x) ≡ pᵀ(x) a(x)    (2)
where p_j(x) are the polynomial basis functions, m the number of terms in the basis functions, and a_j(x) the unknown coefficients, which are functions of the
spatial coordinates x. The basis functions usually consist of monomials of the lowest orders to ensure minimum completeness, and common ones include:

Linear: pᵀ (m = 3) = {1, x, y};  Quadratic: pᵀ (m = 6) = {1, x, y, x², xy, y²}

Given a set of n nodal values u_1, u_2, …, u_n for the field function at n nodes x_1, x_2, …, x_n that are in the influence domain of x, Equation 2 is then used to calculate the approximated values of the field function at these nodes:

u^h(x, x_I) = pᵀ(x_I) a(x),  I = 1, 2, …, n    (3)
The coefficients a_j(x) are obtained by performing a weighted least square fit for the local approximation, minimizing the difference between the local approximation u^h(x, x_I) and the actual nodal parameter u(x_I):

J = Σ_I w(x − x_I) [u^h(x, x_I) − u(x_I)]² = Σ_I w(x − x_I) [pᵀ(x_I) a(x) − u_I]²    (4)
where w(x − x_I) is a weighting function with compact support within the influence domain. At an arbitrary point x, a(x) is chosen to minimize the weighted residual through ∂J/∂a = 0, which results in the linear equations and solution:

A(x) a(x) = B(x) U_s  and  a(x) = A⁻¹(x) B(x) U_s    (5)
where

A(x) = Σ_I w(x − x_I) p(x_I) pᵀ(x_I),  B(x) = [B_1, B_2, …, B_n]    (6)
with B_I = w(x − x_I) p(x_I) and U_s = {u_1, u_2, …, u_n}ᵀ. Substituting into Equation 2, the approximation u^h(x) becomes

u^h(x) = Σ_{I=1}^{n} Σ_{j=1}^{m} p_j(x) (A⁻¹(x) B(x))_{jI} u_I = Σ_{I=1}^{n} φ_I(x) u_I = Φ(x) U_s    (7)
where the MLS shape function φ_I(x) is defined by

φ_I(x) = Σ_{j=1}^{m} p_j(x) (A⁻¹(x) B(x))_{jI}    (8)
and Φ(x) = [φ_1(x), φ_2(x), …, φ_n(x)]. Note that m, the number of terms of the polynomial basis, is usually set to be much smaller than n, the number of nodes used in the influence domain for constructing the shape function. In our implementation, the n nearest nodes to x which also satisfy the visibility test [1] are selected, and the requirement n ≫ m prevents the singularity of the matrix A so that A⁻¹ exists.
Fig. 1. Influence domains (left) and cubic spline weighting function (right).
Weighting Functions. Theoretically, the weighting function w(x − x_I) can be arbitrary as long as it satisfies the positivity, compact support, and unity conditions. In practice, it plays two vital roles in constructing the MLS shape functions. The first is to provide weighting for the residuals at different nodes in the influence domain, where we prefer nodes farther from x to have smaller weights. The second is to ensure that nodes leave or enter the influence domain in a gradual manner whenever x moves, so that the shape functions satisfy the compatibility condition of Hamilton's principle. In our implementation, we use the cubic spline function, which depends on the distance d_I = |x − x_I| (Fig. 1):

w(r) = 2/3 − 4r² + 4r³               for r ≤ 1/2
w(r) = 4/3 − 4r + 4r² − (4/3) r³     for 1/2 < r ≤ 1    (9)
w(r) = 0                             for r > 1

in which r = d_I / d_mI is a normalized radius. The support size d_mI of the Ith node is determined by d_mI = d_max c_I, where d_max is a scaling parameter, and the distance c_I is determined by searching for enough neighbor nodes for the matrix A in Equation 6 to be regular. This process also facilitates the h-adaptivity of the node distribution. For the 2D case, tensor product influence domains and weighting functions are used, as shown in Figure 1. An attractive property of the MLS approximations is that their continuity is related to the continuity of the weighting function. Therefore, a low order polynomial basis p(x), such as the linear one, can be used to generate highly continuous approximations by choosing appropriate weighting functions. Further, unlike FEM, post-processing to generate smooth stress and strain fields is unnecessary for the EFG method [1].
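The MLS construction of Eqs. (2)–(9) can be sketched compactly. The code below is a 2D illustration with the linear basis, assumes the support sizes d_mI are given, and omits the visibility test; it is not the authors' implementation.

```python
import numpy as np

def cubic_spline_weight(r):
    """Cubic spline weight of Eq. (9); r = d_I / d_mI (array)."""
    w = np.zeros_like(r)
    m1 = r <= 0.5
    m2 = (r > 0.5) & (r <= 1.0)
    w[m1] = 2.0/3.0 - 4.0*r[m1]**2 + 4.0*r[m1]**3
    w[m2] = 4.0/3.0 - 4.0*r[m2] + 4.0*r[m2]**2 - (4.0/3.0)*r[m2]**3
    return w

def mls_shape_functions(x, nodes, dm):
    """x: evaluation point (2,); nodes: (n, 2) nodes whose influence
    domains cover x; dm: (n,) support sizes.  Returns phi (n,) such
    that u^h(x) = phi @ Us, following Eqs. (2)-(8)."""
    basis = lambda p: np.array([1.0, p[0], p[1]])     # linear, m = 3
    d = np.linalg.norm(nodes - x, axis=1)
    w = cubic_spline_weight(d / dm)
    P = np.vstack([basis(xi) for xi in nodes])        # n x m
    A = P.T @ (w[:, None] * P)                        # Eq. (6)
    B = w[None, :] * P.T                              # Eq. (6), m x n
    return basis(x) @ np.linalg.solve(A, B)           # Eq. (8)

# Sanity check: the shape functions form a partition of unity.
pts = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [0.5, 0.4]])
phi = mls_shape_functions(np.array([0.4, 0.5]), pts, np.full(5, 2.0))
assert abs(phi.sum() - 1.0) < 1e-9
```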
2.4 Penalty Method for Essential Boundary Condition Imposition
Since the MLS shape functions do not satisfy the Kronecker delta criterion, the imposition of essential boundary conditions in EFG is quite complicated, and we modify Equation 1 with the penalty method to deal with the issue [6]. Assuming that there are k conditions which the approximated field function cannot satisfy:

C(u) = {C_1(u), C_2(u), …, C_k(u)}ᵀ = 0    (10)
where C is a given matrix of coefficients, we construct a functional Cᵀ α C = α_1 C_1² + α_2 C_2² + … + α_k C_k², with α a diagonal matrix whose ith diagonal element is the penalty factor α_i. Cᵀ α C is always non-negative, and is zero only if all the conditions in Equation 10 are fully satisfied. This penalty functional is added to the left side of Equation 1, which leads to a modified Galerkin weak form. In the most commonly encountered Dirichlet boundary condition, u = ū is prescribed on the essential boundary Γ_u. We introduce a penalty factor to penalize the difference between the displacement of the MLS approximation and the prescribed displacement on the essential boundary, and arrive at:

∫_Ω δ(Lu)ᵀ c (Lu) dΩ − ∫_Ω δuᵀ b dΩ − ∫_{Γt} δuᵀ t dΓ − δ ∫_{Γu} ½ (u − ū)ᵀ α (u − ū) dΓ + ∫_Ω ρ δuᵀ ü dΩ = 0    (11)

In order to impose the constraint fully, the penalty factor α must be infinite, which however is practically impossible. If the penalty factor is too small, the constraints will not be properly enforced, but if it is too large, numerical problems arise. As suggested in [10], α_i = 1.0 × 10⁴⁻¹³ × max(diagonal elements in the stiffness matrix) has been used. Further, α can be a varying function related to the trustworthiness of the prescribed boundary conditions, i.e. the reliability of the MR tagging detection and the signal-to-noise ratio of the MR phase contrast velocity in our cardiac motion recovery applications.
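For illustration, the quoted rule of thumb transcribes directly; the exponent picked below is an arbitrary choice within the suggested 10⁴–10¹³ range, and the function name is ours.

```python
import numpy as np

def penalty_factor(K, exponent=8):
    """alpha_i = 1.0 x 10^exponent x max diagonal of the stiffness
    matrix K, with the exponent anywhere in the suggested 4-13 range."""
    assert 4 <= exponent <= 13
    return 10.0**exponent * np.max(np.diag(K))
```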
2.5 Construction of System Equations
System Dynamics. For linear material models, we can arrive at the system dynamics equation from Equation 11:

M Ü + C U̇ + [K + Kᵇ] U = R + Rᵇ    (12)
where U = [u_1, u_2, …, u_{n_t}]ᵀ is the displacement vector with n_t the total number of nodes in the domain, M the mass matrix, C the damping matrix, K the stiffness matrix, R the external force, Kᵇ the boundary condition penalty matrix, and Rᵇ the boundary condition force:

M_{I,J} = ∫_Ω ρ Φ_Iᵀ Φ_J dΩ,  K_{I,J} = ∫_Ω B_Iᵀ c B_J dΩ,  Kᵇ_{I,J} = ∫_{Γu} Φ_Iᵀ α Φ_J dΓ,
Rᵇ_I = ∫_{Γu} Φ_Iᵀ α ū dΓ,  C_{I,J} = λ_1 M_{I,J} + λ_2 K_{I,J} (Rayleigh damping),

Φ_I = [φ_I  0 ; 0  φ_I],  B_I = L Φ_I = [φ_{I,x}  0 ; 0  φ_{I,y} ; φ_{I,y}  φ_{I,x}],

with φ_{I,x} and φ_{I,y} the derivatives of the shape function with respect to x and y, Φ_I the matrix of shape functions, and B_I the strain matrix at the Ith node.
Fig. 2. Top: segmented MR tagging image and phase contrast intensity, x-, and y-velocity images. Bottom: meshfree representations of left ventricular slices from the MR tagging image (left, with * indicating tag crossings) and the phase contrast image (right).
Background Cells and Evaluation of System Integrals. In order to evaluate the entries of the system matrices, we need to perform integrations over the problem domain and over both the natural and the essential boundaries, which have to be carried out through numerical techniques such as Gauss quadrature [4]. A mesh of non-overlapping cells is required for the quadrature integration; it is called the background mesh in EFG. In contrast to the FEM mesh needed for field variable interpolation, the background mesh in EFG is used merely for the integration of the system matrices and needs to be properly designed to obtain an appropriate solution of desired accuracy [1]. Furthermore, the background cells are usually totally independent of the arrangement of nodes, such as the example in Figure 1 where a regular-grid background mesh is used. These cells also facilitate the identification of nodes which contribute to the discrete L2 norm at a quadrature point as in Equation 4. In the regular-grid cell structure, there may exist cells that do not entirely belong to the analysis domain, i.e. only a portion of such a cell belongs to the domain. A simple visibility scheme that automatically separates the portion of the cell which lies outside of the physical domain is employed [1]. We have used m_c × m_c cells in the integration, where m_c = √n_t and n_t is the total number of nodes in the domain. The number of quadrature points depends on the number of nodes in a cell, and we have used n_Q × n_Q Gauss quadrature, where n_Q = √m + 2 and m is the number of nodes in a cell.
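A sketch of the background-cell assembly follows. The helpers gauss_points and shape_function_gradients are hypothetical (the latter is assumed to return the indices and derivatives φ_{I,x}, φ_{I,y} of the nodes whose influence domains cover a quadrature point); they are not routines described in the paper.

```python
import numpy as np

def assemble_stiffness(n_nodes, cells, gauss_points, shape_function_gradients, c):
    """Accumulates K_IJ = sum over quadrature points of B_I^T c B_J w detJ
    on a regular background grid (2D, two dof per node)."""
    K = np.zeros((2 * n_nodes, 2 * n_nodes))
    for cell in cells:
        for xq, wq, detJ in gauss_points(cell):
            idx, dpx, dpy = shape_function_gradients(xq)
            idx = np.asarray(idx)
            B = np.zeros((3, 2 * len(idx)))
            B[0, 0::2] = dpx                 # strain matrix B_I, node by node
            B[1, 1::2] = dpy
            B[2, 0::2], B[2, 1::2] = dpy, dpx
            dof = np.column_stack((2 * idx, 2 * idx + 1)).ravel()
            K[np.ix_(dof, dof)] += B.T @ c @ B * wq * detJ
    return K
```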
3 Applications to Cardiac Motion Analysis
Cardiac motion recovery has been one of the major topics in medical image analysis. Here, we apply the developed EFG framework to the biomechanically constrained recovery of heart motion over the cardiac cycle, through a state space strategy which performs optimal multi-frame estimation based on the Kalman filter.

3.1 State Space Representation of the Dynamics
The system dynamics of Equation 12 can be transformed into a state-space representation of a continuous-time linear time-invariant stochastic system:

ẋ(t) = A_c x(t) + B_c w(t) + v_c(t)    (13)

where

x(t) = [U(t) ; U̇(t)],  w(t) = [0 ; R + Rᵇ],
A_c = [0  I ; −M⁻¹(K + Kᵇ)  −M⁻¹C],  B_c = [0  0 ; 0  M⁻¹]
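Building A_c and B_c from the EFG system matrices is a direct transcription of these block definitions; a sketch (M assumed invertible):

```python
import numpy as np

def continuous_state_matrices(M, C, K, Kb):
    """Block matrices of Eq. (13) from the EFG mass, damping,
    stiffness, and boundary-penalty matrices (all n x n)."""
    n = M.shape[0]
    Minv = np.linalg.inv(M)
    Ac = np.block([[np.zeros((n, n)), np.eye(n)],
                   [-Minv @ (K + Kb), -Minv @ C]])
    Bc = np.block([[np.zeros((n, n)), np.zeros((n, n))],
                   [np.zeros((n, n)), Minv]])
    return Ac, Bc
```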
Assuming a Markov process, it can then be converted to the discrete state equation:

x(t + 1) = A x(t) + B w(t) + v(t)    (14)
with A = e^{A_c ΔT} and B = A_c⁻¹ (e^{A_c ΔT} − I) B_c, v(t) the additive, zero-mean, white process noise (E[v(t)] = 0, E[v(t) v(s)ᵀ] = Q_v(t) δ_ts), and ΔT the time interval. An associated measurement equation, which describes the observed imaging data y(t), can be expressed in the form:
y(t) = D x(t) + e(t)    (15)
where D is a known measurement matrix, and e(t) is the measurement noise, which is additive, zero mean, and white (E[e(t)] = 0, E[e(t) e(s)ᵀ] = R_e(t) δ_ts).

3.2 Optimal Estimation of Kinematics State
A recursive procedure is then used for the optimal estimation of the kinematics state from Equations 14 and 15 until convergence [7]:

1. Initial estimates for the state x̂(t − 1) and the error covariance P(t − 1).
2. The predictions: time update equations for the state and the error covariance:

   x̂⁻(t) = A x̂(t − 1) + B w(t),  P⁻(t) = A P(t − 1) Aᵀ + Q_v(t)    (16)

3. The corrections: measurement update equations for the filter gain, the state, and the error covariance:

   L(t) = P⁻(t) Dᵀ (D P⁻(t) Dᵀ + R_e(t))⁻¹    (17)
   x̂(t) = x̂⁻(t) + L(t) (y(t) − D x̂⁻(t))    (18)
   P(t) = P⁻(t) − L(t) (D P⁻(t) Dᵀ + R_e(t)) Lᵀ(t)    (19)
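A sketch of the discretization of (13) into (14) and of one predict/correct cycle of Eqs. (16)–(19); the zero-order hold and the invertibility of A_c implied by the B expression above are assumptions.

```python
import numpy as np
from scipy.linalg import expm

def discretize(Ac, Bc, dT):
    """A = exp(Ac dT); B = Ac^{-1} (exp(Ac dT) - I) Bc, as in Eq. (14)."""
    A = expm(Ac * dT)
    B = np.linalg.solve(Ac, A - np.eye(Ac.shape[0])) @ Bc
    return A, B

def kalman_step(x_hat, P, y, w, A, B, D, Qv, Re):
    """One predict/correct cycle of Eqs. (16)-(19)."""
    x_pred = A @ x_hat + B @ w                       # Eq. (16)
    P_pred = A @ P @ A.T + Qv
    S = D @ P_pred @ D.T + Re
    L = P_pred @ D.T @ np.linalg.inv(S)              # Eq. (17)
    x_new = x_pred + L @ (y - D @ x_pred)            # Eq. (18)
    P_new = P_pred - L @ S @ L.T                     # Eq. (19)
    return x_new, P_new
```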
Fig. 3. Estimates from MR tagging images. Top: frame-to-frame displacement fields (#1-2, #5-6, #9-10, and #13-14). Bottom: cardiac-specific radial strain maps with respect to end diastole (frames #1, #5, #9, and #13; the strain-color ranges from −20% (blue) to 0% (white) to +20% (red)).
3.3 Experiment
Canine Cardiac Imaging Data. The meshfree framework is used to analyze two types of canine cardiac image sequences (Figure 2). The MR tagging images are segmented and the tag-tag and tag-boundary crossings are detected for all sixteen images of the cardiac cycle. These crossings provide a set of 2D displacements between image frames, and are used as the prescribed boundary conditions in the EFG framework (boundary condition of the first kind: Dirichlet). Similarly, the MR phase contrast images provide the instantaneous velocity information on the myocardium, and sampled boundary point displacements are established using the shape-matching strategy [12] on the segmented boundaries; both are then used as the prescribed boundary conditions in the EFG framework (boundary condition of the third kind: Robin). The state space representation is established, and the optimal estimates of the kinematics fields are achieved through the recursive estimation process. Dense field motion and deformation maps are then derived (see Figures 3 and 4 for displacement maps and cardiac-specific radial strain maps).

Synthetic Data. In order to assess the accuracy of the state-space motion recovery results, a synthetic data sequence with known kinematics is generated, and estimation results from the EFG and corresponding FEM strategies are acquired.
Fig. 4. Estimates from MR phase contrast images. Top: frame-to-frame displacement fields (#1-2, #5-6, #9-10, and #13-14). Bottom: cardiac-specific radial strain maps with respect to end diastole (frames #1, #5, #9, and #13; the strain-color ranges from −20% (blue) to 0% (white) to +20% (red)).
As shown in Figure 5, the rectangular testing object (length = 18, height = 10, and thickness = 1) is made of two different material parts (modulus = 105 and Poisson ratio = 0.49 for the middle part, and modulus = 75 and Poisson ratio = 0.4 for the rest). A distributed time-varying pressure P(k) = 58.2 (1 − cos((k − 1)π / n)), where n = 16 is the total number of sampling frames and k is the kth frame, is applied onto the top of the object, and the resulting deforming object shapes are shown for frames #1, #5, #9, and #13. Under the plane stress condition and using the recursive filter based optimal estimator, we recover the kinematic state (displacements and strains) of the object using the finite element and the meshfree computational frameworks. The vertical strain maps between frames #1 and #9 are shown in Figure 5, along with the known true strain for comparison. It is evident that the two computational methods provide very similar strain patterns, both of which are quite close to the true strain distribution. Displacement maps and the horizontal and shear strain maps exhibit similar results. In the experiments, the EFG and the FEM frameworks reach convergence with similar computational time. However, it is expected that if remeshing is needed for FEM under large deformation, the EFG should exhibit much faster performance. Further, because of its h-adaptivity, the EFG does not require post-processing of the strain/stress fields for smooth display, and it should produce smoother and intuitively more plausible results in general.
4 Conclusions
We have presented a meshfree representation and computation framework for medical image analysis, which offers a computationally effective and more accurate alternative to the popular finite element methods, especially in domain mapping cases where there are large deformations between images.
Fig. 5. Experiment with synthetic data. 1st row: testing object and pressure loading setup; 2nd row: examples of deforming object boundary at frames #1, #5, #9, and #13 (the dots indicate the data constraining points for the FEM/EFG estimations); 3rd row: true, FEM-estimated, and EFG-estimated vertical strain maps between frames #1 and #9, with strain scale.
The principal attraction of this node based strategy is the possibility of simplifying spatial adaptivity and polynomial-order adaptivity, and of more easily handling problems with moving boundaries and discontinuities. We have demonstrated the relevance of this strategy to medical image analysis problems, and shown favorable results when applying it to the mechanically constrained multi-frame analysis of heart motion under two kinds of image constraints. The strategy is assessed with synthetic data for its accuracy, and is compared to FEM in performance.

This work is supported in part by HKRGC CERG Grant HKUST6057/00E.
References
1. Belytschko, T., Lu, Y.Y., and Gu, L.: Element-free Galerkin methods. International Journal for Numerical Methods in Engineering 37 (1994) 229–256
2. Bro-Nielsen, M.: Finite element modeling in surgery simulation. Proceedings of the IEEE 86(3) (1998) 490–503
3. Cohen, L.D., and Cohen, I.: Finite-element methods for active contour models and balloons for 2-D and 3-D images. IEEE Transactions on Pattern Analysis and Machine Intelligence 15(11) (1993) 1131–1147
4. Cook, R.D., Malkus, D.S., Plesha, M.E., and Witt, R.J.: Concepts and Applications of Finite Element Analysis. John Wiley & Sons (2002) New York
5. Ferrant, M., Nabavi, A., Macq, B., Jolesz, F.A., Kikinis, R., and Warfield, S.K.: Registration of 3-D intraoperative MR images of the brain using a finite-element biomechanical model. IEEE Trans. on Medical Imaging 20(12) (2001) 1384–1397
6. Gavete, L., Benito, J.J., Falcon, S., and Ruiz, A.: Implementation of essential boundary conditions in a meshless method. Communications in Numerical Methods in Engineering 16 (2000) 409–421
7. Kamen, E.E. and Su, J.K.: Introduction to Optimal Estimation. Springer (1999) London
8. Lancaster, P. and Salkauskas, K.: Surfaces generated by moving least squares methods. Mathematics of Computation 37(155) (1981) 141–158
9. Li, S.F. and Liu, W.K.: Meshfree and particle methods and their applications. Applied Mechanics Review 55 (2002) 1–34
10. Liu, G.R.: Mesh Free Methods. CRC Press (2003) Boca Raton
11. Lucy, L.B.: A numerical approach to the testing of the fission hypothesis. The Astronomical Journal 82(12) (1977) 1013–1024
12. Papademetris, X., Sinusas, A.J., Dione, D.P., Constable, R.T., and Duncan, J.S.: Estimation of 3D left ventricular deformation from medical images using biomechanical models. IEEE Transactions on Medical Imaging 21(7) (2002) 786–799
13. Shi, P. and Liu, H.: Stochastic finite element framework for cardiac kinematics function and material property analysis. Medical Image Computing and Computer Assisted Intervention (2002) 634–641
14. Terzopoulos, D. and Metaxas, D.: Dynamic 3D models with local and global deformations: deformable superquadrics. IEEE Transactions on Pattern Analysis and Machine Intelligence 13(7) (1991) 703–714
Visualization of Myocardial Motion Using MICSR Trinary Checkerboard Display

Moriel NessAiver¹ and Jerry L. Prince²

¹ University of Maryland School of Medicine, Baltimore MD 21201, USA
[email protected]
² Johns Hopkins University, Baltimore MD 21218, USA
[email protected], http://iacl.ece.jhu.edu/~prince/index.html
Abstract. Magnetic resonance tagging is used to quantify and visualize myocardial motion. The lack of tag persistence, however, is often a problem in visualizing motion throughout the cardiac cycle, particularly in late diastole. Complementary spatial modulation of magnetization (CSPAMM) improves tag persistence, but requires complex signal subtraction, which is not available on all scanners. This paper analyzes the contrast-to-noise properties of a new method called magnitude image CSPAMM reconstruction (MICSR), which requires only magnitude images from the scanner and demonstrates improved contrast and tag persistence over CSPAMM. A novel "trinary checkerboard" display of myocardial motion, ideally matched to the contrast-to-noise properties of MICSR, is also presented.
1 Introduction

Magnetic resonance (MR) tagging has been extensively developed from its origins, where cardiac motion was visualized as the motion of isolated tag lines [1,2,3], to sophisticated current applications, where cardiac motion is measured in real-time, as 3D strain maps, as finite-element or B-spline models, and as parametric left ventricular models [4]. Regardless of whether the tag patterns are sinusoidal (so-called 1-1 SPAMM) [2], are "crisp" lines [3,5], or possess more complicated patterns [6], the ability of the tag pattern to persist into late diastole is a desirable property. One simple way to improve tag persistence is for the tagging pulse to yield a maximum tip angle of 180 degrees. In this case, the longitudinal magnetization, affected by both relaxation and imaging pulses, takes the longest amount of time to recover, i.e., the tags persist longer [Fig. 1(a)]. The major difficulty with this approach is that the magnitude images normally reconstructed by an MR scanner show a rectified tag pattern [Figs. 1(b),(c)], which is both difficult to process and to visualize. Further, it is generally impractical to apply the phase correction that is necessary to reconstruct a "real" tagged image.

Complementary spatial modulation of magnetization (CSPAMM) provides an elegant solution to the problem of tag fading and rectification [7,8].
Fig. 1. (a) Theoretical tag profile for a “real” tagged image; (b) tag profile of a magnitude reconstructed tagged image; (c) sequence of magnitude reconstructed tagged images showing rectification artifact and tag decay.
sequence and the other time using a [+90°, −90°] 1-1 SPAMM tag sequence. Both initial tag patterns are zero-mean, sinusoidal patterns, but the second sinusoid is spatially shifted by 180 degrees — i.e., it is inverted relative to the first. By subtracting these two complex images, the resulting sinusoidal tag pattern has a zero mean and a peak-to-peak amplitude that is double that of either individual pattern. Even after longitudinal decay, the tag pattern is (ideally) zero mean and sinusoidal. This process does not alleviate the problem of phase variation across the field-of-view (FOV), however, so it is typically necessary to rectify the resulting images. In contrast to rectified SPAMM images, rectified CSPAMM images have “crisp” tags, corresponding to the zero-crossings of the subtraction image, which are spatially stable with respect to longitudinal relaxation in the absence of motion. But CSPAMM images are still lacking in certain respects. First, tag persistence is still not ideal, as the tag amplitude nominally has an exponential decay due to longitudinal relaxation and imaging pulses. Second, the resulting rectified CSPAMM images are not sinusoidal, which means they cannot be directly used in HARP analysis, a recently developed, fully-automatic tag analysis approach [9, 10]. Third, despite the crisp appearance of the tag lines, they are relatively difficult to find automatically because of their varying brightness profiles. We recently reported a new method called magnitude image CSPAMM reconstruction (MICSR), which addresses the above deficiencies of standard CSPAMM reconstruction [11, 12]. MICSR reconstructions have long tag persistence, are pure sinusoids, and require no complex image reconstruction or phase correction. In this paper, we expand on these initial reports (both were abstracts) by developing and comparing explicit expressions for the contrast and contrast-to-noise ratio (CNR) of CSPAMM, rectified CSPAMM, and MICSR. We also present a novel cardiac motion visualization approach, “trinary checkerboards,” which ideally utilizes the CNR properties of MICSR. Results are demonstrated using simulations, a phantom, and in vivo cardiac MR tagged images.
2 Theory
In this section, we first derive and compare the imaging equations for CSPAMM, rectified CSPAMM, and MICSR. We then derive and compare expressions for the contrast-to-noise ratios of the three methods. Although our theoretical results ignore the effects of the imaging pulses that occur throughout the longitudinal decay of a tag pattern, we have found that our comparative conclusions hold up experimentally as well.

2.1 Imaging Equation
Let A represent the series of images obtained using a [+90°, +90°] tagging pulse and let B represent those obtained using [+90°, −90°]. Assuming that initially $M_Z(x) = M_0$ and that there is no motion, the spatial and temporal distribution of the z magnetization after the tagging sequence can be described by the equations
$$A(x,t) = M_0\left[1 - \left(1 - \cos\frac{2\pi x}{P}\right)e^{-t/T_1}\right], \qquad (1)$$
$$B(x,t) = M_0\left[1 - \left(1 - \cos\left(\frac{2\pi x}{P} - \pi\right)\right)e^{-t/T_1}\right], \qquad (2)$$
where P is the spatial period of the tag pattern. We note that the tags in B are shifted by one half cycle (π radians) relative to those in A. Subtracting (2) from (1) yields the standard CSPAMM image,
$$\mathrm{CSPAMM} = A - B = 2M_0 e^{-t/T_1}\cos(2\pi x/P)\,. \qquad (3)$$
Phase correction is required in order to achieve this result, however, and this can be problematic. One simple way to eliminate the “phase roll” is to take the magnitude (complex modulus). This yields the rectified CSPAMM image,
$$|\mathrm{CSPAMM}| = |A - B| = \left|2M_0 e^{-t/T_1}\cos(2\pi x/P)\right|\,. \qquad (4)$$
The MICSR reconstruction formula is given by [11]
$$\mathrm{MICSR} = |A|^2 - |B|^2\,, \qquad (5)$$
and, using Eqs. (1) and (2), it is straightforward to show that
$$\mathrm{MICSR} = 4M_0^2\left(1 - e^{-t/T_1}\right)e^{-t/T_1}\cos(2\pi x/P)\,. \qquad (6)$$
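To make the relationships between Eqs. (1)–(6) concrete, the following minimal NumPy sketch evaluates the noise-free tag profiles and verifies that the magnitude-based MICSR combination reproduces the closed form of Eq. (6); the parameter values are illustrative, not scanner settings, and imaging pulses are ignored, as in the theory above.

```python
import numpy as np

M0, T1, P = 1.0, 800.0, 6.0           # a.u., ms, mm (illustrative values)

def A(x, t):                          # Eq. (1): [+90, +90] acquisition
    return M0 * (1 - (1 - np.cos(2 * np.pi * x / P)) * np.exp(-t / T1))

def B(x, t):                          # Eq. (2): [+90, -90] acquisition
    return M0 * (1 - (1 - np.cos(2 * np.pi * x / P - np.pi)) * np.exp(-t / T1))

x = np.linspace(0.0, 2 * P, 512)      # two tag periods
t = 300.0                             # ms after tag application

cspamm = A(x, t) - B(x, t)            # Eq. (3): zero-mean sinusoid
rect_cspamm = np.abs(cspamm)          # Eq. (4): rectified CSPAMM
micsr = A(x, t)**2 - B(x, t)**2       # Eq. (5), magnitudes only

# Eq. (6): the closed-form MICSR profile agrees with the direct computation.
micsr_closed = (4 * M0**2 * (1 - np.exp(-t / T1)) * np.exp(-t / T1)
                * np.cos(2 * np.pi * x / P))
assert np.allclose(micsr, micsr_closed)
```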
2.2 Contrast Behavior
We define tag contrast as the peak-to-peak amplitude of the tag pattern, which is equal to twice the amplitude of a pure sinusoid and the amplitude of a rectified sinusoid. Accordingly, from Eqs. (3), (4), and (6), we find
$$\mathrm{Contrast\ CSPAMM} = 4M_0 e^{-t/T_1}\,, \qquad (7)$$
$$\mathrm{Contrast\ |CSPAMM|} = 2M_0 e^{-t/T_1}\,, \qquad (8)$$
$$\mathrm{Contrast\ MICSR} = 8M_0^2\left(1 - e^{-t/T_1}\right)e^{-t/T_1}\,. \qquad (9)$$
The contrast of CSPAMM can be viewed as a theoretical upper bound since it is not easily achieved in practice. Our primary emphasis in this analysis is on a comparison between |CSPAMM| and MICSR. In particular, the ratio of MICSR to |CSPAMM| contrast is given by
$$\frac{\mathrm{MICSR\ Contrast}}{|\mathrm{CSPAMM}|\ \mathrm{Contrast}} = 4M_0\left(1 - e^{-t/T_1}\right)\,. \qquad (10)$$
Fig. 2 plots Eqs. (8), (9), and (10) assuming values of M0 = 1.0 and T1 = 800 ms. We see that the MICSR and |CSPAMM| contrasts are quite different in character. While |CSPAMM|’s contrast is largest at the outset, MICSR’s contrast grows toward a peak value at around 500 ms after the tag application. On the other hand, the initial contrast of MICSR is very low, which can produce inferior images in early systole. A modification to the MICSR reconstruction formula can significantly improve its early contrast, but space limitations prevent its presentation herein.
Fig. 2. Theoretical peak-to-peak contrasts for |CSPAMM| and MICSR and their contrast ratio.
Eq. (10) contains the term $M_0$, which means that the ratio of contrasts depends on the underlying magnetization. What really matters in imaging, however, is not whether the underlying contrast alone is improved (which after all can be done using a simple gain factor) but whether the contrast-to-noise is improved. Accordingly, we now compare the noise behavior of these three methods.

2.3 Contrast-to-Noise Behavior
The most common way to measure noise in an MR image is to use a region-of-interest in the background — i.e., the air — to determine the underlying noise
variance. Contrast-to-noise (CNR) is usually computed using this background noise. In our case, however, since both |CSPAMM| and MICSR are computed as nonlinear functions of the underlying image signal, it is necessary to use the noise within the object being imaged in computing CNR. Also, since the noise power will depend on the signal intensity of the object, the presence of an imposed tag pattern causes the contrast-to-noise ratio (CNR) to vary both spatially and temporally. To understand the nature of the temporal and spatial variation of CNR, yet to also retain some simplicity, we have elected to compute the temporal evolution of CNR at the two signal extremes — i.e., the peaks and the zero-crossings (tags) of the tag pattern. The CNR behavior at the zero-crossings is particularly important since visualization and analysis of tag locations is a common objective in cardiac tagged MRI. Assume that the noise in each of the real and imaginary channels of the complex images A and B are independent Gaussian random variables, each with mean zero and variance σ². If one were able to accurately perform phase corrections on A and B, then the noise in the real-valued CSPAMM image, A − B, would have a zero mean and variance of 2σ². In this case, the contrast is twice the amplitude given in Eq. (3), the noise standard deviation is $\sqrt{2}\,\sigma$, and the CNR is therefore
$$\mathrm{CNR\ CSPAMM} = \frac{4M_0 e^{-t/T_1}}{\sqrt{2}\,\sigma}\,, \qquad (11)$$
which is spatially invariant and exponentially decaying with time. It has been previously shown that image intensities in magnitude MRI images follow a Rician distribution [13]. When the signal-to-noise ratio in such an image exceeds 3.0, however, the noise approaches an additive Gaussian random variable with a variance equal to that of the underlying image. Therefore, in |CSPAMM| where the underlying image is A − B, the noise variance in the high signal regions (e.g., between the tag lines) is 2σ², just like in CSPAMM. The situation changes, however, at the zero-crossings — i.e., the tags — of the CSPAMM image. Here, the underlying image value is zero, and the Rician distribution becomes a Rayleigh distribution with mean and variance given by
$$\mathrm{Mean\ |CSPAMM|(tags)} = \sqrt{2\sigma^2\pi/2}\,, \qquad (12)$$
$$\mathrm{Variance\ |CSPAMM|(tags)} = \sigma^2(4 - \pi)\,. \qquad (13)$$
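A quick Monte-Carlo check of Eqs. (12) and (13) — our illustration, not part of the original analysis — draws the complex noise of A − B at a tag line (zero underlying signal, variance 2σ² per channel) and compares the sample mean and variance of its magnitude with the Rayleigh predictions:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n = 1.0, 1_000_000
# Each channel of A - B has variance 2*sigma^2 at a zero-crossing.
re = rng.normal(0.0, np.sqrt(2.0) * sigma, n)
im = rng.normal(0.0, np.sqrt(2.0) * sigma, n)
mag = np.hypot(re, im)                                # |CSPAMM| at the tags

print(mag.mean(), np.sqrt(2 * sigma**2 * np.pi / 2))  # Eq. (12)
print(mag.var(), sigma**2 * (4 - np.pi))              # Eq. (13)
```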
Interestingly, we find that the mean of |CSPAMM| is not zero at the tags; therefore, the expected contrast of |CSPAMM| is actually reduced because of noise. Incorporating the reduction in expected contrast, the CNR of |CSPAMM| images at the peak signal regions is given by
$$\mathrm{CNR\ |CSPAMM|(peak)} = \frac{2M_0 e^{-t/T_1} - \sqrt{2\sigma^2\pi/2}}{\sqrt{2}\,\sigma}\,, \qquad (14)$$
and at the tag lines is given by
$$\mathrm{CNR\ |CSPAMM|(tags)} = \frac{2M_0 e^{-t/T_1} - \sqrt{2\sigma^2\pi/2}}{\sigma\sqrt{4 - \pi}}\,. \qquad (15)$$
We note that the ratio of the |CSPAMM| CNR at the tag lines to that at the peak is $\sqrt{2/(4-\pi)} \approx 1.526$, which highlights the spatially-varying nature of CNR in |CSPAMM| images. To calculate the noise behavior of MICSR, we use the equation $\mathrm{MICSR} = |A|^2 - |B|^2$, given in Eq. (5). Let the noise-free signal in A be denoted by $\bar{A}$. The complex MRI image is given by
$$A = \bar{A}\cos\phi + n_c + j\left(\bar{A}\sin\phi + n_s\right)\,, \qquad (16)$$
where φ is an unknown phase angle (which may be random and/or spatially varying) and $n_c$ and $n_s$ are independent, zero-mean, Gaussian random variables each with variance σ². It is straightforward to show that
$$\mathrm{Mean}\ |A|^2 = \bar{A}^2 + 2\sigma^2\,, \qquad (17)$$
$$\mathrm{Variance}\ |A|^2 = 4\bar{A}^2\sigma^2 + 4\sigma^4\,. \qquad (18)$$
Analogous expressions to those in Eqs. (16), (17), and (18) hold for the complementary image B. Since the noise terms in A and B are independent, it follows that
$$\mathrm{Mean\ MICSR} = \bar{A}^2 - \bar{B}^2\,, \qquad (19)$$
$$\mathrm{Variance\ MICSR} = 4\sigma^2\left(\bar{A}^2 + \bar{B}^2\right) + 8\sigma^4\,. \qquad (20)$$
Unlike that of |CSPAMM|, the variance of MICSR images is highly dependent on the spatial and temporal variations of both A and B. We now examine the temporal variation of MICSR CNR at both the zero-crossings — i.e., the tags — and the signal peaks. Spatially, the peak MICSR signal is produced when $\bar{A} = M_0$ and $\bar{B} = M_0(1 - 2e^{-t/T_1})$. At the location of the peak signal, where $\bar{A} \gg \sigma$, we can neglect the $8\sigma^4$ term in the variance of Eq. (20). Dividing Eq. (9) by the square root of the variance in Eq. (20), and substituting the above values for $\bar{A}$ and $\bar{B}$, yields the contrast-to-noise ratio of MICSR at the peak
$$\mathrm{CNR\ MICSR(peak)} = \frac{4M_0\left(1 - e^{-t/T_1}\right)e^{-t/T_1}}{\sigma\sqrt{1 + \left(1 - 2e^{-t/T_1}\right)^2}}\,. \qquad (21)$$
At a zero crossing, we know that $\bar{A} = \bar{B}$ for all time. Therefore, the MICSR variance becomes $8\sigma^2(\bar{A}^2 + \sigma^2)$. Dividing Eq. (9) by the square root of this variance and substituting $\bar{A} = M_0(1 - e^{-t/T_1})$ yields the CNR of MICSR at the zero crossings
$$\mathrm{CNR\ MICSR(tags)} = \frac{\sqrt{8}\,M_0^2\left(1 - e^{-t/T_1}\right)e^{-t/T_1}}{\sigma\sqrt{M_0^2\left(1 - e^{-t/T_1}\right)^2 + \sigma^2}}\,. \qquad (22)$$
Figs. 3(a) and (b) show the CNRs for |CSPAMM| and MICSR as well as their CNR ratios at the tags and the peaks, respectively. The longitudinal relaxation constant is assumed to be $T_1 = 800$ ms, and $M_0$ is set in order to achieve an
underlying initial signal-to-noise ratio of 40, a typical value for cardiac MRI. It can be observed from (a) that the CNR of MICSR at the tags is better than that of |CSPAMM| throughout the duration of a heartbeat, except in very early systole. The improvement in CNR is roughly 30–50% over the vast majority of the cardiac cycle. At the peak signal locations [panel (b)], MICSR CNR equals that of |CSPAMM| only after about 300 ms, increasing to about a factor of 2 at about 1 s, the ending time of a typical heartbeat. It is apparent that MICSR has superior properties throughout the heartbeat at the tag locations and in diastole at the peak signal locations.
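The theoretical curves of Fig. 3 can be regenerated directly from Eqs. (11), (14), (15), (21), and (22); the sketch below uses the stated $T_1 = 800$ ms and initial SNR of 40, while the time axis is our assumption:

```python
import numpy as np

sigma = 1.0
M0, T1 = 40.0 * sigma, 800.0                 # initial SNR M0/sigma = 40
t = np.linspace(1.0, 1000.0, 1000)           # ms (assumed range)
E = np.exp(-t / T1)
ray = np.sqrt(2 * sigma**2 * np.pi / 2)      # Rayleigh mean, Eq. (12)

cnr_cspamm    = 4 * M0 * E / (np.sqrt(2) * sigma)                  # Eq. (11)
cnr_abs_peak  = (2 * M0 * E - ray) / (np.sqrt(2) * sigma)          # Eq. (14)
cnr_abs_tags  = (2 * M0 * E - ray) / (sigma * np.sqrt(4 - np.pi))  # Eq. (15)
cnr_micsr_pk  = (4 * M0 * (1 - E) * E
                 / (sigma * np.sqrt(1 + (1 - 2 * E)**2)))          # Eq. (21)
cnr_micsr_tag = (np.sqrt(8) * M0**2 * (1 - E) * E
                 / (sigma * np.sqrt(M0**2 * (1 - E)**2 + sigma**2)))  # Eq. (22)
```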
Fig. 3. Theoretical contrast-to-noise ratios for CSPAMM, |CSPAMM|, and MICSR.
It is interesting to compare MICSR to CSPAMM, assuming phase correction were performed in order to yield the real component of the CSPAMM images. In this case, the variance throughout the entire image is 2σ² and
$$\mathrm{CNR\ CSPAMM} = \frac{\sqrt{8}\,M_0 e^{-t/T_1}}{\sigma}\,. \qquad (23)$$
Figs. 3(c) and (d) show the CNRs for CSPAMM and MICSR as well as their CNR ratios at the tags and the peaks, respectively. In (c), it is observed that MICSR has nearly identical CNR at the tag locations to that of CSPAMM except in very early systole. In (d), it is observed that MICSR has overall worse CNR at the peak locations than CSPAMM, approaching the CNR of CSPAMM asymptotically.
3 Methods

3.1 MR Image Acquisition
All scans were performed on a Marconi 1.5T Eclipse MR scanner. The CSPAMM tagging pulse sequence consisted of two non-selective, 90°, 400 µs RF pulses with an intervening 700 µs modulating gradient, which produced a sinusoidal tag profile. The tag profile was shifted by one-half period by phase alternating the second pulse by 180°. Two different versions of a FAST cine imaging sequence were used with the scan parameters detailed below in the Results section. In all sequences, the tag lines were oriented perpendicular to the frequency encode gradient.

3.2 Upsampling
A MICSR image is basically an image of the underlying anatomy multiplied by a zero-mean sinusoid [see Eq. (6)], which is subsequently distorted by motion. The MR acquisition process low-pass filters this signal and samples it. Unlike |CSPAMM|, which adds high frequency information through a nonlinear operation, MICSR preserves the underlying frequencies of the tag pattern itself. Therefore, image interpolation — while unable to restore the details of the anatomy — is very effective in restoring the details of the tag pattern. In particular, the zero-crossings of the MICSR image — i.e., the tag lines — can be determined with improved accuracy using interpolation. Prior to trinary checkerboard visualization (see below), we upsample the image data using either zero-filling in the frequency domain (sinc interpolation) or cubic spline interpolation in the image domain. In the volunteer’s data reported in the Results section below, the scanner performed a zero-filled interpolation by a factor of two, and we followed this with an eight times interpolation using cubic splines. The effective upsampling demonstrated in this example was therefore a factor of 16.
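The two interpolation options described above can be sketched as follows; the 2× zero-filling followed by 8× cubic splines mirrors the pipeline in the text, but the function names and the factor split are illustrative (even-sized images are assumed for simplicity):

```python
import numpy as np
from scipy.ndimage import zoom

def zero_fill_upsample(img, factor=2):
    """Sinc interpolation by zero-padding the centred 2-D spectrum."""
    spec = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    out = np.zeros((h * factor, w * factor), dtype=complex)
    y0, x0 = (h * (factor - 1)) // 2, (w * (factor - 1)) // 2
    out[y0:y0 + h, x0:x0 + w] = spec
    # Rescale so amplitudes are preserved after the larger inverse FFT.
    return factor**2 * np.real(np.fft.ifft2(np.fft.ifftshift(out)))

def spline_upsample(img, factor=8):
    """Cubic-spline interpolation in the image domain."""
    return zoom(img, factor, order=3)

# micsr_fine = spline_upsample(zero_fill_upsample(micsr, 2), 8)  # 16x overall
```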
3.3 Trinary Checkerboard Visualization
The preceding analysis reveals excellent MICSR CNR at the tag locations throughout the entire cardiac cycle, rivaling that of CSPAMM itself. On the other hand, MICSR CNR at peak locations is generally poor in systole, and does not rival that of |CSPAMM| until t > 300 ms and CSPAMM until t > 900 ms. These properties of MICSR suggest the use of a nonlinear transformation of the MICSR intensities for visualization purposes that focuses attention on the tag locations while downplaying the peak intensities. There are an infinite number of transformations that satisfy these basic principles. We have developed the following mapping, which we refer to as the MICSR trinary checkerboard visualization. Unlike magnitude MR images, MICSR images have both positive and negative values. In particular, their zero-crossings represent the tag locations. In
order to “see” the MICSR image values around the tags, we maintain a linear relationship between these values and the visualized intensities. However, MICSR values that are larger in magnitude will be thresholded and displayed as constant. Accordingly, the MICSR trinary checkerboard image is given by
$$\mathrm{TCB} = \begin{cases} +\epsilon\,, & \mathrm{MICSR} \ge +\epsilon\,, \\ \mathrm{MICSR}\,, & |\mathrm{MICSR}| < \epsilon\,, \\ -\epsilon\,, & \mathrm{MICSR} \le -\epsilon\,, \end{cases} \qquad (24)$$
where ε is a small positive number. A plot showing the relationship between TCB and MICSR image values is shown in Fig. 4.
Fig. 4. Transformation from MICSR values into trinary checkerboard values.
The choice of ε is important in determining the overall appearance and utility of the trinary checkerboard display. If ε is selected too large, then a fair amount of the image is dominated by a grayscale MICSR image and the constant intensity (checkerboard) regions are diminished. If ε is selected too small, then the transition regions between “white” and “black” squares are very narrow and are subject to noise. There are several sensible possibilities for selection of ε. Although there may be more principled ways to select ε, to date we have selected a constant value that does not change with time and which yields an overall visually pleasing appearance.
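Because Eq. (24) is simply a symmetric clamp, the trinary mapping reduces to a single array operation; the percentile-based choice of ε shown in the comment is our assumption, standing in for the constant, visually tuned value used here:

```python
import numpy as np

def trinary_checkerboard(micsr, eps):
    """Eq. (24): clamp MICSR to [-eps, +eps]; the linear ramp (and hence
    the zero-crossings, i.e. the tag lines) around zero is preserved."""
    return np.clip(micsr, -eps, eps)

# A plausible data-driven stand-in for a constant eps (not the authors'):
# eps = 0.1 * np.percentile(np.abs(micsr_sequence), 95)
# tcb = trinary_checkerboard(micsr_frame, eps)
```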
4 Results

4.1 Phantom Results
A phantom consisting of four long, thin cylinders inside a larger cylinder, similar in size to a common head coil phantom, was imaged. The thin cylinders were doped to have different T1 and T2 values. All measurements were made in the cylinder with T1/T2 values of 800/45, similar to myocardium. The phantom imaging was performed using a quadrature head coil. The MR parameters were: tag period = 6 mm; FOV = 280 mm; matrix = 64 × 256; slice thickness = 7 mm; time between frames = 17.1 ms; # images = 69. Fig. 5 shows three representative pairs of images from the phantom series, with trigger delays of 30, 300, and 1000 ms. The images in (a) were filmed
with fixed window and level settings to clearly demonstrate how the contrast changes over time. The MICSR images in (b) are shown using a trinary “bar” visualization, which comprises tags in only one orientation. With this display, even though the actual contrast is changing over time, the displayed contrast remains constant.
Fig. 5. Phantom images at three times: MICSR and |CSPAMM| reconstructions showing (a) intensity changes over time and (b) trinary “bar” visualization and normalized intensity maps.
4.2 Human Heart
A normal human male volunteer was scanned using a 4-channel cardiac phased-array coil. A total of four breath-holds were required, two with phase-alternated tags in one direction and two with the phase and frequency directions swapped. The MR parameters were: tag period = 7 mm; FOV = 290 mm; matrix = 32 × 120; slice thickness = 7 mm; time between frames = 16.1 ms; # images = 32. MICSR reconstructions on all time frames were computed using $|A|^2 - |B|^2$ in each of the two tag directions. The resulting images were upsampled by a factor of 16 (see above for description), and the trinary mapping was applied to each sequence separately. A trinary checkerboard visualization was created by multiplying these two orientations together. Fig. 6 shows the resulting trinary checkerboard visualization. In these images, one can distinctly visualize motion and deformation by tracking the appearance of a particular checkerboard element. Circumferential and radial strains are visualized by the elongation of an element in a particular direction. Shear is clearly visualized by the distortion of a square into a diamond shape.
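A sketch of the checkerboard construction just described, with synthetic sinusoids standing in for the two upsampled, orthogonally tagged MICSR sequences:

```python
import numpy as np

x, y = np.meshgrid(np.linspace(0, 4 * np.pi, 256),
                   np.linspace(0, 4 * np.pi, 256))
micsr_v = np.cos(x)                  # vertical tag orientation (synthetic)
micsr_h = np.cos(y)                  # horizontal tag orientation (synthetic)

eps = 0.2
checkerboard = np.clip(micsr_v, -eps, eps) * np.clip(micsr_h, -eps, eps)
# The zero-crossings of either factor (the tag lines) stay exactly zero in
# the product, outlining the checkerboard elements.
```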
5 Discussion
Because of the spatial and temporal differences in contrast and contrast-to-noise, comparisons between CSPAMM, |CSPAMM|, and MICSR cannot be absolute. No one method can be claimed to be superior to the others in all respects. For example, it is clear from all standpoints that |CSPAMM| has better contrast and
Fig. 6. (a) A sequence of trinary checkerboard MICSR images, showing 64% of the cardiac cycle (starting at the top-left and proceeding horizontally to the right). Selected images are zoomed in (b), (c), and (d) for detailed inspection.
CNR than MICSR immediately after tag application. So, it might be reasonable to use |CSPAMM| if early systolic imaging were the goal. On the other hand, |CSPAMM| does not produce a sinusoidal pattern, and this might be problematic if, for example, HARP processing were going to be used. Further, although tag tracking is the usual objective when |CSPAMM| is used, we found that MICSR has superior CNR near the tags (after a very short interval after tag application), and should therefore be better in tag-tracking applications. If, on the other hand, diastolic imaging is the goal, then MICSR stands up well from all viewpoints, even in comparison to CSPAMM itself. Although it could be argued that conventional tag visualization — i.e., using higher-order SPAMM or |CSPAMM| — provides the same information as MICSR, the trinary checkerboard visualization has a strong visual appeal and may be computationally advantageous as well. Part of the appeal of this MICSR visualization is the fact that the contrast is normalized throughout the cardiac cycle. One does not have to adjust visually (or indeed computationally) to the
changes in underlying tag or image contrast. Another part of the appeal is related to the fact that MICSR has CNR performance nearly equal to that of true CSPAMM at the tag lines, but does not require phase correction to achieve this. This means that visualization of the movement of the tag lines, which corresponds to the edges of the trinary checkerboard elements, is very stable, and is not affected by phase correction errors. It takes advantage of the improved SNR afforded by the dual acquisitions of CSPAMM, but does not sacrifice CNR at the tag locations by the non-linear operation of |CSPAMM|. We are confident that, aside from the visual appeal of the trinary checkerboard visualization, MICSR methods will enter into new computational paradigms for automated cardiac motion analysis in the future. Clinical utility of MICSR has yet to be established. However, there are several indicators that suggest that it may play an important role in the future. First, |CSPAMM| is already regarded as the “method of choice” for imaging myocardial tags that persist throughout the entire cardiac cycle. It is argued above that MICSR shows improved CNR over |CSPAMM| throughout most of the cardiac cycle. But there are other factors that may be even more important in the long run, and are not emphasized in this paper. For one, tag lines can be easily found in MICSR images using an isocontour algorithm with a zero threshold. This is considerably easier than implementing tag-finding algorithms for |CSPAMM|. This could be used for alternate visualization or for automatic strain analyses. In this regard, it should be reiterated that MICSR yields a zero-mean sinusoidal pattern within the tagged tissue. This is an ideal pattern for HARP analysis, which is a relatively new fully-automated strain analysis approach. Thus, with further experimentation and validation, it is expected that MICSR will find a valuable role in clinical protocols for myocardial motion visualization and analysis using magnetic resonance imaging. Acknowledgments. This work was supported in part by the NIH/NHLBI (R01-HL47405) and the NIH/NIDCD (R01-DC001758). Note: Jerry L. Prince is a founder of and owns stock in Diagnosoft, Inc., a company which seeks to license the HARP technology. The terms of this arrangement are being managed by the Johns Hopkins University in accordance with its conflict of interest policies.
References 1. E. A. Zerhouni, D. M. Parish, W. J. Rogers, A. Yang, and E. P. Shapiro. Human heart: Tagging with MR imaging—a method for noninvasive assessment of myocardial motion. Radiology, 169(1):59–63, 1988. 2. L. Axel and L. Dougherty. MR imaging of motion with spatial modulation of magnetization. Radiology, 171:841–845, 1989. 3. L. Axel and L. Dougherty. Heart wall motion: Improved method of spatial modulation of magnetization for MR imaging. Radiology, 172(1):349–350, 1989.
4. A.A. Amini and J.L. Prince, editors. Measurement of cardiac deformations from MRI: physical and mathematical models. Kluwer Academic Publishers, Dordrecht, 2001. 5. W. H. Perman, L. L. Creswell, S. G. Wyers, M. J. Moulton, and M. K. Pasque. Hybrid DANTE and phase-contrast imaging technique for measurement of threedimensional myocardial wall motion. J. Magn. Res. Imag., 5:101–106, 1995. 6. W.S. Kerwin and J.L. Prince. A k-space analysis of MR tagging. J Magn Res, 142:313–322, 2000. 7. S. E. Fischer, G. C. McKinnon, S. E. Maier, and P. Boesiger. Improved myocardial tagging contrast. Magn. Res. Med., 30:191–200, 1993. 8. S. E. Fischer, G. C. McKinnon, M. B. Scheidegger, W. Prins, D. Meier, and P. Boesiger. True myocardial motion tracking. Magn. Res. Med., 31:401–413, 1994. 9. N. F. Osman, W. S. Kerwin, E. R. McVeigh, and J. L. Prince. Cardiac motion tracking using CINE harmonic phase (HARP) magnetic resonance imaging. Magn Res Med, 42:1048–1060, 1999. 10. N. F. Osman, E. R. McVeigh, and J. L. Prince. Imaging heart motion using harmonic phase MRI. IEEE Trans. Med. Imag., 19(3):186–202, March 2000. 11. M. NessAiver and J.L. Prince. Magnitude image CSPAMM reconstruction (MICSR) improves tag contrast and persistence. In Proc ESMRMB, 2002. Cannes, 22–25 August. 12. M. NessAiver and J.L. Prince. Improved CSPAMM tag contrast and persistence using magnitude image reconstruction (MICSR). In Proc RSNA, 2002. Chicago, 1–6 December. 13. H. Gudbjartsson and S. Patz. The Rician distribution of noisy MRI data. Magn Res Med, 34:910–914, 1995.
Velocity Estimation in Ultrasound Images: A Block Matching Approach Djamal Boukerroui1,2 , J. Alison Noble2 , and Michael Brady2 1
HEUDIASYC, UMR CNRS #6599, Université de Technologie de Compiègne, BP 20529 - 60205 Compiègne Cedex, France. [email protected] 2 Medical Vision Laboratory, Department of Engineering Science, University of Oxford, Parks Road, Oxford OX1 3PJ, UK. {noble,jmb}@robots.ox.ac.uk
Abstract. In this paper, we focus on velocity estimation in ultrasound image sequences. Ultrasound images present many difficulties in image processing because of the typically high level of noise found in them. Recently, Cohen and Dinstein have derived a new similarity measure, according to a simplified image formation model of ultrasound images, that is optimal in the maximum likelihood sense. This similarity measure is better for ultrasound images than others such as the sum-of-square differences or normalised cross-correlation because it takes into account the fact that the noise in an ultrasound image is multiplicative Rayleigh noise, and that displayed ultrasound images are log-compressed. In this work we investigate the use of this similarity measure in a block matching method. The underlying framework of the method is Singh’s algorithm. New improvements are made both to the similarity measure and to the Singh algorithm to provide better velocity estimates. A global optimisation scheme for algorithm parameter estimation is also proposed. We show that this optimisation makes an improvement of approximately 35% in comparison to the result obtained with the worst parameter set. Results on clinically acquired cardiac and breast ultrasound sequences demonstrate the robustness of the method.
1 Introduction
The measurement of optical flow or image velocity is a fundamental problem in Computer Vision. Several techniques have been presented in the literature and many more continue to appear [1,14,7,9,2]. Such estimation or measurement of optical flow may be done, for example, to improve the efficiency of encoding the image, or to allow enhancement of the display of, or measurement of, the movement of some particular tracked part of the image to assist an observer to interpret the image [12]. Indeed, in medical applications for example, motion measurements constitute an essential component in the evaluation of any patient with known or suspected heart disease. In particular, detecting and characterising abnormalities in segmental wall motion function has become the hallmark of diagnosing coronary artery disease because reduced motion correlates with
ischaemic muscle action [3,6]. Motion measurements are also used in breast deformation analysis to measure the elastic properties of tissues and to provide an indication of tissue hardness (cancerous tissue is, in general, harder than normal tissue) by computing a strain image or relative Young’s modulus image [5]. In this paper, we focus on velocity estimation in ultrasound image sequences. Ultrasound images present many difficulties in image processing because of the typically high level of noise found in them. For example, the tracking of cardiac walls in cardiac ultrasound images is difficult because of the high level of noise and also because of the nature of the cardiac motion. Various ways of motion estimation in ultrasound sequences have been proposed [10,11,12,4,8,9], but it is a difficult task in which there is room for improvement. Optical flow methods can be classified as belonging to one of three main groups [1]: Differential techniques or gradient-based methods are based on the assumption that the brightness of a pattern is invariant over time; they compute image velocity from spatio-temporal intensity derivatives and use a regularisation procedure based on a priori knowledge. Differential methods give good results on good quality images; however, they are highly sensitive to noise because of numerical differentiation and produce inaccurate results where the brightness constancy assumption is violated. In the second class, frequency-based techniques, two types of methods exist: energy-based and phase-based approaches. Theoretically, phase-based optical flow estimation is the most appropriate method for ultrasound images. The use of phase information makes the method robust to attenuation artefacts. The disadvantage of filter-based estimation is that the filter response is optimal only for a velocity range. We believe that it is hard to design optimal filters tuned to the velocity range in a cardiac sequence as cardiac motion varies during the cardiac cycle. It has been reported in [11,12] that spatio-temporal estimation is insufficient for low frame-rate sequences and that there are a number of localisation problems because of the non-uniformity of wall velocity during the cardiac cycle. The third class is block-matching motion estimation, which defines velocity as the shift that yields the best fit between image regions/features at different times. The best match is found by maximising a similarity measure. Matching methods are in general computationally expensive for dense flow field estimation. They are particularly well suited if the estimation of the flow field is necessary only at some locations (e.g. only at the heart walls). Motivated by the results obtained in [10] and a recent similarity measure derived according to a simplified image formation model of ultrasound images [4], we have developed a new block-matching method. The underlying framework of the method is Singh’s algorithm [13]. The outline of this paper is as follows. First, a brief description of the similarity measure used and the underlying assumptions are presented in Section 2. Section 3 describes Singh optical flow estimation with a focus on the new changes made to improve the estimation. Implementation details and the global optimisation scheme for the parameter estimation are given in Section 4. Section 5 presents results on clinically acquired cardiac and breast ultrasound sequences, and the summary and conclusion appear in Section 6.
2 Maximum Likelihood Motion Estimation [4]
We assume that two consecutive frames x and y are the realisation of two random variables X and Y. Let us suppose that a block $x_i = \{x_{ij}, j = 1 \dots \Lambda\}$ in x matches a block $y_i = \{y_{ij}, j = 1 \dots \Lambda\}$ in y and that the displacement vector is denoted by $v_i = [u_i, v_i]^T$. Here, the index i is an index over all possible blocks and j is an index of pixels in the block. Given the above notations, the maximum likelihood (ML) estimation of $v_i$ is given by [4]:
$$v_i^{ML} = \arg\max_{v_i}\, p(x_i\,|\,y_i, v_i)\,, \qquad (1)$$
where the conditional probability density function (pdf) depends in general on the noise model. Note that the above equation implicitly supposes the whiteness of the noise, as the velocity estimation of a block i is independent of the remaining displacement field. In ultrasound, when the speckle is fully developed, the noise is multiplicative and follows a Rayleigh pdf. If the noiseless value of the jth pixel in region $x_i$ is denoted by $s_{ij}$, and under the assumption of independent noise, the following model for the observed pixels in x and y stands [4]:
$$x_{ij} = \eta^1_{ij} s_{ij}\,, \qquad (2)$$
$$y_{ij} = \eta^2_{ij} s_{ij}\,, \qquad (3)$$
where $\eta^1_{ij}$ and $\eta^2_{ij}$ are two independent noise elements with Rayleigh density functions given by:
$$p_{\eta^1}(\eta^1) = \frac{\eta^1}{\alpha^2}\exp\left(-\frac{(\eta^1)^2}{2\alpha^2}\right), \quad \eta^1 > 0\,, \qquad (4)$$
$$p_{\eta^2}(\eta^2) = \frac{\eta^2}{\beta^2}\exp\left(-\frac{(\eta^2)^2}{2\beta^2}\right), \quad \eta^2 > 0\,. \qquad (5)$$
Given these two models, the following equation is obtained:
$$x_{ij} = \eta_{ij}\, y_{ij}\,, \qquad (6)$$
where the noise term is defined as follows:
$$\eta_{ij} = \frac{\eta^1_{ij}}{\eta^2_{ij}} \quad \text{and} \quad p_{\eta}(\eta) = 2\left(\frac{\alpha}{\beta}\right)^2 \frac{\eta}{\left(\eta^2 + \left(\frac{\alpha}{\beta}\right)^2\right)^2}\,, \quad \eta > 0\,. \qquad (7)$$
Taking the natural logarithm of both sides of Eq. (6) and denoting $\ln(x)$ by $\tilde{x}$, we obtain the following model for displayed ultrasound images:
$$\tilde{x}_{ij} = \tilde{y}_{ij} + \tilde{\eta}_{ij}\,, \qquad (8)$$
and the pdf of the additive noise $\tilde{\eta}_{ij}$ is given by:
$$p(\tilde{\eta}) = 2\left(\frac{\alpha}{\beta}\right)^2 \frac{\exp(2\tilde{\eta})}{\left(\exp(2\tilde{\eta}) + \left(\frac{\alpha}{\beta}\right)^2\right)^2}\,. \qquad (9)$$
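The additive model of Eq. (8) is easy to verify numerically; the following Monte-Carlo sketch (our illustration, with arbitrary parameters) simulates fully developed speckle in two frames, log-compresses, and compares the residual with the density of Eq. (9) for α = β:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = beta = 1.0
s = 2.0                                   # a constant noiseless value s_ij
n = 500_000

x = rng.rayleigh(alpha, n) * s            # Eq. (2)
y = rng.rayleigh(beta, n) * s             # Eq. (3)
eta_t = np.log(x) - np.log(y)             # Eq. (8): additive log-noise

grid = np.linspace(-3, 3, 121)
pdf = 2 * np.exp(2 * grid) / (np.exp(2 * grid) + 1) ** 2   # Eq. (9), a = b
hist, edges = np.histogram(eta_t, bins=60, range=(-3, 3), density=True)
# hist should track pdf evaluated at the bin centres.
```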
Cohen and Dinstein [4] supposed that the independent noise in successive frames follows the same distribution (i.e. α = β), which simplifies the above equation. Given the above model (Eq. 8), the conditional pdf is given by:
$$p(\tilde{x}_i\,|\,\tilde{y}_i, v_i) = \prod_{j=1}^{\Lambda} \frac{2\exp\left(2(\tilde{x}_{ij} - \tilde{y}_{ij})\right)}{\left(\exp\left(2(\tilde{x}_{ij} - \tilde{y}_{ij})\right) + 1\right)^2}\,. \qquad (10)$$
Motion estimation based on the above equation was denoted CD2¹ by Cohen and Dinstein [4]. Maximising the above pdf is equivalent to maximising the following objective function:
$$E_i^{CD2}(v_i) = \sum_{j=1}^{\Lambda}\left\{\tilde{x}_{ij} - \tilde{y}_{ij} - \ln\left(\exp\left(2(\tilde{x}_{ij} - \tilde{y}_{ij})\right) + 1\right)\right\}\,. \qquad (11)$$
This similarity measure is better for ultrasound images than others such as the sum-of-square differences (SSD) or normalised cross-correlation (NCC) because it takes into account the fact that the noise in an ultrasound image is multiplicative Rayleigh noise, and that displayed ultrasound images are log-compressed. However, it assumes that the noise distribution in both of the blocks is the same, and this assumption is not correct for ultrasound images. The attenuation of the ultrasound waves introduces inhomogeneities in the image of homogeneous tissue [15]. The time gain and the lateral gain compensations (compensating respectively for the effects that deeper tissue appears dimmer and for intensity variations across the beam), which are tissue independent and generally constant for a given location during the acquisition, do not compensate fully for the attenuation. Further, because of the large velocity dynamic of the myocardial wall in an echocardiographic sequence, the time gain and the lateral gain compensations will be different for the same tissue for different frames in the sequence. Thus, in this work an intensity normalisation is conducted before calculation of the CD2 similarity measure. This is achieved by making sure that the two blocks of data have at least the same mean and variance. In more detail, the original intensity values $\tilde{x}_i$ and $\tilde{y}_i$ above are transformed into new values $\breve{x}_i$ and $\breve{y}_i$ by subtracting the mean and dividing by the standard deviation (square root of the variance) of the intensity values in the block. We call this the modified similarity CD2bis.
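A compact implementation of Eq. (11) and of the CD2bis normalisation, assuming the blocks hold log-compressed (displayed) intensities; logaddexp gives a numerically stable ln(exp(2d) + 1):

```python
import numpy as np

def cd2(x_block, y_block):
    """Eq. (11): CD2 objective for two log-compressed blocks (maximise)."""
    d = x_block - y_block
    return np.sum(d - np.logaddexp(2.0 * d, 0.0))   # ln(exp(2d) + 1)

def cd2bis(x_block, y_block):
    """CD2 after normalising each block to zero mean and unit variance,
    compensating for gain/attenuation differences between frames."""
    xn = (x_block - x_block.mean()) / x_block.std()
    yn = (y_block - y_block.mean()) / y_block.std()
    return cd2(xn, yn)
```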
3 Calculation of Optical Flow Estimates (Singh Approach)
Having defined the similarity measure, we now address how it is used in a block-matching framework for the estimation of the velocity field. In the previous section we used vectorial notation; however, in this section detailed notation is needed as we have to distinguish between the two directions x and y.
¹ Note that Eq. (10) was given in [4] using the data before log compression (i.e. $x_{ij}$ and $y_{ij}$).
Due to the well-known aperture problem, velocity estimation using Eq. (1) can recover only the component of velocity normal to intensity edges. The full velocity estimate can be calculated by propagating information from regions or “interesting points” that do not suffer from the aperture problem, such as corners. Moreover, the displacement vector v is discretised since the search region is discrete, and hence the accuracy of the velocity field is limited by this discretisation. To obtain more reliable estimates, a smoothness constraint on the velocity field with an appropriate confidence measure must be incorporated. In Singh’s approach, both conservation information (a data constraint) and neighbourhood information (a smoothness constraint) are combined in a probabilistic framework based on estimates of their associated errors [13]. In the first step (conservation information), a square window $W_c$ having a side length of $2n + 1$ is formed about a central pixel $(x, y)$ in the first frame. A square search window $W_s$ of side length $2N + 1$ is defined in the second frame around the position of the corresponding central pixel $(x, y)$ in the second frame. The size of $W_s$ depends on the assumed maximal displacement between two successive frames. The intensities of the block $W_c$ of pixels in frame 1 are then compared with all possible positions of the block in the search window $W_s$ using a similarity measure. This gives a value of $E_c(u, v)$ for each candidate displacement $(u, v)$. Thus, in a first step based on conservation information, the similarity values $E_c$ are used in a probability mass function to calculate a response $R_c$ whose value at each position in the search window represents the likelihood of the corresponding displacement. Singh used the SSD as a similarity measure and the following function as a probability density function:
$$R_c(u, v) = \frac{1}{Z}\exp\{-k E_c(u, v)\}\,, \quad -N \le u, v \le N\,, \qquad (12)$$
where Z is defined such that all probabilities sum to one and the parameter k is chosen at each position such that the maximum response in $W_s$ is close to unity (0.95 before normalisation) for computational reasons. Singh then defines a velocity estimate as being the mean of the probability mass function:
$$v_c = \begin{pmatrix} u_c \\ v_c \end{pmatrix} = \begin{pmatrix} \sum_u \sum_v R_c(u, v)\, u \\ \sum_u \sum_v R_c(u, v)\, v \end{pmatrix}\,, \qquad (13)$$
and its associated error (called the conservation error) by:
$$(v - v_c)^T S_c^{-1} (v - v_c)\,, \qquad (14)$$
where $S_c$ is the covariance matrix given by:
$$S_c = \begin{pmatrix} \sum_{u,v} R_c(u, v)(u - u_c)^2 & \sum_{u,v} R_c(u, v)(u - u_c)(v - v_c) \\ \sum_{u,v} R_c(u, v)(u - u_c)(v - v_c) & \sum_{u,v} R_c(u, v)(v - v_c)^2 \end{pmatrix}\,. \qquad (15)$$
Another velocity estimate may be obtained by the use of neighbourhood information. In other words, the velocity at each pixel is unlikely to be completely
independent of the velocity of its neighbours. Thus, assuming that the velocity of each pixel in a small neighbourhood $W_p$ has been estimated, the velocity for each pixel can be refined by using the velocity of its neighbouring pixels. Clearly it is more likely that the velocities of closer neighbours are more relevant than those of pixels which are further away. Therefore, weights are assigned to velocities calculated for the neighbouring pixels, and the weights drop with increasing distance from the central pixel (a 2-D Gaussian mask in the window $W_p$ of size $(2w + 1)(2w + 1)$ is used). These weights can be interpreted as a probability mass function $R_n(u_i, v_i)$, where i is an index for pixels in $W_p$. In the same way as for the conservation information, a second velocity estimate $v_n = (u_n, v_n)^T$ is obtained with its associated covariance matrix $S_n$. The sum of the conservation (data term) and neighbourhood (regularisation term) errors represents the total squared error of the velocity estimate:
$$\epsilon^2(v) = (v - v_c)^T S_c^{-1}(v - v_c) + (v - v_n)^T S_n^{-1}(v - v_n)\,. \qquad (16)$$
The optimal velocity is that which minimises this error and can be obtained by setting the gradient of the error with respect to v to zero giving:
$$\hat{v} = \left(S_c^{-1} + S_n^{-1}\right)^{-1}\left(S_c^{-1} v_c + S_n^{-1} v_n\right)\,. \qquad (17)$$
Because $v_n$ and $S_n$ are derived on the assumption that the velocity of each pixel of the neighbourhood is known in advance, in practice equation (17) is solved in an iterative process (Gauss–Seidel relaxation) with the initial values of the velocity at each pixel being taken from the conservation information alone:
$$\hat{v}^0 = v_c\,, \qquad \hat{v}^{m+1} = \left(S_c^{-1} + (S_n^m)^{-1}\right)^{-1}\left(S_c^{-1} v_c + (S_n^m)^{-1}\hat{v}_n^m\right)\,. \qquad (18)$$
Inspecting the energy function to be minimised (Eq. 16), we notice that the parameter k plays an important role, as the covariance matrix $S_c$ is highly dependent on the value of k. Indeed, k controls the contribution of the conservation information in the total energy and thus the amount to which $v_c$ is regularised using the neighbourhood information. In [13], the value of k is estimated at every pixel position, implying that the regularisation is not uniform over the whole velocity field. A second weak point of the Singh approach is that taking the expectation of the probability as the expected velocity introduces errors where the probability mass function is not mono-modal. A multi-modal pdf occurs mainly because of the aperture problem; however, a low signal-to-noise ratio is a second source. To avoid these two limitations, we define the probability mass function as follows:
$$R_c(u, v) = \frac{1}{Z}\exp\left(k\,\frac{E_c(u, v) - \bar{m}}{(2n + 1)^2}\right)\,, \quad -N \le u, v \le N\,, \qquad (19)$$
where $E_c$ is the CD2bis similarity measure and $\bar{m}$ is its maximum in the search window $W_s$. Notice that the maximum of $R_c$, before normalisation, has a value
of one by construction; this avoids numerical instability problems. In our formulation, the parameter k has a constant value, leading to a more uniform regularisation of the velocity field. The similarity measure is normalised by the size of the correlation window $W_c$. Hence the parameter k is independent of the size of $W_c$, so that the same value of k can be used for different window sizes. This is particularly effective when using a coarse-to-fine estimation strategy. One way to get more reliable estimates of the expected velocity when the pdf is not mono-modal is to bias the estimation towards the predominant mode. In this work we take as an estimate of the velocity the mean of a thresholded version of the probability mass function. In other words:
$$u_c^h = \frac{\sum_u \sum_v R_c^h(u, v)\, u}{\sum_u \sum_v R_c^h(u, v)}\,, \qquad v_c^h = \frac{\sum_u \sum_v R_c^h(u, v)\, v}{\sum_u \sum_v R_c^h(u, v)}\,, \qquad (20)$$
where
$$R_c^h(u, v) = \begin{cases} R_c(u, v) & \text{if } R_c(u, v) \ge \alpha\,; \\ 0 & \text{otherwise}\,, \end{cases} \qquad (21)$$
and the threshold α is defined as follows:
$$\alpha = \bar{m} - h(\bar{m} - \underline{m}) \quad \text{with } h \in [0, 1]\,, \qquad (22)$$
where $\bar{m}$ and $\underline{m}$ are the maximum and the minimum of the probability mass function $R_c$, respectively. Equation (20) is equivalent to equation (13) for h = 1 (i.e. the velocity is the mean), and is equivalent to taking as the velocity the argument of the maximum of the probability mass function for h = 0. One can optimise the value of h, given ground-truth data for a given type of images, to obtain the best estimation with subpixel precision. This has the advantage of overcoming the lack of robustness to noise and precision of the maximum approach, and of solving the problem of multi-modal pdfs for the mean approach, as the new estimates will be biased towards the predominant mode. Note that the covariance matrix is calculated using the whole probability mass around the new estimates (which is different from the mean using the whole pdf). Therefore, the error in the velocity estimates does take into account the presence of a second mode, and the second-mode information is also taken into account in the regularisation framework.
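Putting Eqs. (19)–(22) together, a single velocity estimate from one search window can be sketched as below; E holds the CD2bis values for all candidate displacements, and the function and variable names are illustrative:

```python
import numpy as np

def velocity_estimate(E, k, h, n):
    """E: (2N+1, 2N+1) CD2bis values over the search window Ws;
    n: half-size of the correlation window Wc. Returns (u, v)."""
    N = E.shape[0] // 2
    R = np.exp(k * (E - E.max()) / (2 * n + 1) ** 2)   # Eq. (19), peak = 1
    R /= R.sum()                                       # probabilities sum to 1
    alpha = R.max() - h * (R.max() - R.min())          # Eq. (22)
    Rh = np.where(R >= alpha, R, 0.0)                  # Eq. (21)
    u, v = np.meshgrid(np.arange(-N, N + 1), np.arange(-N, N + 1))
    return (Rh * u).sum() / Rh.sum(), (Rh * v).sum() / Rh.sum()  # Eq. (20)
```

With h = 0 only the global maximum survives the threshold, and with h = 1 the estimate reduces to the full mean of Eq. (13), matching the limiting cases discussed above.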
4 Parameters Optimisation and Implementation Details
In this section we briefly describe the optimisation scheme for the estimation of k and h and give the reader the important implementation details. Parameters Optimisation: k and h may be optimised using a simulated sequence where the true velocity field is known. As we noted, k and h are highly dependent both on the signal-to-noise ratio and on the velocity field to be estimated (i.e.
Fig. 1. Block diagram of the proposed parameters optimisation scheme.
Fig. 2. Illustration of the 3 frames scheme for the estimation of the velocity between frame x at time t and frame y at time t + 1. The constant velocity approach uses the frame at t + 2 (continuous arrows); the proposed approach uses the preceding frame at time t − 1 (dotted arrows).
the amount of the smoothness of the field). Therefore, we choose an optimisation scheme using the original data. Figure 1 illustrates schematically how the values of k ∈ ]0 ∞[ and h ∈ [0 1] are optimised. A velocity field is estimated using initial values of k and h, and then all the subsequent frames are registered to the first frame using the calculated cumulative optical flow field. Finally, a registration error E(k , h) is calculated using a similarity measure. The SSD similarity is used in our experiments and we defined the optimal parameter set as: ˆ = arg min E(k , h) . (kˆ , h)
The Powell multidimensional minimisation algorithm was used to solve the above optimisation problem. We found the method relatively robust to the choice of initial values; for example, we found initial values of h = 0.5 and k = 0.5 suitable for an ultrasound imaging sequence. Implementation Details: A common problem in optical flow estimation using matching techniques is a multi-modal response (e.g. due to the aperture problem, or to mismatches, especially when the size of the search window is large). A common way to reduce its effect is to make the assumption of a locally stationary flow field (usually over 3 frames). This assumption is relatively accurate for high frame rate data and a velocity field that is smooth over time. Unfortunately, in the case of contrast echocardiography, tissue Doppler, and real-time 3D imaging, low frame rates are typical (20–30 Hz). We suggest an alternative approach to tackle this problem without making any assumption on the velocity field, but
by assuming that the observed moving tissue conserves its statistical behaviour through time (at least for 3 to 4 consecutive frames). Suppose o, x, y and z are four consecutive frames at times t − 1, t, t + 1 and t + 2, respectively. Figure 2 illustrates the blocks being compared for the two approaches (the constant velocity approach and the proposed one). Our approach makes use of the calculated velocities between the preceding frame o and the current frame x. Given a block $x_i$ in frame x at time t, which is compared to blocks $y_i$ in the search window $W_s$ in frame y at time t + 1, we use the previously estimated velocity to track back the position of $x_i$ in the preceding frame at time t − 1, and we denote the corresponding block by $o_i$. Hence, theoretically, $x_i$ and $o_i$ can be seen as two independent observations of the same tissue. Thus, in the new approach, the intensities of each candidate block $y_i$ in the search window $W_s$ are compared with the intensities of the block $x_i$ centred at (x, y) in frame x at time t, and also with the corresponding block $o_i$ centred at the calculated position $(x - u_o, y - v_o)$ in frame o at time t − 1, where $v_o = (u_o, v_o)^T$ is the displacement of $o_i$ to $x_i$. Finally, a coarse-to-fine strategy is used to reduce the computational load of the algorithm when the expected velocity range is large. A multiresolution implementation is used, as suggested by Singh [13], to which the reader is referred for more details.
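For the parameter optimisation of Eq. (23) described above, a minimal sketch using SciPy's Powell method is given below; the synthetic error surface merely stands in for the full tracking-and-registration loop of Fig. 1, and the bounds encode k ∈ ]0, ∞[ and h ∈ [0, 1] (bounded Powell requires a recent SciPy):

```python
import numpy as np
from scipy.optimize import minimize

def registration_error(params):
    k, h = params
    # Placeholder for Fig. 1: estimate the flow with (k, h), register all
    # frames to the first one with the cumulative field, return the SSD.
    return (np.log(k) + 1.0) ** 2 + 4.0 * (h - 0.6) ** 2  # synthetic surface

result = minimize(registration_error, x0=[0.5, 0.5], method="Powell",
                  bounds=[(1e-3, 10.0), (0.0, 1.0)])
k_hat, h_hat = result.x
```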
5 Results
The registration error E(k, h) surface of one experiment conducted on the ultrasound breast data is shown in Fig. 3. Three important observations can be made: 1. For h = 0, the velocity estimation is equivalent to taking the argument of the maximum of the pdf. Hence, theoretically, the parameter k does not have any influence on the result. This can easily be observed for this experiment, and it corresponds to the maximum error. In this case, the optical flow
0.8
0.7 65 0.6
h 0.5
0.4 60 0.3
0.2
0.1 55 0
2
4
6
8
1/k
10
12
14
Fig. 3. The registration error E(k , h) for a breast ultrasound sequence.
Fig. 4. Endocardial wall tracking using velocity estimation. First frame (left); last frame (middle); area plot of the left ventricle as a function of time (right).
is quantified by the pixel resolution of the image, and hence the error on the image velocity is of the order of the pixel resolution. Furthermore, this approach is not robust against noise. This explains the high error on the velocity estimation. 2. For h = 1 (as in the Singh approach), the velocity estimation is equivalent to taking the mean of the probability. The results are better than for h = 0, but do not correspond to the optimal value. This can be explained by the fact that taking the mean of the probability as an estimate of the velocity is not very precise and may lead to biased estimates if the pdf is multi-modal or not well peaked. Observe as well the expected functional dependence between the two parameters (h and k). Therefore, the search for the optimal values of h and k must be done in the 2D space. 3. Inspecting the results, we notice that for h = 1 (as in Singh), the optimisation of k makes an improvement of about 15% relative to the worst value of k. The optimisation of h for different values of k makes an enhancement of between 20% and 35%. In the above experiment the optimal values are h = 0.660 and k = 0.237. Using these values, an enhancement of approximately 35% is achieved in comparison to the result obtained using the worst parameter set. Figure 4 shows a tracking example of cardiac boundary pixels on a short-axis echocardiographic sequence. In this example, the velocity estimation is done only at contour points (i.e., both the similarity calculation and the regularisation are restricted to contour points), which reduces the computational burden. The area plot over two cardiac cycles (over 100 frames) demonstrates the subpixel accuracy of the method. Indeed, as the errors on the velocity estimation are propagated from frame to frame, a poor estimation accuracy would result in large errors after one cardiac cycle. Figure 5 shows a second example of motion tracking on free-hand ultrasound breast data. The figure shows four frames at regular intervals of the sequence (about 300 frames). Tissue motion estimation on this type of data is very difficult. Notice for example that at the end of the sequence, there is no signal at the right hand side of the images. As in the previous example, the velocity estimation is done only at contour points. We used one frame out of every five frames acquired to demonstrate the robustness of the method in the case of low frame rate data. Without using any high-level processing (global motion model, more
Fig. 5. Breast mass tracking using velocity estimation. The images are at different times in the sequence. Here only one out of every 5 acquired frames is used. Notice the shadow on the right hand side of the last two images.
300
MSE
250
200
150
100
50
0
0
2
4
6
8
10
12
14
16
18
20
Frames
Fig. 6. Evolution of the Mean Square Error between the first frame (0) and the frames registered to it. Here a dense velocity field is calculated. Results obtained using: the NCC similarity (solid curve); the CD2bis similarity (dotted curve).
constrained contour model), the proposed block matching provides satisfactory tracking results. The results are more convincing when visualised as a movie. The last experiment presents a comparison of two similarity measures (the Normalised Cross-Correlation and the CD2bis) used within this framework for the estimation of a dense velocity field. The breast data are used in this experiment. Figure 6 shows the evolution of the Mean Square tracking Error over the frames. The MSE is calculated between the registered frames and the first frame using the cumulative velocity field. The results demonstrate the superiority of the CD2bis similarity measure. All experiments are obtained with a side length of 21 pixels (≈7 mm) for the correlation window Wc and 19 pixels for the search window Ws. With the above window sizes and two levels for the coarse-to-fine strategy, the computation time is about 3 frames per second per contour on a Pentium 1.8 GHz.
6 Conclusion
In this paper a novel combination of two existing techniques (namely a similarity measure proposed by Cohen and Dinstein and the Singh block matching
approach) is proposed. Improvements to both techniques are made, leading to a more effective block-matching algorithm for ultrasound sequences. A global optimisation scheme for the parameter estimation of the algorithm is proposed. It is demonstrated that this optimisation improves the result by about 35% in comparison to the worst case. In this paper, the performance of the proposed block matching approach is demonstrated on B-mode ultrasound images from two application areas. We obtained reliable boundary tracking without the use of any high-level constraints on the motion field. Acknowledgements. We are grateful to Dr. M. M. Parada and Dr. J. Declerck, from Mirada Solutions Ltd, for providing software used in part of this work. This work was funded by the EC project ADEQUATE (IST: 1999-10837) and the UK EPSRC funded IRC in Medical Image Analysis and Signals (MIAS).
References 1. Barron, J.L., Fleet, D.J., Beauchemin, S.S.: Performance of optical flow techniques. Int. Journal of Computer Vision 12 (1994) 43–77. 2. Bernard, C.: Discrete wavelet analysis for fast optic flow computation. Internal Report RI415, Centre de Mathématiques Appliquées, École Polytechnique, France (1999). 3. Baxley, W.A., Reeves, T.J.: Abnormal regional myocardial performance in coronary artery disease. Prog Cardiovasc Dis 13 (1971) 405–421. 4. Cohen, B., Dinstein, I.: New maximum likelihood motion estimation schemes for noisy ultrasound images. Pattern Recognition 35 (2002) 455–463. 5. Han, L., Burcher, M., Noble, J.A.: Non-invasive measurement of biomechanical properties of in vivo soft tissues. In Proc. MICCAI (2002) 208–215. 6. Kerber, R.E., Abboud, F.M.: Echocardiographic detection of regional myocardial infarction. Circulation 47 (1973) 997–1005. 7. Lai, S.A., Vemuri, B.C.: Robust and efficient computation of optical flow. International Journal of Computer Vision 29 2 (1998) 87–105. 8. Ledesma-Carbayo, M.J., Kybic, J., Desco, M., Santos, A., Unser, M.: Cardiac motion analysis from ultrasound sequences using non-rigid registration. In Proc. Int. Conf. Medical Image Computing and Computer-Assisted Intervention (2001) 889–896. 9. Ledesma-Carbayo, M.J., Kybic, J., Sühling, M., Hunziker, P., Desco, M., Santos, A., Unser, M.: Cardiac ultrasound motion detection by elastic registration exploiting temporal coherence. In Proc. IEEE Int. Symposium on Biomedical Imaging, Washington DC, USA (2002) 585–588. 10. Mikic, I., Krucincki, S., Thomas, J.D.: Segmentation and tracking in echocardiographic sequences: Active contours guided by optical flow estimates. IEEE Trans. Med. Imaging 17 2 (1998) 274–284. 11. Mulet-Parada, M., Noble, J.A.: 2D+T acoustic boundary detection in echocardiography. Medical Image Analysis 4 (2000) 21–30. 12. Mulet-Parada, M.: Intensity independent feature extraction and tracking in echocardiographic sequences. PhD thesis, MVL, Dept. of Eng. Science, Oxford: Oxford University (2000).
13. Singh, A.: Image-flow computation: An estimation-theoretic framework and a unified perspective. CVGIP: Image Understanding 65 2 (1992) 152–177. 14. Weber, J., Malik, J.: Robust computation of optical flow in a multi-scale differential framework. International Journal of Computer Vision 14 1 (1995) 67–81. 15. Hughes, D.I., Duck, F.A.: Automatic attenuation compensation for ultrasonic imaging. Ultrasound in Medicine & Biology 23 (1997) 651–664.
Construction of a Statistical Model for Cardiac Motion Analysis Using Nonrigid Image Registration Raghavendra Chandrashekara1 , Anil Rao1 , Gerardo Ivar Sanchez-Ortiz1 , Raad H. Mohiaddin2 , and Daniel Rueckert1 1
Visual Information Processing Group, Department of Computing, Imperial College of Science, Technology, and Medicine, 180 Queen’s Gate, London SW7 2BZ, UK. {rc3, ar17, giso, dr}@doc.ic.ac.uk http://vip.doc.ic.ac.uk/ 2 Cardiovascular Magnetic Resonance Unit, Royal Brompton and Harefield NHS Trust, Sydney Street, London, SW3 6NP, UK.
Abstract. In this paper we present a new technique for tracking the movement of the myocardium using a statistical model derived from the motion fields in the hearts of several healthy volunteers. To build the statistical model we tracked the motion of the myocardium in 17 volunteers using a nonrigid registration technique based on free-form deformations and mapped the motion fields obtained into a common reference coordinate system. A principal component analysis (PCA) was then performed on the motion fields to extract the major modes of variation in the fields between the successive time frames. The modes of variation obtained were then used to parametrize the free-form deformations and build our statistical model. The results of using our model to track the motion of the heart in normal volunteers are also presented.
1 Introduction and Background
Magnetic resonance (MR) imaging is unparalleled in its ability to obtain high-resolution cine volume images of the heart, and with the use of tissue tagging [25, 4, 3] detailed information about the motion and strain fields within the myocardium can be obtained. This is clinically useful since cardiovascular diseases such as ischemic heart disease affect the motion and strain properties of the heart muscle in localized regions of the myocardium. The absence of automated tools to assist clinicians with the analysis of tagged MR images has made it difficult for MR tagging to become a valuable diagnostic tool for routine clinical use. The main difficulties encountered are the loss of contrast between tags as the heart contracts and the need to estimate through-plane motion. Although many different techniques have been developed to track the motion of the heart in tagged MR images, including B-splines and active contour models [2, 1, 12], deformable models [16, 17], optical flow models [18, 11, 8], and harmonic phase (HARP) MR imaging [14, 15], none make use of any prior information available about the expected types of motion of the heart. Although a
model-based approach was used by Gérard et al. [10] to track the motion of the left ventricle in 3D echocardiographic sequences, the model was controlled by four temporal parameters related to the global contraction and rotation of the myocardium, and no information about the local motion within the myocardium was used. One of the key contributions of this paper is the development of a statistical model of cardiac motion which can be used as a priori information for motion tracking. Recently a technique has been developed for tracking the motion of the heart from tagged MR images using nonrigid image registration [6, 5]. In this approach the motion of the myocardium during systole is tracked by registering the sequence of tagged MR images acquired during the cardiac cycle to a reference image of the myocardium taken at end-diastole. Both short-axis (SA) and long-axis (LA) images of the left ventricle are used to obtain information about the complete 3D motion of the myocardium. Making an objective comparison of the cardiac motion fields derived from different subjects, in order to build a statistical model of these motion fields, requires their alignment in a common coordinate system. For this purpose we use a technique developed by Rao et al. [19] which aligns cardiac MR images from different subjects into a common coordinate system and also transforms the motion fields from these subjects into that coordinate system. In this paper we present a model-based technique to track the motion of the heart from tagged MR images. The motion model used during the tracking is built by performing a principal component analysis of the motion fields obtained from a number of healthy volunteers. Our paper is organised as follows: Section 2 describes how we have built our motion model of the heart, Section 3 explains the model-based motion tracking, and Section 4 presents our results for normal volunteers. Finally, Section 5 gives our conclusions on the work presented and suggests directions for future research.
2 Construction of a Statistical Model of Cardiac Motion
There are three main parts to the construction of our motion model. Firstly, in Section 2.1, we derive the motion fields for all times between end-diastole and end-systole from the tagged MR images of a number of different subjects. Secondly, in Section 2.2, the motion fields obtained are mapped into a common coordinate system so that a comparison across subjects may be performed. Finally, in Section 2.3, we build our motion model by performing a principal component analysis (PCA) of the motion fields in the common coordinate system for all the volunteer subjects to obtain the most dominant modes of motion between any two consecutive time frames.

2.1 Myocardial Motion Analysis
The method we use to track the motion of the heart is described in [6, 5] which is an extension of the registration algorithm developed by Rueckert et al. [22] so
that it may be used for cardiac motion analysis: For a single subject, $S$, consider a point in the myocardium at end-diastole, $\mathbf{x}'_0 = (x'_0, y'_0, z'_0)$, which moves to another point $\mathbf{x}'_t = (x'_t, y'_t, z'_t)$ at time $t$. Our task is to find the transformation $\mathbf{T}^S_{(0,t)}(x', y', z')$ which maps $\mathbf{x}'_0$ to $\mathbf{x}'_t$. We achieve this by registering the sequence of images taken during systole to the image taken at end-diastole. Since the motion of the heart involves 3D twisting, contraction and shortening, we use both short- and long-axis images of the heart to derive a complete 3D motion model. A volumetric free-form deformation (FFD) is defined on a domain $\Omega$ by a mesh of $n_x \times n_y \times n_z$ control points $\Phi$. The domain $\Omega$ corresponds to the volume of interest and includes both the short-axis image slices as well as the long-axis image slices. The domain $\Omega$ defines a single coordinate system in which to perform the tracking of the LV throughout the cardiac cycle. We have chosen a coordinate system in which the x- and y-axes of the free-form deformation are aligned with the x- and y-axes of the short-axis image planes, as this is the most natural coordinate system to work in. We use a multi-level FFD [23] where the cardiac motion $\mathbf{T}^S_{(0,t)}(x', y', z')$ is represented as the sum of a series of local FFDs. The estimation of the motion field $\mathbf{T}^S_{(0,t)}(x', y', z')$ proceeds in a sequence of registration steps: Initially, we register the SA and LA images taken at time $t = 1$ to the SA and LA images taken at time $t = 0$. We do this by optimizing a cost function which is based on the sum of the normalized mutual information values [24]. Since normalized mutual information is a statistical measure of the relationship between the intensities in two images and makes no assumption about their functional dependence, it is robust to any intensity changes which occur over time. After registering the volume images at $t = 1$ to $t = 0$ we obtain a FFD. A second level is then added to the FFD and optimized to register the images taken at time $t = 2$ to time $t = 0$. This process continues until the SA and LA images at all time frames have been registered to the SA and LA images at end-diastole. The multi-level transformation which gives the new position of a point in the myocardium at time $t$ is given by:

$$\mathbf{T}^S_{(0,t)}(x', y', z') = \sum_{i=0}^{t-1} \mathbf{D}^S_{(i,i+1)}(x', y', z') + (x', y', z') \tag{1}$$

The actual motion fields are given by:

$$\mathbf{D}^S_{(0,t)}(x', y', z') = \mathbf{T}^S_{(0,t)}(x', y', z') - (x', y', z') \tag{2}$$

and we use the motion fields that describe the motion between two successive time frames to build our statistical motion model, which are the individual levels of the free-form deformation:

$$\mathbf{D}^S_{(i,i+1)}(x', y', z') \tag{3}$$
Using the same approach we can calculate the myocardial motion fields for all other subjects.
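As an illustration of eqs. (1)-(2), the following sketch (our addition, not from the paper) accumulates the per-interval FFD displacement fields; the `levels` callables standing in for the individual FFD levels $\mathbf{D}_{(i,i+1)}$ are a hypothetical interface.

```python
import numpy as np

def cumulative_displacement(levels, points):
    """Sketch of eqs. (1)-(2): the multi-level FFD evaluates each level
    D_(i,i+1) at the same end-diastolic coordinates and sums the results.
    `levels` is a list of callables mapping an (N, 3) array of points to
    an (N, 3) array of displacements (hypothetical interface)."""
    total = np.zeros_like(points, dtype=float)
    for D in levels:            # D_(0,1), D_(1,2), ..., D_(t-1,t)
        total += D(points)      # summand of eq. (1)
    return total                # D_(0,t) = T_(0,t)(x') - x', as in eq. (2)
```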
2.2 Transformation of Myocardial Motion Fields
Typically, the length of the cardiac cycles will vary from subject to subject. Furthermore, in prospectively gated MR imaging acquisitions, the trigger delay and temporal resolution of the acquired images will often vary from acquisition to acquisition. To compensate for the temporal misalignment resulting from these factors, we have manually determined an affine temporal mapping which aligns the end-diastolic and end-systolic time points of each subject $S$ with the corresponding time points in the reference subject $R$. Using this temporal mapping we can align and resample the motion fields from different subjects in a common temporal coordinate system. However, each motion field is still defined in its own intrinsic spatial coordinate system. To map the motion fields into a common spatial coordinate system we also need to calculate a mapping between the end-diastolic anatomy of subject $S$ and the reference subject $R$, allowing us to map the myocardial motion fields $\mathbf{D}^S_{(0,i)}$ into the coordinate system of $R$. The transformation between subjects $R$ and $S$ can be obtained using a registration of the end-diastolic images of both subjects. This yields a mapping $F_{(R,S)}$ between coordinate systems $(x, y, z)$ and $(x', y', z')$:

$$F_{(R,S)} : (x, y, z) \mapsto (x'(x, y, z),\ y'(x, y, z),\ z'(x, y, z)) \tag{4}$$

We are now in a position to transform the motion fields $\mathbf{D}^S_{(0,i)}(x', y', z')$ into the coordinate system of $R$, $(x, y, z)$. If the motion vector at a point with positional coordinate $\mathbf{x}'_0 = (x'_0, y'_0, z'_0)$ in the coordinate system of $S$ is equal to $\mathbf{d}^S$, this will transform to a vector $\tilde{\mathbf{d}}^S$ at the location $\mathbf{x}_0 = (x_0, y_0, z_0)$ in the coordinate system of $R$, where

$$\mathbf{x}_0 = F^{-1}_{(R,S)}(\mathbf{x}'_0) \tag{5}$$

In order to determine $\tilde{\mathbf{d}}^S$, consider a path $L : \mathbf{x}(\theta),\ \theta \in [0, 1]$ defined in the coordinate system of $R$ that represents the transformed motion vector, i.e.,

$$\mathbf{x}(0) = \mathbf{x}_0, \qquad \mathbf{x}(1) = F^{-1}_{(R,S)}(\mathbf{x}'_0 + \mathbf{d}^S) \tag{6}$$

By the fundamental theorem of calculus,

$$\tilde{\mathbf{d}}^S = \mathbf{x}(1) - \mathbf{x}(0) = \int_0^1 \frac{d\mathbf{x}(\theta)}{d\theta}\, d\theta \tag{7}$$

$$= \int_0^1 J^{-1}(\mathbf{x}(\theta))\, \frac{d\mathbf{x}'(\theta)}{d\theta}\, d\theta \tag{8}$$

where $\mathbf{x}'(\theta)$ is the path $L$ defined in the coordinate system of $S$ representing the untransformed motion vector $\mathbf{d}^S$, and $J(\mathbf{x}(\theta))$ is the Jacobian matrix of the transformation $F_{(R,S)}$ evaluated along $\mathbf{x}(\theta)$:

$$J = \begin{pmatrix} \partial x'/\partial x & \partial x'/\partial y & \partial x'/\partial z \\ \partial y'/\partial x & \partial y'/\partial y & \partial y'/\partial z \\ \partial z'/\partial x & \partial z'/\partial y & \partial z'/\partial z \end{pmatrix}\Bigg|_{\mathbf{x} = \mathbf{x}(\theta)} \tag{9}$$

which can be determined analytically. This integral can then be approximated by dividing the interval $[0, 1]$ into $n$ subintervals of length $\delta\theta$,

$$\tilde{\mathbf{d}}^S \approx \sum_{k=0}^{n-1} J^{-1}(\mathbf{x}^{(k)})\, (\mathbf{x}'^{(k+1)} - \mathbf{x}'^{(k)}) \tag{10}$$

where $\mathbf{x}'^{(k)} = \mathbf{x}'_0 + k\,\delta\theta\, \mathbf{d}^S$ and $\mathbf{x}^{(k)} = F^{-1}_{(R,S)}(\mathbf{x}'^{(k)})$.
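A minimal numerical sketch of eq. (10) might look as follows; `F_inv` and `J_inv` are hypothetical callables supplying the inverse inter-subject mapping and the inverse Jacobian, and are not part of the original implementation.

```python
import numpy as np

def transform_motion_vector(F_inv, J_inv, x0_prime, d_S, n=10):
    """Numerical approximation of eq. (10): transform the motion vector
    d^S, measured at x'_0 in subject coordinates, into the reference
    coordinate system. `F_inv` maps subject -> reference coordinates and
    `J_inv(x)` returns the inverse Jacobian of F_(R,S) at a reference-
    space point (both assumed interfaces)."""
    d_tilde = np.zeros(3)
    for k in range(n):
        xp_k = x0_prime + (k / n) * d_S          # x'^(k) along the path in S
        xp_k1 = x0_prime + ((k + 1) / n) * d_S   # x'^(k+1)
        x_k = F_inv(xp_k)                        # x^(k) = F^{-1}(x'^(k))
        d_tilde += J_inv(x_k) @ (xp_k1 - xp_k)   # summand of eq. (10)
    return d_tilde
```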
2.3 Principal Component Analysis of Myocardial Motion Fields
The key idea of statistical shape models [7] is to build a model of a particular class of shapes given a set of examples of this shape. In general these $n$ shapes are represented in the form of vectors, $X_1, \cdots, X_n$, and characterise the group of shapes under investigation. Each shape vector $X_i$ consists of a concatenation of $m$ landmarks, $X_i = (\mathbf{x}_1^T, \cdots, \mathbf{x}_m^T)^T$, describing the contour or surface of the anatomical structure of interest. The concept of statistical deformation models (SDMs) [13, 9] is closely related to the idea of statistical shape models; however, the key difference is that the principal component analysis (PCA) is used to analyse motion fields rather than shape landmarks. This concept has been successfully applied for modelling the anatomical variability of neurological structures across a population of subjects. In our application we apply a PCA directly to the free-form deformations describing the cardiac motion fields [21, 20]: Suppose that we have $n$ free-form deformations described as control point vectors $C_1, \cdots, C_n$. For each subject, the vector $C_i$ corresponds to a concatenation of $m_x \times m_y \times m_z$ 3-D control points, $C_i = (\mathbf{c}_1, \cdots, \mathbf{c}_m)$, describing the motion of the myocardium between two particular time frames. The goal of SDMs is to approximate the distribution of $C$ using a parameterised linear model of the form

$$C = \hat{C} + \Phi b \tag{11}$$

where $\hat{C}$ is the average control point vector (or average motion field) for all $n$ subjects,

$$\hat{C} = \frac{1}{n} \sum_{i=1}^{n} C_i \tag{12}$$

and $b$ is the model parameter vector. The columns of the matrix $\Phi$ are formed by the principal components of the covariance matrix $S$:

$$S = \frac{1}{n-1} \sum_{i=1}^{n} (C_i - \hat{C})(C_i - \hat{C})^T \tag{13}$$

From this, we can calculate the principal modes of variation of the control points (or the associated free-form deformation) as the eigenvectors $\phi_i$ and corresponding eigenvalues $\lambda_i$ (sorted so that $\lambda_i \geq \lambda_{i+1}$) of $S$. If $\Phi$ contains the $t < \min\{m, n\}$ eigenvectors corresponding to the largest non-zero eigenvalues,
we can approximate any motion field within the population group under investigation using eq. (11), where $\Phi = (\phi_1 | \phi_2 | \cdots | \phi_t)$ and $b$ is a $t$-dimensional vector given by $b = \Phi^T(C - \hat{C})$. The vector $b$ defines the parameters of the statistical motion model. By varying these parameters we can generate different instances of a free-form deformation which describes the class of motion fields under analysis using eq. (11). The motion of the myocardium throughout the cardiac cycle of each subject $S_i$ is represented by a sequence of motion fields. Since the image sequences have all been spatially and temporally aligned, the transformed motion fields of each subject $S_i$,

$$\tilde{\mathbf{D}}^{S_i}_{(0,t)}(x, y, z) = \sum_{i=0}^{t-1} \tilde{\mathbf{D}}^{S_i}_{(i,i+1)}(x, y, z) \tag{14}$$

in the coordinate system of the reference subject, $R = S_0$, will consist of the same number of motion fields. Thus, there are two different ways in which we can perform a statistical analysis of the motion fields $\tilde{\mathbf{D}}^{S_i}_{(i,i+1)}$. We can either treat the motion fields between any two time frames separately and perform a separate PCA of the motion fields for each time interval. Alternatively, we can pool all motion fields for all subjects and between all times and perform only a single PCA to build a statistical model of cardiac motion. In the following we will discuss both methods.

Building separate statistical motion models for each phase of the cardiac cycle. Between any two times we have a set of motion fields describing the motion of the heart between those two times. For example, between $t = 0$ and $t = 1$ these are given by the first levels of $\tilde{\mathbf{D}}^{S_i}_{(0,t)}(x, y, z)$ for the different subjects:

$$\tilde{\mathbf{D}}^{S_0}_{(0,1)}, \tilde{\mathbf{D}}^{S_1}_{(0,1)}, \ldots, \tilde{\mathbf{D}}^{S_n}_{(0,1)} \tag{15}$$

These motion fields are described by a set of control point vectors for each subject:

$$C^{S_0}_{(0,1)}, C^{S_1}_{(0,1)}, \ldots, C^{S_n}_{(0,1)} \tag{16}$$

We perform a PCA on these control point vectors to obtain the major modes of variation in the motion of the heart between time frames 0 and 1 (i.e. at the beginning of the contraction). These are described by a set of eigenvectors, $\Phi_{(0,1)}$, and eigenvalues, $\Lambda_{(0,1)}$. Similarly we can perform a PCA on the control point vectors describing the motion of the heart between all other successive time frames. This yields $t$ sets of eigenvectors and eigenvalues describing the major modes of variation between those successive time frames. The eigenvectors and eigenvalues obtained,

$$\Phi_{(i,i+1)}, \Lambda_{(i,i+1)} \quad \text{where } i \in \{0, 1, \ldots, t - 1\} \tag{17}$$

are then used to parameterise our statistical motion model. The advantage of this method is that by restricting the PCA to those motion fields at a particular
time instant, the tracking will use only those variations in motion specific to the time instant in question. However, to justify this approach we would have to temporally align the subject we wish to track with the motion model beforehand. Of course, we used a manual temporal alignment in order to build the model, but we cannot be certain that our affine temporal model will be sufficient to accurately align the different cardiac cycles of each subject.

Building a single statistical motion model for the entire cardiac cycle. Pooling the cardiac motion fields, $\tilde{\mathbf{D}}^{S_i}_{(i,i+1)}$, for all subjects and between all time frames and performing a PCA on these motion fields yields a single set of eigenvectors, $\Phi$, and eigenvalues, $\Lambda$, for all time frames. Our model now consists of a multi-level free-form motion with $t$ levels, but with each level being parametrised by the single set of eigenvectors $\Phi$. The model can be used to track the motion of the heart by registering the sequence of SA and LA images taken during systole to the SA and LA images taken at end-diastole by optimising the model parameter vector $b$ at each level. The advantage of this method is that it does not require a temporal alignment of the cardiac cycle of the subject whose heart motion is being tracked, but the disadvantage is that it does not take into account the variation in the motion of the heart over time. Figure 1 shows the first and second modes of motion for three different slices of the heart. The first row corresponds to the base of the heart, the second row to the mid-ventricle, and the third row to the apex of the heart.
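As an illustration of eqs. (11)-(13), a PCA over control point vectors could be sketched as below (our addition; the array layout and the SVD route to the eigenvectors are implementation choices, not taken from the paper).

```python
import numpy as np

def build_statistical_model(C, t=None):
    """PCA of the control point vectors (eqs. 11-13). `C` is an
    (n_subjects, n_params) array whose rows are flattened FFD control
    point grids for one time interval (or pooled over all intervals)."""
    C_hat = C.mean(axis=0)                 # eq. (12): mean motion field
    X = C - C_hat
    # SVD of the centred data gives the eigenvectors of the covariance
    # matrix in eq. (13) without forming it explicitly.
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    eigvals = s**2 / (C.shape[0] - 1)      # lambda_i, sorted descending
    Phi = Vt.T                             # columns are the modes phi_i
    if t is not None:                      # keep the t largest modes
        Phi, eigvals = Phi[:, :t], eigvals[:t]
    return C_hat, Phi, eigvals

# A new motion-field instance (eq. 11): C_new = C_hat + Phi @ b
# and the parameters of a given field:  b = Phi.T @ (C_new - C_hat)
```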
3 Model-Based Nonrigid Registration for Cardiac Motion Tracking
To use the statistical motion model for tracking the motion in a particular reference subject $R$ we need to construct the statistical motion model in the coordinate system of that subject. In our current implementation we are using an affine transformation to model the mapping $F_{(R,S)}$. As a result of this, $J^{-1}(\mathbf{x}^{(k)})$ will be constant and the expression in eq. (10) reduces to

$$\tilde{\mathbf{d}}^S \approx J^{-1}\mathbf{d}^S \tag{18}$$

Applying this technique to each of the motion fields of the subject gives us a set of transformed motion fields $\tilde{\mathbf{D}}^S_{(0,i)}(x, y, z)$ from which we calculate the statistical motion model as described in the previous section. Using this statistical motion model we can reparameterize the free-form deformation model used for the motion tracking in section 2.1 via the modes of variation learned from the motion fields:

$$\mathbf{D}^S_{(0,t)}(x', y', z') = \sum_{l=0}^{3} \sum_{m=0}^{3} \sum_{n=0}^{3} B_l(u) B_m(v) B_n(w) (\hat{C} + \Phi b)_{i+l,\, j+m,\, k+n} \tag{19}$$

where the $B_i$ are the cubic B-spline functions. Here, the control points of the FFD are represented as a linear combination of the principal modes of variation.
Fig. 1. First and second modes of variation for base, mid-ventricular, and apical SA slices: (a) first mode; (b) second mode.
Rather than optimising the location of the control points, one can optimise the parameter vector b which controls the modes of the FFD but provides a much smaller number of degrees of freedom than the number of control points. This can significantly reduce the number of degrees of freedom for the motion tracking and the associated computational complexity, while constraining the registration to statistically likely types of motion.
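A sketch of how eq. (19) might be evaluated at a single point is given below (our addition); the `grid_shape` storage layout for the control point vector is an assumption.

```python
import numpy as np

def bspline_basis(u):
    """Cubic B-spline basis functions B_0..B_3 at local coordinate u in [0, 1)."""
    return np.array([
        (1 - u) ** 3 / 6.0,
        (3 * u**3 - 6 * u**2 + 4) / 6.0,
        (-3 * u**3 + 3 * u**2 + 3 * u + 1) / 6.0,
        u**3 / 6.0,
    ])

def model_displacement(C_hat, Phi, b, grid_shape, i, j, k, u, v, w):
    """Evaluate eq. (19): the displacement at a point whose containing
    control-point cell has corner index (i, j, k) and local coordinates
    (u, v, w). `grid_shape` = (nx, ny, nz, 3) is an assumed layout for
    the flattened control point vector."""
    C = (C_hat + Phi @ b).reshape(grid_shape)   # mode parameters -> control points
    Bu, Bv, Bw = bspline_basis(u), bspline_basis(v), bspline_basis(w)
    disp = np.zeros(3)
    for l in range(4):
        for m in range(4):
            for n in range(4):
                disp += Bu[l] * Bv[m] * Bw[n] * C[i + l, j + m, k + n]
    return disp
```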
4 Results
Tagged MR data from 17 healthy volunteers were acquired with a Siemens Sonata 1.5 T scanner and a Philips Gyroscan Intera 1.5 T scanner, consisting of a series of SA and LA slices. A cine breath-hold sequence with a SPAMM tag pattern was used, with imaging performed at end expiration. The image voxel sizes were typically 1.40 × 1.40 × 7 mm, with the distance between slices being 10 mm. We tracked the movement of the myocardium in all the volunteers and transformed the motion fields obtained into a common coordinate system as described in sections 2.1 and 2.2. The motion tracking algorithm [6] has been evaluated on a
number of subjects and the error in tracking has been found to be below 2 mm for most parts of the cardiac cycle.

Fig. 2. Variation of the r.m.s. error between the estimated and true displacements over time for the two different motion models: (a) model A; (b) model B.

Two separate statistical models of the motion of the heart were then built. In the first model (section 2.3), each level in the multi-level free-form transformation used for tracking the heart was parametrised by a separate set of eigenvectors describing the major modes of variation in the motion of the heart at the corresponding time frame. We refer to this as motion model A. In the second model (section 2.3), each level in the multi-level free-form transformation used for tracking the heart was parametrised by a single set of eigenvectors describing the major modes of variation in the motion of the heart over the entire cardiac cycle. We refer to this as motion model B. To validate the quality of the statistical motion models, we have used models A and B to track the motion of the heart in 8 healthy volunteers for all time frames between end-diastole and end-systole. For this purpose we have constructed both motion models without using the motion fields of the volunteer hearts which were tracked. To compare how well the motion tracking had been performed using the two models, an observer manually tracked the motion of tag intersection points in a mid-ventricular SA slice for the 8 volunteers. The estimated displacements of the tag intersection points obtained from our statistical motion models were then compared with the true displacements measured by the human observer. The results of the motion tracking for models A and B are shown in figures 2(a) and 2(b), respectively, for the different volunteers. Here we have plotted the r.m.s. error in the displacement of tag intersection points as a function of time. The r.m.s. tracking error was below 2 mm between end-diastole and end-systole for most of the volunteers. As can be seen in the figure, both models have performed very well in the motion tracking. Figures 3 and 4 show the performance of the motion tracking visually for a volunteer using model A in a short- and long-axis slice. Here a virtual tag grid has been aligned with the tag pattern in the images at end-diastole and has been
deformed according to the motion fields derived by the nonrigid registration. As can be seen, the tracking has been performed very well since the virtual tag grid follows the actual tag pattern in the images.

Fig. 3. Virtual tag grid showing motion tracking for a mid-ventricular short-axis slice in volunteer 1 using model A.

Fig. 4. Virtual tag grid showing motion tracking for a long-axis slice in volunteer 1 using model A.
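For reference, the r.m.s. error plotted in Fig. 2 can be computed as in this short sketch (our addition; the (N, 2) layout of in-plane tag-intersection displacements is an assumption).

```python
import numpy as np

def rms_tracking_error(estimated, manual):
    """R.m.s. error between model-estimated and manually tracked tag
    intersection displacements at one timeframe; both arrays hold (N, 2)
    in-plane displacements in mm."""
    return float(np.sqrt(np.mean(np.sum((estimated - manual) ** 2, axis=1))))
```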
5 Discussion and Conclusions
In this paper we have introduced a new technique for tracking the movement of the myocardium using a statistical model derived from the motion fields in the hearts of several healthy volunteers. To build the statistical model we have tracked the motion of the myocardium in tagged MR images of 17 volunteers using a nonrigid registration technique based on free-form deformations and mapped the motion fields obtained into a common reference coordinate system. A principal component analysis (PCA) was then performed on the motion fields to extract the major modes of variation in the fields between successive time frames. The modes of variation obtained were then used to parametrise the free-form deformations. This can significantly reduce the number of degrees of freedom for the motion tracking problem and the associated computational complexity, while constraining the motion tracking to statistically likely types of motion.

The approach presented in this paper uses cardiac motion fields derived from a group of normal subjects. While the results have shown that the statistics of the motion fields of normal subjects can be well approximated by such a statistical motion model, a potential problem arises from the fact that such a motion model may not be appropriate for tracking motion in subjects with localised abnormal motion patterns. We are currently applying our motion tracking algorithm to a group of patients with localised myocardial infarction to test the applicability of the proposed method to these subjects. It should also be noted that it is possible to recover localised abnormal motion patterns by using the proposed statistical motion model-based non-rigid registration followed by the traditional non-rigid registration in which control points can move freely. This would allow the recovery of motion patterns which cannot be represented by the statistical model. Finally, our current work is focusing on the use of the statistical motion model for the comparison of cardiac motion differences between groups of subjects. Here, the key idea is to use the parameters of the statistical motion model for a classification of cardiac motion across different groups, i.e. normal subjects and subjects with hypertrophic heart disease. We are investigating the use of linear discriminant analysis to isolate the main modes of variation of cardiac motion: (a) changes due to differences in the motion pattern across subjects (inter-subject variation) and (b) changes due to the heart being in various stages of the contraction throughout the cardiac cycle (intra-subject variation).
References

1. A. A. Amini, Y. Chen, R. W. Curwen, V. Mani, and J. Sun. Coupled B-snake grids and constrained thin-plate splines for analysis of 2D tissue deformations from tagged MRI. IEEE Transactions on Medical Imaging, 17(3):344-356, June 1998.
2. A. A. Amini, R. W. Curwen, and J. C. Gore. Snakes and splines for tracking nonrigid heart motion. In Bernard Buxton and Roberto Cipolla, editors, Proceedings of the Fourth European Conference on Computer Vision, volume 1065 of Lecture Notes in Computer Science, pages 251-261, Cambridge, UK, April 1996. Springer.
3. L. Axel and L. Dougherty. Heart wall motion: Improved method of spatial modulation of magnetization for MR imaging. Radiology, 172(2):349-360, 1989.
4. L. Axel and L. Dougherty. MR imaging of motion with spatial modulation of magnetization. Radiology, 171(3):841-845, 1989.
5. R. Chandrashekara, R. H. Mohiaddin, and D. Rueckert. Analysis of 3D myocardial motion in tagged MR images using nonrigid image registration. IEEE Transactions on Medical Imaging, Submitted.
6. R. Chandrashekara, R. H. Mohiaddin, and D. Rueckert. Analysis of myocardial motion in tagged MR images using nonrigid image registration. In Proceedings of the SPIE International Symposium on Medical Imaging, pages 1168-1179, San Diego, California, USA, 24-28 February 2002. SPIE.
7. T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham. Active shape models - their training and application. Computer Vision and Image Understanding, 61(1):38-59, 1995.
8. L. Dougherty, J. C. Asmuth, A. S. Blom, L. Axel, and R. Kumar. Validation of an optical flow method for tag displacement estimation. IEEE Transactions on Medical Imaging, 18(4):359-363, April 1999.
9. J. C. Gee and R. K. Bajcsy. Elastic matching: Continuum mechanical and probabilistic analysis. In A. W. Toga, editor, Brain Warping, pages 183-197. Academic Press, 1999.
10. O. Gérard, A. C. Billon, J. Rouet, M. Jacob, M. Fradkin, and C. Allouche. Efficient model-based quantification of left ventricular function in 3-D echocardiography. IEEE Transactions on Medical Imaging, 21(9), September 2002.
11. S. N. Gupta and J. L. Prince. On variable brightness optical flow for tagged MRI. In Information Processing in Medical Imaging, pages 323-334, June 1995.
12. J. Huang, D. Abendschein, V. G. Dávila-Román, and A. A. Amini. Spatio-temporal tracking of myocardial deformations with a 4-D B-spline model from tagged MRI. IEEE Transactions on Medical Imaging, 18(10):957-972, October 1999.
13. S. C. Joshi. Large Deformation Diffeomorphisms and Gaussian Random Fields for Statistical Characterization of Brain Sub-Manifolds. PhD thesis, Washington University, 1998.
14. N. F. Osman, W. S. Kerwin, E. R. McVeigh, and J. L. Prince. Cardiac motion tracking using cine harmonic phase (HARP) magnetic resonance imaging. Magnetic Resonance in Medicine, 42:1048-1060, 1999.
15. N. F. Osman, E. R. McVeigh, and J. L. Prince. Imaging heart motion using harmonic phase MRI. IEEE Transactions on Medical Imaging, 19(3):186-202, March 2000.
16. J. Park, D. Metaxas, and L. Axel. Analysis of left ventricular wall motion based on volumetric deformable models and MRI-SPAMM. Medical Image Analysis, 1(1):53-71, 1996.
17. J. Park, D. Metaxas, A. A. Young, and L. Axel. Deformable models with parameter functions for cardiac motion analysis from tagged MRI data. IEEE Transactions on Medical Imaging, 15(3):278-289, June 1996.
18. J. L. Prince and E. R. McVeigh. Motion estimation from tagged MR images. IEEE Transactions on Medical Imaging, 11(2):238-249, June 1992.
19. A. Rao, G. I. Sanchez-Ortiz, R. Chandrashekara, M. Lorenzo-Valdes, R. Mohiaddin, and D. Rueckert. Comparison of cardiac motion across subjects using non-rigid registration. In MICCAI 2002, pages 722-729. Springer, 2002.
20. D. Rueckert, A. F. Frangi, and J. A. Schnabel. Automatic construction of 3D statistical deformation models of the brain using non-rigid registration. IEEE Transactions on Medical Imaging, In Press.
21. D. Rueckert, A. F. Frangi, and J. A. Schnabel. Automatic construction of 3D statistical deformation models using non-rigid registration. In Wiro J. Niessen and Max A. Viergever, editors, Proceedings of the Fourth International Conference on Medical Image Computing and Computer Assisted Intervention, pages 77-84, Utrecht, The Netherlands, October 2001. Springer.
22. D. Rueckert, L. I. Sonoda, C. Hayes, D. L. G. Hill, M. O. Leach, and D. J. Hawkes. Nonrigid registration using free-form deformations: Application to breast MR images. IEEE Transactions on Medical Imaging, 18(8):712-721, August 1999.
23. J. A. Schnabel, D. Rueckert, M. Quist, J. M. Blackall, A. D. Castellano-Smith, T. Hartkens, G. P. Penney, W. A. Hall, H. Liu, C. L. Truwit, F. A. Gerritsen, D. L. G. Hill, and D. J. Hawkes. A generic framework for non-rigid registration based on non-uniform multi-level free-form deformations. In Wiro J. Niessen and Max A. Viergever, editors, Proceedings of the Fourth International Conference on Medical Image Computing and Computer Assisted Intervention, pages 573-581, Utrecht, The Netherlands, October 2001. Springer.
24. C. Studholme, D. L. G. Hill, and D. J. Hawkes. An overlap invariant entropy measure of 3D medical image alignment. Pattern Recognition, 32(1):71-86, 1998.
25. E. A. Zerhouni, D. M. Parish, W. J. Rogers, A. Yang, and E. P. Shapiro. Human heart: Tagging with MR imaging - a method for noninvasive assessment of myocardial motion. Radiology, 169(1):59-63, 1988.
Fast Tracking of Cardiac Motion Using 3D-HARP

Li Pan¹, Joao A.C. Lima², and Nael F. Osman³

¹ Department of Biomedical Engineering, Johns Hopkins School of Medicine, Baltimore, MD, USA. [email protected]
² Division of Cardiology, Johns Hopkins School of Medicine, Baltimore, MD, USA
³ Department of Radiology, Johns Hopkins School of Medicine, Baltimore, MD, USA
Abstract. MR tagging is considered a valuable technique for evaluating regional myocardial function quantitatively and noninvasively; however, the cumbersome and time-consuming post-processing procedures for cardiac motion tracking still hinder its application to routine clinical examination. We present a fast and semiautomatic method for tracking 3D cardiac motion from short-axis (SA) and long-axis (LA) tagged MRI images. The technique, called 3D-HARP (HARmonic Phase), is based on the HARP¹ method and extends this method to track 3D motion. A material mesh model is built to represent a collection of material points inside the left ventricle (LV) wall. The phase time-invariance property of material points is used to track the mesh points. For a series of 9-timeframe MRI images, the total time required for initializing settings, building the mesh, and tracking 3D cardiac motion is approximately 10 minutes. Further analysis of Lagrangian strain and twist angle demonstrates that, during systole, the lateral LV wall shows greater strain values than the septum and the SA slices from the base to the apex show a gradual change in twist pattern.

¹ Nael F. Osman is a founder of and owns stock in Diagnosoft, Inc., a company that seeks to license the HARP technology. The terms of this arrangement have been disclosed to the Johns Hopkins University in accordance with its conflict of interest policies.
1 Introduction
Although ventricular mass, volume, and ejection fraction (EF) are considered the standard for evaluating global cardiac function, extensive research has shown that regional function measures, such as wall thickening, strain, and torsion, may be earlier sub-clinical markers of LV dysfunction and myocardial diseases. For example, the literature shows that, in patients who develop LV dilation or hypertrophy, regional function may be depressed while global function is still preserved [1]. To evaluate regional myocardial function, quantification of the deformation of regional myocardial tissue is invaluable. Historically, markers have been implanted into the myocardium and the motion of these markers was subsequently traced by imaging. However, this method is limited by its invasive nature and its tendency to influence the regional motion pattern of the wall muscle. MR tagging was proposed by Zerhouni et al. in 1988 [2] and is considered an important breakthrough in the field of
cardiac MRI. MR tagging provides noninvasive markers inside the myocardium and these markers can deform with the motion of the myocardium. The analysis of the motion of tags made it possible to quantitatively evaluate regional myocardial function noninvasively. Since then, much effort has been made to develop different techniques for tagged image analysis, in order to eventually bring MR tagging into routine clinical examination. In general, tagged MR images are analyzed in two steps. The first step requires the identification and tracking of tags in the MR images, which involves the segmentation of the LV wall by identifying its outer and inner contours. The second step involves the reconstruction of 3D cardiac motion and the computation of motion-related variables, such as strain, displacement field, and torsion [3]. Unfortunately, the first step in most cases is very time-consuming and relies heavily on user interaction to guarantee proper segmentation and tag tracking. For example, Guttman et al. [4] proposed a "template matching" method to detect tags and an active contour method or "Snake" to extract contours. A software package called Findtags [5] was developed, but requires several hours to finish the processing of one dataset. Denney [6] used an ML/MAP (maximum-likelihood/maximum a posteriori) method to estimate and detect the tags without user-defined contours and thus compute the LV strain. The total processing time was still around an hour. To reconstruct 3D cardiac motion, many 3D-model-based approaches have been developed [7,8]. Huang et al. [9] proposed a 4-D B-spline model to perform spatio-temporal motion tracking. Young [10] introduced the concept of "model tags" and used a finite element (FE) model-based method that could reconstruct 3D heart wall motion without the prior identification of ventricular boundaries and tag locations. Kerwin and Prince [11] used a deformation model to track the intersection points of tag surfaces or "MR markers". However, in most cases, the computational complexity embedded in these models slows processing. A recent technique, called the harmonic phase (HARP) method, has been developed by Osman et al. [12,13]. Rather than working on the intensity image, HARP concentrates on the Fourier transform of the tagged image. It has been noticed that the change in tag patterns due to myocardial deformation in the spatial domain can produce corresponding phase shifts and redistribution of the peak energy in the Fourier domain. By using bandpass filters to isolate some spectral peaks, a complex image is produced. The phase of this image is called the harmonic phase and it is directly related to the motion of the tag lines. The advantage of the HARP method is that the local information of the phase is obtained by simple filtering, without the need for segmentation to isolate the heart. Hence, the computation of motion from the phase is direct and rapid, without prior segmentation that would require human interaction—the main source of delay. The HARP method developed to date is restricted to tracking 2D motion in an image plane. This is certainly a limitation considering the actual 3D motion of the heart. Based on the harmonic phase, Haber and Westin [14] proposed a phase-driven FE model that can be deformed by phase-based displacement estimates. However, this method does not provide a mechanism by which to determine whether the deformation is correct.
In this paper, we will present a method that extends the HARP method to 3D while maintaining the primary gains of HARP, which include the tremendous reduction in processing time and the minimization of human interaction. A phase time-invariance condition is used to provide a final target to guarantee a correct deformation.
2 Theory

2.1 Basic Ideas of HARP
HARP analysis concentrates on the Fourier transform of tagged MR images. Fig. 1 shows a horizontally tagged MR image (a) and its Fourier transform (b). One DC peak, shown at the center, and two harmonic spectral peaks can be seen in (b). A bandpass filter, shown as a circle, is applied to isolate one spectral peak. The inverse Fourier transform of the isolated region produces a complex image, which is given by:

$$I_k(\mathbf{y}, t) = D_k(\mathbf{y}, t)\, e^{j\phi_k(\mathbf{y}, t)}, \tag{1}$$

where $D_k$ is the harmonic magnitude image and $\phi_k$ is the harmonic phase (HARP) image. The harmonic magnitude image reflects the geometrical change of the heart and the image intensity change caused by fading, while the HARP image reflects the motion of the myocardium in the direction orthogonal to the tags. In reality, only a "wrapped" value of the harmonic phase in the range $[-\pi, +\pi)$ can be obtained, which is shown as:

$$a_k(\mathbf{y}, t) = W(\phi_k(\mathbf{y}, t)), \tag{2}$$
where $W$ is a nonlinear wrapping function and $a_k$ is the HARP angle image. Note that the HARP angle image, $a_k$, is the wrapped HARP image. The harmonic magnitude image and the HARP angle image are shown as (c) and (d), respectively, in Fig. 1. There are three main properties of the HARP image. First, the locations of the wrapping transitions on the HARP angle image closely coincide with the locations of the tag lines, both reflecting myocardial motion orthogonal to the tags. Second, the phase values on the HARP image are relatively dense, which carries more information than merely the locations of the tag lines. Finally, HARP values are time-invariant, i.e., the phase value of a material point is fixed from the tagging instant and does not change as long as the tags have not significantly faded. It is the last property that is primarily used for tracking the motion of the tags.
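As an illustration of eqs. (1)-(2), the HARP extraction could be sketched as follows (our addition, not the authors' implementation); the peak location and bandpass radius are assumed known from the tag spacing, and `np.angle` wraps to (−π, π] rather than [−π, +π).

```python
import numpy as np

def harp_images(tagged, peak_offset, radius):
    """Sketch of eqs. (1)-(2): bandpass-filter one harmonic spectral peak
    of a tagged image and return the harmonic magnitude image D_k and the
    wrapped HARP angle image a_k. `peak_offset` is the (row, col) offset
    of the peak from the DC term (assumed known)."""
    F = np.fft.fftshift(np.fft.fft2(tagged))
    rows, cols = np.indices(tagged.shape)
    c0 = np.array(tagged.shape) // 2 + np.asarray(peak_offset)
    mask = (rows - c0[0]) ** 2 + (cols - c0[1]) ** 2 <= radius**2  # bandpass
    Ik = np.fft.ifft2(np.fft.ifftshift(F * mask))   # complex image I_k
    return np.abs(Ik), np.angle(Ik)                 # D_k and a_k = W(phi_k)
```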
2.2 Phase Time-Invariance and Motion Tracking
The basic concept of HARP tracking is that the phase value of any material point is time-invariant. Hence, the amount of motion of local tag patterns in the spatial domain will produce a proportional phase shift on the HARP image at the same location. Therefore, the HARP angle of each point on the HARP image represents a material property of the tagged tissue located at that point. The mathematical description is as follows: considering a material point located at $\mathbf{y}_n$ at time $t_n$, if $\mathbf{y}_{n+1}$ is the position of this point at time $t_{n+1}$, then the phase time-invariance condition is

$$\phi(\mathbf{y}_{n+1}, t_{n+1}) = \phi(\mathbf{y}_n, t_n). \tag{3}$$

Fig. 1. a. An MR image with horizontal tags. b. The magnitude of the Fourier transform of the tagged MR image. The circle indicates a bandpass filter, which is applied to extract the spectral peak. c. Harmonic magnitude image. d. HARP angle image.
In the case of 2D HARP, two tag orientations, and hence two HARP images, are required to track 2D motion. The concept of phase time-invariance has been implemented for 2D cardiac motion tracking, in which case $\mathbf{y}$ is the 2D apparent position of the point [12]. In this paper, we will use the same concept but extend it to track 3D motion. In the case of 3D-HARP, the mathematical description in Eq. (3) is the same, but rather than the apparent 2D position, $\mathbf{y}$ is the actual 3D position of the material point. However, the extension to 3D is not straightforward, and some issues need to be addressed. First, three tag directions are required to track 3D motion, rather than only two tag orientations as in 2D HARP. Second, the heart is sparsely sampled by SA and LA image planes, which means that the phase values can only be obtained on the image planes, not at an arbitrary point in the volume of the heart. Finally, there are usually only partial sets of tags defined on the image planes. The SA images, with two orthogonal tagging directions, can generate a 2D HARP image and thus provide information about the 2D motion in the SA planes. For the third dimension, another set of LA images is acquired with one transverse tagging direction to describe the motion along the longitudinal direction. Although it seems that all three-dimensional motion information can be obtained with this method, a problem still exists. Since the SA and LA image planes are sparse, only those intersection points lying both on the SA image planes and the LA image planes can obtain 3D motion information from both sets of planes. Points on only an SA image plane lack the motion information orthogonal to that plane, and points on only an LA image plane lack even the two-dimensional motion information from the SA image planes. To solve these two problems, a material mesh model was used.
2.3 The Material Mesh Model
Basically, a material mesh is a collection of material points inside the myocardium that deforms with the motion of the heart. Mathematically, a mesh at timeframe $n$ can be written as $M_n$, which is the set of the locations of material points at that timeframe. The material mesh has two properties. First, for every material point on the mesh, i.e., $m \in M_n$, the phase value is set by the tagging sequence and will keep the same value through all timeframes (phase time-invariance). Therefore, the phase values of a material point follow

$$\psi(m, t_n) = \psi(m, t_0), \tag{4}$$
for all timeframes $t_n$, where $t_0$ is the reference time when the tags were imposed. $\psi(m, t_0)$ is considered the initial phase. Notice that the phase values, $\psi(m, t_n)$, of a material point form a 3D vector representing the phase in three linearly independent directions. The phase value, therefore, is a material property of the tissue as long as the tags do not fade drastically. The second property of the mesh is that it has certain mechanical properties, described as smoothness and elasticity. This means that the material points on a mesh are linked together by a number of constraints based on these mechanical properties. Similar to the 2D HARP method, phase time-invariance is used to track motion. However, in the case of 3D, we have only a limited number of observations of the phase during motion. Mainly, the observations consist of the HARP images obtained from the short- and long-axis image planes. The condition, therefore, can be applied only to the intersection points of the mesh and the image planes. Mathematically, let $m \in M_n$ be a material point on the mesh which is also located on an image plane (either LA or SA) at timeframe $n$. The material point, therefore, will have a location $\mathbf{y}$ in the local coordinate system of the image plane. The phase time-invariance condition then indicates that

$$\psi(m, t_n) = \phi(\mathbf{y}, t_n), \tag{5}$$

where $\psi(m, t_n)$ is the phase value of the material point on the mesh and $\phi(\mathbf{y}, t_n)$ is the observed phase value from the image plane at timeframe $n$. Combining equations (4) and (5), we get

$$\psi(m, t_0) = \phi(\mathbf{y}, t_n), \tag{6}$$
which shows that, for the intersection point, the initial phase value of the material point on the mesh should be the same as the observed phase value obtained from the image planes at timeframe $n$. Our goal is to determine the mesh shape at different timeframes so that the phase time-invariance condition described is always satisfied. The material mesh is built at the reference time when the tagging sequence is applied, and the material properties of the mesh, specifically the harmonic phase values, are also initialized at that time. This is mainly because, at the reference time, the tag patterns in the 3D volume are known and the phase values can be obtained for any material point on the mesh.
However, when the tags deform at later timeframes, including even the first timeframe, the tag patterns become unknown. Thus, the corresponding HARP values are unknown except on the image planes. It is expected then that the material mesh will not satisfy the phase time-invariance condition and a discrepancy between $\phi(\mathbf{y}, t_n)$ and $\psi(m, t_n)$ will exist. 2D HARP will be used to find the apparent 2D motion of each intersection point on the image planes (SA and LA) and form a displacement vector. This displacement vector is considered as an impulse force applied on the intersection point. Thus, the phase differences between the initial phase and the observed phase of all the intersection points will generate a displacement field of forces to deform the mesh. In other words, we expect to have an impulse force at every intersection point that will deform the mesh. The internal mechanical properties of the mesh are then used to determine the new deformation of the material mesh. The phase time-invariance condition is used to guarantee the accuracy of the final deformation, which provides a unique distinction from what Haber and Westin [14] proposed. Since most likely the condition will not be satisfied immediately, the previous steps of finding forces at the intersection points and deforming the entire mesh are iterated until the phase time-invariance condition is satisfied at all the intersection points. Then, we can proceed to the next timeframe.
3 Methods

3.1 The Algorithm
From what we described in the previous section, a flow chart can be drawn to show the underlying algorithm, as shown in Fig. 2. The algorithm consists of the steps described below.

3.2 Initializing the Material Mesh at the Reference Time
As stated before, the material mesh is a collection of material points and is built inside the left ventricle wall at the reference time. Since images at the reference time are not available, we have chosen to build the material mesh at the first timeframe. The heart is assumed to have little motion between the reference time and the first timeframe, so that the mesh built at the first timeframe is guaranteed to be inside the myocardium at the reference time as well. Thus, this mesh can be considered the initial mesh at the reference time. We designed the mesh to resemble a half elliptical sphere whose phase values exist only on latitudinal circles and longitudinal lines (see Fig. 3). The latitude circles are located on SA image planes and the longitude lines are located on LA image planes. The pole point is where the apex is. The mesh is built in three steps. First, the markers of the base plane and the apex point are set. Second, longitudinal lines are drawn manually on LA images. Finally, the latitudinal circles are obtained by interpolating the intersection points of the longitude lines and the SA planes.
Fig. 2. Flow chart of the algorithm of the material mesh model: build a 3D material mesh at the reference time; use 2D HARP to find the in-plane motion; deform the mesh by the displacement field and the internal forces; find the intersection points of the deformed mesh and image planes; check the phase time-invariance condition and iterate until convergence.
Fig. 3. A material mesh built at the reference time. $S_i$ is an SA image plane and $L_j$ is an LA image plane. $M$ is the material mesh.
In order to determine the initial phases of the material points, the tag patterns or the initial harmonic phase patterns applied at the reference time must be estimated. Since the tags are exactly rectilinear when they are applied (and before deformation), it is possible to extend the phase values estimated from the phase images into a 3D volume. However, the estimation cannot be performed from the myocardial tissue because, between the reference time when the tagging is applied and the time when the first timeframe image is acquired, there is a short delay; thus, the tag patterns inside the myocardium will have changed due to small motion. Therefore, we used the phase of the static tissue to estimate the 3D phase pattern at the reference time, since static tissue will produce almost no motion between these two time points.

3.3 Determining the In-plane Motion of Intersection Points Using 2D HARP
Let us consider that the mesh has deformed to a certain shape, $M_n$, at timeframe $n$. We are interested then in finding the shape of the mesh, $M_{n+1}$, at the next timeframe, $n + 1$. First, we assume the initial shape of the mesh at timeframe $n + 1$, i.e., $M_{n+1}^{(0)} = M_n$, to be the same as that of timeframe $n$. This mesh, $M_{n+1}^{(0)}$, will then intersect with the image planes at timeframe $n + 1$: the longitude lines will intersect with SA image planes and the latitude circles will intersect with LA image planes. The intersection points are determined by finding the zero-crossing points of a distance function. Specifically, to find the points where a latitude circle intersects an LA image plane, we compute the distance of all the mesh points to the image plane. A distance vs. point index function is formed, and the zero-crossing point of this curve is where the intersection point is. Notice that a latitude circle will intersect an LA image plane twice, thus producing two zero-crossing points, while a longitude line will intersect an SA image plane only once, thus producing only one zero-crossing point. As shown in Eq. (6), the phase time-invariance condition should be satisfied at all intersection points. The initial phase value, $\psi(m, t_0)$, of the material point can be obtained from the location of the point on the mesh at the reference time. Since the mesh is discretized into a collection of mesh points, it is possible that the intersection point is located between two adjacent mesh points. Assuming that the two adjacent mesh points are close enough, linear mapping is used to locate the relative position of the intersection point on the initial mesh, thus obtaining the initial phase value of this point. On the other hand, the observed phase value from the image planes can be obtained from the corresponding HARP image. It should be noted that a 2D phase value ($\phi_x$ and $\phi_y$) can be obtained for the intersection point of a longitude line and an SA image plane, whereas only a 1D phase value ($\phi_z$) can be obtained for the intersection point of a latitude circle and an LA image plane. The difference between the two phase values, $\psi$ and $\phi$, is used to find the in-plane apparent motion by 2D HARP. For an intersection point on an SA plane, we can use the 2D HARP tracking method to search around the intersection point on the image plane, and thus find the closest point with the same 2D phase value as the initial phase value of the intersection point. As for an intersection point on an LA plane, since only a 1D phase value is available, an extra step is needed to use 2D HARP tracking. We need to create a synthetic HARP image that simulates straight vertical tag lines applied on the LA images. In this sense, the 2D HARP tracking is restricted to 1D, i.e., the new point found is either above or below the intersection point on the LA image. In both cases, the displacement difference between the new point and the intersection point describes the in-plane apparent motion of the intersection point. It generates a displacement vector on the image plane, which we consider an in-plane force applied on the intersection point. The forces generated on all the intersection points form a displacement field of forces to deform the entire mesh. It should be noted that the forces generated on the longitudinal lines lie in the SA planes, and these forces relate to the concentric contraction of the ventricle and its twisting, while the forces generated on the latitudinal circles lie in the LA planes, and they relate to the longitudinal compression of the ventricle—the motion of the base toward the apex.
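A minimal sketch of the zero-crossing search described above (our addition; the plane is represented by a point and unit normal, an assumed parameterization, and for a closed latitude circle the wrap-around segment would also need to be checked):

```python
import numpy as np

def plane_intersections(curve_pts, plane_pt, plane_normal):
    """Locate where an ordered (N, 3) array of mesh points along a
    latitude circle or longitude line crosses an image plane, via
    zero-crossings of the signed distance function."""
    d = (curve_pts - plane_pt) @ plane_normal      # signed distance per point
    sign_change = np.where(d[:-1] * d[1:] < 0)[0]  # segments with a zero-crossing
    crossings = []
    for i in sign_change:
        t = d[i] / (d[i] - d[i + 1])               # linear interpolation
        crossings.append(curve_pts[i] + t * (curve_pts[i + 1] - curve_pts[i]))
    return np.asarray(crossings)
```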
3.4 Deforming the Mesh Using the Displacement Field and the Internal Force
The displacements generated by 2D HARP behave like forces applied to the intersection points to deform the mesh. In order to translate these point forces into a 3D deformation of the material mesh, we assume some mechanical properties for the mesh, represented by smoothness and elasticity. The forces generated by 2D HARP deform the mesh while the mechanical properties maintain the mesh shape. Because of these mechanical properties, the impulse forces applied at the points of intersection spread, depending on the smoothness and elasticity of the mesh, to neighboring points. This implementation requires an impulse response, which is realized using a weighting mask. The size of the mask depends on the elasticity and stiffness of the mesh, which affect the area through which the force spreads. The weights of the mask determine the influence of the impulse force on the neighboring points. Therefore, under the 3D force field, the mesh deforms into a new shape, $M_{n+1}^{(1)}$.
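Since the paper leaves the weighting mask unspecified, the following sketch uses a Gaussian mask as one plausible choice (our addition; the (n_lat, n_lon) mesh indexing and the wrap-around of longitude indices are assumptions):

```python
import numpy as np

def spread_impulse_forces(mesh_shape, points, forces, sigma=1.5, half=3):
    """Distribute impulse forces at intersection points over neighboring
    mesh nodes with a Gaussian weighting mask. `mesh_shape` is
    (n_lat, n_lon); longitude indices wrap around the latitude circles."""
    ax = np.arange(-half, half + 1)
    gi, gj = np.meshgrid(ax, ax, indexing="ij")
    mask = np.exp(-(gi**2 + gj**2) / (2 * sigma**2))   # impulse response
    field = np.zeros(mesh_shape + (3,))
    for (i, j), f in zip(points, forces):
        for di in ax:
            ii = i + di
            if not (0 <= ii < mesh_shape[0]):
                continue                               # clip at base/apex rows
            for dj in ax:
                jj = (j + dj) % mesh_shape[1]          # wrap longitudinally
                field[ii, jj] += mask[di + half, dj + half] * np.asarray(f)
    return field
```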
3.5 Checking the Phase Time-Invariance Condition
The new mesh shape, $M_{n+1}^{(1)}$, will intersect with the image planes and new intersection points are obtained. The phase time-invariance condition is checked again at the points of intersection to determine whether it is satisfied. If the condition is not satisfied, the previous steps are repeated and a sequence of meshes ($M_{n+1}^{(2)}$, $M_{n+1}^{(3)}$, etc.) is generated until the phase time-invariance condition is adequately satisfied. The iterations then stop and the most recent deformation is considered the new shape of the mesh at that timeframe. It is then possible to proceed to the next timeframe and repeat the steps. It is important to stress that the checking of the phase time-invariance condition makes this method unique, since this condition-checking step ensures that the deformation is correct. The issue we must consider is whether the iteration converges, and how fast. The experiments show that this algorithm converges in fewer than 10 iterations per timeframe. Our future work will concentrate on studying the necessary and sufficient conditions to guarantee the convergence of this technique.
4 Results and Discussion
To demonstrate the ability of this method to track the 3D motion of the heart, two SA sets (6 slices with horizontal and vertical tag lines) and one LA set (6 slices with transverse tag lines) of tagged images from a normal volunteer were analyzed using 3D-HARP. The images were obtained on a GE 1.5 T Signa scanner (GE Medical Systems, Milwaukee) using a SPAMM sequence for tagging. The data sets consisted of sequences of 9 timeframes with a time separation of 33 ms acquired during systole. The 3D-HARP algorithm was implemented using MATLAB (Mathworks, Inc.) in a software program with a graphical user interface. Following the steps of the algorithm described previously, the initial 3D mesh was manually created at only the first timeframe (end-diastole) inside the myocardium—this is the only step that requires human interaction. The subsequent steps of mesh point tracking were totally automatic. The deformation of the mesh at each timeframe converged to satisfy the phase time-invariance condition in fewer than 10 iterations per timeframe. The total time for building and tracking the mesh was about 8 minutes. Fig. 4 shows the deformation of the material mesh at different stages during systole, beginning with the initial mesh at the reference time, built manually at end-diastole. As can be seen, the tracked mesh shows the typical mechanics of the normal left ventricle during systole. Notice the compression of the heart along the long axis of the ventricle caused by the motion of the base toward the apex. Notice also the gradual change in twist pattern of the left ventricle due to the different rotation angles between the slices. The twist angle of a specific slice is defined as the net rotation angle of that slice from end-diastole to the current timeframe. The observed deformation of the mesh was quantified from the coordinates of the mesh points during systole. By measuring the percentile change in length of segments on the latitude circles, we were able to measure the circumferential strain. Fig. 5 shows the average circumferential strain plots in the lateral and septal regions of the ventricle. As can be seen, the lateral systolic circumferential strain is greater than the septal strain. Fig. 6 shows the twist pattern of different slices around the long axis. At early systole, all slices rotate counterclockwise (viewed from the apex). Later in systole, basal slices reverse their rotation direction and plateau at late systole, whereas more apical slices maintain the trend of rotation throughout systole. This result is consistent with what was reported by Lorenz et al. [15]
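The circumferential strain and twist-angle measurements described above can be computed from the tracked mesh coordinates; a minimal sketch follows (our addition; the point-array layouts and the per-slice axis `center` are assumptions):

```python
import numpy as np

def circumferential_strain(ref_circle, cur_circle):
    """Per-segment Lagrangian circumferential strain of one latitude
    circle: fractional change in length of each segment between adjacent
    mesh points. Inputs are (N, 3) point arrays at the reference time
    and the current timeframe."""
    def seg_lengths(pts):
        return np.linalg.norm(np.roll(pts, -1, axis=0) - pts, axis=1)
    L0, L1 = seg_lengths(ref_circle), seg_lengths(cur_circle)
    return (L1 - L0) / L0          # average per region as needed

def twist_angle(ref_pts, cur_pts, center):
    """Net rotation (degrees) of a slice about the LV long axis relative
    to end-diastole, averaged over the slice's mesh points."""
    a0 = np.arctan2(ref_pts[:, 1] - center[1], ref_pts[:, 0] - center[0])
    a1 = np.arctan2(cur_pts[:, 1] - center[1], cur_pts[:, 0] - center[0])
    d = (a1 - a0 + np.pi) % (2 * np.pi) - np.pi   # wrap to (-pi, pi]
    return np.degrees(d.mean())
```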
Fig. 4. Mesh deformation at four timeframes. The reference time, at end-diastole, is the time when the tags are imposed; timeframe 9 is at end-systole
5 Conclusion

3D-HARP is a fast, semiautomatic analysis technique to track 3D cardiac motion and perform regional motion analysis. The proposed technique has several advantages. First, computation is fast, since segmentation is not required to track motion; initializing the material mesh is the only step that requires human interaction. Second, the deformation is always checked against the phase time-invariance condition, which guarantees proper deformation; this condition-checking step eliminates the accumulation of error as we proceed from one timeframe to the next. Finally, the technique does not require a rigid form of the mesh: a smaller partial mesh that covers only small regions of the LV can be used, which would allow even more rapid motion tracking of these regions.

Fig. 5. LC strain at the lateral segment and the septal segment inside the LV wall. All six SA slices are mid-ventricular, ordered from base to apex
Fig. 6. Time course of twist angles at 6 mid-ventricular SA slices.
Several issues must be addressed and are subjects of future research. First, we need to further investigate the appropriate mechanical model for the material mesh. The model proposed here is simple, but we hypothesize that a better, more realistic model can be derived. Second, further research is needed to determine the necessary and sufficient conditions that guarantee convergence. Third, we built only a one-layer mesh inside the LV wall in this example. The method could easily be extended to build a multi-layer mesh that would enable us to compute transmural strain and create a 3D strain tensor map; however, further research is required, since meshes close to the epicardium and the endocardium will be more prone to noise. Finally, the algorithm is currently implemented in MATLAB (MathWorks, Inc.), but we expect it to be much faster if implemented in C.

Acknowledgment. We thank Dr. Jerry L. Prince for his valuable discussions and suggestions. We are also thankful to Dr. William S. Kerwin from the University of Washington for his kind help in providing the MRI images. This research was supported by grant HL66075-01 from the National Heart, Lung, and Blood Institute (NHLBI).
References
[1] L.C. Palmon, N. Reichek, et al., "Intramural myocardial shortening in hypertensive left ventricular hypertrophy with normal pump function", Circulation, 89(1): 122–131, 1994.
[2] E.A. Zerhouni, D.M. Parish, W.J. Rogers, A. Yang, and E.P. Shapiro, "Human heart: tagging with MR imaging – a method for noninvasive assessment of myocardial motion", Radiology, 169(1): 59–63, 1988.
[3] L. Axel, "Biomechanical dynamics of the heart with MRI", Annu. Rev. Biomed. Eng., 4: 321–347, 2002.
[4] M.A. Guttman, J.L. Prince, and E.R. McVeigh, "Tag and contour detection in tagged MR images of the left ventricle", IEEE Trans. Med. Imag., 13(1): 74–88, 1994.
[5] M.A. Guttman, E.A. Zerhouni, and E.R. McVeigh, "Analysis of cardiac function from MR images", IEEE Comp. Graph. Appl., 17(1): 30–38, 1997.
[6] T.S. Denney, "Estimation and detection of myocardial tags in MR image without user-defined myocardial contours", IEEE Trans. Med. Imag., 18(4): 330–344, 1999.
[7] A.F. Frangi, W.J. Niessen, and M.A. Viergever, "Three-dimensional modeling for functional analysis of cardiac images: a review", IEEE Trans. Med. Imag., 20(1): 2–25, 2001.
[8] J. Declerck, T.S. Denney, C. Ozturk, W. O'Dell, and E.R. McVeigh, "Left ventricular motion reconstruction from planar tagged MR images: a comparison", Phys. Med. Biol., 45: 1611–1632, 2000.
[9] J. Huang, D. Abendschein, V.G. Davila-Roman, and A.A. Amini, "Spatio-temporal tracking of myocardial deformations with a 4-D B-spline model from tagged MRI", IEEE Trans. Med. Imag., 18(10): 957–972, 1999.
[10] A.A. Young, "Model tags: direct three-dimensional tracking of heart wall motion from tagged magnetic resonance images", Med. Image Anal., 3(4): 361–372, 1999.
[11] W.S. Kerwin and J.L. Prince, "Cardiac material markers from tagged MR images", Med. Image Anal., 2(4): 339–353, 1998.
[12] N.F. Osman, W.S. Kerwin, E.R. McVeigh, and J.L. Prince, "Cardiac motion tracking using CINE harmonic phase (HARP) magnetic resonance imaging", Magn. Reson. Med., 42: 1048–1060, 1999.
[13] N.F. Osman, E.R. McVeigh, and J.L. Prince, "Imaging heart motion using harmonic phase MRI", IEEE Trans. Med. Imag., 19(3): 186–202, 2000.
[14] I. Haber and C.F. Westin, "Model-based 3D tracking of cardiac motion in HARP images", Proc. Intl. Soc. Magn. Reson. Med., 10, 2002.
[15] C.H. Lorenz, J.S. Pastorek, and J.M. Bundy, "Delineation of normal human left ventricular twist throughout systole by tagged cine magnetic resonance imaging", J. Cardiovasc. Magn. Reson., 2(2): 97–108, 2000.
Analysis of Event-Related fMRI Data Using Best Clustering Bases

François G. Meyer and Jatuporn Chinrungrueng

Department of Electrical Engineering, University of Colorado at Boulder, Boulder, CO 80309-0425
[email protected]
http://ece-www.colorado.edu/~fmeyer
Abstract. We explore a new paradigm for the analysis of event-related functional magnetic resonance images (fMRI) of brain activity. We regard the fMRI data as a very large set of time series $x_i(t)$, indexed by the position $i$ of a voxel inside the brain. The decision that a voxel $i_0$ is activated is based not solely on the value of the fMRI signal at $i_0$, but rather on the comparison of all time series $x_i(t)$ in a small neighborhood $W_{i_0}$ around $i_0$. We construct basis functions on which the projection of the fMRI data reveals the organization of the time series $x_i(t)$ into "activated" and "non-activated" clusters. These "clustering basis functions" are selected from large libraries of wavelet packets according to their ability to separate the fMRI time series into an activated cluster and a non-activated cluster. This principle exploits the intrinsic spatial correlation that is present in the data.
1 Introduction

Functional Magnetic Resonance Imaging (fMRI) utilizes the fact that oxygenated blood and deoxygenated blood have different magnetic susceptibility to create maps of changes in cerebral venous oxygen concentration that correlate with neuronal activity. There are currently two types of experimental designs: the periodic stimulus design (block design) and the event-related stimulus design (odd-ball design). The detection of activation using event-related fMRI data is the more difficult problem and is the focus of this work. The analysis of event-related fMRI data commonly relies on the assumption that the fMRI signal $x_i = [x_i(0), \dots, x_i(T-1)]^t$ at a voxel $i$ inside the brain is obtained by convolving the stimulus waveform $s_i = [s_i(0), \dots, s_i(T-1)]^t$ with some unknown hemodynamic response function $h_i$ [1]. Unfortunately, it has become clear that there is a nonlinear relationship between the variation in the fMRI signal and the stimulus presentation [2]. In the absence of any detailed substantive understanding of the mechanism of the fMRI response, we advocate a non-parametric, data-driven approach that can better account for all important features present in the data. Fig. 1 shows a block diagram that illustrates the principle of our approach. We consider a group of voxels, and the corresponding time series, from a small window $W_{i_0}$ inside the brain. We partition these time series into two
clusters. If $W_{i_0}$ is located in a part of the brain that is activated, then one of the two clusters encompasses the activated voxels, or activated time series.
Fig. 1. We partition the time series inside a small window $W_{i_0}$ into two clusters. The clustering is performed on the wavelet packet coefficients (feature space)
The other cluster contains time series that correspond to background activity. If the window is in a part of the brain with no activity correlated to the stimulus, then all time series are considered to be background activity. As $W_{i_0}$ is moved throughout the brain, local decisions (activation/non-activation) are computed for each voxel $i \in W_{i_0}$. This principle exploits the intrinsic spatial correlation that is present in the data. Indeed, truly activated voxels tend to be spatially clustered, while falsely activated voxels tend to be scattered. One can therefore increase the sensitivity of the detection by using the fact that real activation should be more clustered than artifactual activation caused by noise. These local decisions are combined to generate a more robust global activation score, which can be presented in the form of an activation map. Unlike global clustering methods [3], our local clustering approach only partitions the time series in a small region of the brain. Furthermore, the clustering of the time series is not performed directly on the raw fMRI signal. Instead, the raw data are projected on a set of basis functions conveniently chosen for their discriminating power and their robustness to noise. If the stimulus is periodic, the Fourier basis provides the most interesting projection of the data [4]. If the stimulus is not periodic, then we need to replace the Fourier transform with a transform that can efficiently represent the nonlinear and non-stationary structures present in event-related data. Several studies indicate that one finds dynamic changes of the fMRI signal in time, frequency, and space [5,6]. Wavelet packets are time-frequency "atoms" that are localized in time and in frequency, and they have been used quite successfully to analyze transients in evoked potentials or electroencephalograms [7]. We therefore favor the use of wavelet packets to perform the analysis of the event-related hemodynamic response $x_i$. Instead of using a fixed basis, we select a basis from a very large and highly redundant dictionary of wavelet packets, $\mathcal{B} = \{\psi_\gamma\}$. Because the dictionary is highly redundant, one can select those wavelet packets $\psi_\gamma = [\psi_\gamma(0), \dots, \psi_\gamma(T-1)]^t$ (where $\gamma$ is a parameter indexing the functions) on which the projection of event-related fMRI data reveals the organization of the time series into "activated" and "non-activated" clusters. The representation of an event-related signal $x_i$ in such a basis is defined by $x_i = \sum_\gamma \alpha_\gamma(x_i)\,\psi_\gamma$. This approach does not assume any particular model of the hemodynamic response, but rather tries to "let the data speak for themselves".

The paper is organized as follows. In Section 2 we provide a brief review of wavelet packets and the best-basis algorithm [8]. In Section 3 we describe the new best clustering basis algorithm. Results of experiments conducted on synthetic and in-vivo fMRI data are presented in Section 4. Finally, a conclusion and a discussion can be found in Section 5.
Fig. 2. The wavelet packet tree, showing the index $\gamma = (j, k, l)$ at each node.
2 Wavelet Packets, Best-Basis

Let $\psi^0(t)$ be the scaling function and let $\psi^1(t)$ be the wavelet associated with a multiresolution analysis [9]. Let $\{h_n\}$ be the lowpass filter, and let $\{g_n\}$ be the highpass filter associated with this wavelet transform. The basic wavelet packets are given by
$$\psi^{2n}(t) = \sum_k h_k\,\psi^n(2t-k), \qquad \psi^{2n+1}(t) = \sum_k g_k\,\psi^n(2t-k). \tag{1}$$
A multiscale wavelet packet is defined by $\psi_{j,k,l}(t) = \psi^k(2^j t - l)$. The index $\gamma = (j, k, l)$ can be interpreted as follows:
• $j = 0, \dots, J \le J_0$ is the scale: $\psi_{j,k,l}$ has a support of size $2^{-j}$.
• $k = 0, \dots, 2^j - 1$ is the frequency index at a given scale $j$: $\psi_{j,k,l}$ has roughly $k$ oscillations.
• $l = 0, \dots, 2^{J_0-j} - 1$ is the translation index within a node $(j, k)$: $\psi_{j,k,l}$ is located at $l\,2^{-j}$.
As shown in Fig. 2, the library of wavelet packets organizes itself into a binary tree, where the nodes of the tree represent subspaces with different time-frequency localization characteristics. The $2^{J_0} \times 1$ discrete wavelet packet basis vector $\psi_{j,k,l}$ is sometimes written as $\psi_\gamma$. We define the $2^{J_0} \times 2^{J_0-j}$ matrix
$$\Psi_{j,k} = \left[\,\psi_{j,k,0}\,|\,\cdots\,|\,\psi_{j,k,2^{J_0-j}-1}\,\right] \tag{2}$$
to be the collection of wavelet packets at node $(j,k)$ stacked together. The wavelet packet coefficients $\alpha_{j,k,l}$, $l = 0, \dots, 2^{J_0-j}-1$, of the vector $x$ at the node $(j,k)$ are given by the projections of $x$ on the $2^{J_0-j}$ wavelet packets stacked inside the matrix $\Psi_{j,k}$,
$$\left[\alpha_{j,k,0} \cdots \alpha_{j,k,2^{J_0-j}-1}\right]^t = \Psi_{j,k}^t\, x. \tag{3}$$
Let us associate the dyadic frequency interval $[2^j k,\ 2^j(k+1))$ with the wavelet packet node $\Psi_{j,k}$. If a collection of intervals $[2^j k,\ 2^j(k+1))$ provides a cover of the time-frequency plane, then the corresponding set of $\Psi_{j,k}$ forms an orthonormal basis [8]. If we perform a dyadic subdivision at each level, we get $2^{2^J}$ bases for $J$ levels. Clearly, we have an extremely large amount of freedom in constructing orthogonal bases from the wavelet packet library $\{\psi_{j,k,l}\}$. This flexibility can be exploited to increase the efficiency of the representation. Coifman and Wickerhauser [8] suggested using a dynamic programming algorithm of order $T\log(T)$ to search for the best basis, i.e., the basis that is optimal according to a given cost function $\mathcal{M}$. These dictionaries have been used recently in the context of supervised classification [10]. In this work we address a more general and deeper problem, where one wants to determine the best clustering basis without having access to a training set containing activated and non-activated signals.
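As an illustration, the coefficients of Eq. (3) can be computed with standard software. The following minimal sketch uses the PyWavelets package (an assumption for illustration; the authors' tooling is not specified) to expand one time series over the full tree.

```python
import numpy as np
import pywt

x = np.random.randn(64)   # one time series, T = 2^J0 samples

# Full wavelet packet expansion of x down to level 3.
wp = pywt.WaveletPacket(data=x, wavelet='db2',
                        mode='periodization', maxlevel=3)

# alpha_{j,k,l} at node (j, k) are the projections Psi_{j,k}^t x;
# order='freq' enumerates the nodes k by increasing frequency.
coeffs = {}
for j in range(1, wp.maxlevel + 1):
    for k, node in enumerate(wp.get_level(j, order='freq')):
        coeffs[(j, k)] = node.data   # array of length 2^{J0 - j}
```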
3 Best Clustering Bases
In our problem we need a cost functional $\mathcal{M}$ that measures how well a basis function $\psi_\gamma$ can partition a set of signals into meaningful clusters. We illustrate the idea with the simple geometric example shown in Fig. 3. We consider two sets of points distributed inside the two circles A and B (see Fig. 3), and two different bases, $(a_1, a_2)$ and $(b_1, b_2)$. One can imagine that the bases $(a_1, a_2)$ and $(b_1, b_2)$ correspond to two different wavelet packet bases. In order to discover the two clusters A and B, we can partition the points using a clustering algorithm, and we can use either one of the two bases to perform the clustering. However, to simplify the clustering process and to avoid the curse of dimensionality, one should use the basis that concentrates its discriminatory power onto a small number of basis functions. Let us consider a vector $v$, and let us project the dataset onto this vector. If we cluster the projections into two partitions, we can define the discriminatory power of $v$ as the distance between the two centroids. According to this definition, the vectors $a_1$ and $a_2$ have the same discriminatory power $\delta$ (see Fig. 3).
Fig. 3. The basis $(b_1, b_2)$ concentrates its discriminatory power on a single vector $b_1$, whereas the basis $(a_1, a_2)$ distributes the power equally on the vectors $a_1$ and $a_2$.
However, the vector $b_1$ has a discriminatory power $\Delta$ that is larger than $\delta$, while the vector $b_2$ does not discriminate one cluster from another. Therefore, if we use the basis $(b_1, b_2)$, the discriminatory power is concentrated on the single vector $b_1$: we should choose the basis $(b_1, b_2)$ and perform the clustering using only the vector $b_1$. Our clustering cost functional reflects this principle and selects the basis that concentrates most discriminatory power on a small number of basis functions.

In order to define a cost functional adapted to our clustering problem, we need a measure of the separation between two clusters of wavelet packet coefficients $\alpha_\gamma(x_i)$, for a given basis vector $\psi_\gamma$. We partition the set of coefficients $\{\alpha_\gamma(x_i),\ i = 0, \dots, N-1\}$ into two clusters, and we define $\mu_{i,c}$ to be the membership value of the wavelet packet coefficient $\alpha_\gamma(x_i)$ in the cluster $c$. $\mu_{i,c}$ is a real number in $[0,1]$ that measures the likelihood that $\alpha_\gamma(x_i)$ belongs to the cluster $c$ ($\mu_{i,1} + \mu_{i,2} = 1$). We first compute the centroid $\alpha_\gamma(c) = \sum_{i=0}^{N-1} \mu_{i,c}\,\alpha_\gamma(x_i)$ of the cluster $c$. We then calculate the in-class variance $s^2_\gamma(c) = \sum_{i=0}^{N-1} \mu_{i,c}\,\big(\alpha_\gamma(x_i) - \alpha_\gamma(c)\big)^2/(N-1)$ of each cluster $c$. A good separation is achieved if the distance between the cluster centroids is large relative to the in-class variance of each cluster. For a given wavelet packet $\psi_\gamma$, $\gamma = (j,k,l)$, the normalized distance between the two clusters, $D(\gamma)$, is defined by
$$D(\gamma) = \frac{|\alpha_\gamma(c_1) - \alpha_\gamma(c_2)|}{s^2_\gamma(c_1)\, s^2_\gamma(c_2)}. \tag{4}$$
Our definition of the normalized distance is similar to the definition of the Fisher linear discriminant. As explained in the previous section, we are interested in finding a basis where a small number of basis functions $\psi_\gamma$ are capable of clustering the wavelet packet coefficients $\alpha_\gamma(x_i)$ into two clusters with a large distance $D(\gamma)$. In other words, we favor a sparse distribution of the distances $\{D(\gamma)\}_\gamma$. To characterize the sparsity of the distribution we compute its entropy. Given a wavelet packet node $(j,k)$, we define the cost function for the subspace $\Psi_{j,k}$ by
$$\mathcal{M}(\Psi_{j,k}) = -\sum_{l=0}^{2^{J_0-j}-1} \frac{D^2(j,k,l)}{\|D(j,k,\cdot)\|^2}\,\log \frac{D^2(j,k,l)}{\|D(j,k,\cdot)\|^2}, \tag{5}$$

where $\|D(j,k,\cdot)\|^2 = \sum_{l=0}^{2^{J_0-j}-1} D^2(j,k,l)$. $\mathcal{M}(\Psi_{j,k})$ will be maximum if all the wavelet packets $\psi_{j,k,l}$ at the node $(j,k)$ have the same distance $D(j,k,l)$; such a subspace of wavelet packets is of no interest for our purpose.
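For concreteness, a minimal sketch of Eqs. (4) and (5), following the formulas exactly as printed (membership-weighted centroids and in-class variances):

```python
import numpy as np

def normalized_distance(alpha, mu):
    """D(gamma) of Eq. (4) for one wavelet packet: alpha is the length-N
    vector of coefficients, mu the (N, 2) fuzzy membership matrix."""
    c = (mu * alpha[:, None]).sum(axis=0)                    # centroids
    s2 = (mu * (alpha[:, None] - c) ** 2).sum(axis=0) / (len(alpha) - 1)
    return abs(c[0] - c[1]) / (s2[0] * s2[1])

def node_cost(D):
    """Entropy cost M(Psi_{j,k}) of Eq. (5); D holds the distances
    D(j, k, l) for all translates l at one node (j, k)."""
    p = np.clip(D ** 2 / (D ** 2).sum(), 1e-12, None)  # guard log(0)
    return -(p * np.log(p)).sum()
```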
3.1 Local Clustering Basis Selection
We are now ready to define the best clustering basis algorithm. This algorithm searches for the optimal basis according to the criterion defined in (5). The wavelet packet tree (see Fig. 2) is explored from the bottom up, and the optimal combination of the $\Psi_{j,k}$ is kept. The search proceeds as follows. Given a set of $N$ time-series vectors $\{x_i\}_{i=0}^{N-1}$, indexed by their position $i$:
1. Wavelet packet expansion. For each vector $x_i$, compute the wavelet packet coefficients $\alpha_\gamma(x_i)$ at each node $(j,k)$ of the wavelet packet tree.
2. Clustering. For each wavelet packet index $(j,k,l)$, cluster the set $\{\alpha_{j,k,l}(x_i),\ i = 0, \dots, N-1\}$ into two clusters, and compute the distance $D(j,k,l)$. For each wavelet packet node $(j,k)$, compute the cost function $\mathcal{M}(\Psi_{j,k})$.
3. Divide and conquer. For the coarsest scale $J$, initialize the best basis with $B_{J,k} = \Psi_{J,k}$, $k = 0, \dots, 2^J - 1$. For the scales $J-1$ down to $0$, choose the best subspace $B_{j,k}$ according to
$$B_{j,k} = \begin{cases} \Psi_{j,k} & \text{if } \mathcal{M}(\Psi_{j,k}) \le \mathcal{M}(B_{j+1,2k}) + \mathcal{M}(B_{j+1,2k+1}) \\ B_{j+1,2k} \oplus B_{j+1,2k+1} & \text{otherwise.} \end{cases} \tag{6}$$
The output of the algorithm is the best basis $B_{0,0} = \{b_\gamma\}$, $\gamma \in \Gamma_0$. We have noticed in our experiments that the basis vectors that provide the best clustering power generate a set of coefficients with a large variance. Because we want to retain only the basis vectors with the largest clustering power, we rank the basis vectors according to their variance $\sigma^2(\gamma)$ and keep the first $T_r$ vectors $b_{\gamma_l}$ that contribute to a percentage $r$ of the total variance. These $T_r$ basis vectors constitute the "clustering" space. The wavelet packet expansion has a complexity of $T\log T$, and the complexity of the search is also $T\log T$.
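The divide-and-conquer step admits a direct implementation. The sketch below is a hypothetical illustration, not the authors' code: it applies the comparison of Eq. (6) bottom-up, given precomputed node costs.

```python
def best_basis(cost, J):
    """Bottom-up search of Eq. (6). cost[(j, k)] = M(Psi_{j,k}) for every
    node; returns the list of (j, k) nodes forming the best basis B_{0,0}."""
    B, M = {}, {}
    for k in range(2 ** J):                 # initialize at the coarsest scale
        B[(J, k)], M[(J, k)] = [(J, k)], cost[(J, k)]
    for j in range(J - 1, -1, -1):
        for k in range(2 ** j):
            children = M[(j + 1, 2 * k)] + M[(j + 1, 2 * k + 1)]
            if cost[(j, k)] <= children:    # keep the parent subspace
                B[(j, k)], M[(j, k)] = [(j, k)], cost[(j, k)]
            else:                           # keep the union of the children
                B[(j, k)] = B[(j + 1, 2 * k)] + B[(j + 1, 2 * k + 1)]
                M[(j, k)] = children
    return B[(0, 0)]
```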
3.2 Interpretation of the Clusters
If the region $W_{i_0}$ is located in a part of the brain that is activated, then one of the two clusters contains the activated voxels, and the other cluster is composed of time series that describe background activity. According to our experimental findings, the cluster with the largest in-class variance $s^2_\gamma(c)$ corresponds to the activated cluster. If the window is in a part of the brain with no activity correlated to the stimulus, then all time series correspond to background activity, and the partition of the time series into two clusters is artificial. In this latter case, both the normalized distance $D(\gamma)$ between the two clusters and the total variance of the wavelet packet coefficients are much smaller than when the region $W_{i_0}$ contains a mixture of activated and background time series. We use this observation to discriminate between the two situations: (1) the two clusters come from background activity (unrelated to the stimulus), and (2) one cluster contains the activated time series, and the other the background time series.
4 Experiments

4.1 Synthetic Event-Related Data
We describe in this section experiments conducted on artificial datasets of event-related fMRI time series. The BOLD signal was modeled using the parametric model proposed by Glover [11]. The BOLD signal $y(t)$ is zero before the stimulus onset $t_s$. If $t > t_s$, $y(t)$ is given by
$$y(t) = a_1 (t - t_s)^{d_1} e^{-(t-t_s)/t_1} - 0.4\, a_2 (t - t_s)^{d_2} e^{-(t-t_s)/t_2}. \tag{7}$$
The normalization constants are given by $a_i = \max\big((t-t_s)^{d_i} e^{-(t-t_s)/t_i}\big)^{-1}$. We take $t_s = 22.5$ s and consider 32 time samples: $t = [0, 1.5, 3, \dots, 46.5]$. We assume that the parameters of the model are normally distributed random variables: $d_1 \sim \mathcal{N}(5, 0.1)$, $d_2 \sim \mathcal{N}(12, 0.5)$, $t_1 \sim \mathcal{N}(1, 0.2)$, and $t_2 \sim \mathcal{N}(0.9, 0.1)$. We chose the means of the random variables to be equal to the values estimated by Glover [11] for a motor response. Fig. 4–A shows the mean realization of the event-related signals. We generate four different realizations of $y(t)$ according to (7) and add white Gaussian noise to these time series; the variance of the noise is increased for each experiment. Finally, we generate 16 other time series that are realizations of white Gaussian noise, so that the ratio of the number of non-activated time series to the number of activated time series is 4 to 1. Six datasets with average SNR = 0.1, 0.2, 0.5, 0.8, 1, and 1.5 are generated. We apply the best clustering algorithm and obtain the best clustering basis for each dataset. Fig. 4–B shows the first two best clustering basis vectors, $b_{\gamma_0}$ and $b_{\gamma_1}$, obtained with the data with an SNR equal to 0.5. These two vectors capture 40% of the total wavelet packet coefficient variance; they resemble the hemodynamic response function at different delay times and different positive response widths. Fig. 4–C shows the scatter plot obtained by projecting a dataset onto the two best clustering basis vectors for SNR = 0.5. The two clusters are well separated. We now compare the performance of our approach with two other standard methods: the t-test [4] and the correlation analysis [12]. The comparison is based on the number of true and false positives for each value of the SNR.
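A minimal sketch of the data generation described above (assuming the second argument of each normal law is a standard deviation, which the paper leaves ambiguous; the noise level here is fixed, whereas the experiments sweep it to vary the SNR):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(32) * 1.5      # 32 samples, 1.5 s apart
ts = 22.5                    # stimulus onset (s)

def bold(d1, d2, t1, t2):
    """Noise-free BOLD response of Eq. (7); zero before the onset ts."""
    u = np.clip(t - ts, 0.0, None)
    g1, g2 = u ** d1 * np.exp(-u / t1), u ** d2 * np.exp(-u / t2)
    y = g1 / g1.max() - 0.4 * g2 / g2.max()   # a_i normalize each term's peak
    return np.where(t > ts, y, 0.0)

# Four activated series with randomly perturbed parameters, plus unit noise:
activated = [bold(rng.normal(5, 0.1), rng.normal(12, 0.5),
                  rng.normal(1, 0.2), rng.normal(0.9, 0.1))
             + rng.normal(0, 1.0, t.size) for _ in range(4)]
noise_only = [rng.normal(0, 1.0, t.size) for _ in range(16)]
```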
Fig. 4. A: Mean hemodynamic response time series defined by (7). B: The two best basis vectors obtained with an SNR = 0.5. C: Scatter plot obtained by projecting all the time series onto the first two best clustering basis vectors.
The true activation rate is the ratio between the number of true positives detected by the algorithm and the total number of true positives. The false activation rate is the ratio between the number of false positives detected by the algorithm and the total number of true negatives. For each value of the SNR, we generate 10 independent datasets, analyze each dataset, and compute the average true and false activation rates. We describe in the following the details of each method of analysis.

Best clustering basis algorithm. We apply the best clustering algorithm to the data and obtain the reduced feature space by retaining the first $T_r$ vectors $b_{\gamma_l}$ that capture $r = 40\%$ of the total variance of the wavelet packet coefficients. We then project the time series onto this reduced subspace. Finally, we use the fuzzy C-means clustering algorithm to cluster the projected data into two clusters (see the sketch below). The cluster $c_a$ whose centroid has the higher energy is assigned to be the activated cluster. The membership value $\mu_{i,c_a}$ measures the likelihood that the time series $x_i$ is activated. An activation map can be obtained by thresholding $\mu_{i,c_a}$; in our experiments, we use a threshold equal to 0.8.

T-test. The t-test assumes that each fMRI time series corresponds to the realization of an independent, identically distributed stochastic process and divides the data into two groups, obtained during on (post-stimulus) and off (pre-stimulus) periods. We calculate a t statistic by computing the difference of the sample means of each group, normalized by the pooled standard deviation [4]. We then threshold the p-value map at p = 0.05.

Correlation analysis (gold standard). We compute the correlation between the time series and the model (7) using the mean values of the random parameters: $d_1 = 5$, $d_2 = 12$, $t_1 = 1$, and $t_2 = 0.9$. The correlation threshold is 0.5. We note that the correlation analysis has perfect knowledge of the (average) true fMRI signal. Because this method represents an ideal situation that cannot be implemented in practice, it provides a challenging gold standard for our algorithm.
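The fuzzy C-means step is standard; the following is a generic two-cluster implementation for illustration, not the authors' code:

```python
import numpy as np

def fuzzy_cmeans(X, m=2.0, n_iter=100, seed=0):
    """Two-cluster fuzzy C-means on the reduced feature vectors X (N x Tr).
    Returns the (N, 2) membership matrix mu (rows sum to 1)."""
    rng = np.random.default_rng(seed)
    mu = rng.dirichlet(np.ones(2), size=len(X))      # random initial memberships
    for _ in range(n_iter):
        w = mu ** m
        centers = (w.T @ X) / w.sum(axis=0)[:, None]  # fuzzy centroids
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        mu = inv / inv.sum(axis=1, keepdims=True)     # standard FCM update
    return mu

# mu[:, c_a] > 0.8, with c_a the higher-energy cluster, flags activated series.
```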
Fig. 5. True activation rate (left) and false activation rate (right) obtained with the best clustering basis algorithm (*), the t-test (+), and the correlation method (o).
Fig. 5 shows the average true and false activation rates as a function of the SNR. As expected, the best performance is obtained with the correlation analysis. When the SNR becomes greater than 0.4, the performance of our approach is comparable to that of the correlation analysis, without requiring any knowledge about the hemodynamic response. Indeed, as shown in Fig. 4–right, our method can discover automatically the relevant structure of the hemodynamic response. Both the correlation analysis and the best clustering basis algorithm outperform the t-test, achieving a higher true activation rate and a lower false activation rate.

4.2 In-Vivo Event-Related fMRI Data
We present here the results of experiments conducted with in-vivo event-related fMRI data. The data, provided by Dr. Gregory McCarthy (Brain Imaging and Analysis Center, Duke University), demonstrate prefrontal cortex activation in the presence of infrequent events [13]. Visual stimuli were presented to the subjects: most of the images were squares. Infrequent events (targets) consisted of the appearance of circles at random times. A picture was displayed every 1.5 seconds. The subject was asked to mentally count the number of occurrences of the circles and report that number at the end of each run, for a total of 10 runs. The experiment was designed to study whether the processes that elicit P300, an event-related potential caused by infrequent target events whose amplitude depends on the preceding sequence of stimuli, could also be measured by fMRI [13]. The data were acquired with a gradient-echo echoplanar (EPI) sequence (TR = 1500 ms, TE = 45 ms, NEX = 1, FOV = 40 × 20 cm, slice thickness = 7 mm, imaging matrix 128 × 64). More details about the experiments are available in [13]. We extract 16-image segments consisting of the 8 images preceding and the 8 images following each target. Each run contains about 5 to 6 targets, for a total of about 52 targets. We average the 52 segments in order to increase the SNR; this average signal constitutes the time series $x_i$. We correct for baseline differences by subtracting from $x_i$ the pre-stimulus mean value. We place a 4 × 4 window $W_{i_0}$ in two different regions: (A) a region where we expect to see activation, and (B) a region where we expect no activation (see Fig. 6–A). For each position we compute the best clustering basis based on the 16 time series inside the window.
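The segment extraction and averaging just described amounts to a few lines; this is an illustrative sketch, with the segment bounds taken from the text:

```python
import numpy as np

def event_average(x, target_idx, pre=8, post=8):
    """Build the averaged time series x_i: extract the 16-image segment
    around each target, average the ~52 segments, and subtract the
    pre-stimulus mean (baseline correction)."""
    segs = [x[t - pre:t + post] for t in target_idx
            if t - pre >= 0 and t + post <= len(x)]
    m = np.mean(segs, axis=0)
    return m - m[:pre].mean()
```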
Fig. 6. A: Two 4 × 4 windows are placed in a region (A) where we expect activation and in a region (B) where we expect no response related to the stimulus. B: The two most discriminating basis vectors ($\gamma = (1,0,1)$, $(1,0,2)$) for region (A). C: Scatter plot obtained by projecting the fMRI time series from region (A) onto the first and second basis vectors; the class membership is determined using the fuzzy C-means clustering algorithm with a membership value threshold of 0.8. D: Most discriminating basis vector for region (B). E: Scatter plot obtained by projecting the fMRI time series from region (B) onto the first and second basis vectors. F: Time series extracted from the two voxels detected as activated by both the best clustering algorithm and the t-test.
Fig. 6–B shows the first two basis vectors ($\gamma = (1,0,1)$ and $(1,0,2)$) that capture 40% of the total variance for region A. We note that these two vectors, which were discovered automatically by the algorithm, behave as delayed hemodynamic responses. Fig. 6–D shows the first best basis vector for region B; this function has no specific features. A scatter plot obtained by projecting the 16 time series onto the two best clustering vectors is shown for each region in Fig. 6–C and E. As expected, the scatter plot of the region with no activation (B) is fairly compact, whereas the scatter plot of region A is elongated and shows two well-defined clusters. We confirm this visual impression by measuring the distance between the cluster centroids for both regions. Obviously, the clusters in region B are not meaningful, but the distance between the centroids provides a quantitative measurement of the spread of the coefficients. The distance between the centroids is 2.01 for region A and 0.9 for region B; the total variance of the wavelet packet coefficients is 55 for region A and 20.5 for region B. This clearly demonstrates that both the distance between the cluster centroids and the total variance within the window $W_{i_0}$ can be used to discriminate between activated and non-activated regions. The activated time series $x_i$ were detected by thresholding the membership value $\mu_{i,c_a}$ at a level of 0.8. For comparison purposes, we also computed an activation map using the t-test, with a p-value threshold of 0.005. The two activation maps shown in Fig. 7 are identical.
Fig. 7. Left: Activation map obtained by clustering the wavelet packet coefficients of the two basis vectors; the membership value threshold is 0.8. Right: Activation map obtained using the t-test with a p-value threshold of 0.005.
The two time series detected as activated by both the clustering algorithm and the t-test are shown in Fig. 6–F. They have the shape expected for hemodynamic responses. We also notice that the two best clustering vectors shown in Fig. 6–B act as matched filters to detect these time series.
5 Conclusion
We have presented an algorithm that constructs a clustering basis that best separates activated time series from background noise using a small number of basis functions. Unlike most fMRI data analysis methods, our approach does not require any model of the hemodynamic response or any a priori information. We have shown with several experiments conducted on synthetic data that our approach is capable of finding the basis that best concentrates its discriminatory power on the fewest of its vectors. The projection of the original data on the first few basis vectors always revealed the organization of the data. We also applied our method to an in-vivo event-related fMRI dataset. The best clustering basis included a small number of vectors that had the characteristic features of a hemodynamic response. We have also shown that when the spatial window is placed inside a non-activated region, the best basis cannot separate the time series into meaningful clusters; in this case, the basis functions resemble noise and do not possess any relevant structure. We have assumed in all the synthetic experiments that the noise was white and Gaussian. This is clearly not the case for experimental fMRI data: it has been noticed by several authors that data collected under the null-hypothesis condition exhibit the 1/f spectrum associated with long-memory processes [14,15]. The colored noise can be decorrelated by the wavelet transform: it has been shown [16] that the wavelet transform approximates the Karhunen-Loève transform for processes with a 1/f spectrum. We can therefore assume that the noise in the wavelet domain is uncorrelated and Gaussian.

Acknowledgments. This work was supported by a Whitaker Foundation Biomedical Engineering Research Grant. The authors are extremely grateful to Dr. Gregory McCarthy for making the fMRI data available for this work.
References
1. Magnus, K., Nichols, T., Poline, J.B., Holmes, A.: Statistical limitations in functional neuroimaging II. Signal detection and statistical inference. Phil. Trans. R. Soc. Lond. B (1999) 1261–1281
2. Vazquez, A., Noll, D.: Nonlinear aspects of the BOLD response in functional MRI. Human Brain Mapping 7 (1998) 108–118
3. Golay, X., Kollias, S., Stoll, G., Meier, D., Valavanis, A., Boesiger, P.: A new correlation-based fuzzy logic clustering algorithm for fMRI. Magnetic Resonance in Medicine 40 (1998) 249–260
4. Lange, N., Zeger, S.: Non-linear Fourier time series analysis for human brain mapping by functional magnetic resonance imaging. Appl. Statist. 46 (1997) 1–29
5. Mitra, P., Pesaran, B.: Analysis of dynamic brain imaging data. Biophysical Journal 76 (1999) 691–708
6. Bullmore, E., Long, C., Suckling, J., Fadili, J., Calvert, G., Zelaya, F., Carpenter, T., Brammer, M.: Colored noise and computational inference in neurophysiological (fMRI) time series analysis: resampling methods in time and wavelet domain. Human Brain Mapping 78 (2001) 61–78
7. Raz, J., Dickerson, L., Turetsky, B.: A wavelet packet model of evoked potentials. Brain Lang 66:1 (1999) 61–88
8. Coifman, R., Wickerhauser, M.: Entropy-based algorithms for best basis selection. IEEE Trans. Information Theory 38 (1992) 713–718
9. Mallat, S.: A Wavelet Tour of Signal Processing. Academic Press (1999)
10. Saito, N.: Classification of geophysical acoustic waveforms using time-frequency atoms. Proc. Am. Statist. Assoc. Statist. Computing (1996) 322–327
11. Glover, G.: Deconvolution of impulse response in event-related BOLD fMRI. NeuroImage (1999) 416–429
12. Friston, K., Jezzard, P., Turner, R.: Analysis of functional MRI time-series. Human Brain Mapping 1 (1994) 153–171
13. McCarthy, G., Luby, M., Gore, J., Goldman-Rakic, P.: Infrequent events transiently activate human prefrontal and parietal cortex as measured by functional MRI. Journal of Neurophysiology 77 (1997) 1630–1634
14. Zarahn, E., Aguire, G., D'Esposito, M.: Empirical analysis of BOLD fMRI statistics: I. Spatially unsmoothed data collected under null hypothesis conditions. Neuroimage 5 (1997) 179–197
15. Fadili, J., Bullmore, E.: Wavelet-generalized least squares: a new BLU estimator of linear regression models with 1/f errors. NeuroImage 15 (2002) 217–232
16. Wornell, G.: A Karhunen-Loève-like expansion for 1/f processes via wavelets. IEEE Trans. on Info. Theory 36 (1998) 859–861
Estimation of the Hemodynamic Response Function in Event-Related Functional MRI: Directed Acyclic Graphs for a General Bayesian Inference Framework

Guillaume Marrelec 1,4, Philippe Ciuciu 2,4, Mélanie Pélégrini-Issac 3,4, and Habib Benali 1,4

1 INSERM U494, {marrelec, benali}@imed.jussieu.fr
2 CEA, SHFJ, [email protected]
3 INSERM U483, [email protected]
4 IFR 49
Abstract. A convenient way to analyze BOLD fMRI data consists of modeling the whole brain as a stationary, linear system characterized by its transfer function: the Hemodynamic Response Function (HRF). HRF estimation, though of the greatest interest, is still under investigation, for the problem is ill-conditioned. In this paper, we recall the most general Bayesian model for HRF estimation and show how it can beneficially be translated in terms of graphical models, leading to (i) a clear and efficient representation of all structural and functional relationships entailed by the model, and (ii) a straightforward numerical scheme to approximate the joint posterior distribution, allowing for estimation of the HRF as well as of all other model parameters. We finally apply this novel technique to both simulated and real data.
1 Introduction

Functional MRI (fMRI) is a non-invasive technique allowing the evolution of brain processes to be dynamically followed in various cognitive and behavioral tasks. In the most common fMRI technique, based on the so-called Blood Oxygen Level Dependent (BOLD) contrast, what is actually measured is only indirectly related to neuronal activity, through a process that is still under investigation [1,2]. For this reason, a convenient way to analyze BOLD fMRI data consists of modeling the whole brain as a stationary, linear "black box" system characterized by its transfer response function, also called the Hemodynamic Response Function (HRF) [3]. This model, called the General Linear Model (GLM), accounts fairly well for the properties of the real system as long as the inter-stimulus interval does not decrease beyond about two seconds [4,5]. Estimation of the HRF is of the greatest interest when analyzing fMRI data, since it can give a deep insight into the underlying dynamics of brain activation
and the relationships between activated areas. HRFs are increasingly suspected to vary from region to region, from task to task, and from subject to subject [6,7,8]. Nevertheless, accurate estimation of the response function still belongs to ongoing research, since the problem is badly conditioned, and various non-parametric methods have been developed so far in an attempt to infer the HRF at each time sample, such as selective averaging [4], averaging over regions [9], introduction of non-diagonal models for the temporal covariance of the noise [10], or temporal regularization [11]. In [12,13], we proposed a Bayesian, non-parametric estimation of the HRF for event-related designs. Basic yet relevant physiological information was introduced to temporally constrain the problem and calculate robust estimators of the parameters of interest. In [14,15,16] the model was extended to account for asynchronous event-related designs, different trial types, and several fMRI sessions, further improving the estimation. For computational reasons, however, all variants proposed so far have the drawback of not integrating the hyperparameter uncertainty. Furthermore, probabilistic treatment of the drift parameters in the extended model is hypothesized to be possible, but at a significantly higher computational cost.

In this paper, we propose to cast a new light on the GLM. We still place ourselves in a Bayesian framework, permitting integration of information originating from various sources and efficient inference on the parameters of interest. A general model is set up that accounts for most event-related fMRI data. In a common Bayesian approach, we would then calculate the joint posterior distribution of all parameters, which would be the pivotal quantity for all further inference. Since direct sampling from this probability density function (pdf) would prove impossible, Markov-chain sampling would be required, such as Gibbs sampling; in this case, conditional pdfs should be derived. In this perspective, we advocate that explicit calculation of the posterior pdf is unnecessary. We resort to a novel approach that focuses on graphical modeling and that, once the model has been properly set, makes it possible to directly perform probabilistic inference about all parameters. More precisely, we utilize graph theory to conveniently deal with the model. Graphs give a very simple and efficient representation of the model, however complex it may be. In this framework, we translate the model into a directed acyclic graph (DAG) and into functional relationships between the DAG variables. Using Markov properties of DAGs, drawing inference becomes straightforward, and Gibbs sampling provides us with a numerical approximation of the joint posterior pdf.

In the first part of this paper, we develop the general Bayesian framework for HRF estimation, presenting an extended version of the GLM. In the following section, the GLM is translated in terms of a graphical model, and it is shown how inference can readily be performed from there. We briefly present simulations and finally apply our resolution model to real data.
2 HRF Estimation in fMRI Data Analysis

2.1 Notations
In the following, $x$ denotes a real number, $\boldsymbol{x}$ a vector, and $X$ a matrix. For the sake of simplicity, the notation $(x_i)$ is a shortcut for $(x_i)_{1\le i\le I}$. "$^t$" denotes matrix transposition, and $I_N$ stands for the $N$-by-$N$ identity matrix. "$\propto$" relates two expressions that are proportional. For two variables $x$ and $y$, "$x|y$" stands for "$x$ given $y$", and $p(x)$ for the probability of $x$. $\mathcal{N}(m, V; x)$ is the Gaussian density function with mean $m$ and covariance matrix $V$ evaluated at sample $x$. $\text{Inv-}\chi^2(d, r^2; u)$ is the scaled inverse-chi-square density function with $d$ degrees of freedom and scale parameter $r^2$ evaluated at sample $u$ (if $u$ is chi-square distributed with $d$ degrees of freedom, then $dr^2/u$ is scaled inverse-chi-square distributed with $d$ degrees of freedom and scale parameter $r^2$).

2.2 General Linear Model
Data. Let an fMRI experiment be composed of $S$ sessions, each session involving $I$ different stimulus types. Define $y_s = (y_{s,t_{s,n}})_{1\le n\le N_s}$ as the BOLD fMRI time course of a voxel (i.e., volume element) at (not necessarily uniformly sampled) times $(t_{s,n})$ for session $s$, and $x_{s,i} = (x_{s,i,t})_{t_{s,0}\le t\le t_{s,N_s}}$ the corresponding binary time series, composed of the $i$th stimulus onsets. The following discrete linear convolution model (H) is assumed to hold between the stimuli and the data:
$$y_{s,t_{s,n}} = \sum_{i=1}^{I}\sum_{k=0}^{K_i} h_{i,k\Delta t}\, x_{s,i,t_{s,n}-k\Delta t} + \sum_{m=1}^{M_s} \lambda_{s,m}\, d_{m,t_{s,n}} + e_{s,t_{s,n}}, \qquad n = n_s+1, \dots, N_s,$$
where $n_s$ is the largest integer such that $t_{s,n} - K_i\Delta t < t_{s,1}$ for all $i$. The $(K_i+1)$-dimensional vector $h_i = (h_{i,k\Delta t})^t$ represents the $i$th unknown HRF to be estimated, sampled every $\Delta t$; all HRFs are assumed to be constant across sessions. $L_s = N_s - n_s$ is the actual amount of data used in the calculation for each session. $X_{s,i} = (x_{s,i,t_{s,n}-k\Delta t})$ is the regular $L_s$-by-$(K_i+1)$ design matrix, consisting of the lagged stimulus covariates. The $L_s$-by-$M_s$ matrix $D_s = (d_{m,t_{s,n}})$ contains the values at times $(t_{s,n})$ of a basis of $M_s$ functions that takes a potential drift and any other nuisance effect into account, and $\lambda_s = (\lambda_{s,m})^t$ contains the corresponding coefficients. For the sake of simplicity, the bases are assumed to be orthonormal, i.e., $\frac{1}{L_s} D_s^t D_s = I_{M_s}$. The vector $e_s = (e_{s,t_{s,n}})^t$ accounts for noise and is supposed to consist of independent and identically distributed Gaussian variables of unknown variance $\sigma_s^2$, assumed to be independent of the HRFs. In matrix form, (H) boils down to
$$y_s = \sum_{i=1}^{I} X_{s,i}\, h_i + D_s \lambda_s + e_s, \qquad s = 1, \dots, S,$$
also called the General Linear Model (GLM). In this model, the likelihood of the data reads
$$p\left((y_s)\,|\,H, (h_i), (\sigma_s^2), (\lambda_s)\right) = \prod_{s=1}^{S} p\left(y_s\,|\,H, (h_i), \sigma_s^2, \lambda_s\right),$$
with each term in the product given by
$$p\left(y_s\,|\,H, (h_i), \sigma_s^2, \lambda_s\right) = \mathcal{N}\Big(\sum_{i=1}^{I} X_{s,i}\, h_i + D_s \lambda_s,\ \sigma_s^2 I_{L_s};\ y_s\Big).$$
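For illustration, the design matrix $X_{s,i}$ can be assembled directly from the binary onset series. The sketch below assumes, for simplicity, uniform sampling with one onset sample per time point, whereas the model above allows non-uniform times $t_{s,n}$:

```python
import numpy as np

def design_matrix(onsets, N, K):
    """Lagged-stimulus design matrix for one trial type: column k holds the
    binary onset series shifted by k samples, so that (X @ h)[n] is the
    discrete convolution of model (H) evaluated at time index n."""
    X = np.zeros((N, K + 1))
    for k in range(K + 1):
        X[k:, k] = onsets[:N - k]
    return X

x = np.zeros(100); x[[10, 40, 70]] = 1   # three stimulus onsets
X = design_matrix(x, N=len(x), K=20)     # (100, 21); y = X @ h + D @ lam + e
```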
HRFs and hyperparameters. The GLM being ill-conditioned, prior information must be incorporated in order to constrain the problem. Since the underlying physiological process of BOLD fMRI is as of yet only partially understood, we set the following soft constraints [12,15]:

(P1) The HRFs start and end at 0. This amounts to setting the first and last samples of each HRF to 0, so that only $K_i - 1$ parameters (instead of $K_i + 1$) remain unknown.

(P2) The HRFs are smooth. Quantification is achieved by setting Gaussian priors for the norm of the second derivative of the HRFs, whose variances are adjusted by hyperparameters $\epsilon_i^2$:
$$p(h_i\,|\,H, \epsilon_i^2) = \mathcal{N}\left(0,\ \epsilon_i^2 R_i^{-1};\ h_i\right), \qquad i = 1, \dots, I,$$
where $R_i$ is the following $(K_i-1)$-by-$(K_i-1)$ matrix:
$$R_i = \frac{1}{(\Delta t)^4}
\begin{pmatrix}
 5 & -4 &  1 &  0 & \cdots & & 0 \\
-4 &  6 & -4 &  1 & 0 & & \\
 1 & -4 &  6 & -4 & 1 & \ddots & \\
 0 & \ddots & \ddots & \ddots & \ddots & \ddots & 0 \\
 & \ddots & 1 & -4 & 6 & -4 & 1 \\
 & & 0 & 1 & -4 & 6 & -4 \\
 0 & & \cdots & 0 & 1 & -4 & 5
\end{pmatrix}.$$

(P3) No prior dependence is assumed between HRFs, so that
$$p\left((h_i), (\epsilon_i^2)\,|\,H\right) = \prod_{i=1}^{I} p(h_i\,|\,H, \epsilon_i^2)\cdot p(\epsilon_i^2\,|\,H).$$

For convenience, the priors for the $\epsilon_i^2$'s are set as conjugate priors: these parameters are assumed to be a priori i.i.d. with common pdf a scaled inverse-$\chi^2$ with $n_\epsilon$ degrees of freedom and scale parameter $r_\epsilon^2$ given in the model ($n_\epsilon$ is set to a small value to obtain a "hardly" informative prior). This setting is further analyzed in the discussion.
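The matrix $R_i$ is the Gram matrix of the second-order finite difference operator with the end points of the HRF fixed to 0 (P1). A minimal numpy construction reproducing the $5, -4, 1$ pattern above:

```python
import numpy as np

def smoothness_prior(K, dt):
    """R_i = D2^t D2 / dt^4, where D2 is the second-order finite-difference
    operator applied to the full HRF (length K+1) with its first and last
    samples fixed to 0, so h^t R h penalizes the second derivative (P2)."""
    D2 = np.diff(np.eye(K + 1), n=2, axis=0)[:, 1:-1]  # drop the zeroed ends
    return (D2.T @ D2) / dt ** 4

R = smoothness_prior(K=20, dt=1.5)   # (K-1) x (K-1); prior h ~ N(0, eps2 R^-1)
```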
Drifts and noise variances. Unlike the HRFs, noise variances and drift parameters may vary across sessions. For the sake of simplicity, the $\sigma_s^2$'s are again assumed to be i.i.d. with common pdf a scaled inverse-$\chi^2$ distribution with $n_\sigma$ degrees of freedom and scale parameter $r_\sigma^2$. The $\lambda_s$'s are assumed to be i.i.d. with common pdf a Gaussian of mean $m_\lambda$ and covariance matrix $V_\lambda$.

Joint posterior distribution. Considering the model so constructed and assuming no further prior dependence between parameters, formal application of the chain rule yields
$$p\left((y_s), (\lambda_s), (h_i), (\sigma_s^2), (\epsilon_i^2)\,|\,H\right) = \prod_{s=1}^{S} p\left(y_s\,|\,H, (h_i), \sigma_s^2, \lambda_s\right)\cdot p(\lambda_s|H)\cdot p(\sigma_s^2|H) \times \prod_{i=1}^{I} p(h_i\,|\,H, \epsilon_i^2)\cdot p(\epsilon_i^2|H). \tag{1}$$
Given the data $(y_s)$, our knowledge of the model parameters can easily be updated using the conditioning formula
$$p\left((\lambda_s), (h_i), (\sigma_s^2), (\epsilon_i^2)\,|\,H, (y_s)\right) = \frac{p\left((y_s), (\lambda_s), (h_i), (\sigma_s^2), (\epsilon_i^2)\,|\,H\right)}{p\left((y_s)\,|\,H\right)}.$$
In words, the joint posterior probability distribution is proportional to the joint probability of Eq. (1). Replacing all distributions by their functional forms, this joint posterior pdf could be calculated in closed form, as is indeed done in most research papers applying Bayesian analysis. Since direct sampling from the joint posterior pdf is impossible in this problem, we must resort to MCMC, e.g., Gibbs sampling, where conditional pdfs should be derived. In this perspective, we propose to avoid calculating the joint posterior pdf and to proceed directly to inference. In order to do so, we beforehand embed our model in a framework that allows for convenient representation, handling, and numerical inference: directed acyclic graphs.
3 Graphical Modeling

3.1 Directed Acyclic Graphs (DAGs)
A graph $G$ is a mathematical object that relates a set of vertices, $V$, to a set of edges, $E$, consisting of pairs of elements taken from $V$. There is a directed edge or arrow between vertices $z_n$ and $z_m$ in $V$ if the set $E$ contains the ordered pair $(z_n, z_m)$; vertex $z_n$ is a parent of vertex $z_m$, and vertex $z_m$ is a child of vertex $z_n$. An oriented graph is a graph whose edges are all oriented. A path is a sequence of distinct vertices $z_{n_1}, \dots, z_{n_m}$ for which $(z_{n_l}, z_{n_{l+1}})$ is in $E$ for each $l = 1, \dots, m-1$. The path is a cycle if the end points are allowed to be the same, $z_{n_1} = z_{n_m}$. Finally, an oriented graph with no cycle is called a directed acyclic graph, or DAG. For more details, the reader is referred to [17].
The major feature of DAGs is that any probability density on $(z_n)$ must factorize according to the so-called factorization property:
$$p(z) = \prod_{n=1}^{N} p\left(z_n\,|\,\mathrm{pa}(z_n)\right), \tag{2}$$
where $\mathrm{pa}(z_n)$ is the set of parents of vertex $z_n$. Defining a graph amounts to (i) defining relevant variables (i.e., nodes) $z_n$, (ii) defining structural relationships (i.e., edges) $z_n \to z_m$, and (iii) defining functional relationships $p(z_n\,|\,\mathrm{pa}(z_n))$. Pearl [18] showed a property that proves very efficient for numerical sampling, namely that nothing more is required to calculate the conditional probability of any DAG node: the probability distribution of any variable $z_n$ in the network, conditioned on the state of all other variables, is given by the product
$$p(z_n\,|\,\text{r.v.}) \propto p\left(z_n\,|\,\mathrm{pa}(z_n)\right) \cdot \prod_{z_m \in\, \mathrm{ch}(z_n)} p\left(z_m\,|\,\mathrm{pa}(z_m)\right), \tag{3}$$
where r.v. stands for "remaining variables" and $\mathrm{ch}(z_n)$ for the children nodes of $z_n$. In other words, the conditional probabilities can be derived from local quantities that are part of the model specification.

3.2 DAG Model for HRF Estimation
The GLM can easily be expressed in terms of a DAG. Indeed, consider the DAG proposed in Fig. 1. Irrespective of the functional relationships between nodes, (2) states that the joint pdf for all DAG variables decomposes exactly like the posterior pdf in Eq. (1). Identifying all functional relationships of the DAG with their counterparts for model (H) then makes the DAG a perfect representation of the GLM. However complicated (H) may be, it is still much simpler to conceptualize in graph form than as it was presented before. Whereas determining the structural relationships between two given variables in model (H) remains a tough problem to tackle, the corresponding DAG clearly and unambiguously represents all possible independence relationships, which can be read off the graph using its Markov properties.

3.3 Numerical Inference
To obtain a numerical approximation of the joint posterior pdf, we apply Gibbs sampling. This consists of starting with a seed vector and sequentially modifying one vector component at a time by sampling according to the conditional pdf of that component given the remaining variables. Samples are composed of the set of all vectors whose components have been updated an equal number of times. One issue with Gibbs sampling is partitioning the vector of all parameters into components whose conditional sampling can easily be performed; another is the derivation of the conditional pdfs corresponding to the chosen clustering.
Fig. 1. DAG corresponding to the General Linear Model. The gray nodes represent available information
In our case, both questions are answered at once, thanks to the previous step of graph modeling. As a matter of fact, it first allows us to decompose the parameter vector into its $2I + 2S$ canonical components: the $I$ $\epsilon_i^2$'s and $h_i$'s, and the $S$ $\sigma_s^2$'s and $\lambda_s$'s (all $y_s$'s being given, no sampling needs to be done on these variables). The updating steps are performed on these variables, and we therefore need access to the following conditional pdfs: $p(h_i|H, \text{r.v.})$, $p(\epsilon_i^2|H, \text{r.v.})$, $p(\sigma_s^2|H, \text{r.v.})$, and $p(\lambda_s|H, \text{r.v.})$. But, according to Pearl's theorem [18], knowledge of the functional relationships is sufficient to infer these conditional pdfs. Application of Eq. (3) directly yields
$$p(\epsilon_i^2|H, \text{r.v.}) \propto p(\epsilon_i^2|H)\cdot p(h_i|H, \epsilon_i^2) = \text{Inv-}\chi^2\left(\mu_i, \tau_i^2;\ \epsilon_i^2\right)$$
$$p(\sigma_s^2|H, \text{r.v.}) \propto p(\sigma_s^2|H)\cdot p\left(y_s|H, (h_i), \sigma_s^2, \lambda_s\right) = \text{Inv-}\chi^2\left(\nu_s, \omega_s^2;\ \sigma_s^2\right)$$
$$p(h_i|H, \text{r.v.}) \propto p(h_i|H, \epsilon_i^2)\cdot \prod_{s=1}^{S} p\left(y_s|H, (h_i), \sigma_s^2, \lambda_s\right) = \mathcal{N}(\delta_i, \Delta_i;\ h_i)$$
$$p(\lambda_s|H, \text{r.v.}) \propto p(\lambda_s|H)\cdot p\left(y_s|H, (h_i), \sigma_s^2, \lambda_s\right) = \mathcal{N}(\gamma_s, \Gamma_s;\ \lambda_s),$$
with
$$\mu_i = (K-1) + n_\epsilon, \qquad \tau_i^2 = \frac{n_\epsilon r_\epsilon^2 + h_i^t R_i h_i}{(K-1) + n_\epsilon},$$
$$\nu_s = L_s + n_\sigma, \qquad \omega_s^2 = \frac{n_\sigma r_\sigma^2 + \left\|y_s - \sum_i X_{s,i} h_i - D_s \lambda_s\right\|^2}{L_s + n_\sigma},$$
$$\delta_i = \Delta_i \sum_s \frac{1}{\sigma_s^2}\, X_{s,i}^t \Big( y_s - \sum_{j\ne i} X_{s,j} h_j - D_s \lambda_s \Big), \qquad \Delta_i = \Big( \frac{1}{\epsilon_i^2} R_i + \sum_s \frac{1}{\sigma_s^2} X_{s,i}^t X_{s,i} \Big)^{-1},$$
and
$$\gamma_s = \Gamma_s \Big( V_\lambda^{-1} m_\lambda + \frac{1}{\sigma_s^2} D_s^t \big( y_s - \sum_i X_{s,i} h_i \big) \Big), \qquad \Gamma_s = \Big( V_\lambda^{-1} + \frac{1}{\sigma_s^2} D_s^t D_s \Big)^{-1}.$$
The sampling can then be performed by sequentially updating the $\epsilon_i^2$'s, the $h_i$'s, then the $\sigma_s^2$'s, and finally the $\lambda_s$'s. We are admittedly mostly interested in the HRFs, but knowledge of the values taken by the other parameters is relevant as well for our analysis and for a better understanding of brain processing. Gibbs sampling gives us access to estimates for all parameters, or for any quantity of interest related to them. For instance, in this paper, parameter estimators are given as posterior mean ± posterior standard deviation. These quantities are, in turn, approximated by their sample counterparts on the Markov chain generated by the Gibbs sampling scheme.
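To make the structure of the scheme concrete, here is a deliberately simplified single-session, single-HRF Gibbs skeleton. It collapses the informative hyperpriors (behaving as if $n_\epsilon = n_\sigma = 0$, with a flat prior on $\lambda$), so it sketches the cycling structure, not the paper's exact sampler:

```python
import numpy as np

def gibbs_glm(y, X, D, R, n_iter=1000, seed=0):
    """Simplified Gibbs sampler for y = X h + D lam + e, cycling through
    the four conditionals above. Returns the retained draws of h."""
    rng = np.random.default_rng(seed)
    K1, M = X.shape[1], D.shape[1]
    lam, sig2, eps2 = np.zeros(M), 1.0, 1.0
    draws = []
    for _ in range(n_iter):
        # h | rest ~ N(delta, Delta)
        Delta = np.linalg.inv(R / eps2 + X.T @ X / sig2)
        delta = Delta @ (X.T @ (y - D @ lam)) / sig2
        h = rng.multivariate_normal(delta, Delta)
        # lam | rest ~ N(gamma, Gamma), flat prior on lam for brevity
        Gamma = np.linalg.inv(D.T @ D) * sig2
        gamma = Gamma @ (D.T @ (y - X @ h)) / sig2
        lam = rng.multivariate_normal(gamma, Gamma)
        # sig2 and eps2 | rest: scaled inverse-chi-square draws
        r = y - X @ h - D @ lam
        sig2 = (r @ r) / rng.chisquare(len(y))
        eps2 = (h @ R @ h) / rng.chisquare(K1)
        draws.append(h)
    return np.array(draws)  # posterior mean/std from the retained draws
```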
4 Simulations
We simulated data with two HRFs, as depicted in Fig. 2, and two sessions of $N = 100$ time samples. $\Delta t$ and the sampling interval were both set to 1.5 s. Quadratic drifts ($p_1(t) = 846 + 0.2\,t + 0.001\,t^2$ and $p_2(t) = 950 + 0.15\,t + 0.0011\,t^2$) and Gaussian white noise ($\sigma_1^2 = 50$, $\sigma_2^2 = 100$) were also added; note that the noise standard deviations are of about the same amplitude as the HRF. For the analysis, we set both orders to $K = 20$ and took a quadratic drift into consideration ($M = 3$), with $m_\lambda = (900\ 0\ 0)^t$, $V_\lambda$ diagonal with $V_\lambda = 100\cdot I_3$, $n_\epsilon = n_\sigma = 2$, $r_\epsilon^2 = 1$, and $r_\sigma^2 = 5^2$. Our unoptimized MATLAB program took less than 30 s on an Ultra Sparc workstation to run 1,000 updates. We kept the last 500 samples for inference and obtained the following estimates (mean ± standard deviation): $\sigma_1^2 \approx 55.8 \pm 10.6$, $\sigma_2^2 \approx 115.1 \pm 19.9$, $\epsilon_1^2 \approx 1.17 \pm 0.55$, $\epsilon_2^2 \approx 0.66 \pm 0.34$, $\lambda_{1,0} = 857 \pm 4$, and $\lambda_{2,0} = 953 \pm 5$. As shown in Fig. 2, the HRF estimates are also very accurate for the noise level considered.
5 Real Data

Eleven healthy subjects (age 18–40) were scanned while performing a motor sequence learning task. Using a joystick, they were asked to reach a target projected on a screen for 3 s, following an elliptic curve as precisely and rapidly as possible. They had to complete 64 trials in sequence (SEQ) mode (the targets appeared in a predefined order, unknown to the subject, forming an 8-item-long sequence) and 16 trials in random (RAN) mode (the targets appeared randomly). The time interval between two consecutive trials was randomly selected to lie uniformly between 3 and 4 s. Functional T2*-weighted acquisitions were performed on a 3 T Bruker MEDSPEC 30/80 MR system (TR: 3.486 s, TE: 35 ms, flip angle: 90°, matrix 64 × 64 × 42, voxel size 3 × 3 × 3 mm).
Fig. 2. Simulations. Estimated (dashed line) and simulated (solid line) HRFs. Left panel: first HRF (SPM canonical HRF²); right panel: second HRF (Gaussian density function)
To illustrate the method, we selected two voxels, vox1 and vox2: vox1 was located in the left cerebellum and vox2 in the right hippocampus. Our goal was to estimate the HRFs corresponding to the SEQ and RAN conditions, respectively. For the analysis, we first adjusted the stimulus on a grid of interval $\Delta t = \mathrm{TR}/5$. Both HRFs were assumed to have a common order $K = 5\times 5$, for a duration of 5 TRs. $n_\sigma$ and $n_\epsilon$ were both set to 1; $r_\sigma^2$ was set to $40^2$ and $r_\epsilon^2$ to $10^6$; $m_\lambda$ was set to $(3500\ 0\ 0)^t$, and $V_\lambda$ to a diagonal matrix such that $\mathrm{diag}(V_\lambda) = (10000^2\ 1000^2\ 1000^2)$. As shown in Fig. 3, the method is able to extract different HRF behaviors for different conditions, despite a very low SNR. The high noise level is reflected in the large estimate error bars but does not prevent discrimination between conditions.
40
40
30
30
20
20
10
10
intensity
intensity
vox1 (left cerebellum) 50
0 −10
0 −10
−20
−20
−30
−30
−40
−40
−50 −60 0
−50 2
4
6
8
10
12
14
16
time (s)
−60 0
2
4
6
8
10
12
14
16
time (s)
Fig. 3. Real data. HRFs corresponding to the SEQ (solid line) and RAN (dashed line) stimuli. Each sample has been slightly shifted to the left (SEQ) or to the right (RAN) for a better graphical rendering
² www.fil.ion.ucl.ac.uk/spm/spm99.html
6 Discussion
Our approach made it possible to associate the well-known General Linear Model for HRF estimation in fMRI data analysis with a directed acyclic graph. This has the first advantage of making all modeling hypotheses clear. Moreover, in a Bayesian framework, the complex yet central step of calculating the joint posterior pdf is avoided. Instead, the graph provides a very convenient tool to first break down the set of all variables into coherent subsets, namely its nodes. Using the Markov properties, it is straightforward to derive all conditional pdfs required for Gibbs sampling as products of conditional pdfs that were specified with the modeling. Fully probabilistic numerical inference is then straightforward at a reasonable time cost. For increased effectiveness, the sampling procedure should be optimized; convergence could be monitored through comparison of within- and between-variances of parallel chains, as proposed in [19].

Because variable partitioning is implied by the graph structure, our application of Gibbs sampling differs from classical applications. For instance, we sequentially sampled each HRF, each drift vector, and each noise variance, whereas a conventional procedure would simultaneously sample all HRFs, then all drift parameters, and so on. What influence this difference has on the convergence speed of the Markov chain is a matter that needs to be further investigated.

To check estimation robustness, the influence of the prior parameters is of importance. $m_\lambda$ and $V_\lambda$ were found to have very little weight on the inference, $n_\sigma$ and $r_\sigma$ some more. The selection of $n_\epsilon$ and $r_\epsilon$ did not matter much for the drift and noise parameters, but had a dramatic influence on HRF estimation. Indeed, we did not even have a prior idea of the order of magnitude of $\epsilon_i^2$, whereas scaled inverse-chi-square pdfs are relatively localized around their mode. To remedy this flaw, we suggest estimating potential values of $\epsilon_i^2$ for a set of credible HRFs (such as the SPM99 canonical HRF mentioned above). Changing the form of the prior pdf would also be possible, and we believe that setting priors for $\log(\epsilon_i^2)$ would prove better adapted than conjugate priors to the state of ignorance we are in relative to these parameters. A general procedure is proposed in [20] to sample from pdfs that have the structure implied by Eq. (3) using rejection sampling.

In the framework of DAG modeling, the locality of the relationships makes the model very simple to modify structurally or functionally at a local level, either because it does not correctly explain the phenomenon of interest, or because a more complex model is sought. As a matter of fact, the proposed model for HRF estimation can already be seen as an improvement of the graphical model associated with the basic one-HRF, one-session linear model. Consideration of local spatial information, as in [21], could now be achieved by gathering all voxel graphical models, which were here assumed to be independent of each other, and adding relationships between neighboring $h_{v,i}$'s. As more and more information is incorporated into the model, the corresponding graph will become more and more complex. However, tools have been developed to deal with such graphs. Parallel processing of Gibbs sampling can be implemented; to avoid the problem of simultaneous updating of neighboring variables, one has to
Estimation of the Hemodynamic Response Function
645
apply the so-called “edge reversal” control policy, as detailed in [18]. For huge graphs, [22] proposed an efficient variant of Gibbs sampling. We finally believe that this novel approach has a much broader application range than just fMRI data analysis. Indeed, we are confident in the fact that any Bayesian model can be embedded in a graphical framework. This would allow to concentrate on the modeling, since efficient and automated inference would directly derive from the model.
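To make the node-wise sampling scheme concrete, the following toy sketch (in Python) alternates draws from the two full conditionals of a model with one Gaussian "signal" node and one noise-variance node carrying a scaled inverse-chi-square prior. The model, variable names, and hyperparameter values are illustrative only; they are not the HRF model of this paper.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data: y ~ N(theta, s2) with unknown theta and s2.
    y = rng.normal(2.0, 1.5, size=100)
    n = len(y)

    # Illustrative hyperparameters of the conjugate priors.
    m0, v0 = 0.0, 100.0   # N(m0, v0) prior on theta
    n0, r0 = 1.0, 1.0     # scaled inverse-chi-square prior on s2

    theta, s2 = 0.0, 1.0  # initial state of the chain
    samples = []
    for it in range(5000):
        # Node 1: theta | s2, y has a normal full conditional.
        v_post = 1.0 / (1.0 / v0 + n / s2)
        m_post = v_post * (m0 / v0 + y.sum() / s2)
        theta = rng.normal(m_post, np.sqrt(v_post))
        # Node 2: s2 | theta, y is scaled inverse-chi-square.
        n_post = n0 + n
        r_post = (n0 * r0 + ((y - theta) ** 2).sum()) / n_post
        s2 = n_post * r_post / rng.chisquare(n_post)
        samples.append((theta, s2))

    theta_hat = np.mean([t for t, _ in samples[1000:]])  # drop burn-in

In the model of this paper, the same loop simply visits more nodes (each HRF, each drift vector, each noise variance), with the full conditionals read off the graph.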
7 Conclusion
In this paper, we proposed a novel Bayesian inference framework for HRF estimation in fMRI data analysis, based on translating the existing Bayesian model into a DAG in order to combine the features of graphical modeling and Bayesian analysis. This approach makes extensive use of directed acyclic graphs to (i) represent the model in a compact, yet efficient way, and (ii) carry out probabilistic inference through Gibbs sampling. The technique takes advantage of the Markov properties of DAGs. Models can easily be designed, and both structural (i.e., independence) and functional relationships are clearly presented. Moreover, using Gibbs sampling on the DAG, fully probabilistic numerical inference is straightforward. Further research includes the integration of more diffuse prior pdfs when necessary, as well as spatial constraints for the HRFs. Acknowledgements. The authors are grateful to Prof. Julien Doyon (Institut de Gériatrie, Université de Montréal, Canada) for providing us with the data and to Carine Posé for her technical support. Guillaume Marrelec is supported by the Fondation pour la Recherche Médicale.
References
1. Li, T.Q., Haefelin, T.N., Chan, B., Kastrup, A., Jonsson, T., Glover, G.H., Moseley, M.E.: Assessment of hemodynamic response during focal neural activity in human using bolus tracking, arterial spin labeling and BOLD techniques. Neuroimage 12 (2000) 442–451
2. Aubert, A., Costalat, R.: A model of the coupling between brain electrical activity, metabolism, and hemodynamics: application to the interpretation of functional neuroimaging. Neuroimage 17 (2002) 1162–1181
3. Friston, K.J., Jezzard, P., Turner, R.: Analysis of functional MRI time-series. Hum. Brain Mapp. 1 (1994) 153–171
4. Dale, A.M., Buckner, R.L.: Selective averaging of rapidly presented individual trials using fMRI. Hum. Brain Mapp. 5 (1997) 329–340
5. Buckner, R.L.: Event-related fMRI and the hemodynamic response. Hum. Brain Mapp. 6 (1998) 373–377
6. Buckner, R.L., Koutstaal, W., Schacter, D.L., Wagner, A.D., Rosen, B.R.: Functional-anatomic study of episodic retrieval using fMRI (I). Neuroimage 7 (1998) 151–162
7. Buckner, R.L., Koutstaal, W., Schacter, D.L., Dale, A.M., Rotte, M., Rosen, B.R.: Functional-anatomic study of episodic retrieval using fMRI (II). Neuroimage 7 (1998) 163–175
8. Aguirre, G.K., Zarahn, E., D'Esposito, M.: The variability of human BOLD hemodynamic responses. Neuroimage 7 (1998) S574
9. Kershaw, J., Abe, S., Kashikura, K., Zhang, X., Kanno, I.: A Bayesian approach to estimating the haemodynamic response function in event-related fMRI. Neuroimage 11 (2000) S474
10. Burock, M.A., Dale, A.M.: Estimation and detection of event-related fMRI signals with temporally correlated noise: a statistically efficient and unbiased approach. Hum. Brain Mapp. 11 (2000) 249–260
11. Goutte, C., Nielsen, F.Å., Hansen, L.K.: Modeling the haemodynamic response in fMRI using smooth FIR filters. IEEE Trans. Med. Imaging 19 (2000) 1188–1201
12. Marrelec, G., Benali, H., Ciuciu, P., Poline, J.B.: Bayesian estimation of the hemodynamic response function in functional MRI. In Fry, R., ed.: Bayesian Inference and Maximum Entropy Methods in Science and Engineering: 21st International Workshop, AIP, Melville (2001) 229–247
13. Marrelec, G., Benali, H., Ciuciu, P., Pélégrini-Issac, M., Poline, J.B.: Robust Bayesian estimation of the hemodynamic response function in event-related BOLD fMRI using basic physiological information. Hum. Brain Mapp. 19 (2003) 1–17
14. Ciuciu, P., Marrelec, G., Idier, J., Benali, H., Poline, J.B.: A general tool to estimate the hemodynamic response function in fMRI data. In: Proceedings of the 8th International Conference on Functional Mapping of the Human Brain, available on CD-ROM (2002)
15. Ciuciu, P., Marrelec, G., Poline, J.B., Idier, J., Benali, H.: Robust estimation of the hemodynamic response function in asynchronous multitasks multisessions event-related fMRI paradigms. In: 2002 IEEE International Symposium on Biomedical Imaging Proceedings, IEEE (2002) 847–850
16. Ciuciu, P., Poline, J.B., Marrelec, G., Idier, J., Pallier, C., Benali, H.: Unsupervised robust non-parametric estimation of the hemodynamic response function for any fMRI experiment. IEEE Trans. Med. Imaging (2002) Accepted.
17. Pearl, J.: Causality: Models, Reasoning, and Inference. Cambridge University Press, Cambridge (2001)
18. Pearl, J.: Evidential reasoning using stochastic simulation of causal models. Artif. Intell. 32 (1987) 245–257
19. Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B.: Bayesian Data Analysis. Texts in Statistical Science. Chapman & Hall, London (1998)
20. Marrelec, G., Benali, H.: Automated rejection sampling from product of distributions. Comput. Stat. (2004) In press.
21. Gössl, C., Auer, D., Fahrmeir, L.: Bayesian spatiotemporal inference in functional magnetic resonance imaging. Biometrics 57 (2001) 554–562
22. Jensen, C.S., Kong, A., Kjærulff, U.: Blocking Gibbs sampling in very large probabilistic expert systems. International Journal of Human Computer Studies, Special Issue on Real-World Applications of Uncertain Reasoning 42 (1993) 647–666
Nonlinear Estimation and Modeling of fMRI Data Using Spatio-temporal Support Vector Regression

Yongmei Michelle Wang^1, Robert T. Schultz^2, R. Todd Constable^1, and Lawrence H. Staib^1

^1 Department of Diagnostic Radiology; ^2 Child Study Center
Yale University School of Medicine, New Haven, CT 06520
{[email protected]}

Abstract. This paper presents a new and general nonlinear framework for fMRI data analysis based on a statistical learning methodology: support vector machines. Unlike most current methods, which assume a linear model for simplicity, the estimation and analysis of the fMRI signal within the proposed framework is nonlinear, which matches recent findings on the dynamics underlying neural activity and hemodynamic physiology. The approach utilizes spatio-temporal support vector regression (SVR), within which the intrinsic spatio-temporal autocorrelations in fMRI data are reflected. The novel formulation of the problem allows merging model-driven with data-driven methods, and therefore unifies these two currently separate modes of fMRI analysis. In addition, multiresolution signal analysis is achieved and developed. Other advantages of the approach are: avoidance of interpolation after motion estimation, embedded removal of low-frequency noise components, and easy incorporation of multi-run, multi-subject, and multi-task studies into the framework.
1 Introduction

Functional magnetic resonance imaging (fMRI) is a noninvasive technique for mapping brain function via the blood oxygenation level dependent (BOLD) effect [23]. In recent years, it has played an increasing role in neuroscience research, and is beginning to become useful clinically as well, for example in surgical planning. However, the small signal change due to the BOLD effect is very noisy and susceptible to artifacts such as those caused by scanner drift, head motion, and cardio-respiratory effects. Although a task or stimulus can be repeated over and over again, there are limits due to time constraints, learning adaptation of the subjects, etc. Therefore, refined techniques from statistics and biosignal/image processing and analysis are required for accurate detection and characterization of functional activity. The BOLD signal is a complex function of neural activity, oxygen metabolism, cerebral blood volume, cerebral blood flow (CBF), and other physiological parameters. The dynamics underlying neural activity and hemodynamic physiology are believed to be nonlinear [2,21]. Most existing fMRI data analyses assume a linear model, and rely primarily on linear methods or general linear models. As fMRI experiments have grown more sophisticated, the role of nonlinearities has become more important. Statistical evidence that justifies the use of nonlinear analysis for fMRI has also been provided recently [16]. The feasibility of its application to fMRI data has rarely been shown previous to this work. What makes fMRI analysis challenging are two features peculiar to fMRI. First, fMRI data have intrinsic spatial and temporal autocorrelations [28]. Second, fMRI data tend to have clustered activations. Common approaches either assume spatial independence, or spatially smooth the data with a Gaussian kernel in a preprocessing step. Spatial smoothing enables effective detection of a certain size of clustered activation. However, smoothing may produce a biased estimate by displacing activation peaks and underestimating their height. In order to address this issue, spatial modeling has been proposed [7,13] to take the spatial activation pattern into consideration. Since more powerful tests can be obtained with temporal smoothing, due to the improved signal to noise ratio [9], a spatio-temporal linear regression method for fMRI activation detection [15] has recently been developed. This method uses the time series of neighboring voxels together with a voxel's own. It has the advantage of modeling the intrinsic spatio-temporal autocorrelations of fMRI data, which is one of the novelties of this work as well. The associated benefits compared to the corresponding voxelwise approaches have also been demonstrated. In general, techniques for analyzing fMRI data can be divided into model-driven methods, e.g., the standard general linear model (GLM) [8], and data-driven methods, e.g., principal component analysis (PCA) [1] or independent component analysis (ICA) [20]. In model-driven methods, a model of the expected response is generated and compared with the data. These methods require prior knowledge of event timing, from which an anticipated hemodynamic response can be modeled. Although accurate experimental paradigms are usually available, a thorough understanding of the hemodynamic changes that relate neuronal activity to the measured BOLD signal [23] is still under research. Also, for brain responses that are not directly locked to the paradigm, model-driven analysis may not be adequate. Data-driven methods, however, explore the fMRI data statistically without any assumption about the paradigm or the hemodynamic response function. This flexibility is desirable especially where it is difficult to generate a good model; however, there are drawbacks. For example, the assumption implicit in PCA is that the different modes are Gaussian and uncorrelated, whereas ICA assumes that the different modes are non-Gaussian and independent. In addition, a significance estimate for each component is usually not available. Given these advantages and disadvantages, we propose an approach that merges data-driven methods with prior time course modeling by adjusting a model coefficient. Despite the progress in fMRI analysis, there is still a need for robust and unified statistical analysis methods, due to the many limitations of existing techniques described above. In this paper, we develop a novel, general and reliable nonlinear approach for fMRI analysis based on spatio-temporal support vector regression, so that existing difficulties resulting from noise, low resolution, and inappropriate smoothing and modeling can be resolved.
2 Methods

2.1 Support Vector Machines (SVM) and Support Vector Regression (SVR)

The Support Vector Machine (SVM), introduced by Vapnik [27] and studied by others [26,5], is a new and powerful learning methodology that can deal with nonlinear classification (SVC) and regression (SVR). It is systematic and principled, and has recently become very popular in the machine learning community. The idea of SVR is based on the computation of a linear regression function in a high dimensional feature space where the input data are mapped via a nonlinear function. Here we sketch the ideas behind SVR; a more detailed description can be found in Smola [26]. Given M input sample points \vec{x}_1, \vec{x}_2, ..., \vec{x}_M, where \vec{x}_i \in \mathbb{R}^z, and M corresponding scalar output values y_1, y_2, ..., y_M, the aim is to find an approximation or regression function of the form

    y = f(\vec{x}) = \sum_{i=1}^{M} \alpha_i K(\vec{x}_i, \vec{x}) + b    (1)

to learn this input-output mapping from the set of training examples with high generalizability. The training process of SVR is to find an optimal set of Lagrange multipliers \alpha_i, \forall i \in [1, M], by maximizing the SVR objective function

    O = -\varepsilon \sum_{i=1}^{M} |\alpha_i| + \sum_{i=1}^{M} y_i \alpha_i - \frac{1}{2} \sum_{i=1}^{M} \sum_{j=1}^{M} \alpha_i \alpha_j K(\vec{x}_i, \vec{x}_j)    (2)

subject to:
1) the linear constraint:  \sum_{i=1}^{M} \alpha_i = 0,    (3)
2) the box constraint:  -C \le \alpha_i \le C, \forall i \in [1, M].    (4)

\alpha_i is the Lagrange multiplier associated with each training example \vec{x}_i. \varepsilon in Eq. (2) is the insensitivity value, meaning that training error below \varepsilon is not taken into account as error. C is the tradeoff constant between the smoothness of the SVR function and the total training error. K in Eqs. (1) and (2) is the kernel function. When the approximation function cannot be linearly regressed, the kernel function maps training examples from the input space to a high dimensional feature space \Im by \vec{x} \to \Phi(\vec{x}) \in \Im, in such a way that the function f between the output and the mapped input data points can be linearly regressed in the feature space. K describes the inner product in the feature space:

    K(\vec{x}_i, \vec{x}_j) = \Phi(\vec{x}_i) \cdot \Phi(\vec{x}_j)    (5)

There are different types of kernel functions. A commonly used kernel function is the Gaussian radial basis function (RBF):

    K(\vec{x}_i, \vec{x}_j) = \exp\left( -\frac{\|\vec{x}_i - \vec{x}_j\|^2}{2\sigma^2} \right)    (6)
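For concreteness, the Gram matrix of the RBF kernel in Eq. (6) can be computed in a few vectorized lines (a small helper sketch; the function name is ours):

    import numpy as np

    def rbf_gram(X, sigma):
        # Pairwise squared distances between the rows of X, then Eq. (6).
        sq = np.sum(X ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
        return np.exp(-d2 / (2.0 * sigma ** 2))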
Maximizing the SVR objective function in Eq. (2) by SVR training provides us with an optimal set of Lagrange multipliers \alpha_i, \forall i \in [1, M]. The coefficient b of the estimated SVR function in Eq. (1) can be computed by adjusting the bias so that the function passes through one of the given training examples with non-zero \alpha_i. With the nonlinear kernel mapping, the regression function in Eq. (1) can be interpreted as a linear combination of the input data in the feature space. Only those input elements with non-zero Lagrange multipliers contribute to the determination of the function. In fact, most of the \alpha_i's are zero. The training data with non-zero \alpha_i are called "support vectors". Support vectors form a sparse subset of the training data. This type of representation is especially useful for high dimensional input spaces.

2.2 fMRI Data Representation by 4-Dimensional (4D) Spatio-temporal SVR

SVR has recently been applied to system identification, nonlinear system prediction and face detection with good results [11,22,18]. Comparisons of SVR with several existing regression techniques, including polynomial approximation, radial basis functions, and neural networks, were carried out in [22]. Initial attempts that directly use SVM have also been made for modeling the hemodynamic response [3] and for comparing the patterns of fMRI activations elicited by different visual stimuli [10]. However, the application of SVR in the context of fMRI analysis has not yet been exploited and developed; this work is the first to introduce SVR into fMRI analysis. We formulate fMRI data as spatially windowed continuous 4D functions. That is, the fMRI data is divided into many small windows, such as a 3 × 3 × 3 region within which the entire time series is included. Each input (the training data) within a window is a 4D vector equal to the row, column, slice, and time indices of a voxel. The output is the corresponding intensity. We approximate and recover all training data within the respective windows using SVR. The detailed formulation follows. Let y(u, v, w, t) be the fMRI signal of voxel [u, v, w]^T at a given time point t, where u, v, and w are the respective row, column and slice coordinates of the data. If the 4D fMRI data size is S_u × S_v × S_w × S_t, where S_t is the total number of time points, the corresponding input vector \vec{x} is represented as

    \vec{x} = [u, v, w, t]^T,  u \in [1, S_u], v \in [1, S_v], w \in [1, S_w], t \in [1, S_t].    (7)
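A minimal sketch of this windowed representation, assuming an off-the-shelf ε-insensitive SVR (here scikit-learn's) with the parameter values reported later in Sect. 3; the function name, window handling, and data layout are our assumptions:

    import numpy as np
    from sklearn.svm import SVR

    def restore_window(fmri, u0, v0, w0, C=1200.0, eps=20.0, sigma=0.1):
        # Fit the 4D spatio-temporal SVR on one 3 x 3 x 3 window of a
        # 4D array indexed [u, v, w, t]; returns the restored window.
        St = fmri.shape[3]
        X, y = [], []
        for u in range(u0, u0 + 3):
            for v in range(v0, v0 + 3):
                for w in range(w0, w0 + 3):
                    for t in range(St):
                        X.append([u, v, w, t])      # input of Eq. (7)
                        y.append(fmri[u, v, w, t])  # output: intensity
        X = np.asarray(X, dtype=float)
        X = (X - X.mean(0)) / (X.std(0) + 1e-12)    # per-window normalization (cf. Sect. 2.4)
        # sklearn's RBF is exp(-gamma * ||x - x'||^2), so gamma = 1 / (2 sigma^2).
        model = SVR(kernel="rbf", C=C, epsilon=eps, gamma=1.0 / (2 * sigma ** 2))
        model.fit(X, y)
        return model.predict(X).reshape(3, 3, 3, St)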
Within each spatio-temporal window of size M_u × M_v × M_w × S_t = M, we have M input samples \vec{x}_1, ..., \vec{x}_M, where \vec{x}_i \in \mathbb{R}^4, and the respective scalar outputs y_1, ..., y_M. SVR is used to restore the training examples within the window. Local intrinsic spatio-temporal correlations are accounted for during the regression by controlling function smoothness and training error through the parameter C (Eq. (4)). In order to compensate for the spatial correlation between neighboring windows, we use spatially overlapped windows (in all three dimensions), so that the recovered intensities over the overlapped voxels are averaged from the corresponding windows.

2.3 Incorporation of Temporal Modeling into Spatio-temporal SVR

Without loss of generality, we assume a simple on-off boxcar function as our model variable, which contains p zeros or ones during each OFF or ON period, and c repetitions or cycles of these two periods. The total number of time points, S_t, should be equal to c × 2p. The resulting boxcar function, m(t), is:

    m(t) = 0,  if t = 1, ..., p;  2p+1, ..., 3p;  ...;  S_t - 2p + 1, ..., S_t - p
    m(t) = 1,  if t = p+1, ..., 2p;  3p+1, ..., 4p;  ...;  S_t - p + 1, ..., S_t    (8)

where t \in [1, S_t]. An additional model entry, based on m(t) in Eq. (8), is added to each input vector and makes our SVR a 5-dimensional (5D) regression problem:

    \vec{x} = [u, v, w, t, m(t)]^T \in \mathbb{R}^5    (9)

whereas the output is still the corresponding fMRI signal y(u, v, w, t). A simple analogy for this model incorporation is not available in traditional signal analysis, because fMRI analysis is quite different from conventional systems that are fully characterized by an impulse response. The human brain is a very complicated system: only certain regions of the brain may activate according to the hemodynamic response, while many other regions do not. Given the input and output, we need to find out which parts of the brain belong to the region involved in the hemodynamic response. An intuitive justification of our model-based formulation can be achieved by analogy with the General Linear Model (GLM) as it is typically used in fMRI [8]. The GLM is given by:
Fig. 1. General Linear Model regression and boxcar function fitting diagram: the voxel time series vector equals the design matrix times the parameters, plus the error vector; µ denotes the mean value.
    Y = X\beta + e    (10)
where Y is an fMRI data matrix; X is a "design matrix" specifying the time courses of all factors hypothesized to be present in the observed data (e.g., the task reference function, or a linear trend); \beta is a map of voxel values for each hypothesized factor; and e is a matrix of noise or residual modeling errors. Given this linear model and a design matrix X, the \beta maps can be found by least squares estimation. The simplest example of the design matrix consists of a boxcar reference function (as in Eq. (8)) and a column vector with all entries constant 1 representing the mean value, without any other hypothesized factors (Fig. 1). In this case, for each voxel, the time series vector is regressed through fitting the boxcar function and the mean value µ (Fig. 1). For this voxel at a given time t, the fitting vector is the corresponding row of the design matrix and can be represented as:
    [m(t), 1]^T    (11)
i.e., either [0, 1]^T or [1, 1]^T. We extend this idea to SVR. Support Vector Machines have very good learning and generalization abilities. As long as we construct the input vectors with the essential features we would like the machine to learn, SVR can capture the complicated relationships (nonlinear or linear) hidden in the training examples. Therefore, for fMRI data representation, in addition to using the indices of the coordinates and time point as input vectors, we also add extra model fitting entries to the input vectors. For the model fitting vector in Eq. (11), the 2nd entry is a constant 1, which is the same for all the input vectors and can be neglected in SVR learning. With the input vector in Eq. (9), we incorporate temporal modeling into the regression. Although the m(t) here is a simple boxcar function, it can be any reference function derived from prior knowledge about the event timing and hemodynamic response.

2.4 Multiresolution Effects with W-Scale

With the above formulation, in order to capture the underlying relationship using SVR for the windowed data and accommodate the differences in scale and training set size, the corresponding entries in the input vector are normalized over the training examples within each window. After normalization, we multiply all t_i by a coefficient W-scale, and all m(t_i) by a coefficient W-model, whose effects are explained below. We can adjust the effect of temporal scale by varying W-scale, the coefficient for the time indices. Varying W-scale is equivalent to examining the temporal data at different scales; a larger W-scale corresponds to finer temporal resolution. We can restore the time course at multiple resolutions and extract different frequency components by changing W-scale. Many voxel time series in fMRI exhibit low frequency trend components that may be due to aliased high frequency physiological components or drifts in scanner sensitivity. These trends can be removed in a variety of ways. In addition to simple high-pass filtering in the temporal domain, a running-lines smoother has been proposed [19], fitting a linear regression (by least squares) to the k nearest neighbors of a given point. The approach used by Skudlarski et al. [25] accounts for drift during calculation of the SPM. However, both methods only aim to handle linear trends. In our spatio-temporal nonlinear SVR, with an appropriate W-scale (usually relatively small), the low frequency noise can be extracted and removed, thus achieving nonlinear de-trending. The optimal W-scale for a specific frequency component is expected to be related to the total number of time points, the period of the stimulation, and the data noise level; its value is currently determined empirically. A more rigorous formulation of W-scale determination is one of our future directions, and might also be achieved in the frequency domain through spectrum analysis.

2.5 Merging Model-Driven with Data-Driven through W-Model

The coefficient associated with the model index, W-model, determines the degree of influence of the temporal model term and the degree to which the approach is model-driven. A higher W-model (W-model = 1) is used when reliable temporal models are available. Otherwise, a lower or zero W-model is used, and the approach becomes more data-driven. W-model can be interpreted as a model confidence or fitness measure, whose value can be empirically pre-determined as a constant or estimated from the regression residual analysis shown below, though extra computation is needed. To arrive at a measure of adequacy of the temporal model, we examine how much of the variation in the response variable is explained by the fitted regression data. We view an observed y_i as consisting of two components: observed y = explained by the regressed relation + residual. The differences (y_i - \hat{y}_i) = (observed response - predicted response), or residuals [14], would all be zero in the ideal situation where all the observed points lie exactly on the regression line, and the y values would be completely accounted for by the regression on \vec{x}. We consider the mean of the absolute values of the residuals to be an overall measure of the discrepancy or departure from regression, denoted as D:

    D = \frac{1}{M} \sum_{i=1}^{M} |y_i - \hat{y}_i|.

This discrepancy measure can be used to establish an appropriate value for W-model. Let the D values calculated from our model-driven (W-model = 1) and data-driven (W-model = 0) methods be D_model and D_data, respectively. We define D_diff = D_data - D_model. For each window, an improved W-model can be estimated according to:
    W-model = 0,                              if D_diff < D_low
    W-model = D_diff / (D_high - D_low),      if D_low \le D_diff < D_high    (12)
    W-model = 1,                              if D_diff \ge D_high

where D_high and D_low are high and low threshold values, respectively.
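In code, the window-wise rule of Eq. (12) is a clamped ramp; the sketch below (function and argument names are ours) computes the discrepancy measures and the resulting W-model for one window:

    import numpy as np

    def estimate_w_model(y, yhat_model, yhat_data, d_low, d_high):
        # D: mean absolute residual for the model-driven (W-model = 1)
        # and data-driven (W-model = 0) regressions of this window.
        d_model = np.mean(np.abs(y - yhat_model))
        d_data = np.mean(np.abs(y - yhat_data))
        d_diff = d_data - d_model
        if d_diff < d_low:
            return 0.0
        if d_diff >= d_high:
            return 1.0
        return d_diff / (d_high - d_low)  # middle branch of Eq. (12)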
3 Experiments and Results^1

We validate the approach by applying the conventional t-test [17] to our SVR-restored fMRI data for activation detection, without additional pre-smoothing or postprocessing. The Gaussian RBF kernel function (Eq. (6)) is used in our experiments with σ set empirically to 0.1. Other SVR parameters are also set empirically: C = 1200, ε = 20 (Eqs. (2), (4)).

^1 For the original color figures in this section (Figs. 2-6), please check the web site: http://noodle.med.yale.edu/~wang/Research/ipmi03_fig.html
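For reference, the conventional detector used for validation can be sketched as a voxelwise two-sample t-test between ON and OFF scans of the boxcar (a minimal sketch assuming SciPy; not necessarily the exact implementation of [17]):

    import numpy as np
    from scipy import stats

    def activation_tmap(data, m):
        # data: array [..., t]; m: binary boxcar over the time axis.
        on = data[..., m.astype(bool)]
        off = data[..., ~m.astype(bool)]
        t, _ = stats.ttest_ind(on, off, axis=-1)
        return t  # threshold this map to obtain detected activations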
3.1 Simulated Data

3.1.1 Data Generation
In order to test with access to ground truth, we generate a 2D (spatial size 52 × 63) time series of synthetic data that imitates a single fMRI brain slice in which four regions are activated. Three different amplitudes of activation are added to the gray matter (intensity 180) to simulate weak, medium and strong activations, as in real fMRI data. For simplicity, the activations are temporally in the form of a boxcar function, with 6 scans during each off or on period. Note that a more realistic and complicated reference function, formed by convolving this boxcar with a Gamma function [4], can also be used, but we expect that the performance would be similar. The total number of time points is 72 (6 cycles). The generated data in Fig. 2(a) is then used as ground truth for comparisons. Simulated noisy data are obtained by adding Gaussian noise, N(0, 30²), N(0, 40²), N(0, 50²), to the ground truth data (see Fig. 2(b) for N(0, 30²)).

3.1.2 Effects of W-Scale and W-Model
For this dataset, the SVR window size used is 3 × 3 × 3 × 72. Fig. 3 (left) demonstrates the effects of W-scale by showing the recovered time courses for an activated pixel of the simulated noisy data (Fig. 2(b)) without model fitting (W-model = 0, data-driven). As W-scale increases, higher frequency temporal components are extracted. When W-scale = 5 (Fig. 3(a)), the restored signal captures the low frequency component, which can be interpreted as a nonlinear trend. Fig. 3 (right) demonstrates the effects of varying W-model by showing the recovered time series for the same activated pixel when W-scale = 0, which corresponds to zero frequency (d.c. component). As W-model increases, the temporal models have stronger and stronger effects during the regression and data fitting. For non-activated pixels, the model term barely affects the data regression, as long as the gray-level variation at these locations does not happen to match the stimulus cycles. Note that since these are simulated data and no real physiological or neuronal activities are involved, the recovered time courses do not show any lag or undershoot; in fact, the recovered time course accurately restores the ground truth time course (Fig. 3(f)), i.e., the boxcar function.

Fig. 2. Simulated 2D+T data. Top row: time T vs. spatial axis X; bottom row: spatial axis Y vs. X. (a): ground truth data; (b): simulated noisy data, with noise level N(0, 30²); (c): restored data by our SVR (W-model = 1); (d): Gaussian smoothed data with s.t.d. = 0.5.
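Simulated data of this kind can be generated along the following lines; the activated region's location and amplitude below are placeholders, since the exact coordinates and the three amplitudes are not listed in the text:

    import numpy as np

    rng = np.random.default_rng(0)
    p, c = 6, 6                  # scans per OFF/ON period, cycles
    St = 2 * p * c               # 72 time points
    m = np.tile(np.r_[np.zeros(p), np.ones(p)], c)  # boxcar, Eq. (8)

    truth = np.full((52, 63, St), 180.0)            # gray-matter baseline
    truth[10:15, 10:15, :] += 20.0 * m              # one hypothetical activation
    # One of the three noise levels N(0, 30^2), N(0, 40^2), N(0, 50^2):
    noisy = truth + rng.normal(0.0, 30.0, truth.shape)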
3.1.3 Recovered Image and ROC Analysis for Activation Detection
The image recovered by our method (W-model = 1) (Fig. 2(c)) accurately restores the ground truth (Fig. 2(a)). The image obtained using Gaussian smoothing of the original noisy data is shown in Fig. 2(d) for comparison. Clearly, our SVR significantly improves the quality of the noisy data. We also applied the t-test for activation detection to (i) the non-smoothed original noisy data; (ii) pre-smoothed data; (iii) our SVR-recovered data with W-model = 1; and (iv) our SVR-recovered data with W-model determined by Eq. (12), for all three noise levels. Note that for case (iv), in Eq. (12), we empirically set D_low = 0 and D_high = 0.015 × min(D_model, D_data). The locations and intensities of detected activations can then be compared to the known pattern of added activations to measure the accuracy of detection. We use receiver operating characteristic (ROC) analysis for evaluation. The application of ROC analysis to fMRI processing techniques was introduced by Constable et al. [6] and has been used extensively as a tool for objective comparisons of various strategies [25]. The essence of ROC analysis is the comparison of true activation rates (the proportion of voxels correctly detected as significant among all voxels with added activations) obtained with different analysis techniques for a given false activation rate (the proportion of voxels incorrectly detected as significant among all voxels without added activations). A plot of true activation rate versus false activation rate for different threshold values of a rating scale is called an ROC curve. Under the assumption that the underlying data for true positive and true negative trials form a binormal distribution, the area under the ROC curve can be shown to be the probability that the corresponding analysis technique will correctly identify the true positives.

Fig. 3. Effects on the time course of varying W-scale and W-model in our SVR. (Simulated noise level: N(0, 30²).)

Fig. 4. ROC curves for simulated noisy 2D+T data (average effect of the three noise levels).
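A minimal sketch of the ROC computation just described, sweeping a detection threshold and integrating the curve with the trapezoidal rule (names are illustrative):

    import numpy as np

    def roc_points(scores, truth_mask, thresholds):
        tpr, fpr = [], []
        for th in thresholds:
            detected = scores > th
            tpr.append(np.mean(detected[truth_mask]))    # true activation rate
            fpr.append(np.mean(detected[~truth_mask]))   # false activation rate
        return np.array(fpr), np.array(tpr)

    def roc_area(fpr, tpr):
        order = np.argsort(fpr)
        return np.trapz(tpr[order], fpr[order])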
The visual comparison of the ROC curves in Fig. 4 (the average effect over the noisy data at three different noise levels, N(0, 30²), N(0, 40²), N(0, 50²)) indicates that our SVR approaches outperform the simple t-test, yielding larger areas under the ROC curves. Compared with using a fixed W-model = 1, determining W-model based on the model fitness / confidence measure (Eq. (12)) leads to further enhancement in activation detection. From the results of the t-test on the original noisy data and pre-smoothed data, we can see that pre-smoothing improves the detection accuracy.

3.2 Real fMRI Data

We also applied the proposed approach to a block-design cognitive fMRI experiment [24], examining social attribution to geometric animations. T2*-weighted images are acquired using a single shot echo planar sequence. The pulse sequence is TR = 1500 ms, TE = 60 ms, flip angle = 60°, NEX = 1, in-plane voxel size = 3.125 × 3.125 mm². 14 coronal slices are collected; these are 10 mm thick (skip 1 mm). Corresponding T1-weighted structural images of the same thickness are collected in the same session (TR = 500, TE = 14, FOV = 200 mm, 256 × 192 matrix, 2 NEX). The first four volumes of the fMRI time series are discarded to discount T1 saturation effects. We have examined this dataset for a visuospatial task from one subject and one run. The window size we used is 3 × 3 × 1 × 160, where 160 is the total number of time points. We did not use an isotropic window since the voxel shape is not cubic; the 3 × 3 × 1 window covers a brain region whose physical size is almost isotropic (9.4 × 9.4 × 10 mm³). Visual comparisons in Fig. 5 with t-test results (on pre-smoothed data with empirically optimal FWHM = 6.25 mm) reveal that our SVR approach (W-model = 1) leads to greater spatial extent and statistical significance in the intraparietal sulcus (IPS), with potentially better delineation and localization of the underlying spatial activation. Note: at the top of each slice in Fig. 5 are the respective p-value and t-value used for thresholding. When the same t-threshold used for SVR (t > 7.8) is used for the t-test, no activations are detected. For the t-test in Fig. 5(c), we intentionally decrease the t-threshold further, to t > 2.3, to try to detect more IPS activation regions; this, however, leads to a more blurred spatial extent rather than the precisely localized spatial activation of Fig. 5(a), as well as some false activations. The associated time course for an activation voxel from our SVR method for this data is shown in Fig. 6.
Fig. 5. Results comparison for real fMRI data from a visuospatial task (color activation maps): (a) by SVR; (b), (c) by t-test.
Fig. 6. Time courses of an activation voxel for the real fMRI data in Fig. 5.
4 Conclusions and Discussions

From a signal processing viewpoint, fMRI activation detection is a problem of nonlinear spatio-temporal system identification. We have presented a novel regression model involving spatio-temporal correlations using support vector regression. Many preprocessing procedures required by other methods, such as smoothing, de-trending, and interpolation after motion estimation, are embedded within this unified framework. Experimental results on both simulated and real fMRI data revealed its effectiveness. The approach meets the need for reliable and sensitive fMRI signal analysis. A few comments on the particulars of our method are discussed below. In this paper we primarily discussed the method using block-design paradigms. However, the approach can be applied to event-related experiments as well; the exploration of the framework on event-related fMRI data is one of our future directions. The size of the spatial window within which we perform SVR is empirically set to 3 × 3 × 3 or 3 × 3 × 1. This not only ensures that the included area covers a brain region whose physical size is almost isotropic, but also allows for some spatial continuity while limiting the likelihood of heterogeneous activation within the same window. Correction for head motion involves rigid-body transformation estimation and resampling. Due to the thick image slices typical of fMRI, intensity interpolation, required during the resampling process, can introduce significant artifacts [12]. With our SVR approach, interpolation after motion estimation is avoided due to the use of continuous variables for both the input vectors and the output scalar: the spatial coordinates and time indices used in SVR learning can be any continuous (floating point) value. This advantage is not available in other methods. The ability of SVR to handle high dimensional input data makes it ideally suited for extensions to multi-run, multi-subject and multi-task studies. Our SVR formulation allows easy incorporation of data from multiple sessions by expanding the input vectors and analyzing the data over multiple runs and subjects together. Similar to the spatial and temporal indices, we would then have two additional dimensions for run and subject indices. This technique would account for between-run and between-subject variability, with likely increased statistical significance in activated regions. The multi-task problem can also be solved in a similar way, either (i) using only one model function, as in Eq. (8), with 0 representing rest and 1 representing all the designated tasks; or (ii) expanding the input vectors with one model function per specific task (i.e., 1 for that task versus 0 for true rest and all other tasks). In addition to exploring the above-described issues, our future work also includes temporal model estimation from our SVR-restored data without assuming a specific shape of the hemodynamic response function, combining this hemodynamic modeling to improve the specificity and sensitivity of fMRI signal detection, comparison with the General Linear Model for activation detection, incorporating spatial models such as information about the configuration of the activation regions and anatomical prior knowledge into the framework, as well as further validation.

Acknowledgments. This work was partially supported by NIH grant EB000311. The authors would like to thank the anonymous reviewers for comments that considerably improved the quality of this paper.
References

1. Backfrieder, W., Baumgartner, R., Samal, M., Moser, E., Bergmann, H.: Quantification of intensity variations in functional MR images using rotated principal components. Physics in Medicine and Biology 41 (1996) 1425–1438
2. Birn, R.M., Saad, Z.S., Bandettini, P.A.: Spatial heterogeneity of the nonlinear dynamics in the fMRI BOLD response. Neuroimage 14 (2001) 817–826
3. Boulanouar, K., Roux, F., Celsis, P.: Modeling brain hemodynamic response in functional MRI using vector support method. In: Cognitive Neuroscience Society Annual Meeting (Abstract) (2001)
4. Boynton, G.M., Engel, S.A., Glover, G.H., Heeger, D.J.: Linear systems analysis of functional magnetic resonance imaging in human V1. J. Neuroscience 16 (1996) 4207–4221
5. Collobert, R., Bengio, S.: SVMTorch: Support vector machines for large-scale regression problems. Journal of Machine Learning Research 1 (2001) 143–160
6. Constable, R.T., Skudlarski, P., Gore, J.C.: An ROC approach for evaluating functional brain MR imaging and postprocessing protocols. Magnetic Resonance in Medicine 34 (1995) 57–64
7. Descombes, X., Kruggel, F., von Cramon, D.Y.: fMRI signal restoration using a spatiotemporal Markov random field preserving transitions. Neuroimage 8 (1998) 340–349
8. Friston, K.J., et al.: Statistical parametric maps in functional imaging: A general linear approach. Human Brain Mapping 2 (1995) 189–210
9. Friston, K.J., Holmes, A.P., Poline, J-B., Grasby, P.J., Williams, S.C.R., Frackowiak, R.S.J.: Analysis of fMRI time-series revisited. Neuroimage 2 (1995) 45–53
10. Golland, P., et al.: Discriminative analysis for image-based studies. In: Intl. Conf. on Medical Image Computing and Computer-Assisted Intervention (2002) 508–515
11. Gretton, A., Doucet, A., Herbrich, R., Rayner, P.J.W., Scholkopf, B.: Support vector regression for black-box system identification. In: IEEE Workshop on Statistical Signal Processing (2001) 341–344
12. Grootoonk, S., Hutton, C., Ashburner, J., Howseman, A.M., Josephs, O., Rees, G., Friston, K.J., Turner, R.: Characterization and correction of interpolation effects in the realignment of fMRI time series. Neuroimage 11 (2000) 49–57
13. Hartvig, N.V., Jensen, J.L.: Spatial mixture modeling of fMRI data. Human Brain Mapping 11 (2000) 233–248
14. Johnson, R.A., Bhattacharyya, G.K.: Statistics: Principles and Methods. John Wiley & Sons, Inc. (2001)
15. Katanoda, K., Matsuda, Y., Sugishita, M.: A spatio-temporal regression model for the analysis of functional MRI data. Neuroimage 17 (2002) 1415–1428
16. Laird, A.R., Rogers, B.P., Meyerand, M.E.: Investigating the nonlinearity of fMRI activation data. In: Proc. Second Joint EMBS/BMES Conference (2002) 23–26
17. Lang, N.: Statistical procedures for functional MRI. In: Moonen, C., Bandettini, P. (eds.): Functional MRI. Springer-Verlag (1999) 301–355
18. Li, Y., Gong, S., Liddell, H.: Support vector regression and classification based multiview face detection and recognition. In: Proc. Fourth IEEE Intl. Conf. on Automatic Face and Gesture Recognition (2000) 300–305
19. Marchini, J.L., Ripley, B.D.: A new statistical approach to detecting significant activation in functional MRI. Neuroimage 12 (2000) 366–380
20. McKeown, M., Makeig, S., Brown, G., Jung, T., Kindermann, S., Bell, A., Sejnowski, T.: Analysis of fMRI data by blind separation into independent spatial components. Human Brain Mapping 6 (1998) 160–188
21. Miller, K.L., Luh, W.M., Liu, T.T., Martinez, A., Obata, T., Wong, E.C., Frank, L.R., Buxton, R.B.: Nonlinear temporal dynamics of the cerebral blood flow response. Human Brain Mapping 13 (2001) 1–12
22. Mukherjee, S., Osuna, E., Girosi, F.: Nonlinear prediction of chaotic time series using support vector machines. In: Proc. IEEE Workshop on Neural Networks and Signal Processing Vol. VII (1997) 511–520
23. Ogawa, S., Lee, T.M., Nayak, A.S., Glynn, P.: Oxygenation-sensitive contrast in magnetic resonance image of rodent brain at high magnetic fields. Magnetic Resonance in Medicine 14 (1990) 68–78
24. Schultz, R.T., et al.: The role of the fusiform face area in social cognition: Implications for the pathobiology of autism. Phil. Trans. of the Royal Society, Series B 358 (2003) 415–427
25. Skudlarski, P., Constable, R.T., Gore, J.C.: ROC analysis of statistical methods used in functional MRI: Individual subjects. Neuroimage 9 (1999) 311–329
26. Smola, A.J., Scholkopf, B.: A Tutorial on Support Vector Regression. NeuroCOLT Technical Report NC-TR-98-030, Royal Holloway College, University of London, UK (1998)
27. Vapnik, V.N.: Statistical Learning Theory. John Wiley & Sons, New York (1998)
28. Zarahn, E., Aguirre, G.K., D'Esposito, M.: Empirical analyses of BOLD fMRI statistics. I. Spatially unsmoothed data collected under null-hypothesis conditions. Neuroimage 5 (1997) 179–197
A Constrained Variational Principle for Direct Estimation and Smoothing of the Diffusion Tensor Field from DWI

Z. Wang^1, B.C. Vemuri^1, Y. Chen^2, and T. Mareci^3

^1 Department of Computer & Information Sciences & Engr.
^2 Department of Mathematics
^3 Department of Biochemistry
University of Florida, Gainesville, FL 32611

Abstract. In this paper, we present a novel constrained variational principle for simultaneous smoothing and estimation of the diffusion tensor field from diffusion weighted imaging (DWI). The constrained variational principle involves the minimization of a regularization term in an Lp norm, subject to a nonlinear inequality constraint on the data. The data term we employ is the original Stejskal-Tanner equation instead of the linearized version usually employed in the literature. The original nonlinear form leads to a more accurate (when compared to the linearized form) estimated tensor field. The inequality constraint requires that the nonlinear least squares data term be bounded from above by a possibly known tolerance factor. Finally, in order to accommodate the positive definite constraint on the diffusion tensor, it is expressed in terms of Cholesky factors and estimated. The constrained variational principle is solved using the augmented Lagrangian technique in conjunction with the limited memory quasi-Newton method. Both synthetic and real data experiments are shown to depict the performance of the tensor field estimation algorithm. Fiber tracts in a rat brain are then mapped using a particle system based visualization technique.
1 Introduction
Diffusion is a process of movement of molecules as a result of random thermal agitation and, in our context, refers specifically to the random translational motion of water molecules in the part of the anatomy being imaged with MR. In three dimensions, water diffusivity can be described by a 3 × 3 matrix D called the diffusion tensor, which is intimately related to the geometry and organization of the microscopic environment. The general principle is that water diffuses preferentially along ordered tissue, e.g., the brain white matter. Diffusion tensor MRI is a relatively new MR image modality from which the anisotropy of water diffusion can be inferred quantitatively [2], thus providing a method to study tissue microstructure, e.g., white matter connectivity in the brain in vivo. The diffusion weighted echo intensity image S_l and the diffusion tensor D are related through the Stejskal-Tanner equation [2] as given by:

    S_l = S_0 \, e^{-\mathbf{b}_l : \mathbf{D}} = S_0 \, e^{-\sum_{i=1}^{3} \sum_{j=1}^{3} b_{l,ij} D_{ij}}    (1)

* This research was supported in part by the NIH grant NS42075.
where \mathbf{b}_l is the diffusion weighting of the l-th magnetic gradient and ":" denotes the generalized inner product for matrices. Taking the log of both sides of equation (1) yields the following transformed linear Stejskal-Tanner equation:

    \log(S_l) = \log(S_0) - \mathbf{b}_l : \mathbf{D} = \log(S_0) - \sum_{i=1}^{3} \sum_{j=1}^{3} b_{l,ij} D_{ij}    (2)
Given several (at least seven) non-collinear diffusion weighted intensity measurements, D can be estimated via multivariate regression models from either of the above two equations. Diffusion anisotropy can then be computed to show microstructural and physiological features of tissues [3]. Especially in highly organized nerve tissue, like white matter, the diffusion tensor provides a complete characterization of the restricted motion of water through the tissue that can be used to infer fiber tracts. The development of diffusion tensor acquisition, processing, and analysis methods provides the framework for creating fiber tract maps based on this complete diffusion tensor analysis [8,11]. For automatic fiber tract mapping, the diffusion tensor field must be smoothed without losing relevant features. Currently there are two popular approaches. One involves smoothing the raw data S_l while preserving relevant detail and then estimating the diffusion tensor D from the smoothed raw data (Parker et al. [16,23]). The raw data in this context consist of several diffusion weighted images acquired for varying magnetic field strengths and directions. Note that at least seven values at each 3D grid point in the data domain are required to estimate the six unknowns in the symmetric 2-tensor D and the one scale parameter S_0. The raw data smoothing or de-noising can be formulated using variational principles, which in turn require the solution of PDEs, or at times directly using PDEs that are not necessarily arrived at from variational principles (see [17,1,24,20,7] and others in [6]). Another approach to restoring the diffusion tensor field is to smooth the principal diffusion direction after the diffusion tensor has been estimated from the raw noisy measurements S_l. In Poupon et al. [19], an energy function based on a Markovian model was used to regularize the noisy dominant eigenvector field computed directly from the noisy estimates of D obtained from the measurements S_l using the linearized Stejskal-Tanner equation (2). Coulon et al. [9] proposed an iterative restoration scheme for the principal diffusion direction based on the direction map restoration work reported in [21]. Other sophisticated vector field restoration methods [22,12,18] can potentially be applied to the problem of restoring the dominant eigenvector fields computed from the noisy estimates of D. Recently, Chefd'Hotel et al. [5] presented an elegant geometric solution to the problem of smoothing a noisy D that was computed from S_l using the log-linearized model (2) described above. They assume that the given (computed) tensor field D from S_l is positive definite and develop a clever approach based on the differential geometry of manifolds to achieve constrained smoothing, where the smoothed tensor field is constrained to be positive semi-definite. Interesting results of mapped fibers are shown for human brain MRI. We propose a novel formulation of the diffusion tensor field estimation and smoothing as a constrained optimization problem. The specific approach we use is called the augmented Lagrangian technique, which allows one to deal with inequality constraints.
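For contrast with the single-step method proposed below, the standard two-step baseline, i.e., multivariate linear regression on the log-linearized Eq. (2), can be sketched for one voxel as follows (data layout and names are our assumptions):

    import numpy as np

    def fit_tensor_lls(S, b_mats):
        # S: (N,) positive diffusion-weighted intensities;
        # b_mats: (N, 3, 3) b-matrices.  Unknowns: log(S0) and the six
        # distinct entries of the symmetric tensor D.
        N = len(S)
        A = np.zeros((N, 7))
        A[:, 0] = 1.0
        idx = [(0, 0), (1, 1), (2, 2), (0, 1), (0, 2), (1, 2)]
        for k, (i, j) in enumerate(idx):
            w = 1.0 if i == j else 2.0   # off-diagonals appear twice in b:D
            A[:, 1 + k] = -w * b_mats[:, i, j]
        coef, *_ = np.linalg.lstsq(A, np.log(S), rcond=None)
        S0 = np.exp(coef[0])
        D = np.zeros((3, 3))
        for k, (i, j) in enumerate(idx):
            D[i, j] = D[j, i] = coef[1 + k]
        return S0, D  # note: D is not guaranteed positive definite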
The novelty of our formulation lies in the ability to directly, in a single-step process, estimate a smooth D from the noisy measurements S_l with the preservation of the positiveness constraint on D. The formulation does not require any ad hoc methods of setting parameter values to achieve the solution. These are the key features distinguishing our solution method from methods reported in the literature to date. In contrast to our solution (to be described subsequently in detail), most of the earlier approaches used a two-step method involving (i) computation of a D from S_l using a linear least-squares approach and then (ii) computation of a smoothed D via either smoothing of the eigenvalues and eigenvectors of D or the matrix flows approach in [5]. The problem with the two-step approach to computing D is that the D estimated in the first step using the log-linearized model need not be positive definite or even semi-definite. Moreover, it is hard to trust the fidelity of the eigenvalues and eigenvectors computed from such matrices, even if they are to be smoothed subsequently prior to mapping out the nerve fiber tracts. Also, the noise model used in the log-linearized scheme is not consistent with the physics. Briefly, our model seeks to minimize a cost function involving the sum of an Lp-norm-based gradient measure of the diffusion tensor D (whose positiveness is ensured via the Cholesky factorization D = LL^T) and an Lp-norm-based gradient measure of S_0, subject to a nonlinear data constraint based on the original (not linearized) Stejskal-Tanner equation (1). The model is posed as a constrained variational principle which can be minimized by either discretizing the variational principle itself or the associated Euler-Lagrange equation. We choose the former and use the augmented Lagrangian method together with the limited memory quasi-Newton method to achieve the solution. The rest of the paper is organized as follows: in section 2, the detailed variational formulation is described along with the nonlinear data constraints, the positive definite constraint and the augmented Lagrangian solution. Section 3 contains the detailed description of the discretization as well as the algorithmic description of the augmented Lagrangian framework. In section 4, we present experiments on the application of our model to synthetic as well as real data. Synthetic data experiments are conducted to present a comparison of tensor field restoration results with the recently presented work of Coulon et al. [9]. Moreover, results of a comparison between the use of the linearized Stejskal-Tanner model and the nonlinear form of the same are presented as well.
2 Constrained Variational Principle Formulation

Our solution to the recovery of a piecewise smooth diffusion tensor field from the measurements S_l is posed as a constrained variational principle. We seek to minimize a measure of lack of smoothness in the diffusion tensor D being estimated, using the Lp norm of its gradient. This measure is then constrained by a nonlinear data fidelity term related to the original Stejskal-Tanner equation (1). This nonlinear data term is constrained by an inequality which requires that it be bounded from above by a possibly known tolerance factor. The positiveness constraint on the diffusion tensor being estimated is achieved via the use of the Cholesky factorization theorem from computational linear algebra. The constrained variational principle is discretized and posed using the augmented Lagrangian technique [14]. The augmented Lagrangian is then solved using the limited memory quasi-Newton scheme. The novelty of our formulation lies in the unified framework for recovering and smoothing the tensor field from the data S_l. In addition, to our knowledge, this is the first formulation which allows for simultaneous estimation and smoothing of D, as well as one in which the regularization parameter is not set in an ad hoc way; the approach presented here describes a principled way to determine the regularization parameter. Let S_0(X) be the response intensity when no diffusion-encoding gradient is present, D(X) the unknown symmetric positive definite tensor, LL^T(X) the Cholesky factorization of the diffusion tensor with L a lower triangular matrix, and S_l, l = 1, ..., N the response intensity images measured after application of magnetic gradients of known strength and direction, where N is the total number of intensity images, each corresponding to a direction and strength of the applied magnetic gradient.

    \min E(S_0, L) = \int_\Omega \left( |\nabla S_0(X)|^p + |\nabla L(X)|^p \right) dX

subject to

    C(S_0, L) = \alpha \sigma^2 - \int_\Omega \sum_{l=1}^{N} \left( S_l - S_0 \, e^{-\mathbf{b}_l : LL^T} \right)^2 dX \ge 0    (3)
where Ω is the image domain, \mathbf{b}_l is the diffusion weighting of the l-th magnetic gradient, and ":" is the generalized inner product of matrices. The first and the second terms in the variational principle are Lp smoothness constraints on S_0 and L respectively, where p > 12/7 for S_0 and p ≥ 1 for L. Here |\nabla L|^p = \sum_{d \in \mathcal{D}} |\nabla L_d|^p, where d \in \mathcal{D} = {xx, yy, zz, xy, yz, xz} are indices to the six nonzero components of L. The lower bounds on the value of p are chosen so as to make the proof of existence of a subproblem solution for this minimization (see section 2.4) mathematically tractable. α is a constant scale factor and σ² is the noise variance in the measurements S_l.

2.1 The Nonlinear Data Constraint

The Stejskal-Tanner equation (1) shows the relation between the diffusion weighted echo intensity image S_l and the diffusion tensor D. However, multivariate linear regression based on equation (2) has been used to estimate the diffusion tensor D [2]. It was pointed out in [2] that these results agree with nonlinear regression based on the original Stejskal-Tanner equation (1). However, if the signal to noise ratio (SNR) is low and the number of intensity images S_l is not very large (unlike in [2], where N = 315 or N = 294), the result from multivariate linear regression will differ from the nonlinear regression significantly. A robust estimator belonging to the M-estimator family was used by Poupon et al. [19]; however, its performance is not discussed in detail. In Westin et al. [25], an analytical solution is derived from equation (2) by using a dual tensor basis; however, it should be noted that this can only be used for computing the tensor D when there is no noise in the measurements S_l or the SNR is extremely high. Our aim is to provide an accurate estimation of the diffusion tensor D for practical clinical use, where the SNR may not be high and the total number of intensity images N is restricted to a moderate number. The nonlinear data fidelity term based on the original Stejskal-Tanner equation (1) is fully justified for use in such situations. This nonlinear data term is part of an inequality constraint that imposes an upper bound on the closeness of the measurements S_l to the mathematical model S_0 e^{-\mathbf{b}_l : LL^T}. The bound ασ² may be estimated automatically from the measurements using any of the variance estimation methods from the literature [13].

2.2 The Lp Smoothness Constraint

In Blomgren et al. [4], it was shown that the Lp smoothness constraint doesn't admit discontinuous solutions, as the TV-norm does, when p > 1. However, when p is chosen close to 1, its behavior is close to the TV-norm for restoring edges. In our constrained model, we need p > 12/7 for regularizing S_0 and p ≥ 1 for L to ensure existence of the solution. Note that what is of importance here is the estimation of the diffusion tensor D, and therefore the edge-preserving property in the estimation process is more relevant for D than for S_0. Hence, we choose an appropriate p for S_0 and D as permitted by the theorem below. In our experiment, we choose p = 1.705 for S_0 and p = 1.00 for L.

2.3 The Positive Definite Constraint

In general, a matrix A \in \mathbb{R}^{n \times n} is said to be positive definite if x^T A x > 0 for all x \ne 0 in \mathbb{R}^n. The diffusion tensor D happens to be a positive definite matrix, but due to the noise in the data S_l, it is hard to recover a D that retains this property unless one includes it explicitly as a constraint. One way to impose this constraint is using the Cholesky factorization theorem, which states that if A is a symmetric positive definite matrix, then there exists a unique factorization A = LL^T, where L is a lower triangular matrix with positive diagonal elements. After performing the Cholesky factorization, we have transferred the inequality constraint on the matrix D to an inequality constraint on the diagonal elements of L. This is, however, still hard to satisfy theoretically because the set on which the minimization takes place is an open set. However, in practice, with finite precision arithmetic, testing for a positive definiteness constraint is equivalent to testing for positive semi-definiteness. To answer the question of positive semi-definiteness, a stable method would yield a positive response even for nearby symmetric matrices. This is because \tilde{D} = D + E with \|E\| \le \epsilon \|D\|, where \epsilon is a small multiple of the machine precision. Because, with an arbitrarily small perturbation, a semi-definite matrix can become definite, it follows that in finite precision arithmetic, testing for definiteness is equivalent to testing for semi-definiteness. Thus, we repose the positive definiteness constraint on the diffusion tensor matrix as x^T D x \ge 0, which is satisfied when D = LL^T.
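The Cholesky parameterization just described is a few lines of code; the packing order of the six free parameters below is an illustrative choice:

    import numpy as np

    def tensor_from_cholesky(l_params):
        # Rebuild D = L L^T from the six entries of the lower-triangular
        # factor, which guarantees positive semi-definiteness of D.
        lxx, lyy, lzz, lxy, lyz, lxz = l_params
        L = np.array([[lxx, 0.0, 0.0],
                      [lxy, lyy, 0.0],
                      [lxz, lyz, lzz]])
        return L @ L.T

    D = tensor_from_cholesky([1.0, 0.8, 0.6, 0.1, 0.05, 0.02])
    assert np.all(np.linalg.eigvalsh(D) >= 0)   # positive semi-definite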
2.4 Comments on Existence of the Solution
Consider the augmented Lagrangian formulation which serves as a subproblem of (3):

min_{(S0,L)∈A} L(S0, L; λ, µ) = E(S0, L) − λ C(S0, L) + (1/(2µ)) C²(S0, L)    (4)
where A = {(S0, L) | L ∈ BV(Ω), Ld ∈ L²(Ω), d ∈ D and S0 ∈ W^{1,p}(Ω), p > 12/7}. Here Ω ⊂ ℝ³, BV(Ω) denotes the space of bounded variation functions on the domain Ω, L²(Ω) is the space of square integrable functions on Ω, and W^{1,p}(Ω) denotes the Sobolev space of order p on Ω [10].

Theorem 1. Suppose Sl ∈ L⁴(Ω); then the augmented Lagrangian formulation (4) has a solution (S0, L) ∈ A.

Proof Outline: We can prove the following:
– Lower semi-continuity of the first term E(S0, L) in (4).
– Continuity of the second term C(S0, L) in (4) for S0 ∈ W^{1,p}(Ω) when p > 6/5.
– Continuity of the third term C²(S0, L) in (4) for S0 ∈ W^{1,p}(Ω) when p > 12/7.

Thus, if (S0⁽ⁿ⁾, L⁽ⁿ⁾) is a minimizing sequence, then it has a subsequence (S0⁽ⁿᵏ⁾, L⁽ⁿᵏ⁾) where L⁽ⁿᵏ⁾ converges weakly in BV(Ω) and S0⁽ⁿᵏ⁾ converges weakly in W^{1,p}(Ω) when p > 12/7. From the compact embedding theorem [10], L⁽ⁿᵏ⁾ converges strongly in L² a.e. (almost everywhere) on Ω. Similarly, S0⁽ⁿᵏ⁾ converges strongly in L⁴ a.e. on Ω. Thus the minimizing sequence has a convergent subsequence, whose limit is the solution of the minimization problem (4) [10].

Finding a solution of the constrained variational principle (3) involves solving a sequence of problems of the form (4), with λ and µ fixed at each stage. This is much more difficult than dealing with the problems of recovery and smoothing separately. However, there are benefits to posing the problem in this constrained, unified framework: one does not accumulate the errors of a two-stage process. Moreover, the framework incorporates the nonlinear data term, which is more appropriate for the low SNR values prevalent when b is high. Also, the noise model is correct for the nonlinear data model, unlike in the log-linearized case. Lastly, in the constrained formulation it is now possible to pose mathematical questions of existence and uniqueness of the solution, which was not possible in earlier formulations reported in the literature.
3 Numerical Methods
We discretize the constrained variational principle (3), transform it into a sequence of unconstrained problems using the augmented Lagrangian method, and then employ a limited-memory quasi-Newton technique [14] to solve them. Let
Rl,ijk = Sl,ijk − S0,ijk e^(−bl : Lijk Lijkᵀ)
|∇S0|ijk = √( (Δ⁺x S0)²ijk + (Δ⁺y S0)²ijk + (Δ⁺z S0)²ijk + ε )
|∇Ld|ijk = √( (Δ⁺x Ld)²ijk + (Δ⁺y Ld)²ijk + (Δ⁺z Ld)²ijk + ε )
|∇L|ᵖijk = Σ_{d∈D} |∇Ld|ᵖijk    (5)
where Δ⁺x, Δ⁺y and Δ⁺z are forward difference operators and ε is a small positive number used to avoid singularities of the Lp norm when p < 2. The discretized constrained variational principle can now be written as:

min_{S0,L} E(S0, L) = Σ_{i,j,k} ( |∇S0|ᵖijk + |∇L|ᵖijk )
subject to C(S0, L) = ασ² − Σ_{i,j,k} Σ_{l=1}^{N} R²l,ijk ≥ 0    (6)
The above problem is now posed using the augmented Lagrangian method, in which a sequence of related unconstrained subproblems is solved; the limit of their solutions is the solution to (6). Following the description in [14], the k-th subproblem of (6) is given by:

min L(S0, L, s; λk, µk) = E(S0, L) − λk (C(S0, L) − s) + (1/(2µk)) (C(S0, L) − s)²    (7)
where s ≥ 0 is a slack variable, and µk and λk are respectively the barrier parameter and the Lagrange multiplier for the k-th subproblem. One can explicitly compute the slack variable s at the minimum as s = max(C(S0, L) − µk λk, 0) and substitute it into (7) to obtain an equivalent subproblem in (S0, L):
min LA(S0, L; λk, µk) =
  E(S0, L) − λk C(S0, L) + (1/(2µk)) C²(S0, L)    if C(S0, L) − µk λk ≤ 0
  E(S0, L) − (µk/2) λk²                            otherwise    (8)
The following algorithm summarizes the procedure for finding the solution of (6):

  Initialize S0(0), L(0) using nonlinear regression; choose initial µ0 and λ0.
  for k = 1, 2, ...
    Find an approximate minimizer S0(k), L(k) of LA(·, ·; λk, µk), starting from S0(k − 1), L(k − 1);
    If the final convergence test is satisfied, STOP with approximate solution S0(k), L(k);
    Update the Lagrange multiplier using λ_{k+1} = max(λk − C(S0, L)/µk, 0);
    Choose a new penalty parameter µ_{k+1} = µk/2;
    Set the starting point for the next iteration to S0(k), L(k);
  endfor

Due to the large number of unknown variables in the minimization, we solve each subproblem using a limited-memory quasi-Newton technique. Quasi-Newton methods approximate the Hessian matrix at each iteration of the optimization using only first derivative information. In limited-memory Broyden–Fletcher–Goldfarb–Shanno (BFGS), the search direction is computed without storing the approximated Hessian matrix. Details can be found in Nocedal et al. [14].
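A compact sketch of this outer loop (our own reconstruction, not the authors' implementation; the energy E and constraint C are assumed to be supplied as callables on a flattened parameter vector, and the inner minimization is delegated to SciPy's limited-memory BFGS):

```python
import numpy as np
from scipy.optimize import minimize

def augmented_lagrangian(E, C, x0, mu0=1.0, lam0=1.0, n_outer=20, tol=1e-6):
    """Outer augmented-Lagrangian loop for: min E(x) s.t. C(x) >= 0.

    E, C: callables on a flat vector x (here x stacks S0 and L).
    Uses the slack-eliminated form (8) and the multiplier/penalty
    updates from the algorithm above.
    """
    x, mu, lam = np.asarray(x0, float), mu0, lam0
    for _ in range(n_outer):
        def LA(x):
            c = C(x)
            if c - mu * lam <= 0:
                return E(x) - lam * c + c * c / (2.0 * mu)
            return E(x) - 0.5 * mu * lam ** 2
        res = minimize(LA, x, method="L-BFGS-B")  # limited-memory quasi-Newton
        x_new = res.x
        if np.linalg.norm(x_new - x) < tol:       # simple convergence test
            return x_new
        x = x_new
        lam = max(lam - C(x) / mu, 0.0)           # multiplier update
        mu = mu / 2.0                             # tighten penalty
    return x
```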
4 Experimental Results
In this section, we present two sets of experiments applying our smoothing tensor estimation model: one on synthetic data sets, and the other on a real data set consisting of DWIs acquired from a normal rat brain.

We synthesized an anisotropic tensor field on a 3D lattice of size 32×32×8. The volume consists of two homogeneous regions with the following values for S0 and D (the tensor D is written as [dxx, dyy, dzz, dxy, dxz, dyz]):

Region 1: S0 = 10.00, D = 0.001 × [0.9697, 1.7513, 0.8423, 0.0, 0.0, 0.0]
Region 2: S0 = 8.33, D = 0.001 × [1.5559, 1.1651, 0.8423, 0.3384, 0.0, 0.0]

The dominant eigenvector of the first region is along the y axis, while that of the second region lies in the xy plane, inclined at 60 degrees to the y axis. The diffusion weighted images Sl are generated using the Stejskal-Tanner equation at each voxel X:

Sl(X) = S0(X) e^(−bl : D(X)) + n(X),  n(X) ∼ N(0, σN)    (9)
where N(0, σN) is zero-mean Gaussian noise with standard deviation σN. We choose the 7 commonly used gradient configurations ([25]) and 3 different field strengths in each direction for the bl values. Figure 1 shows the results for a synthetic data set with σN = 1.5. We display the dominant eigenvectors computed from the original and the restored diffusion tensor field using the following methods: (i) Linear – linear regression from (2) as in [2]; (ii) Nonlinear – nonlinear regression from (1); (iii) Linear + EVS (eigenvector smoothing) – linear regression followed by the dominant eigenvector smoothing method described in Coulon et al. [9]; (iv) Nonlinear + EVS – nonlinear regression plus the smoothing as in (iii); and (v) our method. It is evident from this figure that our new model yields very good estimates of the dominant eigenvector field. The method of Coulon et al., however, does not work well at voxels where the estimated dominant eigenvectors are almost orthogonal to those in their neighborhoods: it treats them as discontinuities and does not smooth them. Though it is possible to treat these locations as outliers in Coulon et al.'s method, it is difficult to set a reasonable criterion. Additional quantitative measures (described below) also show the superior performance of our model.

To quantitatively assess the proposed model, we compare the accuracy of the dominant eigenvector computed from the previously mentioned methods. Let θ be the angle (in degrees) between the dominant eigenvector of the estimated diffusion tensor field and that of the original tensor field. Table 1 shows the mean (µθ) and standard deviation (σθ) of θ for the different methods on the synthetic data with different levels of additive Gaussian noise; a better method yields smaller values. From this table, we can see that our model yields lower values than all other methods at all noise levels. It is also clear that methods using the original nonlinear Stejskal-Tanner equation (1) are more accurate than those using the linearized one (2). The advantage of our method and of the nonlinear approaches is more apparent when the noise level is higher, which supports the discussion in Section 2.1.
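A minimal sketch of this synthetic-data generation, under the assumption that bl : D reduces to b·gᵀDg for a unit gradient direction g with b-value b (the gradient set and b-value below are illustrative, not the exact 7-direction scheme of [25]):

```python
import numpy as np

rng = np.random.default_rng(0)

def synth_dwi(S0, D, bvals, gdirs, sigma):
    """Generate noisy DWIs S_l = S0 exp(-b g^T D g) + n per equation (9)."""
    signals = []
    for b, g in zip(bvals, gdirs):
        s = S0 * np.exp(-b * g @ D @ g)
        signals.append(s + rng.normal(0.0, sigma))
    return np.array(signals)

# Region 1 tensor (units suppressed for illustration)
D1 = 0.001 * np.diag([0.9697, 1.7513, 0.8423])
gdirs = [np.array(g, float) / np.linalg.norm(g)
         for g in [(1, 0, 0), (0, 1, 0), (0, 0, 1),
                   (1, 1, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]]
bvals = [1000.0] * len(gdirs)
S = synth_dwi(10.0, D1, bvals, gdirs, sigma=1.5)
```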
Fig. 1. A slice of the results for the synthetic data with σN = 1.5. Top left: the dominant eigenvector computed from the original tensor field. The other images, arranged left to right and top to bottom, are the dominant eigenvectors computed from the tensor field estimated by: linear regression, nonlinear regression, linear + EVS, nonlinear + EVS, and our model.

Table 1. Comparison of the accuracy of the estimated dominant eigenvectors (in degrees) for different methods and noise levels.

σN = 0.5   Linear  Nonlinear  Linear+EVS  Nonlinear+EVS  Our method
µθ         10.00   8.22       1.76        1.46           0.76
σθ          7.29   5.90       2.38        1.44           1.17

σN = 1.0   Linear  Nonlinear  Linear+EVS  Nonlinear+EVS  Our method
µθ         22.69   18.85      6.87        4.75           2.19
σθ         17.77   15.15      14.59       10.64          2.52

σN = 1.5   Linear  Nonlinear  Linear+EVS  Nonlinear+EVS  Our method
µθ         34.10   30.26      16.29       12.50          6.47
σθ         23.11   22.22      24.37       22.19          9.58
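The angular error statistic of Table 1 can be computed as in the following sketch (our own illustration; the absolute value of the dot product handles the sign ambiguity of eigenvectors):

```python
import numpy as np

def dominant_eigvec(D):
    """Unit eigenvector of the largest eigenvalue of a symmetric 3x3 tensor."""
    w, V = np.linalg.eigh(D)
    return V[:, np.argmax(w)]

def angle_stats(true_tensors, est_tensors):
    """Mean and std of the angle (degrees) between dominant eigenvectors."""
    angles = []
    for Dt, De in zip(true_tensors, est_tensors):
        c = abs(dominant_eigvec(Dt) @ dominant_eigvec(De))  # sign-invariant
        angles.append(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))
    return np.mean(angles), np.std(angles)
```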
The normal rat brain data used here comprise 21 diffusion weighted images measured using the same configuration of bl as in the previous example; each image is a 128×128×78 volume. We extract 10 slices in the region of interest, namely the corpus callosum, for our experiment.
Fig. 2. (a) Results for the normal rat brain estimated using multivariate linear regression without smoothing. (b) Results for the normal rat brain estimated using our proposed method. Both (a) and (b) are arranged as follows. First row, left to right: Dxx, Dxy and Dxz. Second row, left to right: S0, Dyy and Dyz. Third row, left to right: FA, ⟨D⟩ and Dzz.
Figure 2(b) depicts images of the six independent components of the estimated diffusion tensor, the computed FA, the trace(D) and S0 (echo intensity without applied gradient) obtained using our proposed model. For comparison, Figure 2(a) shows the same images computed from the raw data using linear least squares fitting based on the linearized Stejskal-Tanner equation. For display purposes, we use the same brightness and contrast enhancement for corresponding images in the two figures. The effectiveness of edge preservation in our method is clearly evident in the off-diagonal components of D. In addition, fiber tracts were estimated as integral curves of the dominant eigenvector field of the estimated D and visualized using the particle systems technique (Pang et al. [15]). As shown in Figure 3, the mapped fiber tracts follow the expected tracts quite well from a neuroanatomical perspective.

In the results presented above, we have demonstrated a proof of concept for the proposed simultaneous recovery and smoothing of D on synthetic data and on a normal rat brain. The quality of the results for the normal rat brain is reasonably satisfactory for visual inspection, but intensive quantitative validation of the mapped fibers remains to be performed and will be the focus of our future efforts.
5 Conclusions
We presented a novel constrained variational principle formulation for simultaneous smoothing and estimation of the positive definite diffusion tensor field from diffusion
Fig. 3. Rat brain fiber tracts in and around the corpus callosum visualized using particle systems superimposed on the S0 image. The particles are shown in bright orange. Left: Intermediate frame of an animation sequence depicting fiber growth; Right: the last frame of the same sequence from a different viewpoint.
weighted images (DWI). To our knowledge, this is the first attempt at simultaneous smoothing and estimation of the positive definite diffusion tensor field from the raw data. We used the Cholesky decomposition to incorporate the positive definiteness constraint on the diffusion tensor to be estimated. The constrained variational principle is transformed into a sequence of unconstrained problems using the augmented Lagrangian technique and solved numerically. A proof of the existence of a solution for each problem in the sequence of unconstrained problems is outlined. Results comparing our method with a competing scheme [9] are shown for synthetic data under different situations involving the use of linearized and nonlinear data acquisition models, depicting the influence of the choice of data acquisition model on the estimation. We conclude that the nonlinear data model yields better accuracy than the log-linearized model. The superiority of our method in estimating the tensor field over the chosen competing method was also demonstrated in the synthetic data experiment. Finally, fiber tract maps of a normal rat brain were depicted using a particle system based visualization scheme. The estimated diffusion tensors are quite smooth without losing essential features when inspected visually. However, quantitative validation of the estimated fibers is essential and will be the focus of our future efforts.

Acknowledgments. The authors would like to thank Timothy McGraw for generating the fiber maps and Evren Ozarslan for discussions on the physics of imaging.
References
1. L. Alvarez, P.L. Lions, and J.M. Morel, "Image selective smoothing and edge detection by nonlinear diffusion. II," SIAM J. Numer. Anal., vol. 29, no. 3, pp. 845–866, June 1992.
2. P.J. Basser, J. Mattiello and D. Le Bihan, "Estimation of the Effective Self-Diffusion Tensor from the NMR Spin Echo," J. Magn. Reson., series B 103, pp. 247–254, 1994.
3. P.J. Basser and C. Pierpaoli, "Microstructural and Physiological Features of Tissue Elucidated by Quantitative-Diffusion-Tensor MRI," J. Magn. Reson., series B 111, pp. 209–219, 1996.
4. P. Blomgren, T.F. Chan and P. Mulet, "Extensions to Total Variation Denoising," Tech. Rep. 97-42, UCLA, September 1997.
5. C. Chefd'hotel, D. Tschumperlé, R. Deriche and O.D. Faugeras, "Constrained Flows of Matrix-Valued Functions: Application to Diffusion Tensor Regularization," ECCV, Vol. 1, pp. 251–265, 2002.
6. V. Caselles, J.M. Morel, G. Sapiro, and A. Tannenbaum, IEEE TIP, special issue on PDEs and geometry-driven diffusion in image processing and analysis, Vol. 7, No. 3, 1998.
7. T.F. Chan, G. Golub, and P. Mulet, "A nonlinear primal-dual method for TV-based image restoration," in Proc. 12th Int. Conf. Analysis and Optimization of Systems: Images, Wavelets, and PDEs, Paris, France, June 26–28, 1996, M. Berger et al., Eds., no. 219, pp. 241–252.
8. T.E. Conturo, N.F. Lori, T.S. Cull, E. Akbudak, A.Z. Snyder, J.S. Shimony, R.C. McKinstry, H. Burton, and M.E. Raichle, "Tracking neuronal fiber pathways in the living human brain," Proc. Natl. Acad. Sci. USA 96, 10422–10427, 1999.
9. O. Coulon, D.C. Alexander and S.R. Arridge, "A Regularization Scheme for Diffusion Tensor Magnetic Resonance Images," IPMI 2001, pp. 92–105, Springer-Verlag.
10. L.C. Evans, "Partial Differential Equations," Graduate Studies in Mathematics, American Mathematical Society, 1997.
11. D.K. Jones, A. Simmons, S.C.R. Williams, and M.A. Horsfield, "Non-invasive assessment of axonal fiber connectivity in the human brain via diffusion tensor MRI," Magn. Reson. Med., 42, 37–41, 1999.
12. R. Kimmel, R. Malladi and N.A. Sochen, "Images as Embedded Maps and Minimal Surfaces: Movies, Color, Texture, and Volumetric Medical Images," IJCV, 39(2), pp. 111–129, 2000.
13. B.W. Lindgren, Statistical Theory, Chapman & Hall/CRC, 1993.
14. J. Nocedal and S.J. Wright, Numerical Optimization, Springer, 2000.
15. A. Pang and K. Smith, "Spray Rendering: Visualization Using Smart Particles," IEEE Visualization 1993 Conference Proceedings, 1993, pp. 283–290.
16. G.J.M. Parker, J.A. Schnabel, M.R. Symms, D.J. Werring, and G.J. Baker, "Nonlinear smoothing for reduction of systematic and random errors in diffusion tensor imaging," Magn. Reson. Imag. 11, 702–710, 2000.
17. P. Perona and J. Malik, "Scale-space and edge detection using anisotropic diffusion," IEEE TPAMI, vol. 12, no. 7, pp. 629–639, 1990.
18. P. Perona, "Orientation diffusions," IEEE TIP, vol. 7, no. 3, pp. 457–467, 1998.
19. C. Poupon, J.F. Mangin, C.A. Clark, V. Frouin, J. Regis, D. Le Bihan and I. Block, "Towards inference of human brain connectivity from MR diffusion tensor data," Med. Image Anal., vol. 5, pp. 1–15, 2001.
20. L.I. Rudin, S. Osher, and E. Fatemi, "Nonlinear variation based noise removal algorithms," Physica D, vol. 60, pp. 259–268, 1992.
21. B. Tang, G. Sapiro and V. Caselles, "Diffusion of General Data on Non-Flat Manifolds via Harmonic Maps Theory: The Direction Diffusion Case," IJCV, 36(2), pp. 149–161, 2000.
22. D. Tschumperlé and R. Deriche, "Regularization of orthonormal vector sets using coupled PDE's," Proceedings of the IEEE Workshop on Variational and Level Set Methods in Computer Vision, pp. 3–10, July 2001.
23. B.C. Vemuri, Y. Chen, M. Rao, T. McGraw, Z. Wang and T. Mareci, "Fiber Tract Mapping from Diffusion Tensor MRI," Proceedings of the IEEE Workshop on Variational and Level Set Methods in Computer Vision, pp. 81–88, July 2001.
24. J. Weickert, "A review of nonlinear diffusion filtering," Scale-Space Theory in Computer Vision, vol. 1252 of Lecture Notes in Computer Science, pp. 3–28, Springer-Verlag, 1997.
25. C.F. Westin, S.E. Maier, H. Mamata, A. Nabavi, F.A. Jolesz and R. Kikinis, "Processing and visualization for diffusion tensor MRI," Med. Image Anal., vol. 6, pp. 93–108, 2002.
Persistent Angular Structure: New Insights from Diffusion MRI Data

Kalvis M. Jansons¹ and Daniel C. Alexander²

¹ Department of Mathematics, University College London, Gower Street, London, WC1E 6BT, UK. [email protected]
² Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK. [email protected]
Abstract. We determine a statistic called the radially Persistent Angular Structure (PAS) from samples of the Fourier transform of a threedimensional function. The method has applications in diffusion magnetic resonance imaging (MRI), which samples the Fourier transform of the probability density function of particle displacements. The persistent angular structure is then a representation of the relative mobility of particles in each direction. In combination, PAS-MRI computes the persistent angular structure at each voxel of an image. This technique has biomedical applications, where it reveals the orientations of microstructural fibres, such as white-matter fibres in the brain. We test PAS-MRI on synthetic and human brain data. The data come from a standard acquisition scheme for diffusion-tensor MRI in which the samples in each voxel lie on a sphere in Fourier space.
1 Introduction
Diffusion magnetic resonance imaging (MRI) measures the displacements of particles that are subject to Brownian motion within a sample of material. The microstructure of the material determines the mobility of these particles, which is normally directionally dependent. The anisotropy of particle displacements gives information about the anisotropy, on a microscopic scale, of the material in which the particles move. The probability density function of particle displacements p reflects the anisotropy of particle displacements. Diffusion MRI and, in particular, diffusion tensor magnetic resonance imaging (DT-MRI) [1] is popular in biomedical imaging because of the insight it provides into the microstructure of biological tissue [2,3]. In biomedical applications, the particles are usually water molecules. Water is a major constituent of many types of living tissue and water molecules in tissue are subject to Brownian motion, i.e. the random motion driven by thermal fluctuations. The microstructure of the tissue determines the mobility of water molecules within the tissue. Our primary interest is in using diffusion MRI to probe the microstructure of brain tissue. However, the new approach we introduce here could be used much more widely. White matter in the brain contains bundles of parallel axon fibres. At the diffusion lengthscale applicable in MRI, the average displacement of water molecules along the axis of the fibres is larger than in other directions. The
function p has ridges in the fibre directions in a material consisting solely of parallel fibres on a microscopic scale. Diffusion-tensor MRI, which we discuss further in section 2, is a standard technique in diffusion MRI. In DT-MRI, we assume a Gaussian profile for p. However, this simple model is often a poor approximation in material with complex microstructure, for example, in areas of the brain where white-matter fibres cross. Here we introduce a technique called PAS-MRI. In PAS-MRI, we determine a feature of p, which we call the persistent angular structure (PAS). The persistent angular structure represents the relative mobility of particles in each direction, which can have many peaks. In brain imaging, these peaks can be used to determine the orientation of white-matter fibres. The new technique can resolve the directions of crossing fibres, which DT-MRI cannot. We demonstrate this in simulation and show results from white-matter fibre-crossings in the human brain. The human brain data was acquired originally for DT-MRI using a clinical acquisition sequence. In section 2, we provide some background on diffusion MRI and review current techniques. We use an algorithm based on the maximum entropy method to define the persistent angular structure of p in section 3. In section 4, we test PAS-MRI on synthetic data and then show results from the human brain and we conclude in section 5. We cover this work in greater detail in [4].
2 Background
In diffusion MRI, measurements are commonly made using the Stejskal–Tanner pulsed gradient spin echo (PGSE) method [5], which samples the Fourier transform of p. Two magnetic field gradient pulses of duration δ and separation ∆ are introduced into the simple spin-echo sequence. Assuming rectangular pulse profiles, the associated wavenumber is q = γδg, where g is the component of the gradient in the direction of the fixed field B0 and γ is the gyromagnetic ratio. The MRI measurement A′(q; X, ∆) is defined at each location X of a finite, regular image grid in three-dimensional space and depends on the wavenumber, q, and the pulse separation, ∆. In PAS-MRI, we treat each voxel separately and use a fixed ∆. We therefore drop the dependence of A′ on ∆ and X from the notation. The normalized MRI measurement A(q) is given by

A(q) = (A′(0))⁻¹ A′(q).    (1)

When ∆⁻¹δ is negligible [6],

A(q) = ∫_{ℝ³} p(x) exp(iq · x) dx,    (2)

where x is the particle displacement and the domain of p is ℝ³.
When ∆⁻¹δ is not negligible, which is common in practice, the main effect is to convolve the Fourier transform with a simple compact function [7]. However, this effect and other similar effects due to non-zero ∆⁻¹δ do not change the principal directions extracted in PAS-MRI, but merely broaden slightly the peaks seen in the reconstructed angular structure.

One way to proceed is to measure A at each location on a regular grid of wavenumbers, q, and then to use a fast Fourier transform (FFT) algorithm to obtain values of p on a discrete set of displacements, x. This approach is called "q-space imaging" and is used mostly to probe p in one dimension, as in [6]. Wedeen et al [8] succeed in acquiring data from healthy volunteers for a direct FFT inversion in three dimensions. They use 500 measurements of A(q) in each scan.

We shall refer to the motion of particles by diffusion alone in a medium for which the diffusion tensor is constant and homogeneous as simple diffusion. In simple diffusion, the probability density function of particle displacements has a Gaussian profile [9], so that p(x) = G(x; D, t), where

G(x; D, t) = ((4πt)³ det(D))^(−1/2) exp( −xᵀD⁻¹x / (4t) ),    (3)

D is the diffusion tensor and t is the diffusion time. In DT-MRI, the apparent diffusion tensor is determined on the assumption of simple diffusion. With the simple diffusion model, (2) gives

A(q) = exp(−t qᵀDq).    (4)

Taking the logarithm of (4), we see that each A(q) provides a linear constraint on the elements of D. To fit the six free parameters of D, we need a minimum of six independent measurements of A(q). Because of the effects of noise, acquiring more than six A(q) provides a better approximation of D. A common approach [10] is to acquire M measurements of A′(0) together with measurements A′(qj), 1 ≤ j ≤ N, for which δ, ∆, and |q| are fixed, but each q̂j is unique. Then A(qj) = (Ā′(0))⁻¹ A′(qj), where Ā′(0) is the geometric mean of the A′(0) measurements. In the literature, 1 ≤ M ≤ 20 and N < 100. Thus DT-MRI requires fewer measurements of A′(q) than the three-dimensional q-space technique of Wedeen et al [8], and smaller voxel sizes can be used. The q̂j are chosen so that they are well separated on the sphere and are not directionally biased. Polyhedral configurations are a popular choice, in which the q̂j are the directions of the vertices of regular polyhedra from the centre of the polyhedron [11]. Another option is to use q̂j that minimize the electrostatic energy with equal charges, often with the additional constraint that the points are in equal and opposite pairs. Hasan et al [11] compare several different approaches to choosing the q̂j.

A drawback of DT-MRI is that the simple diffusion model is poor in many regions, in particular those in which fibres cross within a voxel. Observations of departures from simple diffusion have been noted in the literature [12,13,14].
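As a concrete illustration of the log-linear fit described above (our own sketch, with a standard design-matrix layout that the paper does not spell out, and |q| taken as 1 for simplicity):

```python
import numpy as np

def fit_diffusion_tensor(qhats, A, t):
    """Fit D from ln A(q_j) = -t q_j^T D q_j, per equation (4).

    qhats: (N, 3) unit gradient directions; A: (N,) normalized measurements.
    Returns the symmetric 3x3 tensor D.
    """
    # Each row encodes the 6 unique elements of D acting in q^T D q.
    B = np.array([[q[0]**2, q[1]**2, q[2]**2,
                   2*q[0]*q[1], 2*q[0]*q[2], 2*q[1]*q[2]] for q in qhats])
    y = -np.log(A) / t
    d, *_ = np.linalg.lstsq(B, y, rcond=None)
    return np.array([[d[0], d[3], d[4]],
                     [d[3], d[1], d[5]],
                     [d[4], d[5], d[2]]])
```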
One alternative to the simple diffusion model has p as a mixture of Gaussian densities [12,15]:

p(x) = Mn(x; D1, · · · , Dn; a1, · · · , an),    (5)

where

Mn(x; D1, · · · , Dn; a1, · · · , an) = Σ_{i=1}^{n} ai G(x; Di, t),    (6)

each ai ∈ [0, 1] and Σ_i ai = 1.

3 Method
We use a method based on the principle of maximum entropy [16] to obtain an expression for the function that contains the least information subject to the constraints from the data. We use −S[p], where S is the entropy [16], for the information content of a probability density function p defined over a set Ω. Thus the mathematical expression for the information content of p is

I[p] = ∫_Ω p(x) ln p(x) dx.    (7)
Since we have only Fourier components of p for a small set of points, which may lie on a sphere in Fourier space, we are unable to extract any useful information about the radial structure of p, and, in any case, the greatest interest is in the angular structure. To extract useful information about the angular structure in a computationally efficient way, we shall restrict attention to determining a probability density function of the form:

p(x) = p̂(x̂) r⁻² δ(|x| − r),    (8)

where δ is the standard one-dimensional δ distribution, r is a constant and x̂ is a unit vector in the direction of x. In a sense, we are projecting the angular structure from all radii onto the sphere of radius r, and ignoring any information about the radial structure in the data, which is often very limited. The final result is weakly dependent on the choice of r, but the important features of the angular structure are not, provided we are inside the range of r for which the method is reliable.

We shall determine a function of the above form that has minimum relative information with respect to p0 = (4πr²)⁻¹ δ(|x| − r), subject to the constraints of the data. We call p̂ the (radially) Persistent Angular Structure (PAS). The domain of p̂ is the unit sphere, as it represents only orientational information. The relative information of the probability density function p with respect to the probability density function p0 is given by

I[p; p0] = ∫_Ω p(x) ln( p(x)/p0(x) ) dx.    (9)
The constraints on p from the data can be incorporated into the expression above using the method of Lagrange multipliers, to yield

I[p̂] = ∫ ( p̂(x̂) ln p̂(x̂) − p̂(x̂) Σ_{j=1}^{N} λj exp(iqj · r x̂) − p̂(x̂) µ ) dx̂,    (10)

where qj, 1 ≤ j ≤ N, are the non-zero wavenumbers for the MRI measurements, the λj are Lagrange multipliers for the constraints from the data, and the Lagrange multiplier µ controls the normalization of p̂. The integral in (10) is taken over the unit sphere. The information content, I[p̂], has a unique minimum at

p̂(x̂) = exp( λ0 + Σ_{j=1}^{N} λj exp(iqj · r x̂) ).    (11)

We need to solve

∫ p̂(x̂) exp(iqj · r x̂) dx̂ = A(qj),    (12)

for the λj, where the integral is taken over the unit sphere. In PAS-MRI, we use a Levenberg–Marquardt algorithm [17] to find the persistent angular structure, p̂, by fitting the λj in (11) to the data using (12). Full details of our implementation of PAS-MRI are in section 5 of [4].
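The following sketch (our own illustration, not the authors' implementation) shows the shape of this fit. It uses the symmetric cosine form of (11) introduced in section 4, a crude Monte Carlo quadrature over the unit sphere, an added normalization residual, and SciPy's Levenberg–Marquardt least-squares routine in place of a hand-written one:

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)
M = rng.normal(size=(500, 3))
sphere = M / np.linalg.norm(M, axis=1, keepdims=True)  # quadrature points on sphere

def pas(lam, qhats, r):
    """Evaluate p-hat(x) = exp(lam0 + sum_j lam_j cos(q_j . r x)) on the grid."""
    lam0, lamj = lam[0], lam[1:]
    return np.exp(lam0 + np.cos(r * sphere @ qhats.T) @ lamj)

def residuals(lam, qhats, r, A):
    """Mismatch of constraints (12) plus a normalization constraint."""
    p = pas(lam, qhats, r)
    mean = p.mean() * 4 * np.pi                      # approx integral of p-hat
    pred = (np.cos(r * sphere @ qhats.T) * p[:, None]).mean(axis=0) * 4 * np.pi
    return np.append(pred - A, mean - 1.0)

def fit_pas(qhats, A, r=1.2):
    lam0 = np.zeros(len(A) + 1)
    sol = least_squares(residuals, lam0, args=(qhats, r, A),
                        method="lm")                 # Levenberg-Marquardt
    return sol.x
```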
4 Experiments and Results
In this section, we demonstrate PAS-MRI using the common DT-MRI scheme in which the qj for the data lie on a sphere. For convenience, we non-dimensionalize all lengths with |q|⁻¹, which is a natural lengthscale. We also assume x → −x symmetry of p, and equation (11) thus becomes

p̂(x̂) = exp( λ0 + Σ_{j=1}^{N} λj cos(q̂j · r x̂) ).
In a more general setting where the qj do not lie on a sphere, a typical or average wavenumber would be used for the non-dimensionalization. Data sets from healthy volunteers were provided by the University College London Institute of Neurology, together with details of their acquisition sequence [18]. All subjects were scanned with the approval of the joint National Hospital and Institute of Neurology ethics committee and gave informed written consent. For these data sets, M = 3 and N = 60. Also ∆ = 0.04 s, δ = 0.032 s and |g| = 0.022 Tm⁻¹, which gives |q| = 1.9 × 10⁵ m⁻¹. The q̂j come from Jones' software that implements the method in [10]. Unfortunately, this point set is slightly anisotropic, but our method appears robust to this.
The data sets contain 42 axial slices through the brain evenly spaced at intervals of 2.5 × 10−3 m. Each row of each slice of the acquisition undergoes a linear phase correction step. In each slice, the 62 × 96 array of measurements is padded with zeros to a 128 × 128 grid from which an image is created via a FFT. After the Fourier transform, residual phase errors are discarded by taking the modulus of the complex value in each voxel. In each slice, the voxels are 1.7 × 10−3 m squares. Finally, an image registration scheme based on Automated Image Registration [19] improves the spatial alignment of the acquisition volumes.
Fig. 1. Test functions pi, 0 ≤ i ≤ 4, plotted over spheres of radius r|q|⁻¹, with r = 1.2, together with plots of each p̂i(·; r) for r = 0.4, 1.2, 2.4.
We test PAS-MRI on a variety of synthetic data and then show results from human brain data. We present more extensive results in [4].
4.1 Synthetic Data Experiments
Given a function p with Fourier transform F, we generate a synthetic MRI measurement A′(q) by setting

A′(q) = |F(q) + c|,    (13)

where the real and imaginary parts of c ∈ ℂ are independent identically distributed random variables with distribution N(0, σ²). For a particular test function, we generate M synthetic A′(0) measurements and a single A′(qj) for each j = 1, · · · , N. We then compute Ā′(0) and A(qj). Unless otherwise stated, we emulate the sequence used to acquire the human brain data, so that M = 3, N = 60, |q| = 1.9 × 10⁵ m⁻¹, t = ∆ = 0.04 s. For the q̂j, we use one point from every pair in an electrostatic point set with sixty equal and opposite pairs. These q̂j are more isotropic than the q̂j used in the human brain data and provide a fairer test of the new method, but we obtain similar results from both sets.

We use the following five test functions to test PAS-MRI:

p0(x) = M1(x; A0; 1),  p1(x) = M1(x; A1; 1),  p2(x) = M1(x; A4; 1),
p3(x) = M2(x; A1, A2; 1/2, 1/2),  p4(x) = M3(x; A1, A2, A3; 1/3, 1/3, 1/3),

where Mn is defined in (6), and

A0 = 7 × 10⁻¹⁰ I m²s⁻¹,  A1 = 10⁻¹⁰ diag(17, 2, 2) m²s⁻¹,
A2 = 10⁻¹⁰ diag(2, 17, 2) m²s⁻¹,  A3 = 10⁻¹⁰ diag(2, 2, 17) m²s⁻¹,
A4 = 10⁻¹⁰ diag(8.5, 8.5, 2) m²s⁻¹.

We write p̂(·; r) for the persistent angular structure found with parameter r from data synthesized from a test function p. Figure 1 shows plots of each pi, 0 ≤ i ≤ 4, over the sphere of radius r|q|⁻¹ for r = 1.2, and p̂i(·; r) found from noise-free data for r = 0.4, 1.2, and 2.4. We do not expect p̂(·; r) to have exactly the same shape as p on the sphere of radius r|q|⁻¹, but the peaks and troughs should coincide. The persistent angular structure tends to have sharper peaks, as it extracts the angular structure of p that persists radially.

The wavelength of the basis functions of p̂(·; r) decreases with increasing r. At low r, we have insufficient resolution to resolve the interesting angular structure. At high r, the resolution is too fine to be supported by the data. As we can see in figure 1, when we set r too low, p̂(·; r) has too few principal directions, and extra principal directions appear in p̂(·; r) when r is set too high. At r = 1.2, the principal directions are consistent with those of the test functions, and results in [4] show that we observe a range of values of r for which p̂(·; r) is consistent with p.

Figure 2 shows how noise in the simulated measurements affects the persistent angular structure. We define

s = σ⁻¹ F(0)    (14)
Persistent Angular Structure
679
as the signal to noise ratio of the measurement at q = 0. We generate synthetic data for each test function with s ∈ {4, 16, 32, ∞}.
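A minimal sketch of this noise model (our own illustration), assuming F(q) is real for the symmetric test functions, so that taking the modulus of the complex sum gives Rician-like corruption of the measurement:

```python
import numpy as np

rng = np.random.default_rng(2)

def noisy_measurement(F_q, sigma):
    """Synthetic measurement A'(q) = |F(q) + c| per equation (13)."""
    c = rng.normal(0.0, sigma) + 1j * rng.normal(0.0, sigma)
    return abs(F_q + c)

# Signal-to-noise ratio at q = 0, per equation (14): s = F(0) / sigma.
F0 = 1.0
for s in (4, 16, 32):
    sigma = F0 / s
    print(s, noisy_measurement(F0, sigma))
```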
Fig. 2. p̂i(·; 1.2), 0 ≤ i ≤ 4, found from synthetic data with increasing noise level. In the first row, we show the test functions plotted over the sphere of radius r|q|⁻¹, with r = 1.2. In the second row, we show the corresponding p̂i(·; 1.2) found from noise-free data. The remaining rows show p̂i(·; 1.2) from data with s ∈ {4, 16, 32}.
The persistent angular structure extracted from the data is determined by both the signal and the noise, though the choice of r can be used to control the smoothness of the reconstruction to some extent. We can consider the signal to noise ratio of the angular structure, which is distinct from the notion of signal to noise ratio of the MRI measurements. When the angular structure is
weak, as for p0 and p2, its signal to noise ratio may be low even if the signal to noise ratio of the MRI measurements is high. This often occurs, for example, in tissue that is microscopically isotropic. When the signal to noise ratio of the angular structure is low, the angular structure of the noise dominates the persistent angular structure, as we see in figure 2. When the angular structure is strong, as it is for p1, p3 and p4, the noise usually has little effect.

4.2 Human Brain Data Experiments
In this section, we show results from PAS-MRI applied to a human brain data set. The persistent angular structure is most useful in regions where the simple diffusion model is a poor approximation. Here we use PAS-MRI with an algorithm for deciding when the simple diffusion model is adequate: the voxel classification algorithm of Alexander et al [14] with published parameter settings. Figure 3 shows the fractional anisotropy [2] for two axial slices through the human brain data set. Slice (a) is at the level of the pons and slice (b) contains parts of the corpus callosum.
Fig. 3. Maps of the fractional anisotropy in two axial slices from a human brain data set. Figures 4 and 5 show the regions of interest outlined on the maps in (a) and (b), respectively.
Figures 4 and 5 show PAS-MRI results in regions of interest in the slices in figure 3. The regions of interest are outlined on the maps in figure 3. The region of interest in slice (a) of figure 3 contains a cluster of voxels in which the simple diffusion model is not an adequate approximation. Figure 4 shows the persistent angular structure in each of these voxels. The region of interest in slice (b) of figure 3 contains mostly voxels for which the simple diffusion model is adequate. Figure 5 shows the persistent angular structure in all voxels of this region.
Fig. 4. Persistent angular structure in each voxel within a region of interest in the pons for which the simple diffusion model is a poor approximation. The region of interest is outlined on figure 3(a).
Fig. 5. Persistent angular structure in each voxel within a region of interest in the corpus callosum. The region of interest is outlined on figure 3(b).
In the area of the pons shown in figure 4, two white-matter fibres cross. The transpontine tract (left–right) crosses the pyramidal tract (inferior–superior). The orientations of these crossing fibres are clearly elucidated by the persistent angular structure in this region.
The fibres of the corpus callosum are approximately parallel and we can see from figure 5 that within the corpus callosum the persistent angular structure mostly has a single principal direction along the axis of the fibres. The region of interest used for figure 5 also contains some voxels from areas of grey matter (e.g. the top row of the region of interest) and cerebro-spinal fluid (e.g. the bottom left corner of the region of interest). In these regions, p is more isotropic and the principal directions that appear in the persistent angular structure come mostly from the noise in the data, as highlighted by the simulation results in figure 2.
5 Discussion
We have introduced a new diffusion MRI technique called PAS-MRI. In PAS-MRI, we find the persistent angular structure of the probability density function of particle displacements, p, from MRI measurements of the Fourier transform of p. We have demonstrated the technique using data acquired for DT-MRI using a standard imaging sequence in which the measurements lie on a sphere in Fourier space. In simulation, we find that the principal directions of the persistent angular structure reflect the directional structure of several test functions. In clinical applications, we aim to use the persistent angular structure to determine white-matter fibre directions. Early results for human brain data are promising.

This new technique has the advantage over existing techniques, such as DT-MRI and three-dimensional q-space imaging, that it can resolve fibre directions at crossings, but still requires only a modest number of MRI measurements. We plan to consider, elsewhere, extensions of PAS-MRI. We believe one way in which the current approach could be improved is to consider the optimal placement of the observational points, i.e. the set of qj used in the MRI measurements. In particular, the results are likely to improve for samples not restricted to a sphere in Fourier space. We hope others will investigate the extent to which PAS-MRI can highlight the early signs of diseases, such as multiple sclerosis.

Acknowledgements. The authors would like to thank Gareth Barker and Claudia Wheeler-Kingshott at the Institute of Neurology, UCL, for providing the human brain data used in this work.
References
1. Basser P J, Mattiello J and Le Bihan D 1994 MR diffusion tensor spectroscopy and imaging Biophysical Journal 66 259–67
2. Basser P J and Pierpaoli C 1996 Microstructural and physiological features of tissues elucidated by quantitative diffusion tensor MRI Journal of Magnetic Resonance Series B 111 209–19
3. Pierpaoli C, Jezzard P, Basser P J, Barnett A and Di Chiro G 1996 Diffusion tensor imaging of the human brain Radiology 201 637–48
4. Jansons K M and Alexander D C 2003 Persistent angular structure: new insights from diffusion MRI data Inverse Problems (submitted)
5. Stejskal E O and Tanner J E 1965 Spin diffusion measurements: spin echoes in the presence of a time-dependent field gradient The Journal of Chemical Physics 42 288–92
6. Callaghan P T 1991 Principles of Magnetic Resonance Microscopy (Oxford, UK: Oxford Science Publications)
7. Mitra P P and Halperin B I 1995 Effects of finite gradient-pulse widths in pulsed-field-gradient diffusion measurement Journal of Magnetic Resonance 113 94–101
8. Wedeen V J, Reese T G, Tuch D S, Dou J-G, Weiskoff R M and Chessler D 2000 Mapping fiber orientation spectra in cerebral white matter with Fourier-transform diffusion MRI Proc. 7th Annual Meeting of the ISMRM (Philadelphia) (Berkeley, USA: ISMRM) 321
9. Crank J 1975 Mathematics of Diffusion (Oxford, UK: Oxford University Press)
10. Jones D K, Horsfield M A and Simmons A 1999 Optimal strategies for measuring diffusion in anisotropic systems by magnetic resonance imaging Magnetic Resonance in Medicine 42 515–25
11. Hasan K M, Parker D L and Alexander A L 2001 Comparison of gradient encoding schemes for diffusion-tensor MRI Journal of Magnetic Resonance Imaging 13 769–80
12. Frank L R 2002 Characterization of anisotropy in high angular resolution diffusion-weighted MRI Magnetic Resonance in Medicine 47 1083–99
13. Alexander A L, Hasan K M, Lazar M, Tsuruda J S and Parker D L 2001 Analysis of partial volume effects in diffusion-tensor MRI Magnetic Resonance in Medicine 45 770–80
14. Alexander D C, Barker G J and Arridge S R 2002 Detection and modeling of non-Gaussian apparent diffusion coefficient profiles in human brain data Magnetic Resonance in Medicine 48 331–40
15. Le Bihan D 1995 Molecular diffusion, tissue microdynamics and microstructure NMR in Biomedicine 8 375–86
16. Skilling J and Gull S F 1985 Algorithms and applications Maximum Entropy and Bayesian Methods in Inverse Problems ed Smith C R and Grandy W T (Dordrecht: Reidel Publishing Company) 83–132
17. Press W H, Teukolsky S A, Vetterling W T and Flannery B P 1988 Numerical Recipes in C (New York, USA: Press Syndicate of the University of Cambridge)
18. Barker G J and Wheeler-Kingshott C, personal communication
19. Woods R P, Grafton S T, Holmes C J, Cherry S R and Mazziotta J C 1998 Automated image registration: I. General methods and intra-subject intra-modality validation Journal of Computer Assisted Tomography 22 141–154
Probabilistic Monte Carlo Based Mapping of Cerebral Connections Utilising Whole-Brain Crossing Fibre Information

Geoff J.M. Parker¹ and Daniel C. Alexander²

¹ Imaging Science & Biomedical Engineering, University of Manchester, Manchester M13 9PT, UK. [email protected], http://www.man.ac.uk/˜gjp
² Department of Computer Science, University College London, Gower Street, London WC, UK. [email protected]
Abstract. A methodology is presented for estimation of a probability density function of cerebral fibre orientations when one or two fibres are present in a voxel. All data are acquired on a clinical MR scanner, using widely available acquisition techniques. The method models measurements of water diffusion in a single fibre by a Gaussian density function and in multiple fibres by a mixture of Gaussian densities. The effects of noise on complex MR diffusion weighted data are explicitly simulated and parameterised. This information is used for standard and Monte Carlo streamline methods. Deterministic and probabilistic maps of anatomical voxel scale connectivity between brain regions are generated.
1 Introduction
Probabilistic methods for determining the connectivity between brain regions using information obtained from diffusion MRI have recently been introduced [3,8,9,10,11,15]. These approaches utilise probability density functions (PDFs) defined at each point within the brain to describe the local uncertainty in fibre orientation. Each PDF is intended to interpret the information available from a diffusion imaging acquisition in terms of the likely underlying fibre structure. Given an accurate voxelwise PDF it should be possible to define the probability of anatomical connection, defined at the voxel scale, between any two points within the brain. This may be achieved using Monte Carlo approaches based on, for example, streamlines [9,10,11] or Bayesian methods [3,15]. To date, PDFs used in probabilistic connectivity methods have either been defined in terms of the diffusion tensor model [3,8,9,10,11] or using q-space approximations acquired from spatially undersampled brain data [15]. The single tensor model of diffusion assumes that diffusive water molecule displacements are Gaussian distributed, which is a poor approximation where fibres cross, diverge, or experience high curvature. This leads to either inaccurate PDFs, which may
assign unwarranted confidence in fibre orientation, or overly conservative PDFs that reflect the ambiguous fibre orientation information provided by the tensor in these regions. The signal to noise ratio (SNR) and sampling requirements of q-space based approaches such as diffusion spectrum imaging [15] make it impractical to perform whole brain imaging at the resolution required for diffusion-based connectivity mapping. At best, only low resolution imaging or restricted volume acquisitions are possible, both of which are obstacles to inter-regional connectivity mapping. An additional drawback of many suggested PDFs is their somewhat obscure meaning in terms of the phenomena that they are attempting to describe. Examples include the PDFs proposed by Parker et al, Lazar & Alexander, and Koch et al, each of which is a heuristic attempt to interpret tissue microstructural arrangement from the parameters of the estimated diffusion tensor [8,9,10,11], without providing much justification. The diffusion spectrum PDFs proposed by Tuch et al describe the probability of a structural obstruction to diffusion being oriented along one or more directions [15]. However, even a PDF such as this, which has a theoretically close relationship with the underlying structure, does not take into account the influence of data noise on the confidence that may be assigned to an estimated fibre direction.
2 2.1
Methods Data Acquisition
Single-shot echo planar diffusion weighted brain data were acquired using a GE Signa 1.5 tesla scanner with a standard quadrature head coil. Sequence parameters: cardiac gating (TR = 20 RR ≈ 20 s); 60 axial slices; TE = 95 ms; 54 non-collinear diffusion-weighting directions at b = 1156 smm−2 (calculated according to [13]); 6 acquisitions with a b ∼ 0 smm−2 ; diffusion sensitisation
686
G.J.M. Parker and D.C. Alexander
gradient duration δ = 34 ms; interval between gradients ∆ = 40 ms; gradient strength G = 22 mTm−1 ; 96 × 96 acquisition matrix, interpolated during reconstruction to 128 × 128; 220 mm field of view, generating 2.30 × 2.30 × 2.30 mm3 voxels as acquired, which are reconstructed to 1.72 × 1.72 × 2.30 mm3 [6, 16]. The total acquisition time was approximately 20 minutes. Eddy current induced image distortions in the diffusion sensitised images were removed using affine multiscale two-dimensional registration [14]. The brain was extracted on the b = 0 images to provide a brain mask using the brain extraction tool (BET) available in the FSL software package (http://www.fmrib.ox.ac.uk). All subjects were scanned with ethical committee approval and gave informed, written consent. The SNR of the data was defined as the mean b = 0 signal divided by the mean standard deviation of signal in a range of uniform tissue regions. For the data used in this study a SNR value of 16 was obtained. 2.2
Voxel Classification
We use the algorithm of Alexander et al [2] to identify voxels in which the single tensor model is poor. In these voxels we fit a mixture of two Gaussian densities; otherwise we use the single tensor model. The principal diffusion directions (PDDs) of the two diffusion tensors in the mixture model provide estimates of the orientations of the crossing fibres. We define ˆ = b−1 log S(k, ˆ b) − log S(k, ˆ 0) , d(k) (1) ˆ b) is the MRI measurement with diffusion weighting gradient direcwhere S(k, ˆ tion k and diffusion weighting factor b. For the single tensor model ˆ. ˆ =k ˆ T Dk d(k)
(2)
This model provides a suitable description of diffusion in tissues where a single fibre bundle direction is present in an image voxel. In voxels containing n different fibre directions a multi-Gaussian model [1,5] can be more appropriate: ˆ b) = S(0) S(k,
n
ˆ . ˆ T Di k ai exp −bk
(3)
i=1
ˆ to identify As described in [2], we use a spherical harmonic model of d(k) voxels in which the single tensor model is poor. Thus ˆ = d(k)
l ∞
ˆ , clm Ylm (k)
(4)
l=0 m=−l
ˆ is real where Ylm is the spherical harmonic of order l and index m. Since d(k) m ∗ ˆ ˆ and d(k) = d(−k), clm = 0 when l is odd and clm = (−1) cl−m . We truncate the series at order 0, 2 or 4 using the analysis of variance test for deletion of variables
Probabilistic Monte Carlo Based Mapping of Cerebral Connections
687
to select the best model. When the series is truncated at order 0, diffusion is isotropic. When it is truncated at order 2, the single tensor model is a good approximation and Eqs. 2 and 4 are equivalent. When fourth-order terms are included in the series, the single tensor fit is poor and we fit the multi-Gaussian model with n = 2. We assume that we cannot resolve the directions of more than two fibres with the number (54) of diffusion-weighted measurements acquired. To fit the multi-Gaussian model, we use a Levenberg–Marquardt algorithm ˆ Figure 1 shows a on data resampled from the spherical harmonic model of d(k). region of the brain demonstrating crossing fibre content. This region, where the motor tract (a superior-inferior tract) crosses the superior longitudinal fasciculus (an anterior-posterior tract at this position) and fibres from the corpus callosum are passing left-right, demonstrates the widespread presence of fibre crossings and their potential impact on the directional information present within the brain. Figure 1 also demonstrates the ease with which regions containing crossing fibres may be overlooked in conventional diffusion tensor imaging.
(a)
(b)
(c)
Fig. 1. Comparison of single tensor (top) and biGaussian modelling (bottom). Top: Greyscale linearly proportional to tensor fractional anisotropy (F A) [12]. Bottom: Voxels containing single, isotropic tensors (dark grey), single anisotropic tensors (mid-grey), and two tensors (white). Region shown is in the vicinity of the right corona radiata. Black needles represent PDDs. Note that low F A values outside grey matter in the single tensor parameterisation (top) often correspond to regions of crossing fibres (bottom). However, even relatively high anisotropy regions may demonstrate crossing fibres. (a) axial; (b) coronal; (c) sagittal views
688
2.3
G.J.M. Parker and D.C. Alexander
Noise-Based Uncertainty in Principal Diffusion Direction
The effect of noise on the single or two tensor fitting process is modelled using a simulated complex MR measurement. We add zero-mean random Gaussiandistributed noise with SNR = 16 repeatedly to estimate the PDF describing the effects of noise on apparent fibre direction. Noise model. We define a PDF in the PDD by simulating the addition of noise to complex MR data. The mean brain b = 0 signal (S0 ), δ, ∆, the data noise level, and the b-matrix are used as simulation inputs. We fit the spherical harmonic series to the simulated noisy data followed by the multi-Gaussian model. For a given δ, ∆, and S0 , the degree of deviation of the PDD about its expected direction is dependent upon the noise level, and the relative and absolute magnitudes of the tensor eigenvalues, λ1 , λ2 , λ3 (table 1). Within the simulation we constrain the range of λ1 , λ2 , λ3 by using the fact that the trace of the diffusion tensor varies little in brain tissue. We reduce the simulation set by assuming that λ2 = λ3 (i.e. that we only encounter axially symmetric tensors). In voxels exhibiting partial volume effects (for example in the presence of crossing fibres) non-axially symmetric tensors may be expected; however as we are explicitly detecting and modelling these cases it is likely to be a good assumption that all remaining single tensors are indeed axially symmetric. By extension we also assume that cases with two non-axially symmetric tensors do not occur. Table 1. Eigenvalues and fractional anisotropy, F A [12], used in the noise simulation. A constant value of trace = 2100 × 10−6 mm2 s−1 was used for all experiments FA 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 λ1 781 864 950 1042 1143 1256 1390 1554 1773 (×10−6 mm2 s−1 ) λ2 , λ3 660 618 575 529 479 422 355 273 164 (×10−6 mm2 s−1 )
Single fibre case. For the single axially-symmetric tensor model, the addition of Gaussian noise generates PDD distributions as shown in Fig. 2. If we align the z axis along the original PDD, the PDF is independent of longitudinal angle, φ, and has approximately normally distributed dependence on angle of deflection, θ, with a mean coincident with the original PDD (θ ∈ [−π/2, π/2), φ ∈ [−π/2, π/2)). Figure 3 shows the standard deviation in θ as a function of F A. This is in line with the results of [7], but clarifies the dependence of the PDD on noise in the axially symmetric limit. The deviations of the fitted lines in Fig. 3 from the data points may be due to the choice of a normal model of θ. It is possible that the use of a dedicated spherical distribution such as a von Mises - Fisher distribution may improve the parameterisation at low F A.
Probabilistic Monte Carlo Based Mapping of Cerebral Connections
(a)
(b)
(c)
689
(d)
Fig. 2. Simulated distribution of PDD due to random Gaussian distributed noise as a function of F A. Noiseless orientation of PDD along vertical (z) axis. Axial symmetry of tensor assumed. (a) F A = 0.1; (b) F A = 0.3; (c) F A = 0.5; (d) F A = 0.7 1.0
SD(theta)
0.8
0.6
0.4
0.2
0.0 0.0
0.2
0.4
0.6
0.8
1.0
FA
Fig. 3. Standard deviation (σθ ) in θ, due to the addition of Gaussian noise at SNR = 16 as a function of F A. The relationship between σθ and tensor F A is well described by a biexponential function, allowing uncertainty in θ to be estimated from the anisotropy of the fitted tensor. Closed circles: σθ of PDD of single tensor (no crossing fibres present). Open circles: σθ of PDD of one of two tensors when crossing fibres are present
Two fibre case. For the mixture model two PDDs exist, representing the axes of two crossing fibres. Examples of the distribution of these directions under the addition of Gaussian noise to the simulated complex MR data are shown in Fig. 4. The mean orientation of each fibre is not greatly affected by the presence of the neighbouring fibre. Furthermore, the spread in orientation of each is affected little by the relative orientation or spread of the other. With this observation we assume that the distributions of fitted PDDs for each fibre may be treated independently. As in the single fibre case, we use a Gaussian model for the distribution of the angle of deflection, the parameters of which are a function of F A for each tensor. As can be seen in Fig. 3, the uncertainty associated with the PDD of a tensor when crossing fibres are present is larger than that in the single fibre case, due to the larger set of fitted parameters. 2.4
Streamline Propagation in the Multi-tensor Field
A step in the streamline propagation process is defined:

X(l + 1) = X(l) + w(l)δt ,    (5)
Fig. 4. Simulated distribution of fitted PDDs due to random Gaussian distributed noise for a range of crossing fibre cases. Axial symmetry of both tensors is assumed and fibres are present in equal proportions. Noiseless fibre orientation: (a,b,e,f) y, z; (c,d,g,h) xy, y. FA values for each tensor: (a) 0.9, 0.9; (b) 0.3, 0.9; (c) 0.9, 0.9; (d) 0.3, 0.9; (e) 0.7, 0.7; (f) 0.5, 0.7; (g) 0.7, 0.7; (h) 0.5, 0.7
where X(l) is the position in ℝ3 of the streamline at point l along its length, w(l) is the propagation direction at that point, and δt is the step size. With a single tensor model of diffusion, w(l) is defined as the interpolated PDD at that point. We achieve this by trilinear interpolation of the surrounding tensor elements to provide an interpolated tensor, from which the local PDD is determined. Where one or more of the image voxels involved in the interpolation is described by more than one tensor, we employ selection rules to ensure that the most appropriate tensor is included in the interpolation. If Di(p) is one of n tensors present at image location p, a location to be included in the interpolation, then the selected tensor is that which satisfies

maxi |Γ(Di(p)) · w(l − 1)| ,    (6)

where Γ(Di(p)) is the PDD of the ith tensor at p. This formulation ensures that, when fibre crossing is detected, the tensor representing the fibre with orientation closest to the current streamline propagation direction is chosen to influence further propagation. Figure 5(a) demonstrates the propagation of streamlines through a region of crossing fibres.
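In code, the propagation step (5) and selection rule (6) might be sketched as follows. This is a minimal sketch: the trilinear tensor interpolation is assumed to be handled by a hypothetical pdds_at() function, and PDDs are treated as sign-ambiguous axes (an implementation detail not spelled out above).

```python
import numpy as np

def select_pdd(pdds, w_prev):
    """Eq. (6): of the PDDs of the tensors present at an interpolation site,
    choose the one most closely aligned with the previous step direction."""
    i = int(np.argmax([abs(np.dot(g, w_prev)) for g in pdds]))
    g = pdds[i]
    return g if np.dot(g, w_prev) >= 0.0 else -g   # resolve the +/- ambiguity

def propagate(x, w_prev, pdds_at, step=0.5, n_steps=1000):
    """Eq. (5): Euler integration of the streamline. `pdds_at(x)` is assumed
    to return the interpolated candidate PDDs at position x."""
    path = [x]
    for _ in range(n_steps):
        w = select_pdd(pdds_at(x), w_prev)
        x = x + step * w                            # X(l+1) = X(l) + w(l) dt
        path.append(x)
        w_prev = w
    return np.array(path)
```

Termination criteria (leaving the brain mask, anisotropy thresholds, and so on) are omitted from the sketch.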
2.5 Monte Carlo Methods and Maps of Connection Probability
The PICo framework is used to enable probabilistic fibre tracking [10,11], allowing maps of connection probability to be generated. The method utilises a Monte Carlo streamline approach, sampling the orientation PDFs within each voxel on each iteration. Fig. 5(b) illustrates the deviation of streamlines due to the cumulative effects of PDFs within each imaging voxel.
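Sampling an orientation PDF amounts to deflecting the fitted PDD by an angle drawn from its normal distribution, at a uniformly random azimuth (the PDF being independent of φ, as described above). A minimal sketch, with σθ obtained from the biexponential fit:

```python
import numpy as np

rng = np.random.default_rng()

def sample_direction(pdd, sigma):
    """Draw one candidate fibre direction from the orientation PDF of a PDD."""
    theta = rng.normal(0.0, sigma)                 # deflection angle ~ N(0, sigma)
    phi = rng.uniform(0.0, 2.0 * np.pi)            # azimuth: uniform
    # build an orthonormal frame (e1, e2, pdd)
    a = (np.array([1.0, 0.0, 0.0]) if abs(pdd[0]) < 0.9
         else np.array([0.0, 1.0, 0.0]))
    e1 = np.cross(pdd, a)
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(pdd, e1)
    return (np.cos(theta) * pdd
            + np.sin(theta) * (np.cos(phi) * e1 + np.sin(phi) * e2))
```

On each Monte Carlo iteration, every voxel's PDD(s) are perturbed in this way before the streamline is propagated through the resulting field.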
Fig. 5. (a) Streamline propagation through a region of crossing fibres in the vicinity of the example in Fig. 1. Streamlines (black continuous lines) initiated in the corpus callosum propagate left-right through the region, whilst initiation in the corticospinal tract leads to inferior-superior propagation. (b) Illustration of the streamline randomisation process in the corpus callosum
The number of occasions, µ(p, N), over N repetitions, at which each voxel, p, is crossed by a streamline is used to define a map of the probability, ψ, of connection to the start point, in a similar fashion to [8]:

ψ(p) = lim_{N→∞} µ(p, N)/N ≈ µ(p, N)/N .    (7)
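In a discrete implementation, ψ is approximated by the fraction of the N repeats in which each voxel is crossed; a minimal sketch, with a hypothetical track() standing in for one complete streamline iteration (sample the PDFs, propagate, record visited voxels):

```python
from collections import defaultdict

def connection_probability_map(seed, track, n_repeats=1000):
    """Eq. (7): psi(p) ~= mu(p, N) / N. `track(seed)` is assumed to return
    the set of voxel indices crossed by one Monte Carlo streamline."""
    mu = defaultdict(int)
    for _ in range(n_repeats):
        for p in track(seed):          # each voxel counted once per repeat
            mu[p] += 1
    return {p: count / n_repeats for p, count in mu.items()}
```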
3 Probabilistic Fibre Tracking Examples
A number of examples from a single subject are presented. Figure 6 shows maps of connection probability to a single voxel in the middle of the splenium of the corpus callosum. Branching of the extracted pathway may be observed at various points, with a branch of relatively high probability in the left hemisphere. However, the main route of connection is tightly localised between the seed point and the posterior cortical regions. This is due to the generally high anisotropy observed in this region, leading to low uncertainty in fibre direction. Figure 7 shows patterns of connection to a region placed in the left superior longitudinal fasciculus. The end points of the measured tract terminate in Broca's area and Wernicke's area. However, it is also evident that connection probability is assigned to pathways that approach motor regions, and non-zero probability is assigned to a region in the parietal lobe; these connections are unlikely to be present in the underlying anatomy. Figure 8 demonstrates probabilities of connection to a single point in the left pyramidal body. The corticospinal tract connected to this point is well-defined throughout, indicating low uncertainty in fibre direction along its length.
Fig. 6. Maps of connection probability to a seed point placed in the centre of the splenium of the corpus callosum. (a)–(d) adjacent slices of increasingly superior anatomical location. Radiological viewing convention used. Greyscale windowed to show full range of connection probability. (e) surface rendering showing thresholded probability values in corpus callosum. Note branching of identified connected regions in left hemisphere but an otherwise tightly localised pattern of connection
4 Discussion
We have identified voxels within the brain that contain more than one fibre orientation and, by modelling the effects of noise, obtained PDFs on the PDDs for both the single and dual fibre cases. These PDFs are characterised as independent normal distributions in the angle of deflection away from the initial estimate of the PDD, whose standard deviation is related simply to the anisotropy of the tensors used in the model. Using the crossing fibre information and the orientation PDFs, we have shown that it is possible to generate probabilistic representations of diffusion-based voxel-scale connectivity from user-defined start points. This mapping benefits from the increased information content provided by the multi-tensor decomposition, which allows more accurate definition of the routes and termini of connections. It is also possible to use these data for deterministic fibre tracking (for example streamline methods, Fig. 5(a)), which again benefits from the improved information content. We have presented results from a single brain; results from a further group of 10 brains were compatible (not shown). Validation of the method is to date limited to cross-referencing the results with known anatomy. This is undesirable, as knowledge of anatomy may be incomplete and quantification of errors is difficult. Improved validation may be achieved with the use of data simulations and via the use of animal models. The repeatability of the method could also be assessed by repeated data acquisitions in a single subject.
Fig. 7. Views of thresholded probability map in the left superior longitudinal fasciculus with rendered brain surface. Probability of connection to seed region in superior longitudinal fasciculus shows evidence of connection to Broca's area (A), Wernicke's area (B), motor areas (C, D), and an unidentified parietal/occipital region (E). (a) frontal view; (b) view from left
Notwithstanding the limitations on validation, the connections identified may be categorised as likely true positives and false positives in terms of the probable underlying anatomy. For example, the linkage between Broca's area and Wernicke's area demonstrated in Fig. 7 is as expected from known functional neuroanatomy. However, the connections, from the same start point, to motor regions in Fig. 7 are not expected. The fact that these two sets of connections cannot be distinguished as more or less likely from the data alone identifies a fundamental limitation of diffusion imaging-based fibre tracking. The uncertainties that we are able to define in one or more underlying fibre bundles reflect the fact that diffusion imaging gives an imprecise indication of these orientations, which does not allow us to define connections unambiguously. Given the data, these regions are roughly equally likely to be connected to our start point; given known anatomy, some of these connections are unlikely. In light of such observations, it seems that relating diffusion imaging-based findings to anatomy will always require significant expert knowledge, interpretation, and guidance (for example via the use of constraining volumes of interest [4]). These restrictions could be relaxed by a reduction in data noise, or in the sensitivity of the parameterisation process to noise, leading to a decrease in the variance seen in Figs. 2 and 4. This would reduce the dispersion in the connection probability maps and possibly the occurrence of erroneous connections. Another possibility for reducing 'false positive' connections is the introduction of constraints on the tracking process, such as curvature penalties, or regularisation of the diffusion data. However, curvature restrictions penalise genuine pathways demonstrating high curvature (for example the optic radiation), and regularisation methodologies may destroy potentially useful information and often also require curvature penalties. It may also be possible to extend the design of the PDFs to allow fibre orientation information from the local neighbourhood to be included, thus improving the focus of the PDF. However, without such advances in data interpretation and/or improvements in data quality, care must be employed when interpreting the results of diffusion connectivity studies.

Fig. 8. Views of thresholded probability map of connections to the left pyramidal body with rendered brain surface. (a) frontal view; (b) view from left
Acknowledgements. We are grateful to Dr Olga Ciccarelli and Dr Claudia Wheeler-Kingshott, Institute of Neurology, London, for acquiring the data and making it available. We also thank Dr Simona Luzzi, University of Manchester, for assistance with the identification of neuroanatomical regions. This work has been supported in part by the Medical Research Council and the Engineering and Physical Sciences Research Council of Great Britain Interdisciplinary Research Collaboration "From Medical Images and Signals to Clinical Information," under Grant GR/N14248/01.
References

1. Alexander, A.L., Hasan, K.M., Mariana, L., Tsuruda, J.S., Parker, D.L.: Analysis of partial volume effects in diffusion-tensor MRI. Magn. Reson. Med. 45 (2001) 770–780
2. Alexander, D.C., Barker, G.J., Arridge, S.R.: Detection and modelling of non-Gaussian apparent diffusion coefficient profiles in human brain data. Magn. Reson. Med. 48 (2002) 331–340
3. Behrens, T.E.J., Jenkinson, M., Brady, J.M., Smith, S.M.: A probabilistic framework for estimating neural connectivity from diffusion weighted MRI. Proc. Int. Soc. Magn. Reson. Med. (2002) 1142
4. Conturo, T.E., Lori, N.F., Cull, T.S., Akbudak, E., Snyder, A.Z., Shimony, J.S., McKinstry, R.C., Burton, H., Raichle, M.E.: Tracking neuronal fiber pathways in the living human brain. Proc. Nat. Acad. Sci. USA 96 (1999) 10422–10427
5. Frank, L.R.: Characterization of anisotropy in high angular resolution diffusion-weighted MRI. Magn. Reson. Med. 47 (2002) 1083–1099
6. Jones, D.K., Horsfield, M.A., Simmons, A.: Optimal strategies for measuring diffusion in anisotropic systems by magnetic resonance imaging. Magn. Reson. Med. 42 (1999) 515–525
7. Jones, D.K.: Determining and visualizing uncertainty in estimates of fiber orientation from diffusion tensor MRI. Magn. Reson. Med. 49 (2003) 7–12
8. Koch, M.A., Norris, D.G., Hund-Georgiadis, M.: An investigation of functional and anatomical connectivity using magnetic resonance imaging. NeuroImage 16 (2002) 241–250
9. Lazar, M., Alexander, A.L.: White matter tractography using random vector (RAVE) perturbation. Proc. Int. Soc. Magn. Reson. Med. (2002) 539
10. Parker, G.J.M., Barker, G.J., Buckley, D.L.: A probabilistic index of connectivity (PICo) determined using a Monte Carlo approach to streamlines. ISMRM Workshop on Diffusion MRI (Biophysical Issues), Saint-Malo, France (2002) 245–255
11. Parker, G.J.M., Barker, G.J., Thacker, N.A., Jackson, A.: A framework for a streamline-based probabilistic index of connectivity (PICo) using a structural interpretation of anisotropic diffusion. Proc. Int. Soc. Magn. Reson. Med. (2002) 1165
12. Pierpaoli, C., Basser, P.J.: Toward a quantitative assessment of diffusion anisotropy. Magn. Reson. Med. 36 (1996) 893–906
13. Stejskal, E.O., Tanner, J.E.: Spin diffusion measurements: spin echoes in the presence of a time-dependent field gradient. J. Chem. Phys. 42 (1965) 288–292
14. Symms, M.R., Barker, G.J., Franconi, F., Clark, C.A.: Correction of eddy-current distortions in diffusion-weighted echo-planar images with a two-dimensional registration technique. Proc. Int. Soc. Magn. Reson. Med. (1997) 1723
15. Tuch, D.S., Wiegell, M.R., Reese, T.G., Belliveau, J.W., Weeden, V.J.: Measuring cortico-cortical connectivity matrices with diffusion spectrum imaging. Proc. Int. Soc. Magn. Reson. Med. (2001) 502
16. Wheeler-Kingshott, C.A.M., Boulby, P.A., Symms, M.R., Barker, G.J.: Optimised cardiac gating for high angular-resolution whole-brain DTI on a standard scanner. Proc. Int. Soc. Magn. Reson. Med. (2002) 1118