Lecture Notes in Computer Science
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, University of Dortmund, Germany
Madhu Sudan, Massachusetts Institute of Technology, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Moshe Y. Vardi, Rice University, Houston, TX, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany
4191
Rasmus Larsen, Mads Nielsen, Jon Sporring (Eds.)

Medical Image Computing and Computer-Assisted Intervention – MICCAI 2006
9th International Conference
Copenhagen, Denmark, October 1-6, 2006
Proceedings, Part II
Volume Editors

Rasmus Larsen
Technical University of Denmark, Informatics and Mathematical Modelling
2800 Kgs. Lyngby, Denmark
E-mail: [email protected]

Mads Nielsen
IT University of Copenhagen
Rued Langgaards Vej 7, 2300 København S, Denmark
E-mail: [email protected]

Jon Sporring
University of Copenhagen, Department of Computer Science
Universitetsparken 1, 2100 København Ø, Denmark
E-mail: [email protected]
Library of Congress Control Number: 2006932793
CR Subject Classification (1998): I.5, I.4, I.3.5-8, I.2.9-10, J.3, J.6
LNCS Sublibrary: SL 6 – Image Processing, Computer Vision, Pattern Recognition, and Graphics
ISSN 0302-9743
ISBN-10 3-540-44727-X Springer Berlin Heidelberg New York
ISBN-13 978-3-540-44727-6 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

Springer is a part of Springer Science+Business Media
springer.com

© Springer-Verlag Berlin Heidelberg 2006
Printed in Germany

Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
SPIN: 11866763 06/3142 543210
Preface
The 9th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2006, was held in Copenhagen, Denmark, at the Tivoli Concert Hall, with satellite workshops and tutorials at the IT University of Copenhagen, October 1-6, 2006.

The conference has become the premier international conference with in-depth, full-length papers in the multidisciplinary fields of medical image computing, computer-assisted intervention, and medical robotics. The conference brings together clinicians, computer scientists, engineers, physicists, and other researchers and offers a forum for the exchange of ideas in a multidisciplinary setting. MICCAI papers are of a high standard and have a long lifetime. In this volume, as well as in recent issues of Medical Image Analysis and IEEE Transactions on Medical Imaging, papers cite previous MICCAI conferences, back to the first MICCAI conference in Cambridge, Massachusetts, in 1998. The community clearly relies on the MICCAI papers as archival material; accordingly, the MICCAI proceedings have been indexed by Medline from 2005 onward.

A careful review and selection process was executed in order to secure the best possible program for the MICCAI 2006 conference. We received 578 scientific papers, from which 39 papers were selected for the oral program and 193 papers for the poster program. Each paper was evaluated by 3 independent scientific reviewers. Reviewer affiliations were carefully checked against author affiliations to avoid conflicts of interest, and the review process was run double blind. A special procedure was devised for papers from the universities of the organizers, upholding a double-blind review process for these papers as well. A total of 98% of the reviews we asked for were received.

The MICCAI program committee consisted of the local organizers, 2 internationally selected co-chairs, and 15 internationally selected area chairs, each a leading expert in his/her field. Each area chair was assigned 40 papers, from which he/she formed a recommendation for the program committee based on the scientific reviews as well as his/her own assessment. The entire program committee met in Copenhagen for 2 full days in May 2006. At this meeting, all 578 papers and their corresponding reviews were printed and discussed. In a first round of discussions, the area chairs were divided into 5 groups of 3. From its joint pool of 120 papers, each group identified 12 potential oral papers, 28 poster papers, and 16 potential poster papers. In the second round, the oral program was formed from the resulting 60 potential oral papers. Oral papers were selected based on their quality, the overall coverage of MICCAI topics, and suitability for oral presentation. In parallel, the remaining 80 potential poster papers were considered, and 33 papers were accepted for poster presentation.
The entire procedure was designed so that papers were selected by paper-to-paper comparison, forcing the committee members to argue for the decision in each individual case. We believe a careful and fair selection process was carried out for MICCAI 2006. Each paper was examined by 3 reviewers and further scrutinized by 3-8 program committee members. Our thanks go to the reviewers and area chairs for their hard work and enthusiasm, and to the two program co-chairs, David Hawkes and Wiro Niessen, for their dedication to putting together the program.

This year's MICCAI was augmented by more workshops than ever before. Twelve independent workshops were held prior to and following the conference. These workshops served as forums for MICCAI subfields and, thanks to their parallel programs, made room for many more presentations. The workshops were organized, in all scientific and most practical matters, by the workshop chairs. We thank the workshop organizers for suggesting, arranging, and managing these excellent workshops, and we hope to see multiple workshops at future MICCAI conferences as well. Three tutorials were also provided by leading experts in their fields of research.

We thank the two keynote speakers: Terry Jernigan, UCSD and Copenhagen University Hospital, Hvidovre, Denmark, and Thomas Sinkjær, Director, Center for Sensory-Motor Interaction, Aalborg University, Denmark. A series of sponsors helped make the conference possible; for this they are thanked. Finally, we thank our 3 general co-chairs, Anthony Maeder, Nobuhiko Hata, and Olaf Paulson, who provided insightful comments and invaluable support during the entire process of planning MICCAI 2006.

The greater Copenhagen region in Denmark and the Skåne region in southern Sweden are connected by the Øresund Bridge. The region hosts 14 universities, a large concentration of pharmaceutical and biotech industry, and 26 hospitals. This makes Copenhagen the capital of one of the most important life-science centers in Europe. It was our great pleasure to welcome delegates from all over the world to Denmark and the city of Copenhagen. It is our hope that delegates, in addition to attending the conference, took the opportunity to sample the many excellent cultural offerings of Copenhagen.

We look forward to welcoming you to MICCAI 2007, to be held October 29 – November 2 in Brisbane, Australia, and chaired by Anthony Maeder.
October 2006
Rasmus Larsen, Mads Nielsen, and Jon Sporring
Organization
The university sponsors for MICCAI 2006 were the IT-University of Copenhagen, the Technical University of Denmark, and the University of Copenhagen.
Executive Committee

General Chairmanship
Mads Nielsen (chair), IT-University of Copenhagen, Denmark
Nobuhiko Hata, Brigham and Women's Hospital, Boston, USA
Anthony Maeder, University of Queensland, Brisbane, Australia
Olaf Paulson, Copenhagen University Hospital, Denmark
Program Chairmanship
Rasmus Larsen (chair), Technical University of Denmark
David Hawkes, University College London, UK
Wiro Niessen, Erasmus Medical School, Netherlands
Workshop and Tutorials Chair
Jon Sporring, University of Copenhagen, Denmark
Program Committee
Leon Axel, New York University Medical Center, USA
Marleen de Bruijne, IT University of Copenhagen, Denmark
Kevin Cleary, Georgetown University Medical Center, USA
Hervé Delingette, INRIA, Sophia Antipolis, France
Polina Golland, Massachusetts Institute of Technology, USA
Nico Karssemeijer, Radboud University Nijmegen, Netherlands
Sven Kreiborg, University of Copenhagen, Denmark
Jyrki Lötjönen, VTT, Finland
Kensaku Mori, Nagoya University, Japan
Sébastien Ourselin, CSIRO, Australia
Egill Rostrup, University of Copenhagen, Denmark
Julia Schnabel, University College London, UK
Pengcheng Shi, Hong Kong University of Science and Technology
Martin Styner, University of North Carolina, USA
Carl-Fredrik Westin, Harvard University, USA
Conference Secretariat and Management
Camilla Jørgensen, IT University of Copenhagen, Denmark
Eva Branner, Congress Consultants, Denmark
Henrik J. Nielsen, Congress Consultants, Denmark
Student Awards Coordinator
Karl Heinz Höhne, Germany
Local Organizing Committee
Erik Dam, IT University of Copenhagen
Sune Darkner, Technical University of Denmark
Søren Erbou, Technical University of Denmark
Mads Fogtmann Hansen, Technical University of Denmark
Michael Sass Hansen, Technical University of Denmark
Peter Stanley Jørgensen, Technical University of Denmark
Marco Loog, IT University of Copenhagen
Hildur Ólafsdóttir, Technical University of Denmark
Ole Fogh Olsen, IT University of Copenhagen
Mikkel B. Stegmann, Technical University of Denmark
Martin Vester-Christensen, Technical University of Denmark
Sponsors AstraZeneca Center for Clinical and Basic Research Claron Elsevier GE Medtronic NDI - Northern Digital Inc. Siemens Corporate Research Springer Visiopharm
Reviewers Hossam El Din Hassan Abd El Munim Purang Abolmaesumi Elsa Angelini Anastassia Angelopoulou Neculai Archip John Ashburner Stephen Aylward Fred S. Azar Jose M. Azorin Eric Bardinet Christian Barillot Philip Batchelor Pierre-Louis Bazin Fernando Bello Marie-Odile Berger Abhir Bhalerao Rahul Bhotika Isabelle Bloch Emad Boctor Thomas Boettger Hrvoje Bogunovic Sylvain Bouix Pierrick Bourgeat Roger Boyle Elizabeth Bullitt Catherina R. Burghart Darius Burschka Nathan Cahill Hongmin Cai Darwin G. Caldwell Oscar Camara-Rey Carlos Alberto Castaño Moraga Pascal Cathier M. Mallar Chakravarty Hsun-Hsien Chang Jian Chen Lishui Cheng Aichi Chien Kiyoyuki Chinzei Gary Christensen Albert C.S. Chung Moo Chung Chris A. Cocosco
D. Louis Collins Olivier Colliot Lars Conrad-Hansen Jason Corso Olivier Coulon Patrick Courtney Jessica Crouch Erik Dam Mikhail Danilouchkine Sune Darkner Julien Dauguet Laura Dempere-Marco Maxime Descoteaux Michel Desvignes Maneesh Dewan Jean-Louis Dillenseger Simon DiMaio Christophe Doignon Etienne Dombre Andrew Dowsey Ye Duan Simon Duchesne Ayman El-Baz Randy Ellis Søren Erbou Simon Eskildsen Yong Fan Aly Farag Aaron Fenster Gabor Fichtinger Oliver Fleig P. Thomas Fletcher Charles Florin Mads Fogtmann Hansen Jenny Folkesson Rui Gan Andrew Gee Guido Gerig David Gering Frans Gerritsen Bernard Gibaud Maryellen Giger Gaolang Gong
Ren Hui Gong Miguel Á.G. Ballester Mark Gooding Girish Gopalakrishnan Vicente Grau Eric Grimson Christophe Grova Christoph Guetter Efstathios Hadjidemetriou Horst Hahn Haissam Haidar Ghassan Hamarneh Lars G. Hanson Matthias Harders Makoto Hashizume M. Sabry Hassouna Mark Hastenteufel Peter Hastreiter Yong He Pierre Hellier David Holmes Byung-Woo Hong Robert Howe Qingmao Hu Zhenghui Hu Heng Huang Karl Heinz Höhne Ameet Jain Pierre Jannin Branislav Jaramaz Tianzi Jiang Yuchong Jiang Ge Jin Leigh Johnston Julien Jomier Sarang Joshi Leo Joskowicz Ioannis Kakadiaris D.B. Karron Michael Kaus Peter Kazanzides Kamran Kazemi Erwan Kerrien Irina Boshko Kezele Ali Khamene
Ron Kikinis Adelaide Kissi Takayuki Kitasaka Jan Klein Ender Konukoglu Tim Kroeger Thomas Lange Thomas Lango Rudy Lapeer Sang-Chul Lee Koen van Leemput Chunming Li Shuo Li Jianming Liang Hongen Liao Rui Liao Yuan-Lin Liao Jean Lienard Marius George Linguraru Alan Liu Huafeng Liu Tianming Liu Marco Loog William Lorensen Peter Lorenzen Anant Madabhushi Mahnaz Maddah Frederik Maes Sherif Makram-Ebeid Gregoire Malandain Robert Marti Marcos Martin-Fernandez Ken Masamune Julian Mattes Tim McInerney Gloria Menegaz Chuck Meyer Michael I. Miga James Miller Abhilash Miranda Lopamudra Mukherjee William Mullally Yoshihiro Muragaki Delphine Nain Kyojiro Nambu
Sumit Nath Nassir Navab Stephane Nicolau Marc Niethammer Alison Noble Herke Jan Noordmans Wieslaw L. Nowinski Thomas O'Donnell Arnaud Ogier Allison M. Okamura Silvia Olabarriaga Hildur Ólafsdóttir Salvador Olmos Ole Fogh Olsen Mark Olszewski Tobias Ortmaier Xenophon Papademetris Nikos Paragios Hyunjin Park Javier Pascau Rajni Patel Alexandru Patriciu Perrine Paul Rasmus Paulsen Ioannis Pavlidis Kim Steenstrup Pedersen Heinz-Otto Peitgen Mélanie Pelegrini-Issac Xavier Pennec Dimitrios Perperidis Eric Pichon Josien Pluim Kilian Pohl Richard Prager Tobias Preusser Sylvain Prima Jerry L. Prince Yingge Qu Srinivasan Rajagopalan Nasir Rajpoot Richard A. Robb Miguel Angel Rodriguez-Florido Torsten Rohlfing Karl Rohr Michael Rosenthal
Daniel Rueckert Daniel Russakoff Ichiro Sakuma Tim Salcudean Thomas Sebastian Zuyao Shan Cartik Sharma Dinggang Shen Hongjian Shi Lin Shi Rudolf Sidler Alberto Signoroni Nabil Simaan Vikas Singh Karl Sjöstrand Örjan Smedby Xubo Song Jon Sporring James Stewart Rik Stokking Danail Stoyanov Yi Su Navneeth Subramanian Paul Suetens Gábor Székely Songyuan Tang Xiaodong Tao Huseyin Tek Demetri Terzopoulos Jean-Philippe Thiran Marc Thiriet Carlos Thomaz Jussi Tohka Oliver Tonet Shan Tong Jocelyne Troccaz Gözde Ünal Regis Vaillant Ragini Verma Martin Vester-Christensen Pierre Vieyres Kirby Vosburgh Albert Vossepoel Lionel C. C. Wai Defeng Wang
Linwei Wang Qiang Wang Yiying Wang Yongmei Michelle Wang Yuanquan Wang Simon Warfield Zhouping Wei Ross Whitaker James Williams Cees van Wijk Ivo Wolf Wilbur C.K. Wong Chia-Hsiang Wu John Jue Wu Ting Wu Chris Wyatt Stefan Wörz
Zhong Xue Yasushi Yamauchi Pingkun Yan G.Z. Yang Ziv Yaniv Terry Yoo Paul Yushkevich Stefan Zachow Jianchao Zeng Yiqiang Zhan Zheen Zhao Guoyan Zheng S. Kevin Zhou Wanlin Zhu Tatjana Zrimec Reyer Zwiggelaar
MICCAI Society Executive Officers
Alan Colchester, President and Board Chair
Richard A. Robb, Executive Director
Nicholas Ayache, Executive Secretary
Terry M. Peters, Treasurer
Karl Heinz Höhne, Elections Officer (Honorary Board member)
Staff
Gábor Székely, Membership Coordinator
Nobuhiko Hata, Publication Coordinator
Board of Directors
Nicholas Ayache, INRIA, Sophia Antipolis, France
Alan Colchester, University of Kent, Canterbury, UK
James Duncan, Yale University, New Haven, Connecticut, USA
Guido Gerig, University of North Carolina, Chapel Hill, USA
Anthony Maeder, University of Queensland, Brisbane, Australia
Dimitris Metaxas, Rutgers University, New Jersey, USA
Mads Nielsen, IT University of Copenhagen, Copenhagen, Denmark
Alison Noble, University of Oxford, Oxford, UK
Terry M. Peters, Robarts Research Institute, London, Ontario, Canada
Richard A. Robb, Mayo Clinic College of Medicine, Rochester, Minnesota, USA
Student Awards
Every year MICCAI awards outstanding work written and presented by students. Both oral and poster presentations are eligible for the awards, and the awards are presented to the winners in a public ceremony at the conference.
MICCAI 2005 – Palm Springs
At MICCAI 2005, five prizes, each valued at 500 USD and sponsored by Northern Digital Inc. (NDI), were awarded in the following categories:
Image segmentation and analysis: Pingkun Yan, MRA Image Segmentation with Capillary Active Contour
Image registration: Ashraf Mohamed, Deformable Registration of Brain Tumor Images via a Statistical Model of Tumor Induced Deformation
Computer assisted interventions and robotics: Henry C. Lin, Automatic Detection and Segmentation of Robot Assisted Surgical Motions
Simulation and visualization: Peter Savadjiev, 3D Curve Inference for Diffusion MRI Regularization
Clinical applications: Srinivasan Rajagopalan, Schwarz Meets Schwann: Design and Fabrication of Biomorphic Tissue Engineering Scaffolds
MICCAI 2004 – St. Malo
At MICCAI 2004, four prizes, each valued at 600 Euros and sponsored by Northern Digital Inc. (NDI), were awarded in the following categories:
Image segmentation and processing: Engin Dikici, Quantification of Delayed Enhancement MR Images
Image registration and analysis: Dimitrios Perperidis, Spatio-Temporal Free-Form Registration of Cardiac MR Image Sequences
Image guided therapy and robotics: Danail Stoyanov, Dense 3D Depth Recovery for Soft Tissue Deformation During Robotically Assisted Laparoscopic Surgery
Image simulation and display: Davide Valtorta, Dynamic Measurements of Soft Tissue Viscoelastic Properties with a Torsional Resonator Device
Table of Contents – Part II
Segmentation I

Robust Active Shape Models: A Robust, Generic and Simple Automatic Segmentation Tool . . . 1
Julien Abi-Nahed, Marie-Pierre Jolly, Guang-Zhong Yang

Automatic IVUS Segmentation of Atherosclerotic Plaque with Stop & Go Snake . . . 9
Ellen Brunenberg, Oriol Pujol, Bart ter Haar Romeny, Petia Radeva

Prostate Segmentation in 2D Ultrasound Images Using Image Warping and Ellipse Fitting . . . 17
Sara Badiei, Septimiu E. Salcudean, Jim Varah, W. James Morris

Detection of Electrophysiology Catheters in Noisy Fluoroscopy Images . . . 25
Erik Franken, Peter Rongen, Markus van Almsick, Bart ter Haar Romeny

Fast Non Local Means Denoising for 3D MR Images . . . 33
Pierrick Coupé, Pierre Yger, Christian Barillot

Active Shape Models for a Fully Automated 3D Segmentation of the Liver – An Evaluation on Clinical Data . . . 41
Tobias Heimann, Ivo Wolf, Hans-Peter Meinzer

Patient Position Detection for SAR Optimization in Magnetic Resonance Imaging . . . 49
Andreas Keil, Christian Wachinger, Gerhard Brinker, Stefan Thesen, Nassir Navab

Symmetric Atlasing and Model Based Segmentation: An Application to the Hippocampus in Older Adults . . . 58
Günther Grabner, Andrew L. Janke, Marc M. Budge, David Smith, Jens Pruessner, D. Louis Collins

Image Diffusion Using Saliency Bilateral Filter . . . 67
Jun Xie, Pheng-Ann Heng, Simon S.M. Ho, Mubarak Shah

Data Weighting for Principal Component Noise Reduction in Contrast Enhanced Ultrasound . . . 76
Gord Lueck, Peter N. Burns, Anne L. Martel
Shape Filtering for False Positive Reduction at Computed Tomography Colonography . . . 84
Abhilash A. Miranda, Tarik A. Chowdhury, Ovidiu Ghita, Paul F. Whelan

Validation and Quantitative Image Analysis

Evaluation of Texture Features for Analysis of Ovarian Follicular Development . . . 93
Na Bian, Mark G. Eramian, Roger A. Pierson

A Fast Method of Generating Pharmacokinetic Maps from Dynamic Contrast-Enhanced Images of the Breast . . . 101
Anne L. Martel

Investigating Cortical Variability Using a Generic Gyral Model . . . 109
Gabriele Lohmann, D. Yves von Cramon, Alan C.F. Colchester

Blood Flow and Velocity Estimation Based on Vessel Transit Time by Combining 2D and 3D X-Ray Angiography . . . 117
Hrvoje Bogunović, Sven Lončarić

Accurate Airway Wall Estimation Using Phase Congruency . . . 125
Raúl San José Estépar, George G. Washko, Edwin K. Silverman, John J. Reilly, Ron Kikinis, Carl-Fredrik Westin

Generation of Curved Planar Reformations from Magnetic Resonance Images of the Spine . . . 135
Tomaž Vrtovec, Sébastien Ourselin, Lavier Gomes, Boštjan Likar, Franjo Pernuš

Automated Analysis of Multi Site MRI Phantom Data for the NIHPD Project . . . 144
Luke Fu, Vladimir Fonov, Bruce Pike, Alan C. Evans, D. Louis Collins

Performance Evaluation of Grid-Enabled Registration Algorithms Using Bronze-Standards . . . 152
Tristan Glatard, Xavier Pennec, Johan Montagnat

Anisotropic Feature Extraction from Endoluminal Images for Detection of Intestinal Contractions . . . 161
Panagiota Spyridonos, Fernando Vilariño, Jordi Vitrià, Fernando Azpiroz, Petia Radeva
Symmetric Curvature Patterns for Colonic Polyp Detection . . . 169
Anna Jerebko, Sarang Lakare, Pascal Cathier, Senthil Periaswamy, Luca Bogoni

3D Reconstruction of Coronary Stents in Vivo Based on Motion Compensated X-Ray Angiograms . . . 177
Babak Movassaghi, Dirk Schaefer, Michael Grass, Volker Rasche, Onno Wink, Joel A. Garcia, James Y. Chen, John C. Messenger, John D. Carroll

Retina Mosaicing Using Local Features . . . 185
Philippe C. Cattin, Herbert Bay, Luc Van Gool, Gábor Székely
Brain Image Processing

A New Cortical Surface Parcellation Model and Its Automatic Implementation . . . 193
Cédric Clouchoux, Olivier Coulon, Jean-Luc Anton, Jean-François Mangin, Jean Régis

A System for Measuring Regional Surface Folding of the Neonatal Brain from MRI . . . 201
Claudia Rodriguez-Carranza, Pratik Mukherjee, Daniel Vigneron, James Barkovich, Colin Studholme

Atlas Guided Identification of Brain Structures by Combining 3D Segmentation and SVM Classification . . . 209
Ayelet Akselrod-Ballin, Meirav Galun, Moshe John Gomori, Ronen Basri, Achi Brandt

A Nonparametric Bayesian Approach to Detecting Spatial Activation Patterns in fMRI Data . . . 217
Seyoung Kim, Padhraic Smyth, Hal Stern

Fast and Accurate Connectivity Analysis Between Functional Regions Based on DT-MRI . . . 225
Dorit Merhof, Mirco Richter, Frank Enders, Peter Hastreiter, Oliver Ganslandt, Michael Buchfelder, Christopher Nimsky, Günther Greiner

Riemannian Graph Diffusion for DT-MRI Regularization . . . 234
Fan Zhang, Edwin R. Hancock
High-Dimensional White Matter Atlas Generation and Group Analysis . . . 243
Lauren O'Donnell, Carl-Fredrik Westin

Fiber Bundle Estimation and Parameterization . . . 252
Marc Niethammer, Sylvain Bouix, Carl-Fredrik Westin, Martha E. Shenton

Improved Correspondence for DTI Population Studies Via Unbiased Atlas Building . . . 260
Casey Goodlett, Brad Davis, Remi Jean, John Gilmore, Guido Gerig

Diffusion k-tensor Estimation from Q-ball Imaging Using Discretized Principal Axes . . . 268
Ørjan Bergmann, Gordon Kindlmann, Arvid Lundervold, Carl-Fredrik Westin

Improved Map-Slice-to-Volume Motion Correction with B0 Inhomogeneity Correction: Validation of Activation Detection Algorithms Using ROC Curve Analyses . . . 276
Desmond T.B. Yeo, Roshni R. Bhagalia, Boklye Kim

Hippocampus-Specific fMRI Group Activation Analysis with Continuous M-Reps . . . 284
Paul A. Yushkevich, John A. Detre, Kathy Z. Tang, Angela Hoang, Dawn Mechanic-Hamilton, María A. Fernández-Seara, Marc Korczykowski, Hui Zhang, James C. Gee

Particle Filtering for Nonlinear BOLD Signal Analysis . . . 292
Leigh A. Johnston, Eugene Duff, Gary F. Egan

Anatomically Informed Convolution Kernels for the Projection of fMRI Data on the Cortical Surface . . . 300
Grégory Operto, Rémy Bulot, Jean-Luc Anton, Olivier Coulon

A Landmark-Based Brain Conformal Parametrization with Automatic Landmark Tracking Technique . . . 308
Lok Ming Lui, Yalin Wang, Tony F. Chan, Paul M. Thompson

Automated Topology Correction for Human Brain Segmentation . . . 316
Lin Chen, Gudrun Wagenknecht

A Fast and Automatic Method to Correct Intensity Inhomogeneity in MR Brain Images . . . 324
Zujun Hou, Su Huang, Qingmao Hu, Wieslaw L. Nowinski
A Digital Pediatric Brain Structure Atlas from T1-Weighted MR Images . . . 332
Zuyao Y. Shan, Carlos Parra, Qing Ji, Robert J. Ogg, Yong Zhang, Fred H. Laningham, Wilburn E. Reddick

Discriminative Analysis of Early Alzheimer's Disease Based on Two Intrinsically Anti-correlated Networks with Resting-State fMRI . . . 340
Kun Wang, Tianzi Jiang, Meng Liang, Liang Wang, Lixia Tian, Xinqing Zhang, Kuncheng Li, Zhening Liu
Motion in Image Formation

Rawdata-Based Detection of the Optimal Reconstruction Phase in ECG-Gated Cardiac Image Reconstruction . . . 348
Dirk Ertel, Marc Kachelrieß, Tobias Pflederer, Stephan Achenbach, Robert M. Lapp, Markus Nagel, Willi A. Kalender

Sensorless Reconstruction of Freehand 3D Ultrasound Data . . . 356
R. James Housden, Andrew H. Gee, Graham M. Treece, Richard W. Prager

Motion-Compensated MR Valve Imaging with COMB Tag Tracking and Super-Resolution Enhancement . . . 364
Andrew W. Dowsey, Jennifer Keegan, Mirna Lerotic, Simon Thom, David Firmin, Guang-Zhong Yang

Recovery of Liver Motion and Deformation Due to Respiration Using Laparoscopic Freehand 3D Ultrasound System . . . 372
Masahiko Nakamoto, Hiroaki Hirayama, Yoshinobu Sato, Kozo Konishi, Yoshihiro Kakeji, Makoto Hashizume, Shinichi Tamura
Image Guided Intervention

Numerical Simulation of Radio Frequency Ablation with State Dependent Material Parameters in Three Space Dimensions . . . 380
Tim Kröger, Inga Altrogge, Tobias Preusser, Philippe L. Pereira, Diethard Schmidt, Andreas Weihusen, Heinz-Otto Peitgen

Towards a Multi-modal Atlas for Neurosurgical Planning . . . 389
M. Mallar Chakravarty, Abbas F. Sadikot, Sanjay Mongia, Gilles Bertrand, D. Louis Collins
Using Registration Uncertainty Visualization in a User Study of a Simple Surgical Task . . . 397
Amber L. Simpson, Burton Ma, Elvis C.S. Chen, Randy E. Ellis, A. James Stewart

Ultrasound Monitoring of Tissue Ablation Via Deformation Model and Shape Priors . . . 405
Emad Boctor, Michelle deOliveira, Michael Choti, Roger Ghanem, Russell Taylor, Gregory Hager, Gabor Fichtinger
Clinical Applications II

Assessment of Airway Remodeling in Asthma: Volumetric Versus Surface Quantification Approaches . . . 413
Amaury Saragaglia, Catalin Fetita, Françoise Prêteux

Asymmetry of SPECT Perfusion Image Patterns as a Diagnostic Feature for Alzheimer's Disease . . . 421
Vassili A. Kovalev, Lennart Thurfjell, Roger Lundqvist, Marco Pagani

Predicting the Effects of Deep Brain Stimulation with Diffusion Tensor Based Electric Field Models . . . 429
Christopher R. Butson, Scott E. Cooper, Jaimie M. Henderson, Cameron C. McIntyre

CFD Analysis Incorporating the Influence of Wall Motion: Application to Intracranial Aneurysms . . . 438
Laura Dempere-Marco, Estanislao Oubel, Marcelo Castro, Christopher Putman, Alejandro Frangi, Juan Cebral

A New CAD System for the Evaluation of Kidney Diseases Using DCE-MRI . . . 446
Ayman El-Baz, Rachid Fahmi, Seniha Yuksel, Aly A. Farag, William Miller, Mohamed A. El-Ghar, Tarek Eldiasty

Generation and Application of a Probabilistic Breast Cancer Atlas . . . 454
Daniel B. Russakoff, Akira Hasegawa

Hierarchical Part-Based Detection of 3D Flexible Tubes: Application to CT Colonoscopy . . . 462
Adrian Barbu, Luca Bogoni, Dorin Comaniciu
Detection of Protrusions in Curved Folded Surfaces Applied to Automated Polyp Detection in CT Colonography . . . 471
Cees van Wijk, Vincent F. van Ravesteijn, Frank M. Vos, Roel Truyen, Ayso H. de Vries, Jaap Stoker, Lucas J. van Vliet

Part-Based Local Shape Models for Colon Polyp Detection . . . 479
Rahul Bhotika, Paulo R.S. Mendonça, Saad A. Sirohey, Wesley D. Turner, Ying-lin Lee, Julie M. McCoy, Rebecca E.B. Brown, James V. Miller

An Analysis of Early Studies Released by the Lung Imaging Database Consortium (LIDC) . . . 487
Wesley D. Turner, Timothy P. Kelliher, James C. Ross, James V. Miller

Detecting Acromegaly: Screening for Disease with a Morphable Model . . . 495
Erik Learned-Miller, Qifeng Lu, Angela Paisley, Peter Trainer, Volker Blanz, Katrin Dedden, Ralph Miller

A Boosting Cascade for Automated Detection of Prostate Cancer from Digitized Histology . . . 504
Scott Doyle, Anant Madabhushi, Michael Feldman, John Tomaszeweski

Optimal Sensor Placement for Predictive Cardiac Motion Modeling . . . 512
Qian Wu, Adrian J. Chung, Guang-Zhong Yang

4D Shape Registration for Dynamic Electrophysiological Cardiac Mapping . . . 520
Kevin Wilson, Gerard Guiraudon, Doug Jones, Terry M. Peters

Estimation of Cardiac Electrical Propagation from Medical Image Sequence . . . 528
Heye Zhang, Chun Lok Wong, Pengcheng Shi

Ultrasound-Guided Percutaneous Scaphoid Pinning: Operator Variability and Comparison with Traditional Fluoroscopic Procedure . . . 536
Maarten Beek, Purang Abolmaesumi, Suriya Luenam, Richard W. Sellens, David R. Pichora

Cosmology Inspired Design of Biomimetic Tissue Engineering Templates with Gaussian Random Fields . . . 544
Srinivasan Rajagopalan, Richard A. Robb
Registration of Microscopic Iris Image Sequences Using Probabilistic Mesh . . . 553
Xubo B. Song, Andriy Myronenko, Stephen R. Plank, James T. Rosenbaum

Tumor Therapeutic Response and Vessel Tortuosity: Preliminary Report in Metastatic Breast Cancer . . . 561
Elizabeth Bullitt, Nancy U. Lin, Matthew G. Ewend, Donglin Zeng, Eric P. Winer, Lisa A. Carey, J. Keith Smith

Harvesting the Thermal Cardiac Pulse Signal . . . 569
Nanfei Sun, Ioannis Pavlidis, Marc Garbey, Jin Fei

On Mobility Analysis of Functional Sites from Time Lapse Microscopic Image Sequences of Living Cell Nucleus . . . 577
Lopamudra Mukherjee, Vikas Singh, Jinhui Xu, Kishore S. Malyavantham, Ronald Berezney

Tissue Characterization Using Dimensionality Reduction and Fluorescence Imaging . . . 586
Karim Lekadir, Daniel S. Elson, Jose Requejo-Isidro, Christopher Dunsby, James McGinty, Neil Galletly, Gordon Stamp, Paul M.W. French, Guang-Zhong Yang
Registration II

A Method for Registering Diffusion Weighted Magnetic Resonance Images . . . 594
Xiaodong Tao, James V. Miller

A High-Order Solution for the Distribution of Target Registration Error in Rigid-Body Point-Based Registration . . . 603
Mehdi Hedjazi Moghari, Purang Abolmaesumi

Fast Elastic Registration for Adaptive Radiotherapy . . . 612
Urban Malsch, Christian Thieke, Rolf Bendl

Registering Histological and MR Images of Prostate for Image-Based Cancer Detection . . . 620
Yiqiang Zhan, Michael Feldman, John Tomaszeweski, Christos Davatzikos, Dinggang Shen

Affine Registration of Diffusion Tensor MR Images . . . 629
Mika Pollari, Tuomas Neuvonen, Jyrki Lötjönen
Analytic Expressions for Fiducial and Surface Target Registration Error . . . 637
Burton Ma, Randy E. Ellis

Bronchoscope Tracking Based on Image Registration Using Multiple Initial Starting Points Estimated by Motion Prediction . . . 645
Kensaku Mori, Daisuke Deguchi, Takayuki Kitasaka, Yasuhito Suenaga, Hirotsugu Takabatake, Masaki Mori, Hiroshi Natori, Calvin R. Maurer Jr.

2D/3D Registration for Measurement of Implant Alignment After Total Hip Replacement . . . 653
Branislav Jaramaz, Kort Eckman

3D/2D Model-to-Image Registration Applied to TIPS Surgery . . . 662
Julien Jomier, Elizabeth Bullitt, Mark Van Horn, Chetna Pathak, Stephen R. Aylward

Ray-Tracing Based Registration for HRCT Images of the Lungs . . . 670
Sata Busayarat, Tatjana Zrimec

Physics-Based Elastic Image Registration Using Splines and Including Landmark Localization Uncertainties . . . 678
Stefan Wörz, Karl Rohr

Piecewise-Quadrilateral Registration by Optical Flow – Applications in Contrast-Enhanced MR Imaging of the Breast . . . 686
Michael S. Froh, David C. Barber, Kristy K. Brock, Donald B. Plewes, Anne L. Martel

Iconic Feature Registration with Sparse Wavelet Coefficients . . . 694
Pascal Cathier

Diffeomorphic Registration Using B-Splines . . . 702
Daniel Rueckert, Paul Aljabar, Rolf A. Heckemann, Joseph V. Hajnal, Alexander Hammers

Automatic Point Landmark Matching for Regularizing Nonlinear Intensity Registration: Application to Thoracic CT Images . . . 710
Martin Urschler, Christopher Zach, Hendrik Ditt, Horst Bischof

Biomechanically Based Elastic Breast Registration Using Mass Tensor Simulation . . . 718
Liesbet Roose, Wouter Mollemans, Dirk Loeckx, Frederik Maes, Paul Suetens
Intensity Gradient Based Registration and Fusion of Multi-modal Images . . . 726
Eldad Haber, Jan Modersitzki

A Novel Approach for Image Alignment Using a Markov–Gibbs Appearance Model . . . 734
Ayman El-Baz, Asem Ali, Aly A. Farag, Georgy Gimel'farb

Evaluation on Similarity Measures of a Surface-to-Image Registration Technique for Ultrasound Images . . . 742
Wei Shao, Ruoyun Wu, Keck Voon Ling, Choon Hua Thng, Henry Sun Sien Ho, Christopher Wai Sam Cheng, Wan Sing Ng

Backward-Warping Ultrasound Reconstruction for Improving Diagnostic Value and Registration . . . 750
Wolfgang Wein, Fabian Pache, Barbara Röper, Nassir Navab

Integrated Four Dimensional Registration and Segmentation of Dynamic Renal MR Images . . . 758
Ting Song, Vivian S. Lee, Henry Rusinek, Samson Wong, Andrew F. Laine
Segmentation II

Fast and Robust Clinical Triple-Region Image Segmentation Using One Level Set Function . . . 766
Shuo Li, Thomas Fevens, Adam Krzyżak, Chao Jin, Song Li

Fast and Robust Semi-automatic Liver Segmentation with Haptic Interaction . . . 774
Erik Vidholm, Sven Nilsson, Ingela Nyström

Objective PET Lesion Segmentation Using a Spherical Mean Shift Algorithm . . . 782
Thomas B. Sebastian, Ravindra M. Manjeshwar, Timothy J. Akhurst, James V. Miller

Multilevel Segmentation and Integrated Bayesian Model Classification with an Application to Brain Tumor Segmentation . . . 790
Jason J. Corso, Eitan Sharon, Alan Yuille

A New Adaptive Probabilistic Model of Blood Vessels for Segmenting MRA Images . . . 799
Ayman El-Baz, Aly A. Farag, Georgy Gimel'farb, Mohamed A. El-Ghar, Tarek Eldiasty
Segmentation of Thalamic Nuclei from DTI Using Spectral Clustering . . . 807
Ulas Ziyan, David Tuch, Carl-Fredrik Westin

Multiclassifier Fusion in Human Brain MR Segmentation: Modelling Convergence . . . 815
Rolf A. Heckemann, Joseph V. Hajnal, Paul Aljabar, Daniel Rueckert, Alexander Hammers

Active Surface Approach for Extraction of the Human Cerebral Cortex from MRI . . . 823
Simon F. Eskildsen, Lasse R. Østergaard

Integrated Graph Cuts for Brain MRI Segmentation . . . 831
Zhuang Song, Nicholas Tustison, Brian Avants, James C. Gee

Validation of Image Segmentation by Estimating Rater Bias and Variance . . . 839
Simon K. Warfield, Kelly H. Zou, William M. Wells

A General Framework for Image Segmentation Using Ordered Spatial Dependency . . . 848
Mikaël Rousson, Chenyang Xu

Constructing a Probabilistic Model for Automated Liver Region Segmentation Using Non-contrast X-Ray Torso CT images . . . 856
Xiangrong Zhou, Teruhiko Kitagawa, Takeshi Hara, Hiroshi Fujita, Xuejun Zhang, Ryujiro Yokoyama, Hiroshi Kondo, Masayuki Kanematsu, Hiroaki Hoshi

Modeling of Intensity Priors for Knowledge-Based Level Set Algorithm in Calvarial Tumors Segmentation . . . 864
Aleksandra Popovic, Ting Wu, Martin Engelhardt, Klaus Radermacher

A Comparison of Breast Tissue Classification Techniques . . . 872
Arnau Oliver, Jordi Freixenet, Robert Martí, Reyer Zwiggelaar

Analysis of Skeletal Microstructure with Clinical Multislice CT . . . 880
Joel Petersson, Torkel Brismar, Örjan Smedby

An Energy Minimization Approach to the Data Driven Editing of Presegmented Images/Volumes . . . 888
Leo Grady, Gareth Funka-Lea
Accurate Banded Graph Cut Segmentation of Thin Structures Using Laplacian Pyramids . . . 896
Ali Kemal Sinop, Leo Grady

Segmentation of Neck Lymph Nodes in CT Datasets with Stable 3D Mass-Spring Models . . . 904
Jana Dornheim, Heiko Seim, Bernhard Preim, Ilka Hertel, Gero Strauss

Supervised Probabilistic Segmentation of Pulmonary Nodules in CT Scans . . . 912
Bram van Ginneken

MR Image Segmentation Using Phase Information and a Novel Multiscale Scheme . . . 920
Pierrick Bourgeat, Jurgen Fripp, Peter Stanwell, Saadallah Ramadan, Sébastien Ourselin

Multi-resolution Vessel Segmentation Using Normalized Cuts in Retinal Images . . . 928
Wenchao Cai, Albert C.S. Chung
Brain Analysis and Registration

Simulation of Local and Global Atrophy in Alzheimer's Disease Studies . . . 937
Oscar Camara-Rey, Martin Schweiger, Rachael I. Scahill, William R. Crum, Julia A. Schnabel, Derek L.G. Hill, Nick C. Fox

Brain Surface Conformal Parameterization with Algebraic Functions . . . 946
Yalin Wang, Xianfeng Gu, Tony F. Chan, Paul M. Thompson, Shing-Tung Yau

Logarithm Odds Maps for Shape Representation . . . 955
Kilian M. Pohl, John Fisher, Martha Shenton, Robert W. McCarley, W. Eric L. Grimson, Ron Kikinis, William M. Wells

Multi-modal Image Registration Using the Generalized Survival Exponential Entropy . . . 964
Shu Liao, Albert C.S. Chung

Author Index . . . 973
Table of Contents – Part I
Bone Shape Analysis

Quantitative Vertebral Morphometry Using Neighbor-Conditional Shape Models . . . 1
Marleen de Bruijne, Michael T. Lund, László B. Tankó, Paola P. Pettersen, Mads Nielsen

Anatomically Constrained Deformation for Design of Cranial Implant: Methodology and Validation . . . 9
Ting Wu, Martin Engelhardt, Lorenz Fieten, Aleksandra Popovic, Klaus Radermacher

Open-Curve Shape Correspondence Without Endpoint Correspondence . . . 17
Theodor Richardson, Song Wang

Reconstruction of Patient-Specific 3D Bone Surface from 2D Calibrated Fluoroscopic Images and Point Distribution Model . . . 25
Guoyan Zheng, Miguel Á.G. Ballester, Martin Styner, Lutz-Peter Nolte
Robotics and Tracking

A Pilot Study of Robot-Assisted Cochlear Implant Surgery Using Steerable Electrode Arrays . . . 33
Jian Zhang, Kai Xu, Nabil Simaan, Spiros Manolidis

Robot-Assisted Prostate Brachytherapy . . . 41
Yan Yu, Tarun Podder, Yongde Zhang, Wan-Sing Ng, Vladimir Misic, Jason Sherman, Luke Fu, Dave Fuller, Edward Messing, Deborah Rubens, John Strang, Ralph Brasacchio

Design and Validation of an Image-Guided Robot for Small Animal Research . . . 50
Peter Kazanzides, Jenghwa Chang, Iulian Iordachita, Jack Li, C. Clifton Ling, Gabor Fichtinger

GPU Based Real-Time Instrument Tracking with Three Dimensional Ultrasound . . . 58
Paul M. Novotny, Jeffrey A. Stoll, Nikolay V. Vasilyev, Pedro J. del Nido, Pierre E. Dupont, Robert D. Howe
Segmentation

Shape-Driven 3D Segmentation Using Spherical Wavelets . . . 66
Delphine Nain, Steven Haker, Aaron Bobick, Allen Tannenbaum

Artificially Enlarged Training Set in Image Segmentation . . . 75
Tuomas Tölli, Juha Koikkalainen, Kirsi Lauerma, Jyrki Lötjönen

Segmenting Lung Fields in Serial Chest Radiographs Using Both Population and Patient-Specific Shape Statistics . . . 83
Yonghong Shi, Feihu Qi, Zhong Xue, Kyoko Ito, Hidenori Matsuo, Dinggang Shen

4D Shape Priors for a Level Set Segmentation of the Left Myocardium in SPECT Sequences . . . 92
Timo Kohlberger, Daniel Cremers, Mikaël Rousson, Ramamani Ramaraj, Gareth Funka-Lea
Cell Segmentation Using Coupled Level Sets and Graph-Vertex Coloring . . . 101
Sumit K. Nath, Kannappan Palaniappan, Filiz Bunyak
Analysis of Diffusion Tensor MRI

3D Histological Reconstruction of Fiber Tracts and Direct Comparison with Diffusion Tensor MRI Tractography . . . 109
Julien Dauguet, Sharon Peled, Vladimir Berezovskii, Thierry Delzescaux, Simon K. Warfield, Richard Born, Carl-Fredrik Westin

Rician Noise Removal in Diffusion Tensor MRI . . . 117
Saurav Basu, P. Thomas Fletcher, Ross T. Whitaker

Anisotropy Creases Delineate White Matter Structure in Diffusion Tensor MRI . . . 126
Gordon Kindlmann, Xavier Tricoche, Carl-Fredrik Westin
Shape Analysis and Morphometry

Evaluation of 3-D Shape Reconstruction of Retinal Fundus . . . 134
Tae Eun Choe, Isaac Cohen, Gerard Medioni, Alexander C. Walsh, SriniVas R. Sadda
Comparing the Similarity of Statistical Shape Models Using the Bhattacharya Metric . . . 142
Kolawole O. Babalola, Tim F. Cootes, Brian Patenaude, Anil Rao, Mark Jenkinson

Improving Segmentation of the Left Ventricle Using a Two-Component Statistical Model . . . 151
Sebastian Zambal, Jiří Hladůvka, Katja Bühler

An Approach for the Automatic Cephalometric Landmark Detection Using Mathematical Morphology and Active Appearance Models . . . 159
Sylvia Rueda, Mariano Alcañiz

Automatic Segmentation of Jaw Tissues in CT Using Active Appearance Models and Semi-automatic Landmarking . . . 167
Sylvia Rueda, José Antonio Gil, Raphaël Pichery, Mariano Alcañiz

Morphometric Analysis for Pathological Abnormality Detection in the Skull Vaults of Adolescent Idiopathic Scoliosis Girls . . . 175
Lin Shi, Pheng Ann Heng, Tien-Tsin Wong, Winnie C.W. Chu, Benson H.Y. Yeung, Jack C.Y. Cheng

A Novel Quantitative Validation of the Cortical Surface Reconstruction Algorithm Using MRI Phantom: Issues on Local Geometric Accuracy and Cortical Thickness . . . 183
Junki Lee, Jong-Min Lee, Jae-Hun Kim, In Young Kim, Alan C. Evans, Sun I. Kim

Multivariate Statistics of the Jacobian Matrices in Tensor Based Morphometry and Their Application to HIV/AIDS . . . 191
Natasha Lepore, Caroline A. Brun, Ming-Chang Chiang, Yi-Yu Chou, Rebecca A. Dutton, Kiralee M. Hayashi, Oscar L. Lopez, Howard J. Aizenstein, Arthur W. Toga, James T. Becker, Paul M. Thompson

Highly Accurate Segmentation of Brain Tissue and Subcortical Gray Matter from Newborn MRI . . . 199
Neil I. Weisenfeld, Andrea U.J. Mewes, Simon K. Warfield

Transformation Model and Constraints Cause Bias in Statistics on Deformation Fields . . . 207
Torsten Rohlfing

Limits on Estimating the Width of Thin Tubular Structures in 3D Images . . . 215
Stefan Wörz, Karl Rohr
Toward Interactive User Guiding Vessel Axis Extraction from Gray-scale Angiograms: An Optimization Framework . . . 223
Wilbur C.K. Wong, Albert C.S. Chung

A Statistical Parts-Based Appearance Model of Inter-subject Variability . . . 232
Matthew Toews, D. Louis Collins, Tal Arbel

The Entire Regularization Path for the Support Vector Domain Description . . . 241
Karl Sjöstrand, Rasmus Larsen

A New Closed-Form Information Metric for Shape Analysis . . . 249
Adrian Peter, Anand Rangarajan
Simulation and Interaction

Feasibility of Patient Specific Aortic Blood Flow CFD Simulation . . . 257
Johan Svensson, Roland Gårdhagen, Einar Heiberg, Tino Ebbers, Dan Loyd, Toste Länne, Matts Karlsson

A Model Based Approach for Multi-lead ECG Array Layout Selection . . . 264
Christoph Hintermüller, Michael Seger, Bernhard Pfeifer, Gerald Fischer, Bernhard Tilg

Simulation of Acquisition Artefacts in MR Scans: Effects on Automatic Measures of Brain Atrophy . . . 272
Oscar Camara-Rey, Beatrix I. Sneller, Gerard R. Ridgway, Ellen Garde, Nick C. Fox, Derek L.G. Hill

Non-rigid 2D-3D Registration with Catheter Tip EM Tracking for Patient Specific Bronchoscope Simulation . . . 281
Fani Deligianni, Adrian J. Chung, Guang-Zhong Yang

Anatomical Modelling of the Musculoskeletal System from MRI . . . 289
Benjamin Gilles, Laurent Moccozet, Nadia Magnenat-Thalmann

Towards a Statistical Atlas of Cardiac Fiber Structure . . . 297
Jean-Marc Peyrat, Maxime Sermesant, Xavier Pennec, Hervé Delingette, Chenyang Xu, Elliot McVeigh, Nicholas Ayache

A Comparison of Needle Bending Models . . . 305
Ehsan Dehghan, Orcun Goksel, Septimiu E. Salcudean
An Inverse Kinematics Model for Post-operative Knee . . . 313
Elvis C.S. Chen, Randy E. Ellis

Online Parameter Estimation for Surgical Needle Steering Model . . . 321
Kai Guo Yan, Tarun Podder, Di Xiao, Yan Yu, Tien-I Liu, Keck Voon Ling, Wan Sing Ng

Realistic Simulated MRI and SPECT Databases . . . 330
Berengere Aubert-Broche, Christophe Grova, Anthonin Reilhac, Alan C. Evans, D. Louis Collins

Extrapolating Tumor Invasion Margins for Physiologically Determined Radiotherapy Regions . . . 338
Ender Konukoğlu, Olivier Clatz, Pierre-Yves Bondiau, Hervé Delingette, Nicholas Ayache

Simultaneous Stereoscope Localization and Soft-Tissue Mapping for Minimal Invasive Surgery . . . 347
Peter Mountney, Danail Stoyanov, Andrew Davison, Guang-Zhong Yang

Real-Time Endoscopic Mosaicking . . . 355
Sharmishtaa Seshamani, William Lau, Gregory Hager

Depth Perception - A Major Issue in Medical AR: Evaluation Study by Twenty Surgeons . . . 364
Tobias Sielhorst, Christoph Bichlmeier, Sandro Michael Heining, Nassir Navab

Hybrid Navigation Interface for Orthopedic and Trauma Surgery . . . 373
Joerg Traub, Philipp Stefan, Sandro Michael Heining, Tobias Sielhorst, Christian Riquarts, Ekkehard Euler, Nassir Navab

Virtual Fly-Over: A New Visualization Technique for Virtual Colonoscopy . . . 381
M. Sabry Hassouna, Aly A. Farag, Robert Falk

Viscoelasticity Modeling of the Prostate Region Using Vibro-elastography . . . 389
Septimiu E. Salcudean, Daniel French, Simon Bachmann, Reza Zahiri-Azar, Xu Wen, W. James Morris

Simultaneous Reconstruction of Tissue Attenuation and Radioactivity Maps in SPECT . . . 397
Yi Tian, Huafeng Liu, Pengcheng Shi
Statistical Finite Element Model for Bone Shape and Biomechanical Properties . . . 405
Laura Belenguer-Querol, Philippe Büchler, Daniel Rueckert, Lutz-Peter Nolte, Miguel Á. Gonzales Ballester
Robotics and Intervention

Fetus Support Manipulator with Flexible Balloon-Based Stabilizer for Endoscopic Intrauterine Surgery . . . 412
Hongen Liao, Hirokazu Suzuki, Kiyoshi Matsumiya, Ken Masamune, Takeyoshi Dohi, Toshio Chiba

Recovery of Surgical Workflow Without Explicit Models . . . 420
Seyed-Ahmad Ahmadi, Tobias Sielhorst, Ralf Stauder, Martin Horn, Hubertus Feussner, Nassir Navab

Comparison of Control Modes of a Hand-Held Robot for Laparoscopic Surgery . . . 429
Oliver Tonet, Francesco Focacci, Marco Piccigallo, Filippo Cavallo, Miyuki Uematsu, Giuseppe Megali, Paolo Dario

"Virtual Touch": An Efficient Registration Method for Catheter Navigation in Left Atrium . . . 437
Hua Zhong, Takeo Kanade, David Schwartzman

Towards Scarless Surgery: An Endoscopic-Ultrasound Navigation System for Transgastric Access Procedures . . . 445
Raúl San José Estépar, Nicholas Stylopoulos, Randy E. Ellis, Eigil Samset, Carl-Fredrik Westin, Christopher Thompson, Kirby Vosburgh

New 4-D Imaging for Real-Time Intraoperative MRI: Adaptive 4-D Scan . . . 454
Junichi Tokuda, Shigehiro Morikawa, Hasnine A. Haque, Tetsuji Tsukamoto, Kiyoshi Matsumiya, Hongen Liao, Ken Masamune, Takeyoshi Dohi

The Use of Super Resolution in Robotic Assisted Minimally Invasive Surgery . . . 462
Mirna Lerotic, Guang-Zhong Yang

Modeling the Human Aorta for MR-Driven Real-Time Virtual Endoscopy . . . 470
Klaus J. Kirchberg, Andreas Wimmer, Christine H. Lorenz
Adaptive Script Based Animations for Intervention Planning . . . . . . . . . . . 478
   Konrad Muehler, Ragnar Bade, Bernhard Preim
Towards Optimization of Probe Placement for Radio-Frequency Ablation . . . . . . . . . . 486
   Inga Altrogge, Tim Kröger, Tobias Preusser, Christof Büskens, Philippe L. Pereira, Diethard Schmidt, Andreas Weihusen, Heinz-Otto Peitgen
C-arm Tracking and Reconstruction Without an External Tracker . . . . . . . 494
   Ameet Jain, Gabor Fichtinger
Rigid-Flexible Outer Sheath Model Using Slider Linkage Locking Mechanism and Air Pressure for Endoscopic Surgery . . . . . . . . . . 503
   Akihiko Yagi, Kiyoshi Matsumiya, Ken Masamune, Hongen Liao, Takeyoshi Dohi
Combined Endo- and Exoscopic Semi-robotic Manipulator System for Image Guided Operations . . . . . . . . . . 511
   Stefanos Serefoglou, Wolfgang Lauer, Axel Perneczky, Theodor Lutze, Klaus Radermacher
The Feasibility of MR-Image Guided Prostate Biopsy Using Piezoceramic Motors Inside or Near to the Magnet Isocentre . . . . . . . . . . 519
   Haytham Elhawary, Aleksander Zivanovic, Marc Rea, Brian Davies, Collin Besant, Donald McRobbie, Nandita de Souza, Ian Young, Michael Lampérth
The Role of Insertion Points in the Detection and Positioning of Instruments in Laparoscopy for Robotic Tasks . . . . . . . . . . 527
   Christophe Doignon, Florent Nageotte, Michel de Mathelin
Automatic Localization of Laparoscopic Instruments for the Visual Servoing of an Endoscopic Camera Holder . . . . . . . . . . 535
   Sandrine Voros, Jean-Alexandre Long, Philippe Cinquin
A Novel Robotic Laser Ablation System for Precision Neurosurgery with Intraoperative 5-ALA-Induced PpIX Fluorescence Detection . . . . . . . . . . 543
   Masafumi Noguchi, Eisuke Aoki, Daiki Yoshida, Etsuko Kobayashi, Shigeru Omori, Yoshihiro Muragaki, Hiroshi Iseki, Katsushige Nakamura, Ichiro Sakuma
Visual Servoing for Intraoperative Positioning and Repositioning of Mobile C-arms . . . . . . . . . . 551
   Nassir Navab, Stefan Wiesner, Selim Benhimane, Ekkehard Euler, Sandro Michael Heining
Navigated Three Dimensional Beta Probe for Optimal Cancer Resection . . . . . . . . . . 561
   Thomas Wendler, Joerg Traub, Sibylle Ilse Ziegler, Nassir Navab
Development of Safe Mechanism for Surgical Robots Using Equilibrium Point Control Method . . . . . . . . . . 570
   Shinsuk Park, Hokjin Lim, Byeong-sang Kim, Jae-bok Song
Real Time Adaptive Filtering for Digital X-Ray Applications . . . . . . . . . . . 578
   Olivier Bockenbach, Michel Mangin, Sebastian Schuberth
Cardio-vascular Applications

Semiautomatic Volume Conductor Modeling Pipeline for Imaging the Cardiac Electrophysiology Noninvasively . . . . . . . . . . 588
   Bernhard Pfeifer, Michael Seger, Christoph Hintermüller, Gerald Fischer, Friedrich Hanser, Robert Modre, Hannes Mühlthaler, Bernhard Tilg
Atrial Septal Defect Tracking in 3D Cardiac Ultrasound . . . . . . . . . . . . . . . 596
   Marius George Linguraru, Nikolay V. Vasilyev, Pedro J. del Nido, Robert D. Howe
Intra-operative Volume Imaging of the Left Atrium and Pulmonary Veins with Rotational X-Ray Angiography . . . . . . . . . . 604
   Robert Manzke, Vivek Y. Reddy, Sandeep Dalal, Annemarie Hanekamp, Volker Rasche, Raymond C. Chan
Phase-Based Registration of Multi-view Real-Time Three-Dimensional Echocardiographic Sequences . . . . . . . . . . 612
   Vicente Grau, Harald Becher, J. Alison Noble
Carotid Artery Segmentation Using an Outlier Immune 3D Active Shape Models Framework . . . . . . . . . . 620
   Karim Lekadir, Guang-Zhong Yang
Estimation of Cardiac Hyperelastic Material Properties from MRI Tissue Tagging and Diffusion Tensor Imaging . . . . . . . . . . 628
   Kevin F. Augenstein, Brett R. Cowan, Ian J. LeGrice, Alistair A. Young
Boosting and Nonparametric Based Tracking of Tagged MRI Cardiac Boundaries . . . . . . . . . . 636
   Zhen Qian, Dimitris N. Metaxas, Leon Axel
A Region Based Algorithm for Vessel Detection in Retinal Images . . . . . . . 645
   Ke Huang, Michelle Yan
Carotid Artery and Jugular Vein Tracking and Differentiation Using Spatiotemporal Analysis . . . . . . . . . . 654
   David Wang, Roberta Klatzky, Nikhil Amesur, George Stetten
Image Analysis in Oncology

Appearance Models for Robust Segmentation of Pulmonary Nodules in 3D LDCT Chest Images . . . . . . . . . . 662
   Aly A. Farag, Ayman El-Baz, Georgy Gimel'farb, Robert Falk, Mohamed A. El-Ghar, Tarek Eldiasty, Salwa Elshazly
Intensity-Based Volumetric Registration of Contrast-Enhanced MR Breast Images . . . . . . . . . . 671
   Yin Sun, Chye Hwang Yan, Sim-Heng Ong, Ek Tsoon Tan, Shih-Chang Wang
Semi-parametric Analysis of Dynamic Contrast-Enhanced MRI Using Bayesian P-Splines . . . . . . . . . . 679
   Volker J. Schmid, Brandon Whitcher, Guang-Zhong Yang
Brain Atlases and Segmentation

Segmentation of Brain MRI in Young Children . . . . . . . . . . . . . . . . . . . . . . . 687
   Maria Murgasova, Leigh Dyet, David Edwards, Mary Rutherford, Joseph V. Hajnal, Daniel Rueckert
A Learning Based Algorithm for Automatic Extraction of the Cortical Sulci . . . . . . . . . . 695
   Songfeng Zheng, Zhuowen Tu, Alan L. Yuille, Allan L. Reiss, Rebecca A. Dutton, Agatha D. Lee, Albert M. Galaburda, Paul M. Thompson, Ivo Dinov, Arthur W. Toga
Probabilistic Brain Atlas Encoding Using Bayesian Inference . . . . . . . . . . . 704
   Koen Van Leemput
Atlas Stratification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 712
   Daniel J. Blezek, James V. Miller
Cardiac Motion Analysis

Physiome Model Based State-Space Framework for Cardiac Kinematics Recovery . . . . . . . . . . 720
   Ken C.L. Wong, Heye Zhang, Huafeng Liu, Pengcheng Shi
Automated Detection of Left Ventricle in 4D MR Images: Experience from a Large Study . . . . . . . . . . 728
   Xiang Lin, Brett R. Cowan, Alistair A. Young
Pairwise Active Appearance Model and Its Application to Echocardiography Tracking . . . . . . . . . . 736
   S. Kevin Zhou, Jie Shao, Bogdan Georgescu, Dorin Comaniciu
Cardiac Motion Recovery: Continuous Dynamics, Discrete Measurements, and Optimal Estimation . . . . . . . . . . 744
   Shan Tong, Pengcheng Shi
Clinical Applications I

HMM Assessment of Quality of Movement Trajectory in Laparoscopic Surgery . . . . . . . . . . 752
   Julian J.H. Leong, Marios Nicolaou, Louis Atallah, George P. Mylonas, Ara W. Darzi, Guang-Zhong Yang
A Novel MRI Texture Analysis of Demyelination and Inflammation in Relapsing-Remitting Experimental Allergic Encephalomyelitis . . . . . . . . . . 760
   Yunyan Zhang, Jennifer Wells, Richard Buist, James Peeling, V. Wee Yong, J. Ross Mitchell
Comparison of Different Targeting Methods for Subthalamic Nucleus Deep Brain Stimulation . . . . . . . . . . 768
   Ting Guo, Kirk W. Finnis, Sean C.L. Deoni, Andrew G. Parrent, Terry M. Peters
Objective Outcome Evaluation of Breast Surgery . . . . . . . . . . . . . . . . . . . . . 776
   Giovanni Maria Farinella, Gaetano Impoco, Giovanni Gallo, Salvatore Spoto, Giuseppe Catanuto, Maurizio B. Nava
Automatic Detection and Segmentation of Ground Glass Opacity Nodules . . . . . . . . . . 784
   Jinghao Zhou, Sukmoon Chang, Dimitris N. Metaxas, Binsheng Zhao, Lawrence H. Schwartz, Michelle S. Ginsberg
Imaging of 3D Cardiac Electrical Activity: A Model-Based Recovery Framework . . . . . . . . . . 792
   Linwei Wang, Heye Zhang, Pengcheng Shi, Huafeng Liu
Segmentation of the Surfaces of the Retinal Layer from OCT Images . . . . 800
   Mona Haeker, Michael Abràmoff, Randy Kardon, Milan Sonka
Spinal Crawlers: Deformable Organisms for Spinal Cord Segmentation and Analysis . . . . . . . . . . 808
   Chris McIntosh, Ghassan Hamarneh
Markerless Endoscopic Registration and Referencing . . . . . . . . . . . . . . . . . . 816
   Christian Wengert, Philippe C. Cattin, John M. Duff, Charles Baur, Gábor Székely
Real-Time Tracking of Contrast Bolus Propagation in Continuously Moving Table MR Angiography . . . . . . . . . . 824
   Joshua Trzasko, Stephen Riederer, Armando Manduca
Preventing Signal Degradation During Elastic Matching of Noisy DCE-MR Eye Images . . . . . . . . . . 832
   Kishore Mosaliganti, Guang Jia, Johannes Heverhagen, Raghu Machiraju, Joel Saltz, Michael Knopp
Automated Analysis of the Mitotic Phases of Human Cells in 3D Fluorescence Microscopy Image Sequences . . . . . . . . . . 840
   Nathalie Harder, Felipe Mora-Bermúdez, William J. Godinez, Jan Ellenberg, Roland Eils, Karl Rohr
Registration I

Spline-Based Probabilistic Model for Anatomical Landmark Detection . . . 849
   Camille Izard, Bruno Jedynak, Craig E.L. Stark
Affine and Deformable Registration Based on Polynomial Expansion . . . . 857
   Gunnar Farnebäck, Carl-Fredrik Westin
Simultaneous Multiple Image Registration Method for T1 Estimation in Breast MRI Images . . . . . . . . . . 865
   Jonathan Lok-Chuen Lo, Michael Brady, Niall Moore
New CTA Protocol and 2D-3D Registration Method for Liver Catheterization . . . . . . . . . . 873
   Martin Groher, Nicolas Padoy, Tobias F. Jakobs, Nassir Navab
A New Registration/Visualization Paradigm for CT-Fluoroscopy Guided RF Liver Ablation . . . . . . . . . . 882
   Ruxandra Micu, Tobias F. Jakobs, Martin Urschler, Nassir Navab
A New Method for CT to Fluoroscope Registration Based on Unscented Kalman Filter . . . . . . . . . . 891
   Ren Hui Gong, A. James Stewart, Purang Abolmaesumi
Automated 3D Freehand Ultrasound Calibration with Real-Time Accuracy Control . . . . . . . . . . 899
   Thomas Kuiran Chen, Purang Abolmaesumi, Adrian D. Thurston, Randy E. Ellis
Non-rigid Registration of 3D Multi-channel Microscopy Images of Cell Nuclei . . . . . . . . . . 907
   Siwei Yang, Daniela Köhler, Kathrin Teller, Thomas Cremer, Patricia Le Baccon, Edith Heard, Roland Eils, Karl Rohr
Fast Deformable Registration of 3D-Ultrasound Data Using a Variational Approach . . . . . . . . . . 915
   Darko Zikic, Wolfgang Wein, Ali Khamene, Dirk-André Clevert, Nassir Navab
A Log-Euclidean Framework for Statistics on Diffeomorphisms . . . . . . . . . 924
   Vincent Arsigny, Olivier Commowick, Xavier Pennec, Nicholas Ayache
Nonrigid 3D Brain Registration Using Intensity/Feature Information . . . . 932
   Christine DeLorenzo, Xenophon Papademetris, Kun Wu, Kenneth P. Vives, Dennis Spencer, James S. Duncan

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 941
Robust Active Shape Models: A Robust, Generic and Simple Automatic Segmentation Tool

Julien Abi-Nahed1,2, Marie-Pierre Jolly1, and Guang-Zhong Yang2

1 Imaging and Visualization Department, Siemens Corporate Research, Princeton, New Jersey, USA
2 Royal Society/Wolfson Foundation Medical Image Computing Laboratory, Imperial College London, London, UK
{julien.abi.nahed, marie-pierre.jolly}@siemens.com, {julien.nahed, g.z.yang}@imperial.ac.uk
Abstract. This paper presents a new segmentation algorithm which combines active shape model and robust point matching techniques. It can use any simple feature detector to extract a large number of feature points in the image. Robust point matching is then used to search for the correspondences between feature and model points while the model is being deformed along the modes of variation of the active shape model. Although the algorithm is generic, it is particularly suited for medical imaging applications where prior knowledge is available. The value of the proposed method is examined with two different medical imaging modalities (Ultrasound, MRI) and in both 2D and 3D. The experiments have shown that the proposed algorithm is immune to missing feature points and noise. It has demonstrated significant improvements when compared to RPM-TPS and ASM alone.
1 Introduction
The Active Shape Model (ASM) [1] paradigm is a popular method for image segmentation when a priori information about the shape of the object of interest is available. The effectiveness of ASM depends on the ability to extract the correct feature points from the data. In fact, the model is represented by a collection of points, each trying to find a corresponding feature point in its vicinity. Ambiguity occurs when multiple feature points are possible candidates for a single model point. In this case, the model points do not have enough information to choose their correct partners, especially when the initialization is far away from the correct solution. In other words, poor segmentation results are often introduced when features are chosen locally for each model point. In practice, it is easy to extract a range of candidate feature points for each model point, but the process of model/feature matching while rejecting outliers is not trivial. There is a rich literature dealing with the robustness of ASM. Early methods tried to include intensity information, leading to different versions of Active Appearance Models (AAM) [2]. From our experience, however, training and fitting AAM is more tedious, and over-fitting due to the higher dimensionality of the model
can be a significant issue for practical applications. Realizing the dramatic effect of outliers on ASM's local search, Rogers et al. [3] tried to estimate ASM parameters using robust estimators instead of the traditional least squares. The idea is very interesting, since least squares is not the optimal estimator when the distribution of the residuals between model and feature points is not Gaussian. Despite the increased robustness achieved, the technique is still based on a local search for detecting feature points, thus leading to local minima. On the other hand, the original Robust Point Matching (RPM) algorithm [4] is a non-rigid point matching algorithm which allows a global-to-local search for the correspondences between two point sets. An important characteristic of the RPM framework is that it allows the incorporation of deformations using any method. In [4], thin-plate splines were used to introduce deformations, leading to the RPM-TPS algorithm. Also, RPM has the ability to match point sets of arbitrary size by rejecting outliers from both sets. Motivated by these observations, we combined the two complementary techniques, namely RPM and ASM, into Robust Active Shape Models (RASM), where the deformations are controlled by ASM. We demonstrate its practical value by applying the proposed method in 2D and 3D, using in vivo data acquired with different medical scanners: Ultrasound (US) and Magnetic Resonance Imaging (MRI). The results demonstrate the practical value of the algorithm in automatic image segmentation. The rest of the paper is organized as follows. In Section 2 we describe the RASM algorithm in detail. Section 3 presents experiments and results. Finally, in Section 4 we conclude and discuss future work.
2 Methodology
Following from the active shape model algorithm [1], we encode the template using a Point Distribution Model (PDM) consisting of a mean shape $\bar{S}$ which deforms along the main modes of variation recovered after applying Principal Component Analysis (PCA) on the shapes in the training set. An instance $S$ of the model is represented by $S = \bar{S} + Pb$, where the columns of the matrix $P$ are the $t$ modes of variation we choose to retain, and $b$ is a vector representing the contribution of each mode of variation in a given instance (the $b$'s are also called the shape parameters). Our goal is to match the template $S$ with the features extracted from the scene. The problem can therefore be formulated as: given two sets of points $S = \{S_i = (S_i^x, S_i^y, S_i^z),\ i = 1, \dots, M\}$ and $Y = \{Y_j = (Y_j^x, Y_j^y, Y_j^z),\ j = 1, \dots, F\}$, align the two point sets by recovering the correspondences and the deformations that would superimpose them in the same reference frame, while rejecting outliers in both data sets (outliers in the features correspond to noise, and outliers in the template correspond to missing features).
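As a minimal illustration of such a Point Distribution Model, the following Python sketch (ours, not the authors' code; the function names are hypothetical) builds the mean shape and the retained PCA modes, and generates a model instance from shape parameters b:

```python
import numpy as np

def train_pdm(shapes, t):
    """Point Distribution Model from aligned training shapes.
    shapes: (N, 2M) array, one flattened M-point contour per row.
    Returns the mean shape S_bar and the first t PCA modes as columns of P."""
    S_bar = shapes.mean(axis=0)
    # PCA via SVD of the centered data matrix
    _, _, Vt = np.linalg.svd(shapes - S_bar, full_matrices=False)
    P = Vt[:t].T                      # (2M, t) matrix of modes
    return S_bar, P

def shape_instance(S_bar, P, b):
    """Model instance S = S_bar + P b for shape parameters b."""
    return S_bar + P @ b
```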
2.1 Energy Formulation
For RASM, we choose to formulate the matching task as an optimization problem, in a way similar to [4]. It is also possible to think of the problem in probabilistic terms; in fact, the two approaches are equivalent, and more details can be found in [5]. RASM jointly estimates correspondences, an affine transformation and deformations. The system maintains a matrix $\mathbf{M}$ of size $(M + 1) \times (F + 1)$ to store both the correspondences and the outliers (as in RPM). The elements $m_{ij}$ of $\mathbf{M}$ are defined as follows ($i = 1, \dots, M + 1$ and $j = 1, \dots, F + 1$):

$$m_{ij} = \begin{cases} \,]0, 1] & \text{if } S_i \text{ matches } Y_j \\ 0 & \text{otherwise} \end{cases}$$

$$m_{M+1,j} = \begin{cases} \,]0, 1] & \text{if } Y_j \text{ is an outlier} \\ 0 & \text{otherwise} \end{cases} \qquad m_{i,F+1} = \begin{cases} \,]0, 1] & \text{if } S_i \text{ is an outlier} \\ 0 & \text{otherwise} \end{cases}$$
The segmentation/matching problem is solved by minimizing the following objective energy function:

$$E(\mathbf{M}, f, b) = \sum_{i=1}^{M+1} \sum_{j=1}^{F+1} m_{ij} \Big\| Y_j - f\Big(\bar{S}_i + \sum_{k=1}^{t} P_{i,k}\, b_k\Big) \Big\|^2 + T \sum_{i=1}^{M} \sum_{j=1}^{F} m_{ij} \log m_{ij} + T_0 \sum_{i=1}^{M} m_{i,F+1} \log m_{i,F+1} + T_0 \sum_{j=1}^{F} m_{M+1,j} \log m_{M+1,j} + T\, \mathrm{trace}\big[(f - I)^{T} (f - I)\big] \qquad (1)$$
The first term corresponds to the geometrical alignment of the two sets, where the $m_{ij}$ are the fuzzy correspondence variables. They take a value between 0 and 1 following the soft-assign technique [4]. The $Y_j$ are the feature points and the $S_i$ the template points. ASM is incorporated into RPM by expressing $S_i$ through the PDM model. The function $f$ represents an affine transformation. The second term is an entropy barrier function; $T$ is a parameter used for deterministic annealing to control the degree of fuzziness of the correspondences $m_{ij}$ ($T_0$ is the initial value of $T$). As the temperature $T$ is decreased, the correspondences harden towards the binary values of 0 and 1. The last term is a regularization term that prevents the affine transformation from unphysical behavior by penalizing the residual part of $f$ different from the identity matrix $I$. This penalization is important at the beginning of the annealing but vanishes as $T$ approaches zero. Unlike the RPM formulation, we no longer need a regularization parameter for the deformations, because the ASM guarantees a valid final shape. Starting with an affine transformation, we activate ASM local deformations once the correspondences are meaningful. Notice how the rejection of outliers is incorporated into the energy function without the need to estimate any additional parameters. In fact, the outliers are modeled by clusters whose centers ($S_{M+1}$ and $Y_{F+1}$) coincide with the centers of mass of the point sets. The clusters' temperature, which can be viewed as a search range [5], is kept high and does not follow any annealing schedule. This is done to allow the rejection of any point that does not show strong evidence of being a valid match throughout the entire annealing process. The last two terms in the geometric alignment part of the energy, along with the third and fourth terms in the energy, represent the outliers incorporated in the
objective function. We found this formulation of the energy to give better results; it implies updating the outliers in $\mathbf{M}$ at each temperature. In order to guarantee that the correspondences are one to one, the following constraints need to be satisfied: $\sum_{j=1}^{F+1} m_{ij} = 1\ \forall i = 1, \dots, M$ and $\sum_{i=1}^{M+1} m_{ij} = 1\ \forall j = 1, \dots, F$. This can be enforced explicitly via the iterative row and column Sinkhorn balancing normalization [4] of the matrix $\mathbf{M}$.
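A minimal Python sketch of this balancing step (ours, not the authors' implementation), assuming the augmented (M+1)-by-(F+1) matrix of Sect. 2.1; the outlier row and column are exempt from the opposite normalization, so they can absorb unmatched mass:

```python
import numpy as np

def sinkhorn_balance(M_mat, n_iter=30):
    """Alternating row/column normalization of the augmented correspondence
    matrix of shape (M+1, F+1).  Only rows of real model points and columns
    of real feature points are normalized to sum to one."""
    M_mat = M_mat.copy()
    for _ in range(n_iter):
        M_mat[:-1, :] /= M_mat[:-1, :].sum(axis=1, keepdims=True)  # rows 1..M
        M_mat[:, :-1] /= M_mat[:, :-1].sum(axis=0, keepdims=True)  # cols 1..F
    return M_mat
```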
2.2 RASM Algorithm
Given the two sets of points $S = \{S_i,\ i = 1, \dots, M\}$ and $Y = \{Y_j,\ j = 1, \dots, F\}$, where $Y$ is the feature set and $S$ the template, we start by setting the ASM shape parameters to zeros so that $S = \bar{S}$. The transformation is set to identity, and the initial/final temperatures and the annealing rate are set beforehand as in [4]. The initial temperature $T_0$ is estimated as the largest square distance between all the points $S_i$ and $Y_j$. The annealing rate (AR) is fixed at 0.95 (but could be anywhere between 0.9 and 0.99). $T_f$ is set to the average square distance between the points of $S$. Like RPM, RASM is a two-step algorithm. In the first step, we evaluate the correspondences by differentiating the energy function w.r.t. $m_{ij}$ and setting the result to zero, which gives the following update equations:

$$m_{ij} = \exp\Big(\frac{-\|Y_j - f(\bar{S}_i + \sum_{k=1}^{t} P_{i,k} b_k)\|^2}{T}\Big) \quad \forall i = 1, \dots, M \text{ and } \forall j = 1, \dots, F$$

$$m_{M+1,j} = \exp\Big(\frac{-\|Y_j - f(S_{M+1})\|^2}{T_0}\Big) \quad \forall j = 1, \dots, F$$

$$m_{i,F+1} = \exp\Big(\frac{-\|Y_{F+1} - f(\bar{S}_i + \sum_{k=1}^{t} P_{i,k} b_k)\|^2}{T_0}\Big) \quad \forall i = 1, \dots, M \qquad (2)$$

In the second step, the affine transformation as well as the deformations are estimated given the correspondences $m_{ij}$. First we create an intermediate point set

$$V = \Big\{ V_i = \frac{\sum_{j=1}^{F} m_{ij} Y_j}{\sum_{j=1}^{F} m_{ij}},\ i = 1, \dots, M \Big\}$$

and minimize the equation:

$$E(f, b) = \sum_{i=1}^{M} W_i \Big\| V_i - f\Big(\bar{S}_i + \sum_{k=1}^{t} P_{i,k}\, b_k\Big) \Big\|^2 \qquad (3)$$
using weighted least squares. We use binary weights to account for outliers in the template. If $W_i = \sum_{j=1}^{F} m_{ij}$ is less than a threshold (e.g. 0.1), the weight is set to zero and the corresponding point is not taken into account at this particular iteration. At the beginning, we only solve for $f$ as an affine transformation and assume $b = 0$. Once the temperature is cool enough, more precisely when $T = T_f + 0.2(T_0 - T_f)$ (i.e. for the last 20% of the process), we introduce ASM deformations. At this point, we jointly solve for $f$ as an affine transformation using weighted least squares and solve for $b$ using the classical ASM iterations [1].
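For orientation, the annealing loop of this two-step algorithm might be sketched as follows. This is our own illustrative Python, reusing sinkhorn_balance from the sketch above; the weighted least-squares affine fit and the ASM update of b (activated for the last 20% of the schedule) are deliberately elided:

```python
import numpy as np

def correspondence_update(S, Y, T, T0):
    """Soft correspondences of Eq. (2).  S: (M, d) current model points,
    Y: (F, d) feature points.  The outlier clusters sit at the centers of
    mass of the point sets and are kept at the initial temperature T0."""
    d2 = ((S[:, None, :] - Y[None, :, :]) ** 2).sum(-1)            # (M, F)
    M_mat = np.empty((len(S) + 1, len(Y) + 1))
    M_mat[:-1, :-1] = np.exp(-d2 / T)
    M_mat[-1, :-1] = np.exp(-((Y - S.mean(0)) ** 2).sum(-1) / T0)  # feature outliers
    M_mat[:-1, -1] = np.exp(-((S - Y.mean(0)) ** 2).sum(-1) / T0)  # model outliers
    M_mat[-1, -1] = 1e-6                                           # unused corner
    return M_mat

def rasm_outline(S_bar, P, Y, T0, Tf, rate=0.95):
    """Deterministic-annealing skeleton of RASM (Sect. 2.2); f and b updates
    are sketched only as comments."""
    b = np.zeros(P.shape[1])
    T = T0
    while T > Tf:
        S = S_bar + (P @ b).reshape(S_bar.shape)      # current model instance
        M_mat = sinkhorn_balance(correspondence_update(S, Y, T, T0))
        W = M_mat[:-1, :-1].sum(axis=1)               # per-point weights W_i
        V = (M_mat[:-1, :-1] @ Y) / np.maximum(W, 1e-12)[:, None]
        keep = W > 0.1                                # binary outlier weights
        # ... weighted least-squares affine fit of S[keep] to V[keep] here,
        # ... and, once T < Tf + 0.2*(T0 - Tf), the classical ASM update of b
        T *= rate
    return S
```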
Fig. 1. A typical evolution of the RASM algorithm when applied to the left ventricle feature points extracted from a B-mode 2-chamber ultrasound image. Crosses are model points, dots are feature points, gray points are outliers. (a) Initialization; (b) alignment after an affine transformation; (c) final shape recovered by RASM; and (d) final shape recovered by RPM-TPS.
3 Experimental Results

3.1 Left Ventricle in 2D Cardiac Ultrasound
B-mode 2-chamber view echocardiography images were acquired with an Acuson Sequoia 256 scanner (Mountain View, CA). From 33 patients, we trained two ASM models of the left ventricle (at ED and ES) using [6]. The derived models were subsequently applied to 20 patients with known cardiac dysfunction. A typical RASM example is shown in Figure 1, where we follow the evolution of the ED model as it aligns to the feature points. The feature detection was performed using the dedicated technique proposed in [7]. Figures 1(a)-(c) illustrate the initialization of the algorithm, the intermediate result with an affine transformation, and the final result with an affine transformation and ASM deformations. Figure 1(d) shows the result obtained when applying RPM-TPS. This example illustrates a major strength of the algorithm. On the left side of the endocardial border (the anterior wall), two major subsets of feature points can be observed. The ASM search algorithm yields feature points from a mixture of these two subsets, and the resultant shape is a compromise between the true and outlier feature points. With RASM, the outliers can be correctly identified, resulting in a more faithful depiction of the endocardial border. Moreover, the global shape is preserved and the missing feature points (near the apex, for example) are accounted for through the outliers in the template. The RPM-TPS algorithm (Matlab implementation available on the web) results in a shape that does not resemble a left ventricle. In general, we observed that RPM-TPS did not allow control of the deformations and resulted in contours that did not resemble the desired shape. On the other hand, even though ASM segmented contours that resembled left ventricles, it was not able to recover the correct feature points, and the contours did not align with image features. RASM, however, was able to recover left ventricle shapes aligned with image features. Figure 2 shows more examples of RASM performance for the segmentation of the left ventricle in cardiac ultrasound images. To assess the robustness of RASM, and the effect of the density of the model points on the performance, we performed several tests on synthetic and in vivo
Fig. 2. Point sets and RASM segmentation overlaid on the images
2D data in [8], where we showed that the algorithm could tolerate at most 50% of feature outliers. The robustness was shown to depend on the nature of the noise and on the number of model points missing in the data. We also compared the recovered contours with the ground truths drawn by a medical expert. We computed the distance between every point on a segmented contour and the closest point on a ground truth contour (and vice versa). We found an average distance of 0.3mm and a maximum distance of 1.9mm, which is entirely acceptable for computing volumes and ejection fraction.
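The distance-based validation just described amounts to symmetric closest-point distances between sampled contours. A brute-force Python sketch (ours, assuming both contours are given as point arrays in millimeters):

```python
import numpy as np

def contour_distances(A, B):
    """Symmetric closest-point distances between two sampled contours
    A: (Ka, 2) and B: (Kb, 2); returns (mean, max) in the input units."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)   # (Ka, Kb)
    both = np.concatenate([d.min(axis=1), d.min(axis=0)])        # A->B, B->A
    return both.mean(), both.max()
```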
3.2 Right Ventricle in 3D Magnetic Resonance Imaging
A right ventricle (RV) model at ED was built from 13 hand-segmented CT and MR volumes. The CT volumes were acquired using a Siemens SENSATION 16 scanner and segmented using the multi-label segmentation method proposed in [9]. The MR slices were acquired using a Siemens SONATA 1.5T scanner, and contours were manually drawn by experts on all slices. After constructing meshes from the segmentations using marching cubes, we applied the harmonic embedding technique proposed by [10] to parameterize these shapes. This powerful technique provides a statistically and geometrically optimal 3D ASM model. To test the RASM algorithm, we acquired four breath-hold, retrospectively ECG-gated trueFISP cine MRI sequences on a Siemens SONATA 1.5T scanner. The short axis images were situated between the valve plane and the apex of the RV. The region of interest was automatically determined by first locating the left ventricle using the technique proposed in [11] and then identifying the right ventricle blood pool next to it. Then, feature points were extracted with a simple Canny edge detector. Figure 3 shows the evolution of the RASM algorithm recovering the RV. Figure 4 shows an example of RASM segmentation. We also show the ground truth manually drawn by experts for comparison. In order to validate the segmentation results, we compared the recovered contours (for the four subjects) from RASM with the ground truth contours as in 2D. We found an average error distance of 1.1mm and a maximum of 3.2mm. Furthermore, we
Fig. 3. A typical evolution of the RASM algorithm when applied to the right ventricle feature points extracted from MR images. Crosses are model points, dots are feature points, gray points represent outliers. (a) Initialization; (b) alignment after an affine transformation; (c) final shape recovered by RASM.
Fig. 4. Ground truths (a)/(b) and segmented contours (c)/(d) recovered after 3D RASM fitting of the RV model on MR data
studied the evolution of the energy while changing the time at which ASM deformations are introduced in the algorithm. On average, we found that [10%-30%] of the total time is a valid range for this parameter. Notice here a powerful property of RASM in 3D: we do not need to extract the feature points in 3D. Instead, we perform feature detection on the individual 2D slices, and RASM fits the model points to the extracted feature points. This is particularly useful for any modality that acquires multiple slices to analyze a 3D structure.
4 Discussion and Conclusion
Non-rigid point matching is very important in computer vision, but how much to deform and how to deform the model remain critical issues. RPM-TPS matches point sets using thin-plate splines for deformations, without an explicit prior on the final shape [4]. As it was first designed for registration, RPM-TPS is rather ill-suited for the task of segmentation. We preferred to incorporate a priori knowledge using the simple and fast ASM, which, despite being very useful for cluttered scenes, is seriously limited by the ability of the feature extractor to detect the correct points. The strength of RASM is that, given two datasets, it is able to establish as many correspondences as possible between them while rejecting outliers. The size of the two datasets is irrelevant to the performance of the technique. These are the strengths inherited from the RPM framework. RASM
also inherited the major strengths of ASM, namely the model specificity, generalization ability and compactness that allow only legal shapes to be detected. In this paper, we have shown that, by incorporating outlier rejection, RASM is able to match point sets of arbitrary size more robustly than ASM or RPM-TPS alone. The results obtained have demonstrated the strength and potential clinical value of the technique. In several cases, RASM succeeded when both RPM-TPS and ASM failed. The quantitative segmentation results are very encouraging and could be improved by making the feature extraction more application-dependent. More specifically, we plan to introduce weights on the features similar to the work in [12]. The new information added would increase the robustness and accuracy of RASM. In the long term, we are also looking into extending the RASM framework to 4D.
References
1. Cootes, T.F., Taylor, C.J., Cooper, D.H., Graham, J.: Active shape models - their training and application. Computer Vision and Image Understanding 61 (1995) 38–59
2. Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active appearance models. IEEE Trans. Pattern Anal. Mach. Intell. 23 (2001) 681–685
3. Rogers, M., Graham, J.: Robust active shape model search. In: ECCV (4). (2002) 517–530
4. Chui, H., Rangarajan, A.: A new algorithm for non-rigid point matching. In: CVPR. (2000) 2044–2051
5. Chui, H., Rangarajan, A.: A feature registration framework using mixture models. In: MMBIA. (2000) 190
6. Huang, X., Paragios, N., Metaxas, D.N.: Establishing local correspondences towards compact representations of anatomical structures. In: MICCAI (2). (2003) 926–934
7. Jolly, M.P.: Assisted ejection fraction in B-mode and contrast echocardiography. In: ISBI. (2006) 97–100
8. Abi-Nahed, J., Jolly, M.P., Yang, G.Z.: Robust active shape models. In: MIUA. (2005) 115–118
9. Grady, L., Funka-Lea, G.: Multi-label image segmentation for medical applications based on graph-theoretic electrical potentials. In: ECCV Workshops CVAMIA and MMBIA. (2004) 230–245
10. Horkaew, P., Yang, G.Z.: Construction of 3D dynamic statistical deformable models for complex topological shapes. In: MICCAI (1). (2004) 217–224
11. Jolly, M.P.: Combining edge, region, and shape information to segment the left ventricle in cardiac MR images. In: MICCAI. (2001) 482–490
12. Lin, N., Papademetris, X., Sinusas, A.J., Duncan, J.S.: Analysis of left ventricular motion using a general robust point matching algorithm. In: MICCAI (1). (2003) 556–563
Automatic IVUS Segmentation of Atherosclerotic Plaque with Stop & Go Snake

Ellen Brunenberg1, Oriol Pujol2, Bart ter Haar Romeny1, and Petia Radeva2

1 Department of Biomedical Engineering, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, the Netherlands
[email protected], [email protected]
2 Computer Vision Center, Universitat Autònoma de Barcelona, Edifici O, Campus UAB, 08193 Bellaterra (Cerdanyola), Barcelona, Spain
[email protected], [email protected]

Abstract. Since the upturn of intravascular ultrasound (IVUS) as an imaging technique for the coronary artery system, much research has been done to simplify the complicated analysis of the resulting images. In this study, an attempt was made to develop an automatic tissue characterization algorithm for IVUS images. The first step was the extraction of texture features. The resulting feature space was used for classification, constructing a likelihood map to represent different coronary plaques. The information in this map was organized using a recently developed geodesic snake formulation [1], the so-called Stop & Go snake. The novelty of our study lies in this last step, as it is the first time the Stop & Go snake has been applied to segment IVUS images.
1 Introduction
During the last decade, intravascular ultrasound has overtaken angiography as the state-of-the-art visualization technique for atherosclerotic disease in coronary arteries. The most important property determining the occurrence and outcome of acute coronary syndrome is the vulnerability of a plaque, as opposed to the occlusion it causes. Therefore coronary angiography was not the best method for risk determination, as it only displays a shadow of the lumen. Conversely, IVUS images show the morphological and histological properties of a cross-section of the artery. Different tissue types, such as soft plaque, fibrous plaque, and calcium, can be distinguished. Most significant in a vulnerable plaque is the presence of a large soft core with a thin fibrous cap. Although heavily calcified plaques seem to be more stable than non-calcified plaques, the amount of calcium is an indicator of the overall plaque burden, and as such, the degree of calcification will correlate with the overall risk of plaque rupture in the coronary arterial tree. Despite the good vulnerability determination, IVUS has the disadvantage that manual analysis of the huge amount of images is difficult, subjective, and time-consuming. Therefore, there is an increasing interest in the development of automatic tissue characterization algorithms for IVUS images. However, this is a challenging problem, as the image quality is poor due to noise, imaging artifacts and shadowing. A lot of research on this question has been done using
extraction of texture features to characterize plaque tissues or determine plaque borders [2,3,4], recently in combination with classification techniques like AdaBoost [5]. Deformable models are another extensively used technique to retrieve the intima and adventitia boundaries [6,7,8]. Nevertheless, they are not commonly applied to segment different tissue types. In this study, a new geodesic snake formulation, the so-called Stop & Go snake [1], is employed to find the soft plaque and calcification regions in atherosclerotic plaques. This snake uses likelihood maps to decouple regularity and convergence, thus controlling the role of the curvature in a better way. To ensure convergence, the external force definition is split into an attractive and a repulsive vector field. We used this new snake in a traditional pattern recognition pipeline: first, the extraction of texture features, namely local binary patterns, co-occurrence matrices, and Gabor filters, was performed on the images. The second stage consists of classification using AdaBoost, addressing soft and fibrous plaque, and calcium. The confidence rate map obtained was used as a likelihood map for the Stop & Go snake. The next section describes the feature extraction and classification. Section 3 discusses the fundamentals of the Stop & Go snake formulation and the use of likelihood maps. Section 4 gives the experimental results and Section 5 concludes this paper.
2 Feature Extraction and Classification
The complexity of IVUS images can be reduced using feature extraction. Because pixel-based, gray level-only methods are not sufficient to differentiate between the complicated structures, texture features are used. Examples of the latter discussed in previous studies are co-occurrence matrices and fractal analysis [2], run-length measures and radial profile [3], local binary patterns [4], derivatives of Gaussian, wavelets, Gabor filters, and cumulative moments [7]. Because Pujol et al. achieved the best results on IVUS tissue characterization using local binary patterns, co-occurrence matrices, and Gabor filters [4], our study was performed using these three sets of features. After feature extraction, the feature space is used as input for an AdaBoost classifier. The idea behind boosting methods is that many simple, weak classifiers together can form a strong classification algorithm with many practical advantages [9]. AdaBoost is easy to implement and fast. However, before both feature extraction and classification can be performed, the images have to be preprocessed. The objective of this step is twofold: first, some of the images' marks and artifacts are removed to avoid their influence on feature extraction and classification. Furthermore, the image in Cartesian coordinates is converted into polar coordinates, to prevent biases from rotation-variant feature extractors.
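The Cartesian-to-polar conversion could be implemented along the following lines (an illustrative sketch of ours, assuming a known catheter center and unit grid spacing, not the authors' preprocessing code):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def to_polar(img, center, n_radii, n_angles):
    """Resample an IVUS image onto an (angle, radius) grid around the
    catheter center, so texture features become rotation-tolerant."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    radii = np.arange(n_radii, dtype=float)
    a, r = np.meshgrid(angles, radii, indexing="ij")   # (n_angles, n_radii)
    rows = center[0] + r * np.sin(a)
    cols = center[1] + r * np.cos(a)
    return map_coordinates(img, [rows, cols], order=1)  # bilinear sampling
```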
2.1 Feature Extractors
With the preprocessed image as input, feature extractors usually give a high-dimensional feature vector as response. The Local Binary Pattern (LBP)
operator, introduced by Ojala et al. [10], detects uniform local binary patterns within circularly symmetric neighborhoods of P members with radius R. The second feature operator, the gray-level co-occurrence matrix (COOC), provides a statistical tool for the extraction of second-order texture information from images [11]. The actual feature space is generated by the extraction of six measures from the co-occurrence matrices, namely energy, entropy, inverse difference moment, shade, inertia and prominence [7]. The last feature operators used for this study are the Gabor filters, originating from multi-channel filtering theory. A Gabor filter consists of a sinusoid modulated in amplitude by a Gaussian envelope. By rotation of the coordinate system, filters at different orientations can be obtained. For practical purposes, the angles θ = {0°, 45°, 90°, 135°} suffice again.
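For illustration, uniform rotation-invariant LBP response maps for a few circularly symmetric (P, R) neighborhoods can be computed with scikit-image; the specific configurations below are our assumption, not the paper's settings:

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_maps(img, configs=((8, 1), (16, 2), (24, 3))):
    """Uniform LBP response maps for several circularly symmetric (P, R)
    neighborhoods, stacked as feature channels of shape (H, W, len(configs))."""
    return np.stack(
        [local_binary_pattern(img, P, R, method="uniform") for P, R in configs],
        axis=-1,
    )
```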
2.2 AdaBoost Classification
The normalized feature data were classified using our AdaBoost algorithm, with decision stumps or one-level decision trees as weak classifiers. Let the training set contain N samples, consisting of a d-dimensional feature vector xi (i = 1, . . . , N ) and a class label yi . Decision stumps threshold these training data using only one out of the d features, assigning a class label based on majority voting [12]. During boosting, such a weak classifier is called repeatedly in a series of rounds t = 1, . . . , T . The principle of this algorithm is to maintain a distribution of weights assigned to the training samples, denoted by Dt (i). Initially, the samples are classified using equally set weights. In the next round, the weights for incorrectly classified samples are increased, while those for correct observations are reduced. This has the effect that the algorithm focuses on samples misclassified in the previous round, i.e. the difficult samples in the data set [9].
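In scikit-learn terms, boosting decision stumps as described above might look as follows (a sketch with synthetic stand-in data; T = 10 rounds as used later in Sect. 4.1; note that the keyword naming the base learner differs across scikit-learn versions):

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 6))                  # stand-in texture features
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)   # stand-in plaque labels

# Decision stumps (one-level trees) as weak learners, T = 10 boosting rounds
clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=10)
clf.fit(X, y)

# Signed ensemble margins; rescaled to [0, 1] they can serve as the
# confidence-rate (likelihood) map discussed in Sect. 3.3
confidence = clf.decision_function(X)
```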
3 Stop & Go Snake
In a traditional pattern recognition pipeline, classifiers like AdaBoost are used to find regions of interest after feature extraction. Subsequently, a deformable model can organize the obtained image information. 3.1
Traditional Geodesic Deformable Models
Geodesic active contours have a formulation based on curve evolution and level sets [13]. A big advantage of this is that it can deal with topological changes during snake evolution. This evolution should find the curve of minimum length in a Riemannian surface with a measure depending on the image gradient. It follows that the snake Γ evolves according to: δΓ = (g · κ + V0 − ∇g, n) · n , δt 2
(1)
with g = 1/(1 + |∇I| ) and κ the curvature of Γ , n its inward unit normal and , the scalar product of two vectors. It can be seen that the curvature
has a double role, defining the motion of the curve at zero gradient regions, but also serving as a curve regularizing term, ensuring continuity. This has the disadvantage that it slows down the numeric scheme, because it is a second-order term. Furthermore, it complicates snake convergence into concave areas.
3.2 Stop & Go Formulation
To overcome the problems regarding convergence and regularity, a new definition where these terms are decoupled was introduced by Pujol et al. [1]. In this so-called Stop & Go snake, the curvature term does not interfere in the convergence process. The desired decoupling can be achieved using characteristic functions of the region of interest R:

$$I(x, y) = \begin{cases} 1 & \text{if } (x, y) \in R, \\ 0 & \text{otherwise}. \end{cases} \qquad (2)$$

By removing the influence of the curvature, any global vector field properly defining the target contour as its equilibrium ensures convergence. Such a vector field can be split into an exterior attractive field (the Go term) and an inner repulsive one (the Stop term), of which the sum cancels on the curve of interest. The separately defined Go and Stop terms are glued together by means of the characteristic function mentioned above. If the evolving curve is outside the region of interest R, the Go term corresponds to an area minimization process restricted to the outside of R. The Go field ensures an inward motion to R, comparable to a balloon force. Because there is only need for a Stop field in the neighborhood of the desired contour, this term can be defined by the outward gradient g locally defining the contours of R. The Stop & Go snake evolution is then given by:

$$\frac{\partial \Gamma}{\partial t} = \underbrace{\langle I \cdot \nabla g, \mathbf{n} \rangle\, \mathbf{n}}_{\text{Stop}} + \underbrace{V_0 \,(1 - I)\, \mathbf{n}}_{\text{Go}} + \underbrace{\alpha\, \kappa\, \breve{I}\, \mathbf{n}}_{\text{Reg. term}}. \qquad (3)$$
Likelihood Maps
Unfortunately, for practical applications, there are no characteristic functions that define the regions of interest available, so an alternative function to generate the decoupling is needed. A likelihood map represents the probability of each pixel to belong to the object of interest. In general, a likelihood map only needs
to fulfill the requirement that the object of interest is given as a local extremum. Examples of results that can be used as likelihood maps are the image's direct response to feature extractors or the outcome of the classifier. The latter is not very suitable, because it has very strong edges that cause the snake to simply follow the contours of this classification map. This is often not the optimal result and, besides, there are easier and faster ways to find those edges. In this study, we propose to use the classifier's confidence rates as likelihood maps. These confidence rates can easily be extracted from the AdaBoost classification. The normalized (between 0 and 1) version of the likelihood map $\breve{L}$ will be used in the Stop & Go snake. Then, the only question remaining is the definition of the Stop term, $\breve{L} \cdot \nabla g$. Pujol et al. [1] proposed to base this term on the likelihood map, defining it as $\nabla(1 - \breve{L})$. All this leads to:

$$\frac{\partial \Gamma}{\partial t} = \alpha\, \kappa\, \breve{L}\, \mathbf{n} + \beta\, \langle \nabla(1 - \breve{L}), \mathbf{n} \rangle\, \mathbf{n} + V_0 \,(1 - \breve{L})\, \mathbf{n}, \qquad (4)$$
with α weighting the curvature’s role and β influencing the curve’s smoothness.
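One explicit Euler update of Eq. (4) on a level-set embedding could be sketched as below. This is our own simplified illustration, not the authors' implementation: unit grid spacing, no upwind differencing or reinitialization, and the signs depend on whether the level-set function is taken negative inside the contour:

```python
import numpy as np

def stop_and_go_step(phi, L, V0=1.0, dt=0.5, alpha=0.6, beta=150.0):
    """One explicit Euler update of Eq. (4) on a level-set function phi;
    L is the likelihood map normalized to [0, 1]."""
    dphi_y, dphi_x = np.gradient(phi)
    norm = np.sqrt(dphi_x ** 2 + dphi_y ** 2) + 1e-12

    # curvature kappa = div(grad(phi) / |grad(phi)|)
    kappa = (np.gradient(dphi_x / norm, axis=1)
             + np.gradient(dphi_y / norm, axis=0))

    # advection by the Stop field grad(1 - L) and balloon-like Go term
    dLy, dLx = np.gradient(1.0 - L)
    dphi = (alpha * kappa * L * norm
            + beta * (dLx * dphi_x + dLy * dphi_y)
            + V0 * (1.0 - L) * norm)
    return phi + dt * dphi
```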
4 Results
The IVUS images used in this study were acquired with a last-generation IVUS scanner (Galaxy, Boston Scientific). Compared to previous scanners, it provides much better contrast and higher resolution, leading to a more expressed texture appearance of all intravascular structures. In this context, the analysis of these images is even more difficult than for previous-generation images, justifying the texture analysis. To be able to evaluate the results of the classifier and the snake correctly, IVUS images segmented manually by cardiologists from the Hospital Universitari "Germans Trias i Pujol" in Badalona, Spain, were used as a 'ground truth'. Furthermore, classifiers like AdaBoost, which perform supervised learning, need a set of training samples. In this study, a set of about 13,000 points was selected manually from the cardiologists' segmentation.
4.1 Results of AdaBoost Classification
For 30 IVUS images, two classifications were performed using 10 rounds of boosting, one separating fibrous plaque and calcium from soft plaque, and one distinguishing fibrous plaque from calcium. In the classified images shown in column C of Figs. 1 and 2, it can be seen that calcium and fibrous plaque are classified reasonably well, but that soft plaque apparently is difficult to segment. These observations are confirmed when looking at the confusion matrix in Table 1. The majority of soft plaque, fibrous plaque and calcium points are classified correctly, although soft plaque is still repeatedly classified as fibrous. The percentage of points that are assigned correctly is 75.82%.
Fig. 1. Classification and snake results for two IVUS images with presence of soft plaque. Column A: Original images. B: Images segmented by cardiologist. C: AdaBoost classification map. D: Stop & Go snake result. Legend for columns B and C: Dark gray represents soft plaque, light gray fibrous plaque and white calcification.
Fig. 2. Classification and snake results for two IVUS images with presence of soft plaque and calcium. Column A: Original images. B: Images segmented by cardiologist. C: AdaBoost classification map. D: Stop & Go snake result. Legend for column D: White lines represent soft plaque, black ones calcification.

Table 1. Confusion matrix (in percents) for boosted decision stumps classification

                              Labeled as    Labeled as       Labeled as   Totals
                              soft plaque   fibrous plaque   calcium
Classified as soft plaque     5.03          19.11            0.09         24.23
Classified as fibrous plaque  2.43          67.67            0.44         70.54
Classified as calcium         0.00          2.11             3.12         5.23
Totals                        7.46          88.89            3.65         100.00
Table 2. Area comparison for different segmentations and classifications

'True' mask      Mask to compare   Mean error soft plaque   Mean error calcium
                                   (pixels)                 (pixels)
Cardiologist 1   Cardiologist 2    4.28 ± 3.45              1.61 ± 0.68
Cardiologist 1   Cardiologist 3                             1.14 ± 0.65
Cardiologist 1   AdaBoost          5.30 ± 5.08              0.71 ± 0.21
Cardiologist 1   Snake             5.71 ± 5.40              1.86 ± 1.49
AdaBoost         Snake             0.78 ± 0.50              1.06 ± 0.94

4.2 Results of Stop & Go Snake
In addition to the classification map, the AdaBoost classifier also returns confidence rates and thresholds. Every confidence rate above the threshold represents the probability that the point belongs to the searched class, while a lower rate indicates that the pixel does not belong to this class. In our case, the likelihood map for soft plaque was derived from the confidence rates of the classification of fibrous plaque and calcium against soft plaque. Only the rates below the threshold were taken into account, and the map was inverted before using it as a likelihood map. The likelihood map for calcium resulted from the confidence rates above the threshold for the separation of calcium and fibrous plaque. For the actual evolution of the snake over the likelihood map, the level-set formulation by Osher and Sethian [14] was implemented using an explicit Euler scheme. Pujol et al. [1] found that the fastest Stop & Go snake configuration uses {V0 = 1.3, Δt = 1.3, α = 0.23}, while a snake fully concerned with accuracy and smoothness uses {V0 = 0.2, Δt = 0.5, α = 0.6}. For this study, a trade-off configuration was used, namely {V0 = 1, Δt = 0.5, α = 0.6}. Furthermore, 300 iteration steps and a β of 150 were applied. Using those parameters, the Stop & Go snake was applied to all images with soft plaque and calcium. After snake deformation, the plaque boundaries found by the cardiologist were superimposed on the mask found for soft plaque or calcium, in order to exclude the catheter and other artifacts. Furthermore, very small regions were omitted. The final output is mapped onto the original images in Cartesian coordinates and is shown in column D of Figs. 1 and 2. The white lines indicate the soft plaque found by the snake, while the black ones represent the calcium. It can be seen that the selection of soft plaque is weak. However, the calcium detection is reasonable. Unfortunately, it is not trivial to analyze the snake results statistically. In this study, we propose a comparison of the areas of interest found with the snake and the same areas in the cardiologist's segmentation. We considered 20 images with soft plaque and 8 images with calcium. The error was determined as the area of the regions in the different masks that did not overlap, normalized by the total area of the 'true' mask, for example the cardiologist's map. To get more insight into inter-observer variability and AdaBoost performance as well, five different comparisons were made (see Table 2). As can be seen from the error rates in this table, it is quite difficult to detect soft plaque well, starting with the segmentation of the different cardiologists. The results for calcium are much better. The boosted decision stumps algorithm can be seen to perform better (i.e., more similarly to the cardiologists' 'ground truth') than the snake at finding the regions of interest for this tissue type.
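The area-based error defined above reduces to a normalized XOR of binary masks; a one-function Python sketch of ours:

```python
import numpy as np

def area_error(true_mask, test_mask):
    """Non-overlapping (XOR) area normalized by the area of the 'true' mask."""
    true_mask = np.asarray(true_mask, dtype=bool)
    test_mask = np.asarray(test_mask, dtype=bool)
    return np.logical_xor(true_mask, test_mask).sum() / true_mask.sum()
```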
5 Conclusion
In this study, various tools for pattern recognition were considered in an attempt to obtain automatic segmentation of plaque tissues. Different texture features were extracted and the AdaBoost classifier and the Stop & Go snake both gave encouraging results segmenting calcium. These explicit and implicit classification methods both have their advantages and drawbacks. AdaBoost is a little more precise, but considers each point just within its neighborhood. Conversely, the Stop & Go snake organizes points in groups. However, for soft plaque, this step needs some more attention in future research. Furthermore, classification results should be corrected for shadowing caused by guide wires and calcifications.
References
1. Pujol, O., Gil, D., Radeva, P.: Fundamentals of stop and go active models. Image and Vision Computing 23 (2005) 681–691
2. Nailon, W., McLaughlin, S.: Intravascular ultrasound image interpretation. Proc. of ICPR, Austria. IEEE Computer Society Press: USA (1996) 503–506
3. Zhang, X., Sonka, M.: Tissue characterization in intravascular ultrasound images. IEEE Trans. on Medical Imaging 17 (1998) 889–899
4. Pujol, O., Radeva, P.: Near real time plaque segmentation of IVUS. Proc. of Computers in Cardiology (2003) 69–72
5. Pujol, O., et al.: AdaBoost to classify plaque appearance in IVUS images. Progress in Pattern Recognition, Image Analysis, and Appl.: LNCS 3287 (2004) 629–636
6. Roy Cardinal, M.H., et al.: Intravascular ultrasound image segmentation: A fast-marching method. MICCAI: LNCS 2879 (2003) 432–439
7. Pujol, O., Radeva, P.: Texture segmentation by statistic deformable models. Int. J. of Image and Graphics 4 (2004) 433–452
8. Roy Cardinal, M.H., et al.: Automatic 3D segmentation of intravascular ultrasound images using region and contour information. MICCAI: LNCS 3749 (2005) 319–326
9. Freund, Y., Schapire, R.: A decision-theoretic generalization of on-line learning and an application to boosting. J. of Comp. and Syst. Sciences 55 (1997) 119–139
10. Ojala, T., Pietikainen, M., Maenpaa, T.: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. on Pattern Analysis and Machine Intelligence 24 (2002) 971–987
11. Haralick, R., Shanmugam, K., Dinstein, I.: Textural features for image classification. IEEE Trans. on System, Man, Cybernetics 3 (1973) 610–621
12. Qu, Y., et al.: Boosted decision tree analysis of surface-enhanced laser desorption/ionization mass spectral serum profiles discriminates prostate cancer from non-cancer patients. Clinical Chemistry 48 (2002) 1835–1843
13. Caselles, V., Kimmel, R., Sapiro, G.: Geodesic active contours. Int. J. of Computer Vision 22 (1997) 61–79
14. Osher, S., Sethian, J.: Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations. J. of Comput. Physics 79 (1988)
Prostate Segmentation in 2D Ultrasound Images Using Image Warping and Ellipse Fitting

Sara Badiei1, Septimiu E. Salcudean1, Jim Varah2, and W. James Morris3

1 Department of Electrical and Computer Engineering, University of British Columbia, 2356 Main Mall, Vancouver, BC, V6T 1Z4, Canada
{sarab, tims}@ece.ubc.ca
2 Department of Computer Science, University of British Columbia, Vancouver
3 Vancouver Cancer Center, British Columbia Cancer Agency, Vancouver

Abstract. This paper presents a new algorithm for the semi-automatic segmentation of the prostate from B-mode trans-rectal ultrasound (TRUS) images. The segmentation algorithm first uses image warping to make the prostate shape elliptical. Measurement points along the prostate boundary, obtained from an edge-detector, are then used to find the best elliptical fit to the warped prostate. The final segmentation result is obtained by applying a reverse warping algorithm to the elliptical fit. This algorithm was validated using manual segmentation by an expert observer on 17 midgland, pre-operative, TRUS images. Distance-based metrics between the manual and semi-automatic contours showed a mean absolute difference of 0.67 ± 0.18 mm, which is significantly lower than inter-observer variability. Area-based metrics showed an average sensitivity greater than 97% and average accuracy greater than 93%. The proposed algorithm was almost two times faster than manual segmentation and has potential for real-time applications.
1 Introduction
Prostate cancer strikes one in every six men during their lifetime [1]. In recent years, increased prostate cancer awareness has led to an increase in the number of reported cases. As a result, prostate cancer research has increased dramatically, resulting in a 3.5% decrease in annual deaths due to this illness [1]. Interstitial brachytherapy is the most common curative treatment option for localized prostate cancer in North America. Improvements in ultrasound guidance technology, radioisotopes and planning software suggest an even greater increase in the number of brachytherapy procedures in the years to come [2]. In brachytherapy, small radioactive seeds are inserted into the prostate under TRUS guidance to irradiate and kill the prostate cells along with the tumor. Brachytherapy requires segmentation of the prostate boundary from TRUS images pre-operatively to generate the treatment plan, and post-operatively, to evaluate dose distribution and procedure success. Currently, oncologists segment the prostate boundary by hand. In this paper we emphasize the following requirements, used at the BC Cancer Agency, for prostate segmentation in 2D TRUS images: smooth, continuous
contours with no sharp edges, no hourglass shapes and approximate symmetry about the median axis of the prostate. In pre- and post-operative procedures, achieving these requirements manually is frustrating and time consuming. In intra-operative procedures, where prostate segmentation would enable real-time updates to the treatment plan, manual prostate segmentation is infeasible. It would be desirable to have an automatic or semi-automatic segmentation algorithm that satisfies the prostate segmentation criteria listed above but also has real-time capabilities so that it can be incorporated into intra-operative procedures. A significant amount of effort has been dedicated to this area. Strictly edge-based algorithms such as [3, 4, 5, 6] result in poor segmentation due to speckle noise and poor contrast in TRUS images. One such edge-detection algorithm, introduced by Abolmaesumi et al. [6], uses a spatial Kalman filter along with Interacting Multiple Modes Probabilistic Data Association (IMMPDA) filters to detect the closed contour of the prostate. Although the IMMPDA edge-detector is capable of creating continuous contours, it cannot satisfy the smoothness and symmetry requirements on its own. In order to overcome these issues, deformable models such as those outlined in [7, 8, 9] were proposed. These techniques worked moderately well; however, the segmented shapes were not constrained and hourglass contours, for example, were possible. In order to constrain the prostate contours, groups such as Shen et al. [10] and Gong et al. [11] incorporated a priori knowledge of the possible prostate shapes into their algorithms. The automatic segmentation procedure of Shen et al. [10] combined a statistical shape model of the prostate, obtained from ten training samples, with rotation-invariant Gabor features to initialize the prostate shape. This shape was then deformed in an adaptive, hierarchical manner until the technique converged to the final model. Unfortunately this technique is very expensive computationally, requiring over a minute to segment a single image. Gong et al. [11] modeled the prostate shape by fitting superellipses to 594 manually segmented prostate contours. These a priori prostate models, along with an edge map of the TRUS image, were then used in a Bayesian segmentation algorithm. The edge map was created by first applying a sticks filter, then an anisotropic diffusion filter and finally a global Canny edge-detector. In this technique, the pre-processing steps are computationally expensive and the global Canny edge-detector finds fragmented, discontinuous edges in the entire image. In this paper we have attempted to create an algorithm with low computational complexity that satisfies the smoothness and symmetry constraints and is driven by the clinical understanding of prostate brachytherapy. The proposed algorithm requires no computationally expensive pre-processing and is insensitive to the initialization points. The algorithm makes use of a novel image warping technique, ellipse fitting and locally applied IMMPDA edge-detection. The ellipse fitting procedure outlined in [12, 13] solves a convex problem, is computationally cheap and robust, and always returns an elliptical solution. The IMMPDA edge-detector is applied locally, meaning that it finds continuous edges only along the boundary of the prostate instead of across the entire image. The method is described in more detail in the following section.
[Fig. 1 panels: (a) Initialization, (b) Forward Warp, (c) First Ellipse Fit, (d) IMMPDA Edge Map, (e) Second Ellipse Fit, (f) Reverse Warp]

Fig. 1. The proposed semi-automatic segmentation algorithm consists of six steps. Steps 1 and 2 show the unwarped (x-mark) and warped (circles) initialization points respectively. Steps 3 and 5 show the best ellipse fit to the warped initialization points and to the IMMPDA edge map respectively. Step 6 shows the final segmentation result.
2 Segmentation Method
In the first step of our segmentation procedure the user selects six initialization points. Next, the image, along with the initialization points, is warped as outlined in Section 2.1. The warped prostate should look like an ellipse, so that we can make use of the extensive literature on ellipse fitting to fit an initial ellipse to the six warped points as outlined in Section 2.2. This initial ellipse fit is used to confine and locally guide the IMMPDA edge-detector [6] along the boundary of the warped prostate to prevent the trajectory from wandering. A second ellipse is fit to the IMMPDA edge points, which provides a more accurate estimate of the warped prostate boundary. Finally, inverse warping is applied to the second elliptical fit to obtain the final segmented prostate contour. The only pre-processing required is a median filter (5×5 kernel) applied locally along the path of the edge-detector. The six-step segmentation procedure outlined above is presented in Fig. 1.
2.1 Image Warping
The prostate is a walnut-shaped organ with a roughly elliptical cross section. During image acquisition the TRUS probe is pressed anteriorly against the rectum wall, which causes the 2D transverse image of the prostate to be deformed.
[Fig. 2 plot: warping function value vs. theta (θ) from 0 to π and radius (r) from 10 to 100]
Fig. 2. The warping function: $r'/r = 1 - \sin(\theta)\,\exp\!\left(\frac{-r^2}{2\sigma^2}\right)$. The warped radius (r') is approximately equal to the unwarped radius (r) for angles (θ) close to 0 or π and for large r values. The warped radius (r') is less than the unwarped radius (r) for angles (θ) close to π/2 and small r values, therefore causing a stretch. The transition point between small and large r values is set by σ.
A simple warping algorithm can be used to undo the effect caused by compression so that the prostate boundary in a 2D image looks like an ellipse. We assign a polar coordinate system to the TRUS image with origin on the TRUS probe axis, θ = 0 along the horizontal, and r ranging from Rmin to Rmax. We can undo the effect of the anterior compression caused by the TRUS probe by: i) stretching the image maximally around θ = π/2 and minimally around θ = 0 and θ = π, and ii) stretching the image maximally for small r values and minimally for large r values. These requirements suggest a sin(θ) dependence in the angular direction and a Gaussian dependence in the radial direction. The resulting warping function is

$$r' = r - r\,\sin(\theta)\,\exp\!\left(\frac{-r^2}{2\sigma^2}\right), \qquad (1)$$

where r' represents the new warped radius, r is the unwarped radius, θ ranges from 0 to π, and σ is a user-selected variable that represents the amount of stretch in the radial direction (Fig. 2). Small σ values indicate less radial stretch and can be used for prostate shapes that are already elliptical and/or have experienced little deformation. Larger σ values indicate greater radial stretch and can be used for prostate shapes that have experienced more deformation by the TRUS probe. The advantages of this warping algorithm are that it is simple and intuitive, with only one parameter to adjust. Due to time limits, σ was chosen separately for each image by trial and error. In the future we will automate the selection of σ. This and other more complex warping functions are being investigated as outlined in Section 5.
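The warp of Eq. (1) is simple enough to state in a few lines. The sketch below is a minimal NumPy illustration; the function name and the toy values are ours, not from the paper.

```python
import numpy as np

def warp_radii(r, theta, sigma):
    """Radial stretch of Eq. (1): r' = r - r*sin(theta)*exp(-r^2/(2*sigma^2)).

    r and theta are arrays of polar coordinates (origin on the TRUS probe
    axis); sigma controls how far the stretch reaches in the radial direction.
    """
    return r - r * np.sin(theta) * np.exp(-r**2 / (2.0 * sigma**2))

# Toy usage: points near theta = pi/2 with small r are pulled inward the most.
r = np.array([20.0, 50.0, 90.0])
theta = np.full(3, np.pi / 2)
print(warp_radii(r, theta, sigma=40.0))
```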
2.2 Ellipse Fitting
A general conic in 2D has the form

$$F(X, P) = X \cdot P = P_1 x^2 + P_2 xy + P_3 y^2 + P_4 x + P_5 y + P_6 = 0, \qquad (2)$$

where $X = [x^2\ xy\ y^2\ x\ y\ 1]^T$ and $P = [P_1\ P_2\ P_3\ P_4\ P_5\ P_6]^T$. Given a set of points $(x_i, y_i)$, $i = 1:n$, around the boundary of a prostate in a 2D image, our goal is to find the best elliptical fit to these points. The problem reduces to finding the P vector that minimizes the following cost function subject to the constraint $P^T C P = 1$:

$$\mathrm{Cost} = \sum_{i=1}^{n} \left(F(X_i, P)\right)^2 = \sum_{i=1}^{n} (X_i \cdot P)^2 = \|SP\|^2. \qquad (3)$$

Here $S = [X_1\ X_2\ \ldots\ X_n]^T$, C is a 6×6 matrix of zeros with the upper-left 3×3 block being [0 0 2; 0 -1 0; 2 0 0], and the $P^T C P = 1$ constraint ensures elliptical fits only. The solution is the eigenvector P corresponding to the unique positive eigenvalue μ of the following generalized eigenvalue problem [12, 13]:

$$(C - \mu\, S^T S)\,P = 0. \qquad (4)$$
This ellipse fitting technique always returns a P vector that represents an ellipse regardless of the initial condition. Also, the fit has rotational and translational invariance, is robust, easy to implement and computationally cheap.
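As an illustration, the sketch below implements the direct least-squares fit of [12] under the constraint above, selecting the eigenvector with the unique positive eigenvalue as in Eq. (4). The function name is ours, and this is a compact illustration rather than the authors' Matlab code.

```python
import numpy as np

def fit_ellipse(x, y):
    """Direct least-squares ellipse fit (Fitzgibbon et al. [12]).

    Returns the conic coefficients P = (P1..P6) of Eq. (2), constrained so
    that the solution is always an ellipse (P^T C P = 1).
    """
    X = np.column_stack([x**2, x * y, y**2, x, y, np.ones_like(x)])  # design matrix S
    StS = X.T @ X                                                    # scatter matrix S^T S
    C = np.zeros((6, 6))
    C[0, 2] = C[2, 0] = 2.0   # constraint P^T C P = 4*P1*P3 - P2^2 = 1
    C[1, 1] = -1.0
    # Generalized eigenproblem of Eq. (4), solved via eig((S^T S)^-1 C):
    eigval, eigvec = np.linalg.eig(np.linalg.solve(StS, C))
    # The unique eigenvector with positive eigenvalue gives the elliptical fit.
    k = np.argmax(eigval.real)
    return eigvec[:, k].real

# Usage on noisy samples of a known ellipse:
t = np.linspace(0, 2 * np.pi, 60)
x = 3.0 * np.cos(t) + 0.05 * np.random.randn(60)
y = 1.5 * np.sin(t) + 0.05 * np.random.randn(60)
print(fit_ellipse(x, y))
```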
3 Experimental Results
Using the standard equipment (Siemens Sonoline Prima with a 5 MHz biplane TRUS probe) and the standard image acquisition procedure used at the BC Cancer Agency, we acquired 17 pre-operative TRUS images of the prostate from the midgland region. The images had low resolution and contained a superimposed needle grid. Although it is possible to obtain higher-resolution images, our goal was to work with the images currently used at the Cancer Agency. A single expert radiation oncologist with vast experience in prostate volume studies manually segmented the images using the Vari-Seed planning software from Varian. We then applied our semi-automatic segmentation algorithm, with the same expert choosing the initialization points. The algorithm was written in Matlab 7 (Release 14) and executed on a Pentium 4 PC running at 1.7 GHz. The image sizes were 480×640 with a pixel size of 0.18 × 0.18 mm. The times for manual and semi-automatic segmentation were recorded and are presented in Table 1. On average the time for manual segmentation, which includes the time for selecting 16-25 points and then re-adjusting them to obtain approximate symmetry, was 45.35 ± 4.85 seconds. The average time for semi-automatic segmentation, which includes the time to choose initialization points and then run the algorithm, was 25.35 ± 1.81 seconds. The average time to run the algorithm alone was 14.10 ± 0.20 seconds; however, we expect this time to decrease by at least five-fold when the algorithm is transferred to C++. The manual segmentation was used as the 'gold standard', while distance [14] and area [7] based metrics were used to validate the semi-automatic segmentation.
Table 1. Average times for manual and semi-automatic segmentation

                                          Mean (sec)   STD (sec)
Manual: total time                        45.35        4.85
Semi-automatic (Matlab): initialization   11.26        1.81
Semi-automatic (Matlab): algorithm        14.10        0.20
Semi-automatic (Matlab): total time       25.35        1.81
Table 2. Average area- and distance-based metrics for validation of the segmentation algorithm

                          Mean    STD
Area-based metrics
  Sensitivity (%)         97.4    1.0
  Accuracy (%)            93.5    1.9
Distance-based metrics
  MAD (mm)                0.67    0.18
  MAXD (mm)               2.25    0.56
For distance-based metrics we found the radial distance between the manual and semi-automatic contours, with respect to a reference point at the center of the prostate, as a function of theta around the prostate: $r_{diff}(\theta)$. The Mean Absolute Difference (MAD) and Maximum Difference (MAXD) were then found from $r_{diff}(\theta)$ as outlined in [14]. For area-based metrics we superimposed the manually segmented polygon, $\Omega_m$, on top of the semi-automatically segmented polygon, $\Omega_a$. The region common to both $\Omega_m$ and $\Omega_a$ is the True Positive (TP) area. The region common to $\Omega_m$ but not $\Omega_a$ is the False Negative (FN) area, and the region common to $\Omega_a$ but not $\Omega_m$ is the False Positive (FP) area. We then found the following area-based metrics [7]:

$$\mathrm{Sensitivity} = TP/(TP + FN). \qquad (5)$$

$$\mathrm{Accuracy} = 1 - (FP + FN)/(TP + FN). \qquad (6)$$
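For concreteness, the two area-based metrics can be computed from binary masks as below; this is our illustrative sketch, not the authors' implementation.

```python
import numpy as np

def area_metrics(manual, auto):
    """Area-based validation metrics of Eqs. (5)-(6) on boolean masks of
    identical shape: `manual` is the gold standard, `auto` the result."""
    tp = np.logical_and(manual, auto).sum()    # true positive area
    fn = np.logical_and(manual, ~auto).sum()   # false negative area
    fp = np.logical_and(~manual, auto).sum()   # false positive area
    sensitivity = tp / (tp + fn)
    accuracy = 1.0 - (fp + fn) / (tp + fn)
    return sensitivity, accuracy

# Tiny usage example with two overlapping squares:
manual = np.zeros((64, 64), bool); manual[16:48, 16:48] = True
auto = np.zeros((64, 64), bool); auto[18:50, 16:48] = True
print(area_metrics(manual, auto))
```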
The MAD, MAXD, sensitivity and accuracy were found for each image, and the overall average values along with standard deviations (STD) are presented in Table 2. On average, the mean absolute difference between the manual and semi-automatic contours was 0.68 ± 0.18 mm, which is less than half the average variability between human experts (1.82 ± 1.44 mm) [11]. Therefore, the mean absolute distance is on the same order of error as the repeatability between two different expert observers performing manual segmentation. On average, the maximum distance between the manual and semi-automatic contours was 2.25 ± 0.56 mm. The average sensitivity and accuracy were over 97% and 93% respectively, with STD values less than 2%. Refer to Fig. 3 for sample outputs.
4 Discussion
The algorithm described above does not require training models or time-consuming pre-filtering of TRUS images. The result is a simple, computationally inexpensive algorithm with potential for real-time applications and extension to 3D segmentation. Furthermore, the algorithm works well on the low-resolution TRUS images currently used at the BC Cancer Agency and can be integrated easily into their setup without any changes to the image acquisition procedure.

Fig. 3. Comparison between manual segmentation (dotted line) and the proposed semi-automatic segmentation (solid line)

In preliminary studies, an untrained observer chose the initialization points. The segmented contour resulted in sensitivity and accuracy values over 90%, with MAD and MAXD values less than 1 and 3 mm respectively. This result shows that, unlike most other semi-automatic segmentation techniques, the success of the final segmentation does not rely on precise initialization of the prostate boundary. Consequently, during intra-operative procedures, a less experienced observer can choose the initialization points to update the treatment plan, allowing the oncologist to continue their work uninterrupted. A more in-depth study of initialization sensitivity will be carried out in the future. Due to the nature of this technique, the algorithm works best for images that have an elliptical shape after warping and worst for images that have a diamond shape in the anterior region of the prostate.
5 Conclusion and Future Work
We have presented an algorithm for the semi-automatic segmentation of the prostate boundary from TRUS images. The algorithm is insensitive to the accuracy of the initialization points and makes use of image warping, ellipse fitting and the IMMPDA edge-detector to perform prostate segmentation. The merits of the algorithm lie in its straightforward, intuitive process but most importantly in its computational simplicity and ability to create good results despite non-ideal TRUS images.
There is significant potential for improving the warping function, and we are currently working on two areas. First, we are attempting to implement an algorithm that will automatically choose the optimal σ, and second, we would like to create a more complex warping function based on tissue strain information from elastography images [15]. It is our goal to extend this 2D semi-automatic segmentation algorithm to 3D in order to create real-time visualization tools for oncologists during pre-, intra- and post-operative procedures.
References
1. American Cancer Society: Key statistics about prostate cancer (2006) http://www.cancer.org
2. Vicini, F., Kini, V., Edmundson, G., Gustafson, G., Stromberg, J., Martinez, A.: A comprehensive review of prostate cancer brachytherapy: Defining an optimal technique. Int. J. Radiation Oncology Biol. Phys. 44 (1999) 483–491
3. Aarnink, R., Giesen, R., Huynen, A., de la Rosette, J., Debruyne, F., Wijkstra, H.: A practical clinical method for contour determination in ultrasonographic prostate images. Ultrasound Med. Biol. 20 (1994) 705–717
4. Pathak, S., Chalana, V., Haynor, D., Kim, Y.: Edge guided delineation of the prostate in transrectal ultrasound images. In: Proc. First Joint BMES/EMBS IEEE Conf. Volume 2. (1999) 1056
5. Kwoh, C., Teo, M., Ng, W., Tan, S., Jones, L.: Outlining the prostate boundary using the harmonics method. Med. Biol. Eng. Computing 36 (1998) 768–771
6. Abolmaesumi, P., Sirouspour, M.: An interacting multiple model probabilistic data association filter for cavity boundary extraction from ultrasound images. IEEE Trans. Med. Imaging 23 (2004) 772–784
7. Ladak, H., Mao, F., Wang, Y., Downey, D., Steinman, D., Fenster, A.: Prostate boundary segmentation from 2D ultrasound images. Medical Physics 27 (2000) 1777–1788
8. Knoll, C., Alcaniz, M., Grau, V., Monserrat, C., Juan, M.: Outlining of the prostate using snakes with shape restrictions based on the wavelet transform. Pattern Recogn. 32 (1999) 1767–1781
9. Ghanei, A., Soltanian-Zadeh, H., Ratkewicz, A., Yin, F.: A three-dimensional deformable model for segmentation of human prostate from ultrasound images. Medical Physics 28 (2001) 2147–2153
10. Shen, D., Zhan, Y., Davatzikos, C.: Segmentation of prostate boundaries from ultrasound images using statistical shape model. IEEE Trans. Med. Imaging 22 (2003) 539–551
11. Gong, L., Pathak, S., Haynor, D., Cho, P., Kim, Y.: Parametric shape modeling using deformable superellipses for prostate segmentation. IEEE Trans. Med. Imaging 23 (2004) 340–349
12. Fitzgibbon, A., Pilu, M., Fisher, R.: Direct least square fitting of ellipses. IEEE Trans. Pattern Anal. Mach. Intellig. 21 (1999) 476–480
13. Varah, J.: Least squares data fitting with implicit functions. BIT 36 (1996) 842–854
14. Chalana, V., Kim, Y.: A methodology for evaluation of boundary detection algorithms on medical images. IEEE Trans. Med. Imaging 16 (1997) 642–652
15. Turgay, E., Salcudean, S., Rohling, R.: Identifying mechanical properties of tissue by ultrasound. Ultrasound Med. Biol. 32 (2006) 221–235
Detection of Electrophysiology Catheters in Noisy Fluoroscopy Images

Erik Franken1, Peter Rongen2, Markus van Almsick1, and Bart ter Haar Romeny1

1 Technische Universiteit Eindhoven, Department of Biomedical Engineering, P.O. Box 513, 5600 MB Eindhoven, The Netherlands {E.M.Franken, M.v.Almsick, B.M.terHaarRomeny}@tue.nl
2 Philips Medical Systems, Best, The Netherlands [email protected]
Abstract. Cardiac catheter ablation is a minimally invasive medical procedure to treat patients with heart rhythm disorders. It is useful to know the positions of the catheters and electrodes during the intervention, e.g. for the automation of cardiac mapping. Our goal is therefore to develop a robust image analysis method that can detect the catheters in X-ray fluoroscopy images. Our method uses steerable tensor voting in combination with a catheter-specific multi-step extraction algorithm. The evaluation on clinical fluoroscopy images shows that especially the extraction of the catheter tip is successful and that the use of tensor voting accounts for a large increase in performance.
1 Introduction
Cardiac catheter ablation is a procedure to treat heart rhythm disorders (arrhythmias). It involves the insertion of one or more flexible thin tubes, called electrophysiology (EP) catheters, through small skin incisions, usually in the groin. These catheters are threaded through blood vessels into the heart. The EP catheters contain a number of electrodes used to make intracardiac electrograms. Using these electrograms, the firing spot or conduction path causing the arrhythmias can be identified. A special EP catheter (the ablation catheter) emits radiofrequency energy to destroy the spot or to block the undesired conduction path. The movement of the catheter through the body is guided using a real-time X-ray fluoroscopy imaging system (Figure 1a). Catheter ablation is a time-consuming medical procedure, therefore tools to speed up the process are of great interest. An important tool is the automation of cardiac mapping, i.e. creating a 3D map of cardiac activation patterns over the entire heart. By using bi-plane fluoroscopy, the 3D position of the catheters can be estimated and can be used to superimpose the cardiac activation sequences onto fluoroscopic images. Different research groups and companies are working on this problem, see e.g. [1,2,3]. In these papers, segmentation of the catheters, and especially the electrodes, is considered an important but difficult task to automate. Kynor et al. [1] have proposed an algorithm
[Fig. 1 panels: (a) fluoroscopy image with labeled ECG stickers, EP catheter tips, and the border of the heart; (b) flowchart: image → calculate local feature images (r, β, q) → enhance using steerable tensor voting (r̃, β̃, q̃) → high-level EP catheter extraction → catheter positions]
Fig. 1. (a) An example of a typical EP X-ray fluoroscopy image acquired during a clinical intervention. We only want to detect the EP catheters, not the other visible elongated structures. (b) Framework of our method. See text for details.
to detect the electrodes of the catheters, but problems remain in associating the electrodes with the catheters in the image. De Buck et al. [3] constructed an advanced cardiac mapping system, but still require the user to perform a manual segmentation of the catheter and the electrodes. Fallavollita et al. [2] developed a catheter tip detection algorithm based on thresholding of the X-ray image, but due to the noisy nature of fluoroscopy images, the performance is not satisfactory. In our present work we propose a method for automatic detection of EP catheters in noisy X-ray fluoroscopy images, without the need for any user intervention. Our method detects the catheter bodies as well as the corresponding electrodes. We restrict ourselves to the detection of catheters in a still image, i.e. only spatial information is used. Figure 1b shows the general framework of our EP catheter extraction process. The method is divided into three main stages. In the first stage, calculate local feature images, we perform preprocessing and use local filtering operations to calculate a number of local feature images. Because fluoroscopy images are noisy, these local feature images are unreliable. Therefore, the idea behind the next step is to use information from a larger spatial neighborhood, compared to the neighborhood of the local filters, to make the feature images more consistent and specifically enhance the elongated structures. For that purpose we use steerable tensor voting [4], which is based on tensor voting [5]. In the last stage, high-level EP catheter extraction, the enhanced feature images generated by the previous step are used to finally decide where the EP catheters are located. EP catheter-specific properties are used to discriminate the catheters from other line structures. These three stages will be explained in the next sections. The paper will conclude with an evaluation on clinical images and a discussion.
2 Local Feature Detection
Prior to any filtering we first apply background equalization to remove disturbing structures in the background. We apply a morphological closing operation with a disc-shaped structuring element on a slightly blurred version of the input image, to get an image that only contains background structures. Each pixel of the original image is divided by the corresponding pixel in this image to cancel out the background structures. The second-order differential structure of an image gives important information about line-like and blob-like structures. Therefore, we construct a Hessian matrix for each pixel position by calculating second-order Gaussian derivatives, see e.g. [6]. We use the 2 eigenvalues λ1 and λ2 with λ1 > λ2 and corresponding eigenvectors e1 and e2 to obtain the following feature images:
– A local ridgeness image s(x, y) = max(λ1(x, y), 0), indicating the likelihood that the image contains a line segment at position (x, y). We use λ1 because its value exactly corresponds to the value one would find when seeking the maximum response of the second-order derivative applied in all different orientations. We only keep positive values because we know that catheters are always dark relative to the background.
– A local orientation image β(x, y) = ∠e1(x, y) (where "∠" denotes the angle of the eigenvector relative to the horizontal direction), indicating the most likely orientation of a line segment at position (x, y). The orientation of the first eigenvector corresponds to the orientation at which the response of the second-order derivative is maximum.
– A local blobness image q(x, y) = max(λ2(x, y), 0), indicating the likelihood that the image contains a blob-like structure at position (x, y). The motivation is that λ2 > 0 implies that λ1 > λ2 > 0, which corresponds to a locally concave shape.
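A minimal sketch of these local features, assuming SciPy's Gaussian derivative filters and a closed-form 2×2 eigendecomposition; the names and the row-first (y, x) convention are ours.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_features(image, sigma):
    """Ridgeness s, orientation beta and blobness q from Hessian eigenvalues.

    Second-order Gaussian derivatives at scale sigma; for each pixel the
    symmetric Hessian is diagonalized so that l1 >= l2.
    """
    hxx = gaussian_filter(image, sigma, order=(0, 2))  # d^2/dx^2 (axis 1 = x)
    hyy = gaussian_filter(image, sigma, order=(2, 0))  # d^2/dy^2 (axis 0 = y)
    hxy = gaussian_filter(image, sigma, order=(1, 1))
    trace_half = 0.5 * (hxx + hyy)
    root = np.sqrt(0.25 * (hxx - hyy) ** 2 + hxy ** 2)
    l1, l2 = trace_half + root, trace_half - root      # l1 >= l2
    s = np.maximum(l1, 0.0)                            # local ridgeness
    beta = 0.5 * np.arctan2(2.0 * hxy, hxx - hyy)      # angle of e1
    q = np.maximum(l2, 0.0)                            # local blobness
    return s, beta, q

# Usage: a dark horizontal line yields s > 0 on the line and beta ~ pi/2,
# since e1 points across the line (the convention described above).
img = np.ones((64, 64)); img[30:33, :] = 0.2
s, beta, q = local_features(img, sigma=2.0)
```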
3 Contextual Enhancement by Steerable Tensor Voting
To enhance the noisy local ridgeness and orientation measures, we need a model for the continuation of curves in images. For tensor voting [5] we need a function w(x) that indicates the probability that a certain curve passes through position x, given that the same curve passes the origin (0, 0) horizontally. In addition we want to know the most probable angle γ(x) of this curve at position x. Different choices are possible for w and γ [4]. In this work, we base the choice on the Gestalt principles [7] of proximity (closer curve segments are more likely to belong together) and good-continuation (low curvature is favored over large curvature), and get the following functions

$$w(x) = e^{-\frac{r^2}{2\sigma_{ctx}^2}} \cos^{2\nu}\!\varphi \quad\text{and}\quad \gamma(x) = 2\varphi, \qquad \text{with } x = (r\cos\varphi,\, r\sin\varphi), \qquad (1)$$

where σ_ctx > 0 is the scale (size) of the function (i.e. this parameter controls the proximity), and ν ∈ ℕ determines the angular specificity of the function (i.e. this parameter controls the good-continuation). The function for γ expresses the cocircularity constraint, meaning that the most likely connection between two points (with one orientation imposed) is assumed to be a circular arc. The model is visualized in Figure 2a.

[Fig. 2 panels: (a) stick voting field; (b) tensor drawn as an ellipse with eigenvectors e1, e2 and eigenvalues λ1, λ2; (c) a voter broadcasting votes to its neighbors]

Fig. 2. (a) Example of a stick voting field. Gray scale indicates the value of w(x) (darker means higher value) and the line segments indicate the orientation γ(x). (b) Graphical representation of a second-rank symmetric semi-positive definite tensor. (c) Illustration of tensor voting for a single voter at position x', which broadcasts a vote to its neighbors.

In tensor voting, w and γ are combined to generate a tensorial filter kernel V : ℝ² → ℝ² × ℝ² (i.e. a function that assigns a 2 × 2 matrix to all spatial positions), called the stick voting field, as follows

$$V(x) = w(x)\, c(x)\, c(x)^T \quad\text{with}\quad c(x) = \begin{pmatrix} \cos\gamma(x) \\ \sin\gamma(x) \end{pmatrix}. \qquad (2)$$

Notice that due to this construction all matrices V(x) have a largest eigenvalue λ1(x) = w(x), and the other eigenvalue λ2(x) = 0. The input for tensor voting is the local ridgeness image s and orientation image β, and the output of the method is a tensor field. The operation is intuitively displayed in Figure 2c. The operational definition is

$$U(x) = \int_\Omega s(x')\, V_{\beta(x')}(x - x')\, dx', \qquad (3)$$
where $V_\beta(x)$ is the tensorial voting field rotated over β, where rotation is achieved as follows

$$V_\beta(x) = R_\beta\, V(R_\beta^{-1} x)\, R_\beta^{-1}, \qquad R_\beta = \begin{pmatrix} \cos\beta & -\sin\beta \\ \sin\beta & \cos\beta \end{pmatrix}. \qquad (4)$$

Since s(x) > 0 ∀x ∈ Ω, cf. the definition in the previous section, and due to the way V is constructed, cf. (2), all tensors in the resulting tensor field U are positive semi-definite and symmetric. From the eigenvalues $\tilde\lambda_1, \tilde\lambda_2$ and eigenvectors $\tilde{e}_1, \tilde{e}_2$ of these tensors we calculate enhanced feature images $\tilde{s}, \tilde\beta, \tilde{q}$, as follows (omitting spatial coordinates for simplicity)

$$\tilde{s} = \tilde\lambda_1 - \tilde\lambda_2, \qquad \tilde\beta = \angle\tilde{e}_1, \qquad \tilde{q} = q \cdot (\tilde\lambda_1 - \tilde\lambda_2). \qquad (5)$$
In tensor voting terminology, s̃ is referred to as stickness and is a measure of orientation certainty, see Figure 2b.
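The following didactic sketch accumulates the tensor field U of Eq. (3) by brute force over a sparse set of voters, using the field of Eqs. (1)-(2). It is not the steerable algorithm of [4], the betas are assumed to be tangent (along-line) orientations, and all names are ours.

```python
import numpy as np

def tensor_vote(points, betas, strengths, grid_shape, sigma_ctx=15.0, nu=4):
    """Brute-force stick tensor voting, Eqs. (1)-(3).

    points: list of (y, x) voter coordinates; betas: their tangent
    orientations; strengths: their ridgeness values s. Returns the tensor
    field U with shape grid_shape + (2, 2).
    """
    H, W = grid_shape
    ys, xs = np.mgrid[0:H, 0:W]
    U = np.zeros((H, W, 2, 2))
    for (py, px), beta, s in zip(points, betas, strengths):
        # Receiver positions expressed in the voter's frame (voter horizontal):
        dx, dy = xs - px, ys - py
        rx = np.cos(beta) * dx + np.sin(beta) * dy
        ry = -np.sin(beta) * dx + np.cos(beta) * dy
        r, phi = np.hypot(rx, ry), np.arctan2(ry, rx)
        w = s * np.exp(-r**2 / (2 * sigma_ctx**2)) * np.cos(phi) ** (2 * nu)
        gamma = 2 * phi + beta        # cocircular orientation in image frame
        cg, sg = np.cos(gamma), np.sin(gamma)
        U[..., 0, 0] += w * cg * cg   # accumulate w * c c^T, Eq. (2)
        U[..., 0, 1] += w * cg * sg
        U[..., 1, 0] += w * cg * sg
        U[..., 1, 1] += w * sg * sg
    return U

# Usage: colinear voters reinforce each other along their common line.
pts = [(32, x) for x in range(20, 45, 4)]
U = tensor_vote(pts, [0.0] * len(pts), [1.0] * len(pts), (64, 64))
lam = np.linalg.eigvalsh(U)            # ascending eigenvalues per pixel
stickness = lam[..., 1] - lam[..., 0]  # enhanced ridgeness of Eq. (5)
```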
4
High-Level Extraction of the EP Catheters
In the last part of the algorithm, we use specific knowledge about EP-catheters to extract them. We will explain it briefly here, and refer to Figure 3a for a schematic overview. The algorithm consists of three modules. The first module is path extraction. The ridgeness and orientation images s˜ and β˜ are used to perform directional non-maximum suppression, resulting in an image with curves of 1 pixel thickness. From this image we extract a number of most salient connected pixel strings (the paths). If a path exhibits very high curvature or is likely to be part of branching lines, the path is split to allow proper reconnection in the subsequent steps. From the resulting paths, a path graph is created, which has connections between paths whose spatial positions and orientations make it probable that they belong to the same global elongated object in the image. The second module is electrode extraction and grouping. From the blobness image q˜ the most salient local maxima are extracted as electrode candidates. Using the extracted paths and knowledge of typical distances between electrodes, a graph is created with connections between all candidates that might be neighboring electrodes on a catheter. Then, we scan for groups of connected electrodes in this graph that have the best match with the known properties of electrode spacing on EP catheters. These electrode groups and the extracted paths are used to create catheter tip paths, describing the curves in the image that connect all electrodes of a catheter with each other. The catheter tip is now fully extracted (which might already be enough for most applications), but in the third module, path grouping, our algorithm also attempts to extract the entire catheter. Using the catheter tips and the path graph, different reasonable extensions of the catheter are considered. The best extension is selected based on a global criterion involving minimization of curvature and change of curvature.
30
E. Franken et al. (b) (a) Orientation
Apply thinning
Blobness
Extract most salient local maxima
Thinned ridgeness image
Extract most salient pixel strings
Electrode candidates
Create connections between electrodes Electrode graph
Paths
Split paths at discontinuities
Electrode extraction and grouping
Path extraction
Ridgeness
(c)
Group electrodes
Paths
Electrode groups
Create path graph
Create catheter tip paths Catheter tips
Path graph
Catheter candidates
Path grouping
Find possible extensions for catheter tips
Select best extensions using global criteria
Catheters
Fig. 3. (a) Schematic overview of the high-level extraction method. (b) three example images from the clinical the test set. Upper row: without additional noise. Lower row: with artificially added multiplicative Poisson noise, which is added to investigate the noise robustness. (c) Extraction results on the test set for low noise and high noise images, and with and without tensor voting. The colours indicate extracted catheter tips (%tip), extracted additional catheter segments (%tip+ext), and extracted entire catheters (% entire). The grey vertical lines with horizontal serifs indicate confidence intervals of 95%.
5
Results
We implemented the EP catheter extraction algorithm partly in Mathematica and partly in C++. The parameters for local feature detection and tensor voting were optimized on 6 different images using the signal-to-background ratio as criterion. The parameter values are (512 × 512 pixel images): scale of Gaussian derivatives σlocal = 3.4 pixels, angular specificity of the voting field ν = 4, scale of the voting field σctx = 15 pixels, and scale of the voting field for the second tensor voting step σctx2 = 7.5 pixels. The parameter of the high-level extraction part were optimized using a test set of 10 images. Since these test sets are small, we think that the parameters can be further optimized.
Detection of Electrophysiology Catheters in Noisy Fluoroscopy Images 1
2
3
4
5
6
7
8
9
31
Fig. 4. EP catheter extraction example. 1 - Original image. 2 - Original image with additional noise, used as input for this example. 3 - Background equalized image. 4 Local ridgeness image. 5 - Blobness image. 6 - Result of first tensor voting step. 7 Result of a second tensor voting step. 8 - Extracted paths and electrode candidates. 9 - Final extraction result.
We used an evaluation set of 50 randomly selected X-ray images acquired during 4 clinical interventions (see Figure 3b), without any image that was used for parameter tuning. These images contain 103 EP catheters that all contain from 4 up to 10 electrodes. The catheters were extracted both with tensor voting and without tensor voting (by simply skipping this step) and both with and without added multiplicative Poisson noise. Each catheter extraction result was assigned to one of the following categories: (1) catheter not detected at all, (2) successful extraction of the tip only, i.e. the part containing the electrodes, (3) successful extraction of the tip plus an additional catheter segment with the same length, and (4) successful extraction of the entire catheter. Figure 3c displays the results of the catheter extraction, showing that especially the tip detection is successful. The increase in performance due to tensor
32
E. Franken et al.
voting is high. For example, the success rate of catheter tip extraction increased from 57% to 80% and from 43% to 72% for low and high noise images respectively. The success rates on extraction of tip+extension and extraction of the entire catheter are still low, especially in noisy images. For clinical practice, however, tip extraction is most relevant. Figure 4 shows an example of the entire EP catheter extraction algorithm.
6
Discussion
We introduced an algorithm for the extraction of EP catheters in fluoroscopy images. One novelty of this work is that we are able to extract the EP catheter fully automatically, without an initial seed, by using an advanced EP catheterspecific high-level extraction algorithm. A further novelty is the use of steerable tensor voting to contextually enhance our image information, thus creating more noise robustness, as shown in the evaluation. The implementation is currently still too slow for clinical use. The algorithm can, however, be implemented more efficiently. This is an important area of future research. Finally, it should also be noted that both the extraction results and the computational performance could be greatly improved by including the information from previous frames in the image sequence.
References 1. Kynor, D.B., Dietz, A.J., Friets, E.M., Peterson, J.N., Bergstrom, U.C., Triedman, J.K., Hammer, P.E.: Visualization of cardiac wavefronts using data fusion. In Sonka, M., ed.: Medical imaging 2002: Image Processing. Proceedings of the SPIE. Volume 4684., SPIE-Int. Soc. Opt. Eng (2002) 1186–1194 2. Fallavollita, P., Savard, P., Sierra, G.: Fluoroscopic navigation to guide RF catheter ablation of cardiac arrhythmias. In: 26th Annual International Conference of the Engineering in Medicine and Biology Society. (2004) 3. De Buck, S., Maes, F., Ector, J., Bogaert, J., Dymarkowski, S., Heidbchel, H., Suetens, P.: An augmented reality system for patient-specific guidance of cardiac catheter ablation procedures. IEEE Transactions on Medical Imaging, Vol. 24, No. 11, November 2005 24 (2005) 1512–1524 4. Franken, E., van Almsick, M., Rongen, P., Florack, L., ter Haar Romeny, B.: An efficient method for tensor voting using steerable filters. In: ECCV 2006. (2006) 228–240 5. Medioni, G., Lee, M.S., Tang, C.K.: A Computational Framework for Segmentation and Grouping. Elsevier (2000) 6. ter Haar Romeny, B.M.: Front-end vision vision and multi-scale image analysis. Kluwer Academic Publishers (2003) 7. Wertheimer, M.: Untersuchungen zur Lehre von der Gestalt II. Psychologische Forschung (1923) 301–350 8. Freeman, W.T., Adelson, E.H.: The design and use of steerable filters. IEEE Trans. Pattern Analysis and Machine Intelligence 13 (1991) 891–906
Fast Non Local Means Denoising for 3D MR Images Pierrick Coup´e1 , Pierre Yger1,2 , and Christian Barillot1 1
2
Unit/Project VisAGeS U746, INSERM - INRIA - CNRS - Univ-Rennes 1, IRISA campus Beaulieu 35042 Rennes Cedex, France ENS Cachan, Brittany Extension - CS/IT Department 35170 Bruz, France {pcoupe, pyger, cbarillo}@irisa.fr http://www.irisa.fr/visages
Abstract. One critical issue in the context of image restoration is the problem of noise removal while keeping the integrity of relevant image information. Denoising is a crucial step to increase image conspicuity and to improve the performances of all the processings needed for quantitative imaging analysis. The method proposed in this paper is based on an optimized version of the Non Local (NL) Means algorithm. This approach uses the natural redundancy of information in image to remove the noise. Tests were carried out on synthetic datasets and on real 3T MR images. The results show that the NL-means approach outperforms other classical denoising methods, such as Anisotropic Diffusion Filter and Total Variation.
1
Introduction
Image processing procedures needed for fully automated and quantitative analysis (registration, segmentation, visualization) require to remove noise and artifacts in order to improve their performances. One critical issue concerns therefore the problem of noise removal while keeping the integrity of relevant image information. This is particularly true for various MRI sequences especially when they are acquired on new high field 3T systems. With such devices, along with the improvement of tissue contrast, 3T MR scans may introduce additive artifacts (noise, bias field, geometrical deformation). This increase of noise impacts negatively on quantitative studies involving segmentation and/or registration procedures. This paper focuses on one critical aspect, image denoising, by introducing a new restoration scheme in the 3D medical imaging context. The proposed approach is based on the method originally introduced by Buades et al. [2] but with specific adaptations to medical images. To limit a highly expensive computational cost due to the size of the 3D medical data, we propose an optimized and parallelized implementation. The paper is structured as follows: Section 2 presents a short overview of the Non Local (NL) means algorithm, Section 3 describes the proposed method with details about the original contribution, and Section 4 shows a comparative validation with respects to other well established denoising methods, and results obtained on a 3T MR scanner. R. Larsen, M. Nielsen, and J. Sporring (Eds.): MICCAI 2006, LNCS 4191, pp. 33–40, 2006. c Springer-Verlag Berlin Heidelberg 2006
34
2
P. Coup´e, P. Yger, and C. Barillot
The Non Local Means Algorithm
First introduced by Buades et al. in [2], the Non Local (NL) means algorithm is based on the natural redundancy of information in images to remove noise. This filter allows to avoid the well-known artifacts of the commonly used neighborhood filters [4]. In the theoretical formulation of the NL-means algorithm, the restored intensity of the voxel i, N L(v)(i), is a weighted average of all voxel intensities in the image I. Let us denote: N L(v)(i) = w(i, j)v(j) (1) j∈I
where v is the intensity function and thus v(j) is the intensity at voxel j and w(i, j) the weight assigned to v(j) in the restoration of voxel i. More precisely, the weight quantifiesthe similarity of voxels i and j under the assumptions that w(i, j) ∈ [0, 1] and j∈I w(i, j) = 1. The original definition of the NL means algorithm considers that each voxel can be linked to all the others, but practically the number of voxels taken into account in the weighted average can be restricted in a neighborhood that is called in the following “search volume” Vi of size (2M +1)3 , centered at the current voxel i. In this search volume Vi , the similarity between i and j depends on the similarity of their local neighborhoods Ni and Nj of size (2d+1)3 (cf Fig. 1). For each voxel j in Vi , the averaged Euclidean distance − 22,a defined in [2], is computed between v(Nj ) and v(Ni ). This distance is a classical − 2 norm, convolved with a Gaussian kernel of standard deviation a, and is a measure of the distortion between voxel neighborhood intensities. Then,
Fig. 1. 2D illustration of the NL-means principle. The restored value of voxel i (in red) is a weighted average of all intensities of voxels j in the search volume Vi , according to the similarity of their intensities neighborhoods v(Ni ) and v(Nj ).
Fast Non Local Means Denoising for 3D MR Images
35
these distances are weighted by the function defined as follows: 2
j )2,a 1 − v(Ni )−v(N h2 e (2) Z(i) where Z(i) is the normalization constant with Z(i) = j w(i, j), and h acts as a filtering parameter. In [2], Buades et al. show that for 2D natural images the NL-means algorithm outperforms the denoising state of art methods such as the Rudin-Osher-Fatemi Total Variation minimization procedure [8] or the Perona-Malik Anisotropic diffusion [7]. Nevertheless, the main disadvantage of the NL-means algorithm is the computational burden due to its complexity, especially on 3D data. Indeed, for each voxel of the volume, the algorithm has to compute distances between the intensities neighborhoods v(Ni ) and v(Nj ) for all the voxels j contained in V (i). Let us denote by N 3 the size of the 3D image, then the complexity of the algorithm is in the order of O((N (2M + 1)(2d + 1))3 ). For a classical MR image data of 181 × 217 × 181 voxels, with the smallest possible value of d = 1, and M = 5, the computational time reaches up to 6 hours. This time is far beyond a reasonable duration expected for a denoising algorithm in a medical practice, and thus the reduction of complexity is crucial in the medical context.
w(i, j) =
3
Fast Implementation of the Non Local Means Algorithm
There are two main ways to address computational time for the NL-means: the decrease of computations performed and the improvement of the implementation. Voxel Selection in the Search Volume. One recent study [5] investigated the problem of the computational burden with a neighborhoods classification. The aim is to reduce the number of voxels taken into account in the weighted average. In other words, the main idea is to select only the voxels j in V (i) that will have the highest weights w(i, j) in (1) without having to compute the Euclidean distance between v(Ni ) and v(Nj ). Neglecting a priori the voxels which are expected to have small weights, the algorithm can be speeded up, and the results are even improved (see Table 4.1). In [5], Mahmoudi et al. propose a method to preselect a set of the most pertinent voxels j in V (i). This selection is based on the similarity of the mean and the gradient of v(Ni ) and v(Nj ): intuitively, similar neighborhoods tend to have close means and close gradients. In our implementation, the preselection of the voxels in Vi that are expected to have the nearest neighborhoods to i is based on the first and second order moments of v(Ni ) and v(Nj ). The gradient being sensitive to noise level, the standard deviation is preferable in case of high level of noise. In this way, the maps of local means and local standard deviations are precomputed in order to avoid repetitive calculations of moments for one same neighborhood. The
36
P. Coup´e, P. Yger, and C. Barillot
selection tests can be expressed as follows:
w(i, j) = 0
− 1 e Z(i)
v(Ni )−v(Nj )2 2,a h2
if μ1 <
v(Ni ) v(Nj )
< μ2 and σ12 <
var(v(Ni )) var(v(Nj ))
< σ22
(3)
otherwise.
Parallelized Computation. Another way to deal with the problem of the computational time required is to share the operations on several processors via a cluster or a grid. In fact, the intrinsic nature of the NL-means algorithm allows to use multithreading, and thus to parallelize the operations. We divide the volume into sub-volumes, each of them being treated separately by one processor. A server with eight Xeon processors at 3 GHz was used in our experiments.
4
Results
4.1
Validation on Phantom Data Set
In order to evaluate the performances of the NL-means algorithm on 3D T1 MR images, tests are performed on the Brainweb database1 [3] composed of 181 × 217 × 181 images. The evaluation framework is based on comparisons with other denoising methods: Anisotropic Diffusion Filter (implemented in VTK2 ) and the Rudin-Osher-Fatemi Total Variation (TV) approach [8]. Several criteria are used to quantify the performances of each method: the Peak Signal to Noise Ratio (PSNR) obtained for different noise levels, histogram comparisons between the denoised images and the “ground truth”, and finally the visual assessment. In the following, the noise is a white Gaussian noise, and the percent level is based on a reference tissue intensity, that is in this case the white matter. For the sake of clarity, the PSNR and the histograms are estimated by removing the background. Peak Signal Noise Ratio. A common factor used to quantify the differences between images is the Peak Signal to Noise Ratio (PSNR). For images encoded on 8bits the PSNR is defined as follows: P SN R = 20 log10
255 RM SE
(4)
where the RMSE is the root mean square error estimated between the ground truth and the denoised image. As we can see on Fig. 2, our optimized NLmeans algorithm produces the best values of PSNR whatever the noise level. In average, a gain of 2.6dB is observed compared to the best method among TV and Anisotropic Diffusion, and a gain of 1.2dB compared to the classical NL-means. The PSNR between the noisy images and the ground truth is called “No processing” and is used as the reference for PSNR before denoising. 1 2
http://www.bic.mni.mcgill.ca/brainweb/ www.vtk.org
Fast Non Local Means Denoising for 3D MR Images
37
40
PSNR (in dB)
35
30
25
No Processing Anisotropic Diffusion Total Variation Classical NL−means Optimized NL−means
20
15
3
9
15
21
Noise level σ (in %)
Fig. 2. PSNR values for the three compared methods for different levels of noise. The PSNR between the noisy images and the ground truth is called “No processing” and is used as the reference for PSNR before denoising. For each level of noise, the optimized NL-means algorithm outperforms the Anisotropic Diffusion method, the Total Variation method, and the classical NL-means.
Histogram Comparison. To better understand how these differences in the PSNR between the three compared methods can be explained, we compared the histograms of the denoised images with the ground truth. On Fig. 3 it is shown that the NL-means is the only method able to retrieve a similar histogram as the ground truth. The NL-means restoration distinguishes clearly the three main peaks representing the white matter, the gray matter and the cerebrospinal fluid. The sharpness of the peaks shows how the NL-means increases the contrasts between denoised biological structures (see also Fig. 4). 0.07
0.06
Mean number of voxels
Mean number of voxels
0.06
0.07
Ground Truth Total Variation Anisotropic Diffusion Optimized NL−means
0.05
0.04
0.03
0.02
0.01
0 0
Ground Truth Total Variation Anisotropic Diffusion Optimized NL−means
0.05
0.04
0.03
0.02
0.01
20
40
60
80
100
120
140
Intensity gray level
Image with 9% of noise
160
0 0
20
40
60
80
100
120
140
160
Intensity gray level
Image with 15% of noise
Fig. 3. Histograms of the restored images and of the ground truth. The histogram of the NL-means restored image clearly better fits to the ground truth one. Left: image with 9% of noise, Right: image with 15% of noise.
Visual Assessment. Fig. 4 shows the restored images and the removed noise obtained with the three compared methods. As shown in the previous analysis, we can observe that the homogeneity in white matter is higher in the image denoised by the NL-means algorithm. Moreover, if we focus on the nature of the removed noise, it clearly appears that the NL-means restoration better preserves the high-frequency components of the image (i.e. edges).
Fig. 4. Top: details of the Brainweb denoised images obtained via the three compared methods for a noise level of 9%. Bottom: images of the removed noise, i.e. the difference between noisy images and denoised images, centered on 128. From left to right: Anisotropic Diffusion, Total Variation and NL-means.
Optimization Contribution. In all experiments, the typical values used for the NL-means parameters are d = 1 (i.e. Card($N_i$) = 3³), M = 5 (i.e. Card($V_i$) = 11³), μ1 = 0.95, μ2 = 1.05, σ1² = 0.5, σ2² = 1.5, and h close to the standard deviation of the added noise, influencing the smoothness of the global solution. To obtain a significant improvement in the results, d can be increased, but this implies increasing M, yielding a prohibitive computational time. Table 1 summarizes the influence of restricting the average number of voxels taken into account in the search volume (see (3)) and of parallelizing the implementation. Those results demonstrate that the neighborhood selection is useful for two reasons: the computational time is drastically reduced, and the PSNR is even improved by the preselection of the nearest voxels while computing the weighted averages. Combined with multithreading, these two optimizations lead to an overall reduction of the computational time by a factor of 21790/434 ≈ 50. This reduction factor is even more important when Card($V_i$) and Card($N_i$) (i.e. M and d) increase.
Table 1. Results obtained for standard and optimized NL-means implementations on a Brainweb T1 image of size 181 × 217 × 181 with 9% of noise (d = 1 and M = 5). The time shown for the standard NL-means is calculated on only one processor of the server described in Section 3. The time given for our optimized version corresponds to the time with the server, and the cumulative CPU time is shown in brackets.
                                       Standard NL-means   Optimized NL-means
PSNR (dB)                              32.70               34.19
Mean number of voxels selected in Vi   11³ = 1331          227
Computational time (sec)               21790               434 (3162)
Fig. 5. NL-means restoration of 3T MRI data of 256³ voxels with d = 1, M = 5 in less than 10 minutes. From left to right: original image, denoised image, and difference image centered on 128. The whole image is shown on top, and a detail is shown at the bottom.
4.2 Experiments on Clinical Data
To show the efficiency of the NL-means algorithm on real data, tests have been performed on a high-field MR system (3T). In these images, the gain in resolution is obtained at the expense of an increase in the level of noise, so the denoising step is particularly important. The restoration results, presented in Fig. 5, show good preservation of the basal ganglia. Fully automatic segmentation and quantitative analysis of such structures are still a challenge, and improved restoration schemes could greatly improve these processing steps.
5 Conclusion and Further Works
This paper presents an optimized version of the Non Local (NL) means algorithm, applied to 3D medical data. The validations performed on the Brainweb dataset [3] show how the NL-means denoising outperforms other well-established methods, such as Anisotropic Diffusion [7] and Total Variation [8]. While the performance of this approach is clear, the reduction of its intrinsic complexity is still a challenging problem. Our proposed optimized implementation, with voxel preselection and multithreading, considerably decreases the required computational time (by up to a factor of 50). Further work should be pursued to compare NL-means with recent promising denoising methods, such as Total Variation on wavelet domains [6] or the adaptive estimation method [1]. The impact of this NL-means denoising on the performance of post-processing algorithms, such as segmentation and registration schemes, also needs to be further investigated.
References
1. Boulanger, J., Kervrann, Ch., Bouthemy, P.: Adaptive spatio-temporal restoration for 4D fluorescence microscopic imaging. In: Int. Conf. on Medical Image Computing and Computer Assisted Intervention (MICCAI'05), Palm Springs, USA, October 2005
2. Buades, A., Coll, B., Morel, J.M.: A review of image denoising algorithms, with a new one. Multiscale Modeling & Simulation 4(2) (2005) 490–530
3. Collins, D.L., Zijdenbos, A.P., Kollokian, V., Sled, J.G., Kabani, N.J., Holmes, C.J., Evans, A.C.: Design and construction of a realistic digital brain phantom. IEEE Trans. Med. Imaging 17(3) (1998) 463–468
4. Lee, J.S.: Digital image smoothing and the sigma filter. Computer Vision, Graphics and Image Processing 24 (1983) 255–269
5. Mahmoudi, M., Sapiro, G.: Fast image and video denoising via non-local means of similar neighborhoods. IMA Preprint Series 2052 (2005)
6. Ogier, A., Hellier, P., Barillot, C.: Restoration of 3D medical images with total variation scheme on wavelet domains (TVW). In: Proceedings of SPIE Medical Imaging 2006: Image Processing, San Diego, USA, February 2006
7. Perona, P., Malik, J.: Scale-space and edge detection using anisotropic diffusion. IEEE Trans. Pattern Anal. Mach. Intell. 12(7) (1990) 629–639
8. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60 (1992) 259–268
Active Shape Models for a Fully Automated 3D Segmentation of the Liver – An Evaluation on Clinical Data

Tobias Heimann, Ivo Wolf, and Hans-Peter Meinzer

Medical and Biological Informatics, German Cancer Research Center, Heidelberg [email protected]

Abstract. This paper presents an evaluation of the performance of a three-dimensional Active Shape Model (ASM) to segment the liver in 48 clinical CT scans. The employed shape model is built from 32 samples using an optimization approach based on the minimum description length (MDL). Three different gray-value appearance models (plain intensity, gradient and normalized gradient profiles) are created to guide the search. The employed segmentation techniques are ASM search with 10 and 30 modes of variation and a deformable model coupled to a shape model with 10 modes of variation. To assess the segmentation performance, the obtained results are compared to manual segmentations with four different measures (overlap, average distance, RMS distance and ratio of deviations larger than 5 mm). The only appearance model delivering usable results is the normalized gradient profile. The deformable model search achieves the best results, followed by the ASM search with 30 modes. Overall, statistical shape modeling delivers very promising results for a fully automated segmentation of the liver.
1 Introduction
The computerized planning of liver surgery has an enormous impact on the selection of the therapeutic strategy [1]. Based on pre-operative analysis of image data, it provides an individual impression of tumor location, the exact structure of the vascular system and an identification of liver segments. The additional information can potentially be life-saving for the patient, since anatomical particularities are far easier to spot in a 3D visualization. The limiting factor for utilizing operation planning in clinical routine is the time required for the segmentation of the liver, an essential step in the planning workflow which takes approximately one hour with conventional semi-automatic tools. For this reason, there have been numerous attempts to automate the segmentation process as much as possible. Soler et al. have presented a framework for a complete anatomical, pathological and functional segmentation of the liver [2], i.e. including detection of the vascular system and lesions. The method is based on a shape-constrained deformable model [3], which is initialized with a liver template shape and deformed by a combination of global and local forces. Park et al. use an abdominal probabilistic atlas to support voxel classification in CT images [4]. The underlying
classifier is based on a Gaussian tissue model with Markov Random Field regularization. Lamecker et al. have built a 3D statistical shape model of 43 segmented livers and utilize a modified Active Shape search for segmentation [5]. The point correspondences for their model are determined by a semi-automatic geometric method. It is interesting to note that all these approaches use some kind of shape information to guide the segmentation process: Due to its close proximity to organs with similar gray-value and texture properties, segmentation methods without prior information are prone to fail. At the same time, modeling shape information of the liver poses a huge challenge, since the enormous anatomical variance is hard to capture. Statistical Shape Models as introduced by Cootes et al. [6] seem to be best-suited for this task: In contrast to an anatomical atlas or a deformable model, they store not only information about the expected mean shape, but also about the principal modes of variation. Since Lamecker et al.'s evaluation of a statistical shape model for liver segmentation, there have been several advances in 3D model building and search algorithms. The time seems ripe for a new evaluation of the approach, which we are going to present in this paper.
2 Preliminaries

2.1 Statistical Shape Models
Statistical Shape Models capture shape information from a set of labeled training data. A popular method to describe these shapes is the point distribution model [6], where each input shape is specified by a set of n landmarks on the surface. Applying principal component analysis to the covariance matrix of all landmarks delivers the principal modes of variation p_m in the training data and the corresponding eigenvalues λ_m. Restricting the model to the first c modes, all valid shapes can be approximated by the mean shape x̄ and a linear combination of displacement vectors. In general, c is chosen so that the model explains a certain amount of the total variance, usually between 90% and 99%. In order to describe the modeled shape and its variations correctly, landmarks on all training samples have to be located at corresponding positions.
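For illustration, a minimal NumPy sketch of building such a point distribution model; the function names, the variance threshold, and the use of a dense covariance matrix are our own choices, not the authors' implementation (for 2562 landmarks an SVD of the centered data matrix would be preferable):

```python
import numpy as np

def build_pdm(shapes, variance_kept=0.90):
    """Point distribution model from aligned training shapes.

    shapes: (s, 3n) array; each row holds the n corresponding 3D
    landmarks of one training sample, flattened.
    """
    mean = shapes.mean(axis=0)
    cov = np.cov(shapes, rowvar=False)            # landmark covariance
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]             # descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # keep the first c modes explaining the requested total variance
    c = int(np.searchsorted(np.cumsum(eigvals) / eigvals.sum(),
                            variance_kept)) + 1
    return mean, eigvecs[:, :c], eigvals[:c]

def synthesize(mean, modes, y):
    """x = x_bar + sum_m y_m p_m; valid for |y_m| <= 3*sqrt(lambda_m)."""
    return mean + modes @ y
```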
2.2 Gray-Level Appearance Models
To fit the generated shape model to new image data, a model of the local appearance around each landmark in the training data is necessary. For this purpose, Cootes et al. suggest sampling profiles perpendicular to the model surface at each landmark [7]. Typically, these profiles contain the immediate gray-level values or their normalized derivatives. By collecting profiles from all training images, a statistical appearance model featuring mean values and principal variations can be constructed for each landmark. The probability that an arbitrary sample is part of the modeled distribution can then be estimated by the Mahalanobis distance between the sample and the mean profile.
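A hedged sketch of such a per-landmark profile model; the pseudo-inverse is our own safeguard against a singular covariance estimate and is not mentioned in the paper:

```python
import numpy as np

def profile_model(profiles):
    """profiles: (s, k) array of gray-level (or normalized-derivative)
    profiles sampled at one landmark across the s training images."""
    mean = profiles.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(profiles, rowvar=False))
    return mean, cov_inv

def mahalanobis(sample, mean, cov_inv):
    """Fit of an arbitrary profile to the modeled distribution."""
    d = sample - mean
    return float(d @ cov_inv @ d)
```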
2.3 Active Shape Model Search
Starting from the mean shape and an initial global transform to the image, the shape model is refined in an iterative manner: First, the fit of the gray-level appearance model is evaluated at several positions along the surface normal for each landmark. Subsequently, the global transform and the shape parameters y_m are updated to best match the positions with the lowest Mahalanobis distance. To keep the variation in reasonable limits, the shape parameters are restricted either to ±3√λ_m individually or to a hyperellipsoid for the entire shape vector y. By repeating these steps, the model converges toward the best local fit. To make the method more robust regarding the initial position, usually a multi-resolution framework is employed [7]: The model is first matched to coarser versions of the image with a larger search radius for the new landmark positions.
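The overall loop can be summarized as below; this is only a structural sketch, with `best_candidate` and `fit_shape` standing in for the profile search and the constrained pose/shape update (both are assumptions here), and a fixed iteration count used for simplicity:

```python
import numpy as np

def asm_search(points, normals, pyramid, best_candidate, fit_shape,
               levels=(4, 3, 2, 1, 0), iters=10):
    """Coarse-to-fine ASM search.

    pyramid[l]               -- image at resolution level l
    best_candidate(im, x, n) -- position along normal n through landmark
                                x with the lowest Mahalanobis distance
    fit_shape(pts)           -- pose + shape update with y_m restricted
                                to +-3*sqrt(lambda_m) (or a hyper-
                                ellipsoid); returns landmarks, normals
    """
    for level in levels:                      # coarser images first
        image = pyramid[level]
        for _ in range(iters):
            candidates = np.array([best_candidate(image, x, n)
                                   for x, n in zip(points, normals)])
            points, normals = fit_shape(candidates)
    return points
```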
3 Material and Methods

3.1 Image Data
The data used in the experiments has been collected over a period of five years of computerized operation planning and clinical studies at our research center. All images are abdominal CT scans enhanced with contrast agent and recorded in the venous phase, though the exact protocol used for acquisition changed over time. The resolution of all volumes is 512×512 voxels in-plane with the number of slices varying between 60 and 130. The voxel spacing varies between 0.55mm and 0.8mm in-plane; the inter-slice distance is mostly 3mm, with a few exceptions recorded with 5mm slice distance. From 99 scans that have been labeled by radiologic experts, eight had to be taken out of the experiments because of abnormal anatomy, e.g. in cases where parts of the liver had been removed by surgery or where tumors exceeding the volume of one liter deform the surroundings. The quality of the individual segmentations, created with a selection of different manual and semi-automatic tools, varies depending on the application they were created for. For some datasets the V. cava is segmented as part of the liver, for others it is left out. We wanted to build a model without the V. cava (as it is used for operation planning) and selected 32 datasets with high-quality segmentations as training samples for the shape model. The chosen segmentations were smoothed with a 3D Gaussian kernel and converted to a polygonal representation by the Marching Cubes algorithm. The resulting meshes were then decimated to contain around 1500–2000 vertices and forwarded to the model-building process. The remaining 59 CT volumes were treated as candidates for the evaluation process.
3.2 Model Building
While Lamecker et al. use a semi-automatic procedure to determine correspondences, we employ a fully automated approach that minimizes a cost function
based on the description length of the model [8], which should deliver better generalization ability and specificity. The shape model was built with 2562 landmarks which were distributed equally over the model surface employing the landmark reconfiguration scheme we recently presented in [9]. For the gray-level appearance models, a multi-resolution image pyramid was created for each CT volume that was used during the creation of the shape model. We opted for six different levels of resolution R0 to R5, where Rn corresponds to the n-th down-sampling step (with R0 as the original resolution). For each down-sampling step, x- and y-resolution were halved, leading to a doubled voxel spacing. When the xy-spacing reached the same value as the (originally much higher) z-spacing, the z-resolution was halved as well. Finally, we calculated three different gray-level appearance models for each resolution: a plain intensity profile, a gradient profile and a normalized gradient profile.
3.3 Evaluation of Gray-Level Appearance Models
To evaluate the performance of the different appearance models, we employ the following procedure: For all training images, the fit of the gray-level appearance models is evaluated at the true landmark positions and at three positions on each side of the surface. To simulate the conditions during model search (where landmarks are most probably not located at the correct positions), we randomize the landmark position with a standard deviation of 1mm in R0 along the surface (doubled at each following resolution). At the same time, the direction of the normal vector is randomized with a standard deviation of approximately ten degrees. This way, 20 samples are extracted for each landmark in each image. The index of the position with the best fit (ranging from -3 to 3) is stored and used to generate a histogram of the displacements for each resolution. Ideally, the appearance model should always achieve the best fit at the true position (displacement 0); in practice we expect to see a Gaussian distribution with a certain variance.
3.4 Alternative Model Search Algorithm
In the classical ASM approach, the model is strictly constrained to the variations learned from the training data. To allow additional degrees of freedom, Weese et al. presented a search method with shape constrained deformable models [10]. They calculate the external energy from the fit of gray-value profiles (similar to the original ASM search) and the internal energy based on the length difference between corresponding edges in the deformable mesh and the closest shape model. A conjugate gradient algorithm is used to minimize the weighted sum of both energies, varying the position of all vertices. We adopt the idea of allowing free deformations guided by the difference in edge length, but simplify the method for an easier integration into the ASM search algorithm as described in Sec. 2.3: When the new landmark candidates are found, a spring-model deformable mesh is initialized on these points with the neutral positions for all springs set to the corresponding edge length of the closest shape model. This mesh is then iteratively relaxed according to:

x_i' = x_i + \delta \sum_{j \in N(i)} d_{i,j} \, \frac{|d_{i,j}| - |m_{i,j}|}{|d_{i,j}|} \qquad (1)
where x_i are the coordinates of the i-th point, N(i) denotes the neighbours of vertex i, and d_{i,j} and m_{i,j} are directed edges of the deformable mesh and the model, respectively. δ is set to 0.05 in our experiments, and 100 iterations are run for relaxation.
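A direct transcription of Eq. (1) as a sketch; the mesh representation (an undirected edge list plus a table of model edge lengths) is our own choice:

```python
import numpy as np

def relax(points, edges, model_len, delta=0.05, iters=100):
    """Spring-model relaxation of the landmark candidates, Eq. (1).

    points:    (n, 3) candidate landmark positions
    edges:     iterable of undirected vertex index pairs (i, j)
    model_len: dict mapping (i, j) -> |m_ij| of the closest shape model
    """
    points = points.copy()
    for _ in range(iters):
        step = np.zeros_like(points)
        for i, j in edges:
            d = points[j] - points[i]             # directed edge d_ij
            length = np.linalg.norm(d)
            if length == 0.0:
                continue
            pull = delta * d * (length - model_len[i, j]) / length
            step[i] += pull                       # shrink stretched edges,
            step[j] -= pull                       # grow compressed ones
        points += step
    return points
```

Visiting each undirected edge once and applying the symmetric update is equivalent to summing over the neighbourhood N(i) for every vertex, since d_{j,i} = -d_{i,j} with the same length.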
3.5 Evaluation of Model Search
Initially, the shape model is scaled to a fixed size (around the average liver size) and translated to the upper left part of the image volume (corresponding to the right side of the patient). Without any further interaction, this procedure leads to the model being attracted by the liver in the vast majority of cases. In 11 cases, however, the image volume was recorded at a different patient position or slightly rotated, so that the search would not converge towards the liver. Instead of devising special initial transforms for these images, we decided to drop them from the evaluation set and ended up with 48 volumes for the final validation. Initial experiments suggested that the best starting resolution for the search is R4, since many profiles leave the image volume in R5, reducing the information of the appearance model. We run a fixed number of 10 iterations for R4 and R3 each, which usually brings the model quite close to the liver. To deal with the remaining cases, the search in R2 is run until convergence, which is defined as a maximum landmark movement of 0.5mm. Subsequently, 10 iterations in R1 and R0 each are sufficient to fine-tune the model to the image data. This method was performed with three different search strategies: the ASM search with 10 modes of variation for the model (1), the ASM search with an increased 30 modes of variation in R2 to R0 (2), and the deformable model search with 10 modes of variation (3). In the latter case, the deformable model was only used in R1 and R0; the previous resolutions were handled as in method 1. For all methods, the shape parameter vector y was restricted to lie inside a hyperellipsoid (size determined by the χ² distribution). Originally, we planned to evaluate a combination of these methods with all created appearance models, but it quickly became evident that only the normalized gradient appearance model delivered usable results (see Sec. 4.2). After the last iteration in R0 terminates, the resulting mesh is rasterized into a volume of the same resolution and compared to the existing segmentation. A number of different comparison measures are used for this purpose: the Tanimoto coefficient, which quantifies the overlap of two volumes as C_T = |A ∩ B| / |A ∪ B|; average and RMS surface distance; and the ratio of the surface area with a deviation larger than 5mm. All surface metrics were calculated in both directions to guarantee symmetry.
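The volume overlap and the symmetric surface statistics can be computed along the following lines; the directed surface distances are assumed to be given (e.g. sampled from a distance transform of each rasterized surface):

```python
import numpy as np

def tanimoto(a, b):
    """C_T = |A intersect B| / |A union B| for two binary volumes."""
    a, b = a.astype(bool), b.astype(bool)
    return (a & b).sum() / (a | b).sum()

def surface_stats(dist_ab, dist_ba, tol=5.0):
    """Symmetric average/RMS distance and ratio of deviations > tol,
    from the two directed distance samples (A->B and B->A)."""
    d = np.concatenate([dist_ab, dist_ba])
    return d.mean(), np.sqrt((d ** 2).mean()), (d > tol).mean()
```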
Fig. 1. Visualization of the variance of the created shape model: the left column shows the variation of the largest eigenmode between ±3√λ₁; the middle and right columns show the second and third largest eigenmodes, respectively
4 Results

4.1 Model Building
For a detailed evaluation of the model building process, we refer the reader to [9]. The three largest modes of variation are displayed in Fig. 1 and seem to capture the encountered shape variability adequately. 90 percent of the total variance is explained by the first 10 modes of variation (used for search methods 1 and 3), while the 30 modes used for method 2 account for 99.9 percent. 4.2
4.2 Evaluation of Gray-Level Appearance Models
The results of the displacement from true position analysis are displayed in Fig. 2. While we expected a Gaussian distribution for all appearance models, only the normalized gradient profile produces symmetric displacements. In contrast, the plain intensity and gradient profiles feature a clear shift to the inside of the shape model (negative displacement values). 4.3
4.3 Evaluation of Model Search
Figure 3 displays boxplots of the results of the automatic segmentation using the three different search techniques. The results of the volumetric error are specified as 100(1 − C_T) (C_T being the Tanimoto coefficient). For all four measures of segmentation quality, the ASM search with 30 modes of variation yields better results than the search with 10 modes. For one test dataset, the search with 30 modes did not converge in R2; this image was omitted from the statistics of method 2. The best overall results are accomplished with the deformable model search.
Fig. 2. Histograms showing the displacements from the true landmark positions at different resolutions R0 to R5 . From left to right: Intensity, gradient and normalized gradient profile appearance models.
Fig. 3. Results of the segmentation using the normalized gradient appearance model (1=ASM with 10 modes, 2=ASM with 30 modes, 3=deformable model with 10 modes). The box connects the 1st and 3rd quartiles of all values with the dot representing the median, the whiskers span the interval between the 0.05 and 0.95 quantiles.
5 Discussion
Comparing our results to the ones obtained in [5] (2.3–3.0mm average surface distance, 3.1–3.9mm RMS distance and 9.0–17.1% deviations larger than 5mm for a varying number of parameters during model search) does not reveal significant differences. However, it is hard to draw any conclusions from this, mainly because different training and evaluation data was used in the experiments. Consequently, the numbers presented here should only be interpreted relative to each other. Having evaluated our shape model on nearly 50 clinical datasets, we are confident that a statistical shape modeling approach will be able to solve the segmentation problem for liver operation planning for the vast majority of cases in the near future. However, we also noticed several problems: While 32 training shapes do not build the most extensive shape model, it is a sufficiently high number to draw the conclusion that the necessary shape variability for an exact segmentation of the liver will probably not be reached by the strictly constrained ASM
model. Approaches using deformable meshes based on the shape model seem to have a much higher potential of solving this task. Considering the simplicity of the deformable model enhancement, the obtained improvements over the already acceptable results of the ASM are excellent. Since the deformable model was only used in the two highest resolutions, it could not save the performance in the worst-case results, as is noticeable from the upper whiskers in the boxplots. A better initialization of the shape model and a restriction of the allowed geometric transformations (rotation and scale) seem to be necessary in these cases. We were surprised by the disappointing results from the intensity and unnormalized gradient appearance models, which are probably due to the skewed distribution visible in the displacement histograms. However, there are many more possibilities to model the local appearance (e.g. [11]) which will most likely improve the obtained results. Our future work will focus on evaluating these alternative appearance models and on improving the deformable model search with more sophisticated search and relaxation schemes.
References
1. Meinzer, H.P., Thorn, M., Cardenas, C.E.: Computerized planning of liver surgery – an overview. Computers & Graphics 26 (2002) 569–576
2. Soler, L., Delingette, H., Malandain, G., Montagnat, J., et al.: Fully automatic anatomical, pathological, and functional segmentation from CT scans for hepatic surgery. In: Proc. SPIE Medical Imaging. (2000) 246–255
3. Montagnat, J., Delingette, H.: Volumetric medical images segmentation using shape constrained deformable models. In: CVRMed. (1997) 13–22
4. Park, H., Bland, P.H., Meyer, C.R.: Construction of an abdominal probabilistic atlas and its application in segmentation. IEEE TMI 22 (2003) 483–492
5. Lamecker, H., Lange, T., Seebass, M.: Segmentation of the liver using a 3D statistical shape model. Technical report, Zuse Institute, Berlin (2004)
6. Cootes, T.F., Taylor, C.J., Cooper, D.H., Graham, J.: Active shape models – their training and application. CVIU 61 (1995) 38–59
7. Cootes, T.F., Taylor, C.J.: Statistical models of appearance for computer vision. Technical report, Wolfson Image Analysis Unit, University of Manchester (2001)
8. Heimann, T., Wolf, I., Williams, T.G., Meinzer, H.P.: 3D active shape models using gradient descent optimization of description length. In: Proc. IPMI, Springer (2005) 566–577
9. Heimann, T., Wolf, I., Meinzer, H.P.: Optimal landmark distributions for statistical shape model construction. In: Proc. SPIE Medical Imaging. (2006) 518–528
10. Weese, J., Kaus, M., Lorenz, C., Lobregt, S., et al.: Shape constrained deformable models for 3D medical image segmentation. In: Proc. IPMI, Springer (2001) 380–387
11. van Ginneken, B., Frangi, A.F., Staal, J.J., ter Haar Romeny, B.M., Viergever, M.A.: Active shape model segmentation with optimal features. IEEE TMI 21 (2002) 924–933
Patient Position Detection for SAR Optimization in Magnetic Resonance Imaging

Andreas Keil1,3, Christian Wachinger1, Gerhard Brinker2, Stefan Thesen2, and Nassir Navab1

1 Chair for Computer Aided Medical Procedures (CAMP), TU Munich, Germany
{keila, wachinge, navab}@cs.tum.edu
2 Siemens Medical Solutions, Erlangen, Germany
3 Chirurgische Klinik und Poliklinik, Klinikum Innenstadt, Munich, Germany
Abstract. Although magnetic resonance imaging is considered to be non-invasive, there is at least one effect on the patient which has to be monitored: the heating generated by absorbed radio frequency (RF) power, which is described using the specific absorption rate (SAR). In order to obey legal limits on these SAR values, the scanner's duty cycle has to be adjusted. The limiting factor depends on the patient's position with respect to the scanner. Detection of this position allows a better adjustment of the RF power, resulting in improved scan performance and image quality. In this paper, we propose real-time methods for accurately detecting the patient's position with respect to the scanner. MR data of thirteen test persons, acquired using a new "move during scan" protocol which provides low-resolution MR data during the initial movement of the patient bed into the scanner, is used to validate the detection algorithm. When integrated, our results would enable automatic SAR optimization within the usual acquisition workflow at no extra cost.
1 Introduction
Recent developments in magnetic resonance imaging (MRI) have led to improvements in the signal-to-noise ratio (SNR) which are especially needed for high-resolution and high-speed imaging (e.g., functional imaging). In order to achieve this, the field strength of the static B0 field is increased (to 3 T in current products), which in turn requires higher frequencies for the B1 field emitted by the radio frequency (RF) coils. The energy deposition is proportional to the squared B1 frequency. Together with dielectric effects occurring at wavelengths close to the dimensions of the human body, this generates more heating of the patient. This heating is modeled using the specific absorption rate (SAR), given in W/kg. The international standard [1] requires multiple SAR limits to be complied with (see Section 2.2). They are derived from the requirement of limiting the temperature rise due to RF energy to 1°, 2°, and 3° for head, torso, and extremities, respectively. Staying within these limits is achieved by adjusting the duty cycle (through adjusting repetition time or slice thickness), the flip angle, or the pulse form. If the SAR cannot be estimated accurately, rather large safety margins are required which in turn reduce scan efficiency and/or image quality.
Addressing this issue requires the development of more accurate monitoring and control algorithms for the RF energy applied to the patient. These algorithms in general have to solve two problems: First, a patient model is needed for simulating SAR values inside humans. Second, for transferring this simulation data to an actual patient for SAR estimation, the positioning of the patient has to be detected. The latter is the topic of the work we are presenting here. Since patients can only be positioned inside a tubular MR scanner in a few different ways (a patient may not lie diagonally or crossways inside the tube), the detection of the positioning effectively reduces to a detection of the patient's axial position with respect to the scanner. In order to correctly align an SAR model with the actual patient, current systems rely on the manual input of a few patient parameters (usually weight, height, sex, and age) by the doctor. Furthermore, the patient's head has to be positioned onto a given spot on the bed. There are numerous drawbacks associated with this procedure: Not only is it tedious for the radiology staff but it also takes valuable time from the MRI system. In addition, entering parameters and positioning a patient is error-prone (due to estimation, erroneous weight/height declarations by patients, inexact positioning, and reluctance to enter reasonable values). Therefore, we investigate the possibility of detecting the position relative to the scanner using image processing. Usually, there are at least two sensors already available for performing such a detection: An optical camera which is used to monitor the patient from the operating room, and the MR imaging device itself. Since the latter is already integrated with the workstation software and new protocols for fast scans (called "move during scan") will be available in products soon, a fast and low-dose prescan will provide the desired input.
2 Related Work

2.1 SAR Estimation
To our knowledge, so far only [2] explicitly suggested the design of a "smart scan software" for better adjusting scan parameters to individual patients. But of course, the purpose of all the work done on simulating/estimating SAR values is to improve the scan parameter adjustment. These simulations (e.g., [2,3]) are often based on the Finite Difference Time Domain (FDTD) method and use patient models generated from whole body MR scans like The Visible Human Project [4]. The following section is based on the results of [5].
2.2 Critical Limits for SAR
Table 1 is reproduced from the International Electrotechnical Commission's standard [1, section 51.103.2] and shows the global SAR limits for two different operating modes. This standard also defines rules on which subset of limits has to be obeyed: For exposure with volume RF transmit coils like an integrated whole body coil, only the global SAR limits from Table 1 apply. In all other cases (using local RF transmit coils), additional local limits apply. Since position detection is not an issue when using local coils (they are usually positioned accurately), we will concentrate on body coils.

Table 1. Global SAR limits from [1, section 51.103] for an averaging time of 6 min

                   Whole Body   Exposed Body Part^a   Head
Normal Mode        2 W/kg       2 W/kg – 10 W/kg      3.2 W/kg
First Level Mode   4 W/kg       4 W/kg – 10 W/kg      3.2 W/kg

^a Scaling is coupled to the ratio "exposed body mass / total patient mass"
A simulation of the whole body, exposed body part, and head SAR is shown in Fig. 1(a). Although restrictions exist for each of these three values, it is clear that only one of them is a limiting factor to the system. Introducing the SAR-to-limit ratio (STLR) and replotting this graph (see Fig. 1(b)) clearly shows that the neck area is crucial, because it defines the transition of the limiting STLR value from head to whole body. See also the symmetry of the energy (proportional to squared field strength) of the limit-adjusted B1 field to the head and whole body STLR, respectively. Therefore, we are especially concerned about neck detection. The plot of the adjusted B1^2 clearly shows that scan performance / image quality is lost in the head and feet sections when setting a flat B1 limit (given by the minimal adjusted B1^2 at the patient's torso).

Fig. 1. Simulations for a 3 T scanner with a reference B1 field strength of 11.73 µT: (a) SAR simulation for normal mode at 100% duty cycle; (b) STLR values for normal mode at 100% duty cycle and adjusted B1^2
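In code, picking the limiting constraint amounts to comparing SAR-to-limit ratios; the function name and all numbers below are made up purely for illustration:

```python
def limiting_constraint(sar, limits):
    """Return all SAR-to-limit ratios (STLR) and the one that binds."""
    stlr = {k: sar[k] / limits[k] for k in sar}
    return stlr, max(stlr, key=stlr.get)

# hypothetical normal-mode example with a volume transmit coil
stlr, worst = limiting_constraint(
    sar={"whole_body": 1.2, "exposed_part": 3.5, "head": 2.4},   # W/kg
    limits={"whole_body": 2.0, "exposed_part": 10.0, "head": 3.2})
# worst == "head" here: head STLR 0.75 exceeds whole-body STLR 0.6
```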
3 Methods

3.1 Low-Dose Prescan / Move During Scan
Protocols where the patient bed is moving while a scan is performed are currently under development and were already available for testing our position detection algorithm. The resulting image data comes at no extra cost and is not intended for diagnosis
as its resolution is quite low. Nevertheless, it is perfectly suited for our application, which only requires the data to be good enough for distinguishing different body parts. The image data delivered by MR scans depicts the patient's anatomy, neglecting environment details such as the patient bed. This is probably the biggest advantage compared to other types of sensors. Therefore, an MR scan is a good choice for determining the patient's parameters, if it is fast and does not require an SAR estimation itself. However, these two issues are easily overcome by using a low-dose prescan which is performed during a constant and relatively fast movement of the patient bed. Although the resulting image resolution is quite low (64 by 64 pixels per slice, and a slice spacing of 7.5 mm to 15 mm), the image quality is more than sufficient for estimating the desired parameters. It should be reiterated that the data obtained in this manner is not intended for diagnosis but just for parameter estimation.
3.2 Position Detection Methods
We only evaluated slice-based methods, since working on 3D data would restrict the workflow to obtaining a full prescan before starting the actual diagnostic scan. This is time-consuming and therefore needs to be avoided as often as possible. Our goal is to develop an algorithm which only requires prescan data from the body sections that should also be scanned for diagnosis, keeping the workflow as-is. In the following subsections, we briefly summarize the methods we investigated for detecting the patient's z position relative to the body coil. All these methods are able to make decisions in real time, which is required for a smooth workflow.

Area Computations for Thorax and Neck Detection. An obvious measure for classifying slices into body sections is based on slice area computations. The threshold for separating patient from background is easily determined once for all data sets. The derivative of the slice area with respect to the z position (basically the area difference of two successive slices) is a useful measure enabling the localization of the neck section by applying a gradient descent after proper initialization. This initialization is based on the detection of head and thorax, which are explained next. A very robust method for detecting the thorax region is based on the fact that the lungs contain air. This results in the lungs having similar intensities as the background. After thresholding, the lungs build a cavity whose area can be calculated easily: performing a region growing on the background (using seed points on the slice's top or bottom border), inverting this segmentation, and subtracting the thresholded slice's area yields the desired measure (see Fig. 2). This approach is very robust and hardly ever fails in a clinical setting.
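A compact sketch of this measure; `binary_fill_holes` is an equivalent shortcut for the border-seeded region growing plus inversion described above, and the threshold value is assumed to be given:

```python
import numpy as np
from scipy.ndimage import binary_fill_holes

def cavity_area(slice_2d, threshold):
    """Air-filled cavity area of one slice (cf. Fig. 2)."""
    body = slice_2d > threshold          # separate patient from background
    filled = binary_fill_holes(body)     # close internal cavities
    return int((filled & ~body).sum())   # lung pixels = filled minus body
```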
Fig. 2. Area computations on a 64 × 64 thorax slice: (a) original; (b) thresholded; (c) inverted region growing; (d) difference
Principal Component Analysis. Principal component analysis (PCA), also referred to as Karhunen-Loève expansion, is a well-known method for reducing a data set's dimension while preserving the most significant information. This is achieved by computing a new data-specific coordinate system so that the first few axes / coordinates cover most of the data's variance. PCA originated from the early work of Pearson [6] and was first introduced into statistics by Hotelling [7]. It is used in applications as diverse as model reduction for systems of differential equations and face recognition [8,9], the latter being quite related to our problem. Although a traditional image recognition and representation technique, PCA is still under investigation; e.g., [10] tries to better adapt PCA to 2D images. In our context, we apply PCA to reduce the image data's dimension before classifying slices s ∈ R^{n²} into classes such as "head slice" or "feet slice". This has the two advantages that classification can be done much faster and that irrelevant information is neglected, thereby improving classification accuracy. For creating a new, variance-specific image basis, the covariance matrix of a large set of representative slices has to be set up. Computing an eigenvalue decomposition ΣB = BD of the covariance matrix Σ (where D is a diagonal matrix containing Σ's eigenvalues in descending order) yields the desired new basis B ∈ R^{n²×n²}, whose columns are Σ's eigenvectors. A representation of a (de-meaned) slice image in this new basis is obtained by projecting it onto the basis B. For reducing the data's dimensionality, one may omit eigenvectors corresponding to small eigenvalues and keep only the image's coefficients corresponding to the first few coordinates in the new system (this corresponds to deleting columns of B). The number of these "principal components" to keep is chosen so that all necessary information is retained while reducing the set of basis vectors as much as possible. A common method for finding this cut-off is to examine the ratios of successive eigenvalues. In our case, PCA is employed to detect basic shapes for distinguishing head and feet slices. We downsampled all slices to 16 by 16 pixels in order to accelerate the learning phase. This also enables us to anticipate even faster (and therefore coarser) prescans and to prove the feasibility of PCA for our classification task. There exist various criteria for how many principal components should be kept. This decision has to strike a good balance between covered variance, classification performance, and storage requirements. Extensive testing showed that this is best achieved by reducing the full 256-dimensional image space to approximately 50 principal components.
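As a sketch, the basis construction and projection reduce to a few NumPy lines (an eigendecomposition of the 256 × 256 covariance of the downsampled slices); the function names are ours:

```python
import numpy as np

def learn_basis(slices, n_components=50):
    """slices: (s, 256) array of flattened 16x16 training slices."""
    mean = slices.mean(axis=0)
    cov = np.cov(slices, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]            # descending eigenvalues
    return mean, eigvecs[:, order[:n_components]]

def project(slice_16x16, mean, basis):
    """Coefficients of a de-meaned slice in the reduced basis."""
    return (slice_16x16.ravel() - mean) @ basis
```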
We did not apply any normalization of the slices to their centers of gravity for two reasons: First of all, patient movement is very limited in the up/down and left/right directions due to gravity and narrow patient beds, respectively. Furthermore, a translation of slices to their centers of gravity would even degrade the results, since slices depicting a single foot, for example, would become similar to head slices. A normalization would only be reasonable when performed on the full 3D volume, effectively averting the aforementioned problem. But as mentioned at the beginning of this section, we restrict ourselves to slice processing in order to be able to work on partial scans as well.

The actual classification task is then solved by building training sets for the head and feet classes and computing their means s̄_H and s̄_F as well as their covariance matrices Σ_H and Σ_F. A new slice can then be classified by comparing the Mahalanobis distances

d_{H/F}(s) = (s - \bar{s}_{H/F})^T \, \Sigma_{H/F}^{-1} \, (s - \bar{s}_{H/F}) \qquad (1)

to the means of each of these classes. This is a simple distance classifier with negligible computational cost. Classification of a patient as "head-first" or "feet-first" is not achieved by evaluating just a single slice, but by accumulating Mahalanobis distances along the z direction until a predefined threshold for this trust value is reached. The accumulation uses the difference of Mahalanobis distances and makes decisions much more robust and reliable. We define the head trust of a partial scan consisting of slices s_1, ..., s_k as

T_H(k) = \sum_{i=1}^{k} t_H(s_i), \quad t_H(s) = \begin{cases} d_H^{-1}(s) & \text{if } d_F(s) > d_H(s) + \Delta \\ 0 & \text{otherwise} \end{cases} \qquad (2)

The feet trust is defined analogously. In empirical studies, a difference threshold of Δ = 10 showed good results.
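A sketch of the resulting decision rule, combining Eqs. (1) and (2) with the "decide only when one trust passes a threshold while the other is zero" criterion used in Sec. 4.2; the trust threshold value is our own placeholder:

```python
def classify_orientation(slices, d_head, d_feet, delta=10.0, limit=500.0):
    """Accumulate head/feet trust along z, Eq. (2).

    d_head, d_feet: callables returning the Mahalanobis distance (1)
    of a projected slice to the head / feet class.
    """
    t_head = t_feet = 0.0
    for s in slices:
        dh, df = d_head(s), d_feet(s)
        if df > dh + delta:               # clearly closer to head class
            t_head += 1.0 / dh
        elif dh > df + delta:             # clearly closer to feet class
            t_feet += 1.0 / df
        if t_head >= limit and t_feet == 0.0:
            return "head-first"
        if t_feet >= limit and t_head == 0.0:
            return "feet-first"
    return "undecided"
```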
4 Experiments and Results

4.1 Image Data
We acquired whole body scans of 13 test persons. The subjects were positioned in different ways: some were instructed to put a pillow below their feet, and some put their head left of the patient bed's center line. The resolution used for acquiring the images was 7.5 mm in the x and y directions and 7.5 mm to 15 mm in the z direction. Each slice originally had 64 by 64 pixels. It has to be reiterated that, before applying PCA, all images were downsampled in the x and y directions to 16 by 16 pixels, yielding a resolution of 3 cm.
4.2 Detection Results for Crucial Sections of the Body
In this section, we will summarize the results of our experiments to detect several sections of the human body. For all detections, we had to define thresholds for
the corresponding measures. These thresholds were chosen by experience once for all datasets. Head and Feet. For testing the PCA classification, the 13 data sets were divided into 4 learning sets and 9 test sets. After downsampling in x and y direction, applying PCA, and transforming the images into the new space, slices were collected for every class to be trained. After this supervised learning phase, the classification of a data set into “head-first” or “feet-first” was done using the trust values defined in (2). We were able to robustly detect the head and feet of all patients in the test set by making a decision only if one of the two trust values reached a predefined threshold while the other one remained zero. Even persons with a pillow below their feet or with their head not center-positioned were accurately classified, since one of the four training data sets shared these properties. See Figs. 3(a)–3(c) for exemplary distances and trust values of one data set.
Fig. 3. Figs. (a)–(c) show the normalized Mahalanobis distances and the trust values for head / feet first classification. Fig. (d) shows the area measures used for detecting the same patient's thorax and neck. The x axis corresponds to the slice number.
Thorax. The area comparison described above works very robustly and fast in a clinical setting. Since every patient has lungs, the only requirement is that they form a cavity. This is guaranteed unless the patient's chest is opened during an intervention. However, for testing, we did not instruct the subjects to wear scrubs (as is usually done in clinical routine) instead of their own clothing. This resulted in artifacts from metal buttons or zips. These artifacts destroyed the topology by canceling the signal in two data sets. In all other data sets the thorax was detected accurately. See Fig. 3(d) for the area measures for one data set.

Neck. As stated in section 2.2, the detection of the patient's neck is crucial since in this body section the confining SAR limit changes from the head limit to one of the other two limits (whole body or exposed body part, depending on the operation mode). Our approach for detecting the neck is based on a combination of three methods: the slice classification for detecting the head position, the region growing on background for detecting the thorax, and a combined
area / area gradient analysis for determining the neck slices between head and thorax (see Fig. 3(d)). As long as head and thorax are detected, neck detection works perfectly.
5 Discussion
Currently, we are working on using a broader range of criteria to base our decisions on. This leads us to the addition of a classification system (e.g., a naive Bayes classifier would probably do the job) at the end of the slice processing in order to robustly combine more features than just PCA, area difference, and area gradient. This modification may also enable us to detect more body parts than just head, neck, and feet, and would additionally be more robust with regard to image artifacts. After that, the detection algorithm could be integrated within the duty cycle management software of MR scanners.
6 Conclusion
In this paper, we described the importance of SAR optimization for a smooth and safe MRI acquisition. Currently, MR imaging requires manual input from the radiology staff and is prone to different errors and omissions. We propose an automatic solution based on the fast and low resolution “move during scan” protocol. The detection and classification methods used provide a simple and robust solution for the detection of patient position and the location of the patient’s neck which is crucial for SAR planning. Results on thirteen test persons validate the approach. This paper presents, to the best knowledge of the authors, the first attempt to automatically detect the patient’s position with respect to the MR scanner and, therefore, enables a new solution towards designing a safe, fast, and smooth workflow for magnetic resonance image acquisition.
References
1. International Electrotechnical Commission: IEC 60601 medical electrical equipment – part 2-33
2. Zhai, Z., DeMeester, G.D., Morich, M.A., Harvey, P.R., Kleihorst, R.P.: Optimization of transmit efficiency for a T/R whole body coil at 3T. In: Proc. Int'l Soc. Mag. Reson. Med. Volume 13. (2005)
3. Zhai, Z., DeMeester, G.D., Shvartsman, S., Morich, M.A.: FDTD calculations of B1-field and SAR for 3T whole body coil. In: Proc. Int'l Soc. Mag. Reson. Med. Volume 10. (2002)
4. Ackerman, M.J.: The visible human project. Proc. of the IEEE 86(3) (1998) 504–511. National Library of Medicine, Bethesda, MD
5. Diehl, D., Weinert, U., Röckelein, R., Brinker, G.: SAR-calculations of a realistic, scalable human model in a MRI-resonator. In: Proc. Int'l IGTE Symposium on Numerical Field Calculation in Electrical Engineering. (2002)
6. Pearson, K.: On lines and planes of closest fit to systems of points in space. Philosophical Mag. (1901)
7. Hotelling, H.: Analysis of a complex of statistical variables into principal components. J. of Educational Psychology 24 (1933) 417–441, 498–520
8. Kirby, M., Sirovich, L.: Application of the Karhunen-Loève procedure for the characterization of human faces. IEEE Trans. PAMI 12(1) (1990) 103–108
9. Turk, M.A., Pentland, A.P.: Face recognition using eigenfaces. In: Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR). (1991) 586–591
10. Yang, J., Zhang, D., Frangi, A.F., Yang, J.y.: Two-dimensional PCA: A new approach to appearance-based face representation and face recognition. IEEE Trans. PAMI 26(1) (2004) 131–137
Symmetric Atlasing and Model Based Segmentation: An Application to the Hippocampus in Older Adults

Günther Grabner1,2, Andrew L. Janke1, Marc M. Budge2, David Smith3, Jens Pruessner1, and D. Louis Collins1

1 McConnell Brain Imaging Centre, Montreal Neurological Institute, Quebec, Canada
2 The Australian National University, Canberra, Australia
3 Oxford University, UK
Abstract. In model-based segmentation, automated region identification is achieved via registration of novel data to a pre-determined model. The desired structure is typically generated via manual tracing within this model. When model-based segmentation is applied to human cortical data, problems arise if left-right comparisons are desired. The asymmetry of the human cortex requires that both left and right models of a structure be composed in order to effectively segment the desired structures. Paradoxically, defining a model in both hemispheres carries a likelihood of introducing bias to one of the structures. This paper describes a novel technique for creating a symmetric average model in which both hemispheres are equally represented and thus left-right comparison is possible. This work is an extension of that proposed by Guimond et al. [1]. Hippocampal segmentation is used as a test-case in a cohort of 118 normal elderly subjects and results are compared with expert manual tracing.
1 Introduction
Non-subjective segmentation of sub-cortical structures from MRI has plagued the automated analysis of large clinical data sets. This is especially true when dealing with cohorts of patients exhibiting widely varying anatomical structure. Methods involving manual tracing suffer from inter- and intra-operator variability [2] and the increase in tracing effort required for high resolution data sets. Automated segmentation allows consistent analysis of structure volumes to be performed, thus enabling non-subjective estimates of atrophy in cohorts. Model based segmentation relies upon a pre-existing notion or manual delineation of the chosen structure on an average model of anatomy that represents the patient population being studied. The notion of an anatomical model is by no means new; digital atlases of both human and animal cortices have been available for some time. Early instances of such models include a rat brain [3], a human model from CT data [4], and others [5,6]. Many of these early models were derived from a single subject's data, and therefore could not represent the anatomical variability present in a population. Toga et al. [7] provide a good summary of these recent developments.
In order to build these models, various linear and non-linear registration techniques have been developed to align the individual subject data that constitutes the average. These registration techniques also allow labels that have been defined on models to be transformed onto individual patients' anatomy by using the inverse of the subject-to-model transformation. Gee et al. [8] proposed a system whereby 3D atlases of anatomy could be matched to anatomical images. Both Thompson [9] and Davatzikos [10] proposed a more sophisticated method whereby the cortex of individuals could be modeled via a deformable model. Despite the differences in methodology, the eventual goal of all these methods is to produce an average model that can accurately both show the similarities in the group and exclude the effect of anatomic variance. Wang and Joshi et al. [11] expanded upon these methods in order to determine left-right differences of the hippocampus (HC). An inherent problem with the current approach to average modeling used in these methodologies to determine initial structure size is the asymmetry in the cortex. As such, if a structure is outlined in both the left and right sides or left/right comparisons are required in a model, a bias will be introduced if the initial delineations on the model differ in shape or volume. This bias will be made apparent during the fitting of novel data to a model as the individual hemispheres will require differing parameters to achieve a robust fit. In a normal average model of cortical structure this will be the case [12]. Here an approach that annuls this effect without compromising the automatic segmentation performance is presented. In this methodology, images are iteratively matched linearly and non-linearly to an evolving model of average structure. This is similar to the procedure described by Guimond [1]. At each stage a new model is built by averaging the current registration results. The technique presented here is different in that at each stage images are matched to the model in both their original and (left-right) mirrored orientation. The aims of this methodology were to:
1. Build a high resolution symmetric atlas from a population.
2. Automatically extract sub-cortical structures in a population.
3. Allow left-right volumetric comparisons to be made without bias.
4. Eliminate subjective error.

2 Methods
For this work, 153 T1-volumetric brain MRI scans from the Oxford Project to Investigate Memory and Ageing (OPTIMA) were used. These data-sets were acquired on a 1.5 T system with a spatial resolution of 0.86 × 0.86 × 2 mm.
2.1 Data Pre-processing
The data was converted to the MINC format and intensity corrected using histogram spline sharpening [13]. Image intensity was then clamped between the 0.1st and 99.9th percentiles and this range of values was rescaled to between 0 and 100.
In this case the chosen range is arbitrary. Clamping is performed to reduce image artifacts (typically high-signal background noise) and bring images into approximate intensity alignment. After the first generation model was created, the individual volume intensities were normalized against the model based on the median intensity values [17].
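The clamping and rescaling step amounts to the following sketch (the function name and percentile handling details are our own):

```python
import numpy as np

def preprocess_intensities(vol):
    """Clamp to the 0.1st/99.9th percentiles and rescale to [0, 100]."""
    lo, hi = np.percentile(vol, [0.1, 99.9])
    clamped = np.clip(vol.astype(np.float64), lo, hi)
    return 100.0 * (clamped - lo) / (hi - lo)
```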
2.2 Registration Methods
The linear and non-linear registration methods used in this paper are slight modifications of the non-linear matching strategy (mritotal + ANIMAL) proposed by Collins et al. [14], the only difference being in the number of iterations used during each stage of non-linear fitting (as shown in Table 1). As proposed by [14], the ANIMAL algorithm comprises a hierarchical approach to registration. An initial linear fit is performed followed by progressively higher-resolution non-linear fits. This procedure results in an affine transformation with an associated vector-volume to describe the non-linear component of the fit. The objective function used in the fitting is primarily cross-correlation and the regularization model is linear-elastic.
2.3 Symmetric Model Generation
In iteration 0, each aligned subject data-set is linearly registered to the initial model (ICBM152 – symmetric [packages.bic.mni.mcgill.ca]) in order to bootstrap the model using the hierarchical fitting procedure mritotal. The resulting linear transformation, TA, is flipped about the X-axis and used as the initial transformation for registering the mirror image to the ICBM152 model, yielding TF. TF is then flipped about the X-axis in order to be averaged with TA. Afterwards, the averaged transformation is applied to the aligned data-set and the flipped averaged transformation to the mirror image. Then all the resulting data-sets are averaged, as well as all transformations. The inverted average scaling factor is then applied to the averaged data-set, which generates the next generation model. This is done in order to scale the model to the average size of all datasets used. Iteration 1 performs the same fitting procedure as iteration 0 against the model generated in iteration 0. A general overview of the slightly different non-linear fitting process which starts at iteration 2 is shown in Fig. 1. Here all subject data-sets (aligned and flipped) are non-linearly matched to the model generated in the previous iteration, where the linear transformations created in iteration 1 are used as initial transformations. For each subject, the resulting transformations TA and TF are combined by flipping TF and averaging them, ignoring the linear component. This averaged transformation is then concatenated with the linear transform (created in iteration 1) and the inverse of the scaling factor described with iteration 0 (Fig. 1, index I). The composed transformation TX is then applied to the aligned subject data and a flipped version of the transformation to the flipped data (see Fig. 1, index J). The resulting volumes are then averaged separately, aligned and flipped (Fig. 1, index K).
Fig. 1. Schematic of the non-linear fitting and averaging procedure
In order to generate the next generation model, the average of the aligned transformations (linear component ignored, Fig. 1, index L) must be inverse-applied to the average aligned volume (Fig. 1, index M) and respectively to the flipped data. The results of these transformations are averaged (Fig. 1, index N), which represents the next generation model. In this work, the non-linear fitting process was repeated nine times while constantly increasing the registration resolution. Table 1 shows detailed information regarding the registration process.

Table 1. Number of iterations used during each stage of non-linear fitting. Iterations 0 and 1 are linear; iterations 2 to 10 are non-linear with a deformation grid resolution from 16mm to 1mm.

Iteration  Target model                         16mm  8mm  4mm  2mm  1mm
0          symmetric ICBM152                    (linear fit)
1          model generated in iteration 0       (linear fit)
2          model generated in iteration 1       35
3,4        model generated in iteration 2,3     35    35
5,6,7      model generated in iteration 4,5,6   35    35   25
8,9        model generated in iteration 7,8     35    35   25   10
10         model generated in iteration 9       35    35   25   10   5
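For the linear stage, the flip-and-average step can be sketched as below for 4 × 4 affine matrices; conjugating by the reflection is our reading of "flipping a transformation about the X-axis", and the element-wise average is a simplification (a production implementation would average in a proper transform parameterization rather than on raw matrices):

```python
import numpy as np

FLIP_X = np.diag([-1.0, 1.0, 1.0, 1.0])      # mirror about the x (L-R) axis

def flip_about_x(T):
    """Express a 4x4 affine in the left-right mirrored world."""
    return FLIP_X @ T @ FLIP_X

def symmetric_average(T_aligned, T_flipped):
    """Average a subject's transform TA with the x-flipped transform
    TF of its mirror image (iteration 0 of the model building)."""
    return 0.5 * (T_aligned + flip_about_x(T_flipped))
```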
2.4 Manual Hippocampus Segmentation
The HC was manually traced in both hemispheres in 118 of the patient data sets using the method described by Pruessner et al. [15]. In brief, the method consists of: aligning all images to ICBM space via a linear transformation to assist with orientation during tracing, isotropic resampling of all data to 1mm, and tracing the chosen structure via the use of simultaneous coronal, transverse and sagittal views.
2.5 Comparison: Manually Traced vs. Automatically Segmented Hippocampi
As all patient data sets have now been non-linearly matched to the final average model, model-based automated segmentations of regions that exhibit sufficient image contrast, such as the hippocampus, are possible as per Collins et al. [14]. To compare manually traced hippocampi against automatically segmented ones, the first half of the 118 manually traced hippocampi is transformed onto the non-linear average model (using the subject-to-model transformations obtained in the last iteration of model generation) to create a HC on the target volume. This HC model is then back-transformed onto the individual patients' anatomy of the second half. Kappa correlation is used as a measure of agreement between the manually traced hippocampus and the automatically segmented one for each subject. This process is repeated by transforming the second half of the manually traced hippocampi onto the non-linear average and back-transforming the average onto the individual subjects of the first half.
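As a sketch, the agreement can be computed on the rasterized labels; we use the Dice-style overlap 2|A∩B|/(|A|+|B|) here, one common volumetric form of kappa, since the exact statistic is not spelled out in the paper:

```python
import numpy as np

def kappa_overlap(manual, automatic):
    """Agreement between manual and automatic binary label volumes."""
    a, b = manual.astype(bool), automatic.astype(bool)
    return 2.0 * (a & b).sum() / (a.sum() + b.sum())
```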
3 Results
Fig. 2 demonstrates the registration results of two aligned subjects during the progression towards a symmetric average model. In this series of images the shift towards symmetry is very apparent in the later iterations. This effect is first exhibited in larger structures (e.g. ventricles) followed by more local structures (e.g. smaller sulci). The last iterations demonstrate the ability of the non-linear matching methodology to find a mean between different patients' anatomies. The effect of anatomical normalization can easily be seen by comparing the anatomy of patients 1 and 2 (Fig. 2) in iterations 1 and 10. After non-linear registration the different anatomies (iteration 1) have become very similar (iteration 10). The effect of normalization of anatomy can also be seen in the associated average images; here the most obvious effect during the iteration towards a symmetric model is the reduction in variance associated with an increase in contrast as the model evolves. This effect can be seen in Fig. 3 as a reduction in standard deviation. It should also be noted that in later iterations the mean image contains more anatomical information as more and more structures are matched. This is also reflected in the standard deviation images. Automatic segmentation as described in section 2.5 was compared against expert manual delineation [15] of the hippocampus in 118 non-demented community volunteers. The Kappa mean value (for the process where the average hippocampus created from the second half of the data-set cohort was back-transformed to the individual patient data-sets of the first half) was 0.770, accompanied by a standard deviation of 0.027. For the process done in the opposite direction, the Kappa mean was 0.789 with a standard deviation of 0.032. However, it must be noted that left/right comparisons in the proposed methodology will be without bias. For this subject cohort, left/right comparison of the hippocampus volume shows no significant difference. Fig. 4(a) shows the non-linear model in which
Fig. 2. Registration results for two patients during a stepwise matching to the evolving model of average structure
the averaged hippocampus is outlined; a randomly selected example of manual tracing is shown in (b), and (c) outlines the corresponding automatically segmented hippocampus.
4 Discussion
This paper has presented a method whereby automated model-based segmentation of symmetric structures is made possible. This is done by registering complementary original and flipped images to a target simultaneously; this feature is important to ensure bias is not introduced at any stage of the model generation. It is also important to note that during iterations registration begins with each of the images in their initial space. This constraint ensures that no bias is introduced towards a particular intermediate model during the construction of the final model. It should be noted that this work embodies a small extension to the work done by Guimond on iterative atlasing; there are, however, two critical differences. The first is that after each iteration of model building, the inverse of the average
Fig. 3. Successive reduction in variance and increase in contrast as the model evolves. Top row: mean images of 153 averages (aligned data-sets); bottom row: standard deviation images (aligned data-sets)
Fig. 4. Comparison of manual tracing and automatic segmentation: (a) non-linear model, (b) manual tracing, and (c) automatic segmentation of the hippocampus
transformation is applied to the model; this controls for possible bias in the chosen method of registration. The second is that in this case we are building a symmetric model. Whilst a comparison of the two methods for segmenting substructures would seem a logical test for the new method, the results would be of little utility, if only because only the results obtained using the new method will be unbiased towards the left or right sub-structure that has been chosen. That said, given that both approaches use the same underlying registration technique, the results will be very similar. During the generation of the model it was found that a hierarchical approach to fitting as detailed in Table 1 is critical. As such, only structures that exhibit high anatomic consistency (e.g. motor strip, ventricles) across a population are
Symmetric Atlasing and Model Based Segmentation
65
evident in the eventual model. Due to the nature and extent of processing required to build such a model the computing time taken to generate this model is large. Approximately 2 weeks of dedicated processing time on a cluster of 16 Linux machines. Whilst this may seem an inordinate amount of processing time, it must be noted that once a model is built, the time taken to then match a novel subjects anatomy to the model is on the order of minutes. For the chosen registration techinque (ANIMAL) it was found that the numbers of iterations given in Table 1 were optimal for our dataset and thus likely for most elderly T1 data. Clearly this number of iterations will need to be tuned for data that is not of this type. In our case, an iterations fitting was continued untill the Standard Deviation image stabilised for that particular stage. Once this was achieved only then was fitting allowed to progress a finer scale. The utility of a method whereby features can be automatically identified and ex-tracted from cortical data is far reaching but can be problematic [16]. One of these problems is the issue of symmetric bias when comparing structures leftright. The methodologies presented in this paper are intended to enhance the existing methods whereby models are built using non-linear registration with no corresponding impact on accuracy of automated segmentation. The most obvious benefit is the ability to perform comparisons of structures that are by nature asymmetric but posses enough symmetry to allow left-right comparisons to be made.
References
1. Guimond A, Meunier J, Thirion J-P.: Average brain models: a convergence study. Computer Vision and Image Understanding 1999;77(2):192-210
2. Filippi M, Horsfield MA, Bressi S, Martinelli V, Baratti C, Reganati P, Campi A, Miller DH, Comi G.: Intra- and inter-observer agreement of brain MRI lesion volume measurements in multiple sclerosis. A comparison of techniques. Brain 1995;118(6):1593-600
3. Toga AW, Samaie M, Payne BA.: Digital rat brain: a computerized atlas. Brain Res Bull 1989 Feb;22(2):323-33
4. Dann R, Hoford J, Kovacic S, Reivich M, Bajcsy R.: Evaluation of elastic matching system for anatomic (CT, MR) and functional (PET) cerebral images. J Comput Assist Tomogr 1989 Jul-Aug;13(4):603-11
5. Greitz T, Bohm C, Holte S, Eriksson L.: A computerized brain atlas: construction, anatomical content, and some applications. J Comput Assist Tomogr 1991 Jan-Feb;15(1):26-38
6. Toh MY, Falk RB, Main JS.: Interactive brain atlas with the Visible Human Project data: development methods and techniques. Radiographics 1996 Sep;16(5):1201-6
7. Toga AW, Thompson PM.: Maps of the brain. Anat Rec 2001 Apr 15;265(2):37-53
8. Gee JC, Reivich M, Bajcsy R.: Elastically deforming 3D atlas to match anatomical brain images. J Comput Assist Tomogr 1993 Mar-Apr;17(2):225-36
9. Thompson PM, Toga AW.: A surface-based technique for warping 3-dimensional images of the brain. IEEE Trans on Med Imag 1996;15(4):1-16
10. Davatzikos C.: Spatial normalization of 3D brain images using deformable models. J Comput Assist Tomogr 1996 Jul-Aug;20(4):656-65
11. Wang L, Joshi SC, Miller MI, Csernansky JG.: Statistical analysis of hippocampal asymmetry in schizophrenia. Neuroimage 2001 Sep;14(3):531-45
12. Evans AC, Collins DL, Milner B.: An MRI-based stereotaxic atlas from 250 young normal subjects. Society for Neuroscience Abstracts 1992;18:408
13. Sled JG, Zijdenbos AP, Evans AC.: A non-parametric method for automatic correction of intensity non-uniformity in MRI data. IEEE Transactions on Medical Imaging 1998;17(1):87-97
14. Collins DL, Holmes C, Peters T, Evans A.: Automatic 3D model-based neuroanatomical segmentation. Human Brain Mapping 1995;3:190-208
15. Pruessner JC, Li LM, Serles W, Pruessner M, Collins DL, Kabani N, Lupien S, Evans AC.: Volumetry of hippocampus and amygdala with high-resolution MRI and three-dimensional analysis software: minimizing the discrepancies between laboratories. Cerebral Cortex 2000;10(4):433-442
16. Thompson PM, Woods RP, Mega MS, Toga AW.: Mathematical/computational challenges in creating deformable and probabilistic atlases of the human brain. Human Brain Mapping 2000 Feb;9(2):81-92
17. Dawant BM, Zijdenbos AP, Margolin RA.: Correction of intensity variations in MR images for computer-aided tissue classification. IEEE Trans Med Imaging 1993;12
Image Diffusion Using Saliency Bilateral Filter

Jun Xie1,3, Pheng-Ann Heng2,3, Simon S.M. Ho4, and Mubarak Shah1

1 Dept. of Computer Science, University of Central Florida, Orlando, USA
2 Dept. of Computer Science, The Chinese University of Hong Kong, HK
3 Shun Hing Inst. of Advanced Engineering, The Chinese University of Hong Kong, HK
4 Dept. of Diagnostic Radiology & Organ Imaging, The Chinese University of Hong Kong, HK
Abstract. Image diffusion can smooth away noise and small-scale structures while retaining important features, thereby enhancing the performance of many image processing algorithms such as image compression, segmentation and recognition. In this paper, we present a novel diffusion algorithm in which the filtering kernels vary according to the perceptual saliency of boundaries in the input images. The boundary saliency is estimated through a saliency measure which is generally determined by curvature changes, intensity gradient and the interaction of neighboring vectors. The connection between filtering kernels and perceptual saliency makes it possible to remove small-scale structures and preserve significant boundaries adaptively. The effectiveness of the proposed approach is validated by experiments on various medical images including the color Chinese Visible Human data set and gray MRI brain images.
1 Introduction

To accentuate certain image features for subsequent analysis or display, image enhancement is usually performed by either suppressing the noise or increasing the image contrast. In the early development of image processing, linear filters were used primarily owing to their mathematical simplicity and ease of implementation. However, linear filters such as low-pass filtering [1] tend to blur away the sharp boundaries that help to differentiate large scale anatomical structures. Even in cases where linear filters do not obliterate boundaries, they tend to distort the fine structures of the image and thereby change subtle aspects of the anatomical shapes in question. To alleviate the detrimental effects of linear filters, many efforts have been devoted in the literature to preserving the signal details while eliminating the noise. A powerful approach for denoising is to define a penalty functional that results in the recovered signal being as close as possible to the measured signal while maintaining smoothness except at the salient boundaries. As shown in [2], many well-known approaches, such as the Weighted Least Squares (WLS) [3], Robust Estimation (RE) [4] and Anisotropic Diffusion (AD) [5], have a fundamental relationship. Similar to Complex Diffusion [6], all these methods are based on the assumption that local relations between the samples dictate the final result and hence resort to an iterative algorithm. Tomasi and Manduchi proposed an alternative non-iterative Bilateral Filter (BF) [7] which performs smoothing by using both domain and range neighborhoods. This nonlinear approach does not involve the solution of partial differential equations and can be
implemented in a single iteration to generate results as good as iterative approaches [8]. Elad [2] shows that the bilateral filter can also be derived from the Bayesian approach using a novel penalty functional. Despite the relationship between the bilateral filter and AD/RE/WLS, the major difference of the former approach is that it considers all the relative neighborhoods in parallel, while the latter focus on the diffusion of connected samples. Although bilateral filtering is capable of reducing the noise in an image by an order of magnitude while maintaining edges, it is a challenging task to choose an appropriate kernel for balancing the trade-off between edge maintenance and noise removal. In [9], Ramanath and Snyder proposed a demosaicking approach using an adaptive bilateral filter kernel generated by linear interpolation. In this paper, we propose to enhance the performance of the bilateral filter by replacing the constant kernels with a function decreasing with the boundary saliency. This leads to an adaptive filtering scheme which can average smooth regions with a broad filtering kernel and preserve strong boundaries with a sharp kernel. For each pixel in an image, we define a fixed set of "orientation elements" connecting the pixel to its neighboring pixels. The saliency of boundaries in the original image is evaluated through a novel saliency measure which is determined by the intensity gradient at an element, the interaction of neighboring elements and the curvature changes along a sequence of elements. The curvature gradient factor provides a measure that can treat different types of curves equally. Another advantage of this novel measure is that it does not depend on the closure criterion, which makes it more suitable for realistic applications. Using this edge feature, we propose the Saliency Bilateral Filter (SBF), which can be applied to both gray and color images for adaptive edge-preserving smoothing.
2 Method

The idea behind the bilateral filter is to combine domain and range filtering, thereby enforcing both geometric and photometric locality. This filter generally smooths signals while preserving steps via a nonlinear combination of nearby sample values. For a 2D image f, the bilateral filter can be described as:

\[ h(p) = \frac{\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(q)\,\Upsilon_d(p,q)\,\Upsilon_s(f(p),f(q))\,dq}{\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \Upsilon_d(p,q)\,\Upsilon_s(f(p),f(q))\,dq} \tag{1} \]

where p and q refer to space variables. The function Υd(p, q) is the closeness function which measures the geometric distance:

\[ \Upsilon_d(p,q) = \exp\left\{\frac{-d^2(p-q)}{2\sigma_d^2}\right\} \tag{2} \]

and Υs(f(p), f(q)) is the photometric similarity function:

\[ \Upsilon_s(f(p),f(q)) = \exp\left\{\frac{-[f(p)-f(q)]^2}{2\sigma_s^2}\right\}. \tag{3} \]
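Before turning to the parameter choices, a minimal brute-force discretisation of Eqs. (1)-(3) may be useful; the integrals become sums over a finite window. The window radius below is an illustrative choice, while the default spreads match the values used for the MRI experiment later in the paper (σd = 3, σs = 30).

```python
import numpy as np

def bilateral_filter(f, sigma_d=3.0, sigma_s=30.0, radius=5):
    """Brute-force discretisation of Eqs. (1)-(3) for a 2D gray image f.
    The integrals become sums over a (2*radius+1)^2 window; the radius
    is an assumption, not a value fixed by the text."""
    H, W = f.shape
    pad = np.pad(f.astype(float), radius, mode='reflect')
    out = np.zeros((H, W))
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    closeness = np.exp(-(ys**2 + xs**2) / (2 * sigma_d**2))       # Eq. (2)
    for i in range(H):
        for j in range(W):
            win = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            similarity = np.exp(-(win - f[i, j])**2
                                / (2 * sigma_s**2))               # Eq. (3)
            w = closeness * similarity
            out[i, j] = np.sum(w * win) / np.sum(w)               # Eq. (1)
    return out
```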
The diffusion effect of the bilateral filter is controlled by three parameters: the filtering window size N, the geometric spread σd, and the photometric spread σs. Usually, the filtering window size is decided by the spatial kernel radius, while the photometric spread σs and geometric spread σd are chosen to achieve the desired amount of combination of sample values and the desired amount of low-pass filtering.

Fig. 1. (Left) Summary of the proposed searching algorithm to compute the novel saliency measure. (Right) Schematic explanation of an iteration of the searching algorithm.

2.1 Curvature Gradient Saliency Measure

In this section, we propose to choose the filtering kernel in an adaptive scheme by using perceptual metrics so that regions with different properties can be treated accordingly. For the measurement of the perceptual saliency of structures in the original image, we propose a novel saliency measure, called Curvature Gradient Saliency (CGS). For each orientation element v on the input image, the L level saliency ΦL(v) is defined to be the maximum saliency over all curves of length L emanating from v. The saliency of a curve Γ(s) of length l (s denotes arc length, 0 ≤ s ≤ l) is defined as
\[ S(\Gamma) = \int_0^l \rho^{s}\, g(s)\, C(s)\, ds \tag{4} \]
where ρ is a constant that determines how quickly the contribution of an element to its neighboring elements decays with distance, and g(s) is the normalized intensity gradient of arc Γ(s), defined as:

\[ g(s) = \frac{G(s) - G_{\min}}{G_{\max} - G_{\min}} \tag{5} \]
where G(s) represents the intensity gradient of Γ(s), and Gmax and Gmin are the maximum and minimum values of the gradient map respectively. The purpose of employing the normalized intensity gradient is to guide the expansion of saliency in a smoother manner so that elements with high intensity gradient can extend their saliency farther. Function C(s) in Eq. (4) is the curvature gradient function, defined as

\[ C(s) = \exp(-|\nabla k(s)|) \tag{6} \]
where k(s) refers to the curvature of arc Γ(s). This measure favors significant, long boundaries with constant curvature. Further, it calculates the smoothing factor in local areas, providing an efficient way to restrain
Fig. 2. Three cases of the spatial relationship between the central pixel p and its neighbor q for photometric similarity computation, including two cases ((a) and (b)) where q is away from the salient edge across the central point p, and case (c) where q is located on the salient edge.
the spread of local noise. Unlike most existing methods [10, 11], this measure does not depend on the closure criterion. Since boundaries are not always closed curves in natural and medical images, the independence of the closure criterion makes our measure more generic for realistic applications.

2.2 Computation of CGS

To compute this saliency measure in gradient maps of 2D images, we developed a local searching algorithm which is summarized in Fig. 1 (left) and is briefly described as follows. To compute the L level saliency SL(v) of element v, we consider all the elements V = {hi}, i = 1, . . . , N in the 2L × 2L window centered on v. A set of elements E = {h1, h2, . . . , hn} is maintained, to which we currently know the most salient curve from v, as illustrated in Fig. 1 (right). Initially, only element v is in set E. Then at each following iteration, we find an element ht from set V − E such that
\[ d_t = \max_{h_j \in (V-E)} \left[ d_{j-2} - \Psi_{j-1} \right] \tag{7} \]
where dt is the saliency of the most salient curve from v to ht, and dj−2 is the saliency of the most salient curve from v to hj−2, the second element before hj. It is obvious that all previous elements should already be in E. Ψj−1 is the saliency of the fragment from ht to ht−1 and is defined as

\[ \Psi_{j-1} = \left(\rho^{j-1} g_{j-1} + \rho^{j} g_{j}\right) C_{j-1} \tag{8} \]

with

\[ C_{j-1} = |K_{j-2,j-1} - K_{j-1,j}| \tag{9} \]

where Kj−1,j is the curvature of the fragment from hj−1 to hj. Since ht satisfies Eq. (7), it is guaranteed that all elements of the most salient curve from v to ht must already be in set E. Hence, we put ht into set E as a new element with a known most salient curve from v. After finding the most salient curve for each element in the window, we search for the element hmax with the maximum saliency value among the elements whose lengths lj are equal to L. Then, we define its saliency value dmax as SL(v), the saliency of the central element v.
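The fragment below is a runnable, simplified approximation of this search on a pixel grid: it grows the most salient path outward from v with a best-first queue, restricted to the 2L × 2L window, using ρ^length decay (Eq. 4), the normalised gradient (Eq. 5), and an exponential penalty on direction change standing in for the curvature term (Eq. 6). It prunes per pixel rather than per (pixel, length) pair, so it approximates, rather than reproduces, the recurrence in Eq. (7).

```python
import numpy as np
from heapq import heappush, heappop
from itertools import count

def cgs_saliency(grad, v, L=30, rho=0.8):
    """Best-first sketch of the CGS search from pixel v over gradient map
    `grad`; an approximation of the authors' algorithm, not a transcription."""
    H, W = grad.shape
    g = (grad - grad.min()) / (np.ptp(grad) + 1e-12)   # Eq. (5)
    best = {v: 0.0}                                     # pixel -> best saliency so far
    tie = count()                                       # tie-breaker for the heap
    heap = [(0.0, next(tie), 0, v, None)]               # (-saliency, tie, length, pixel, dir)
    best_at_L = 0.0
    while heap:
        neg_s, _, length, p, d = heappop(heap)
        if length == L:                                 # curve reached level L
            best_at_L = max(best_at_L, -neg_s)
            continue
        for step in ((-1,-1),(-1,0),(-1,1),(0,-1),(0,1),(1,-1),(1,0),(1,1)):
            q = (p[0] + step[0], p[1] + step[1])
            if not (0 <= q[0] < H and 0 <= q[1] < W):
                continue
            if abs(q[0] - v[0]) > L or abs(q[1] - v[1]) > L:  # 2L x 2L window
                continue
            turn = 0.0 if d is None else np.hypot(step[0] - d[0], step[1] - d[1])
            s = -neg_s + rho**length * g[q] * np.exp(-turn)   # decay * gradient * curvature
            if s > best.get(q, -1.0):
                best[q] = s
                heappush(heap, (-s, next(tie), length + 1, q, step))
    return best_at_L
```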
Fig. 3. The filtering results on a 2D gray MRI brain image. (a) The input MRI image. (b) An enlarged part. (c) The result of the bilateral filter. (d) The result of the adaptive bilateral filter using intensity gradient. (e) The result using saliency bilateral filtering.
2.3 Adaptive Filtering Based on CGS

After extracting the saliency features using the approach introduced above, each pixel p on the 2D image plane Ω has a saliency measure sp, which is defined as the saliency of the greatest boundary across it. Using the basic idea in Eq. (2), the geometric closeness function for SBF can be defined as:

\[ \Upsilon_d(p,q) = \exp\left\{ -\frac{(1+s_p)^2\, \|p-q\|}{2} \right\}. \tag{10} \]

Further, we follow the region based filtering idea to measure the photometric similarity between two locations using region difference. Region filtering can improve the robustness of the filtering process compared to using the intensity of a single pixel. For pixels on salient boundaries, there are two subregions with significantly different appearances beside the boundary. To compute the photometric similarity more accurately, we compare two pixels within the same side of the salient boundary so that the interference of other subregions can be avoided. As shown in Fig. 2, there are three possible relations between the locations of pixel p and its neighbor q. For both cases (Fig. 2(a) and 2(b)) where q is not located on the salient edge, we compute the similarity using only the subregion Ωpq containing q. When pixel q is located on the salient edge (Fig. 2(c)), we compare the similarity within the whole filtering window Wp. Thus, the photometric similarity function for our approach is defined as:

\[ D(p,q) = \begin{cases} \dfrac{\iint_{\Omega_{pq}} (f(x)-f(y))^2\, dx\, dy}{\iint_{\Omega_{pq}} dx\, dy}, & \text{Cases 1 \& 2} \\[2ex] \dfrac{\iint_{W_p} (f(x)-f(y))^2\, dx\, dy}{\iint_{W_p} dx\, dy}, & \text{Case 3} \end{cases} \tag{11} \]
The range similarity function using saliency measures is then obtained by:

\[ \Upsilon_s(p,q) = \exp\left( \frac{-D(p,q)}{2\sigma_s^2} \right) \tag{12} \]
When σs is set to a larger value (e.g. 100), the range filtering kernel is very broad, and the range component of the filter has little effect relative to the domain component. In other words, all neighbors have the same weight, so the combined filter acts as a simple domain filter. When σs is small (e.g. 10), only pixels with very similar contexts will be averaged.
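A sketch of the resulting kernel at a single pixel follows, assuming the saliency map has already been computed. For brevity, the region-based D(p, q) of Eq. (11) is replaced by the squared pixel difference, and the pixel is assumed to lie away from the image border.

```python
import numpy as np

def sbf_at(f, sal, p, sigma_s=30.0, radius=5):
    """Hedged sketch of the saliency bilateral filter at one pixel p (assumed
    at least `radius` pixels from the border). The geometric kernel sharpens
    with the saliency s_p as in Eq. (10); the range kernel follows Eq. (12),
    with the region-based D(p,q) of Eq. (11) simplified to a pixel difference."""
    i, j = p
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    geo = np.exp(-(1.0 + sal[i, j])**2 * np.hypot(ys, xs) / 2.0)    # Eq. (10)
    win = f[i - radius:i + radius + 1, j - radius:j + radius + 1].astype(float)
    rng = np.exp(-(win - f[i, j])**2 / (2 * sigma_s**2))            # Eq. (12)
    w = geo * rng
    return np.sum(w * win) / np.sum(w)                              # normalised as in Eq. (1)
```

High saliency at p shrinks the geometric kernel, so strong boundaries are averaged over a small neighbourhood while flat regions receive broad smoothing, which is the adaptive behaviour described above.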
3 Results

In this section, we analyze the performance of the proposed approach on both synthetic and medical images. One of the constants for saliency computation is the maximum curve length L, which should be chosen large enough to smooth away small-scale structures and noise, but not so large as to corrupt significant features. Another constant is the interaction factor ρ, ranging from 0 to 1. A large value of ρ enables our algorithm to fill in large gaps between edge segments, while a small ρ can restrain the spread of local noise more effectively. For all experiments presented here, we set L = 30, ρ = 0.8 and σs = 30. Figure 3 shows the results on a gray MRI brain image. The bilateral filter was applied with σd = 3 and σs = 30. From the result in Fig. 3(c), we observe that most of the small-scale structures have been removed by the bilateral filter but important edges were also blurred. Figures 3(d) and 3(e) show the results using the adaptive filtering scheme based on intensity gradient and the saliency measure respectively. It is evident that using the intensity gradient can improve the preservation of boundaries, and SBF maintains most of the significant boundaries in the original image. Following the same framework as in [7], our algorithm can be easily extended to vector-valued images. Figure 4 shows an experiment on a color image (Fig. 4(a)). When using the RGB space (Fig. 4(c)), there are some extraneous colors in the filtering result (e.g., the blurred pink-purple area). This is because the RGB color model is not well suited to describing colors in terms that are practical for human interpretation. Therefore we make use of the CIE-Lab color model [12], which is based on a large body of psychophysical data concerning color-matching experiments performed by human observers. Thus, the similarity measured in this space correlates strongly with the perception of color discrepancy. As seen
Fig. 4. An experiment on a natural color image. (a) The original color image. (b) The output of the bilateral filter using the CIE-Lab color space with σd = 1 and σs = 50. (c) The output of the adaptive bilateral filter using the RGB color space. (d) The output of the SBF approach using the CIE-Lab color space.
Fig. 5. A diffusion example on the color CVH images. (a) The original color image in the CVH dataset. (b) The patch selected as the test image. (c) The result of the bilateral filter. (d) The result of SBF.
Fig. 6. The brain MR phantom. (a) The artificial phantom of the MR head tomogram as a reference image. (b) The Canny edge map. (c) The noisy version of (a). (d) An enlarged view of the noisy image. (e) The Canny edge map of the noisy image.
Fig. 7. The diffusion results of the noisy brain phantom image using different filters: (a) the bilateral filter with σd = 1 and σs = 40, (b) the Perona-Malik anisotropic filter after 100 iterations with diffusivity constant K = 10 and a time step of 0.1, (c) complex diffusion using θ = π/15, and (d) the proposed SBF. Each result shows the filtered image, an enlarged part of the result and the Canny edge map.
above, by measuring the color distance in the CIE-Lab space, both the bilateral filter (Fig. 4(b)) and the SBF approach (Fig. 4(d)) have greatly corrected this undesirable effect. However, it is noticeable that our algorithm retains more details (e.g. the ear) than the bilateral filter. Figure 5 illustrates one of our experiments on the color Chinese Visible Human (CVH) data set. The highest resolution CVH data set is about 1143 GB in size and consists of 18,200 cross-sectional color digital images (4064 × 2704 resolution, 24 bits per pixel) at 0.1mm intervals. Both the bilateral filter and our algorithm have successfully removed most of the shading, while more perceptually important edges were preserved by the SBF filter. Both methods were implemented in C. For a color image in the CVH data set, the bilateral filter took 1.4 mins on a 2.8-GHz Pentium IV, while our algorithm needed 3.2 mins including both saliency computation and adaptive filtering.
Fig. 8. (Left) The intensity difference of each region interior between the brain phantom image and the noisy image and the filtering results of different approaches. (Right) The mean-squared errors between the brain phantom image and the noisy image and the filtering results of different approaches.
To evaluate the performance of these algorithms quantitatively, we use the stepwise constant MR-head phantom in [13]. It contains five ideally segmented components, as shown in Fig. 6(a). Each component is stepwise constant with a given intensity. The input image (Fig. 6(c)) is a noisy version of this reference image with Gaussian noise of standard deviation σ = 20. Figure 7 shows the filtered results using four algorithms: the bilateral filter, anisotropic diffusion [5], complex diffusion [6] and the proposed SBF algorithm. All the parameters involved in the previous methods were selected by trial-and-error in order to get an optimal result from each method. We compared the filtering results with the original phantom image. Figure 8 (left) shows the intensity differences in five region interiors of the filtering results and the stepwise constant reference image. The average difference of interior intensity reflects the similarity of the filtered images to the reference image. The statistics show that among the four tested algorithms, the SBF approach produced the closest result to the phantom image. For the whole frame of the filtered images, the obtained mean-squared errors are illustrated in Fig. 8 (right), where the result of the SBF algorithm holds the lowest overall MSE for this experiment.
4 Conclusions

This paper presents a novel diffusion approach based on perceptual metrics. To accommodate regions with different features, the filtering kernels are adjusted based on the saliency of boundaries. The connection between the filtering kernel and perceptual saliency makes it possible to remove noise and preserve salient boundaries adaptively. Moreover, the calculation of the saliency provides an efficient way to accurately measure the photometric similarity. We have shown that the proposed approach can be extended to vector-valued images by adopting the CIE-Lab color model, and we have tested the proposed saliency bilateral filter on a variety of synthetic and medical images. The qualitative and quantitative results show convincing advantages in comparison with the bilateral filter and other popular approaches.
Acknowledgement. The work described in this paper was supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region (Project no. CUHK4461/05M) and the CUHK Shun Hing Institute of Advanced Engineering.
References
1. R. Deriche, "Fast algorithms for low-level vision," IEEE Trans. Pattern Anal. Machine Intell., vol. 12, no. 1, pp. 78–87, 1990.
2. M. Elad, "On the origin of the bilateral filter and ways to improve it," IEEE Trans. Image Processing, vol. 11, no. 10, pp. 1141–1151, Oct. 2002.
3. R. L. Lagendijk, J. Biemond, and D. E. Boekee, "Regularized iterative image restoration with ringing reduction," IEEE Trans. Acoust., Speech, Signal Processing, vol. 36, no. 12, pp. 1874–1887, 1988.
4. M. J. Black and G. Sapiro, "Edges as outliers: Anisotropic smoothing using local image statistics," in Scale-Space, 1999, pp. 259–270.
5. P. Perona and J. Malik, "Scale-space and edge detection using anisotropic diffusion," IEEE Trans. Pattern Anal. Machine Intell., vol. 12, no. 7, pp. 629–639, 1990.
6. G. Gilboa, N. A. Sochen, and Y. Y. Zeevi, "Image enhancement and denoising by complex diffusion processes," IEEE Trans. Pattern Anal. Machine Intell., vol. 26, no. 8, pp. 1020–1036, 2004.
7. C. Tomasi and R. Manduchi, "Bilateral filtering for gray and color images," in Proc. IEEE Int. Conf. on Computer Vision, Washington DC, USA, 1998, pp. 839–846.
8. D. Barash, "A fundamental relationship between bilateral filtering, adaptive smoothing, and the nonlinear diffusion equation," IEEE Trans. Pattern Anal. Machine Intell., vol. 24, no. 6, pp. 844–847, 2002.
9. R. Ramanath and W. Snyder, "Adaptive demosaicking," J. Electron. Imaging, vol. 12, no. 4, pp. 633–642, Oct. 2003.
10. L. Williams and D. Jacobs, "Stochastic completion fields: A neural model of illusory contour shape and salience," Neural Computation, vol. 9, pp. 849–870, 1997.
11. A. Desolneux, L. Moisan, and J. Morel, "A grouping principle and four applications," IEEE Trans. Pattern Anal. Machine Intell., vol. 25, no. 4, pp. 508–513, Apr. 2003.
12. G. Wyszecki and W. Stiles, Color Science: Concepts and Methods, Quantitative Data and Formulae. New York, USA: John Wiley and Sons, 1982.
13. I. Bajla and I. Hollander, Empirical Evaluation Methods in Computer Vision. Stockholm, Sweden: World Scientific, 2001, ch. Task-based Evaluation of Image Filtering within A Class of Geometry-Driven-Diffusion Algorithms.
Data Weighting for Principal Component Noise Reduction in Contrast Enhanced Ultrasound

Gord Lueck1,2, Peter N. Burns1,2, and Anne L. Martel1,2

1 Sunnybrook Health Sciences Centre, Toronto, Canada
2 Department of Medical Biophysics, University of Toronto, Canada
[email protected]
Abstract. Pulse inversion ultrasound is a mechanism for preferentially displaying contrast agent in blood vessels while suppressing signal from tissue. We seek a method for identifying and segmenting areas of the liver with similar statistically significant time intensity curves. As a first step in this process, a method of weighting Rayleigh distributed ultrasound image data before principal components analysis is presented. Simulation studies show that the relative mean squared error can be reduced by 14% when the correct number of dimensions is selected. Our method is tested on an in vitro ultrasound phantom, showing slightly increased error suppression, and is demonstrated on a clinical liver scan, showing decreased correlation between signals in the low intensity range.
1 Introduction
Pulse inversion imaging has been shown to be useful in the diagnosis of several varieties of liver lesions [1]. The introduction of small spheres of encapsulated gas called microbubbles can provide a means of detecting intravascular contrast agent independently of tissue. It is commonly used clinically to differentiate between areas of contrast and tissue in order to characterize the vasculature of the liver [2]. Liver screening based on contrast enhanced ultrasonography is regularly performed at Toronto General Hospital for those who are at increased risk of developing cancer of the liver. Acquisition is sensitive to the position and motion of the ultrasound transducer and the scanner settings. Diagnosis is also sensitive to the experience of the radiologist and the conspicuity of fine structures in the noisy image. We seek a means of segmenting the arterial and portal blood flow of the liver over time to aid in diagnosis. Multidimensional techniques such as Principal Components Analysis (PCA) and Factor Analysis (FA) have proven effective at identifying time-intensity curves from series of images [3][4]. The dual blood supply of the liver makes factor analysis a good candidate for improving the interpretability of these screenings. Here we outline some of the properties of pulse inverted ultrasound images. Next, we propose a statistically weighted method of performing PCA for purposes of dimensionality reduction. Its effect on an in vitro blood flow study is given, and finally its smoothing effect on in vivo human liver images is demonstrated.
2 Orthogonal Analysis
The first step in most factor analysis studies is to reduce noise while defining the study subspace representing physical changes. PCA performs this function for us by obtaining the directions of maximum total variance among the data. Applied blindly, this technique captures the features of maximum variation, signal or otherwise, in the data sets. Suppose we decompose our data set as follows:

\[ D = UV + \epsilon \tag{1} \]
where D is the m × n matrix composed of the measured m images aligned as columns and n pixels (rows). D is factored into two components, an m × f matrix U representing a map of spatial locations, and an f × n matrix V representing relative time intensity curves. n is the number of pixels in an image or region of interest (ROI) and m is the number of measurements taken at each pixel. f is the number of retained principal components (PCs). When f = n, U and V are trivially assigned to perfectly represent the measured data, and ε is zero. Typically, f is chosen based on the relative contribution to variance as expressed by each successive eigenvalue in the decomposition. It has been argued that PCA may not necessarily capture the relevant variance in non-Gaussian distributed data sets [5]. In these cases, ignorant orthogonal analysis will capture a biased subset of data. So, a blind PCA operation on this type of ultrasound data will capture two types of variance: that caused by intensity changes due to changes in contrast agent concentration, and that which arises naturally due to speckle at each level of concentration. We seek a method of separating out variance due to concentration changes of contrast agents to more accurately capture the clinically relevant data. In this way, we will eliminate the potential variance bias of PCA reduction.
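As a point of reference, the truncation in Eq. (1) can be written in a few lines with a singular value decomposition. Whether to mean-centre D first is a modelling choice the text does not fix, so the sketch below works on the raw matrix.

```python
import numpy as np

def pca_truncate(D, f):
    """Sketch of Eq. (1): factor the data matrix D into two factors with f
    retained principal components via SVD; the residual plays the role of
    epsilon. No mean-centring is applied (an assumption, not a statement
    about the authors' pipeline)."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    Uf = U[:, :f] * s[:f]          # first factor, scaled by singular values
    Vf = Vt[:f, :]                 # second factor
    return Uf, Vf, D - Uf @ Vf     # factors and residual epsilon
```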
2.1 Ultrasound Weighted PCA
Microbubbles have a tendency to resonate nonlinearly when exposed to an ultrasound field, returning a nonlinear reflected pulse. In pulse inversion imaging, the transducer is configured to send and receive two successive pulses that are inversions of each other. The sum of reflected waves from linearly reflective tissue will produce a null signal, while the same operation applied to the signal reflected by microbubbles will produce a non-zero signal (Table 1). Ultrasound imaging suffers from some fundamental physical limitations affecting the signal-to-noise ratio of the resulting images. One particular limitation is due to speckle, which is a signal seen in ultrasound images arising from coherent interference from scatterers smaller than the wavelength of the transmission pulse. An ensemble of microbubbles approximately 1−5μm in size contributes a signal with speckle characteristics. We are interested in isolating signal changes in the image due to concentration changes of microbubbles.
Table 1. Pulse inversion imaging: the sum of two linear inverted tissue echoes cancels, while the sum of the same nonlinear echoes from bubbles does not. (Rows: pulse, inverted pulse, sum; columns: transmitted pulse, linear tissue echo, nonlinear bubble echo; the cells show the corresponding waveforms.)
2.2 Weighted PCA for Rayleigh Distributed Data
Cochran and Horne showed that there are data transforms that may be applied prior to the PCA operation that can aid the significance of the resulting PCs [6]. Šámal [7] has written a comprehensive review of methods for correspondence analysis. Benali et al. used correspondence analysis [5] to split data between a physical model and a statistical model. Keenan and Kotula have recently shown [8] that this type of blind variance separation can be applied to Poisson noise data resulting from surface mass spectroscopy data. These transforms make an assumption about the expected variance of a given error or data element. The variance of each data element dij is factored into two factors:

\[ \mathrm{var}(d_{ij}) = g_i h_j \tag{2} \]

where gi and hj represent contributions to the variance from time i and pixel j respectively. Variance due to spatial interference patterns is assumed to be independent of time. Although the expected velocity of the bubbles and the detector is small, it is sufficient to decorrelate them. Cochran and Horne [6] showed that the optimally weighted data matrix DW is

\[ D_W = (aG)^{-1/2}\, D\, (bH)^{-1/2} \tag{3} \]
where a and b are arbitrary constants, and G and H are diagonal matrices containing the elements gi and hj. The scalar constants a and b are added for convenience, as linear scaling does not affect the variance of the data. A priori information in factor analysis helps decompose our signal into components representing directions of change in the data set. The received signal from an ensemble of nonlinear coherent scatterers can be modeled as a zero mean multivariate normal distribution in the complex plane [9]. The magnitude of the envelope-detected signal forms a Rayleigh distribution [10], with β² representing
the variance of each of the underlying normal distributions. The probability distribution function is:

\[ p(x) = \begin{cases} \dfrac{x}{\beta^2}\, e^{-x^2/2\beta^2}, & x \ge 0 \\ 0, & x < 0 \end{cases} \tag{4} \]

Estimates of the parameter β, the mean (μ̂) and the variance (σ̂²) can be obtained from sample data using the following equations:

\[ \hat\beta = \sqrt{\frac{1}{2N}\sum_{i=1}^{N} x_i^2} \tag{5} \qquad \hat\mu = \hat\beta\,\sqrt{\frac{\pi}{2}} \tag{6} \qquad \hat\sigma^2 = \frac{4-\pi}{2}\,\hat\beta^2 \tag{7} \]
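The estimators in Eqs. (5)-(7) are easy to verify numerically; the fragment below draws Rayleigh samples with a known scale and checks that the estimated mean and variance match the sample statistics. The seed and sample size are arbitrary.

```python
import numpy as np

# Numerical check of Eqs. (5)-(7), assuming beta is the Rayleigh scale
# (the standard deviation of the underlying zero-mean normals).
rng = np.random.default_rng(0)
beta = 2.0
x = rng.rayleigh(scale=beta, size=100_000)

beta_hat = np.sqrt(np.sum(x**2) / (2 * x.size))        # Eq. (5)
mu_hat = beta_hat * np.sqrt(np.pi / 2)                 # Eq. (6)
var_hat = (4 - np.pi) / 2 * beta_hat**2                # Eq. (7)
print(beta_hat, mu_hat - x.mean(), var_hat - x.var())  # both gaps ~ 0
```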
We can see from the above equations that the intrinsic variance of the signal can be estimated from the data set provided we have a large number of samples N. We can estimate G and H by using our variance estimate of the Rayleigh distribution outlined in Eq. (7). Combining Eqs. (7) and (5) we obtain

\[ \mathrm{var}(d_i) = g_i = \frac{4-\pi}{2}\left(\sqrt{\frac{1}{2n}\sum_{j=1}^{n} d_{ij}^2}\right)^{2} = \frac{4-\pi}{4}\,\frac{1}{n}\sum_{j=1}^{n} d_{ij}^2 \tag{8} \]

Similarly,

\[ \mathrm{var}(d_j) = h_j = \frac{4-\pi}{2}\left(\sqrt{\frac{1}{2m}\sum_{i=1}^{m} d_{ij}^2}\right)^{2} = \frac{4-\pi}{4}\,\frac{1}{m}\sum_{i=1}^{m} d_{ij}^2 \tag{9} \]
So that, if we set a = b = (4 − π)/4, then

\[ aG = \mathrm{diag}\!\left(\frac{1}{n}\sum_{j=1}^{n} d_{ij}^2\right) \tag{11} \]

\[ bH = \mathrm{diag}\!\left(\frac{1}{m}\sum_{i=1}^{m} d_{ij}^2\right) \tag{12} \]
which are simply the mean squared values of the rows and columns, respectively. It is now simple to determine the weighting matrix by inverting the square root of the row and column averages above:
\[ D_W = \sqrt{mn}\,\left(\mathrm{diag}\!\left(\sum_{j=1}^{n} d_{ij}^2\right)\right)^{-1/2} D \left(\mathrm{diag}\!\left(\sum_{i=1}^{m} d_{ij}^2\right)\right)^{-1/2} \tag{13} \]
The original data set can be estimated from the principal components of the weighted matrix, UW and VW, by inverting the weighting matrices: U = (aG)^{1/2} UW and V = VW (bH)^{1/2}, ignoring the reconstruction error ε. This inversion operation is simple as the weighting matrices are diagonal. These relationships will be used to weight our data, perform PCA, and subsequently undo the transform on the resulting components for display.
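Putting Eqs. (11)-(13) and the inversion step together gives a compact pipeline. The sketch below drops the arbitrary constants a and b (they only rescale the data) and assumes D is laid out with times as rows i and pixels as columns j, matching the indices of d_ij above; intensities are assumed strictly positive so the diagonal weights are invertible.

```python
import numpy as np

def weighted_pca_smooth(D, f):
    """Sketch of the weighting scheme of Eqs. (3) and (11)-(13): weight D by
    the inverse square roots of its row and column mean-squared values, keep
    f principal components of the weighted matrix, then invert the diagonal
    weighting for display."""
    g = np.mean(D**2, axis=1)                   # aG diagonal, Eq. (11)
    h = np.mean(D**2, axis=0)                   # bH diagonal, Eq. (12)
    W = np.sqrt(np.outer(g, h))
    Dw = D / W                                  # weighted matrix, Eqs. (3)/(13)
    U, s, Vt = np.linalg.svd(Dw, full_matrices=False)
    Dw_f = (U[:, :f] * s[:f]) @ Vt[:f, :]       # f-component reconstruction
    return W * Dw_f                             # undo the weighting
```

For example, weighted_pca_smooth(D, 3) mirrors the three-component reconstruction used for the clinical images later in the paper.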
3 Simulated Data
To test the properties of the weighted principal components analysis above, a simulated ultrasound data set was constructed under the assumption of Rayleigh distributed signals. A series of images was constructed with four different areas, each enhancing with a different but known time intensity curve. These image areas were multiplied by the corresponding curves as shown in Figure 1.
Fig. 1. Time intensity curves (a) and image areas used (b). (c) is an image from the simulated data set where each area is Rayleigh distributed.
Four curves are shown, of which only three are linearly independent, so three principal components are expected. To simulate ultrasound image data, the resulting image series was converted into Rayleigh distributed data, with each data point having a mean equal to the original value of the pixel. An image from the set is shown in Figure 1(c), after scaling for display. PCA was performed on the simulated noisy data set and on the weighted noisy data set as explained above. The result of the PCA was compared to the noise-free data set generated, and the relative mean squared error (RMSE = Σ(D − Dtrue)²/ΣDtrue²) was calculated after reconstructing the data for varying dimensionalities.
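The conversion step can be reproduced with a standard generator: a Rayleigh variate with scale σ has mean σ√(π/2), so dividing the target mean by √(π/2) gives the required scale. The fragment below is a sketch of that protocol together with the relative MSE score; the seed is arbitrary, and the curve shapes are not specified here.

```python
import numpy as np

rng = np.random.default_rng(1)

def rayleighize(clean):
    """Draw Rayleigh data whose pointwise mean equals `clean` (frames x
    pixels); scale = mean / sqrt(pi/2). Zero-valued pixels map to zero."""
    return rng.rayleigh(scale=clean / np.sqrt(np.pi / 2))

def rmse(recon, clean):
    """Relative mean squared error as defined in the text."""
    return np.sum((recon - clean)**2) / np.sum(clean**2)
```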
Fig. 2. Reconstruction error using Principal Components of the two data sets
The reconstruction error from three or more principal components is less for the weighted data than for the unweighted data. If two or fewer principal components are used, the unweighted data produces a better estimate of the original data.
Fig. 3. Image from the phantom study showing bidirectional flow through overlapping tubes (a) and RMS error in the phantom study (b)
The error values actually increase beyond the ideal number of components. This same basic error structure was found for other curves as well: the minimum always occurs at the true number of principal components, independently of the shape of the curves. At least a 14% improvement in RMS error is found when the minimum number of PCs is retained.
4 Phantom Results
To test this method in vitro, a dilution of microbubbles (Definity, DuPont Pharmaceuticals, Billerica, Massachusetts) in water (approximately 0.1mL/L) was directed through polyethylene tubing suspended in water. Images were captured using an ultrasound scanner (Aplio, Toshiba Medical Systems, Tokyo, Japan) in pulse inversion mode. Both tubes were designed to cross in the field of view. Images were obtained with a 6 MHz rectilinear transducer array (PLT 604AT) at a maximum depth of 3 cm with the focus at that depth, MI=0.04, 2DG=95dB, DR=45dB. A sample image was scaled for display and is shown in figure 3(a). The images were processed as outlined above, using both the weighted PCA and unweighted PCA. A region of interest was placed on the image above the features, where no signal is expected. PCA smoothing was performed on the data, and the ability of each method to reduce noise in the ROI was measured by calculating the mean intensity. The results are presented in figure 3(b). The magnitudes in this application were obtained after scanner post-processing of the envelope-detected data.
5 Clinical Application
This technique was applied to clinically obtained contrast enhanced ultrasound scans of the liver. A region of interest was manually delineated around the liver of the patient, and a series of images taken over 40 seconds post contrast injection
Fig. 4. Late enhancement frame (a) and time intensity curves for a typical background and lesion pixel (b). The loadings (V) of the first two PCs are plotted in (c).
were analyzed using the above techniques. Images are reconstructed using 3 principal components. The largest difference between the weighted and unweighted PCA smoothing can be seen in the images resulting from only 2 PCs. In these images, the weighted PCA smoothes the image more in regions of high intensity. It is clear that using only 2 PCs has left out some important features of the image, most notably the dark region in the centre of the lesion that has been identified as a necrotic core. This is improved when using 3 PCs, and using 4 PCs or more shows few discernible differences between the images. The corresponding time intensity curves show a marked decorrelation in signal between lesion and background.
Fig. 5. The same frame as in Fig. 4 after PCA smoothing (top) and weighted PCA smoothing (bottom); columns show the 2 PC reconstruction, the 3 PC reconstruction and the 3 PC reconstruction curves. Time intensity curves are for the same pixel as in Fig. 4 (right).
6 Discussion and Conclusion
There is some agreement between theory and our in vitro experiments, where error can be measured. The minimum RMS error occurs at the correct number of principal components. Weighted PC noise reduction shows only a slight difference in the ability to reduce noise in the phantom study. When enough dimensions are retained, our technique has proven to capture noiseless data more accurately. We see this effect in our in vivo study in the ability of the method to more accurately smooth background pixels while maintaining contrast in 'feature' pixels. We have outlined some of the theory and shown the clinical applicability of our weighted PCA smoothing technique. Although this is only a first step towards the goal of mapping time intensity curves in the liver resulting from the hepatic and portal blood supplies, some properties of PCA smoothing have proved useful. We can see that weighted PCA can perform better for Rayleigh distributed signals by removing the inherent variance bias towards larger signal intensities, thus smoothing more uniformly over the image. Preliminary clinical data indicates that we can improve the ability to identify subtle lesions that may otherwise have been lost in the background noise.
References
1. Burns, P., Wilson, S., Simpson, D.: Pulse inversion imaging of liver blood flow: improved method for characterizing focal masses with microbubble contrast. Investigative Radiology 35 (2000) 58
2. Wilson, S., Burns, P.: Liver mass evaluation with ultrasound: the impact of microbubble contrast agents and pulse inversion imaging. Seminars in Liver Disease 21 (2001) 147
3. Williams, Q.: Tissue perfusion diagnostic classification using a spatio-temporal analysis of contrast ultrasound image sequences. Lecture Notes in Computer Science 3565 (2005)
4. Martel, A.: Extracting parametric images from dynamic contrast-enhanced MRI studies of the brain using factor analysis. Medical Image Analysis 5 (2001) 29
5. Benali, H.: A statistical model for the determination of the optimal metric in factor analysis of medical image sequences (FAMIS). Physics in Medicine and Biology 38 (1993) 1065
6. Cochran, R.N., Horne, F.H.: Statistically weighted principal component analysis of rapid scanning wavelength kinetics experiments. Analytical Chem. 49 (1977) 846–853
7. Šámal, M.: Experimental comparison of data transformation procedures for analysis of principal components. Phys. Med. Biol. 44 (1999) 2821
8. Keenan, M.R., Kotula, P.G.: Accounting for Poisson noise in the multivariate analysis of ToF-SIMS spectrum images. Surf. Interface Analysis 36 (2004) 203–212
9. Wagner, R.F., Smith, S.W., Sandrick, J.M., Lopez, H.: Statistics of speckle in ultrasound B-scans. IEEE Transactions on Sonics and Ultrasonics 30 (1983) 156–163
10. Goodman, J.W.: Some fundamental properties of speckle. Journal of the Optical Society of America 66 (1976) 1145–1150
Shape Filtering for False Positive Reduction at Computed Tomography Colonography

Abhilash A. Miranda*, Tarik A. Chowdhury, Ovidiu Ghita, and Paul F. Whelan

Vision Systems Group, Dublin City University, Ireland
[email protected]
Abstract. In this paper, we treat the problem of reducing the false positives (FP) in the automatic detection of colorectal polyps at Computer Aided Detection in Computed Tomography Colonography (CAD-CTC) as a shape-filtering task. From the extracted candidate surface, we obtain a reliable shape distribution function, analyse it in the Fourier domain, and use the resulting spectral data to classify the candidate surface as belonging to a polyp or a non-polyp class. The developed shape filtering scheme is computationally efficient (it takes approximately 2 seconds per dataset to detect the polyps from the colonic surface) and offers robust polyp detection with an overall false positive rate of 5.44 per dataset at a sensitivity of 100% for polyps greater than 10mm when applied to standard and low dose CT data.
1 Introduction
According to the World Health Report [1], colorectal cancer caused approximately 622,000 deaths in 2002. It is understood that the risk of colon cancer could be reduced considerably if growths in the colon called polyps are removed before they become cancerous. CAD-CTC is being developed as a reliable technique for the early detection of premalignant polyps, useful in mass screening of at-risk individuals. It involves the analysis of images of the colon obtained using a Computed Tomography (CT) examination of the abdominal region. Computer vision is used for the automatic detection of potential polyps by analyzing the three-dimensional model of the colon obtained by combining the two-dimensional axial images from the CT examination. Many methods developed for CAD-CTC suffer from severe false-positive (FP) findings owing to the difficulty of correctly distinguishing polyps from other colonic shapes such as folds, residual materials, etc. In this paper, we propose a novel method to accurately discriminate polyp shapes from other colonic shapes (folds) by analysing a reliable shape distribution function (SDF) in the frequency domain. After a detailed analysis of the SDF in the frequency domain, we developed a novel technique that addresses the polyp−non-polyp classification as a shape filtering problem. In order to evaluate
* Corresponding author.
the validity of the proposed technique, we have analysed its performance when applied to a large number of synthetic (scanned with standard and low radiation dose) and real CT datasets (scanned at a standard radiation dose). This paper is organised as follows. In section 3, the method designed to extract the candidate surfaces from the CT data is briefly explained. In section 4, we introduce the SDF of an extracted colonic surface and discuss its properties. The next two sections discuss the properties of the polyp and non-polyp SDFs and explain the motivation for analysing the SDF in the frequency domain in terms of its power spectral density. In section 5.1, we detail the implementation of the novel shape filtering scheme developed for polyp−non-polyp classification. In the concluding section, we present the results obtained on the large number of real patient datasets on which the shape filtering technique was tested.
2 Materials and Method
In order to perform bowel cleansing, the patients were instructed to follow a low-residue diet for 48 hours, followed by the ingestion of clear fluids for 24 hours. Before the examination, the colon is gently insufflated with room air at the maximum level tolerated by the patient, and the CT data acquisition was performed on a Siemens Somatom 4-slice multi-detector spiral CT scanner by the radiologists at the Mater Hospital, Dublin, Ireland. The scanning parameters used are 120kVp, 100mAs tube current, 2.5mm collimation, 3mm slice thickness, 1.5mm reconstruction interval, and 0.5s gantry rotation. The scanning is performed in a single breath-hold and the acquisition time is in the interval of 20 to 30 seconds depending on the body size of the patient. The scanning is performed first with the patient in the supine position and then repeated with the patient in the prone position. Typically the number of slices varies from 200 to 350 and the total size of the volumetric CT data is approximately 150MB.
3 Candidate Surface Extraction
The candidate surface extraction method consists of two main stages. In the first stage, the colonic wall is identified in the CT data based on the high contrast between the gas insufflated regions and the colon tissue. In this way, the gaseous region can be successfully identified by using a simple region growing algorithm, where the colonic wall is defined by the voxels that are adjacent to the colon tissue. The threshold value for the region growing algorithm was set to −800 HU [2] to ensure that only the voxels inside the colon (gas filled volume) are segmented. The second stage of the algorithm attempts to identify the convex surfaces of the colon wall by analysing the intersections of the normal vectors, which are calculated using the Zucker-Hummel operator [3]. The normal intersections are recorded for a number of Hough points that are uniformly distributed from 2.5 mm to 10 mm in the normal direction for each voxel of the colon wall. The normal intersections are recorded into a 3D histogram and the candidate surfaces are generated by the Hough points that have more than 5 intersections in the 3D histogram. In order to
eliminate the voxels associated with spurious non-convex surfaces, an additional convexity test is performed [2],[4]. During the candidate surface processing, the centre-point and the radius of each candidate cluster are calculated by finding the sum of weighted Gaussian distances dG for each point of the surface (full details about the implementation of this algorithm can be found in [2]).
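The first stage can be sketched as a plain flood fill; the fragment below keeps 6-connected voxels below the −800 HU threshold, starting from a seed assumed to lie inside the insufflated lumen. It is a minimal illustration, not the implementation of [2].

```python
import numpy as np
from collections import deque

def grow_colon_lumen(ct, seed, threshold=-800):
    """Flood-fill the gas-filled lumen from a seed voxel. The colonic wall
    can then be read off as segmented voxels that touch an unsegmented
    (tissue) neighbour."""
    mask = np.zeros(ct.shape, dtype=bool)
    queue = deque([seed])
    mask[seed] = True
    steps = [(1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1)]
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in steps:
            q = (z + dz, y + dy, x + dx)
            if all(0 <= qi < si for qi, si in zip(q, ct.shape)) \
                    and not mask[q] and ct[q] < threshold:
                mask[q] = True
                queue.append(q)
    return mask
```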
4 Shape Distribution Function
Robust polyp detection is very difficult to achieve since polyps and other convex colonic structures such as folds exhibit a large number of shapes and sizes. Typically the polyps can be assumed to have a spherical shape, whereas the nominal model for folds is cylindrical. Based on this geometrical characterization, many researchers attempted to perform the polyp/fold classification using geometrical features that are extracted from candidate surfaces [5][6][7]. Unfortunately, very often the polyp and fold surfaces present very subtle shape differences, as illustrated in Fig. 1, and as a result the geometrical approaches return high sensitivity in polyp detection but at the expense of an increased number of false positives.
Fig. 1. Images illustrating some examples of the extracted surfaces for polyps (first row) and folds (second row)
In the next section we detail the shape distribution function and describe the properties that make it suitable for robust polyp−non-polyp classification.

4.1 Definition and Properties
For a given candidate colonic surface, the computation of the shape distribution function that we use in this paper can be explained as follows. Given the coordinates of the n voxels defining the candidate surface U and the Gaussian center v of the voxels [4] associated with the candidate surface, we find the n−length array w which lists the Euclidean distances between each of the n surface voxels and the Gaussian center v of the candidate surface U. We call the histogram x of the entries in w the shape distribution function (SDF) of the candidate colonic surface U; it is a distribution function of the relative distances of the surface voxels of the candidate colonic surface.
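A direct transcription of this definition is short; the bin count below is an illustrative choice, as the text does not fix the histogram resolution.

```python
import numpy as np

def shape_distribution(U, v, n_bins=64):
    """Sketch of the SDF: histogram of Euclidean distances between the n
    surface voxels U (an n x 3 array of coordinates) and the Gaussian
    centre v of the candidate surface."""
    w = np.linalg.norm(U - v, axis=1)   # n distances to the centre
    x, _ = np.histogram(w, bins=n_bins)
    return x                            # the SDF time-series analysed below
```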
The SDF x is simple in definition and computationally efficient. Also, x is robust in characterising the shape of the candidate surface when the voxel data is sparse or affected by noise. Since x is invariant to rotation and translation of the surface, it has the very important advantage that it depends only on the colonic shape and is independent of the colonic orientation. The SDF used here is similar to the shape distribution function D1 used for object recognition [8]. Another important benefit of using the SDF is the significant reduction in dimensionality that is achieved by transforming 3D data (U) to a 1D time-series (x) without losing interesting shape cues. This dimensionality reduction, as discussed later, reduces the complexity of the analysis, thus making it suitable for real-time operation.
5 Analysis of the SDFs Using the Autocorrelation Functions
Some of the observed general characteristics and behaviours of the 3−neighbour averaged sample SDFs of polyps and non-polyps illustrated in Fig. 2 are as follows. The SDF of a polyp is, in general, a smooth curve having a single, global maximum. Conversely, a non-polyp SDF is noisy and might have single or multiple maxima, which might be similar to those of polyps, but whose location or variance in the neighbourhood of the maxima is random.
Fig. 2. SDFs of polyp and non-polyp surfaces from real datasets show significant differences in smoothness and maxima characteristics
We now briefly describe the observed characteristics of the SDF and evaluate the need to analyse the autocorrelation functions of the SDF for obtaining the information that differentiates polyp surfaces from non-polyp surfaces. Although the SDFs for polyp and fold surfaces depicted in Fig. 2 offer the primary discrimination between these surfaces, there is no robust, directly recognisable shape of the SDF curve for either of the classes. As a result, we cannot apply a straightforward SDF shape matching technique to classify polyps from non-polyps. In addition, since there are multiple maxima with unpredictable variances in their neighbourhoods for a non-polyp SDF, techniques based on probability density function fitting or variance-based feature extraction will be less efficient in finding optimal patterns to classify non-polyp surfaces correctly.
Since the polyp and non-polyp SDFs are generated by unknown probability distributions and appear to be corrupted by noise, there is an uncertainty in the location and variance of the maxima in the SDFs. Hence, it is necessary to compare each SDF time-series x with a delayed version of itself and others. This can be done by analysing the autocorrelation functions Rxx of the SDFs. Moreover, in order to study the effects of noisy frequency components on the variance (power) of the SDF, it is necessary to see the contribution of the various frequency components of x to the total variance. According to the Wiener-Khinchin theorem, the power spectral density Sxx of a signal x is the Fourier transform of Rxx. In practice, for computational purposes, Sxx is calculated using Fast Fourier Transform methods.

5.1 Power Spectral Density of SDFs
In order to understand the distribution of the variance of the SDFs in the frequency domain, we analyse the power spectral density of the SDFs, on which we make certain interesting observations which help us design a robust shape filtering scheme, as explained in this section. Let x = x0, . . . , xM−1 be the time-series corresponding to an SDF. The samples of its discrete Fourier transform are given by

\[ X_l = \sum_{k=0}^{M-1} x_k\, e^{-i 2\pi k l / M} \]

so that the power spectral density of the SDF is given by Sxx_l = X_l X_l^*, the product of X_l and its conjugate, normalised in accordance with the desired resolution of X_l. We used a 128-point FFT to compute the power spectral density Sxx of the SDFs, and we noticed that the Sxx of the fold surfaces attenuate much more quickly than those associated with polyps. Our experiments indicate that the frequency fdB at which Sxx reaches 12.5% of its power at the DC frequency offers a robust discrimination between the polyp and fold surfaces, as illustrated in Fig. 3.
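A sketch of the resulting spectral feature is given below: a 128-point FFT of the SDF, the power spectrum as X_l X_l^*, and the first bin at which the spectrum drops to 12.5% of its DC power. The normalisation convention and the handling of an SDF that never crosses the threshold are assumptions; the value is returned in FFT bins, with the mapping to Hz depending on the sampling convention for the distance bins.

```python
import numpy as np

def f_dB(x, fraction=0.125, n_fft=128):
    """Power spectral density of the SDF x and the frequency bin at which
    it falls to `fraction` of its DC power (the f_dB feature of the text)."""
    X = np.fft.fft(x, n=n_fft)
    Sxx = (X * np.conj(X)).real / n_fft        # normalisation is a convention choice
    half = Sxx[:n_fft // 2]                    # one-sided spectrum
    below = np.nonzero(half <= fraction * half[0])[0]
    return below[0] if below.size else n_fft // 2
```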
Polyp Power Spectral Density S
xx
xx
2000
Sxx in (num of voxels/Hz)
2
400
300
1500
200
1000
100
500
0
0 0
5 10 Frequency in Hz
15
0
5 10 Frequency in Hz
15
Fig. 3. Sxx for the shape distribution functions used in Fig. 2 shows faster attenuation for non-polyps than for polyps
Thus the classification criterion can be defined as follows: an SDF belongs to the polyp class if its fdB is higher than an experimentally selected threshold frequency fth. The threshold-based rejection rule mentioned above works very well for small and medium sized polyps, but it rejects very large polyps whose SDFs are not
very different from those associated with folds. Based on observations on an experimental set of samples, we find that for certain large polyps fdB lies in an uncertain classification region fmin < fdB < fmax. References [4] and [2] have shown that the sum of the weighted Gaussian distances dG of the voxels that define the candidate surface offers robust discrimination between the surfaces associated with large polyps and those derived from folds. Based on experimentation with various sizes of polyps having weighted sums of Gaussian distances d1 and d2, corresponding to fmin and fmax, we devised a technique to compute the classification threshold frequency fth adaptively according to the following linear function:

\[ f_{th} = f_{max} + (d_G - d_2)\,\frac{f_{min} - f_{max}}{d_1 - d_2}. \]

In this way, an SDF belongs to a polyp surface if its fdB is higher than the computed threshold frequency fth; otherwise it is classed as a fold surface.
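The complete decision rule then reduces to a few lines; fmin, fmax, d1 and d2 stand for the experimentally chosen constants of the text and are left as parameters here.

```python
def classify_candidate(fdB, dG, f_min, f_max, d1, d2):
    """Adaptive rule sketched from the text: interpolate the threshold
    frequency between (d1, f_min) and (d2, f_max) using the weighted
    Gaussian distance sum dG, then call the surface a polyp when fdB
    exceeds it."""
    f_th = f_max + (dG - d2) * (f_min - f_max) / (d1 - d2)
    return 'polyp' if fdB > f_th else 'fold'
```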
6 Experiments and Results
In this section we discuss the tests that we carried out using the shape filtering scheme and present the results obtained, which we compare with other established CAD-CTC techniques. The shape filtering CAD-CTC technique was tested on 61 real datasets obtained as part of the clinical study in conjunction with the Mater Hospital, Dublin, Ireland. The shape filtering technique was also tested on custom-built phantom datasets [9] which contain a large number of synthetic structures that accurately replicate the real polyp shapes encountered in clinical studies. One of the main concerns associated with CTC as a mass screening technique for colorectal polyp detection is the patient exposure to ionising radiation. A reduction in the radiation dose will increase the level of noise in the CT data, and our aim is to fully evaluate the impact of the radiation dose on the performance of our CAD-CTC system. In this regard, we have used in our experiments a synthetic phantom that contains 47 synthetic polyps with different shapes (sessile, pedunculated and flat) and sizes (3 to 18mm). In our experiments we used the phantom CT data that was obtained for the following radiation doses: 100mAs, 60mAs, 20mAs, and 13mAs. The results of the shape filtering CAD-CTC for the phantom data [9] are summarised in Tables 1 to 4. The experimental results indicate that the radiation dose does not have a significant impact on the performance of our polyp detection method, since the sensitivity for clinically relevant polyps (larger than 10mm) is not affected. A small reduction in sensitivity for polyps in the range 5-10mm was noticed when the data was scanned with a 20mAs radiation dose. Table 5 shows the performance of the shape filtering technique when applied to 61 real patient datasets. These results were validated by our clinical partners from the Mater Hospital, Dublin, Ireland and indicate that the technique achieved 100% sensitivity for the detection of polyps larger than 10mm, which are the most important features to be examined in clinical studies. The experimental data also show that there is a high true positive rate for polyps with sizes ranging between 5mm and 10mm, where a sensitivity of 81.25% is achieved in spite of some polyps
Table 1. Phantom Data (100 mAs)

Polyp Type   Total Polyps   True Positives   Sensitivity %
≥ 10 mm          14              14              100
[5, 10) mm       19              19              100
< 5 mm            5               4               80
Flat              9               2              22.22
Total            47              39              83.97
Phantom Dataset (100 mAs): FP = 1

Table 2. Phantom Data (60 mAs)

Polyp Type   Total Polyps   True Positives   Sensitivity %
≥ 10 mm          14              14              100
[5, 10) mm       19              19              100
< 5 mm            5               4               80
Flat              9               3              33.33
Total            47              40              85.11
Phantom Dataset (60 mAs): FP = 1

Table 3. Phantom Data (20 mAs)

Polyp Type   Total Polyps   True Positives   Sensitivity %
≥ 10 mm          14              14              100
[5, 10) mm       19              17              89.47
< 5 mm            5               3               60
Flat              9               2              22.22
Total            47              36              76.60
Phantom Dataset (20 mAs): FP = 0

Table 4. Phantom Data (13 mAs)

Polyp Type   Total Polyps   True Positives   Sensitivity %
≥ 10 mm          14              12              85.71
[5, 10) mm       19              18              94.74
< 5 mm            5               3               60
Flat              9               1              11.11
Total            47              34              72.34
Phantom Dataset (13 mAs): FP = 1

Table 5. Real Datasets (100 mAs)

Polyp Type   Total Polyps   True Positives   Sensitivity %
≥ 10 mm          10              10              100
[5, 10) mm       32              26              81.25
< 5 mm          104              62              59.62
Mass             11               7              63.64
Flat              2               1               50
Total           159             106              66.89
FP / Real Dataset = 5.44
having very complex shapes. The false positives returned by our shape filtering method stand at an average of 5.44 per dataset, which compares well with the techniques examined in the next section. The method detailed in this paper was not primarily developed to detect flat polyps, since these polyps have many geometrical characteristics similar to those of folds. High sensitivity for flat polyp detection cannot be achieved by methods that attempt to discriminate polyp and fold surfaces based on global shape models, and flat polyps should be approached as a distinct category of polyps. The average surface extraction time per dataset is approximately 80 seconds on a machine with a Pentium-IV 3 GHz processor and 1 GB of memory. The computation of the power spectral density Sxx and the shape-filtering process takes on average 2 seconds per dataset when executed on the same machine.
6.1 Comparison with Established Techniques
In order to evaluate the validity of our shape filtering CAD-CTC technique, we compare its performance with the polyp detection results obtained by other relevant documented techniques. Though they were not tested on the same datasets, the reported sensitivity and false positives per dataset (FP) should be indicative of their relative robustness in polyp detection. For polyps larger than 10 mm, the CAD-CTC techniques evaluated in this section present the following results. The method developed by Summers et al. [5], which uses surface curvature information, gives a sensitivity of 29%–100% with a varying FP rate (6–20) depending on the scanning parameters. The method proposed by Vining et al. [6], based on surface extraction and curvature analysis, recorded a 73% sensitivity with a very inconsistent FP rate in the range 9–90. Yoshida et al. [7] used various shape indices (cup, rut, saddle, ridge, cap) and curvedness values on small volumes of interest in conjunction with fuzzy clustering. They report a sensitivity of 89% with an FP of 2.0, which increased by a factor of 1.5 when sensitivity was increased to 100%. For polyps larger than 5 mm, the technique developed by Kiss et al. [4], based on surface normal distribution and sphere fitting, achieves a sensitivity of 90% with an FP of 2.82. It is useful to note that this performance was obtained when they applied their method to high resolution CT data (0.8 mm reconstruction interval). We have evaluated the performance of our shape filtering technique on CT data with a lower resolution (1.5 mm reconstruction interval), where the problems caused by partial volume effects are more pronounced. The SDF-based shape filtering technique presented in this paper achieved 100% sensitivity for polyps greater than 10 mm and 81.25% sensitivity for polyp sizes between 5 and 10 mm, with an overall FP of 5.44. This compares well with the performance returned by the CAD-CTC systems evaluated in this section.
7 Conclusion
We have designed and implemented a simple, stable and robust CAD-CTC technique which attains better results than other published CAD-CTC techniques. Currently, the false positive rate stands at an average of 5.44 FPs per dataset; this can be further reduced by fine-tuning the technique to increase the detection rate for flat polyps, the category for which the developed shape filtering technique achieved its lowest sensitivity. By carrying out the 3D shape recognition in the Fourier domain, we were able to simplify the problem with significant reductions in processing time. We envisage further extensions to the shape-distribution-based shape recognition technique to further improve the performance in polyp detection for small and medium polyps.
Acknowledgments

We acknowledge the contributions of our clinical partners in this project: Dr. Helen Fenlon (Dept. of Radiology) and Dr. Padraic MacMathuna (Gastrointestinal Unit) of the Mater Hospital, Dublin. This work was supported under an Investigator Programme Grant (02/IN1/1056) by Science Foundation Ireland.
References

1. World Health Organisation: The World Health Report 2004. WHO (2004)
2. Chowdhury, T., Ghita, O., Whelan, P.: A Statistical Approach for Robust Polyp Detection in CT Colonography. Proceedings of the International Conference of the IEEE Engineering in Medicine and Biology Society 27 (2005)
3. Zucker, S.W., Hummel, R.A.: A Three-Dimensional Edge Operator. IEEE Transactions on Pattern Analysis and Machine Intelligence 3 (1981) 324–331
4. Kiss, G., Cleynenbreugel, J., Thomeer, M., Suetens, P., Marchal, G.: Computer Aided Diagnosis for Virtual Colonography. MICCAI (2001) 621–628
5. Summers, R., Johnson, C., Pusanik, L., Malley, J., Youssef, A., Reed, J.: Automated Polyp Detection at CT Colonography: Feasibility Assessment in a Human Population. Radiology 216 (2000) 284–290
6. Vining, D., Hunt, G., Ahn, D., Steltes, D., Helmer, P.: Computer-Assisted Detection of Colon Polyps and Masses. Radiology 219 (2001) 51–59
7. Yoshida, H., Masutani, Y., MacEneaney, P., Rubin, D., Dachman, A.: Computerized Detection of Colonic Polyps at CT Colonography on the Basis of Volumetric Features: Pilot Study. Radiology 222 (2002) 327–336
8. Osada, R., Funkhouser, T., Dobkin, D.: Matching 3D Models with Shape Distributions. International Conference on Shape Modeling and Applications (2001) 154–166
9. Chowdhury, T., Sadleir, R., Whelan, P., Moss, A., Varden, J., Short, M., Fenlon, H., MacMathuna, P.: The Impact of Radiation Dose on Imaged Polyp Characteristics at CT Colonography: Experiments with a Synthetic Phantom. Technical report, Association of Physical Scientists in Medicine, Annual Scientific Meeting, Dublin (2004)
Evaluation of Texture Features for Analysis of Ovarian Follicular Development

Na Bian1, Mark G. Eramian1, and Roger A. Pierson2

1 Department of Computer Science, University of Saskatchewan
[email protected], [email protected]
2 Department of Obstetrics, Gynecology and Reproductive Sciences, University of Saskatchewan
[email protected]
Abstract. We examined the echotexture in ultrasonographic images of the wall of dominant ovulatory follicles in women during natural menstrual cycles and dominant anovulatory follicles which developed in women using oral contraceptives (OC). Ovarian follicles in women are fluid-filled structures in the ovary that contain oocytes (eggs). Dominant follicles are physiologically selected for preferential development and ovulation. Statistically significant differences between the two classes of follicles were observed for two co-occurrence matrix derived texture features and two edge-frequency based texture features which allowed accurate distinction of healthy and atretic follicles of similar diameters. Trend analysis revealed consistent turning points in time series of texture features between 3 and 4 days prior to ovulation coinciding with the time at which follicles are being biologically “prepared” for ovulation.
1 Introduction
Ovarian follicles are fluid-filled structures in human ovaries which contain the oocytes (eggs). Early studies of follicular activity have relied on measurement of reproductively active hormones: luteinizing hormone (LH), follicle-stimulating hormone (FSH), estradiol (E2), and progesterone [6, 11]. Recent studies [1, 3, 5] have used high resolution transvaginal ultrasonography to visualize follicular activity. Ultrasonography is non-invasive and permits sequential analyses, therefore broadening research approaches to the study of folliculogenesis. A normal dominant follicle is destined to ovulate and produces increasing amounts of E2 as it develops. In the final phase of growth, a rapid increase in E2 levels triggers an acute surge of LH that causes ovulation approximately 38 h later. Subsequently, the collapsed follicle undergoes structural and functional transformation to become the corpus luteum. Peak E2 concentrations occur on the day of, or one day prior to, ovulation [1]. Exploration of follicular development during oral contraceptive (OC) use is a current area of interest. Dominant follicles grow to pre-ovulatory sizes under the influence of OC but most do not ovulate [3]. It is difficult to visually differentiate ovulatory follicles in natural menstrual cycles from anovulatory follicles in OC users with ultrasonography, although the hormonal milieu is quite different.
Fig. 1. Selection of follicle wall. (a) Original image, (b) isocontours of (a), (c) pixels with intensity between selected contours H and L, (d) homogeneous follicle wall region segmented from (c) using EDISON [7].
The purpose of our study was to identify texture features which can be used to 1) distinguish between dominant follicles in women during natural cycles and women using OC, and 2) determine when during their development these two classes of follicles begin to exhibit differences.
2 Materials and Methods
Our data set consisted of 15 temporal series of ultrasonographic images of dominant follicles acquired on a semi-daily schedule. Eight series were from women with natural menstrual cycles [1] and seven were from women using OC [2]. Each series from natural cycles contained six to eight images beginning 7 days prior to ovulation and ending on the day of ovulation. Each series from OC cycles consisted of 3 to 8 images beginning 7 days before the E2 peak. The OC series were aligned with the natural cycle series by synchronizing E2 peaks with the days of ovulation. All images were acquired using the same equipment and techniques [1, 3]. In total, there were 57 images in the natural cycle data set and 34 images in the OC cycle data set. All images were 640 × 480 pixel arrays with a dynamic range of 256 gray levels. The fluid-filled interior of follicles exhibited echo-responses close to the level of background noise (the dark area in Fig. 1(a)). Therefore we attempted to measure differences in the echotexture of the follicle walls. The follicle wall is a 1–2 mm thick area surrounding the fluid-filled interior. Our method consisted of preprocessing stages, which included selection of the follicle wall regions and greylevel adjustment, followed by computation of features and their analysis.
3 Preprocessing

3.1 Follicle Wall Selection
Manual and automatic contouring were used to select follicle wall regions. Isocontours were computed, and the contours corresponding to the approximate inner and outer edges of the follicle wall were manually selected (e.g. contours H and L in Figure 1(b)). The greylevels of the contours were used as upper and lower thresholds for semi-band-thresholding of the original image (Figure 1(c)). Figure 1(d) was obtained from the image in Figure 1(c) using the EDISON system v1.1 (Edge Detection and Image SegmentatiON) [7], which used user-defined thresholds to segment the original image into regions that are homogeneous with respect to mean intensity. The thresholds used to segment each image were manually selected and differed from image to image. The resulting follicle wall images had black backgrounds while the gray levels of the follicle walls remained unchanged. The regions obtained were not always contiguous because portions of the follicle wall become very thin at the apex as the interval to ovulation decreases. This does not present a problem when computing features, as one simply excludes zero-intensity pixels.
3.2 Gray Level Adjustment
We subtracted from each pixel the mean intensity of a 3 × 3 square neighborhood about the pixel, following the preprocessing procedure from [4]. Though the authors of [4] used an 11 × 11 neighborhood, we used 3 × 3 since our regions of interest were only a few pixels wide. Background pixels were not considered when they appeared within the neighborhood of a pixel and did not contribute to the average. The resulting pixel values were shifted into the interval [0, M ] using an additive constant and then normalized into the interval [0, 255].
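A direct, if unoptimized, Python rendering of this adjustment may make the background handling explicit; we assume background pixels are encoded as zero, as in the wall images above, and the loop form is for clarity only.

```python
import numpy as np

def adjust_gray_levels(img):
    """Subtract the local 3x3 mean (background excluded) and renormalize.

    img : 2D uint8 array; background pixels are exactly zero.
    """
    out = np.zeros(img.shape, dtype=float)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            if img[i, j] == 0:               # background stays background
                continue
            patch = img[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2]
            vals = patch[patch > 0]          # background does not enter the mean
            out[i, j] = float(img[i, j]) - vals.mean()
    fg = img > 0
    out[fg] -= out[fg].min()                 # shift into [0, M]
    rng = out[fg].max()
    if rng > 0:
        out[fg] *= 255.0 / rng               # normalize into [0, 255]
    return out
```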
4 Extraction of Texture Features
We chose to investigate only directionally invariant features since it is not anatomically possible to monitor each patient in the same orientation consistently. Eight texture features, described in the following subsections, were computed for each follicle wall image.
4.1 Co-occurrence Features
Haralick’s Graylevel co-occurrence matrices (GLCM) [8] describe the joint graylevel histogram of the image. Homogeneity, correlation, contrast and energy were chosen from the 14 features that Haralick defined. A displacement distance of one pixel was selected since our regions of interest were, in some places, only a few pixels wide. The four symmetric co-occurrence matrices for the 0◦ , 45◦ , 90◦ , and 135◦ directions were computed. Prior to normalizing the matrices, the first row and column of each matrix were replaced with zeros to eliminate cooccurrences between foreground and background pixels. The resulting matrices were normalized and the four texture features computed from each matrix. For each feature, the four values from each direction were averaged to obtain four rotationally invariant features for each image.
4.2 Edge Frequency Features
We computed edge density and edge contrast as representatives of texture features based on edge frequency measures [14]. Edge contrast is the average edge magnitude of a region, where edge magnitudes were calculated using a 3 × 3 Laplacian operator. Edge density is the proportion of pixels with "strong" gradient magnitudes. We approximated the gradient as a function of distance d. Let M and N be the pixel dimensions of the image I. The gradient magnitude image g_d of I(i, j) given distance d is

g_d(i, j) = |I(i, j) − I(i + d, j)| + |I(i, j) − I(i − d, j)| + |I(i, j) − I(i, j + d)| + |I(i, j) − I(i, j − d)|.   (1)

The edge density texture descriptor g(d) is the mean of g_d(i, j). Our analysis proceeds with d = 1. This scheme avoided the problem of selecting appropriate thresholds for measuring the proportion of pixels with "strong" gradient magnitude.
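Equation (1) vectorizes naturally; below is a NumPy sketch (our own code) that evaluates g_d on the interior of the image and averages it over the non-zero wall pixels.

```python
import numpy as np

def gradient_magnitude(img, d=1):
    """Per-pixel gradient magnitude g_d of equation (1); img is 2D float.
    Border pixels lacking a shifted neighbour are left at zero."""
    g = np.zeros(img.shape, dtype=float)
    c = img[d:-d, d:-d]                                   # I(i, j)
    g[d:-d, d:-d] = (np.abs(c - img[2 * d:, d:-d]) +      # I(i + d, j)
                     np.abs(c - img[:-2 * d, d:-d]) +     # I(i - d, j)
                     np.abs(c - img[d:-d, 2 * d:]) +      # I(i, j + d)
                     np.abs(c - img[d:-d, :-2 * d]))      # I(i, j - d)
    return g

def edge_density(img, d=1):
    """g(d): mean of g_d over the follicle wall (non-zero) pixels."""
    return gradient_magnitude(img, d)[img > 0].mean()
```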
Laws’ Measures
Laws’ measures compute a set of energy statistics for each pixel in a texture image from which global texture features are derived. We computed the features E5L5, L5E5, E5S5 and S5E5, as denoted by Laws [10]. To obtain rotationally invariant features the symmetric energy measures were combined, via averaging, to obtain two features: E5L5/L5E5 and E5S5/S5E5.
5 Selection of Discriminator Features
A series of experiments was carried out to determine whether any of our candidate features were good discriminators of the two classes of follicles.
Fig. 2. Co-occurrence features of follicle wall images: (a) energy, (b) homogeneity. Anov1 to Anov7 were from the OC cycle group, Ov1 to Ov8 were from the natural cycle group.
Figures 2 and 3 represent computed texture feature values for each of our 15 image series. The horizontal and vertical axes, respectively, denote cycle day
Fig. 3. Edge frequency features of follicle wall images: (a) edge density, (b) edge contrast. Anov1 to Anov7 were from the OC cycle group, Ov1 to Ov8 were from the natural cycle group.
(where day 0 is the day of ovulation/E2 peak) and texture feature value. Figures 2(a) and 2(b) show that the images could be separated into two categories in the GLCM energy and homogeneity feature spaces, since the feature values from the different classes form distinct populations (p < 0.0001). This indicates that these features are likely to be powerful class discriminators. Similarly, Figure 3 shows the computed edge contrast and edge density feature values. Follicle wall images from the natural menstrual cycles had consistently higher edge density and edge contrast (p < 0.001), indicating that these are also discriminating features. The GLCM contrast and correlation and the E5S5/S5E5 and E5L5/L5E5 measures exhibited considerable overlap between the two populations (p < 0.04) and were considered inferior class discriminators compared to the others.
6 Classification Using a Single Texture Feature
We investigated the ability of each feature to classify our image series as belonging to either a natural cycle or an OC cycle. For each series and texture feature, a vector was produced consisting of the measured values of that feature on each day. All feature vectors were of size 8, where missing data due to the lack of an image on a particular day in a series was generated by linear interpolation of neighboring samples. For each feature, six randomly chosen vectors (3 normal and 3 OC) were used as training data. The remaining vectors were used for testing. To classify the test set, we fit a multivariate normal density to each data set such that the difference of the means of the two groups was maximized, and each observation in the testing samples was assigned to its nearest group [12]. MATLAB 7's classify() function was used to perform the classification; rates are shown in Table 1. Edge density, edge contrast, GLCM homogeneity and GLCM energy were able to correctly classify the test set with 100% accuracy. Both Laws' energy statistics had a classification rate of 77.8%, while GLCM contrast and GLCM correlation achieved rates of 66.7% and 44.4%, respectively.
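A rough Python equivalent of this procedure (our own sketch: linear interpolation of missing days, then a pooled-covariance normal classifier comparable to the default linear rule of MATLAB's classify(), here via scikit-learn's LinearDiscriminantAnalysis):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def fill_missing(days, values, all_days=np.arange(-7, 1)):
    """Linearly interpolate days with no image, giving a length-8 vector."""
    return np.interp(all_days, days, values)

def classify_series(train_X, train_y, test_X):
    """Fit one multivariate normal per group (pooled covariance) and
    assign each test series to the nearest group."""
    lda = LinearDiscriminantAnalysis()
    lda.fit(train_X, train_y)     # train_X: (6, 8), e.g. 3 natural + 3 OC series
    return lda.predict(test_X)
```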
Table 1. Single-feature classification rate of ovulatory and anovulatory follicles

Feature            % Correct    Feature              % Correct
Edge Density       100%         GLCM Energy          100%
Edge Contrast      100%         GLCM Homogeneity     100%
Laws' E5S5/S5E5    77.8%        GLCM Contrast        66.7%
Laws' E5L5/L5E5    77.8%        GLCM Correlation     44.4%
Fig. 4. Turning points for (a) edge density, (b) edge contrast, (c) GLCM energy, and (d) GLCM homogeneity of ovulatory follicle images
7 Trend Analysis Using the Mann-Kendall Method
The Mann-Kendall method was used to locate changes in the trends (upward or downward) of feature values over time. The method used by Sneyers [13] and Johnson et al. [9] was followed, which estimates the single turning point (change of trend) of a series:

1. Given a sequence of feature values f_t, t = 1, 2, ..., l, l ≤ N, the test statistic is d_l = \sum_{i<j} sign(f_j − f_i), where sign(x) = 1 if x is positive, and 0 otherwise.
2. Let E(d_l) and Var(d_l) be the expected value and variance of d_l, l = 1, 2, ..., N. Calculate the N statistics U(d_l), where

   U(d_l) = (d_l − E(d_l)) / \sqrt{Var(d_l)}.   (2)

3. Repeat steps 1 and 2 on the original sequence in reverse order to obtain U_new(d_l). Let U*(d_l) = −U_new(d_{N−l+1}). Plot U*(d_l) and U(d_l) on the same axes.
4. If |U| ≤ 1.96 at the time of intersection, x, of the two plots, then Mann's test supports the hypothesis of a turning point at time x with 95% confidence.
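The procedure translates into a short Python sketch; the closed-form moments E(d_l) = l(l−1)/4 and Var(d_l) = l(l−1)(2l+5)/72 used below are the standard sequential Mann-Kendall expressions and are our assumption, since the paper does not state them explicitly.

```python
import numpy as np

def sequential_mk(f):
    """Progressive statistics U(d_l) of the sequential Mann-Kendall test."""
    n = len(f)
    U = np.zeros(n)
    for l in range(1, n + 1):
        x = f[:l]
        # d_l counts the pairs (i < j <= l) with f_j > f_i
        d = sum(np.sum(x[:j] < x[j]) for j in range(l))
        E = l * (l - 1) / 4.0                  # expected value of d_l
        V = l * (l - 1) * (2 * l + 5) / 72.0   # variance of d_l
        U[l - 1] = (d - E) / np.sqrt(V) if V > 0 else 0.0
    return U

def turning_point(f):
    """Index of the crossing of U and U* with |U| <= 1.96, if any."""
    f = np.asarray(f, dtype=float)
    U = sequential_mk(f)
    Ustar = -sequential_mk(f[::-1])[::-1]      # retrograde series U*(d_l)
    diff = U - Ustar
    for t in range(len(f) - 1):
        if diff[t] == 0 or diff[t] * diff[t + 1] < 0:   # sign change = crossing
            if abs(U[t]) <= 1.96:
                return t
    return None
```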
Fig. 5. Turning points for (a) edge density, (b) edge contrast, (c) GLCM energy, and (d) GLCM homogeneity of anovulatory follicle images
The application of the Mann-Kendall method to the natural cycle group and the OC cycle group is depicted in Figures 4 and 5, respectively. The solid curves represent the average feature value on each day over all women in the respective group. The dotted curves are the statistics U and U*, respectively. On each plot, there is one point of intersection of U and U* such that |U| ≤ 1.96, namely, the turning point. The observation of these turning points was interpreted as an indication of a physiologic change in follicle walls associated with impending ovulation, occurring 3 or 4 days before ovulation or the E2 peak, respectively.
8 Conclusion and Discussion
We discovered that it is possible to distinguish dominant follicles during natural cycles from those in women using OC through in vivo ultrasonography. GLCM energy, GLCM homogeneity, edge density and edge contrast texture measures of the follicle walls of anovulatory dominant follicles during OC cycles were significantly different from those of ovulatory dominant follicles from women in natural menstrual cycles. Moreover, all four features were able to classify images as belonging to either normal or OC-user series with 100% accuracy. The lower homogeneity observed in ovulatory follicles was attributed to increased vascularity and interstitial edema as ovulation approached. The lower edge density and contrast values observed in anovulatory follicles indicated coarser textures on the follicle walls. This was attributed to failure to develop interstitial edema and peripheral vascularity. Turning points for edge density, edge contrast, GLCM homogeneity and GLCM energy were found around the 3rd or 4th day before ovulation for ovulatory follicles and around the 3rd or 4th day before peak E2 for anovulatory follicles. This
is an exciting observation as it coincides with the time at which follicles are being biologically “prepared” for ovulation. This coincidence suggests that texture features of ultrasound images could be used to detect physiologic changes associated with impending ovulation which would be a useful aid to oocyte harvesting for in vitro fertilization. Future work will investigate the question of whether a similar approach could be used to identify other biological events such as physiologic selection of dominant follicles.
References

[1] A. R. Baerwald, G. P. Adams, and R. A. Pierson. Characterization of ovarian follicular wave dynamics in women. Biology of Reproduction, 69(1):1023–1031, 2003.
[2] A. R. Baerwald, O. A. Olatunbosun, and R. A. Pierson. Ovarian follicular waves are initiated during the hormone-free interval in women using oral contraception. Contraception, 70(5):371–377, 2004.
[3] A. R. Baerwald and R. A. Pierson. Ovarian follicular development during the use of oral contraception: A review. Journal of Obstetrics and Gynaecology Canada, 26(1):19–24, Jan 2004.
[4] O. Basset, Z. Sun, J. L. Mestas, and G. Gimenez. Texture analysis of ultrasonic images of the prostate by means of co-occurrence matrices. Ultrasonic Imaging, 15:218–237, 1993.
[5] M. Broome, J. Clayton, and K. Fotherby. Enlarged follicles in women using oral contraceptives. Contraception, 52:13–16, 1995.
[6] P. G. Crosignani, G. Testa, W. Vegetti, and F. Parazzini. Ovarian activity during regular oral contraceptive use. Contraception, 24(5):271–273, 1996.
[7] B. Georgescu and C. M. Christoudias. Code for the edge detection and image segmentation system. http://www.caip.rutgers.edu/riul/research/code/EDISON/index.html.
[8] R. M. Haralick and K. Shanmugam. Textural features for image classification. IEEE Transactions on Systems, Man, and Cybernetics, SMC-3(6):610–619, November 1973.
[9] N. L. Johnson, C. B. Read, and S. Kotz. Encyclopedia of Statistical Sciences. New York: Wiley, 1981.
[10] K. I. Laws. Texture energy measures. Proc. Image Understanding Workshop, pages 41–51, 1979.
[11] M. A. Renier, A. Vereechen, E. V. Herck, D. Straetmans, P. Ramaekers, J. Vanderheyden, H. Degezelle, and P. Buytaert. Dimeric inhibin serum values as markers of ovarian activity in pill-free intervals. Contraception, 57(1):45–48, January 1998.
[12] G. A. F. Seber. Multivariate Observations. Wiley Series in Probability and Mathematical Statistics. Wiley, 1984.
[13] R. Sneyers. On the statistical analysis of series of observations. Technical note, World Meteorological Organization, 1990.
[14] M. Sonka, V. Hlavac, and R. Boyle. Image Processing, Analysis and Machine Vision, chapter 14, pages 653–656. Brooks/Cole, a division of Thomson Asia Pte Ltd, USA, 2nd edition, 1998.
A Fast Method of Generating Pharmacokinetic Maps from Dynamic Contrast-Enhanced Images of the Breast

Anne L. Martel

Department of Medical Biophysics, University of Toronto, Canada
[email protected]

Abstract. A new approach to fitting pharmacokinetic models to DCE-MRI data is described. The method relies on fitting individual concentration curves to a small set of basis functions and then making use of a look-up table to relate the fitting coefficients to pre-calculated pharmacokinetic parameters. This is significantly faster than traditional non-linear fitting methods. Using simulated data and assuming a Tofts model, the accuracy of this direct approach is compared to the Levenberg-Marquardt algorithm. The effect of signal to noise ratio and the number of basis functions used on the accuracy is investigated. The basis fitting approach is slightly less accurate than the traditional non-linear least squares approach, but the ten-fold improvement in speed makes the new technique useful, as it can be used to generate pharmacokinetic maps in a clinically acceptable timeframe.
1 Introduction

MRI has been shown to be a powerful technique for the diagnosis and assessment of breast cancer. The use of dynamic contrast-enhanced MRI (DCE-MRI) as a screening modality is currently under investigation in several centres [1, 2], but although the negative predictive value of the technique approaches 100%, which is desirable for a screening technique, the positive predictive value is much lower than for conventional mammography [1, 3]. This indicates that many women with benign lesions are being referred for breast biopsies, and much effort is currently being made to improve the PPV and specificity of MR breast imaging. The MR images contain a great deal of information, and once morphological features and parameters characterizing the contrast kinetics have been extracted from the data, a computer classification scheme can be used to distinguish between malignant and benign lesions. Many investigators use simple methods to characterize the signal intensity versus time curves; a subjective scoring system which attempts to distinguish between curves with and without a wash-out phase has been described [4], and quantitative measures of uptake and wash-out ratios may also be used [5-7]. The main difficulty with these approaches is that the kinetic features extracted from the image data tend to vary between institutions due to differences in imaging equipment, protocol and analysis techniques. In principle pharmacokinetic modeling is less sensitive to variations in MRI acquisition protocols, and it has the additional advantage that parameters obtained in this manner have some underlying physiological meaning. The signal intensity versus
time curves are first converted to concentration time curves and are then fitted to a compartmental model such as the Brix model [8] or the Tofts model [9]. Fitting the concentration time curves to a model is usually achieved using a non-linear least squares algorithm such as the Levenberg-Marquardt method [10]. Concentration curves generated using a region of interest have a good signal to noise ratio due to averaging over many pixels, and the time taken to carry out a fit of a single curve is not significant. Generating parametric images where every pixel concentration curve must be fitted to a compartmental model is more problematic, however. Fitting techniques are sensitive to the starting values used and may fail to converge [11, 12], and the time taken to carry out the fit on every single pixel in a 3D volume image becomes a significant obstacle to the translation of pharmacokinetic modeling into clinical practice. This means that although pharmacokinetic modeling of DCE-MRI data has the potential to improve the diagnostic accuracy of screening studies, it has not been used as extensively as less reproducible approaches such as mapping the percentage contrast enhancement or wash-out. For this reason we have developed a new method of generating pharmacokinetic maps using basis functions. This technique will allow us to assess the utility of pharmacokinetic modeling for breast screening in future studies.

2 Theory and Methods

2.1 Pharmacokinetic Modeling in DCE-MRI

The compartmental model described by Tofts [9] is frequently used to characterize contrast dynamics in tumours. The concentration of Gd-DTPA contrast agent in the tumour is given by the expression
2 Theory and Methods 2.1 Pharmacokinetic Modeling in DCE-MRI The compartmental model described by Tofts [9] is frequently used to characterize contrast dynamics in tumours. The concentration of Gd-DTPA contrast agent in the tumour is given by the expression
C (t ) = DK trans
(
)
⎛ exp − k ep t − exp(mi t ) ⎞ ⎟ ⎟ mi − k ep ⎝ ⎠
2
∑ ai ⎜⎜ i =1
(1)
where C(t) is the tissue concentration and D is the injected dose (mmol/kg). Published values were used for the rate constants of Gd-DTPA clearance (m_1 and m_2) and for the corresponding amplitudes (a_1 and a_2) [9]. Assuming that permeability-limited conditions apply, Ktrans (min^{-1}) equals the transfer constant between the blood plasma and tissue compartments, and k_{ep} is equal to Ktrans/ve, where ve is the extracellular extravascular volume fraction. Conventionally the parameters Ktrans and ve are determined using a non-linear least squares technique such as the Levenberg-Marquardt algorithm [10].
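For illustration, equation (1) can be evaluated directly; in this sketch the plasma constants a_i and m_i are set to the Weinmann values commonly quoted with the Tofts model, as placeholders for the published values cited above.

```python
import numpy as np

A = np.array([3.99, 4.78])     # amplitudes a_i (kg/l), placeholder values
M = np.array([0.144, 0.0111])  # clearance rates m_i (1/min), placeholder values

def tofts_curve(t, Ktrans, ve, D=0.1):
    """Tissue concentration C(t) of equation (1); t in minutes.
    Assumes kep != m_i; the singular case is not handled here."""
    kep = Ktrans / ve
    C = np.zeros_like(t, dtype=float)
    for a_i, m_i in zip(A, M):
        C += a_i * (np.exp(-kep * t) - np.exp(-m_i * t)) / (m_i - kep)
    return D * Ktrans * C
```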
2.2 Using Basis Functions to Estimate Pharmacokinetic Parameters

Any concentration versus time curve can be represented by a linear combination of basis functions:

C(t) = \sum_{k=1}^{m} a_k \phi_k(t) + e(t)   (2)
where the a_k are the fitting coefficients, \phi_k(t) is the k'th basis function and e(t) is an error term. Ideally a small number of basis functions should be sufficient to give a good approximation to C(t), and the basis functions themselves should have features matching those known to belong to the functions being estimated. Representing C(t) as a linear combination of basis functions has two advantages: firstly it smooths the curve, and secondly, if the number of basis functions required is small, then the fitting coefficients provide a very efficient way of parameterising the curve. Commonly used methods of defining basis functions include Fourier series expansions and spline functions; however, these are generic basis functions and may not be as efficient at representing the signal intensity curves as a tailored set of basis functions would be. We propose to use principal components analysis (PCA) to generate a set of basis functions that are optimized to represent DCE-MRI signal intensity curves from breast studies. First a population of "plausible" curves is generated using equation 1. This was done by varying the values of Ktrans (0.001 < Ktrans < 1.0, ΔKtrans = 0.001 min^{-1}) and ve (0.001 < ve < 1.0). The n reference curves, each sampled at p time points, form the rows of a matrix to which a singular value decomposition is applied:

C = U W V^T   (3)
where V is an orthonormal (p × p) matrix, W is a diagonal (p × p) matrix whose diagonal elements are known as singular values, and U is a column-orthonormal (n × p) matrix. If the diagonal elements of W are sorted into descending order and the columns of U and V are reordered accordingly, then equation (3) is equivalent to a PCA where the columns of V contain the principal component curves, the diagonal elements of W are equal to the square roots of the eigenvalues, and the columns of U correspond to the coefficients. The first 6 principal components are shown in figure 1. The first m PCs are used as basis functions, where the selection of m is discussed later in this paper. The coefficients in the matrix U, together with the corresponding values of Ktrans and ve, are saved to create a lookup table. Any concentration curve C(t) can then be expressed as a linear combination of the first m principal components:

C(t) \approx \sum_{k=1}^{m} a_k \phi_k(t) = \sum_{k=1}^{m} a_k w_{kk} v_k(t)   (4)

or, in matrix terms,

C = A \Phi^T = A W' V'^T   (5)
where C is the (n × p) matrix whose rows represent the concentration curves for the n pixels, A is the (n × m) matrix of fitting coefficients, W' is the (m × m) diagonal matrix containing the first m singular values and V' is the (p × m) matrix containing the first m principal components. The fitting coefficients A are estimated by
A = C \Phi (\Phi^T \Phi)^{-1}   (6)
The coefficients corresponding to the j’th pixel SI curve are then compared against those generated for the reference curves to find the closest point.
i(j) = \arg\min_i \sum_{k=1}^{m} (a_{jk} - u_{ik})^2   (7)
Parametric images of Ktrans and ve are then generated by looking up the values corresponding to the ith reference curve.
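The whole pipeline of equations (3)-(7) reduces to a few matrix operations; the following NumPy/SciPy sketch (function names and data layout are ours) builds the basis and lookup table from the reference curves and maps pixel curves to parameters:

```python
import numpy as np
from scipy.spatial import cKDTree

def build_model(ref_curves, ref_params, m=2):
    """SVD of the n x p matrix of 'plausible' reference curves (eq. 3).
    Returns the p x m basis Phi and the lookup table of coefficients."""
    U, w, Vt = np.linalg.svd(ref_curves, full_matrices=False)
    Phi = Vt[:m].T * w[:m]             # phi_k = w_kk v_k, the scaled first m PCs
    return Phi, U[:, :m], ref_params   # ref_params[i] = (Ktrans, ve) of curve i

def fit_pixels(C, Phi, lut_coeffs, ref_params):
    """Equations (6)-(7): project the pixel curves (rows of C) onto the
    basis and return the parameters of the nearest reference curve."""
    A = C @ Phi @ np.linalg.inv(Phi.T @ Phi)   # fitting coefficients
    _, idx = cKDTree(lut_coeffs).query(A)      # nearest in coefficient space
    return ref_params[idx]
```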
Fig. 1. The first 6 basis functions
2.3 Simulation Studies

In order to test the accuracy of this approach and to determine how many basis functions were required, a test set of 55 curves was generated with combinations of Ktrans and ve values selected over the same range as that used to generate the reference curves. A temporal resolution of 10 seconds and a scan duration of 15 minutes were assumed. The parameter estimates obtained using a Levenberg-Marquardt algorithm [11] were compared with results obtained using the basis function approach outlined above. The following conditions were varied:
• In addition to the noise-free case, Gaussian noise was added to each curve to give signal to noise ratios of 10 and 20 (SNR defined as peak concentration/noise SD). 1000 noisy curves were generated for each of the 55 Ktrans and ve combinations to allow median values and 5%-95% confidence intervals to be calculated.
• The number of basis functions used was varied between 2 and 6.

2.4 Patient Study

The proposed technique was tested using a DCE-MRI study carried out as part of an on-going screening study. During the first 2 minutes post-injection, 11 dynamic images were acquired using a 2-dimensional SPGR sequence with fat saturation (TR/TE/flip angle, 150 ms/4.2 ms/50°) and a temporal resolution of 20 seconds. This was followed by a single high spatial resolution 3-dimensional SPGR scan that took 7 minutes and was used to provide morphological information. This was then followed by an additional series of 3 dynamic images to monitor contrast media washout. A contrast dose of 0.1 mmol/kg Gd-DTPA was injected after the third image was acquired. Each image comprised 256 × 256 pixels and there were 12 slices. A uniform
T1 value of 710 ms was assumed in order to convert the signal intensity values into concentration values. Pharmacokinetic parameters were then calculated from the concentration data using the non-linear least squares technique (NLLS) described in [11] and using the basis function fitting method described above with m=2.
3 Results

In Fig. 2 the estimated values of Ktrans obtained using noise free simulation data are plotted against the corresponding value of ve, and the true values of Ktrans are given on the right. Concentration curves corresponding to small ve and high Ktrans show a very rapid early increase followed by a rapid washout, and it is clear from the results shown in figure 2 that the basis functions do not fit the data well in this situation. The errors in the estimated Ktrans values increase as the number of basis functions used decreases.
Fig. 2. Effect of altering m, the number of basis functions used, on noise-free data
Fig. 3. Median values and 5%-95% confidence intervals are plotted for a subset of the simulation curves (SNR=10). Ktrans measurements for ve = 80% (top row), and ve measurements for Ktrans = 0.8 (bottom row). When SNR=10, overfitting occurs for m>3.
Fig. 4. (Upper left) Slice taken from a DCE-MRI study of a patient with a fibroadenoma in the left breast. (Middle row) Pharmacokinetic maps generated using a conventional NLLS technique. (Lower row) Maps generated using 2 basis functions. (Upper right) Close-up view of region defined by ROI showing excellent agreement between methods.
The results from the noise free simulations suggest that a higher number of basis functions is desirable but in noisy data there is a tradeoff between bias in the fitted curve and variance in the fitting coefficients. In the noisy simulations, results suggest
that over-fitting occurs when m > 4 for SNR = 20 and m > 3 for SNR = 10, and we found that good results could be obtained using m = 2 or 3. Provided that ve > 20% there was no significant reduction in accuracy. Increased errors in the Ktrans values due to bias were seen in curves where ve < 20%. Fig. 3 shows a subset of the simulation results for SNR = 20. In Fig. 4 it can be seen that there is excellent agreement between the maps obtained using a conventional NLLS approach and the basis fitting approach with m = 2. The time taken to carry out the Levenberg-Marquardt fitting on a 3 GHz Xeon processor was 1 hour, 26 minutes, compared to 9 minutes for the basis function approach.
4 Discussion and Conclusions

We have reduced the processing time required to generate pharmacokinetic parameter maps from pixel concentration curves by a factor of 10. This means that images showing pharmacokinetic parameters can be more readily utilised in a clinical screening environment where lengthy pre-processing of image data may not be viable. The faster technique has introduced some errors in the calculated parameters for curves generated with low values of ve, and further work is needed to determine whether this affects diagnostic accuracy. An adaptive technique that allows more basis curves to be used for pixels with a greater SNR may reduce the error. Although this paper has described a method to fit a Tofts model to the concentration data, it is possible to use any pharmacokinetic model provided that the "model space" of possible concentration curves can be defined a priori. We have also fitted data to a modified Brix model [12] using this approach with similar results (unpublished data). Another application of the technique is as a pre-processing step for conventional non-linear least-squares routines where accuracy is important. The pharmacokinetic parameters obtained using the basis fitting technique could be used as initial starting estimates for the Levenberg-Marquardt technique, which is very sensitive to the choice of initial values.
Acknowledgements This work was supported by the Canadian Breast Cancer Research Alliance. We would like to thank Don Plewes for providing the breast MRI data.
References

1. Warner, E., et al., Surveillance of BRCA1 and BRCA2 mutation carriers with magnetic resonance imaging, ultrasound, mammography, and clinical breast examination. JAMA, 2004. 292(11): 1317–25.
2. Brown, J., et al., Magnetic resonance imaging screening in women at genetic risk of breast cancer: imaging and analysis protocol for the UK multicentre study. Magnetic Resonance Imaging, 2000. 18(7): 765–776.
3. Warren, R.M., et al., Reading protocol for dynamic contrast-enhanced MR images of the breast: sensitivity and specificity analysis. Radiology, 2005. 236(3): 779–88.
4. Kuhl, C.K., et al., Dynamic breast MR imaging: are signal intensity time course data useful for differential diagnosis of enhancing lesions? Radiology, 1999. 211: 101–10.
5. Heywang, S.H., et al., MR imaging of the breast with Gd-DTPA: use and limitations. Radiology, 1989. 171(1): 95–103.
6. Kaiser, W.A. and E. Zeitler, MR imaging of the breast: fast imaging sequences with and without Gd-DTPA. Preliminary observations. Radiology, 1989. 170: 681–6.
7. Gibbs, P., et al., Differentiation of benign and malignant sub-1 cm breast lesions using dynamic contrast enhanced MRI. Breast, 2004. 13(2): 115–21.
8. Brix, G., et al., Pharmacokinetic parameters in CNS Gd-DTPA enhanced MR imaging. J Comput Assist Tomogr, 1991. 15(4): 621–8.
9. Tofts, P.S., B. Berkowitz, and M.D. Schnall, Quantitative analysis of dynamic Gd-DTPA enhancement in breast tumors using a permeability model. Magn Reson Med, 1995. 33: 564–8.
10. Press, W.H., et al., Numerical Recipes in C. 1994, Cambridge: Cambridge University Press.
11. Ahearn, T.S., et al., The use of the Levenberg-Marquardt curve-fitting algorithm in pharmacokinetic modelling of DCE-MRI data. Phys Med Biol, 2005. 50(9): N85–92.
12. Buckley, D.L., et al., Quantitative analysis of multi-slice Gd-DTPA enhanced dynamic MR images using an automated simplex minimization procedure. Magn Reson Med, 1994. 32(5): 646–51.
Investigating Cortical Variability Using a Generic Gyral Model

Gabriele Lohmann1, D. Yves von Cramon1, and Alan C.F. Colchester2

1 Max-Planck-Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
2 University of Kent at Canterbury, UK
Abstract. In this paper, we present a systematic investigation of the variability of the human cortical folding using a generic gyral model (GGM). The GGM consists of a fixed number of vertices that can be registered non-linearly to an individual anatomy so that for each individual we have a clearly defined set of landmarks that is spread across the cortex. This allows us to obtain a regionalized estimation of intersubject variability. Since the GGM is stratified into different levels of depth, it also allows us to estimate variability as a function of depth. As another application of a polygonal line representation underlying the generic gyral model, we present a cortical parcellation scheme that can be used to regionalize cortical measurements.
1 Introduction
Human cortical folding shows a high degree of inter-individual variability, which makes inter-subject comparisons very difficult and prone to error. In this paper, we systematically investigate inter-subject variability using a set of cortical landmarks. These landmarks are derived from a generic gyral model (GGM) that was previously introduced in [1]. The vertices of the GGM serve as landmarks that can be easily identified across a large population of subjects using non-linear registration techniques. Their variances and covariances can then be analyzed and used in a regionalized way. The identification of useful landmarks is a key problem in inter-subject registration and thus in variability studies [2]. We use the vertices of the generic gyral model because this model is very well justified as an average representation of cortical gyri across a large population of subjects. As another extension of the work in [1], we present a novel cortical parcellation technique. While in our previous work gyri were represented as polygonal lines, we now extend this concept so that entire gyral surfaces can be segmented and a semi-complete parcellation of the cortical surface results. This will be helpful for regionalizing cortical measurements such as thickness estimation.
2 Parcellation of the Cortical Surface
In [1], a polygonal line representation of cortical gyri was introduced. A polygonal line consists of a set of vertices in 3D space that are connected by edges. We
envisage a thin 3D sheet residing at the centre of a gyrus that is obtained by a 3D topological thinning operation. A gyral line intersects this sheet at a user-defined level of depth (see figure 1). A set of such lines at increasing levels of depth provides a simplified representation of a gyral sheet.
Fig. 1. A gyral line is a polygonal line that intersects the skeletonized gyrus (gyral sheet) at a given level of depth. In this paper, we investigate sets of gyral lines that intersect the gyral sheet at eight pre-defined depth levels.
A semi-automatic approach can be used to obtain neuroanatomical labels for some major gyri [1]. Specifically, the following gyri can be identified in most individuals: inferior frontal, pre- and postcentral, perisylvian, temporal, and a parietal gyrus. Here, this approach is extended so that entire gyri can be segmented and labelled, and a parcellation of the cortical surface results. This parcellation does not cover the entire cortical surface as the gyral labelling is restricted to the major gyri. Figure 2 shows this method applied to one individual brain. As input to the algorithm, a skull-stripped T1-weighted image, its white matter segmentation and the corresponding labelled gyral line representation are required. For each voxel on the white matter surface, the nearest vertex of the gyral lines is sought. If this vertex has a neuroanatomical label then the white matter surface voxel inherits this label; otherwise it receives a null label. Once all white matter surface voxels are labelled in this manner, all grey matter voxels are labelled likewise: each grey matter voxel inherits the label of the nearest white matter voxel, as sketched below.
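Using a k-d tree for the two nearest-neighbour lookups, the parcellation step can be sketched compactly; the data layout below (coordinate arrays plus an integer label per gyral-line vertex, 0 meaning unlabelled) is our own assumption, not the authors' implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def parcellate(wm_surface_xyz, gm_xyz, line_vertices, line_labels):
    """Propagate gyral-line labels to the white matter surface, then to
    the grey matter, by nearest-neighbour lookup.

    wm_surface_xyz : (n, 3) white matter surface voxel coordinates
    gm_xyz         : (m, 3) grey matter voxel coordinates
    line_vertices  : (k, 3) gyral line vertex coordinates
    line_labels    : (k,) integer labels, 0 = null/unlabelled
    """
    # each white matter surface voxel inherits the nearest gyral-line label
    _, idx = cKDTree(line_vertices).query(wm_surface_xyz)
    wm_labels = line_labels[idx]
    # each grey matter voxel inherits the nearest white-matter-surface label
    _, idx = cKDTree(wm_surface_xyz).query(gm_xyz)
    return wm_labels, wm_labels[idx]
```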
3 Obtaining Cortical Landmarks
In [1], a generic model of human cortical gyri was obtained using non-linear principal components analysis applied to 96 T1-weighted MRI data sets (see figure 4). The generic gyral model (GGM) consists of an averaged line representation for some major gyri at eight depth levels. Each depth level contains about 85 vertices so that the entire model has about 700 vertices. Here, we propose to use these vertices as landmarks for inter-subject registration. Since these landmarks can be identified in each individual via non-linear registration, they can
Fig. 2. Mapping gyral labels from the line-based representation to the cortical surface (1: middle frontal, 2: precentral, 3: postcentral, 4: perisylvian, 5: parietal)
Fig. 3. Mapping of gyral labels from the line-based representation to the cortical surface: each point on the white matter surface inherits the label of the nearest gyral line, and each point within the grey matter inherits these labels
subsequently be used to describe and analyze inter-subject variability. In the following, we describe our non-linear inter-subject registration method. As a prerequisite to any further processing, all MRI data sets are assumed to be registered to an AC/PC-based coordinate frame [3] using an affine linear transformation. The GGM can then be used to guide inter-subject registrations since it provides a set of cortical landmarks. We propose a two-stage approach: at the first stage, a cubic polynomial is applied to provide an initial rough match; the second stage employs thinplate splines for a more refined match. Both stages require a point-to-point correspondence between the vertices of the GGM and the vertices of the gyral lines of an individual data set. However, the number of vertices differs greatly between these two types of data. The generic gyral model contains about 700 vertices across eight different levels of depth. Since the GGM is an averaged representation, it is much smoother than the line representations of individual data. In addition, the GGM only contains neuroanatomically labelled major gyri, whereas line representations of individual data also contain numerous unlabelled smaller gyri. Therefore, the number of vertices of an individual data set is dramatically larger than that of the GGM. To obtain point correspondences, we have tested several methods, including the assignment method by Rangarajan et al. [4], which worked quite well. However,
Fig. 4. The generic gyral model (see [1])
we found that a simple nearest neighbour assignment produced similar results, where for each point p_i, i = 1, ..., n of the GGM, the gyral line point q_j with minimal Euclidean distance and the same neuroanatomical label is selected. For an initial rough match, a cubic polynomial of the following form is used:

f(x, y, z) = a_1 x + a_2 y + a_3 z + a_4 xy + a_5 xz + a_6 yz + a_7 x^2 + a_8 y^2 + a_9 z^2 + a_{10} xyz + a_{11} x^2 y + a_{12} x^2 z + a_{13} y^2 x + a_{14} y^2 z + a_{15} z^2 x + a_{16} z^2 y + a_{17} x^3 + a_{18} y^3 + a_{19} z^3.
is minimized. In our experiments, we have used Powell’s method [5] for non-linear optimization. This initial registration is then refined using a thinplate spline approach [6,7]. As for the cubic polynomial, the point-to-point correspondence is obtained by selecting for each point pi , i = 1, ..., n of the GGM a gyral line point qj with minimal Euclidean distance and the same neuroanatomical label. Since the cubic polynomial was applied beforehand, the Euclidean distances are now much smaller than before. Note that the cubic polynomial is governed by global parameters that minimize the distances between the two data sets globally, whereas the thinplate splines produce a local match that moves each landmark point directly onto its nearest corresponding point.
4
Regionalized Estimation of Cortical Variability
As described above, the GGM (and hence the vertices) can be registered nonlinearly so that for each individual we have a fixed number of landmarks at defined locations spread across the cortex at various levels of depth. This allows us to obtain a regionalized estimation of inter-subject variability. Since the GGM
Investigating Cortical Variability Using a Generic Gyral Model
113
is stratified into different levels of depth, it also allows us to estimate variability as a function of depth. For each of the 96 data sets, we extracted a 3D gyral line representation and performed a non-linear registration that mapped the vertices of the GGM onto each individual data set using the method described in the previous section. This yielded a deformation map that contained a displacement vector for each GGM-vertex in each individual. For each individual, we obtained a map that recorded the magnitude of the displacement (see figure 5).
Fig. 5. A displacement map of one individual data set. The displacement is corrected for the bias introduced by the AC/PC-alignment by dividing the displacement magnitude by the distance from AC/PC (see text). Therefore, the color-coded values correspond to corrected displacement magnitudes.
Note however that such a measure of displacement is biased by the initial AC/PC alignment (see figure 6). Landmarks that are closer to AC/PC will tend to be less variable than those that are further away. To correct for this problem, we divided the displacement magnitude of each vertex by its distance from AC/PC.
Fig. 6. Measures of variability must be corrected for the bias introduced by the AC/PC alignment
The displacements record the deviation from the generic gyral model. Since the model represents a population average, large displacements over many subjects indicate a high degree of variability. Therefore, the individual displacement
maps were averaged across the 96 data sets to produce a map that represents the variability across the population (see figure 7). Note that the middle frontal gyrus shows a higher degree of variability than the parietal gyrus. Also note that the dorsal tip of the pre- and postcentral gyri is more variable than their ventral parts.
Fig. 7. Variability map computed as an average of displacement maps of 96 subjects. The right-hand image shows the vertices of the GGM superimposed onto a T1-weighted structural image. The ventral region of the post-central gyrus (a) is less variable than the dorsal region (b). The parietal gyrus (d) is less variable than the middle frontal gyrus (c). The color coding is based on the AC/PC-corrected variability measure described in the text.
Since the GGM contains eight different strata of depth, we can investigate the inter-subject variability as a function of depth. For this purpose, we extracted a deep and a shallow level of depth and performed non-linear registrations separately for both levels. We then extracted the main gyri contained in the GGM for both depth levels across all 96 subjects. We estimated the standard deviation in x, y, z for each landmark and divided this estimate by the distance from the landmark to the anterior commissure (AC). This was done to counterbalance the bias in variance estimation introduced by the AC/PC-alignment. We found that in 5 out of 6 main gyri, the variance decreased with depth when comparing the deepest level against the most shallow level. Table 1 shows the results. Note that comparisons between gyri are not possible with this approach as the gyri are represented by different numbers of vertices.

Table 1. The decrease in variance with depth. The estimates correspond to standard deviations averaged across the landmarks within a gyrus and summed over x, y, z. To account for the bias in estimating variances, the standard deviations were divided by the distance from the anterior commissure (AC) for each landmark separately.

          perisylvian  precentral  postcentral  parietal  temporal  middle frontal
deep         6.890        2.689       1.811       2.435     6.054       5.399
shallow      8.892        3.837       1.995       2.128     6.085       5.578
5 Discussion
We have presented a new method for analyzing cortical variability and for obtaining a cortical parcellation. Some previous methods for analyzing variability were also based on extracting labelled representations of sulci or gyri (e.g. [8,9]). The novelty of our approach lies in the use of the generic gyral model, which provides a set of well-defined landmarks that can be identified across many subjects using non-linear registration. These landmarks are particularly useful because they are located on main gyri that can be easily detected in most healthy subjects. Also, these landmarks are separated into different strata of depth so that variability can be investigated as a function of depth. We have found that variability decreases with depth, which agrees well with earlier findings [10,11]. While our approach is essentially line-based (or vertex-based), several other authors have used surface-based approaches for analyzing cortical variability (e.g. [12,2]). Van Essen [2], however, noted the importance of identifying useful landmarks. Here, we have proposed to use landmarks that are derived from the generic gyral model. These landmarks are very well justified because the generic gyral model resulted from averages across a large population of individual data sets. Therefore, we can expect these landmarks to be quite representative of major cortical features. One aspect of our future work will be directed towards exploiting the parcellation of the cortical surface so that morphometric studies can be regionalized.
References

1. Lohmann, G., von Cramon, D.Y., Colchester, A.C.F.: A construction of an averaged representation of human cortical gyri using non-linear principal component analysis. In: 8th Int. Conf. on Medical Image Computing and Computer Assisted Intervention (MICCAI 2005), Palm Springs, CA, USA (2005)
2. Van Essen, D.C.: A population-average, landmark- and surface-based (PALS) atlas of the human cerebral cortex. Neuroimage 28 (2005) 635–662
3. Talairach, J., Tournoux, P.: Co-planar Stereotaxic Atlas of the Human Brain: 3-Dimensional Proportional System - an Approach to Cerebral Imaging. Thieme Medical Publishers, New York (1988)
4. Gold, S., Rangarajan, A.: A graduated assignment algorithm for graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence 18 (1996)
5. Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P.: Numerical Recipes in C. 2nd edn. Cambridge University Press (1992)
6. Bookstein, F.: Principal warps: thin-plate splines and the decomposition of deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence 11 (1989) 567–585
7. Bookstein, F.: Morphometric tools for landmark data. Cambridge University Press (1991)
8. Le Goualher, G., Procyk, E., Collins, D.L., Venugopal, R., Barillot, C., Evans, A.C.: Automated extraction and variability analysis of sulcal neuroanatomy. IEEE Transactions on Medical Imaging 18 (1999) 206–217
9. Cachia, A., Mangin, J.F., Riviere, D., Kherif, F., Boddaert, N., Andrade, A., Papadopoulos-Orfanos, D., Poline, J.B., Bloch, I., Zilbovicius, M., Sonigo, P., Brunelle, F., Regis, J.: A primal sketch of the cortex mean curvature: a morphogenesis based approach to study the variability of the folding patterns. IEEE Transactions on Medical Imaging 22 (2003) 754–765
10. Lohmann, G., von Cramon, D.Y., Steinmetz, H.: Sulcal variability of twins. Cerebral Cortex 9 (1999) 754–763
11. Le Goualher, G., Argenti, A.M., Duyme, M., Baaré, W.F.C., Hulshoff Pol, H.E., Boomsma, D.I., Zouaoui, A., Barillot, C., Evans, A.C.: Statistical sulcal shape comparisons: application to the detection of the genetic encoding of the central sulcus shape. Neuroimage 11 (2000) 564–574
12. Thompson, P.M., Cannon, T.D., Toga, A.W.: Mapping genetic influences on human brain structure. Annals of Medicine 34 (2002) 523–536
Blood Flow and Velocity Estimation Based on Vessel Transit Time by Combining 2D and 3D X-Ray Angiography

Hrvoje Bogunović and Sven Lončarić

Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, HR-10000 Zagreb, Croatia
{hrvoje.bogunovic, sven.loncaric}@fer.hr
http://ipg.zesoi.fer.hr

Abstract. X-ray imaging equipment can be used to measure hemodynamic function in addition to visualizing morphology. The parameters of specific interest are arterial blood flow and velocity. Current monoplane X-ray systems can perform 3D reconstruction of the arterial tree as well as capture the propagation of the injected contrast agent in a sequence of 2D angiograms. We combine the 2D digital subtraction angiography sequence with the mechanically registered 3D volume of the vessel tree. From the 3D vessel tree we extract each vessel and obtain its centerline and cross-section area. We estimate velocity from the 2D sequence by comparing time-density signals measured at opposite ends of the projected vessel. From the average velocity and the cross-section area we obtain the average blood flow estimate for each vessel. The algorithm described here is applied to datasets from real neuroradiological studies.
1
Introduction
Medical imaging, in addition to its diagnostic purpose, is also used as an interventional and therapeutic tool to perform minimally invasive treatment of vascular diseases. Since these interventions alter hemodynamic function, the physiologic parameters of interest are blood flow and velocity in arteries. X-ray imaging is still considered the gold standard for endovascular procedures, so it would be of great interest if the same imaging device could be used to immediately evaluate the result of the intervention. Additionally, X-ray equipment is readily available in both developed and developing countries. It can be used to image 3D volumes, and it is very suitable for taking a sequence of dynamic images because a high frame rate does not deteriorate the spatial image resolution as much as in other imaging modalities. However, some difficulties arise due to its projective imaging and the limitation on the dose a patient is allowed to receive. There are already a number of publications that tackle the problem of blood flow and velocity estimation with X-ray systems. A thorough review of algorithms was given in [1]. However, few of them had access to the 3D volume of
vessels that are being imaged by digital subtraction angiography (DSA), since 3D reconstruction angiography (3DRA) was not widely available at that time. We will focus on the techniques based on tracking the intra-arterially injected iodinated contrast agent. We found other approaches to either demand a rarely available contrast injection procedure or to be computationally too intensive. The algorithms for blood velocity estimation based on tracking can in general be divided into the following categories [2]:
– Techniques based on measuring the transit time of contrast medium from one region of the vessel to another [1].
– Techniques based on measuring the distance that contrast medium travels between two frames [3,2].
– Optical flow based methods [4,5,2].
In essence, all of them rely on signals obtained by observing the image intensity (density) level as it changes through time (time-density signal) and/or space (distance-density signal) while the contrast agent passes through the vascular system. Techniques based on measuring the distance the contrast medium travels between two frames, by comparing the distance-density signals in them, are regarded as more accurate than time-density based methods. This is because the spatial resolution is normally higher than the temporal one, and the obtained velocity is localized in time and space. Time-density methods are capable of giving only average velocities and are difficult to interpret when pulsatile flow is in effect. Optical flow based methods use the 1D optical flow constraint (the contrast agent preserves its shape between two frames) for efficient distance-density curve matching, which is otherwise done by exhaustive search. This matching is accurate when the gradient of the signal is strong, which is only during the traversal of the leading or the trailing edge of the contrast agent. All of these methods require imaging with a high number of frames per second (at least 25). Otherwise, the contrast agent either moves too much between two frames or the time-density signals in two distinct regions are too close to each other. Once the blood velocity (v) is known, the blood flow (Q) is obtained by multiplying this velocity with the vessel cross-section area. In general, flow is the integral of the dot product of the velocity vector and the normal of the area element:

Q = \int_A \mathbf{v} \cdot \mathrm{d}\mathbf{A}   (1)

In the case where the flow is perpendicular to the area and has a uniform velocity distribution across it (assumption of plug flow), this reduces to the simple form

Q = vA   (2)
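A small helper illustrating Eq. (2) with unit handling (our own sketch, not code from the paper):

```python
import math

def plug_flow_ml_min(velocity_cm_s, diameter_mm):
    """Eq. (2) with unit handling: flow in ml/min for a velocity in cm/s
    and a circular vessel of the given diameter in mm."""
    area_cm2 = math.pi * (diameter_mm / 20.0) ** 2  # radius in cm, squared
    return velocity_cm_s * area_cm2 * 60.0          # 1 cm^3 = 1 ml

# e.g. plug_flow_ml_min(25, 5.8) -> ~396, close to Table 1's 392 ml/min
```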
Most of the current methods were validated on phantoms or computer simulations. We wanted to develop an algorithm, for use with monoplane X-ray
imaging, that would be especially appropriate for realistic clinical settings with live patients. However, validation of in-vivo studies proved to be difficult. We focus on applications in neuroradiology, where pulsation in the flow should not be as strong as in regions closer to the heart. In order to have a valid registration between the 3D and 2D datasets, the patient must not move between the two acquisitions; it is relatively easy to immobilize the patient's head to prevent movement. Neuro cases are of very high interest due to the large number of vascular diseases, such as aneurysms, stenoses, and arteriovenous malformations, that occur there and threaten to cause a stroke. This leads to a large number of endovascular treatments being performed there.
2
Proposed Algorithm
Our approach is based on registering the 3D vessel tree and the 2D dynamic DSA sequence, thus enabling a comparison of time-density (TD) signals at the two opposite ends of a vessel. We consider the TD-based method more robust than a distance-density (DD) oriented algorithm. For DD signals to be useful, the distance the contrast agent travels has to be reconstructed in 3D, because we observe only the 2D projection; thus a correction for the angulation between the imaged vessel and the emitted X-rays has to be performed. Secondly, projected vessels can overlap, and the parts of the vessel covered by other vessels have to be neglected and the distance-density signal adequately interpolated, which is a difficult task. Both the TD and DD signals are heavily cluttered with noise, so special attention has to be given to adequate denoising procedures. In previous research we focused on denoising techniques for these signals based on linear Wiener filtering and non-linear denoising using the wavelet transform [6]. The 1D signals shown later in the paper are already filtered. The algorithm for automatic estimation of the blood velocity and flow, using the 3D volume of the arteries and the sequence of 2D DSA images, can be summarized in these steps:
1. The vessel tree is first segmented out from the raw volume data.
2. Every vessel of the vessel tree is detected and labeled. Vessels are segments of the tree with no bifurcations; for each of these vessels, we are interested in its blood flow value. During this process the 3D centerline together with the distance transform of the vessels is computed. From the distance transform, the cross-section area of the vessel is found. The centerline is also required to find the actual length of the vessel in 3D.
3. Each extracted 3D vessel is projected onto the 2D image, thus automatically segmenting this vessel in the 2D image sequence; this would otherwise be a very difficult task to do by just observing the 2D images. Overlap with other vessels is also detected.
4. The contrast propagation in each vessel is recovered from the 2D sequence. The vessel transit time is estimated, which together with the known actual vessel length gives the average blood velocity in the vessel.
5. Finally, the blood flow for each vessel is found using Eq. 2.
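Step 4 rests on time-density signals measured inside the projected segments; as a minimal sketch (our own, assuming NumPy arrays for the DSA frames and the segment mask), this measurement is just a per-frame average:

```python
import numpy as np

def time_density_signal(dsa_frames, segment_mask):
    """Time-density (TD) signal of one projected vessel segment:
    the mean intensity inside the segment mask for every DSA frame.

    dsa_frames: float array of shape (n_frames, height, width).
    segment_mask: boolean array of shape (height, width).
    """
    return dsa_frames[:, segment_mask].mean(axis=1)
```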
2.1
Vessel Tree Volume
Monoplane X-ray imaging equipment is capable of making a 3D reconstruction of the imaged volume from a series of images taken at different angles. From this volume, vessel tree segmentation has to be performed; simple global thresholding proved to be sufficient for our datasets. Skeletonization of the segmented 3D binary object is performed based on the algorithm described in [7]. The result contains centerlines for each vessel of the tree. Additionally, since the skeletonization procedure is based on the distance transform, every centerline voxel has its distance to the nearest boundary assigned. Labeling of the original segmented vessel tree can then be performed. The result is shown in Fig. 1 (a): every vessel is assigned a different label (shade of gray). From the labeled vessel tree, each vessel of interest can be extracted. Since we based our method on comparing two TD signals, we first need to define two segments at opposite ends of the vessel in which we will measure the signals, so we have to partition the vessel into smaller segments. To create vessel segments, we first create the segments' border planes, which are perpendicular to the centerline; the distance between these planes is chosen to be around ten voxels. The segment's volume within these borders is then obtained by region growing. The result of such partitioning is shown in Fig. 1 (b).
Fig. 1. Vessel tree labeling (a) and vessel partitioning (b)
2.2
Projection of 3D Volume onto 2D Plane
Once we obtain the necessary vessel data in 3D, we need to make a link with the 2D sequence showing the contrast propagation. Since the imaging setting is known, we can accurately project the 3D volume onto the 2D imaging plane; thus the 3D-2D registration is mechanically based. The projection is perspective and is performed by ray tracing. The imaging setting and the corresponding projection of the vessel tree onto the 2D image are shown in Fig. 2.
(a) Imaging setting
(b) Projected vessel tree (in white)
Fig. 2. Projection of the 3D vessel tree
Together with the vessel, we project its two segments of interest in which we measure the TD signal. Since ray tracing is used, we can easily detect and avoid segments with vessel overlap and choose the two most distant segments that are overlap-free. Two such projected overlap-free segments, in a vessel that partly overlaps with other vessels, are shown in Fig. 3. The vessel of interest is shown in dark gray, while the two chosen segments are shown in light gray.
Fig. 3. Projection of two valid segments
2.3
Vessel Transit Time
When we have two regions of interest on the imaging plane, we can measure the TD signal in them. From the TD signals we need to estimate the time of arrival (TA) of the contrast agent at each of the two chosen regions. The vessel transit time (VTT) is obtained as the difference between these two TA values. TA can in general be found using two different criteria: the peak-to-peak or the half-peak to half-peak method.
During its travel, the contrast agent generates a TD signal with three distinct phases: wash-in, maximum opacification, and wash-out. Normally, TA would be taken as the peak of the signal during the maximum opacification phase, which is the least affected by dilution. However, a problem arises with signals that have a steady maximum opacification, yielding a flat profile with no well-defined peak. To overcome this problem, the moment the TD signal reaches its half-peak value can be taken as TA; this normally occurs during the wash-in phase, where the signal is quite steep. The nice property of this approach is that all values close to the flat peak will have a very similar TA (Fig. 4). Sub-frame accuracy can be achieved with suitable interpolation.
Fig. 4. Half-peak method for VTT estimation
From the TD signals, one actually detects the frame number of the sequence at which the contrast arrives. The difference in frame numbers (ΔN) is transformed into seconds using the frames-per-second (FPS) parameter of the imaging study. In addition to the VTT, from the 3D centerline we know the distance between the two segments as well as the average radius of the vessel, so we can find both the average blood velocity and the flow in the vessel.
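A minimal sketch of this half-peak estimate with sub-frame interpolation, and of the subsequent conversion to velocity and flow (our own illustration, assuming the TD signals are already denoised):

```python
import numpy as np

def half_peak_arrival(td, upsample=10):
    """Sub-frame arrival time: first crossing of half the peak value of a
    (denoised) TD signal, located with linear interpolation."""
    x = np.arange(len(td))
    xi = np.linspace(0, len(td) - 1, upsample * len(td))
    tdi = np.interp(xi, x, td)
    return xi[np.argmax(tdi >= 0.5 * tdi.max())]

def velocity_and_flow(td_a, td_b, dist_mm, fps, diameter_mm):
    """Average velocity [cm/s] and flow [ml/min] from the TD signals of
    two segments a known centerline distance dist_mm apart."""
    vtt = (half_peak_arrival(td_b) - half_peak_arrival(td_a)) / fps  # s
    v = (dist_mm / 10.0) / vtt                     # cm/s
    area_cm2 = np.pi * (diameter_mm / 20.0) ** 2   # circular cross-section
    return v, v * area_cm2 * 60.0                  # (cm/s, ml/min)
```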
3
Results
In this section, experimental results of estimating average blood velocity and flow are presented. The above algorithm is performed automatically on every vessel of the vessel tree. Here we show the results for the internal carotid artery (ICA). The result of carotid vessel partitioning was already shown in Fig. 1 (b). The result of the projection is shown in Fig. 5. It can be seen (light gray) how a couple of segments on the top side had to be skipped because the vessel overlaps with itself (the centerline is parallel with the X-rays). The lighter gray part of the chosen segments is the part actually used for TD signal construction, since the rest of each segment had to be discarded due to overlap with neighbouring segments. TD signals are measured in the two segments from the DSA sequence as the average intensity value per frame of all the pixels in the segment. We estimate both the peak-to-peak distance and the half-peak to half-peak distance, as shown in Fig. 6. Since it is difficult to get images with a high frames-per-second setting, the arrival times at the two regions of the vessel are very close to each other. The final estimated values for this example are shown in Table 1.
Fig. 5. Projected vessel tree with its segments (close up)
(a) peak to peak method
(b) half-peak to half-peak method
Fig. 6. VTT estimation (normalized intensity vs. frame number)

Table 1. Results for the carotid artery

parameter        peak-peak   half peak-half peak
ΔS [mm]             40              40
ΔN [frames]          4               3.3
ΔT [s]               0.16            0.13
Diameter [mm]        5.8             5.8
v [cm/s]            25              30.7
Q [ml/min]         392             481.5
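As a quick arithmetic check of Table 1 (our own verification, not taken from the paper), the peak-to-peak column is self-consistent:

\[
v = \frac{\Delta S}{\Delta T} = \frac{40\,\mathrm{mm}}{0.16\,\mathrm{s}} = 25\,\mathrm{cm/s}, \qquad
Q = vA = 25\,\mathrm{cm/s}\cdot\pi\,(0.29\,\mathrm{cm})^2 \approx 396\,\mathrm{ml/min},
\]

which matches the tabulated 392 ml/min up to rounding of the diameter; the ΔN and ΔT entries together imply an acquisition rate of 25 frames per second.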
For this case, both methods give similar results for the VTT, since the signal around its peak does not become flat as it does in some other, smaller vessels. For validation, we can only compare our values with the typical ones that appear in the ICA, usually obtained by Doppler ultrasound. Although it is difficult to say what normal velocities in the ICA are, we have found reports that the mean peak systolic value (PSV) should be in the range from 30 to 80 cm/sec, and is
not viewed as abnormal until the PSV exceeds 100 cm/sec. Although the valid range is quite large, this at least indicates that our results have meaningful values.
4
Conclusion
In this paper we discussed a technique for estimating blood velocity and flow in the arteries of the brain using a monoplane X-ray imaging system. The estimation is supported by the mechanically registered 3D volume of the vessel tree obtained with the same device just before the DSA study. The velocity calculation is based on estimating the vessel transit time from TD signals in two distant regions of the vessel. The results are in the expected range of values commonly found in the specific vessels. In future work, we plan to compensate for vessel overlap by suitable interpolation and to apply an appropriate correction for the angulation of the vessels. This will enable us to focus more on the development of DD based methods, which have a number of previously discussed advantages over TD based ones.
References

1. Shpilfoygel, S.D., Close, R.A., Valentino, D.J., Duckwiler, G.R.: X-ray videodensitometric methods for blood flow and velocity measurement: A critical review of literature. Medical Physics 27 (2000) 2008–2023
2. Rhode, K.S., Lambrou, T., Hawkes, D.J., Seifalian, A.M.: Novel approaches to the measurement of arterial blood flow from dynamic digital x-ray images. IEEE Trans. on Medical Imaging 24 (2005) 500–513
3. Shpilfoygel, S.D., Close, R.A., Valentino, D.J., Duckwiler, G.R.: Comparison of methods for instantaneous angiographic blood flow measurement. Medical Physics 26 (1999) 862–871
4. Huang, S.P., Decker, R.J., Goodrich, K.C., Parker, D.J., Muhlestein, J.B., Blatter, D.D., Parker, D.L.: Velocity measurement based on bolus tracking with the aid of three-dimensional reconstruction from digital subtraction angiography. Medical Physics 25 (1997) 677–686
5. Rhode, K.S., Lambrou, T., Hawkes, D.J., Hamilton, G., Seifalian, A.M.: Validation of an optical flow algorithm to measure blood flow waveforms in arteries using dynamic digital x-ray images. In: SPIE Medical Imaging: Image Processing. (2000) 1414–1425
6. Bogunovic, H., Loncaric, S.: Denoising of time-density data in digital subtraction angiography. In: Scandinavian Conference on Image Analysis. (2005) 1157–1166
7. Farag, A.A., Hassouna, M.S., Falik, R., Hushek, S.: Reliable fly-throughs of vascular trees. Technical report, CVIP Laboratory, University of Louisville (2005)
Accurate Airway Wall Estimation Using Phase Congruency

Raúl San José Estépar¹, George G. Washko², Edwin K. Silverman²,³, John J. Reilly², Ron Kikinis¹, and Carl-Fredrik Westin¹

¹ Surgical Planning Lab, Brigham and Women's Hospital, Boston, MA
{rjosest, westin, kikinis}@bwh.harvard.edu
² Pulmonary and Critical Care Division, Brigham and Women's Hospital, Boston, MA
{gwashko, jreilly}@partners.org
³ Channing Laboratory, Brigham and Women's Hospital, Boston, MA
[email protected]
Abstract. Quantitative analysis of computed tomographic (CT) images of the lungs is becoming increasingly useful in the medical and surgical management of subjects with Chronic Obstructive Pulmonary Disease (COPD). Current methods for the assessment of the airway wall work well in idealized models of the airway. We propose a new method for airway wall detection based on phase congruency. This method does not rely on either a specific model of the airway or the point spread function of the scanner. Our results show that our method gives a better localization of the airway wall than "full width at a half max" and is less sensitive to different reconstruction kernels and radiation doses.
1
Introduction
Chronic obstructive pulmonary disease (COPD) is a general term used to describe a family of conditions that affect the lung tissue (emphysema) and airways. Pathologically, emphysema is the enlargement and permanent destruction of the air compartments distal to the terminal bronchiole, and airways disease is characterized by lumenal obstruction due to wall inflammation, thickening and smooth muscle constriction. While both processes are associated with exposure to agents such as cigarette smoke, the susceptibility to development of COPD and the relative contributions of airway and airspace disease in an individual are highly variable. The development of precise and reproducible methods to accurately quantify the amount of airways disease and emphysema will facilitate efforts to understand the genetic and mechanistic pathways of disease development and may help optimize the selection of therapy for patients. Computed tomographic imaging of the lung has allowed researchers to examine the architecture of the diseased lung in-vivo. Recent investigations using quantitative image analysis of CT datasets have provided new insights into the characterization of
This work has been funded by grants R01 HL075478, R01 HL68926 and P41RR13218.
both emphysema and airways disease. Measurements of airway wall thickness on CT images are predictive of lung function; thicker airway walls tend to be found in subjects with reduced airflow [5]. In this paper we focus on the accurate estimation of wall thickness. Such measurements require detecting the inner interface between the lumen and the airway wall and the outer interface between the airway wall and the surrounding lung tissue. While the inner part of the wall is characterized by a clear air-tissue interface, defined by an increase in the intensity profile, the outermost part of the wall poses greater difficulty: the airways are surrounded by parenchymal tissue and vessel structures. The traditional approach to wall thickness detection has relied on the so-called "full-width at half-max" (FWHM) principle. This method defines the wall boundaries by considering the point with half intensity with respect to the peak intensity inside the wall. It is based on the fact that for an ideal step function that undergoes Gaussian blurring, the edge is located at the FWHM location. Gaussian blurring is the de-facto model for the point spread function of a scanner. This method, although simple, has two major drawbacks:
– The airway wall can be considered a laminated structure whose scale is close to the scanner's resolution. This violates the perfect-edge assumption that FWHM is based on. The result is that, as reported by Reinhardt et al. [7], the measurements yielded by FWHM are biased towards underestimation of the inner boundary and overestimation of the outer boundary.
– The geometry of the wall is assumed to be circular, so that the airway can be analyzed in a polar coordinate system by just looking in the radial direction. Airways that are not orthogonal to the scanning direction do not fulfill this requirement.
Regarding the first problem, in [7] the authors propose a model-based fitting method that assumes a Gaussian point spread function and an idealized step-like model for the airway. After fitting the model to the data, the wall thickness is derived from the model. This method, although highly accurate, is very computationally intensive and does not account for deviations from the proposed model that often occur, for example in areas where the airway wall is in contact with a vessel. Regarding the second problem, Saba et al. [8] have proposed a method that assumes an elliptical model of the airway and a full 3D point spread function and fits the model to the actual image. The method improves the location of the lumenal edge; however, the outer edge is not improved. In this paper we propose a new method for wall thickness estimation whose primary features are that it is model free, in terms of wall shape or blurring mechanism, and that it appears to be quite accurate. The method is based on the phase congruency principle for the location of edges. Under phase congruency, the location of relevant features, i.e. edges or lines, is given by those locations where the local phase exhibits maximal coherency. We will show that phase congruency is present at the scanner level when reconstructing the data with different reconstruction kernels. This has allowed us to lay the basis for a bronze standard for airway wall localization, which we have characterized and validated using synthetic and real datasets.

Fig. 1. Analysis pipeline for the extraction of unfolded orthogonal airways from lung CT datasets
2
Airway Extraction Pipeline
Before proceeding to the presentation of our method, we briefly describe the processing pipeline that we follow to extract unfolded airway volumes, as depicted in Fig. 1. The lung mask is extracted by thresholding at −600 HU, applying connected components from pre-estimated seed points falling in the left and right lung centroids, and removing interior holes by means of mathematical morphology [2]. From the lung mask, the CT volume is cropped to the bounding box enclosing the lung and the intensity variations outside the lung are set to a fixed value. This cropped volume is used as input to a surface evolution method to extract the airway lumen. The evolution is implemented by a fast marching approach initially proposed by Pichon et al. [6] for brain segmentation. The propagation is stopped when the physical volume of the segmented region sharply increases, which is an indicator of leakage into the lung parenchyma. Once the airway lumen is segmented, the centerline is extracted by means of a thinning process [3]. From the centerline, the CT volume can be resampled in planes orthogonal to the centerline, producing a flat airway volume. The flattened airway can be analyzed slice by slice for the task of airway wall detection. To simplify the analysis even further, the 2D flattened airway is unfolded in a polar coordinate system, so that we work on 1D profiles cast from the airway center.
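The final unfolding step can be sketched as a polar resampling of each orthogonal slice; the function below is our own illustration (sampling parameters are assumptions, not values from the paper):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def unfold_airway(slice_2d, center, n_angles=360, n_radii=100, dr=0.25):
    """Polar resampling of an orthogonal airway cross-section, so that
    wall detection reduces to 1D radial profiles from the airway center."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    radii = np.arange(n_radii) * dr
    aa, rr = np.meshgrid(angles, radii, indexing="ij")  # (n_angles, n_radii)
    rows = center[0] + rr * np.sin(aa)
    cols = center[1] + rr * np.cos(aa)
    return map_coordinates(slice_2d, [rows, cols], order=1)
```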
3
Method
3.1
CT Reconstruction and Phase
To provide a basis for the use of phase congruency, we present an experiment that reveals the variations associated with the use of different reconstruction kernels from the same CT acquisition. A CT image was reconstructed using a series of 9 different kernels and the same airway was selected and analyzed across the range of resulting images. Common intensity profiles projecting radially from the center of the airway were superimposed and displayed graphically (Fig. 2a).
Fig. 2. (a) Intensity profiles along the radial direction of an airway for 9 different reconstruction kernels. Kernels B25f and B60f correspond to the smoothest and sharpest modulation transfer functions, respectively. (b)-(c) Comparison between the traditional phase congruency measurement (b) and the proposed phase congruency (c). The intensity profile from which the phase congruency is estimated is plotted as a dashed line.
On visual inspection, it can be seen that the peak intensity corresponding to the wall center is a function of the chosen kernel, while the average intensities in the lumenal and parenchymal areas are roughly constant. This implies that the FWHM method will yield a different result depending on the chosen reconstruction kernel. An interesting point is revealed in Figure 2a: the profiles exhibit a common crossing point at the location of the inner and outer wall. This is replicated across the whole airway and across all airways assessed (data not shown). This crossing point is associated with the concept of phase congruency. Based on the model for feature perception postulated by Morrone and Owens [4], features in an image are perceived in those locations where the Fourier components of the local signal are maximally in phase. The features are associated with changes of intensity due to the presence, in our case, of an interface. This finding suggests that the crossing point of the intensity profiles for different reconstruction kernels can be used as a bronze standard for airway wall location in real datasets. The results presented below corroborate this contention.
3.2
Phase Congruency
Local phase is the spatial counterpart of the widely known concept of Fourier phase. From a mathematical standpoint, the local phase is defined from the analytic signal. The analytic signal of a 1D real signal f(x) is a complex-valued signal whose real part is the signal itself and whose imaginary part is the Hilbert transform of the real signal:

f_A(x) = f(x) - i\,\mathcal{H}\{f(x)\} = E(x)\,e^{i\phi(x)}.   (1)
E(x) is the local energy of the signal and φ(x) is the local phase. Local phase is meaningful for signals that are well localized in the frequency domain. For a signal with a wide frequency spectrum, like any signal that can be found in real
applications, a frequency localization has to be carried out before computing the local phase. In this sense, it is possible to obtain different measurements of the local phase for different bands, i.e. different scales:

f_{A_n}(x) = h_n(x) \ast f(x) + i\,\mathcal{H}\{h_n(x)\} \ast f(x) = E_n(x)\,e^{i\phi_n(x)},   (2)
where h_n is a given family of band-pass filters and n indexes a particular scale. h and H{h} form a quadrature pair. Quadrature filters are a ubiquitous topic in computer vision and an extensive review can be found in [1]. In this paper, we have chosen a log-Gabor filter as our quadrature pair. The log-Gabor filter pair is defined in the Fourier domain by

G(\omega) = e^{-\frac{\log^2(\omega/\omega_0)}{2\log^2(\kappa/\omega_0)}},

where ω0 is the center frequency of the filter and κ/ω0 is a constant factor that defines the filter bandwidth. Then h and H{h} are obtained as the real and imaginary parts of the inverse Fourier transform of G, respectively. Phase congruency, Ψ, is a normalized measure of phase variance across scales. In terms of the local phase, phase congruency is defined as

\Psi(x) = \frac{\sum_{n=1}^{N} E_n(x)\cos(\phi_n(x) - \bar{\phi}(x))}{\sum_{n} E_n(x)},   (3)

where \bar{\phi}(x) is the average local phase and N is the number of bands used in the analysis. Venkatesh and Owens [9] have shown that phase congruency is proportional to local energy and can be calculated from the analytic signal components as

\Psi(x) = \frac{\sqrt{\left(\sum_{n=1}^{N} \mathrm{Re}\{f_{A_n}(x)\}\right)^2 + \left(\sum_{n=1}^{N} \mathrm{Im}\{f_{A_n}(x)\}\right)^2}}{\sum_{n=1}^{N} |f_{A_n}(x)|}.   (4)

Phase congruency is a smooth function with local maxima in locations where the local phase is consistent across scales. For our application, it is very interesting to be able to distinguish between locations where the local maximum in phase congruency is due to the inner wall edge or to the outer wall edge. This distinction can be easily achieved by using the feature-encoding properties of the local phase. In this paper, we introduce a new measure for phase congruency, the feature-dependent phase congruency Ψθ, given by

\Psi_\theta(x) = \Psi(x)\,\max\left(\cos(\bar{\phi}(x) - \theta),\, 0\right),   (5)

where θ depends on the feature type to which we want to tune the phase congruency measure. For a dark-to-bright edge θ = π/2 and for a bright-to-dark edge θ = 3π/4. Therefore, the inner/outer wall is given by the locations that maximize Ψπ/2 and Ψ3π/4 respectively,

a_i = \arg\max_x \Psi_{\pi/2}(x), \qquad a_o = \arg\max_x \Psi_{3\pi/4}(x).   (6)
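A compact 1D sketch of the feature-dependent phase congruency of Eqs. (4)–(6) is given below. It is our own illustration: SciPy's analytic-signal convention (f + iH, opposite in sign to Eq. (1)) is used, so the θ values may need mirroring, and the average phase is estimated as the phase of the vector sum of the band responses; the filter-bank parameters are taken from Sec. 5.

```python
import numpy as np
from scipy.signal import hilbert

def feature_phase_congruency(f, theta, n_scales=10, lambda_min=4.0,
                             alpha=1.3, kappa=0.85):
    """Feature-dependent phase congruency Psi_theta(x), Eqs. (4)-(5).

    theta = pi/2 targets dark-to-bright edges (inner wall) and
    theta = 3*pi/4 bright-to-dark edges (outer wall); see Eq. (6).
    """
    n = len(f)
    w = np.fft.rfftfreq(n)                    # frequency in cycles/sample
    F = np.fft.rfft(np.asarray(f, float))
    fA = []
    for s in range(n_scales):
        w0 = 1.0 / (lambda_min * alpha ** s)  # 1/lambda_i in cycles/sample
        with np.errstate(divide="ignore", invalid="ignore"):
            g = np.exp(-np.log(w / w0) ** 2 / (2.0 * np.log(kappa) ** 2))
        g[0] = 0.0                            # suppress the DC component
        band = np.fft.irfft(F * g, n)         # h_n * f (even, real filter)
        fA.append(hilbert(band))              # analytic signal f_An
    fA = np.array(fA)
    sum_re, sum_im = fA.real.sum(axis=0), fA.imag.sum(axis=0)
    psi = np.hypot(sum_re, sum_im) / (np.abs(fA).sum(axis=0) + 1e-12)
    mean_phase = np.arctan2(sum_im, sum_re)   # phase of the vector sum
    return psi * np.maximum(np.cos(mean_phase - theta), 0.0)
```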
The performance of the proposed phase congruency measure can be seen in Figure 2b-c in comparison with the traditional measure for phase congruency. The
traditional measure does not provide feature selectivity, and its phase congruency has a flat maximal region across the whole airway wall. However, the proposed measurement adds localization based on the corresponding feature.
3.3
Bronze Standard
From the experimental results presented in Section 3.1, we propose a bronze standard for wall thickness location based on the intersection point between different reconstruction-kernel acquisitions of the same CT Radon-space acquisition. We define the location of the wall as the median intersection point over all possible pairwise kernel combinations. The median operator is used because it is a robust estimator.
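A minimal sketch of this median-of-crossings estimate (our own illustration, assuming the radial profiles have been restricted to a window around the expected wall edge so that each kernel pair contributes meaningful crossings):

```python
import numpy as np
from itertools import combinations

def bronze_standard_edge(profiles, r):
    """Median crossing point of radial intensity profiles from different
    reconstruction kernels (sketch of the proposed bronze standard).

    profiles: list of 1D arrays, one per kernel, sampled at radii r and
    restricted to a window around the expected wall edge."""
    crossings = []
    for p, q in combinations(profiles, 2):
        d = np.asarray(p, float) - np.asarray(q, float)
        for i in np.nonzero(np.diff(np.sign(d)))[0]:
            t = d[i] / (d[i] - d[i + 1])          # linear interpolation
            crossings.append(r[i] + t * (r[i + 1] - r[i]))
    return np.median(crossings)
```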
4
Materials
The evaluation and validation of the proposed method has been carried out by means of a computer-based airway simulator, a CT phantom of airways, and a CT scan of a human lung. Characterizing the method's robustness against noise is an important metric to assess its applicability to clinical-quality CTs obtained with low-dose protocols. Low-dose protocols can be modeled by an increased amount of additive Gaussian noise in the Radon space. Our simulator works as follows: an ideal airway is transformed to the Radon space. In the Radon space, independent Gaussian noise with power σn is added to the data. The noise-corrupted data is smoothed and downsampled to simulate different scanner reconstruction kernels and scanner resolutions, respectively. The data is transformed back to the spatial domain by means of the backprojection transform. We used this simulator to generate different airway models. The airways were sampled in a 0.6 mm image. A CT airway phantom was constructed using Nylon66 tubing inserted into polystyrene to simulate lung parenchyma surrounding the airways. Non-overlapping, 0.6 mm collimation images, 40 cm FOV, were acquired using a Siemens Sensation 64 CT scanner and reconstructed with a series of 9 different kernels provided by the manufacturer. Two tubes with wall thicknesses a_o − a_i = 0.9 mm and a_o − a_i = 1.05 mm, as measured by a caliper, were studied. Finally, a CT scan of a human lung was used to evaluate the behavior of our method on real data. Images were acquired using the Siemens Sensation 64 CT scanner, with non-overlapping isotropic voxels of 0.6 mm. Reconstructions of the raw data were obtained using the same 9 kernels utilized for the phantom investigation. Two representative airways were picked by an expert.
5
Results
All the results for phase congruency were obtained using a log-Gabor filter bank. 10 bands were used for the frequency decomposition. The filter bandwidth was
Fig. 3. Wall thickness analysis using the computerized phantom for airways. Airways with different radius and wall thickness sizes have been simulated. (a) Validation of the bronze standard for airway wall location. (b) FWHM and (c) Phase congruency wall thickness estimation.
given by κ/ω0 = 0.85, and the central frequency for the i-th band was given by ω0^i = 2/λ_i with λ_i = α^i λ_min, where λ_min is the minimum wavelength of the filter bank. In our case we chose α = 1.3 and λ_min = 4 pixels. The choice of parameters was based on experimental results and on the scale of typical airways.
5.1
Computer Phantom
The validity of the bronze standard was tested by simulating different airway sizes with different smoothing-kernel parameters. Figure 3a shows the results obtained for different airway thicknesses estimated at different inner radius levels. Ideally, the plot should be a straight line at the wall thickness level corresponding to the tested airway. The bronze standard yields an accurate measure, up to discretization error, with no bias in the measurement. The error in the estimation is slightly bigger for the thinnest airways (1.2 mm and 1.6 mm), given that we are close to the sampling limitation imposed by the Nyquist theorem. Similar plots are shown in Figures 3b and 3c for the wall thickness estimated by FWHM and phase congruency, respectively. Although there is a bias in the estimation of the wall thickness, as reported by [7], the phase congruency estimates are closer to the expected values. Therefore, phase congruency is more accurate than FWHM across the whole range of airway sizes. Finally, we studied the robustness against noise of FWHM and phase congruency for wall estimation. Figure 4a shows the wall thickness estimation for an airway with an inner radius of 4 mm and a wall thickness of 2 mm at different SNR levels. For each noise level, 80 realizations were performed. The plot shows the mean wall thickness, and the standard deviation is shown as vertical rectangles. FWHM shows a greater sensitivity to noise, indicated by a larger slope as the SNR decreases. There is a critical point at 10 dB in both methods at which the slope increases; this point imposes a limit on the amount of noise allowed for accurate wall detection. Although the standard deviation is similar for both methods, it is slightly smaller for phase congruency.
Fig. 4. (a) Noise analysis for FWHM and phase congruency for a 2 mm thick airway using our airway simulator. (b)-(c) Wall thickness measurements for the airway phantom at different reconstruction kernels: (b) Tube 1 (0.9 mm wall thickness) and (c) Tube 2 (1.05 mm wall thickness).
Fig. 5. Wall estimation for a real CT scan for two different-size airways. A continuous line corresponds to the inner boundary and a dashed line to the outer boundary. FWHM has difficulties recovering the outer boundary when the wall does not have enough contrast or vessels are in close contact with the wall.
5.2
Airway Phantom
The results for the airway phantom are shown in Figure 4b-c. We can see how accurately the bronze standard predicts the wall thickness of the tube. This corroborates the use of the bronze standard for the validation of methods that detect thin structures in CT, like the airway walls. FWHM and phase congruency yield an overestimation of the wall thickness for smooth kernels, as predicted by our simulation. Better results can, though, be achieved using phase congruency.
5.3
Lung CT
The detected walls for two unfolded airways are shown in Figure 5. Although we have datasets for each reconstruction kernel, only the B46f kernel was employed to estimate the wall using FWHM and phase congruency. Unlike the
previous results, real data has a greater complexity due to surrounding structures that complicate the detection of the outer wall. As can easily be seen in Figure 5a, FWHM breaks down when trying to detect the outer wall boundary where it abuts a blood vessel. This is in part because these areas do not fit the model that FWHM was developed for. When looking at airways where the contrast between wall and parenchyma is very subtle, phase congruency shows its superiority over FWHM, as shown in Figure 5b: FWHM is not able to yield any reliable measure for the outer boundary of the wall. Comparing against the bronze standard, we can say that phase congruency gives an accurate estimation of the wall boundaries.
6
Conclusions
We have proposed a new method for airway wall detection based on the principle of phase congruency. The method is based on a new definition that makes the traditional phase congruency feature dependent. We have seen that this new feature-dependent phase congruency yields a sharper response, with clear maxima at the locations of edges and lines. The measurement of wall thickness has traditionally been based on the FWHM method. We have corroborated the results presented in [7] that quantify the overestimation in the wall measurement when using FWHM. Our method, while still subject to some overestimation, gives a better result than FWHM, as shown by synthetic and real data examples. It would be very interesting to study the sources of this overestimation in the phase congruency method by evaluating the analytic expressions of the filter responses. Phase congruency appears to be suitable for wall estimation in low-dose CT scans, based on the results for different noise levels. In part, the analysis by bands that phase congruency is based on explains its capability to handle noise better than FWHM. Another important characteristic of our method is its lower sensitivity to the reconstruction kernel selection. This is crucial when applying the method to retrospective data and data acquired across multiple sites. Finally, an important contribution of this paper is the proposal of a bronze standard for the accurate detection of the airway wall based on the isoline of the intersection across multiple reconstruction kernels.
References

1. Boukerroui, D., Noble, J.A., Brady, M.: On the choice of band-pass quadrature filters. Journal of Mathematical Imaging and Vision 21 (2004) 53–80
2. Hu, S., Hoffman, E.A., Reinhardt, J.M.: Automatic lung segmentation for accurate quantitation of volumetric X-ray CT images. IEEE Trans. Med. Imaging 20 (2001) 490–498
3. Malandain, G., Bertrand, G., Ayache, N.: Topological segmentation of discrete surfaces. Int. Journal of Computer Vision 10(2) (1993) 183–197
4. Morrone, M.C., Owens, R.A.: Feature detection from local energy. Pattern Recognition Letters 6 (1987) 303–313
5. Nakano, Y., Muro, S., et al.: Computed tomographic measurements of airway dimensions and emphysema in smokers. Am J Respir Crit Care Med 162 (2000) 1102–1108
6. Pichon, E., Tannenbaum, A., Kikinis, R.: A statistically based flow for image segmentation. Medical Image Analysis 8(3) (2004) 267–274
7. Reinhardt, J.M., D'Souza, N.D., Hoffman, E.A.: Accurate measurement of intra-thoracic airways. IEEE Trans. Medical Imaging 16(6) (1997) 820–827
8. Saba, O., Hoffman, E.A., Reinhardt, J.M.: Maximizing quantitative accuracy of lung airway lumen and wall measures obtained from X-ray CT imaging. J. Applied Physiology 95 (2003) 1063–1095
9. Venkatesh, S., Owens, R.: On the classification of image features. Pattern Recognition Letters 11 (1990) 339–349
Generation of Curved Planar Reformations from Magnetic Resonance Images of the Spine

Tomaž Vrtovec¹,², Sébastien Ourselin², Lavier Gomes³, Boštjan Likar¹, and Franjo Pernuš¹

¹ University of Ljubljana, Faculty of Electrical Engineering, Slovenia
{tomaz.vrtovec, bostjan.likar, franjo.pernus}@fe.uni-lj.si
² BioMedIA Lab, Autonomous Systems Laboratory, CSIRO ICT Centre, Australia
[email protected]
³ Department of Radiology, Westmead Hospital, University of Sydney, Australia
[email protected]
Abstract. We present a novel method for curved planar reformation (CPR) of spine images obtained by magnetic resonance (MR) imaging. CPR images, created via a transformation from an image-based to a spine-based coordinate system, follow the structural shape of the spine and allow the whole course of the curved structure to be viewed in a single image. The spine-based coordinate system is defined on the 3D spine curve and on the axial vertebral rotation, both described by polynomial models. The 3D spine curve passes through the centers of vertebral bodies, and the axial vertebral rotation determines the rotation of vertebral spinous processes around the spine. The optimal polynomial parameters are found in an optimization framework, based on image analysis. The method was evaluated on 19 MR images of the spine from 10 patients.
1
Introduction
Three-dimensional (3D) images of anatomical structures, obtained by computed tomography (CT) or magnetic resonance (MR) imaging, are usually presented as two-dimensional (2D) planar cross-sections in the standard image reformation (i.e. axial, sagittal and coronal). However, planar cross-sections do not follow the curvature and orientation of curved anatomical structures (e.g. the spine). To improve clinical evaluation and quantitative analysis, a more effective visualization is achieved by curved planar reformation (CPR), where cross-sections are represented in the coordinate system of the 3D curved structure. Different approaches to the reformation of CT spine data have already been introduced and reported to be useful in evaluating spinal deformities. Rothman et al. [1] obtained curved coronal images by connecting manually selected points into a continuous curve. After reformation, structures such as nerve roots, facet joints and the spinal cord could be observed in a single image. Congenital spine abnormalities were examined by Newton et al. [2], who improved the identification of the deformities by generating curved multiplanar reformatted images of the whole spine, which were obtained by manually outlining the boundaries of the
spine curve. Roberts et al. [3] reformatted the images orthogonal to the long axis of the neural foramina of the cervical region of the spine. By oblique reformation, they improved the consistency in the interpretation of neural foraminal stenosis between observers. In [4], the authors focused on vertebral morphology, which is of significant importance in pedicle-screw placement for the treatment of scoliosis. In order to obtain true values of pedicle width, length and angle, the MR spine images were reformatted so that measurements could be performed in cross-sections perpendicular to the vertebral bodies. In contrast to the reported manual methods, an automated CPR method for 3D spine images was recently presented by Vrtovec et al. [5], who used polynomial models to describe the spinal curvature and axial vertebral rotation in a spine-based coordinate system. However, their method was developed for CT images only and is therefore modality dependent. Dedicated commercial software, or software provided by CT/MR scanner vendors, allows the generation of multiplanar and curved cross-sections; however, the points that are connected into a continuous curve have to be manually selected by the user, which requires navigation through complex (spine) anatomy. MR has become the dominant modality in imaging of the spine. The initially poor resolution of bone structures on early MR machines was improved by the better signal-to-noise ratio (SNR) obtained from dedicated multichannel spine coils. Moreover, as there is no exposure to ionizing radiation, MR is considered the modality of choice for follow-up examinations and longitudinal studies. However, the detection of bone structures in MR images remains a challenging task. In this paper, we present a novel method for automated CPR of MR spine images, based on the transformation from the standard image-based to the spine-based coordinate system. The transformation parameters, represented by the 3D spine curve and axial vertebral rotation, are obtained automatically in an optimization-based framework.
2
Method
2.1
Spine-Based Coordinate System
The standard image-based reformation is represented by the Cartesian coordinate system, where the axes x, y and z represent the standard sagittal, coronal and axial directions, respectively. The spine-based coordinate system [5] is represented by the axes u, v and w, which are defined on a continuous curve c(n) that goes through the same anatomical landmarks on individual vertebrae and represents the 3D spine curve, and by the rotation ϕ(n) of the vertebral spinous processes around the spine curve, which represents the axial vertebral rotation (Fig. 1a). The continuous variable n parameterizes the spine domain:
• The axis w is tangent to the 3D spine curve c(n) = (x(n), y(n), z(n)).
• The axis v is orthogonal to axis w and oriented in the direction of the vertebral spinous processes. The angle ϕ(n) = ∠(v(n), y′(n)) between the axis v and the corresponding projection y′ of the Cartesian axis y onto the plane that is orthogonal to the 3D spine curve determines the rotation of the axis v around the 3D spine curve.
• The axis u is orthogonal to axis v (and hence to axis w) and extends in the direction parallel to the vertebral transverse processes.
Fig. 1. (a) Transformation from the image-based to spine-based coordinate system is based on the 3D spine curve c(n) and axial vertebral rotation ϕ(n). (b) The center of the vertebral body in an axial cross-section is found on the line that splits the image into two mirror halves. (c) Typical response of the entropy-based operator Γ on the straight line in (b). The minimum coincides with the center of the vertebral body.
Since the 3D spine curve and the axial vertebral rotation are expected to be smooth and continuous functions of n, they are parameterized by polynomial functions:

\{x, y, z\}(n) = \sum_{k=0}^{K_{\{x,y,z\}}} b_{\{x,y,z\},k}\, n^k, \qquad \varphi(n) = \sum_{k=0}^{K_{\varphi}} b_{\varphi,k}\, n^k,   (1)
Kϕ
bϕ,k nk ,
(1)
k=0
where Kx , Ky , Kz and Kϕ are the degrees and bx = {bx,k ; k = 0, 1, . . . , Kx }, by = {by,k ; k = 0, 1, . . . , Ky }, bz = {bz,k ; k = 0, 1, . . . , Kz } and bϕ = {bϕ,k ; k = 0, 1, . . . , Kϕ } are the parameters of polynomials x(n), y(n), z(n) and ϕ(n), respectively. The parameters b = bx ∪by ∪bz ∪bϕ , specific for the spine, determine the transformation to the spine-based coodinate system and the generation of
138
T. Vrtovec et al.
CPR images. For the purpose of implementation, the continuous spine domain is discretized into N samples, ni ; i = 1, 2, . . . , N . The discretization yields a discrete 3D spine curve c(ni ) = c(x(ni ), y(ni ), z(ni )); i = 1, 2, . . . , N and a discrete axial vertebral rotation ϕ(ni ); i = 1, 2, . . . , N . 2.2
Determination of the 3D Spine Curve
Let the 3D spine curve be described by a curve c(n) that passes through the centers of vertebral bodies, parametrized by the 3D spine curve parameters bc = bx ∪ by ∪ bz . The optimal parameters bopt are obtained in an optimizac tion procedure that minimizes the sum of distances dist(·) from the curve c(n) to the centers of vertebral bodies, represented by a set of points {p} in 3D: bopt = arg min c bc
N
dist ({p}, c(ni )) |bc .
(2)
i=1
The centers of vertebral bodies {p} are determined in each axial cross-section z = zm ; m = 1, 2, . . . , Z of the 3D image. A straight line ym (xm ) = tan( π2 − ϕm )(xm − xm0 ) that splits the vertebral body into two mirror halves and passes through the vertebral spinous process is first obtained by maximizing the simiopt larity between the two halves (Fig. 1b). The location (xopt m , ym ) of the vertebral bodies and intervertebral discs, which appear as circular areas of homogeneous intensity values, is found by minimizing the response of the entropy-based operator Γ , which consists of K concentric circular rings, on the obtained straight line ym (xm ) (Fig. 1c): K Γ =
k=1 wk Hk ; K H k=1 wk
where Hk = −
−
wk = e
(k−1)2 K−1 2 S
(
2
) ;
k = 1, 2, . . . , K,
(3)
J
pj,k log pj,k is the entropy of the k th ring (pj,k is the probaJ bility distribution of the intensity values in k th ring) and H = − j=1 pj log pj is the entropy of the area covered by all rings (pj is the probability distribution of the intensity values). The histograms are computed using J bins. The weights wk are chosen to be within S standard deviations of the normal distribution. By repeating this procedure for all axial cross-sections, we obtain a set of points opt {p} = {xopt m , ym , zm }; m = 1, 2, . . . , Z. 2.3
j=1
Determination of the Axial Vertebral Rotation
The angle ϕm ; m = 1, 2, . . . , Z that represents the inclination of the straight line that splits the vertebral body into two mirror halves from the sagittal reference plane x = const. is determined in the standard, image-based coordinate system. As it does not take into account the sagittal and coronal vertebral inclination, it is an approximation of the true axial vertebral rotation. The true axial vertebral rotation ϕ(n), parameterized by the rotational parameters bϕ , is obtained by
Generation of CPR from MR Images of the Spine
139
maximizing the sum of similarities sim(·) of the mirror halves in planes P (ni ); i = 1, 2, . . . , N normal to the 3D spine curve and centered at the 3D spine curve samples c(ni ); i = 1, 2, . . . , N , along the whole 3D spine curve: bopt ϕ = arg max bϕ
2.4
N
sim (P (ni , ϕi )) |bϕ .
(4)
i=1
Curved Planar Reformation
CPR images are obtained by following the computed 3D spine curve c(n) and axial vertebral rotation ϕ(n) and by folding the obtained curved surfaces onto a plane. The curved transformation does not preserve distances, however, by applying the inverse transformation distances can be measured in the imagebased coordinate system.
3 3.1
Experiments and Results Data and Implementation Details
The images used in this study were 18 axial spinal scans of the lumbar or thoracic spinal regions from 9 patients, acquired with a spine array coil on a 1.5T MR scanner (General Electric Signa Excite). For each patient, corresponding T1 weighted (average TR /TE = 550/15 ms) and T2 -weighted (average TR /TE = 4560/110 ms) images were obtained with the matrix size X × Y = 512 × 512 pixels. The average voxel size was Sx × Sy = 0.398 × 0.398 mm2 , the thickness Sz = 3 ÷ 6 mm and the number of axial cross-sections Z = 23 ÷ 31. In addition, one whole-length T2 -weighted spine image (Z = 208, Sz = 3 mm) was used to test the performance of the algorithm on the whole length of the spine. User interaction was limited to pinpointing the center of the vertebral body in only one axial cross-section, which served as the initialization for the search of the set {p} (Eq. 2), and determination of the spinal region of the selected cross-section (cervical/thoracic/lumbar). The result was used to initialize the procedure in the subsequent cross-sections. Depending on the spinal region of the cross-section, the size of the entropy-based operator Γ (Eq. 3) was automatically adjusted to K = 30 ÷ 70 mm (the ring width was 1 mm and the weights were within S = 2 standard deviations of the normal distribution). The 3D spine curve c(n) and axial vertebral rotation ϕ(n) (Eq. 1) were initialized as polynomials of 1st degree (i.e. straight lines). In a hierarchical optimization scheme (Eq. 2 and 4), the degree of polynomials was gradually increased up to K{x,y,z,ϕ} = 5. The number of samples N was set to the number of axial crosssections. The simplex method in multidimensions was used as the optimization procedure. Standard mutual information was used as the similarity measure (function sim(·)) and J = 16 bins were used for the computation of histograms. Euclidean distance was used as the distance measure (function dist(·))
140
T. Vrtovec et al.
(a)
sagittal
coronal (b)
sagittal
coronal (c)
Fig. 2. (a) Course of the coronal (x(n), left) and sagittal spine curve (y(n), middle), showing the S-shape of the normal whole-length spine, and axial vertebral rotation (ϕ(n), right), in the spine domain. The vertical lines represent the centers of vertebral bodies of ground truth data. Sagital and coronal CPR images are obtained by folding the curved surfaces (b) onto a plane (c).
In order to quantitatively assess the performance of the method, landmarks were manually placed in 3D at the centers of vertebral bodies and at the tips of vertebral spinous processes, representing ground truth data. 3.2
Results
By following the course of the computed 3D spine curve c(n) and of the computed axial vertebral rotation ϕ(n) in the spine domain n (Fig. 2a), CPR images of an arbitrary view can be generated (Fig. 2b,c and 3), allowing the whole length of the spine to be observed in a single 2D image. The CPR images were successfully created from the whole-length spine image and from images of smaller spinal regions. The minimal distance between c(n) and the centers of vertebral bodies in ground truth data was measured in 3D for all images in this study. The computed
Generation of CPR from MR Images of the Spine
141
ϕ(n) was compared to the angles obtained from ground truth data (i.e. the angles between the lines through the centers of vertebral bodies and the corresponding tips of vertebral spinous processes, and the sagittal reference plane). The results are presented in Table 1 and show that the method performs equally well on T1- and T2-weighted images. We compared the courses of c(n) and ϕ(n) of the whole-length spine to the courses for the CT images presented in [5], and they proved to be consistent.

Table 1. Comparison between the computed 3D spine curve and axial vertebral rotation, and ground truth data (WLS denotes the whole-length spine image)

Image number             1    2    3    4    5    6    7    8    9   WLS  Mean
Mean 3D spine curve difference [mm]
  T1-weighted          2.6  4.3  1.1  2.5  2.8  3.4  2.4  2.1  2.0   -   2.6
  T2-weighted          3.4  3.8  1.3  1.5  1.7  2.5  1.1  1.7  1.6  2.3  2.1
Mean axial vertebral rotation difference [deg]
  T1-weighted          4.6  1.2  1.9  0.7  2.0  2.0  0.5  1.7  1.0   -   1.7
  T2-weighted          3.0  2.7  2.0  1.7  1.7  1.3  0.7  1.1  0.8  1.7  1.7
Fig. 3. CPR images, created by folding curved surfaces (a) onto a plane (b), from a T1-weighted image (top row) and a T2-weighted image (bottom row) of a spinal region
4 Discussion
Spinal curvature [6,7,8] and axial vertebral rotation [9,10,11,12,13,14,15] have been the subject of many research studies, as they are important in surgical planning,
analysis of surgical results and monitoring the progression of spinal deformities. Our method allows both quantities to be measured automatically. The presented method for CPR of MR spine images allows inspection and analysis of images in the coordinate system of the spine. The 3D spine curve and axial vertebral rotation are described continuously along the whole spinal length in a low-parametric form, namely by polynomials of up to 5th degree. The use of CPR is valuable in clinical evaluation as it can reduce the structural complexity in favour of an improved perception of the features of the spine. The representation of pathological and healthy anatomy in the same coordinate system allows an objective evaluation of pathology, especially in the case of significant coronal (i.e. scoliosis) or sagittal (i.e. kyphosis or lordosis) spinal curvature. The major limitation of the presented method is that the determination of axial vertebral rotation strongly depends on the prior estimation of the 3D spine curve, as the rotation is measured in planes centered at the estimated spine curve samples. Knowledge of the location and orientation of the spine in 3D can be exploited by other image analysis techniques and applied in a clinical environment, for example for the identification and measurement of the dimensions of the spinal canal and the spinal cord. The notion of the spine-based coordinate system is modality-independent and can therefore be used for data fusion, i.e. the merging of CT and MR images of the same patient. We will focus our future research on these topics.
References
1. Rothman, S., Dobben, G., Rhodes, M., Glenn, W.J., Azzawi, Y.M.: Computed tomography of the spine: Curved coronal reformations from serial images. Radiology 150 (1984) 185–190
2. Newton, P., Hahn, G., Fricka, K., Wenger, D.: Utility of three-dimensional and multiplanar reformatted computed tomography for evaluation of pediatric congenital spine abnormalities. Spine 27 (2002) 844–850
3. Roberts, C., McDaniel, N., Krupinski, E., Erly, W.: Oblique reformation in cervical spine computed tomography: A new look at an old friend. Spine 28 (2003) 167–170
4. Liljenqvist, U., Allkemper, T., Hackenberg, L., Link, T., Steinbeck, J., Halm, H.: Analysis of vertebral morphology in idiopathic scoliosis with use of magnetic resonance imaging and multiplanar reconstruction. Journal of Bone and Joint Surgery 84 (2002) 359–368
5. Vrtovec, T., Likar, B., Pernuš, F.: Automated curved planar reformation of 3D spine images. Physics in Medicine and Biology 50 (2005) 4527–4540
6. Stokes, I., Bigalow, L., Moreland, M.: Three-dimensional spinal curvature in idiopathic scoliosis. Journal of Orthopaedic Research 5 (1987) 102–113
7. Perdriolle, R., Vidal, J.: Thoracic idiopathic scoliosis curve evolution and prognosis. Spine 10 (1985) 785–791
8. Drerup, B., Hierholzer, E.: Assessment of scoliotic deformity from back shape asymmetry using an improved mathematical model. Clinical Biomechanics 11 (1996) 376–383
9. Nash, C., Moe, J.: A study of vertebral rotation. Journal of Bone and Joint Surgery 51 (1969) 223–229
10. Aaro, S., Dahlborn, M.: Estimation of vertebral rotation and the spinal and rib cage deformity in scoliosis by computer tomography. Spine 6 (1981) 460–467
11. Krismer, M., Sterzinger, W., Christian, H., Frischhut, B., Bauer, R.: Axial rotation measurement of scoliotic vertebrae by means of computed tomography scans. Spine 21 (1996) 576–581
12. Birchall, D., Hughes, D., Hindle, J., Robinson, L., Williamson, J.: Measurement of vertebral rotation in adolescent idiopathic scoliosis using three-dimensional magnetic resonance imaging. Spine 22 (1997) 2403–2407
13. Hecquet, J., Legaye, J., Duval-Beaupere, G.: Access to a three-dimensional measure of vertebral axial rotation. European Spine Journal 7 (1998) 206–211
14. Rogers, B., Haughton, V., Arfanakis, K., Meyerand, E.: Application of image registration to measurement of intervertebral rotation in the lumbar spine. Magnetic Resonance in Medicine 48 (2002) 1072–1075
15. Kuklo, T., Potter, B., Lenke, L.: Vertebral rotation and thoracic scoliosis in adolescent idiopathic scoliosis. Journal of Spinal Disorders & Techniques 18 (2005) 139–147
Automated Analysis of Multi Site MRI Phantom Data for the NIHPD Project

Luke Fu, Vladimir Fonov, Bruce Pike, Alan C. Evans, and D. Louis Collins

McConnell Brain Imaging Centre, Montreal Neurological Institute, Quebec, Canada
[email protected],
[email protected], {vfonov, bruce, alan}@bic.mni.mcgill.ca www.bic.mni.mcgill.ca/nihpd/
Abstract. In large multi-center studies it is important to quantify data variations due to differences between sites and over time. Here we investigate inter-site variability in signal to noise ratio (SNR), percent integral uniformity (PIU), and phantom diameter and height using the American College of Radiology (ACR) phantom scans from the NIHPD project. Longitudinal variations are also analyzed. All measurements are fully automated. Our results show that the mean SNR, PIU and the two geometric metrics were statistically different across sites. The maximum mean difference in diameter across sites was 2 mm (1.1%), and the maximum mean difference in height was 2.5 mm (1.7%). Over time, an average drift of 0.4 mm per year was observed for the diameter, while a drift of 0.5 mm per year was observed for the height. Trends observed over time often depended not only on site, but also on modality and scanner manufacturer.
1 Introduction

The National Institutes of Health-funded MRI study of normal brain development aims to collect MRI data from a representative sample of normal, healthy pediatric brains [5]. The purpose is twofold: 1) to provide the largest normative database to date of the developing human brain for comparison with children who have neurological, developmental and psychiatric disorders; and 2) to provide longitudinal data for comparative studies involving brain maturation and behavioral/cognitive development in normal children. The study consists of approximately 546 children (aged 10 days to 18 years) who will make 3 visits over a period of 5 years, and involves 6 different data collection sites in the United States. In this paper, this normal pediatric database is termed NIHPD. The study requires multi-site collaboration due to the quantitative and demographic nature of the project: over 500 children with varying financial backgrounds scanned across the United States over several years. All data collected are transferred to the Data Coordinating Center (DCC) in Montreal, where databasing, initial statistical analysis and image processing are carried out. Since data received at the DCC originate from six different sites, inherent variability is expected within the acquired data [4]. Differences such as scanner make (Siemens/GE), model type and acquisition software all contribute to the overall
differences between sites. These differences can cause complications in image processing: inherent differences in intensity uniformity can affect tissue classification [2], whereas differences in geometric distortion can affect the size of segmented structures [3]. Clearly, both artifacts will contribute to the overall uncertainty in the estimation of cranial structures; hence, an unbiased, systematic characterization of inter-site variance is of great importance. One possible solution to ensure quality control in a large multi-site MRI study is the use of standardized phantoms. The ACR phantom is the official MRI accreditation phantom, as proposed by the American College of Radiology [6]. The phantom consists of a plastic cylinder with inner dimensions of 190 mm (wide) by 148 mm (long), and is filled with a solution of 10 mM nickel chloride and 75 mM sodium chloride. It has several unique plastic structures embedded inside the cylinder, which are used to measure the image quality of an MRI scanner. The seven quantitative tests proposed by the ACR MRI accreditation program are: 1) geometric accuracy, 2) high contrast spatial resolution, 3) slice thickness accuracy, 4) slice position accuracy, 5) image intensity uniformity, 6) percent signal ghosting and 7) low contrast object detectability. For the NIHPD study, data collection sites are required to conduct ACR phantom scans at least once per month. In this study we investigate the effects of inter-site scanner variability on geometric accuracy (diameter and height), signal to noise ratio (SNR) and percent integral uniformity (PIU) of 1048 phantom volumes acquired in the NIHPD project (visit 1), using an automated approach. Eventually, we hope to enhance the current technique to include all 7 of the ACR accreditation tests.
2 Methods

The following pediatric study centers (PSCs) are involved in the NIHPD data collection process: Boston Children's Hospital, Children's Hospital Medical Center of Cincinnati, Children's Hospital of Philadelphia, University of Texas Health Science Center at Houston, Washington University St. Louis and University of California Los Angeles. The scanners used to scan the ACR phantom across all sites include: Siemens Magnetom Vision, Siemens Magnetom Sonata, GE Excite Signa and GE Genesis Signa. Actual acquisition parameters depend on the scanner's manufacturer.

2.1 Scanners

For the GE scanners, the acquisition details are as follows:
• 3D T1-weighted scans: TR = 22 ms, TE = 10–11 ms, Field of View (FoV) = 250 mm Inferior-Superior (IS) x 250 mm Anterior-Posterior (AP), 124 slices of 1.7 mm slice thickness;
• T1-weighted fallback scans: TR = 500 ms, TE = 10 ms, FoV = 250 mm AP x 220 mm Left-Right (LR), 66 slices of 3 mm thickness;
• 2D Proton Density weighted (PDW)/T2-weighted scans: TR = 3500 ms, TE1 = 17 ms, TE2 = 119 ms, FoV = 250 mm AP x 220 mm LR, 88 slices of 2 mm thickness;
• 2D PDW/T2W fallback scans: TR = 3500 ms, TE1 = 17 ms, TE2 = 119 ms, FoV = 250 mm AP x 190 mm LR, 30–60 slices of 3 mm thickness.
The Siemens scanners had the same parameters with slight modifications:
• 3D T1-weighted scans: TR = 25 ms, TE = 11 ms, Field of View (FoV) = 256 mm Inferior-Superior (IS) x 256 mm Anterior-Posterior (AP), 124–128 slices of 1.7 mm slice thickness;
• T1-weighted fallback scans: TR = 500 ms, TE = 12 ms, FoV = 256 mm AP x 224 mm Left-Right (LR), 66 slices of 3 mm thickness;
• 2D Proton Density weighted (PDW)/T2-weighted scans: TR = 3500 ms, TE1 = 17 ms, TE2 = 119 ms, FoV = 256 mm AP x 224 mm LR, 88 slices of 2 mm thickness;
• 2D PDW/T2W fallback scans: TR = 3500 ms, TE1 = 17 ms, TE2 = 119 ms, FoV = 256 mm AP x 192 mm LR, 22 slices of 3 mm thickness.

PSCs were asked to perform scans using the parameters listed above for the test subjects as well as the ACR phantoms. Ideally, the ACR phantom should be placed in the scanner according to the recommendations of the ACR MRI accreditation protocol. The MRI data collected from the scanner console are transferred to a Study Workstation (SWS) using DICOM transfer, and then from the SWS to the DCC using another DICOM transfer. The files are eventually converted to MINC format [12] before the automated analysis is performed.

2.2 Signal to Noise Ratio

In our study the SNR was defined as:
SNR = Mean Phantom Signal / (0.8 × Mean Background Signal)    (1)
Since the background noise follows a Rayleigh distribution, the 0.8 factor is used to correct the background mean for the Gaussian characteristics of the MR signal [8]. The mean foreground signal, mean background signal and the background standard deviation are all estimated using a program available in the MNI-autoreg package called MINCSTATS. A bimodal-T threshold value for pixel intensity is calculated to separate foreground and background signal. The foreground and background regions are eroded to avoid voxels with partial volume effects.
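A minimal sketch of this measurement, with a Ridler-Calvard iteration standing in for the bimodal-T threshold of MINCSTATS; the helper names and the erosion depth are illustrative assumptions:

```python
import numpy as np
from scipy import ndimage

def bimodal_threshold(vol):
    """Crude stand-in for the bimodal-T threshold (Ridler-Calvard iteration)."""
    t = vol.mean()
    for _ in range(100):
        t_new = 0.5 * (vol[vol > t].mean() + vol[vol <= t].mean())
        if abs(t_new - t) < 1e-3:
            break
        t = t_new
    return t

def snr(vol, erosion_iters=3):
    """SNR of Eq. 1: foreground/background split, erosion against partial
    volume effects, and the 0.8 Rayleigh correction on the background mean."""
    fg = vol > bimodal_threshold(vol)
    fg_core = ndimage.binary_erosion(fg, iterations=erosion_iters)
    bg_core = ndimage.binary_erosion(~fg, iterations=erosion_iters)
    return vol[fg_core].mean() / (0.8 * vol[bg_core].mean())
```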
2.3 Percent Integral Uniformity

The Percent Integral Uniformity is calculated using the formula:

PIU = 100 × (1 − (high − low) / (high + low))    (2)
This is almost the same uniformity equation proposed by Colombo and Magnusson [10,13]. According to the original definitions of the ACR MRI accreditation manual, high/low denotes regions of high/low signal value in slice 7 of the ACR phantom, and should be 1 cm2 in area [6]. The calculations done in this study follow a slightly different approach: instead of analyzing slice by slice, parameters from the
entire volume are evaluated. Using a histogram, the upper and lower 5th percentiles of the entire phantom volume are calculated and used to define the "high" and "low" signal values in Equation 2.

2.4 Geometric Uniformity

The geometric accuracy test specified by the ACR MRI accreditation manual requires measuring the diameter of the mid slice of the ACR phantom in 4 different directions (i.e. 4 diameter measurements per volume). In this study the automated script measures two extra planes (60 mm above and below the center plane) in 6 different directions, generating a total of 18 diameter measurements per volume. The measurements are done using a bimodal threshold to delineate the phantom signal (representing the boundary/edges of the phantom) from the background signal. Identified boundaries have their coordinates recorded, and the lengths are calculated as the difference between the top and bottom boundary coordinates. Height is also measured using the same method.
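Both measurements are simple to sketch, assuming the foreground mask and the bimodal threshold are obtained as in the SNR section (all names are illustrative):

```python
import numpy as np

def piu(vol, fg_mask):
    """PIU of Eq. 2 over the whole volume: the upper and lower 5th
    percentiles of the phantom histogram define 'high' and 'low'."""
    low, high = np.percentile(vol[fg_mask], [5, 95])
    return 100.0 * (1.0 - (high - low) / (high + low))

def profile_length(profile, threshold, spacing_mm):
    """Diameter or height along one sampled line: distance between the
    first and last sample above the bimodal threshold."""
    idx = np.flatnonzero(profile > threshold)
    return (idx[-1] - idx[0]) * spacing_mm if idx.size else 0.0
```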
3 Results

Measurements obtained from the ACR analysis were examined using the JMP statistics package (JMPIN version 5.1.2).

3.1 Inter-site Variability

A one-way ANOVA by site was carried out for each of the 4 variables (SNR, PIU, diameter, height), followed by the Tukey-Kramer Honestly Significant Difference (HSD) test. While T1, T2, PD and their associated fallbacks were acquired and analyzed, only results for T1 data are reported (Fig. 1) due to space limitations in this paper. The SNR was found to have a mean of 111.8 with an average standard error of 4.2 across the 7 sites. The observed SNR values were significantly different between sites (F ratio = 39.6633, p < 0.0001), and the maximum difference was found to be 78. The Tukey-Kramer HSD shows that sites 5 and 6 have mean SNR significantly different from the other 5 sites. In terms of PIU, data from the 7 sites showed less variance than the SNR: the mean PIU value was found to be 83.6% with an average standard error of 0.6%. The F ratio value of 9.4064 (p < 0.0001) confirms that the difference in means between sites is statistically significant, and the largest difference between the mean PIU values was 5.6%. Tukey-Kramer HSD results show that only site 6 had a PIU significantly different from the other sites. The average phantom diameter was found to have a mean of 189.6 mm and an average standard error of 0.1 mm. The differences between the mean phantom diameters of different sites were less than 1.9 mm (1%), but the differences were still statistically significant (F ratio = 48.0775, p < 0.0001). From the Tukey-Kramer HSD it was found that sites 2 and 5 had significantly different means than the rest of the sites.
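A sketch of the equivalent analysis outside JMP, assuming scipy and statsmodels as stand-ins; pairwise_tukeyhsd implements the Tukey-Kramer adjustment for unequal group sizes:

```python
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

def site_comparison(values, sites):
    """One-way ANOVA by site followed by pairwise Tukey comparisons.

    values : numpy array with one measurement per scan
    sites  : numpy array with the site label of each scan
    """
    groups = [values[sites == s] for s in np.unique(sites)]
    f_ratio, p_value = f_oneway(*groups)
    hsd = pairwise_tukeyhsd(values, sites)   # pairwise site comparisons
    return f_ratio, p_value, hsd
```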
Fig. 1. Plot of SNR, PIU, phantom diameter and height vs. site for the T1 modality. The horizontal line represents the grand mean of all sites, while the diamonds represent 95% normal confidence intervals centered at the sample means.
The average phantom height was determined to be 148.4 mm, with an average standard error of 0.1 mm. The difference between the largest group mean and the smallest was 1.9 mm (1.3%), and the difference in mean height across the 7 sites was also statistically significant (F ratio of 57.0644, p < 0.0001). Tukey-Kramer HSD revealed that the different mean heights can be grouped roughly into 4 pairs, where the mean height difference was statistically insignificant within groups.

3.2 Longitudinal Variability

The results presented here represent changes in PIU, SNR and phantom diameter/height over time (Fig. 2). An average was determined for the 6 modalities in each site, with statistically significant trends in time detected and analyzed. Here we report results for site 2 and site 7 only due to space limitations.

3.2.1 Site 2

PIU averages across different modalities from site 2 had a mean value of 84.2%. The averages were within 1% of each other, and no statistically significant trends were associated with any of the modalities. Similar to the site 1 data, the SNR for site 2 also showed a great range: from the mean T1 value of 115.5 to the mean T2 value of 36.6. No statistically significant trends were observed except for the T1 data, which showed a significant decreasing trend (SNR decrease of 10.8 per year) over time. The two geometric metrics (diameter and height) were found to have mean values of 189.0 mm and 147.5 mm, respectively. Five out of six modalities showed an increasing diameter over time (T1WF, T2, T2WF, PD, PDWF; mean increase of 0.5 mm per year), while the height remained constant for all modalities.
Fig. 2. Longitudinal T1 plots of PIU, SNR, diameter and height for all sites. The legend is as follows: Red = Site 1, Green = Site 2, Blue = Site 3, Orange = Site 4, Lime = Site 5, Purple = Site 6, Aqua = Site 7.
3.2.2 Site 7

For site 7, the average PIU was calculated to be 79.3%; however, none of the 6 modalities analyzed showed significant trends in time. Analysis of the mean SNR values reported a maximum of 116.6 for T1, and a minimum of 28.1 for T2. T1 also showed a significant increase with time, while T1WF and T2WF showed significant decreases with time. Analysis of the geometric metrics found the average phantom diameter to be 189.7 mm. Trends in time for the diameter showed erratic behavior, with T2 increasing in time but T1WF, PD and PDWF decreasing in time. The average change (in modalities that showed significant changes in time) was 0.3 mm per year. The average phantom height was determined to be 148.7 mm, and all modalities showed a significant average decrease of 0.4 mm per year.
4 Discussion

The aim of this paper is to quantify the inter-site variability of a multi-center MRI study using the ACR phantom. The parameters used to characterize variability are SNR, PIU and geometric accuracy. These parameters were computed automatically for all 1048 phantom volumes.
While all data went through a manual QC process before integration into the database, the inclusion of some lower quality data in the analysis may affect the overall statistics: e.g., streaks/bands, data cropped near the edges (resulting in overlapping images) and data with high noise/significant ghosting. Data considered to be of lower quality are believed to constitute less than 2% of the entire ACR phantom dataset. Bourel et al. reported excellent stability in terms of SNR and uniformity over 3 years for their Vision scanner [9]. In our study, SNR and PIU both showed small yet statistically significant trends in time, depending on manufacturer, modality and site. In terms of the geometric metrics, the results showed a slight drift over time but were otherwise stable; this is in agreement with findings from Ewers and Schnack [11,14]. The largest change per year was a decrease of 2.1 mm in diameter for the site 5 PDWF data, which corresponds to a change of 1.1% in length over time. The majority of trends are considerably smaller in magnitude, with an average shift per year of 0.4 mm in diameter (0.2%) and 0.5 mm in height (0.3%). This is likely due to drifts in the gradient coil strengths over time, with an apparent difference in magnitude between the x/y plane and the z plane causing the geometric distortions [1,7]. To the authors' knowledge, a multicenter QA study involving over one thousand ACR MRI phantom images has not been reported in previous literature. The results obtained in this study should serve as a useful reference for future multicenter studies.
5 Conclusion

Inter-site variability is an important factor to consider in large multi-site studies. In this paper we investigated the effect of site and time on SNR, PIU, and ACR phantom diameter and height for the NIHPD project. Across sites, the means were found to be statistically different from the grand mean for all 4 parameters. Longitudinally, the presence of statistically significant trends depended on the site involved as well as the modality used in the scan.
References
1. Mah D., Steckner M., Palacio E., Mitra R., Richardson T., Hanks G.E.: Characteristics and quality assurance of a dedicated open 0.23 T MRI for radiation therapy simulation. Medical Physics, Vol. 29(11):2541-2547, November 2002.
2. Jones R.W., Witte R.F.: Signal intensity artifacts in clinical MR imaging. Radiographics, Vol. 20(3):893-901, May-June 2000.
3. Lemieux L., Barker G.J.: Measurement of small inter-scan fluctuations in voxel dimensions in magnetic resonance images using registration. Medical Physics, Vol. 25(6):1049-1054, June 1998.
4. Jovicich J., Czanner S., Greve D., Haley E., Kouwe A., Gollub R., Kennedy D., Schmitt F., Brown G., MacFall J., Fischl B., Dale A.: Reliability in multi-site structural MRI studies: Effects of a gradient non-linearity correction on phantom and human data. NeuroImage, Vol. 30(2):436-443, April 2006.
5. Evans A.C., Brain Development Cooperative Group: The NIH MRI study of normal brain development. NeuroImage, Vol. 30(1):184-202, March 2006.
6. American College of Radiology: Phantom Test Guidance for the ACR MRI Accreditation Program. American College of Radiology, September 2000.
7. Chen C.C., Wan Y.L., Wai Y.Y., Liu H.L.: Quality Assurance of Clinical MRI Scanners Using ACR MRI Phantom: Preliminary Results. Journal of Digital Imaging, Vol. 17(4):279-284, December 2004.
8. Haacke E.M., Brown R.W., Thompson M.R., Venkatesan R.: Magnetic Resonance Imaging – Physical Principles and Sequence Design. Wiley-Liss, 1st Edition, June 1999.
9. Bourel P., Gibon D., Coste E., Daanen V., Rousseau J.: Automatic quality assessment protocol for MRI equipment. Medical Physics, Vol. 26(12):2693-2700, December 1999.
10. Colombo P., Baldassarri A., Del Corona M., Mascaro L., Strocchi S.: Multicenter trial for the set-up of a MRI quality assurance programme. Magnetic Resonance Imaging, Vol. 22:93-101, 2004.
11. Ewers M., Teipel S.J., Dietrich O., Schonberg S.O., Jessen F., Heun R., Scheltens P., van de Pol L., Freymann N.R., Moeller H.J., Hampel H.: Multicenter assessment of reliability of cranial MRI. Neurobiology of Aging, Vol. 27(8):1051-1059, August 2006.
12. Neelin P.: The MINC File Format: From Bytes to Brains. NeuroImage, Vol. 7(4), 1998.
13. Magnusson P., Olson L.E.: Image analysis methods for assessing levels of image plane nonuniformity and stochastic noise in a magnetic resonance image of a homogeneous phantom. Medical Physics, Vol. 27(8):1980-1994, August 2000.
14. Schnack H.G., van Haren N.E.M., Hulshoff Pol H.E., Picchioni M., Weisbrod M., Sauer H., Cannon T., Huttunen M., Murray R., Kahn R.S.: Reliability of Brain Volumes from Multicenter MRI Acquisition: A Calibration Study. Human Brain Mapping, Vol. 22:312-320, 2004.
Performance Evaluation of Grid-Enabled Registration Algorithms Using Bronze-Standards

Tristan Glatard(1,2), Xavier Pennec(1), and Johan Montagnat(2)

(1) INRIA Sophia - Projet Asclepios, 2004 Route des Lucioles, BP 93, 06902 Sophia Antipolis Cedex, France
{Xavier.Pennec, Tristan.Glatard}@sophia.inria.fr
(2) CNRS - I3S unit, RAINBOW team, 930 route des Colles, BP 145, 06903 Sophia Antipolis Cedex, France
{glatard, johan}@i3s.unice.fr
Abstract. Evaluating registration algorithms is difficult due to the lack of a gold standard in most clinical procedures. The bronze standard is a real-data-based statistical method providing an alternative registration reference through a computationally intensive image database registration procedure. We propose in this paper an efficient implementation of this method through a grid-interfaced workflow enactor enabling the concurrent processing of hundreds of image registrations in only a couple of hours. The performances of two different grid infrastructures were compared. We computed the accuracy of 4 different rigid registration algorithms on longitudinal MRI images of brain tumors. Results showed an average subvoxel accuracy of 0.4 mm in translation and 0.15 degrees in rotation.
1 Performance Evaluation Using Bronze Standards
The accuracy performances of registration algorithms are critical for many clinical procedures, but quantifying them is difficult due to the lack of a gold standard in most clinical applications. To analyze registration algorithms, one may consider them as black boxes that take images as input and output a transformation. The performance evaluation problem is to estimate the quality of the transformation. However, no registration algorithm performs the same for all types of input data. For instance, one algorithm may perform very well for multimodal MR registration but poorly for SPECT/CT. This means that the evaluation data set has to be representative of the clinical application problem we are targeting: all sources of perturbation in the data should be represented, such as acquisition noise and artifacts, pathologies, etc. It also means that we cannot conclude from a single experiment that one algorithm is better than the others for all applications.

1.1 Performance Quantifiers
As far as the registration result is concerned, one can distinguish between gross errors (convergence to wrong local minima) and small errors around the exact transformation. The robustness can be quantified by the size of the basin of
attraction of the right solution or by the probability of false positives. The small errors may be sorted into systematic biases, repeatability and accuracy [1]. The repeatability accounts for the errors due to internal parameters of the algorithm, mainly the initial transformation, and to the finite numerical accuracy of the optimization algorithm, while the external error accounts for the propagation of the data errors into the optimization result. It is important to notice that accuracy measures the error with respect to the truth (which may be unknown), while the precision or repeatability only measures the deviation from the average value, i.e. it does not take into account systematic biases, which are often hidden. For instance, a calibration error in the acquisition system will consistently bias all the images acquired with that device. Unless another calibration is done or an external reference is used (e.g. another acquisition device), there is no way to detect such a bias. In terms of statistical modeling, this means that all the potential error sources should be made random in order to be included. In a statistical setting, considering the input data and the output transformation as random variables naturally leads to quantifying the precision (resp. accuracy) of the transformation as the standard deviation or expected RMS distance to the mean (resp. the exact) transformation, or more interestingly with the covariance matrix, as the transformation uncertainty is usually non-isotropic (e.g. radians and millimeters for the rotation and translation parts of a rigid transformation). Then, the variability of the transformation can be propagated to some target points using standard first-order linearizations to obtain the covariance on the transformed test points, or its trace, the variance (see e.g. [6]).

1.2 Performance Evaluation
One of the simplest evaluation schemes is to simulate noisy data and to measure how far the registration result is from the true one (the ground truth is obviously known). The main drawback of synthetic data is that it is very difficult to identify and model faithfully all the sources of variability, and especially unexpected events (pathologies, artifacts, etc.). Forgetting one single source of error (e.g. camera calibration errors in 2D-3D registration) automatically leads to an underestimation of the final transformation variability. In some cases, however, images may be faithfully simulated (e.g. SPECT and MRI), with a very high computational cost due to the complexity of image acquisition physics. The second evaluation level is to use real data in a controlled environment, for instance imaging a physical phantom. There is possibly a gold standard, if one can precisely measure the motion or deformation of the phantom with an external apparatus. However, it is difficult to test all the clinical conditions (e.g. different types or localizations of pathologies). Moreover, it is often argued that these phantoms are not representative of real in vivo biological systems. One level closer to reality, experiments on cadavers correctly take into account the anatomy, but fail to exhibit all the errors due to the physiology. Moreover, the images may be very different from in-vivo ones. We tackle in this paper the last level of evaluation methods, which relies on a database of in-vivo real images representative of the clinical application.
Such a database can be large enough to span all sources of variability, but there is usually no gold standard registration to compare with. One method is to perform a cross-comparison of the criteria optimized by different algorithms [2]. However, this does not give any insight into the transformation itself. A more interesting method for registration evaluation is the use of consistency loops [3,4]. The principle is to compose transformations that form a closed circuit and to measure the difference of the composition from the identity. This criterion does not require any ground truth, but it only measures the repeatability, as any bias will go unnoticed. A last type of method is to see the ground truth as a hidden variable, and to estimate concurrently the ground truth and the quality as the distance of our results to this reference (EM-like algorithms). This method was exemplified for the validation of segmentation by the STAPLE algorithm [5].

1.3 The Bronze Standard Method
The principle of the bronze standard method is similar but concerns registration: from a set of registrations between images, we want to estimate the exact transformations, and the variability of the registration results with respect to these references. Let us assume that we have n images of the same organ of the patient and m methods to register them, i.e. m × n² transformations T^k_{i,j} (we denote here by k the index of the method and by i and j the indexes of the reference and target images). Our goal here is to estimate the n − 1 free transformations T̄_{i,i+1} that relate successive images and that best explain the measurements T^k_{i,j}. The bronze standard transformation between images i and j is obtained by composition: T̄_{i,j} = T̄_{i,i+1} ◦ T̄_{i+1,i+2} ◦ ... ◦ T̄_{j−1,j} if i < j (or the inverse of both terms if i > j). The free transformation parameters are computed by minimizing the prediction error on the observed registrations:

C(T̄_{1,2}, T̄_{2,3}, ..., T̄_{n−1,n}) = Σ_{i,j ∈ [1,n], k ∈ [1,m]} d(T^k_{i,j}, T̄_{i,j})²    (1)
Here, d is a distance function between transformations, chosen as a robust variant of the left-invariant distance on rigid transformations developed in [6]:

d(T₁, T₂) = min( μ²(T₁ ◦ T₂⁻¹), χ² )  with  μ²(R(θ, n), t) = θ²/σr² + ‖t‖²/σt²

where θ is the angle of the rotation R, n is the unitary vector defining its axis, and t is the translation vector of the transformation. Details on the general methods for doing statistics on Riemannian manifolds and Lie groups are given in [7]. In this process, we do not only estimate the optimal transformations, but also the rotational and translational variance of the "transformation measurements", which are propagated through the criterion to give an estimate of the variance of the optimal transformations. Of course, these variances should be considered as a fixed effect (i.e. these parameters are common to all patients for a given image registration problem, contrary to the transformations) so that they can be computed more faithfully by multiplying the number of patients. The estimation T̄_{i,i+1} is called a bronze standard because the result converges toward the perfect registration as the number of methods m and the number of
images n increases. Indeed, considering a given registration method, the variability due to the noise in the data decreases as the number of images n increases, and the registration computed converges toward the perfect registration up to the intrinsic bias introduced by the method. Now, using different registration procedures based on different methods, the intrinsic bias of each method also becomes a random variable, which is hopefully centered around zero and averaged out in the minimization procedure. The different biases of the methods are now integrated into the transformation variability. To fully reach this goal, it is important to use as many independent registration methods as possible. Criterion (1) is in fact the log-likelihood of the observations T^k_{i,j} assuming Gaussian errors around the bronze standard registrations with a variance σr² on the rotation and σt² on the translation. An important variant is to relax the assumption of the same variances for all algorithms, and to unbias their estimation. This can be realized by using only m − 1 out of the m methods to determine the bronze standard registration, and using the obtained reference to determine the accuracy of the last method (a kind of leave-one-method-out test).
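A minimal sketch of the robust transformation distance underlying criterion (1), with rigid transformations represented as (R, t) pairs; the representation and function names are illustrative, and the minimization of criterion (1) over the free transformations is not shown:

```python
import numpy as np

def mu2(R, t, sigma_r, sigma_t):
    """Left-invariant squared distance to the identity:
    theta^2 / sigma_r^2 + ||t||^2 / sigma_t^2."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    return theta ** 2 / sigma_r ** 2 + t @ t / sigma_t ** 2

def robust_distance(T1, T2, sigma_r, sigma_t, chi2):
    """Robust d(T1, T2) of criterion (1): mu^2(T1 o T2^-1) saturated at chi^2."""
    (R1, t1), (R2, t2) = T1, T2
    R = R1 @ R2.T                # rotation part of T1 o T2^-1
    t = t1 - R @ t2              # translation part of T1 o T2^-1
    return min(mu2(R, t, sigma_r, sigma_t), chi2)
```

Saturating the squared distance at χ² is what makes the criterion robust: a grossly wrong registration contributes a bounded penalty instead of dominating the least-squares estimate.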
2 Gridifying Registration Algorithms
The large amount of input data and registration algorithms required to compute the bronze standard makes this method very compute-intensive. A grid infrastructure can handle the load of the computations involved and help in managing the medical image database to process. A grid is a pool of shared computing and storage resources accessible through a middleware software layer which hides as much as possible the complexity of the infrastructure from the user. Such platforms are designed to help users share and execute their algorithms and data efficiently, which fulfills the needs of the bronze standard application.

2.1 Interoperability to Compare and Share Algorithms
Sharing registration algorithms implies that each registration service is semantically described. A complete system would include an ontology of registration problems (modalities, anatomical region, rigid or non-rigid registration, etc.), of registration algorithms (input/output data types, method used, etc.) and of image and transformation formats. In our case, converting rigid transformation formats was quite simple (although some transformations were expressed in different bases), but extending this to non-rigid transformations remains an open problem. From a practical point of view, we wrapped each registration algorithm into a standard Web service. These Web services are responsible for the grid execution of their algorithm on the data sets specified at invocation. Algorithms are thus standard black boxes, ready to be assembled into an application. To minimize the user effort required to gridify an algorithm, we developed a generic wrapper that is able to submit a job on the grid, given a simple description of the corresponding command line.
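Such a wrapper might look like the following sketch; the 'grid-submit' client, the command template and the file names are placeholders, not the actual EGEE or Grid5000 middleware interface:

```python
import subprocess

def run_on_grid(template, **params):
    """Fill a command-line template describing one registration algorithm
    and hand it to a grid submission client. 'grid-submit' is a placeholder
    command, not the actual EGEE/Grid5000 middleware interface."""
    command = template.format(**params)
    job = subprocess.run(["grid-submit", "--", command],
                         capture_output=True, text=True, check=True)
    return job.stdout.strip()    # e.g. a job identifier

# Usage sketch; the executable name, flags and file names are illustrative.
# run_on_grid("baladin -ref {ref} -flo {flo} -res {res}",
#             ref="t0.mnc", flo="t1.mnc", res="t0_t1.trf")
```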
Fig. 1. Bronze standard workflow. Each double-squared box represents a registration algorithm. Lightweight computing tasks such as data transfers and format conversions are represented with single-squared boxes, and arrows show computation dependencies. Triangles denote the inputs of the workflow, rhombs the outputs, and ellipses the parameters. The final box is the bronze standard computation.
2.2 Workflow Description and Execution
Once registration algorithms are wrapped, one can describe the global bronze standard application. Data links are first specified between the outputs and inputs of algorithms, in order to define the data pipeline. Control links may also be specified in order to describe precedence constraints between algorithms. We chose to describe the workflow with the Scufl language [8], which is a good trade-off between high expressiveness and simplicity. The resulting bronze-standard workflow is depicted in Fig. 1 and fully described in Section 3. To efficiently execute such a workflow on a grid, we developed a workflow manager called MOTEUR [9]. It particularly focuses on optimizing time performance, which is critical in the case of data-intensive applications such as our bronze standard. MOTEUR enables three different kinds of parallelism (workflow, data and service parallelism) in order to exploit the massively parallel resources available on the grid infrastructure. Moreover, it groups sequential jobs to lower the number of service invocations and minimize the grid overhead resulting from job submission, scheduling and data transfers. Finally, MOTEUR can execute workflows on grid systems of very different scales. We made experiments on the EGEE production grid(1) (including 18,000 CPUs all over the world and 5 PB storage capacity), as well as on the Grid5000 experimental infrastructure(2) (2,000 CPUs and hundreds of GB storage capacity).

(1) Enabling Grids for E-sciencE, http://www.eu-egee.org
(2) Grid5000 national grid, http://www.grid5000.org
3 Experiments
We are targeting the clinical follow-up of the radiotherapy of brain tumors, which requires several registrations. To optimize the dose planning, a deformable atlas-to-patient registration is performed to segment target volumes and organs at risk. To be more accurate, multimodal images are often co-registered. Last but not least, the tumor evolution and the result of the treatment are assessed in follow-up images, thanks to a monomodal rigid registration. This is the registration problem that we consider in this paper. Quantifying its accuracy is important to ensure the precision of the tumor evolution estimation in the assessment of the efficiency of clinical treatments. Precisely registered longitudinal studies may be used to validate the quality (reproducibility and accuracy) of segmentation algorithms used for radiotherapy planning. To evaluate all these registration/segmentation problems, a database of 110 patients with 1 to 6 time points and MR T2, T1 and gadolinium-injected T1 modalities was acquired at a local cancer treatment center (courtesy of Dr Pierre-Yves Bondiau from the "Centre Antoine Lacassagne", Nice, France) on a Genesis Signa MR scanner. Among them, 29 have more than one time point and were suitable for inclusion in our rigid registration evaluation study. We chose to select only the injected T1 images in a first step. These images are more demanding for registration than other MRI sequences, as the gadolinium uptake is likely to vary at different time points, leading to local intensity outliers. All injected T1 images are 256×256×60 voxels of 16 bits. We considered four different registration algorithms. Two of them are intensity-based: Baladin [10] has a block matching strategy optimizing the coefficient of correlation and a robust least-trimmed-squares transformation estimation; Yasmina uses the Powell algorithm to optimize the SSD or a robust variant of the correlation ratio (CR) [4]. The other two are feature-based and match specific points of crest lines with different strategies [11]: CrestMatch is a prediction-verification method and PFRegister is an ICP algorithm extended to features more complex than points. In the computation of the bronze standard registration, CrestMatch is used to initialize all the other algorithms close to the right registration. This allows us to ensure that all algorithms converge toward the same (hopefully global) minimum. A visual inspection is performed a posteriori on the bronze standard registration to ensure that this "optimal" transformation is indeed correct. As we are focusing on accuracy and not on robustness, this initialization does not bias the evaluation. Figure 1 illustrates the application workflow.
3.1 Accuracy Results
The workflow was run on the 29 selected patients with σr = 0.15 degrees, σt = 0.42 mm and a χ² value of 30. A high number of registration results were rejected in the robust estimation of the bronze standard transformations. A visual inspection revealed that there was a scaling and shear problem in the yz plane for one of the images involved in each of these rejected registrations.
A detailed analysis of the DICOM headers showed that the normal to the slices (xy plane), given by the cross product of the Image Orientation vectors, was not perfectly parallel to the slice trajectory during the acquisition (axis obtained from the Image Position field). This tilt was found to be +1.19 degrees in most of the images and −1.22 degrees in 13 images. It seems that nothing in the DICOM standard ensures that 3D images are acquired on an orthogonal grid: it would be interesting to better specify the acquisition protocols on the MR workstation (the radiologists were not even aware of that tilt!). Thus, images are not in an orthogonal coordinate system and should either be registered with an affine transformation (which adds 6 additional parameters, among which only one, the tilt, has a physical justification) or the tilt should be taken into account within the rigid registration algorithm, but this solution was not implemented for the algorithms we were considering. As the tilt was small, we chose not to resample the images (in order to keep the original image quality), but rather to perform an uncorrected rigid registration within the group of images with a positive tilt only. This led us to remove 13 images among the 82, and 4 patients for which only one image was remaining (the statistics on the remaining number of patients, images and registrations are given in Table 1).

Table 1. Summary statistics about the image database used

Number of time points:                           2        3        4       6
Registrations per patient (and per algorithm):   2        6       12      30
Patients (including/without tilted images):    15/15     6/7      7/2     1/1
Total number of registrations:                120/120  144/168   336/96  120/120
Fig. 2. Example of a slice of two registered images with a high deformation
The bronze standard workflow was run again with the same parameters on this reduced database of 25 patients. This time, only 20 registrations were rejected, among which 15 concerned two patients with a very high deformation in the tumor area, leading to some global deformations of the brain (Fig. 2). In that situation the rigid motion assumption does not hold any more, and several "optimal" rigid registrations may be valid depending on the area of the brain. The last 5 rejected transformations involve two acquisitions with phase-encoded motion artifacts, which impacted feature-based and intensity-based registration algorithms differently, leading to two incompatible sets of transformations. However, it was not possible to visually decide which result was the "right" one.
Excluding these 20 transformations, which correspond to special conditions where the rigid assumption does not really hold, we obtained mean errors of 0.130 degree on the rotations and 0.345 mm on the translations. The propagation of this error to the estimated bronze standard leads to an accuracy of 0.05 degree and 0.148 mm. We then determined the unbiased accuracy of each of the 4 algorithms by comparing its results to the bronze standard computed from the 3 other methods. The results are presented in Table 2 and show slightly higher but equivalent values for all algorithms.

Table 2. Accuracy results

Algorithm     σr (deg)   σt (mm)
CrestMatch     0.150      0.424
PFRegister     0.180      0.416
Baladin        0.139      0.395
Yasmina        0.137      0.445

Table 3. Execution times

Image pairs            12        66        126
Sequential          2h40min  14h40min     28h
Grid5000             10min     35min    2h10min
EGEE                2h10min   3h22min   4h57min
Grid5000 speed-up
  w.r.t. EGEE          13        5.8       2.3

3.2 Grid-Computing Results
The execution times of the whole workflow were compared on the EGEE and Grid5000 (Sophia shared cluster of 105 nodes) platforms and in the sequential case, for different numbers of image pairs to register (Table 3). Even though the EGEE production infrastructure gathers many more processors than the Grid5000 cluster, the workflow was always faster on the Grid5000 cluster. This is explained by the high overhead introduced by the EGEE grid, coming from the large scale of this platform and its multi-user nature. However, the speed-up obtained on the Grid5000 cluster vs. EGEE decreases with the number of input images. The Grid5000 cluster progressively enters a saturation phase, where all the available processors are used by the application, while the EGEE grid is more scalable and less impacted by the growth of the input data set size.
4 Discussion
We propose in this paper a bronze standard evaluation framework to analyze the accuracy of rigid registration algorithms wrapped into Web services, and a workflow engine to efficiently deploy this application on grids. The gridification of the application is motivated by the fact that both databases and registration algorithms may be more efficiently shared on grids. This is fundamental for bronze standard methods, since the registration performance converges from precision to accuracy only for a large number of data sets and algorithms. Experiments demonstrate that the bronze standard method can be precise enough to detect very small deviations from the rigidity assumption (shears of 2 degrees) in images, and that the 4 rigid registration algorithms used actually reach a subvoxel accuracy of 0.15 degree in rotation and 0.4 mm in translation for the registration of longitudinal injected T1 1×1×2 mm images of the brain.
Concerning the grid computing part, our results showed that the workflow engine was quite general and powerful. Moreover, execution times revealed that choosing the platform with the highest number of processors is not always the best solution as strong latencies may slow down the execution on wide infrastructures. The targeted grid infrastructure should thus be chosen according to the size of the problem to be solved.
References
1. Jannin, P., et al.: Validation of medical image processing in image-guided therapy. IEEE TMI 21(12):1445–1449, December 2002.
2. Hellier, P., et al.: Retrospective evaluation of intersubject brain registration. IEEE Trans. Med. Imaging 22(9):1120–1130, September 2003.
3. Holden, M., et al.: Voxel similarity measures for 3D serial MR brain image registration. IEEE Trans. Med. Imaging 19(2):94–102, 2000.
4. Roche, A., et al.: Rigid registration of 3D ultrasound with MR images: a new approach combining intensity and gradient information. IEEE TMI 20(10):1038–1049, 2001.
5. Warfield, S.K., et al.: Simultaneous truth and performance level estimation (STAPLE): an algorithm for the validation of image segmentation. IEEE TMI 23(7):903–921, 2004.
6. Pennec, X., et al.: Feature-based registration of medical images: Estimation and validation of the pose accuracy. MICCAI'1998, LNCS 1496:1107–1114, 1998.
7. Pennec, X., et al.: Intrinsic statistics on Riemannian manifolds: Basic tools for geometric measurements. J. of Math. Imaging and Vision, 2006, to appear.
8. Oinn, T., et al.: Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics 17(20):3045–3054, 2004.
9. Glatard, T., et al.: Efficient services composition for grid-enabled data-intensive applications. Proc. of HPDC'06, 333–334.
10. Ourselin, S., et al.: Block matching: A general framework to improve robustness of rigid registration. Proc. of MICCAI'2000, LNCS 1935:557–566, 2000.
11. Pennec, X., et al.: Landmark-based registration using differential geometric features. Handbook of Medical Imaging, Chap. 31:499–513. Acad. Press, 2000.
Anisotropic Feature Extraction from Endoluminal Images for Detection of Intestinal Contractions

Panagiota Spyridonos(1), Fernando Vilariño(1), Jordi Vitrià(1), Fernando Azpiroz(2), and Petia Radeva(1)

(1) Computer Vision Center and Computer Science Dept., Universitat Autonoma de Barcelona, Barcelona, Spain
(2) Hospital Universitari Vall d'Hebron, Barcelona, Spain
Abstract. Wireless endoscopy is a very recent and at the same time unique technique allowing clinicians to visualize and study the occurrence of contractions and to analyze intestinal motility. Feature extraction is essential for obtaining efficient patterns to detect contractions in wireless video endoscopy of the small intestine. We propose a novel method based on anisotropic image filtering and efficient statistical classification of contraction features. In particular, we apply the image gradient tensor for mining informative skeletons from the original image and a sequence of descriptors for capturing the characteristic pattern of contractions. Features extracted from the endoluminal images were evaluated in terms of their discriminatory ability in correctly classifying images as either belonging to contractions or not. Classification was performed by means of a support vector machine classifier with a radial basis function kernel. Our classification rates gave a sensitivity of 90.84% and a specificity of 94.43%. These preliminary results highlight the high efficiency of the selected descriptors and support the feasibility of the proposed method in assisting the automatic detection and analysis of contractions.
1 Introduction

Small intestine (SI) contractions are among the motility patterns which bear very high clinical and pathological significance for many gastrointestinal disorders, such as ileus, bacterial overgrowth, functional dyspepsia and irritable bowel syndrome [1]. SI contractions can be classified on the basis of their duration. Manometry is currently viewed as the best technique for diagnosing intestinal motor disorders. SI manometry gives a picture of the contractile activity of the gut over time by providing continuous recordings of intra-luminal pressures resulting from phasic contractions in the small bowel [2]. With the very recent advent of capsule endoscopy (CE) [3,4], physicians are able to record continuous information on SI activity, and to visualize and trace the whole variety of contraction patterns in a unique way.
In this work, our purpose is to explore efficient methods for the automatic detection of frames registered during contractile activity. Visual annotation of contractions is a laborious task, since the recording device of the capsule stores about 50,000 images and contractions might occur in undetermined time periods throughout the navigation course of the capsule in the intestinal tract. Computer vision offers the means to encode in quantitative patterns 'scenes' that possess their own distinctive properties. Several techniques for the analysis of endoluminal images have been reported in the literature [5,6,7,8,9]. However, addressing the automatic recognition of special patterns of contractions is a novel and challenging field in computer vision applications, and to our knowledge it has not been addressed in the literature. In this paper, we introduce an efficient feature extraction methodology based on anisotropic image processing for tracing frames of contractions, mining local directional information from the image gradient tensor. Feature evaluation in detecting positive examples of contractions has been performed by means of a support vector machine (SVM) classifier [10]. The paper is organized as follows: Section 2 focuses on the method of feature extraction to characterize contraction patterns, Section 3 discusses our results, and the paper finishes with conclusions and future lines of research.
2 Methodology for Feature Extraction and Classification of Contraction Patterns
Contractions typically appear as a prolonged, strong occlusion of the lumen, as shown in Fig. 1. The omnipresent characteristic of these frames is the strong edges (wrinkles) of the folded intestinal wall, distributed in a radial way around the closed intestinal lumen.
Fig. 1. Endoluminal images of positive (top) and negative (bottom) examples
The procedure to encode the wrinkle-star pattern in a quantitative way was accomplished in three steps. Firstly, the skeleton of the wrinkle pattern is extracted. Secondly, the center of the intestinal lumen is detected as the point where the wrinkle edges converge, using the image structure tensor. Finally, a set
of descriptors was estimated, taking into account the near-radial organization of the wrinkle skeleton around the center of the intestinal lumen.

2.1 Extraction of the Wrinkle Skeleton and Detection of the Center of the Intestinal Lumen
To extract the wrinkle skeleton we used the local directional information of the image gradient tensor, also called in the bibliography the structural matrix (image structure tensor). The structural tensor of an image J is calculated as follows:

J = | Jxx  Jxy |
    | Jyx  Jyy |        (1)

where subscripts indicate image derivatives [11]. Before estimating the structural matrix, the image is smoothed by a 2D Gaussian filter of standard deviation σ. The value of σ was set equal to three, which was a good compromise between removing noisy edges and retaining local information at the scale of the wrinkle edges. The eigenvectors and the eigenvalues of the structural matrix provide rich information for the detection of the wrinkle edges. More specifically, the eigenvector that corresponds to the highest eigenvalue crosses the wrinkle edge, pointing along the direction of large gradient variation. The second eigenvector lies along the edge direction, pointing towards the smallest gradient variation. Since the pattern of the contraction is a pattern with strong valley directionality, we used the gray matrix of the first eigenvalue of the structure matrix for the initial detection of the wrinkle skeleton. An example is illustrated in Fig. 2.
Fig. 2. (a) Original image, (b) gray-matrix of the first eigenvalues of the structural tensor, (c) thresholded binary image and (d) wrinkle skeleton
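A compact sketch of the tensor of Eq. 1 and its closed-form eigen-decomposition, taking the subscripts as second derivatives of the smoothed image and σ = 3 as in the text; thresholding the map of first eigenvalues then yields the binary image and skeleton of Fig. 2(c)-(d). Implementation details are illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def tensor_eigen(img, sigma=3.0):
    """Per-pixel tensor of Eq. 1 from second derivatives of the smoothed
    image J, with closed-form eigenvalues of the symmetric 2x2 matrix."""
    J = gaussian_filter(img.astype(float), sigma)
    Jy, Jx = np.gradient(J)          # np.gradient returns (d/dy, d/dx)
    Jxy, Jxx = np.gradient(Jx)
    Jyy, _ = np.gradient(Jy)
    trace = Jxx + Jyy
    root = np.sqrt((Jxx - Jyy) ** 2 + 4.0 * Jxy ** 2)
    lam1 = 0.5 * (trace + root)      # highest eigenvalue: across the wrinkle
    lam2 = 0.5 * (trace - root)      # lowest eigenvalue: along the wrinkle
    return lam1, lam2
```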
Since the center of the intestinal lumen serves as the center of symmetry for the wrinkle skeleton, the automatic detection of this point is a critical step in the whole procedure of wrinkle-pattern extraction. In other words, this point defines the small pool into which the strong streams (wrinkle edges) converge. To define this point we smoothed the image with a 2D Gaussian filter of σ equal to 13. Such a large σ is used to retain information related only to large-scale structures in the image, such as the blob of the intestinal lumen. In this case, the eigenvalues of the structural tensor take high values in the area of the predominant structure of the
intestinal lumen. Taking the point with the highest second eigenvalue is a good approximation for detecting the center of the intestinal lumen. An illustrative example is given in Fig. 3.
Fig. 3. (a) Original image, (b) gray-level map of the second eigenvalues of the structural tensor, with the highest eigenvalue indicated, and (c) detection of the wrinkle pool
2.2 Descriptors of Wrinkle Skeletons
Our visual percept of a strong wrinkled contraction is a star-wise wrinkle skeleton, concentric with the intestinal lumen. In other words, strong edges are directed towards a central point and are uniformly spread around it. To capture this feature of the star pattern, we extracted a set of eight descriptors as follows. First, considering the center of the intestinal lumen as the center of wrinkle symmetry, we divide the wrinkle skeleton into four quadrants (see Fig. 4). In each quadrant we construct the histogram of the orientation of the second eigenvector estimated at the skeleton pixels. Since we are interested in tolerating small differences while keeping the general tendency of skeleton orientation, the histogram uses just four bins corresponding to the directions 0°, 90°, 135°, and 45°. Consequently, we have a pattern of four descriptors in each quadrant. As illustrated in the example of Fig. 4, in the case of a near-symmetric star-wise pattern, the directional information in quadrant 1 matches that in quadrant 3, and likewise the directional information in quadrant 2 is similar to that in quadrant 4. Therefore, for the whole wrinkle skeleton, we sum the directional information of quadrants 1 and 3 and of quadrants 2 and 4, resulting in a set of 8 descriptors. Figure 4 illustrates the construction of the set of descriptors for a synthetic star-wise pattern. The histogram directional information in quadrants 1, 2, 3 and 4 is {132, 132, 132, 0}, {132, 132, 0, 132}, {132, 132, 132, 0} and {132, 132, 0, 132}, respectively. Thus, the resulting pattern for the whole skeleton is {264, 264, 264, 0, 264, 264, 0, 264}. In addition to the set of eight descriptors giving information about the orientation of wrinkles, we are interested in quantifying the entropy of the skeleton features. In particular, four more descriptors, one per quadrant, were extracted measuring the local directional information of wrinkle edges. Since the
Fig. 4. Skeleton of an ideal star-wise pattern
second eigenvector fields follow the direction of image valleys, local directional information on wrinkle edges can be described by estimating the tendency of the direction of the second eigenvector fields at the skeleton points. More specifically, for each quadrant, the entropy of the direction of the second eigenvectors is estimated over the skeleton points. In a pure wrinkled contraction, the entropy in each quadrant of the wrinkle skeleton should be low. For a regular frame where no contractile activity of the lumen is evident, the local directional information in the extracted skeleton is less ordered, resulting in high entropy values. Examples are given in Fig. 5 and Fig. 6. Figure 5 shows a wrinkle pattern, for which the obtained entropy is {0.34, 0.37, 0.24, 0.33}. Figure 6 shows two negative examples, for which the obtained entropy is {0.46, 0.36, 0.58, 0.48} and {0.47, 0.32, 0.59, 0.55} for Fig. 6(a) and Fig. 6(b), respectively. Finally, we include two parameters to measure the sharpness of the wrinkle edges. From the structural matrix J (1), we define two more descriptors, a and b, estimated from the skeleton points. These descriptors are given as follows [12]:

    a = trace{J}                                            (2)

    b = sqrt(a² − 4|J|)                                     (3)

where |J| denotes the determinant of J, so that a and b correspond to the sum and the difference of the two eigenvalues.
Descriptors a and b were calculated at each skeleton point and averaged over all points. Well-defined crisp edges with predominant directionality tend to have high values of a and b, whereas skeletons resulting from smoothed gradient variations have lower values. An example is given in Fig. 7. Finally, the resulting pattern for the wrinkle skeleton comprises 14 features: a set of 8 directional features, a set of 4 local entropy-related features, and 2 features of edge sharpness.
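The 14-feature pattern described above can be sketched as follows. This is an illustrative re-creation: the quadrant convention, the angle quantisation, and the entropy estimator (histogram entropy in bits) are our assumptions, and the inputs (skeleton pixel coordinates, second-eigenvector orientations, and per-pixel a, b values) are assumed to be precomputed.

```python
import numpy as np

BIN_DIRS = np.array([0.0, 90.0, 135.0, 45.0])  # bin order used in the text

def orientation_histogram(angles_deg):
    """4-bin histogram; each angle snaps to the nearest direction mod 180."""
    diff = np.abs((angles_deg[:, None] - BIN_DIRS[None, :] + 90.0) % 180.0 - 90.0)
    return np.bincount(diff.argmin(axis=1), minlength=4)

def entropy(hist):
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def skeleton_pattern(xs, ys, angles_deg, a_vals, b_vals, cx, cy):
    """8 directional descriptors (quadrants 1+3, 2+4), 4 entropies, mean a, b."""
    dx, dy = xs - cx, ys - cy
    # quadrant index 0..3, opposite quadrants differ by 2
    quad = np.where(dy >= 0, np.where(dx >= 0, 0, 1), np.where(dx >= 0, 3, 2))
    hists = [orientation_histogram(angles_deg[quad == q]) for q in range(4)]
    directional = np.concatenate([hists[0] + hists[2], hists[1] + hists[3]])
    entropies = [entropy(h) for h in hists]
    return np.concatenate([directional, entropies, [a_vals.mean(), b_vals.mean()]])
```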
3 Results
The proposed feature extraction methodology was evaluated using a set of images from 9 videos obtained with endoluminal capsules developed by Given Imaging Ltd., Israel [13]. The data were provided by the Digestive Diseases Department of the General Hospital Vall d'Hebron, Barcelona, Spain. Our data contained 521 positive examples of images of contractions and 619 images not corresponding to intestinal contractions (called negative examples). The negative
Fig. 5. A wrinkle pattern and the second eigenvector fields following the wrinkle edges. The well-ordered vectors result in low entropy values in each quadrant.
Fig. 6. Negative examples, with rich edge information organized in a random way
Fig. 7. (a) Well defined crisp wrinkles from a contraction frame {a, b} = {0.71, 0.82}, (b) regular frame with smoothed wrinkle edges {a, b} = {0.21, 0.05}
examples were selected so as to represent different phenomena occurring in the intestine (presence of turbid liquid, empty intestine without contraction, images with the lumen partially or totally visible, images without lumen appearance, etc.). Classification performance was tested by means of an SVM classifier with a radial basis function kernel, employing the hold-out cross-validation method. The hold-out procedure was repeated 100 times, each time using 50% of the data from each class for training and the remaining 50% for testing. The classification performance was estimated in terms of sensitivity
and specificity. On average, the system correctly detected 90.84% of the positive examples and 94.43% of the negative examples (Table 1). For a comparative evaluation, Table 2 reports the performance of the k-nearest neighbor (KNN) classifier on the same task.

Table 1. Classification accuracies estimated using the SVM classifier

                        Sensitivity (%)   Specificity (%)
    Mean Value               90.84             94.43
    Standard Deviation        1.8               1.29
Table 2. Classification accuracies estimated using the KNN classifier

                        Sensitivity (%)   Specificity (%)
    Mean Value               92.38             89.48
    Standard Deviation        1.50              2.58
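A hedged re-creation of this evaluation protocol (RBF-kernel SVM, 100 repetitions of a 50/50 hold-out split) is sketched below; the SVM hyperparameters, the stratified splitting, and binary labels y ∈ {0, 1} (1 = contraction) are assumptions, as they are not specified in the text.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def holdout_sensitivity_specificity(X, y, runs=100):
    sens, spec = [], []
    for seed in range(runs):
        Xtr, Xte, ytr, yte = train_test_split(
            X, y, test_size=0.5, stratify=y, random_state=seed)
        pred = SVC(kernel="rbf").fit(Xtr, ytr).predict(Xte)
        tp = np.sum((pred == 1) & (yte == 1))   # detected contractions
        tn = np.sum((pred == 0) & (yte == 0))   # detected negatives
        sens.append(tp / np.sum(yte == 1))
        spec.append(tn / np.sum(yte == 0))
    return np.mean(sens), np.std(sens), np.mean(spec), np.std(spec)
```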
4 Discussion
In this paper we presented a feature extraction methodology for the detection of the specific configurations characterizing wrinkled contractions in video capsule endoscopy. We proposed an approach based on the image structural matrix to extract informative skeletons, to locate the center of the intestinal lumen, and to estimate sets of informative descriptors. We extracted three sets of descriptors that intuition can easily interpret. First, we presented a simple model for extracting directional information in four directions, which adequately models pure, star-symmetric patterns. However, endoluminal images have rich texture, and skeletons are frequently contaminated by edge information that does not correspond to wrinkle contractions. A second set of local directional descriptors was therefore used to add information about homogeneous and well-directed valleys. Moreover, contractions are not always purely symmetric patterns, and the model of symmetric information in quadrants defined from the center of contraction proved to give incomplete information. For this reason, we added to our pattern two more descriptors related to the sharpness of the extracted wrinkle edges. Concluding, anisotropic image processing combined with statistical measurements and classification techniques offers elegant means for modelling and characterizing motility information in endoluminal images. Our results on test sets of endoluminal images suggest the feasibility of the method as an assisting tool for the detection of contractions.
Acknowledgments. This work was supported in part by a research grant from Given Imaging Ltd., Yoqneam, Israel, and H. U. Vall d'Hebron, Barcelona, Spain, as well as the
projects FIS-G03/1085, FIS-PI031488, TIC2003-00654 and MI-1509/2005. The technology and methods described in this paper have been filed for patent protection.
References
1. Kellow, J.E., Delvaux, M., Azpiroz, F., Camilleri, M., Quigley, E.M.M., Thompson, D.G.: Principles of applied neurogastroenterology: physiology/motility-sensation. Gut 45(2) (1999) 1117–1124
2. Hansen, M.B.: Small intestinal manometry. Physiological Research 51 (2002) 541–556
3. Iddan, G., Meron, G., Glukhovsky, A., Swain, P.: Wireless capsule endoscopy. Nature 405 (2000) 417
4. Rey, J.F., Gay, G., Kruse, A., Lambert, R.: European Society of Gastrointestinal Endoscopy guideline for video capsule endoscopy. Endoscopy 36 (2004) 656–658
5. Tjoa, M.P., Krishnan, S.M.: Feature extraction for the analysis of colon status from the endoscopic images. Biomedical Engineering OnLine 2 (2003) 3–17
6. Karkanis, S.A., Iakovidis, D.K., Maroulis, D.E., Karras, D.A., Tzivras, M.: Computer aided tumor detection in endoscopic video using color wavelet features. IEEE Transactions on Information Technology in Biomedicine 7 (2003) 141–152
7. Magoulas, G., Plagianakos, V., Vrahatis, M.: Neural network-based colonoscopic diagnosis using online learning and differential evolution. Applied Soft Computing 4 (2004) 369–379
8. Kodogiannis, V.S., Chowdrey, H.S.: Multi-network classification scheme for computer-aided diagnosis in clinical endoscopy. In: Proceedings of the International Conference on Medical Signal Processing (MEDISP), Malta (2004) 262–267
9. Boulougoura, M., Wadge, V., Kodogiannis, V.S., Chowdrey, H.S.: Intelligent systems for computer-assisted clinical endoscopic image analysis. In: Proceedings of the 2nd IASTED Conference on Biomedical Engineering, Innsbruck, Austria (2005) 405–408
10. Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer (1995)
11. Cyganek, B.: Combined detector of locally-oriented structures and corners in images. LNCS 2658 (2003) 721–730
12. Acar, B., Beaulieu, C.F., et al.: Edge displacement field-based classification for improved detection of polyps in CT colonography. IEEE Transactions on Medical Imaging 21(12) (2002) 1461–1467
13. Given Imaging, Ltd.: http://www.givenimaging.com/ (2005)
Symmetric Curvature Patterns for Colonic Polyp Detection Anna Jerebko, Sarang Lakare, Pascal Cathier*, Senthil Periaswamy, and Luca Bogoni Siemens Medical Solutions, CAD group, Malvern, PA 19380, USA {Anna.Jerebko, Sarang.Lakare, Senthil.Periaswamy, Luca.Bogoni}@siemens.com
Abstract. A novel approach for generating a set of features derived from properties of patterns of curvature is introduced as a part of a computer aided colonic polyp detection system. The resulting sensitivity was 84% with 4.8 false positives per volume on an independent test set of 72 patients (56 polyps). When used in conjunction with other features, it allowed the detection system to reach an overall sensitivity of 94% with a false positive rate of 4.3 per volume.
1 Introduction
As CT Colonography (CTC) is gaining broader acceptance as a screening method for colorectal neoplasia [1], [2], [3], [4], one of the leading causes of cancer deaths in the US, CAD tools could in the near future be employed as a second reader to aid in polyp detection. In this paper, we introduce a novel approach for generating a set of features derived from properties of patterns of curvature to characterize the shape, size, texture, density and symmetry of candidate structures, with the goal of facilitating the differentiation between polyps and other structures in the volume of interest. Our computer-aided polyp detection algorithm consists of the following steps: candidate generation, feature extraction and classification. A detailed description of the feature extraction algorithm, the focus of this paper, is given in Section 2. We briefly explain the candidate generation step in Section 3, followed by an introduction of the feature selection and classification methods. The error validation approach and the results on the training and test sets are reported in Section 4. The conclusion is given in Section 5.
2 Symmetrical Curvature Pattern Based Features for Polyp Characterization
2.1 Symmetric Pattern Extraction
The new technique presented in this paper allows generating a set of discriminative features for a given set of candidate locations. The features aim at capturing a general class of mostly symmetrical and round structures protruding inward into the lumen
* Pascal Cathier is now at SHFJ, CEA, Orsay, France.
(air within the colon), having a smooth surface and the density and texture characteristics of muscle tissue. These kinds of structures exhibit a symmetrical change of curvature sign about a central axis perpendicular to the object's surface. Colonic folds, on the other hand, have shapes that can be characterized as half-cylinders or paraboloids and hence do not present similar symmetry about a single axis. We first apply 3D Canny edge detection to find the boundary between tissue and colon lumen (air). Next, mean (H) and Gaussian (K) curvatures are calculated for each boundary voxel using the intensity values of the original volume [5]. Then, centering on each boundary voxel, all the connected neighbors within a maximum distance rmax are considered; rmax is determined by the maximum targeted polyp size (20 mm in our experiments, plus an extra ~5 mm to include the part of the boundary where the polyp merges with the surrounding tissue). These neighbors are grouped radially into concentric bands (see Fig. 1).
Fig. 1. Left: 2D view of a sessile polyp; Right: rendered view with overlaid concentric bands of increasing radius. The dashed circle illustrates the inflection band where polyp merges with the surrounding tissue.
Averages and standard deviations of K and H are computed within each band, and the curvature slope is then computed from the top of the polyp to the place of merging. When the center is located on the top of a polypoid structure, the bands having minimal Gaussian curvature (Kmin) are likely to correspond to the polyp's inflection band (see Fig. 1) – the transition area between the polyp and normal healthy tissue. The K and H curvature slopes SK and SH are then calculated as:

    SK = (Kc − Kmin) / DKmin                                (1)

    SH = (Hc − Hmax) / DHmax                                (2)
where SK is the Gaussian curvature slope, SH the mean curvature slope, Kc the average Gaussian curvature in the top central band, Hc the average mean curvature in the top central band, DKmin the distance from the top to the band having minimal Gaussian curvature, and DHmax the distance from the top to the band having maximal mean curvature. Voxels having a negative Gaussian curvature slope are likely to correspond to a polyp or another protrusion on the surface. Folds or other cylindrical structures could also have similar properties, but the standard deviations of K and H curvature within the bands having minimal Gaussian curvature are significantly higher in folds than in polyps. In a specific VOI, the cluster having the minimum average SK among all clusters is then selected as most likely corresponding to the boundary of the polypoid object of interest.
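Equations (1) and (2) reduce to a few lines once the band-averaged curvatures are available. A sketch under stated assumptions: the bands are ordered from the top central band outwards (index 0 is the central band, giving Kc and Hc), band_dist holds the radial distance of each band, and the curvature extrema are assumed to lie outside the central band (so the distances are non-zero).

```python
import numpy as np

def curvature_slopes(K_bands, H_bands, band_dist):
    """Gaussian (SK) and mean (SH) curvature slopes, Eqs. (1)-(2)."""
    k_min = int(np.argmin(K_bands))   # inflection band: minimal Gaussian curvature
    h_max = int(np.argmax(H_bands))   # band of maximal mean curvature
    SK = (K_bands[0] - K_bands[k_min]) / band_dist[k_min]
    SH = (H_bands[0] - H_bands[h_max]) / band_dist[h_max]
    return SK, SH
```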
The following features are derived from the concentric bands centered in the voxel of this cluster which has maximum slope SK: minimum SH, the standard deviations of K and H in the polyp inflection band (which could lie at different distances for K and H), the distances from the central voxel to the inflection band, DHmax and DKmin, the standard deviations of K and H in the central band, the largest diameter d1, two orthogonal diameters d2, d3, and their ratios λ1, λ2, λ3.
2.2 Cutting Surface Interpolation
Colonic polyps are attached to the colon wall or folds, or can be partially (or fully) surrounded by stool, with no visible intensity difference, and hence no detectable edges, among all these objects. Thus, after detecting the cluster of air/tissue boundary voxels belonging to a polypoid object, it is necessary to interpolate the underlying (cutting) surface to separate the polyp from the tissue it is attached to. The segmented polyp surface is transformed to spherical coordinates using the center of mass of the lesion boundary voxels as the origin for the transformation. The new coordinates r(ϕ, θ) of every point (x, y, z) are estimated using linear interpolation. The discretization degree is computed as D = round(2π rmax) so that a voxel at the maximum radial distance rmax from the center of the transformation maps to a single voxel in spherical space. Intuitively, as a result of this step, continuous patches of surface in spherical coordinates represent an unwrapped object boundary. The following features are derived from the object surface in spherical coordinates: the average, standard deviation, and skewness of the radius. These features help to characterize the object size and shape and its similarity to a fully symmetrical portion of a sphere. The areas where there is no boundary voxel for a given (ϕ, θ) correspond to the place where the polyp is attached to the colon wall, or just touches other portions of the colon wall, a fold, or stool. Usually the largest gaps in the image r(ϕ, θ) correspond to the base of a polyp, or attachment location. Additionally, there can be other gaps where the object boundary is absent or is in contact with another portion of the colon wall. The neighbors of each empty point can be located much more easily in spherical than in Cartesian space. For example, in the case of an empty hemisphere with random holes, only the set of points from the rim is needed to interpolate the base, and other sets of points from the edge of each hole to fill the holes. In our implementation, for each empty voxel two closest neighbors are determined from the existing surface patches, first in the vertical (ϕ) and then in the horizontal (θ) direction (more neighbors could be used for better precision). Fig. 2 shows a sketch in Cartesian coordinates of a section to interpolate, where the two closest neighbors for a fixed value of θ have been identified. These two points (identified by the tips of the arrows from the center) and the center of coordinates form a triangle for which two sides (radii r1 and r2) and the angle between them, ϕ0 in the ϕ-direction (θ0 in the θ-direction), are known. By solving this triangle relation, it is possible to interpolate all points lying along the third side (the dashed line segment connecting the two endpoints of radii r1 and r2), knowing the angle between the third side and the radial line to be interpolated. For any empty point in the spherical coordinate system, the values of ϕ and θ are known and so is the angle with the origin of the
transformation. For each value of ϕi between ϕ1 and ϕ2, each interpolating radius value spanning the linear segment can be computed. Segment interpolation is linear in Cartesian space; in spherical space it spans a concave arc (see Fig. 2, right).
Fig. 2. Left: Interpolation of cutting surface in Cartesian space. Right: Interpolation results in spherical space.
Final values of the radii in the empty spots are computed as the average of the ϕ and θ neighbors. Once the interpolation is completed in spherical space, all points of the interpolated surface are mapped back to Cartesian coordinates.
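A sketch of the spherical unwrapping just described, assuming the boundary voxels are given as an (N, 3) array; function and variable names are ours, and gaps in the resulting r(ϕ, θ) map are marked with NaN for the subsequent interpolation.

```python
import numpy as np

def unwrap_to_spherical(voxels):
    """Map boundary voxels to a D x D image of radii r(phi, theta)."""
    center = voxels.mean(axis=0)                 # center of mass as origin
    d = voxels - center
    r = np.linalg.norm(d, axis=1)
    theta = np.arctan2(d[:, 1], d[:, 0])         # azimuth in [-pi, pi]
    phi = np.arccos(np.clip(d[:, 2] / np.maximum(r, 1e-9), -1.0, 1.0))
    D = int(round(2 * np.pi * r.max()))          # discretization of the text
    rmap = np.full((D, D), np.nan)               # NaN marks gaps to interpolate
    ip = np.clip((phi / np.pi * (D - 1)).astype(int), 0, D - 1)
    it = np.clip(((theta + np.pi) / (2 * np.pi) * (D - 1)).astype(int), 0, D - 1)
    rmap[ip, it] = r
    return rmap, center
```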
Fig. 3. Left: 2D view of a rectal polyp shows no intensity difference between the polyp and the colon fold it is attached to. The air/polyp boundary is shown by the dotted white line and the black dashed line shows the cutting line to be interpolated. Middle: Rendered view of the top polyp surface. Right: Rendered view of the interpolated cutting surface. The segmented voxels belonging to the polyp are highlighted in the rendered views. (Rendered views are not to scale with the 2D view.)
3 A Polyp Detection System
3.1 Candidate Generation
The data pre-processing phase includes colon segmentation and a transformation to render the volume isotropic. The candidate generation approach is based on the observation that planar cross-sections of the volume at different orientations yield roughly circular cuts only in the case of polyps. After the segmentation of the lumen (air inside the colon), our implementation uses 13 planar orientations to search for cross-sections with such characteristics around each voxel. The overall algorithm is
computationally efficient and produces a rather low number of candidates per volume, typically fewer than 60 candidates per volume (over 100 candidates per patient).
3.2 False Positive Reduction: Classification
The next step aims at filtering out most of the false positives by means of features characterizing the resemblance of the detections to our model (described in Section 2). A greedy search method [6] was used for feature selection from the set of features described throughout the paper (see Table 1), and a quadratic discriminant analysis (QDA) classifier [7] was used for false positive reduction.

Table 1. Set of features for classification chosen by a greedy search method
    #    Feature    Feature description
    1    SH         Mean curvature slope
    2    StdKc      Standard deviation of K in the central ring
    3    StdHc      Standard deviation of H in the central ring
    4    StdHb      Standard deviation of H in the polyp inflection belt
    5    d2         Second maximum diameter
    6    d3         Third maximum diameter
    7    λ1         Ratio d1/d2 (first to second maximum diameter)
    8    MeanInt    Average intensity inside the object
    9    MaxInt     Maximum intensity inside the object
    10   StdInt     Standard deviation of intensity inside the object
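The greedy feature selection with a QDA classifier, as used to produce Table 1, can be sketched as follows; the selection criterion (5-fold cross-validated accuracy) and the stopping rule are our assumptions, since the text leaves these choices open.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def greedy_select(X, y, max_features=10):
    """Forward greedy selection: add the feature that best improves QDA."""
    remaining, chosen = list(range(X.shape[1])), []
    best_score = -np.inf
    while remaining and len(chosen) < max_features:
        scores = [(np.mean(cross_val_score(QuadraticDiscriminantAnalysis(),
                                           X[:, chosen + [f]], y, cv=5)), f)
                  for f in remaining]
        score, f = max(scores)
        if score <= best_score:
            break                         # no candidate improves the criterion
        best_score = score
        chosen.append(f)
        remaining.remove(f)
    return chosen
```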
4 Experiments and Results
4.1 Data Characteristics
In order to evaluate the discriminating power of the features computed from curvature patterns, we used a dataset of 168 patients with 329 prone and supine volumes (some cases did not have both prone and supine studies) from high-resolution CT scanners, obtained from four different hospitals. These included both patients with polyps (positive cases, n=81) and patients without polyps (negative cases, n=87). The sensitivity and false positive rate of our system was established with respect to CTC by comparison to results from concurrent fiber-optic colonoscopy. The datasets were randomly partitioned into two groups: a training set (n=96, 187 volumes) and a test set (n=72, 142 volumes). The training set had a total of 76 polyps: 33 small (smaller than 6 mm), 30 medium (6 to 9.99 mm), and 13 large (10 to 20 mm). The test set had a total of 56 polyps: 23 small, 22 medium, and 11 large. Polyps smaller than or equal to 3 mm were then excluded from both training and test sets, leaving 71 and 55 polyps in the training and test sets, respectively.
The test set was sequestered and only used to evaluate the performance of the final system. The training set was used to design the classifier and to automatically select the relevant features as described above. The system's performance was determined on the training set using a Leave-One-Patient-Out (LOPO) cross-validation procedure. In this scheme, both supine and prone views of a case are left out. The classifier is trained using the volumes from the remaining N-1 cases and tested on the volumes of the left-out case. This process was repeated N times, and the resulting testing errors were averaged to determine the LOPO error. In the computation of the per-patient sensitivity, a polyp was considered as found if it was detected in at least one of the volumes (supine or prone) from the same patient.
4.2 Results
Results on the Training Set: The candidate generation yielded an average of 51.7 candidates per volume while missing 6 small (2 of which were 3 mm or smaller) and 2 large polyps. LOPO cross-validation yielded a false positive rate of 4.9 per volume and an overall sensitivity of 84.4%. When the polyps lost during the candidate generation phase are taken into account, the overall sensitivity reduces to 77.5%.
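A minimal sketch of the LOPO scheme, grouping both volumes of a patient so they are held out together; the per-candidate feature matrix X, labels y, and patient grouping are assumed inputs.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import LeaveOneGroupOut

def lopo_predictions(X, y, patient_ids):
    """Train on N-1 patients, predict on the held-out patient's volumes."""
    pred = np.empty_like(y)
    for tr, te in LeaveOneGroupOut().split(X, y, groups=patient_ids):
        clf = QuadraticDiscriminantAnalysis().fit(X[tr], y[tr])
        pred[te] = clf.predict(X[te])
    return pred
```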
Fig. 4. ROC curves for LOPO, training and test rates for false positive reduction step (without candidate generation losses taken into account)
Results on the Test Group: The test group was used for evaluation and provided an estimate of the performance on completely unseen data. The candidate generation phase yielded an average of 47.9 candidates per volume while missing 7 small (one of
which was smaller than 3 mm) and 1 medium polyp. We obtained a false positive rate of 4.8 per volume and an overall sensitivity of 84.8%. The overall sensitivity on the test set comes to 73.6% when the polyps lost during candidate generation are taken into account. A variation of the receiver operating characteristic (ROC) curve, plotting the sensitivity of the QDA classifier using the set of features listed in Table 1 against its false positive rate (without candidate generation losses taken into account), is shown in Fig. 4 below. The plot shows ROC curves for LOPO validation, training and test sets. The operating point on the ROC curve was chosen according to the desired classifier sensitivity of around 85% with a false positive rate of less than 5 for both the test set and the LOPO validation.
5 Conclusion
In this paper we presented a method for colonic polyp detection and generation of features for classification based on the symmetry of curvature patterns. The high sensitivity and low false positive rate achieved demonstrate that the proposed algorithm generates highly discriminative features suitable for computer-aided colonic polyp detection. In fact, when used in conjunction with other features, it allowed the detection system to reach an overall sensitivity of 94% with a false positive rate of 4.3 per volume [3]. This approach could also be applied to the detection and segmentation of other abnormal anatomical structures such as aneurysms or lung nodules. The curvature pattern based polyp segmentation method used as a part of this system could be used to automatically obtain accurate polyp size measurements [10]. Finally, while classification was not the focus of this paper, the false positive rate could be further reduced by using more advanced classification methods such as those described in [8], [9], by applying a detection ranking scheme, and by introducing additional features.
References
1. Pickhardt, P.J., Choi, J.R., Hwang, I., Butler, J.A., Puckett, J.L., Hildebrand, H.A.: Computed Tomographic Virtual Colonoscopy to Screen for Colorectal Neoplasia in Asymptomatic Adults. NEJM 349(23) (2003) 2191–2200
2. Sosna, J., Morrin, M.M., Kruskal, J.B., Lavin, P.T., Rosen, M.P., Raptopoulos, V.: CT Colonography of Colorectal Polyps: A Metaanalysis. AJR 181 (2003) 1593–1598
3. Bogoni, L., Cathier, P., Dundar, M., Jerebko, A., Lakare, S., Liang, J., Periaswamy, S., Baker, M.E., Macari, M.: Computer-aided detection (CAD) for CT Colonography: a tool to address a growing need. The British Journal of Radiology 78 (2005) 57–62
4. Yoshida, H., Nappi, J., MacEneaney, P., Rubin, D.T., Dachman, A.H.: Computer-aided diagnosis scheme for detection of polyps at CT colonography. Radiographics 22(4) (2002)
5. Thirion, J.P., Gourdon, A.: Computing the Differential Characteristics of Isointensity Surfaces. Computer Vision and Image Understanding 61(2) (1995) 190–202
6. John, G.H., Kohavi, R., Pfleger, K.: Irrelevant Features and the Subset Selection Problem. International Conference on Machine Learning (1994)
7. Fukunaga, K.: Introduction to Statistical Pattern Recognition. San Diego: Academic Press (1990)
8. Jerebko, A.K., Malley, J.D., Franaszek, M., Summers, R.M.: Multinetwork classification scheme for detection of colonic polyps in CT colonography data sets. Acad Radiol 10 (2003) 154–160
9. Jerebko, A.K., Malley, J.D., Franaszek, M., Summers, R.M.: Computer-aided polyp detection in CT colonography using an ensemble of support vector machines. CARS 2003, Vol. 1256 (2003) 1019–1024
10. Jerebko, A.K., Lakare, S., Mang, T., Mueller, C., Salovich, D., Schima, W., Wessling, J., Bogoni, L.: Evaluation of an automatic polyp measurement tool compared with experienced observers: A comparative study. ECR 2005
3D Reconstruction of Coronary Stents in Vivo Based on Motion Compensated X-Ray Angiograms
Babak Movassaghi1,3, Dirk Schaefer2, Michael Grass2, Volker Rasche2, Onno Wink1,3, Joel A. Garcia3, James Y. Chen3, John C. Messenger3, and John D. Carroll3
1 Philips Research North America, 345 Scarborough Road, Briarcliff Manor, New York, USA
[email protected]
2 Philips Research Laboratories, Sector Technical Systems Hamburg, Roentgenstrasse 24-26, D-22335 Hamburg, Germany
3 University of Colorado Health Sciences Center, Division of Cardiology, 4200 East Ninth Avenue, Denver, Colorado, USA
Abstract. A new method is introduced for the three-dimensional (3D) reconstruction of coronary stents in vivo, utilizing two-dimensional projection images acquired during rotational angiography (RA). The method is based on the application of motion compensation techniques to the acquired angiograms, resulting in a temporal snapshot of the stent within the cardiac cycle. For the first time, results of 3D reconstructed coronary stents in vivo with high spatial resolution are presented. The proposed method allows for a comprehensive and unique quantitative 3D assessment of stent expansion that rivals current x-ray and intravascular ultrasound techniques.
1 Introduction
Stent deployment is currently the preferred vascular interventional procedure for stenosis treatment. During a percutaneous transluminal coronary angioplasty (PTCA) procedure, a stent is delivered to the stenotic segment of a coronary artery using a balloon-tipped delivery catheter. The stent is crimped on the deflated balloon before insertion into the patient. The balloon catheter has two metallic markers with high absorption coefficients, which help the cardiologist locate the balloon position in the angiograms. Once in place, the balloon is inflated, expanding the stent, which locks in place and forms a scaffold. The complete expansion of the stent is essential to treating the stenosis. Inadequate expansion of the stent is a major predisposing factor for acute thrombosis, the formation of a blood clot within the stent resulting in loss of flow and an acute myocardial infarction, i.e. heart attack. Stents are positioned and deployed under fluoroscopic guidance. Although the current generation of stents is made of materials with some degree of radiopacity to enable detecting their location after deployment, proper stent expansion is hard to assess. A considerable amount of work has been performed in order to visualize the coronary stent for improved positioning and outcome control, utilizing different non-invasive [1,2,3,4,5] and invasive [6,7] imaging modalities. The main limitation of existing imaging methods is the limited spatial resolution, mainly caused by cardiac and respiratory motion. While the future trend is heading towards noninvasive
modalities for diagnosis and post-operative follow-up, there remains an interest in having a 3D representation of the deployed stent during a PTCA intervention using the same imaging modality, i.e. X-ray, as used for the stent deployment. The main challenge for constructing this 3D representation is the presence of cardiac and respiratory motion. One possible solution to derive more accurate images of moving objects such as the heart is to apply motion compensation techniques to the acquired data [8,9,10,11,12,13]. In this paper, an approach to accomplish a three-dimensional reconstruction of coronary stents in vivo is presented. In this approach, the presence of markers on the balloon catheter is used to accurately determine their position, thus facilitating motion compensation. A reconstruction with high spatial resolution is achieved by including additional motion compensated projections in the reconstruction. The paper is organized as follows: in Section 2, the method for accomplishing motion compensated 3D reconstruction of coronary stents is introduced. In Section 3, initial results on the application of the proposed method in a clinical setting are presented. Conclusions and suggestions for future work are given in Section 4.
2 Method
In this section the methodology to derive high-resolution 3D reconstructed stents in vivo is presented. The method described in this work is based on the assumption that the deployed stent has negligible movement relative to the enclosed part of the guide wire. Therefore, the extraction of the motion of the stent is based on determining the motion of the markers on the balloon catheter. First, the data acquisition protocol is introduced. Second, the image processing methods for determining the position of the two markers are discussed. Third, it is described how the principles underlying coronary modeling can be utilized for generating an initial 3D representation of the two markers in space. Finally, a 2D transformation matrix is determined for each projection image, which can be used for motion compensated coronary stent reconstruction.
2.1 Data Acquisition
The projection data are obtained during a continuous rotation of a calibrated monoplane C-arm X-ray system (Philips Integris Allura, Philips Medical Systems, Best, The Netherlands). A propeller rotation starting at 60° RAO and ending at 120° LAO is used, which ensures sufficient angular coverage for volume reconstruction. The angiographic projections were acquired at a frame rate of 30 frames per second, resulting in 180 projection images during a 6 second rotational run. During acquisition the ECG signal is recorded in order to retrospectively determine projections acquired in the same cardiac phase. All data sets were acquired with a 512x512 image matrix.
2.2 2D Marker Extraction
Figure 1 illustrates a typical X-ray angiogram, acquired immediately after stent deployment in a routine interventional procedure, which is utilized to ensure correct deployment and positioning of the stent (inspection of the borders). In such images, the balloon
Fig. 1. X-ray angiogram, acquired immediately after a stent deployment. The markers can clearly be differentiated due to the high absorption coefficient.
markers can clearly be differentiated due to their high absorption coefficient. The guide wire, of smaller size, can also be identified. However, the deployed stent surrounding the guide wire between the markers is barely visible. The marker positions in all available projections are automatically extracted with a segmentation algorithm, where the sub-pixel precise identification of the marker center point is achieved using a coarse-to-fine approach utilizing scale space techniques as described in [14,15]. For each of the two markers (M1 and M2), this step yields a 2D position (x, y)i, which depends on the projection number i.
2.3 Reference 3D Marker Position
Based on the ECG signal, two projection images belonging to the same cardiac phase are automatically selected, where cardiac phases in end-systole or late diastole are preferred. In order to reconstruct a point P3D ∈ R3 in space from two projections, first its corresponding image points PA ∈ R2 and PB ∈ R2 in the projections must be determined, as shown in Figure 2(a). With the knowledge of the exact projection parameters, including the location and orientation of the two focal spots FA and FB and the detector locations, the correspondence can be solved using the epipolar constraint [16]. The point PB is determined by searching only along the epipolar line in projection B. The spatial position P3D is computed by a simple intersection of the lines from the points on the projections to their respective focal points. If the points PA ∈ R2 and PB ∈ R2 are acquired for different motion states, the connecting lines from the points on the projections to their respective focal spots may not intersect (see Figure 2(b)). Nevertheless, the approximate 3D position can be determined by calculating the midpoint of the line with the smallest Euclidean distance between the two connecting lines, as illustrated in Figure 2(b).
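The midpoint construction of Figure 2(b) has a standard closed form. A sketch follows; all inputs are 3D points (focal spots FA, FB and the back-projected detector points PA, PB), and the degenerate case of near-parallel rays is not handled.

```python
import numpy as np

def triangulate_midpoint(FA, PA, FB, PB):
    """Midpoint of the shortest segment between rays FA->PA and FB->PB."""
    u = (PA - FA) / np.linalg.norm(PA - FA)   # direction of ray A
    v = (PB - FB) / np.linalg.norm(PB - FB)   # direction of ray B
    w0 = FA - FB
    a, b, c = u @ u, u @ v, v @ v
    d, e = u @ w0, v @ w0
    denom = a * c - b * b                     # ~0 only for parallel rays
    s = (b * e - c * d) / denom               # parameter on ray A
    t = (a * e - b * d) / denom               # parameter on ray B
    return 0.5 * ((FA + s * u) + (FB + t * v))
```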
Fig. 2. Illustration of the 3D modeling technique in order to reconstruct a point P3D ∈ R3 in space from two projections using the epipolar constraint. If the points PA ∈ R2 and PB ∈ R2 are acquired in the same motion state the connecting lines from the points on the projections to their respective focal spot intersect (a). If the points PA and PB are acquired for different motion states, the connecting lines may not intersect (b) and the 3D position can be approximated by calculating the midpoint of the line with minimum Euclidean distance between the two connecting lines (b).
2.4 Forward Projection
The 2D position corresponding to a 3D point P3D in an arbitrary projection C can be determined by the forward projection of P3D onto the projection plane C. The forward projection of a 3D location P3D = (Px, Py, Pz), for a given orientation α of the C-arm, onto the projection plane C at the point PC = (xc, yc) is described by the projection matrix Mα, according to:

    ( x1 )           ( Px )
    ( x2 )  =  Mα  · ( Py )     with  xc = x1/x3 ,  yc = x2/x3 .        (1)
    ( x3 )           ( Pz )
                     (  1 )

In our application, the feature points (markers) are subject to periodic movement due to cardiac and respiratory motion. Every projection is acquired at a different projection geometry (Gi) and belongs to a specific cardiac motion state (Tm).
2.5 3D Motion Compensated Stent Reconstruction
In this section we introduce the methods for determining the motion of the coronary arteries, so as to compensate for this motion in projection images acquired in a different heart phase Tc0. To this end, the 2D positions of the two balloon markers are determined in each projection as described in Section 2.2. Based on two images, a 3D model of the two markers is built as explained in Section 2.3. The two 3D markers are then forward projected into the acquired projections with different projection geometries and different motion states. Therefore, in each projection two positions are determined for every marker: the actual position for this specific projection geometry (Gi) and cardiac motion state (Tm), and a calculated position of the marker at this projection geometry (Gi) but at the cardiac motion state Tc0 (see Figure 3). All projections are then motion compensated into cardiac motion state Tc0 by determining the transformation that brings the actual position of the marker into correspondence with the calculated position. In this initial study an affine transformation is applied.
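Eq. (1) is a homogeneous matrix-vector product followed by perspective division; a minimal sketch (Mα is the calibrated 3x4 projection matrix):

```python
import numpy as np

def forward_project(M_alpha, P3D):
    """Project a 3D point onto the detector plane, Eq. (1)."""
    x1, x2, x3 = M_alpha @ np.append(np.asarray(P3D, float), 1.0)
    return np.array([x1 / x3, x2 / x3])      # (xc, yc)
```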
Fig. 3. A projection acquired at motion state Tm before (a) and after warping (b) into the optimal cardiac phase (Tc0 ) based on a Thin Plate Spline (TPS) transformation matrix (Φ)
All corresponding points, such as the reference markers, are warped exactly. The other pixels in each slice are warped linearly depending on their distance to the landmarks. In Figure 3, a projection is shown for a motion state Tm (a) together with its warped version (b), transformed into cardiac phase Tc0. Using this approach all projection images are motion compensated into cardiac phase Tc0. These projections are then used as input for a standard 3D filtered back-projection reconstruction algorithm such as Feldkamp [17].
2.6 Initial Experiments on Patient Data
In this work we present initial results on patient data sets from an ongoing study carried out at the Division of Cardiology at the University of Colorado Health Sciences Center (UCHSC). The objective of the study is to investigate the feasibility of fully automated 3D reconstruction of deployed coronary stents and other intra-cardiac devices in vivo, to enable device assessment and monitoring immediately after deployment and long-term. Angiograms are selected for inclusion in this study from patients who undergo coronary stent deployment in the proximal or mid left anterior descending (LAD), the proximal or mid circumflex (CX) artery, obtuse marginal branches (OM), or the proximal, mid, or distal right coronary artery (RCA). Up to now, data sets from 11 patients with intracoronary stents (CYPHER Sirolimus-eluting Coronary Stent) have been acquired. Depending on the size of the respective stenosis, three different stent sizes were used: the diameter of the stents was either 2.75 mm or 3.0 mm, and the stent length (unexpanded) varied between 8 mm and 23 mm. The acquisition protocol was carried out as explained in Section 2.1. The acquisition was performed directly after stent implantation. The heart rate of the patients varied between 52 and 71 beats per minute. Since the center of the heart is not on the central axis of the patient but located within the left thorax, the patients had to be positioned off-iso-center to ensure proper alignment of the iso-center of the system with the center of the heart. The diameter of the image intensifier was preferably chosen at 5 inch. However, the acquisition of large patients was carried out at 7 inch, assuring the correct iso-centering of the system with the center of the markers.
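A hedged sketch of the per-projection compensation step: the transform mapping the actual marker positions to their forward-projected reference positions (phase Tc0) is estimated and applied to the whole frame. Note that two marker correspondences determine only a similarity transform, which is what OpenCV's estimateAffinePartial2D fits here; the affine/TPS variants mentioned in the text need additional correspondences (e.g., guide wire points).

```python
import cv2
import numpy as np

def compensate_frame(frame, markers_actual, markers_ref):
    """markers_actual/markers_ref: (N, 2) arrays of 2D marker positions."""
    M, _ = cv2.estimateAffinePartial2D(
        markers_actual.astype(np.float32), markers_ref.astype(np.float32))
    # Warp the whole projection into the reference cardiac phase Tc0
    return cv2.warpAffine(frame, M, (frame.shape[1], frame.shape[0]))
```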
Fig. 4. Volume rendered images of a 3D reconstructed stent deployed in patient one, utilizing the original data set (a) and the motion compensated data set (b). The struts of the stent have a diameter of approximately 40 µm. Cut planes perpendicular (c) and parallel (d) to the longitudinal direction of the volume rendered images of the reconstructed stent are visualized, revealing information about the in-stent lumen.
Fig. 5. Illustration of volume rendered images of a deployed stent in patient two. Different cut planes perpendicular to the tangential direction of the stent have been chosen to visualize the inner-stent lumen.
3 Results
In all 11 cases the deployed stent could be reconstructed successfully, yielding high resolution images. The image quality varied with patient size; the best results were achieved in small patients (< 60 kg). In Figure 4, exemplary images of a 3D
reconstructed stent in vivo, utilizing the original data set (Figure 4(a)) and a motion compensated data set (Figure 4(b)), are presented. From Figure 4(b) it can be seen that the meshes (Ø < 114 µm) of the stent can be visualized, which shows that a very high spatial resolution is achieved. In Figures 4(c,d), cut planes perpendicular (c) and parallel (d) to the longitudinal direction of the stent are visualized, revealing information about the in-stent lumen. Volume rendered images of stents in different patients are illustrated in Figure 5. Different cut planes perpendicular to the tangential direction of the stent have been chosen to visualize the inner-stent lumen. Different viewing angles are selected to demonstrate the 3D aspect.
4 Discussion
A fully automatic method has been presented that produces a 3D reconstruction of coronary stents after deployment in patients. The proposed method achieves a high spatial resolution (up to 40 µm) by including additional, motion compensated projections in the reconstruction algorithm. The method was applied to 11 patient data sets and is the subject of an ongoing study at the UCHSC. The coronary stents could be reconstructed and visualized in all cases. The best results were achieved utilizing the smallest image intensifier size (Ø 5 inch). The proposed method supports cardiologists by allowing an immediate assessment of the completeness of stent deployment and expansion during a PTCA procedure. This method of X-ray based automatic 3D reconstruction has the potential to replace the need for intravascular ultrasound. The 3D stent reconstruction can be used to identify which stents do and which do not need additional high-pressure balloon inflations to optimize treatment of the stenosis and minimize the risk of stent thrombosis. Unlike intravascular ultrasound, which requires additional equipment and adds more complexity to the procedure, 3D stent reconstruction uses the existing imaging system and will add minimal time to the procedure. Thus, the 3D stent reconstruction tool could provide a cost-effective and rapid assessment of stent deployment, leading to improved patient outcomes. Clinical studies showing the effects on economic and patient outcomes will be executed after further technical refinements. A limitation of the method is the need for the marker positions in each projection. Thus, the proposed method can only be carried out during an invasive procedure. Furthermore, the proposed method is only accurate for stent segments enclosing the guide wire segment between the markers. In future work, more accurate results might be achieved by 1) incorporating the guide wire between the two markers; 2) determining the 4D motion vector field of the markers and guide wire in space; 3) incorporating markers into the body of the stent itself, obviating the need to use the guide wire and delivery balloon markers.
References
1. K. Nieman, F. Cademartiri, R. Raaijmakers, P. Pattynama, and P. de Feyter, "Noninvasive angiographic evaluation of coronary stents with multi-slice spiral computed tomography," Herz, vol. 28, no. 2, pp. 136–142, 2003.
2. K. Nieman, J. M. R. Ligthart, P. W. Serruys, and P. J. de Feyter, "Left main rapamycin-coated stent: invasive versus noninvasive angiographic follow-up," Circulation, vol. 105, pp. 130–131, 2002.
3. N. Funabashi, N. Komiyama, and I. Komuro, "Patency of coronary artery lumen surrounded by metallic stent evaluated by three dimensional volume rendering images using ECG gated multislice computed tomography," Heart, vol. 89, pp. 388, 2003.
4. D. Maintz, K. U. Juergens, T. Wichter, M. Grude, W. Heindel, and R. Fischbach, "Imaging of coronary artery stents using multislice computed tomography: in vitro evaluation," Eur Radiol., vol. 13, no. 4, pp. 830–835, 2003.
5. D. Maintz, H. Seifarth, T. Flohr, S. Kramer, T. Wichter, W. Heindel, and R. Fischbach, "Improved coronary artery stent visualization and in-stent stenosis detection using 16-slice computed tomography and dedicated image reconstruction technique," Invest Radiol., vol. 38, no. 12, pp. 790–795, 2003.
6. K. Tanabe, F. J. Gijsen, M. Degertekin, J. M. Ligthart, R. M. Oortman, P. W. Serruys, and C. J. Slager, "Images in cardiovascular medicine. True three-dimensional reconstructed images showing lumen enlargement after sirolimus-eluting stent implantation," Circulation, vol. 106, no. 22, pp. e179–180, 2002.
7. P. Delachartre, E. Brusseau, F. Guerault, G. Finet, C. Cachard, and D. Vray, "Correction of geometric artifacts in intravascular ultrasound imaging during stent implantation," IEEE International Ultrasonics Symposium, Caesars Tahoe, Nevada, 1999.
8. B. Movassaghi, V. Rasche, M. A. Viergever, and W. J. Niessen, "3D coronary reconstruction from calibrated motion-compensated 2D projections based on semi-automated feature point detection," in Proc. SPIE Medical Imaging: Image Processing, San Diego, CA, 2004, vol. 5370, pp. 1943–1950.
9. C. Blondel, R. Vaillant, G. Malandain, and N. Ayache, "3-D tomographic reconstruction of coronary arteries using a precomputed 4-D motion field," Physics in Medicine and Biology, vol. 49, no. 11, pp. 2197–2208, 2004.
10. G. Shechter, F. Devernay, E. Coste-Maniere, A. Quyyumi, and E. R. McVeigh, "Three-dimensional motion tracking of coronary arteries in biplane cineangiograms," IEEE Trans. Med. Imag., pp. 493–503, 2003.
11. M. Moriyama, Y. Sato, H. Naito, M. Hanayama, T. Ueguchi, T. Harada, F. Yoshimoto, and S. Tamura, "Reconstruction of time-varying 3-D left-ventricular shape from multiview X-ray cineangiocardiograms," IEEE Trans. Med. Imag., vol. 21, no. 7, pp. 773–785, 2002.
12. C. Blondel, R. Vaillant, F. Devernay, G. Malandain, and N. Ayache, "Automatic trinocular 3D reconstruction of coronary artery centerlines from rotational X-ray angiography," in Computer Assisted Radiology and Surgery 2002 Proceedings (CAR '02), 2002, pp. 832–837, Springer.
13. B. Movassaghi, V. Rasche, R. Florent, M. A. Viergever, and W. J. Niessen, "3D coronary reconstruction from calibrated motion-compensated 2D projections," in Computer Assisted Radiology and Surgery (CARS 2003), 2003, pp. 1079–1084, Elsevier.
14. T. Netsch and H. O. Peitgen, "Scale-space signatures for the detection of clustered microcalcifications in digital mammograms," IEEE Trans. Med. Imag., vol. 18, no. 9, pp. 774–786, 1999.
15. T. Netsch, "A scale-space approach for the detection of clustered microcalcifications in digital mammograms," Digital Mammography '96, pp. 301–306, 1996.
16. T. J. Keating, P. R. Wolf, and F. L. Scarpace, "An improved method of digital image correlation," Photogrammetric Eng. Remote Sensing, vol. 41, pp. 993–1002, 1975.
17. M. Grass, R. Koppe, E. Klotz, R. Proksa, M. H. Kuhn, H. Aerts, J. Op de Beek, and R. Kemkers, "3D reconstruction of high contrast objects using C-arm image intensifier projection data," Comput. Med. Imaging Graphics, vol. 23, no. 6, pp. 311–321, 1999.
Retina Mosaicing Using Local Features
Philippe C. Cattin, Herbert Bay, Luc Van Gool, and Gábor Székely
Computer Vision Laboratory, ETH Zurich, 8092 Zurich, Switzerland
{cattin, bay, vangool, szekely}@vision.ee.ethz.ch
Abstract. Laser photocoagulation is a proven procedure to treat various pathologies of the retina. Challenges such as motion compensation, correct energy dosage, and avoiding incidental damage are responsible for the still low success rate. They can be overcome with improved instrumentation, such as a fully automatic laser photocoagulation system. In this paper, we present a core image processing element of such a system, namely a novel approach for retina mosaicing. Our method relies on recent developments in region detection and feature description to automatically fuse retina images. In contrast to the state of the art, the proposed approach works even for retina images with no discernible vascularity. Moreover, an efficient scheme to determine the blending masks of arbitrarily overlapping images for multi-band blending is presented.
1 Introduction
With the dramatic demographic shift towards an older population, age-related eye diseases, such as Glaucoma and Macular Degeneration, are about to become increasingly prevalent and to develop into a serious public health issue. However, age-related eye diseases are not the only emerging causes of visual loss. Diabetic retinopathy, which is directly linked to the increase of diabetes in the developed world, has already surpassed age-related diseases as the main cause of blindness. All the aforementioned potential causes of blindness can be treated with laser photocoagulation. The small bursts from the laser cause controlled damage that can be used to seal leaky blood vessels, destroy abnormal blood vessels, reattach retinal tears, or destroy abnormal tumorous tissue on the retina. Photocoagulation laser surgery is an outpatient procedure. The treatment is performed while the patient is seated in a chair; eye drops are given to dilate the pupil and to immobilise and anesthetise the eye. The laser treatment itself is currently performed manually and lasts from a few minutes up to half an hour, depending on the type of treatment and the number of laser spots to apply. Although photocoagulation is the most widely used treatment for the aforementioned conditions, the success rate is still below 50% for both the first treatment and possible re-treatments [1]. Many of these failures can be attributed to the manual nature of the procedure, in particular to the difficulty of quickly responding to the rapid eye and head movements of the patient, which cannot be completely stopped, and to the problem of quantitatively measuring and controlling the applied laser energy. As it takes several weeks before knowing whether laser surgery has been successful or re-treatment is required to prevent further vision loss, the success rate of the
procedure has to be maximised. Additionally, the visual recovery declines with each re-treatment. A fully automatic surgery under controllable and reproducible conditions would therefore be desirable. Computer vision methods for mosaicing retina images would be a useful tool to achieve these goals. Such mosaics could be used on the one hand for diagnosis and interventional planning, and on the other hand as a spatial map for the laser guidance system during treatment. Many retinal image registration methods have been proposed in the literature [2,3,4]. The automatic stitching method proposed by Can et al. marked an important milestone in retinal image mosaicing. They used bifurcations of the vascular tree on the retinal surface to establish image correspondences. These bifurcations are easy to detect but sometimes lack distinctiveness, and their localisation is often quite inaccurate. To improve the landmark extraction, Tsai proposed in [5] a model-based approach that greatly increased the accuracy and repeatability of estimating the locations where vascular structures branch or cross over. Stewart proposed in [6] a different approach that uses one or more initial correspondences defining the mapping only in a small area around these bootstrap regions. In each of these regions the transformations are iteratively refined using only information from this same area. The region is then expanded and tested to see if a higher order transformation model can be used. The expansion stops when the entire overlapping region of the images is covered. Finally, in [7] Tsai evaluated the performance of the two methods proposed by Can and Stewart in a clinically-oriented framework. All the previously mentioned methods are limited to cases with clearly visible vascular structures. Quite often, however, bleedings or tumorous tissue limit the number of detectable bifurcations or even prevent their proper segmentation; see Fig. 3(a) for an example. The work described below is part of a project aiming towards a fully automatic laser photocoagulation system. As will be argued, a key component of such a system is an automatic mosaicing method capable of stitching and blending the captured retina images. The method should be robust against all morphologies seen in the various pathologies subjected to laser photocoagulation.
2 Methods
Recent interest point detection/description schemes, such as SIFT [8], have brought about important progress in image mosaicing [9,10]. Yet, in our experiments, these state-of-the-art methods failed to identify a sufficient number of reliable, distinct landmarks for building retinal mosaics. The use of our very recently developed framework [11] proved more successful. The method for obtaining a mosaic image from multiple images of the retina, using region detectors/descriptors, can be summarised with the following steps. (1) Interest points are detected at different scales and their feature descriptors are calculated in each image individually. (2) The region descriptors are robustly matched for each image pair to find the overlapping images, using the second-nearest-neighbour ratio matching strategy [12]. (3) A graph theoretical algorithm
is used to find the anchor image that is connected to all the other images through the shortest path. (4) The mappings from all images into the coordinate system of the anchor image are linearly estimated and then globally optimised. (5) All images are warped into the coordinate frame of the anchor image, and multi-band blending is used to seamlessly fuse the individual retina images. The subsequent paragraphs describe the aforementioned steps in more detail.
(1) Interest point detection and feature descriptors: The first task for image mosaicing is the identification of as high a number of interest points as possible in all images of the same scene. As described in the introduction, the previously presented retina mosaicing methods used the branches and crossovers of the vascular tree as interest points. Although this approach works very well, it is bound to fail for pathological retinas, where the vascular tree is seriously occluded by bleeding or tumorous tissue; see Fig. 3(a) for an example. Only our recent developments in interest point detection and characterisation [11] made the application of this technology possible for the highly self-similar regions generally seen in retina images. Our algorithm proved to be superior to the state-of-the-art methods with regard to repeatability and stability. It uses a Hessian-matrix based detector in order to identify blob-like interest points and a 128-dimensional distribution-based vector to describe the neighbourhood of every interest point. Given a set of N input images I1, ..., IN for the construction of the mosaic, let each image Ii have a number Ni of interest points. As a first step, the image coordinates (u, v) of these interest points are extracted, yielding pi,1, ..., pi,Ni. Their respective feature descriptors are denoted ci,1, ..., ci,Ni. Once the interest points have been found, the overlapping images can be identified by establishing the correspondences between all image pairs.
(2) Matching: Given a pair of images Ii, Ij with their respective interest points and feature descriptors, for every interest point in the first image Ii we calculate the Euclidean distance to all feature descriptors in the second image Ij. If the ratio of the nearest neighbour e1NN and the second-nearest neighbour e2NN is smaller than the predefined threshold of 0.8, see [8,10], a match is assumed to be correct and is therefore added to the list of putative matches, see Fig. 1(a). Note that any given interest point pj,1..Nj in Ij can (in contrast to the interest points pi,1..Ni in Ii) be matched by more than one interest point from image Ii. Even if the human retina is not planar, the images appear flat as the ophthalmoscope captures only a small part of the whole retina. Therefore, homographies can be used to roughly approximate the transformation of the putative interest point matches for a given image pair. Let X be a point on a planar surface with projections xi and xj in two images taken from different viewpoints. The homography H describes the transformation that connects xi and xj for any point X on that surface, thus xj = Hxi. By robustly estimating the homography with the classical RANSAC scheme [13], the transfer error d(xj, Hxi) can be used to filter out possible mismatches from the list of putative matches. As the initial matches are paired using a very conservative feature vector comparison, many pairs are not matched in the first step.
Fig. 1. (a) Correspondences after matching SURF features, (b) after the guided matching. (c) Example graph of the connected images after pairwise matching.
The advantage of this initial conservative selection is that the ratio of inliers to outliers is better, and thus a correct homography can be found using RANSAC. Once the initial homography is established, more correspondences can be found with a guided matching method. Knowing the homography, the final correspondences are established by ignoring the feature descriptors and just matching interest points if they are spatially close enough; see Fig. 1(b) for an example.

(3) Anchor Image Selection: Once the correspondences between all image pairs have been determined, the anchor image I_A is identified. The selection of the anchor image among the available images, I_A ∈ {I_1, ..., I_N}, plays a crucial role in image mosaicing, as the anchor defines the base coordinate system towards which all other images are warped. A central image having direct correspondences with all other ones in the data set would be an ideal candidate. However, such an image usually does not exist. The next logical candidate is thus an image that connects to all other images through the least number of connections; see Fig. 1(c). Graph theoretical methods [14] make it possible to calculate the minimum distances between nodes given the connectivity matrix. If more than one node is connected to the rest with the minimal number of connections, the node with the highest number of correspondences with its neighbouring images is selected.

(4) Mapping Estimation: Even if overlapping regions of image pairs can be related with homographies, a planar transformation model for the mosaic would result in a false representation of distances and angles. The curved nature of the retina can best be taken into account by using a quadric transformation model Θ, as proposed by Can et al. in [4]. It transforms a point x = (x, y) of an input image to a point x' = (x', y') in the anchor image I_A coordinate system. In a first step the transformations Θ_{iA} for each image I_i ∈ {I_1, ..., I_N} \ I_A overlapping with the anchor image I_A are estimated by solving with SVD the linear equation system shown in Eq. 1:

\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \theta_{11} & \theta_{12} & \theta_{13} & \theta_{14} & \theta_{15} & \theta_{16} \\ \theta_{21} & \theta_{22} & \theta_{23} & \theta_{24} & \theta_{25} & \theta_{26} \end{pmatrix} \begin{pmatrix} x^2 \\ xy \\ y^2 \\ x \\ y \\ 1 \end{pmatrix} \qquad (1)
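As an illustration of this step, a minimal least-squares sketch of fitting the quadric model of Eq. 1 to matched points; this is an assumed, simplified variant rather than the authors' implementation (np.linalg.lstsq is itself SVD-based):

```python
import numpy as np

def fit_quadric(src, dst):
    """Least-squares fit of the 12-parameter quadric model of Eq. 1.

    src, dst: (n, 2) arrays of matched points (x, y) in the input image and
    (x', y') in the anchor image. Returns Theta as a 2x6 matrix.
    """
    x, y = src[:, 0], src[:, 1]
    # Monomial basis (x^2, xy, y^2, x, y, 1) for every correspondence.
    M = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    # Solve M @ Theta.T ~= dst in the least-squares sense.
    theta_t, *_ = np.linalg.lstsq(M, dst, rcond=None)
    return theta_t.T
```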
For all other images having no overlap with the anchor image a different, indirect approach is followed. As an example, the mapping of image I_5 to the anchor image I_A in Figure 1(c) can be estimated indirectly via the already known transformations Θ_{2A} and Θ_{3A}. To ensure the most accurate estimation of the indirect transformations, all possible paths¹ are used when building the linear equation system. These initial transformation estimates Θ_{iA} are then subject to a global optimisation in which the sum of squared reprojection errors is minimised (Sec. 5.2 in [15]).

(5) Image Warping and Blending: At this stage, the geometrical relationship between the anchor image I_A and all its surrounding images is known. In a first step, all images I_i are warped into the spatial domain of the anchor image I_A, forming a new image I_i^0. As only the forward mapping Θ_{iA} is known and inverting this transformation proved to be difficult [15], interpolation is used to calculate all pixels in I_i^0. The intensity at each pixel location x = (x, y) in I_i^0 is the weighted average of all the intensities of I_i that map to within a pixel radius. With all images mapped into the same coordinate system, the images are combined to form a single visually appealing mosaic with invisible image boundaries. Ideally, corresponding image pixels should have the same intensity and colour in every image. In practice, however, this is not the case. Possible causes are inhomogeneous illumination, changes in aperture and exposure time, misregistrations of the mosaicing procedure, etc. Thus, advanced image blending methods are required to yield visually pleasing results. In order to preserve the fine structures seen in retina images while smoothing out low frequency variations caused by irregular illumination, we decided to use the multi-band blending method proposed by Burt [16]. The idea behind this approach is to blend the high frequencies over a small spatial range and the low frequencies over a large spatial range. This can be performed for multiple frequency bands. In our implementation, we consider 6 octaves using a Laplacian pyramid. The main difficulty with this approach is the design and implementation of suitable and computationally efficient blending masks for the arbitrarily overlapping images. In order to determine this blending mask for every image, we assigned a weight w_i(x, y) to each pixel in an image that varies from 1 in the center of the image to 0 at the edge. The choice of this weighting function was inspired by the observation that (a) retina images tend to be more focused in the middle and (b) the illumination deteriorates with increasing distance from the image center. The weights have to be warped into the anchor coordinate system together with the image pixels. In case pixels from multiple images are mapped to the same pixel location in the anchor domain, the pixel of the image I_i^0 with the strongest weight w_i(x, y) is considered. By evaluating this decision for each pixel location we obtain a binary blending mask M_i^0 for each warped image I_i^0. For the blending to work properly (see [16]), pixels at blending borders have to be averaged. This can easily be implemented by dilating all the masks M_i^n at each level n of the Laplacian pyramid by one pixel and then dividing the fused Laplacian image at each level by the number of images that contributed to a certain pixel.
¹ Applied to the example in Fig. 1(c), Θ_{5A} is estimated via Θ_{2A} as well as via Θ_{3A}.
Fig. 2. A summary of the steps involved in the multi-band blending of two images (the steps for the second image are in light gray) over three octaves. First, the warped images I_i^0 and their blending masks M_i^0 are used to generate the Gaussian pyramid levels I_i^1, I_i^2, ... and masks M_i^1, M_i^2, ... . The Laplacian pyramid is computed as the difference between adjacent Gaussian levels (one of them must be expanded first). The Laplacian pyramid is then masked with the dilated blending mask (&) and summed for every image in the pyramid. Finally, the resulting blended image r^0 is generated by expanding and summing all levels of the pyramid.
The data flow diagram to blend two images over three octaves is depicted in Fig. 2.
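For readers unfamiliar with the technique, the following is a minimal, self-contained sketch of multi-band blending with Laplacian pyramids. It is not the authors' implementation: instead of dilating binary masks and dividing by contributor counts, it normalises each band by soft mask weights, and all names are illustrative.

```python
import numpy as np
from scipy.ndimage import zoom, gaussian_filter

def multiband_blend(images, masks, levels=3):
    """Blend aligned 2D images with binary masks via Laplacian pyramids.

    images: list of float arrays (same shape); masks: list of binary arrays
    that partition the mosaic domain. Low frequencies are mixed over a large
    spatial range, high frequencies over a small one.
    """
    def gauss_pyr(img):
        pyr = [img]
        for _ in range(levels - 1):
            pyr.append(zoom(gaussian_filter(pyr[-1], 1.0), 0.5, order=1))
        return pyr

    def expand(img, shape):
        return zoom(img, np.array(shape) / np.array(img.shape), order=1)

    blended, weight = None, None
    for img, mask in zip(images, masks):
        gp, mp = gauss_pyr(img), gauss_pyr(mask.astype(float))
        # Laplacian pyramid: band-pass levels plus the coarsest Gaussian level.
        lp = [gp[k] - expand(gp[k + 1], gp[k].shape) for k in range(levels - 1)]
        lp.append(gp[-1])
        contrib = [l * m for l, m in zip(lp, mp)]
        if blended is None:
            blended, weight = contrib, mp
        else:
            blended = [b + c for b, c in zip(blended, contrib)]
            weight = [w + m for w, m in zip(weight, mp)]
    # Normalise each band by the accumulated mask weight, then collapse.
    bands = [b / np.maximum(w, 1e-6) for b, w in zip(blended, weight)]
    out = bands[-1]
    for b in reversed(bands[:-1]):
        out = b + expand(out, b.shape)
    return out
```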
3 Results
We have applied the proposed technique to acquisitions from a patient database with retinal images showing various pathologies. In particular, we also investigated cases where the vascular tree is not clearly visible and thus the state-of-the-art methods would fail. The images were taken with a high resolution digital camera (2256 × 2032 pixels) but scaled to half their size, i.e. 1128 × 1016 pixels, to assess the performance on the more common lower resolution images. We evaluated the robustness of the matching step that automatically finds overlapping image pairs using a set of 100 pairs. The matching method failed in 6 cases. All of these cases had an overlap of less than 20.2% (average 11.4%). On the other hand, it successfully managed to match 12 image pairs with an overlap of 20% or less (average 14.3%). Per image pair, an average of 311 correspondences (max 1763, min 9) were found, and none for the failed cases. The sample mosaics depicted in Fig. 3 give a visual impression of the efficacy of the proposed method. It can clearly be seen that even retinas without visible vascularisation in the overlapping region can be correctly mosaiced. The average reprojection error for these mosaics was 1.01 ± 0.01 pixels.
Fig. 3. Two contrast-enhanced sample mosaics with different pathologies, composed of (a) 6 and (b) 7 retina images, with multi-band blending over 6 octaves
4 Conclusions
In this paper, we described an approach to mosaic retina images relying on our recently published results for region detection and characterisation [11]. We demonstrated that the SURF method can be efficiently used to mosaic highly self-similar retina images, even for cases with no discernible vascularisation. We also proposed an efficient way to determine the blending masks required for multi-band blending of arbitrarily overlapping images. The algorithm is currently being integrated into the interventional planning system. At the same time, we are working on approaches that utilise the applied feature matching method for intra-operative navigation support.
Acknowledgments

This work has been supported by the CO-ME/NCCR research network of the Swiss National Science Foundation (http://co-me.ch). We thank the University Hospital in Berne, Switzerland, for providing the retina images.
References

1. Zimmer-Galler, I., Bressler, N., Bressler, S.: Treatment of choroidal neovascularization: updated information from recent macular photocoagulation study group reports. International Ophthalmology Clinics 35 (1995) 37–57
2. Hart, W.E., Goldbaum, M.H.: Registering retinal images using automatically selected control point pairs. In: Proc. IEEE International Conference on Image Processing (ICIP). Volume 3. (1994) 576–81
3. Becker, D.E., Can, A., Turner, J.N., Tanenbaum, H.L., Roysam, B.: Image processing algorithms for retinal montage synthesis, mapping, and real-time location determination. IEEE Trans. on Biomedical Engineering 45 (1998) 105–18
4. Can, A., Stewart, C.V., Roysam, B.: Robust hierarchical algorithm for constructing a mosaic from images of the curved human retina. In: IEEE Conference on Computer Vision and Pattern Recognition. Volume 2. (1999) 286–92
5. Tsai, C.L., Stewart, C.V., Tanenbaum, H.L., Roysam, B.: Model-based method for improving the accuracy and repeatability of estimating vascular bifurcations and crossovers from retinal fundus images. IEEE Transactions on Information Technology in Biomedicine 8 (2004) 122–30
6. Stewart, C.V., Tsai, C.L., Roysam, B.: The dual-bootstrap iterative closest point algorithm with application to retinal image registration. IEEE Trans. Med. Imaging 22 (2003) 1379–1394
7. Tsai, C.L., Majerovics, A., Stewart, C.V., Roysam, B.: Disease-oriented evaluation of dual-bootstrap retinal image registration. In: MICCAI. Volume 2878 of LNCS. (2003) 754–761
8. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision (2004)
9. Brown, M., Lowe, D.G.: Recognising panoramas. In: 9th International Conference on Computer Vision. (2003) 1218–25
10. Brown, M., Szeliski, R., Winder, S.: Multi-image matching using multi-scale oriented patches. In: International Conference on Computer Vision and Pattern Recognition (CVPR). (2005) 510–517
11. Bay, H., Tuytelaars, T., Van Gool, L.: SURF: Speeded up robust features. In: ECCV. Volume 3951 of LNCS. (2006) 404–17
12. Mikolajczyk, K., Schmid, C.: A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence (2005). Accepted to PAMI
13. Hartley, R., Zisserman, A.: Multiple View Geometry in Computer Vision. Cambridge University Press (2000)
14. Diestel, R.: Graph Theory. New York: Springer-Verlag (1997)
15. Can, A., Stewart, C., Roysam, B., Tanenbaum, H.: A feature-based technique for joint, linear estimation of high-order image-to-mosaic transformations: Application to mosaicing the curved human retina. In: IEEE Conf. on Computer Vision and Pattern Recognition. Volume 2. (2000) 585–91
16. Burt, P.J., Adelson, E.H.: A multiresolution spline with application to image mosaics. ACM Transactions on Graphics 2 (1983) 217–36
A New Cortical Surface Parcellation Model and Its Automatic Implementation

Cédric Clouchoux¹, Olivier Coulon¹, Jean-Luc Anton², Jean-François Mangin³, and Jean Régis⁴

¹ Laboratoire LSIS, UMR 6168, CNRS, Marseille, France
² Centre d'IRM fonctionnelle de Marseille, Marseille, France
³ Equipe UNAF, SHFJ, CEA/DSV, Orsay, France
⁴ INSERM U751, Marseille, France
Abstract. In this paper, we present an original method that aims at parcellating the cortical surface into functionally meaningful regions, from individual anatomy. The parcellation is obtained using an anatomically constrained surface-based coordinate system from which we define a complete partition of the surface. The aim of our method is to exhibit a new way to describe the cortical surface organization, in both anatomical and functional terms. The method is described together with results from a functional somatotopy experiment.
1 Introduction
Understanding the organisation of the human cortex has been a very challenging field of research for the neuroimaging community in the last few years. One of the major challenges is to find relationships between anatomy and function. In this framework, considering the cortex as a 2D surface has been shown to be a very interesting and efficient way to analyze and to try to understand the cortical organisation. Whereas many representations and models of the cortical anatomy have been proposed, the interaction between anatomy and function is much harder to model. In [8], cortical maps are described which suggest a specific organisation of brain functions in relation to the cortex anatomy. Besides, different methods to represent and to label the human brain cortex have been presented previously [5,12,3]. An interesting approach consists in parcellating the cortical surface, in order to identify regions, patches or gyri, which may have both anatomical and functional homogeneity. This kind of approach offers solutions to describe the cortical surface while overcoming sulcal variability. Few methods describe a way to automatically define the parcels [1,4], although different nomenclatures can be used. In [4], the cortical surface is labelled via probabilistic information estimated from a manually labelled training set. In [1], sulci are considered as indicators of the meeting lines, buried in the folds' depth, between two neighboring gyri. A limitation of this method is that the sulcus model does not define all the limits of the gyri, which implies some technical problems and instability. In this paper, we propose a new approach to automatically define a robust gyri-based parcellation of the cortical surface.
Our method relies on stable anatomical features, the sulcal roots [10], and their known geometric organisation on the cortical surface. Those features are used to first define an individual surface-based coordinate system as described in [2]. From this parameterization, we define a complete partition of the cortex into gyri delimited by anatomically stable and functionally meaningful iso-coordinate axes. We aim at exhibiting reproducible gyri with anatomical/functional homogeneity. The paper is organized as follows: in section 2, we present the cortical anatomy theories underlying our method. Section 3 presents the method itself. An experimental protocol is presented in section 4, and results and discussion are given in section 5.
2 Anatomical Background and Parcellation Model
Inter-subject variability is a major problem in studying the human brain. One reason for this variability, especially in the pattern of sulci, is thought to be the cortical folding process. Several representations of the cortex try to understand the sulcal organisation over the cortical surface [13,10,7,14,12,5]. A geometric model is proposed in [14], suggesting an orthogonal organization of the main sulci. In [10], an original approach is proposed to explain the sulcal organisation: the sulcal roots. The main idea of this model is the spatial stability of deeply buried subparts of sulci, corresponding to the first folding locations during antenatal life. Moreover, this model proposes that these entities are naturally organised according to two major orthogonal directions. This does not only provide an anatomical model of the sulcal organization over the cortical surface, but also gives it a functional meaning. The cerebral cortex is divided into anatomically and functionally distinct areas, forming a species-specific area map across the cortical sheet. The protomap model [9] proposes that cells of the embryonic vesicle carry intrinsic programs for species-specific cortical regionalization [8]. Hence, the protomap stability across individuals should imply stability of the folding process during gestation [15]. This observation, assuming that the deeply buried cortex at the adult stage corresponds to the places where the folding process begins at the foetal stage, leads us to hypothesize the existence of prosulcal and progyral maps embedded in the protomap. This implies that folds related to sulcal roots are bound to occur somewhere in this map, at boundaries between cortical areas. In [16], it is shown that some boundaries between architectonic maps do clearly correspond to cortical folds. Admitting that sulcal roots are closely related to the first folding locations implies that aligned sulcal roots describe axes delimiting functional cortical areas. For instance, an axis going through several sulcal roots, from the Central Sulcus to the frontal cortex, describes a modal gradient. Another example would be an axis, also going through several sulcal roots, along the Central Sulcus, describing a vertical somatotopic gradient. Hence, sulcal roots are thought to be crucial information for both the anatomical and the functional organization of the cortical surface. In [1], a correlation between sulci-based and gyri-based representations has been established. In our approach, a meaningful parcellation can be
Fig. 1. Sulcal roots (in red) delimiting different parcels
derived using axes described by several sulcal root alignments. Figure 1 shows a schematic description of a parcel. Following the sulcal roots model, an axis going through several sulcal roots has a functional meaning, separating two different functional regions. Therefore, we define a parcel as the region delimited by four axes, two meridians and two parallels. In other words, a parcel is a region delimited by four sulcal roots (fig. 1). We then aim at identifying, in a population, similar cortical regions having anatomical and functional homogeneity.
3 Method
The parcellation we propose here is based on an anatomically constrained surface parameterization, presented in [2] and based on the sulcal roots theory presented above. From the resulting coordinate system, we define a parcellation using specific axes, corresponding to the alignment of several anatomical markers associated with sulcal roots.

3.1 Surface-Based Coordinate System
The first step of the parcellation is to define an individual cortical coordinate system on the white matter/grey matter interface, constrained by a set of anatomical markers. The process is fully described in [2]. The set of anatomical markers was defined by studying a number of brains. From this expertise, a generic set of landmarks was chosen. They are subparts of sulci, corresponding to folds around sulcal roots. For each individual brain, automatic labelling of sulci [11] is performed. Identified markers are then projected onto the cortical surface, and each one is attributed a coordinate value. Attention must be given to this step, as it guarantees a homogeneous spreading of the coordinates over the whole surface and minimizes metric distortions. Coordinates are then propagated over the surface using the heat equation, where anatomical markers are used as heat sources. In order to guarantee a better regularity of the coordinate field and minimize distortions of parallels/meridians, the process has been improved in the following way: the "heat equation with constant sources" model has been changed to a heat diffusion process with an additional data-driven term that minimizes the
196
C. Clouchoux et al.
mean square difference between the coordinate field and the marker coordinates, as defined in equation 1:

\frac{\partial I(r, t)}{\partial t} = \Delta I(r, t) - \beta(r)\,(I(r, t) - C(r)), \qquad (1)
where I(r, t) stands for the considered coordinate at node r at time t. C(r) is a function that equals 0 everywhere except at the marker locations, where it equals the marker coordinate. β(r) is a weight function that equals 0 if C(r) = 0 (no marker) and a constant value β if C(r) ≠ 0. The resulting coordinate system is illustrated in figure 2. Isodensity lines of the coordinate system follow the geometry of the surface, and locally comply with the anatomical constraints.
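A minimal sketch of explicit iterations of equation 1 on a mesh, assuming a precomputed neighbourhood operator; the discretisation and names are assumptions, not the authors' implementation:

```python
import numpy as np

def propagate_coordinates(L, coord, marker_value, beta=1.0, dt=0.1, n_iter=5000):
    """Explicit iterations of Eq. 1 on a surface mesh.

    L: (n, n) matrix approximating the Laplace operator on the mesh
       (e.g. A - D for adjacency A and degree D, so L @ I sums
       neighbour differences I_j - I_i at each node).
    coord: (n,) initial coordinate field I.
    marker_value: (n,) array C with the marker coordinate at marker nodes
       and 0 elsewhere (assumes no marker has coordinate exactly 0).
    """
    has_marker = marker_value != 0           # beta(r) is beta on markers, 0 elsewhere
    for _ in range(n_iter):
        diffusion = L @ coord                # Laplacian term propagates the field
        data_term = beta * has_marker * (coord - marker_value)
        coord = coord + dt * (diffusion - data_term)
    return coord
```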
Fig. 2. Resulting coordinate systems on 2 different brains (All visualizations made with free package Brainvisa - http://brainvisa.info)
3.2 Parcellation
Isodensity lines going through the predefined anatomical markers are very specific axes. First, as they are constrained to fit the sulcal roots organization, they are reproducible across individuals [2]. Second, they are believed to be functionally meaningful, since they represent the alignment of several sulcal roots. Hence, we can use those sulcal-root-related axes to delimit parcels over the entire cortical surface. An example of such a parcellation is shown in figure 3. The parcels described here are orthogonally organised, following the sulcal roots scheme (fig. 4), and neighborhood relationships between gyri are conserved.
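One possible way to turn the two coordinate fields into parcel labels is sketched below; the binning strategy and all names are illustrative assumptions rather than the authors' implementation:

```python
import numpy as np

def parcellate(longitude, latitude, meridian_axes, parallel_axes):
    """Assign each surface vertex to a parcel bounded by marker axes.

    longitude, latitude: (n,) per-vertex coordinates from the surface-based
    coordinate system; meridian_axes, parallel_axes: iso-coordinate values of
    the axes that pass through the anatomical markers.
    Returns an (n,) integer parcel label per vertex.
    """
    # Bin each coordinate between consecutive marker axes.
    mer_bin = np.searchsorted(np.sort(meridian_axes), longitude)
    par_bin = np.searchsorted(np.sort(parallel_axes), latitude)
    # A parcel is the intersection of one meridian strip and one parallel strip.
    return mer_bin * (len(parallel_axes) + 1) + par_bin
```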
4 Experiments
We tested our parcellation method in a functional experiment on a set of mature, non-pathological brains. The method was applied to 10 right-handed healthy volunteers who were part of a somatotopy experiment. Individual t-statistic maps corresponding to right motor activations of the foot, elbow, little finger, index finger, thumb, and tongue were projected onto each individual's left brain hemisphere. Maxima corresponding to primary motor activations were then extracted and plotted on each brain surface (fig. 5).
Fig. 3. Results of the parcellation on six different brains
5 Results and Discussion
What we expect from our first experiment is reproducibility of the parcellation across individuals, and a good fit with individual anatomical features. We also expect the primary motor activations involving different parts of the body to be located in different parcels. For instance, the parallel going through the Central Sulcus, the Interparietal Fissure and the Temporal Superior Sulcus describes a modal gradient. Moreover, it is known to delimit, at the location of these three sulci, a separation between functional processes involving the face and processes involving lower parts of the body. Therefore, we expect such a separation at the somatotopic level in the primary motor areas along the precentral gyrus. Similarly, we would expect a separation between the foot and higher parts of the body along the same gyrus. Figure 3 shows the resulting parcellation on six different brains. A first observation is that the different parcellations respect each individual's anatomy
Fig. 4. (A and B) Resulting parcellation on an inflated cortical surface. (C) A cytoarchitectonic and functional parcellation of the human cerebral cortex [6], compared to (D) our automatic parcellation.
Fig. 5. Plot of right motor activations on three parcellated brains. Activation colors: red = foot, blue = elbow, pink = little finger, yellow = index finger, black = thumb, green = tongue.
and comply with the cortex geometry. Moreover, the parcellation is stable across individuals. All the regions are present in every subject, with a stable relationship scheme between the different parcels. Figure 4 (A, B) shows the parcel organization on an inflated brain. It is interesting to point out the similarity between the cytoarchitectonic and functional scheme presented in [6] and the parcellation model we obtain (see figure 4). Figure 5 shows the plot of right motor activations on three brains. The usual pre-central gyrus is here divided into four parcels, called pre-central gyri 1, 2, 3 and 4. Over 10 subjects, the results of our experiment were as follows: all the foot activations were located in pre-central gyrus 1. Of 40 activations of the arm (elbow and 3 fingers), 37 were located in pre-central gyrus 2, 1 at the limit between pre-central gyri 1 and 2, and 2 at the limit between pre-central gyri 2 and 3. 7 tongue activations were in pre-central gyrus 3, the 3 others being located
in pre-central gyrus 4. We therefore observe that the delimitation induced by the parallel Central Sulcus/Interparietal Fissure/Temporal Superior Sulcus systematically separates activations of the face from activations of the rest of the body. Foot activations are also dissociated from those of the rest of the body. These results seem to show the expected functional specialization of our parcellation model. It is interesting to note that our parcellation provides a higher level of description than more usual models. However, the set of anatomical markers used to constrain our coordinate system (and therefore our parcellation scheme) is the same as the one used in [1] to directly define gyri. Therefore, it seems natural that by merging our parcels, we can recover the gyri described in [1]. For instance, the pre-central gyrus can be recovered by merging the pre-central gyri 1, 2, 3 and 4 described in figure 3. Figure 6 shows an example of this. By doing so, we get a better definition than in the parcellation presented in [1], which uses anatomical features to define two borders of a gyrus, while the two other borders are defined by competing evolution on the surface. This induces problems: unstable neighborhood relationships between gyri, and unstable delimitation of some gyri; with the method presented here, all parcels are directly defined by anatomical information and are therefore more stable.
Fig. 6. Recovering a previous gyral description. (left) Gyral parcellation presented in [1]. (right) Gyral parcellation obtained by merging parcels produced by our method.
6 Conclusion
In this paper we presented a method to automatically parcellate the cortical surface into functionally meaningful regions, using an anatomically constrained surface-based coordinate system. The parcellation is reproducible across individuals, and primary activations involving different parts of the body are located in different gyri, which shows that our method is both anatomically and functionally relevant. The method has many potential neuroscience applications, such as cortex morphometry studies, functional localization, and inter-subject functional and anatomical correlation studies.
References

1. A. Cachia, J.-F. Mangin, D. Rivière, D. Papadopoulos-Orfanos, F. Kherif, I. Bloch, and J. Régis. A generic framework for parcellation of the cortical surface into gyri using geodesic Voronoi diagrams. Medical Image Analysis, 7(4):403–416, 2003.
2. C. Clouchoux, O. Coulon, D. Rivière, A. Cachia, J.-F. Mangin, and J. Régis. Anatomically constrained surface parameterization for cortical localization. In J. Duncan and G. Gerig, editors, LNCS - Medical Image Computing and Computer-Assisted Intervention - MICCAI 2005: 8th International Conference, volume 3750, pages 344–351. Springer-Verlag Berlin Heidelberg, October 2005.
3. D.C. Van Essen and H.A. Drury. Structural and functional analyses of human cerebral cortex using a surface-based atlas. The Journal of Neuroscience, 17(18):7079–7102, 1997.
4. B. Fischl, A. van der Kouwe, C. Destrieux, E. Halgren, F. Ségonne, D.H. Salat, E. Busa, L.J. Seidman, J. Goldstein, D. Kennedy, V. Caviness, N. Makris, B. Rosen, and A.M. Dale. Automatically parcellating the human cerebral cortex. Cerebral Cortex, 14:11–22, 2004.
5. J.-F. Mangin, V. Frouin, I. Bloch, J. Régis, and J. López-Krahe. From 3D magnetic resonance images to structural representations of the cortex topography using topology preserving deformations. Journal of Mathematical Imaging and Vision, 5:297–318, 1995.
6. M.-M. Mesulam. Principles of Behavioral and Cognitive Neurology, Second Edition. Oxford University Press, New York, 2000.
7. M. Ono, S. Kubik, and C. Abernathey. Atlas of the Cerebral Sulci. Thieme Medical Publishers, Inc., New York, 1990.
8. P. Rakic. Neurobiology. Neurocreationism - making new cortical maps. Science, 294:1011–1012, 2001.
9. P. Rakic. Specification of cerebral cortical areas. Science, 241:170–176, 1988.
10. J. Régis, J.-F. Mangin, T. Ochiai, V. Frouin, D. Rivière, A. Cachia, M. Tamura, and Y. Samson. Sulcal roots generic model: a hypothesis to overcome the variability of the human cortex folding patterns. Neurol Med Chir, 45:1–17, 2005.
11. D. Rivière, J.-F. Mangin, D. Papadopoulos-Orfanos, J.M. Martinez, V. Frouin, and J. Régis. Automatic recognition of cortical sulci of the human brain using a congregation of neural networks. Medical Image Analysis, 6(2):77–92, 2002.
12. P.M. Thompson, C. Schwartz, and A.W. Toga. High-resolution random mesh algorithms for creating a probabilistic 3-D surface atlas of the human brain. NeuroImage, 3:19–34, 1996.
13. P. Todd. A geometric model for the cortical folding pattern of simple folded brains. J Theor Biol, 97:529–538, 1982.
14. R. Toro and Y. Burnod. Geometric atlas: modeling the cortex as an organized surface. NeuroImage, 20(3):1468–1484, 2003.
15. O. Turner. Growth and development of the cerebral pattern in man. Arch. Neurol. Psychiatry, pages 1–12, 1948.
16. K. Zilles, A. Schleicher, C. Langemann, K. Amunts, P. Morosan, N. Palomero-Gallagher, T. Schormann, H. Mohlberg, U. Bürgel, H. Steinmetz, G. Schlaug, and P.E. Roland. Quantitative analysis of sulci in human cerebral cortex: development, regional heterogeneity, gender difference, asymmetry, intersubject variability and cortical architecture. In Human Brain Mapping, volume 5, pages 218–221, 1997.
A System for Measuring Regional Surface Folding of the Neonatal Brain from MRI

Claudia Rodriguez-Carranza¹, Pratik Mukherjee², Daniel Vigneron², James Barkovich², and Colin Studholme¹,²

¹ NCIRE/VAMC, San Francisco, CA
² Department of Radiology, University of California, San Francisco
Abstract. This paper describes a novel approach to in-vivo measurement of brain surface folding in clinically acquired neonatal MR image data, which allows evaluation of surface curvature within subregions of the cortex. This paper addresses two aspects of this problem. Firstly: normalization of folding measures to provide area-independent evaluation of surface folding over arbitrary subregions of the cortex. Secondly: automated parcellation of the cortex at a particular developmental stage, based on an approximate spatial normalization of previously developed anatomical boundaries. The method was applied to seven premature infants (age 28-37 weeks) from which gray matter and gray-white matter interface surfaces were extracted. Experimental results show that previous folding measures are sensitive to the size of the surface of analysis, and that the area independent measures proposed here provide significant improvements. Such a system provides a tool to allow the study of structural development in the neonatal brain within specific functional subregions, which may be critical in identifying later neurological impairment.
1 Introduction
The preterm birth rate, or percent of infants born at less than 37 completed weeks of gestation, has been on the rise in the last two decades in the USA. During 2003 the preterm delivery rate was 12.3% [6], which represents an increase of 16% since 1990 and more than 30% since 1981. There is growing evidence that premature birth can result in structural and functional alterations of the brain, which are related to adverse neurodevelopmental outcome later in life [7,8]. The challenges that preterm infants face range from spastic motor deficits (cerebral palsy) [17] and impaired academic achievement [5,3] to behavioral disorders [9]. However, the etiology of the cerebral abnormalities that underlie these common and serious developmental disabilities is not entirely understood [8]. The wider availability of clinical in vivo magnetic resonance imaging of neonatal brain anatomy, provided by systems which make use of an MRI compatible incubator, creates a new opportunity to quantify brain development. In this work we are particularly interested in the study of cortical folding, or gyrification, in preterm infants. We would like to examine folding within specific subregions of the cortex that may correspond to the development of particular functional areas. In our earlier work we have applied previously proposed global
measures of folding to surfaces extracted from neonatal MRI, and showed their use in tracking global age changes [13]. The main limitation was that the previously proposed measures are heavily dependent on the area of the surface being examined. Hence those measures cannot be directly applied to subregions of the brain. In this paper we describe an approach to creating area-independent global measures of folding and an approach to partitioning the early cortex into regions to which these area-independent folding measures can be applied. We show the influence of the normalization on synthetic data and then apply the measures to surfaces extracted from MR images of seven premature infants with ages from 28-37 weeks.
2 Method
Several global 3D brain surface folding measures have previously emerged. Some are based on the surface principal curvatures (k1, k2), mean curvature (H = (1/2)(k1 + k2)), or Gaussian curvature (K = k1 k2). Examples are the folding index (FI) [4], the intrinsic curvature index (ICI) [4], the L2 norm of the mean curvature (MLN) [1], the L2 norm of the Gaussian curvature (GLN) [1], and average curvature [11]; the global shape index (GS) and global curvedness (GC) were defined in [13] based on the local shape descriptors curvedness (c) [10] and shape index (s) [10]. Other measures are non-curvature based, such as the gyrification index [18], calculated as the ratio of the entire cortical contour of the brain to the superficially exposed contour, and roundness (Rn), based on surface area (A) and volume (V); the latter is also described as surface complexity [11] and isoperimetric ratio [1]. A list of current global 3D measures of gyrification is shown in the left column of Table 1. These expressions are normalized to yield the unit value for a sphere [13].

2.1 Size-Independent Measures of Surface Folding
Examination of the form of the current global measures in Table 1 reveals a critical dependency on surface area, with the exception of shape index and gyrification index. This can be better illustrated with the following example, using MLN to measure folding. Take a whole sphere of radius Ro, half a sphere of radius Ro, and a whole sphere of half that radius. The three objects have the same surface complexity, hence a measure of global folding should yield identical results. By definition, H = 1/R at each point on a sphere. Surface folding yields MLN = (1/4π) × 4πRo² × (1/Ro)² = 1 for the whole sphere of radius Ro, and also for the whole sphere of radius Ro/2: MLN = (1/4π) × 4π(Ro/2)² × (2/Ro)² = 1. On the other hand, MLN for the half sphere of radius Ro is half that value: MLN = (1/4π) × (1/2)(4πRo²) × (1/Ro)² = 1/2. This demonstrates the dependency of the measure on the size of the surface of analysis. Normalizing MLN by the surface area does not alleviate the problem: MLN' = (1/A) ∫_A H² dA = (1/(4πRo²)) × 4πRo² × (1/Ro)² = 1/Ro² (whole sphere of radius Ro). We propose two normalization factors: 1) T = 3V/A and 2) |H| = (1/A)(∫_A |H⁺| dA + ∫_A |H⁻| dA). The resulting T-normalized and H-normalized expressions (T-measures and H-measures, respectively) are shown
Table 1. Measures of surface folding Current measures Area-dependent 2 1 M LN = 4π A H [1]
GLN = [1] GLN =T K ICI= ICI =T K F I= F I =T ak GC= GC =T Uc U= U = H / H Rn= [1] SH2SH=T K / K SK2SK=T Area-independent H /A s [13] GS= M LN = c Gyrification Index [18] GC = 1 4π 1 4π 1 4π
A ∗ K K [4] ak [4] c [13] U [11]
New measures Area-independent 2 2 M LNT = TA AH
2
A
+
A A
√1 A A4π 1 A A A √ 3 36πV 2
1 2
A
T
1 A
A
T
1 A
A
T
1 A
A
T
T
4
T AU
2
+
A
2
A
A
2
A
1 A
H
A
H
Notation: c = 2 π
arctan
k2 +k1 k2 −k1
1 |H| 1 A|H|
A
2
A
A
AFU + =AU + /A 1 (k12 2
+ k22 ) (curvedness [10]), s =
(shape index [10]), ak = |k1 |(|k1 | − |k2 |),
1 T = and |H| = A (|H + | + |H − |). A curvature’s sign is + − identified with or , U ∈ [H + , H − , K + , K − ], and AU + is the area of the surface with positive U .
3 VA ,
in the right column of Table 1. For a sphere, T = Ro, |H| = 1/Ro, and it can be verified that MLN_T = 1 for the three sphere cases. Similarly for GLN_T, ICI_T, FI_T, GC_T, U_T, MLN_H, and GC_H. In addition, we defined three new area-independent folding measures: SH2SH, SK2SK, and AF_U. The first two were based on the rationale that measures composed of ratios of local curvature factors would intrinsically eliminate the dependence on area. The latter was based on the idea that a reasonable characterization of the degree of folding in a brain surface is the fraction of the surface which contains convex folds (gyri). This can be mathematically characterized by looking at the relative portion of the surface which has positive mean or Gaussian curvature. For an undeveloped brain, the fraction would be large, and it would progressively decrease as the concave folds (sulci) appear. The definition of the three expressions is shown in Table 1.
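The sphere example can be checked numerically; the following small script only restates the definitions above (for a constant-curvature patch the integrals reduce to products):

```python
import numpy as np

def mln(area, H):
    # MLN = (1/4*pi) * integral of H^2 over the patch (constant H).
    return area * H**2 / (4 * np.pi)

def mln_T(area, volume, H):
    # MLN_T = (T^2/A) * integral of H^2, with T = 3V/A; reduces to T^2 H^2.
    return (3.0 * volume / area) ** 2 * H**2

Ro = 2.0
A, V = 4 * np.pi * Ro**2, 4.0 / 3.0 * np.pi * Ro**3
print(mln(A, 1 / Ro))               # 1.0 : whole sphere of radius Ro
print(mln(A / 2, 1 / Ro))           # 0.5 : half sphere -- the area dependency
print(mln_T(A, V, 1 / Ro))          # 1.0 : whole sphere
print(mln_T(A / 2, V / 2, 1 / Ro))  # 1.0 : half sphere, dependency removed
```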
2.2 Cortical Partitioning Using Spatial Normalization
In order to study changes in regional surface folding across different subjects, we need to be able to identify boundaries from the structural image data which consistently partition the brain surface in different subjects at different stages of development. In this initial work we have used a single reference anatomy which
was partitioned into frontal and non-frontal regions based on the presence of the post-central gyrus. The sulci around this region develop earlier in the brain folding process and provide a landmark that allows us to study regional folding over the range of ages in which it is present. The approximate partitioning into left and right frontal cortex and non-frontal cortex was marked on a single reference MRI. A multi-resolution B-spline based spatial normalization, adapted from that described in [15,16] and as used in [2], was then applied to estimate a spatial transformation mapping from this reference to each subject MRI being studied. This transformation was then numerically inverted to allow the assignment of the nearest reference label to a given subject image voxel, which brought the voxelwise partitioning of the reference anatomy into the same space as the surface extracted from each individual. The partitioning was then used to constrain the voxelwise evaluation of the folding measures.
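A minimal sketch of this label assignment step, assuming the inverted transformation is available as a coordinate-mapping function; all names are illustrative:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def propagate_labels(ref_labels, subject_shape, subject_to_ref):
    """Assign each subject voxel the nearest reference-region label.

    ref_labels: integer label volume in reference space;
    subject_to_ref: function mapping a (3, n) array of subject voxel
    coordinates to reference voxel coordinates (the inverted transform).
    """
    grid = np.indices(subject_shape).reshape(3, -1).astype(float)
    ref_coords = subject_to_ref(grid)
    # Nearest-neighbour interpolation (order=0) keeps labels integral.
    labels = map_coordinates(ref_labels, ref_coords, order=0, mode='nearest')
    return labels.reshape(subject_shape)
```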
2.3 Data and Image Processing
Synthetic Image Data. The measures were tested on synthetic binary shapes constructed from the following expression: x = ρ sin(φ) cos(θ), y = ρ sin(φ) sin(θ), z = ρ cos(φ), with ρ = k (sin^b(aφ) + cos^d(cφ) + sin^f(eθ) + cos^h(gθ)) (k is a scaling factor). Coordinates x, y, z are computed in the range φ ∈ [0, π] and θ ∈ [0, 2π] given a set of user-defined parameters a, ..., h. The voxels within the shape were assigned a value of 1000 and background voxels a value of 0.

Human Neonatal Images. High resolution (0.703 × 0.703 × [1.5–2.0] mm) 3D T1-weighted SPGR images were acquired on seven premature infants using a 1.5T GE MRI scanner with an MRI-compatible incubator. The gestational ages of the premature infants ranged from 24 to 31 weeks. The postmenstrual ages (gestational age + postnatal age) at the time of acquisition were 28–37 weeks. A subsequent scan for three of the infants was available, hence a total of ten brains was processed. The outer gray matter and the gray-white matter interface¹ surfaces were manually extracted as described in [13] using the rview software package [14]. A value of 1000 was assigned to each segmented brain voxel and 0 to the background.

Computation of Surface Folding. Each binary volume (the segmented neonatal brain or the synthetic shape) was supersampled using voxel replication to prevent loss of fine scale features (2 × 2 × 4 for neonatal images and 2 × 2 × 2 for synthetic images). A voxelwise approach to surface curvature estimation from iso-surfaces, derived from that developed by Rieger [12], was then employed. It avoids the need for a parametric model. A summary of the sequence of steps (described in detail in [13]) is as follows: 1) computation of the image gradient g = ∇f(x, y, z) of the replicated binary volume; 2) computation of the gradient structure tensor (GST) [12], defined as T = gg^t; 3) calculation of the eigenvectors v_1, v_2, v_3 of T and the mapping M(v_1) = v_1 v_1^t / ||v_1||; 4) computation of the principal
¹ The terms gray-white matter interface and white matter will be used interchangeably throughout the text.
curvatures from |k_{1,2}| = (1/√2) ||∇_{v_{2,3}} M(v_1)||_F (Frobenius norm); the sign of k_1 and k_2 is determined from the Hessian matrix of the image and the eigenvectors v_2 and v_3. The curvatures k_1 and k_2 were then used to compute mean (H) and Gaussian (K) curvature and the folding measures of Table 1. Each step involves image smoothing or differentiation. For all images, the 3D filters used at the i-th step had a full-width at half maximum (f_i) of: f_1 = 2.1 mm (for synthetic images, f_1 = 3 mm), f_2 = 1 mm, f_3 = 2 mm, f_4 = 1 mm. The convolution of the first step creates a volume with voxel values between 0 and 1000, which represents percentage occupancy. The surface of interest, i.e. the one on which the global measures of folding were calculated, was taken to be the iso-surface with at least 50% occupancy and satisfying 6-connectivity with the background. The rationale behind the choice of values was to create a smooth surface that would yield smooth variations in curvature.
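For illustration, a minimal voxelization of the synthetic shape defined above; the inside test against ρ along each voxel's direction is an assumption about how the shapes were rasterized, and all names are illustrative:

```python
import numpy as np

def synthetic_shape(size=128, k=10.0, a=1, b=4, c=1, d=3, e=1, f=2, g=1, h=4):
    """Binary volume of the synthetic test shape (foreground = 1000)."""
    z, y, x = np.indices((size, size, size)).astype(float) - size / 2.0
    r = np.sqrt(x**2 + y**2 + z**2) + 1e-9
    phi = np.arccos(np.clip(z / r, -1, 1))         # phi in [0, pi]
    theta = np.mod(np.arctan2(y, x), 2 * np.pi)    # theta in [0, 2*pi]
    rho = k * (np.sin(a * phi)**b + np.cos(c * phi)**d
               + np.sin(e * theta)**f + np.cos(g * theta)**h)
    # A voxel is inside the shape if its radius is below the surface radius
    # rho(phi, theta) along the same direction.
    return np.where(r <= rho, 1000, 0).astype(np.int16)
```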
3 Results
Firstly, we applied the measures to subregions of synthetic surfaces. Figures 1(a), (b), (c), and (d) show the map of (local) mean curvature on four subregions of the shape with parameters a=1, b=4, c=1, d=3, e=1, f=2, g=1, h=4, and scale factor k=10. The numerical results that follow are given in the order of appearance in Figure 1. The area of each subregion was: 3626.5, 1814.5, 789, and 772.9 mm². MLN yielded 2.32, 1.17, 0.72, 0.43; note that MLN for the whole shape is about twice the value of MLN for the half shape. This exemplifies the dependency of the old measures on the size of the region of analysis. On the other hand, the new MLN_T for the whole and half shapes was the same: 1.37, 1.37, 1.08, 0.91. This is reasonable since the partition was at an axis of symmetry. We then applied the new measures globally to surfaces from the ten MRI studies to investigate their response to folding with age, using linear regression of their values against age. Figure 1(e) shows the map of (local) mean curvature on white matter. Figure 2 shows the graphs of five of the global folding measures against postmenstrual age, for both cortical gray matter and white matter (due to space limitations, only five graphs are shown). The measures GS and AF_{H⁺} decrease with age, as opposed to the other measures. This is because the younger the infant, the less folded the brain and the closer its shape to that of a sphere (GS = 1 and AF_U = 1 for a sphere). The measures with greater goodness-of-fit for gray matter were the T-normalized measures FI_T, H⁺, and K⁻ (R² = [0.86, 0.89]); for white matter, they were GS (global shape index), AF_{H⁺}, GC_H, and MLN_H (R² = [0.95, 0.96]). The goodness-of-fit for gray matter and white matter (in that order) for all measures was: MLN_T (0.80, 0.04), GLN_T (0.79, 0.14), ICI_T (0.83, 0.0), FI_T (0.89, 0.13), GC_T (0.85, 0.01), H⁻ (0.70, 0.06), H⁺ (0.86, 0.13), K⁻ (0.88, 0.25), K⁺ (0.83, 0.0), GS (0.69, 0.96), SH2SH (0.77, 0.17), SK2SK (0.66, 0.37), MLN_H (0.48, 0.95), GC_H (0.55, 0.95), AF_{H⁺} (0.69, 0.95), AF_{K⁺} (0.51, 0.0). Finally, we applied the measures to examine their relationship with age regionally on gray matter, over frontal and non-frontal cortical regions (subdivided
Fig. 1. Map of mean curvature (H) for the synthetic shape: (a) whole surface, (b) half the surface, (c) subregion I, (d) subregion II and (e) for white matter of a 32.4 week old infant. H varies within [-0.15,0.15]. 5.0
8.5
0.5
0.70
4.0
0.65
7.0
3.0
6.5
AFH+
3.5
MLNH
+
Mean H
T- Foldng Index (FIT )
7.5
3.5 4.0
Global Shape Index (GS)
8.0 4.5
6.0
0.60
5.5
2.5 3.0
5.0 0.55
4.5
2.0
2.5
0.4
0.3
0.2
0.1
4.0
1.5
2.0 26
28
30
32
34
36
38
Age (weeks)
0.50
3.5
26
28
30
32
34
Age (weeks)
36
38
26
28
30
32
34
Age (weeks)
36
38
0.0 26
28
30
32
34
Age (weeks)
36
38
26
28
30
32
34
36
38
Age (weeks)
Fig. 2. Plots of five folding measures on cortical gray matter (ten studies) and white matter (eight studies) against age. Linear fits and 95% confidence intervals are shown.
into left and right). Plots for area and four folding measures are shown in Figure 3. The rate of change of FI_T, H⁺, and ICI_T is positive with age and, hence, with cortical folding. The results show that frontal regions had a smaller area but a larger degree of folding than non-frontal regions. Except for FI_T on the left frontal subregion of the cortex, linear fits on all subregions had similar R² to the linear fits for the whole brain. The graph of GLN exemplifies the dependency of the previously defined measures on the area of analysis.
4 Discussion
An understanding of the cortical folding process in the development of premature infants may be important in explaining and predicting abnormal neurological outcome. The use of formal mathematical descriptions provides a more quantitative tool to study the folding process than is available with simple visual evaluation of MRI scans. The long term goal of our research is to create a model to track preterm in vivo neonatal brain cortical development that will help characterize normal gyrification and departures from it.
[Figure 3 shows per-measure scatter plots; panel y-axes: T-Folding Index (FI_T), T-Intrinsic Curvature Index (ICI_T), mean H⁺, GLN, and Area (mm²); x-axes: Age (weeks).]
Fig. 3. Plots of four global folding measures and area against age for the frontal and non-frontal regions of gray matter. Results for the whole surface (∗) are included. Filled shapes represent the left side of the brain, empty ones the right side. Linear fits are shown.
In this work we have shown that most previously proposed folding measures are dependent on the size of the surface patch on which they are applied. To alleviate this problem we examined two approaches to normalization of the measures and proposed new forms of measure, which satisfy the requirements of area-independence. These were evaluated on synthetic image data and neonatal brain surfaces. We evaluated the ability of the measures to detect change in normal aging by calculating the linear regression of each measure with age globally and locally. Globally, gray-white matter interface folding tracked best with age. No single measure rated consistently well for both surfaces: FI_T, H⁺, and K⁻ had the best fit for cortical gray matter, while MLN_H, GC_H, GS, and AF_{H⁺} did for the gray-white matter interface. Locally, the measures were able to capture folding differences between frontal and non-frontal regions. The proposed normalization factors T and |H| opened the possibility to adapt previously defined measures to quantify gyrification in subregions of the brain. We need to further investigate the sensitivity and validity of the measures in a larger cohort, containing both normal and pathological cases. In conclusion, the proposed new normalized measures allow area-independent assessment of folding, which provides the ability to study regional gyrification. Such measures will provide a new tool for assessing global and local surface folding that is independent of the overall surface area, and are hence applicable to developing a model that tracks development in premature infants.
Acknowledgments

This work was funded by the NIH grant R01 MH65392.
References

1. P. G. Batchelor, A. D. Castellano Smith, D. L. G. Hill, D. J. Hawkes, T. C. S. Cox, and A. F. Dean. Measures of folding applied to the development of the human fetal brain. IEEE Transactions on Medical Imaging, 21(8):953–965, August 2002.
2. V. Cardenas, L.L. Chao, R. Blumenfeld, et al. Using automated morphometry to detect association between ERP latency and structural brain MRI in normal adults. Human Brain Mapping, 25(3):317–327, 2005.
3. R. W. I. Cooke, L. Foulder-Hughes, D. Newsham, and D. Clarke. Ophthalmic impairment at 7 years of age in children born very preterm. Archives of Disease in Childhood Fetal and Neonatal Edition, 89:249–253, 2004.
4. D.C. Van Essen and H. A. Drury. Structural and functional analyses of human cerebral cortex using a surface-based atlas. The Journal of Neuroscience, 17(18):7079–7102, 1997.
5. M. Hack, D. J. Flannery, et al. Outcomes in young adulthood for very-low-birth-weight infants. New England Journal of Medicine, 346(3):149–157, 2002.
6. B. E. Hamilton, J. A. Martin, and P. D. Sutton. Births: Preliminary data for 2003. National Vital Statistics Reports, 53(9), 2004.
7. P. S. Hüppi et al. Structural and neurobehavioral delay in postnatal brain development of preterm infants. Pediatric Research, 39(5):895–901, 1996.
8. T. E. Inder et al. Abnormal cerebral structure is present at term in premature infants. Pediatrics, 115(2):286–294, 2005.
9. T. E. Inder, S. J. Wells, N. B. Mogridge, C. Spencer, and J. J. Volpe. Defining the nature of the cerebral abnormalities in the premature infant: a qualitative magnetic resonance imaging study. Journal of Pediatrics, 143:171–179, 2003.
10. J. J. Koenderink and A. J. van Doorn. Surface shape and curvature scales. Image and Vision Computing, 10(8):557–564, 1992.
11. V. A. Magnotta, N. C. Andreasen, S. K. Schultz, et al. Quantitative in vivo measurement of gyrification in the human brain: changes associated with aging. Cerebral Cortex, 9:151–160, 1999.
12. B. Rieger, F. J. Timmermans, L. J. van Vliet, and P. W. Verbeek. On curvature estimation of iso surfaces in 3D gray-value images and the computation of shape descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(8):1088–1094, August 2004.
13. C. Rodriguez-Carranza, F. Rousseau, B. Iordanova, O. Glenn, D. Vigneron, J. Barkovich, and C. Studholme. An iso-surface folding analysis method applied to premature neonatal brain development. In Proc. SPIE Medical Imaging: Image Processing, volume 6144, pages 529–539, 2006.
14. C. Studholme. http://rview.colin-studholme.net/.
15. C. Studholme, V. Cardenas, et al. Accurate template-based correction of brain MRI intensity distortion with application to dementia and aging. IEEE Transactions on Medical Imaging, 23(1):99–110, 2004.
16. C. Studholme, V. Cardenas, et al. Deformation tensor morphometry of semantic dementia with quantitative validation. Neuroimage, 21(4):1387–1398, 2004.
17. J. J. Volpe. Neurologic outcome of prematurity. Archives of Neurology, 55:297–300, 1998.
18. K. Zilles, G. Schlaug, M. Matelli, G. Luppino, et al. The human pattern of gyrification in the cerebral cortex. Anatomy and Embryology, 179:173–179, 1988.
Atlas Guided Identification of Brain Structures by Combining 3D Segmentation and SVM Classification

Ayelet Akselrod-Ballin¹, Meirav Galun¹, Moshe John Gomori², Ronen Basri¹, and Achi Brandt¹

¹ Dept. of Computer Science and Applied Math, Weizmann Institute of Science, Rehovot, Israel
² Dept. of Radiology, Hadassah University Hospital, Jerusalem, Israel
Abstract. This study presents a novel automatic approach for the identification of anatomical brain structures in magnetic resonance images (MRI). The method combines a fast multiscale multi-channel three-dimensional (3D) segmentation algorithm, providing a rich feature vocabulary, with a support vector machine (SVM) based classifier. The segmentation produces a full hierarchy of segments, expressed by an irregular pyramid, in only linear time complexity. The pyramid provides a rich, adaptive representation of the image, enabling detection of various anatomical structures at different scales. A key aspect of the approach is the thorough set of multiscale measures employed throughout the segmentation process, which are also provided at its end for clinical analysis. In particular, these features include prior probability knowledge of anatomic structures, due to the use of an MRI probabilistic atlas. An SVM classifier is trained on this set of features to identify the brain structures. We validated the approach using a gold standard real brain MRI data set. Comparison of the results with existing algorithms demonstrates the promise of our approach.
1 Introduction
Accurate classification of magnetic resonance images (MRI) is essential in many neuroimaging applications. Quantitative analysis of anatomical brain tissues such as white matter (WM), gray matter (GM) and cerebrospinal fluid (CSF) is important for clinical diagnosis and therapy of neurological diseases, and for visualization and analysis ([1],[2],[3]). However, automatic segmentation of MRI is difficult due to artifacts such as the partial volume effect (PVE), intensity nonuniformity (INU) and motion. The INU artifact, also referred to as the inhomogeneity or shading artifact, causes spatial inter-scan variation in the pixel intensity distribution over the same tissue classes. It depends on several factors but is predominantly caused by the scanner magnetic field. Numerous approaches have been developed for brain MRI segmentation (see surveys [1],[4],[5],[6]). Low-level classification algorithms (which we compare to in sec. 3) typically involve common unsupervised algorithms including k-means
and fuzzy c-means ([2],[4]). Yet, low-level techniques that exploit only local information for each voxel and do not incorporate global shape and boundary constraints are limited in dealing with the difficulties of fully automatic segmentation of brain MRI. Deformable models have also been proposed in [7] to find coupled surfaces representing the interior and exterior boundaries of the cerebral cortex. These techniques benefit from consideration of global prior knowledge about expected shape, yet their limitations come from dependence on initialization and from the computation time required for 3D segmentation. Statistical approaches, which classify voxels according to probabilities based on the intensity distribution of the data, have been widely used [8]. These approaches typically model the intensity distribution of brain tissues by a Gaussian mixture model. Given the distribution, the optimal segmentation can be estimated by a maximum a posteriori (MAP) or a maximum likelihood (ML) formulation ([2],[3]). Expectation maximization (EM) is a popular algorithm for solving the estimation problem. It has been applied to simultaneously perform brain segmentation and estimate the INU correction [9]. The EM framework has been extended to account for spatial considerations by including a Markov random field [10] and by utilizing a brain atlas [11]. This paper introduces a fully automatic method to identify brain structures in MRI, utilizing the 3D segmentation framework presented in [12], which extends the algorithm presented in ([13],[14]) to handle 3D multi-channel anisotropic MRI data. Our work combines the fast multiscale segmentation algorithm with a support vector machine (SVM) classifier based on a novel set of features. We incorporate prior knowledge of anatomic structures using an MRI brain atlas. In addition to these priors, a set of regional features is computed for each aggregate, including intensity, texture, and shape features accumulated during the aggregation process. Unlike existing studies, our approach does not involve explicit correction of magnetic field inhomogeneities. We validate our approach by applying our method to a standard database with varying bias field and compare our results to existing algorithms. The paper is organized as follows. Section 2 describes the segmentation, feature extraction and classification process. Comparative experimental results for automatic detection of the major brain anatomical tissues are shown in section 3. Concluding remarks are provided in section 4.
2 Method

2.1 Segmentation
Our method begins by applying the segmentation algorithm presented in [12], which extends the 2D segmentation algorithm developed for natural images ([13],[14]) to 3D multi-channel anisotropic MRI data. The segmentation scheme is described briefly below (for more details see [12],[13],[14]). The method, which is derived from algebraic multigrid (AMG) [15], starts by assembling adjacent voxels into small aggregates based on intensity similarity, each voxel being allowed to belong to several aggregates with different
association weights. These aggregates are then similarly assembled into larger aggregates, then still larger aggregates, and so on. The affiliations between aggregates are based on tunable statistical measures, called features. Features obtained for small aggregates at a fine level affect the formation of larger aggregates at coarser levels, according to feature resemblance (see 2.2). In this multiscale Segmentation by Weighted Aggregation (SWA), a pyramid (hierarchy) of aggregates is constructed from fine (bottom) to coarse (top), such that each aggregate at a finer level may be associated with several larger aggregates at the subsequent coarser scale, with different weights. This soft weighted aggregation allows the algorithm to avoid premature local decisions and to detect segments based on a global saliency measure. The algorithm is very fast, since only its initial stage operates at the level of individual voxels. It collects statistics of filter responses, identifies regions of coherent texture, and quantifies the shape of segments, their boundaries and their density variability at all scales, allowing the emergence of image segments according to any desired aggregation and saliency criteria. Moreover, each segment emerges from the aggregation process equipped with a list of accumulated numerical descriptors, its features, which can then be used for classification and diagnosis.

Multi-Channel and 3D Anisotropy: A major aspect of MRI is the wide variety of pulse sequences (modalities) available for producing different images. Each modality gives rise to a different image that may highlight different types of tissue. In this work segmentation was applied to a single T1 channel. However, applying segmentation (and likewise classification) simultaneously to images obtained from several channels can lead to superior results that usually cannot be achieved by considering just one channel. Another important aspect is the anisotropic nature of most clinical MRI data (with lower vertical resolution), which, if not taken into account, may lead to inaccurate analysis of the data. Relying on the flexibility of the segmentation framework, we applied a 3D multi-channel segmentation algorithm that can process several modalities simultaneously and handle both isotropic and anisotropic data.
2.2 Feature Extraction
The segmentation process computes statistical aggregative features throughout the pyramid construction. These features, which affect the formation of aggregates, are also available for the classification of anatomical structures at the end of the process. The development of the set of features is guided by interaction with expert radiologists, and the quantitative effects of the various features are determined by the automatic learning process described below. It can be shown that these features can be calculated recursively (see [12],[14] for notation). In this work, we expanded the set of features to include information about the expected location of the major tissue types. We incorporate prior probability knowledge of anatomic structures using an MRI probabilistic atlas. We employ the International Consortium for Brain Mapping (ICBM) probability maps, which
represent the likelihood of finding GM, WM and CSF at a specified position for a subject that has been aligned to the atlas space. The ICBM452 atlas and the brain data sets are brought into spatial correspondence using the Statistical Parametric Mapping (SPM) registration software [16], so that for every aggregate we can compute its average probability of belonging to any of the three tissue categories. Construction of the classifier based on these features requires consideration of inter-subject and intra-subject variability; therefore all features were normalized for each brain. Table 1 presents a partial list of the features for aggregate k at scale s used in this study.

Table 1. Aggregative features

Prior anatomical knowledge:
– Average probabilities: denoted $P_{WM}$, $P_{GM}$, $P_{CSF}$, representing the average likelihood of finding WM, GM and CSF in k, respectively.

Intensity statistics:
– Average intensity: of the voxels in k, denoted $\bar{I}$.
– Maximum intensity: the maximal average intensity of the sub-aggregates at scale 2.
– Variance of average intensities: of the sub-aggregates of k at scale r.
– Intensity moments: averages of products of the intensity and the coordinates of the voxels in k, denoted $I_x$, $I_y$, $I_z$.

Shape:
– Location: the average of the locations of the voxels in k, denoted $(\bar{x}, \bar{y}, \bar{z})$; all brains were spatially normalized to the same stereotaxic space using SPM [16].
– Shape moments: the length, width and depth ($L$, $W$, $D$, respectively), calculated by applying principal component analysis to the covariance matrix of the aggregate.

Neighborhood statistics:
– Boundary surface area: denoted $B_{kl}$, the surface area of the common border of aggregates k and l.
– Neighborhood average intensity: of aggregate k, defined as $\sum_{l \neq k} B_{kl}\bar{I}_l \,/\, \sum_{l \neq k} B_{kl}$.
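Since each coarse aggregate is a weighted union of finer aggregates, features of this kind can be accumulated bottom-up without revisiting individual voxels. The sketch below illustrates one recursion step for the average-intensity feature only; the soft-assignment representation and all names are hypothetical, not taken from [12] or [14].

```python
import numpy as np

def coarse_average_intensity(weights, sizes, means):
    """One recursion step for the average-intensity feature.

    weights : (n_fine, n_coarse) soft association weights of each fine-scale
              aggregate l to each coarse-scale aggregate k
    sizes   : (n_fine,) accumulated voxel counts of the fine aggregates
    means   : (n_fine,) average intensities of the fine aggregates
    Returns the voxel counts and average intensities one scale up.
    """
    mass = weights * sizes[:, None]        # mass each l contributes to each k
    coarse_sizes = mass.sum(axis=0)        # accumulated size of each aggregate k
    coarse_means = (mass * means[:, None]).sum(axis=0) / coarse_sizes
    return coarse_sizes, coarse_means
```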
2.3 Support Vector Machine (SVM) Classification
We extract a candidate set of segments from the intermediate levels of the pyramid (scales 5 and 6 of all 13 scales), which correspond to brain tissue regions. To construct the classifier we utilize the "ground-truth" expert segmentation provided along with the real clinical brain MRI data. Generally we assume (1) a training sample of M candidate segments, Cand = {f_1, ..., f_M}, each described by a d-dimensional feature vector (we normalize each feature to zero mean and unit variance), and (2) a mask indicating the voxels marked by an expert as GM, WM, CSF or background (BG). Since many of the candidate segments may contain a mixed collection of the categories, the label of a segment is determined as the category marked for the maximum number of voxels associated with that segment.
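A minimal sketch of this majority-vote labeling rule, assuming the candidate segments have already been projected onto the voxel grid; the flat array representation and the category coding (0 = BG, 1 = CSF, 2 = GM, 3 = WM) are placeholders, not the authors' code.

```python
import numpy as np

def label_candidates(segment_of_voxel, expert_labels, n_segments):
    """Assign each candidate segment the expert category that covers
    the largest number of its voxels (majority vote)."""
    labels = np.zeros(n_segments, dtype=int)
    for k in range(n_segments):
        votes = np.bincount(expert_labels[segment_of_voxel == k], minlength=4)
        labels[k] = votes.argmax()    # ties and empty segments fall to BG (0)
    return labels
```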
Table 2. Our average classification measures. The table lists the mean (± S.D.) classification measures obtained on all 20 subjects for the four different classes.

Class   Overlap         FP              κ               J
WM      0.80 ± 0.04     0.19 ± 0.05     0.80 ± 0.04     0.67 ± 0.03
GM      0.86 ± 0.04     0.26 ± 0.00     0.81 ± 0.04     0.68 ± 0.03
CSF     0.43 ± 0.12     0.25 ± 0.20     0.51 ± 0.11     0.35 ± 0.12
BG      0.987 ± 0.005   0.003 ± 0.002   0.985 ± 0.004   0.992 ± 0.002
Twenty normal MR brain data sets and their manual segmentations were provided by the Center for Morphometric Analysis at Massachusetts General Hospital and are available at http://www.cma.mgh.harvard.edu/ibsr/. The sets are sorted by their level of difficulty. We separated the sets into "odd" and "even" indices: the classifier was trained on the odd sets and tested on the even sets, and vice versa, so that both training and testing consist of ten sets covering the entire range of difficulty. The training data was used to construct an SVM-based classifier. The results presented here were obtained using a radial basis function (RBF) kernel, $K(x, y) = e^{-\gamma \|x-y\|^2}$. A detailed description of SVMs can be found in [17]. Following the learning phase, an unseen MRI scan is obtained in the testing phase. After segmentation and feature extraction we apply the SVM classifier to every candidate segment in the test set and assign a category label to each candidate. All candidate segments are projected onto the data voxels using the segmentation interpolation matrix (see details in [12]). The maximum association weight of a voxel determines the segment to which the voxel belongs, which leads to the assignment of a class label to each voxel.
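The train/test protocol and kernel can be sketched with scikit-learn as a stand-in for whatever SVM implementation was actually used; the argument contents and the value of γ are hypothetical, and the per-brain feature normalization described above is simplified here to a global standardization.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_and_test(features, labels, subject_id, gamma=0.1):
    """features: (n_segments, d) aggregative features of the candidate
    segments; labels: their majority-vote categories; subject_id: index
    (1..20, numpy array) of the brain each segment comes from."""
    X = StandardScaler().fit_transform(features)  # zero mean, unit variance
    train = subject_id % 2 == 1                   # odd-indexed brains for training
    test = ~train                                 # even-indexed brains for testing
    clf = SVC(kernel='rbf', gamma=gamma)          # K(x, y) = exp(-gamma |x - y|^2)
    clf.fit(X[train], labels[train])
    return clf.predict(X[test]), labels[test]
```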
3 Results
The integrated approach was tested on 20 coronal T1-weighted real MRI data sets of normal subjects, with GM, WM and CSF expert segmentations provided by the Internet Brain Segmentation Repository (IBSR), after they had been positionally normalized. The brain scans used to generate these results were chosen because they have been used in published volumetric studies in the past and because they span various levels of difficulty. This allows the assessment of the method's performance under varying conditions of signal-to-noise ratio, INU, PVE, shape complexity, etc. We tested our approach using 45 central coronal slices, which contain 0.94 ± 0.02 of the brain voxels, including the cerebellum and brain stem. The results presented were obtained by overlaying the candidate segments of the tested brain set according to their labeling category given by the SVM classifier. The validation scores presented are based on common measures for spatial overlap (e.g., [6],[19]). Denote by (S) the set of voxels automatically detected as a specific class and by (R) the set of voxels labeled as the same class in the 'ground truth' reference.
Table 3. Average J-scores for various segmentation methods on 20 brains

Method                                        WM     GM     CSF
Manual (4 brains averaged over 2 experts)     0.876  0.832  0.346
Our Method                                    0.669  0.680  0.206
Pham and Prince [8]                           0.7    0.6    –
Shattuck et al. [3]                           0.595  0.664  –
Zeng et al. [7]                               –      0.657  –
Burkhardt et al. [18] (trium)                 0.578  0.644  –
adaptive MAP (amap) [2]                       0.567  0.564  0.069
biased MAP (bmap) [2]                         0.562  0.558  0.071
fuzzy c-means (fuzzy) [2]                     0.567  0.473  0.048
Maximum A Posteriori Probability (map) [2]    0.554  0.550  0.071
Maximum Likelihood (mlc) [2]                  0.551  0.535  0.062
tree-structure k-means (tskmeans) [2]         0.571  0.477  0.049
The classification measures used in Tables 2 and 3 are defined as follows:

– Overlap: $|S \cap R| / |R|$
– FP: $|S \cap \bar{R}| / |\bar{R}|$
– κ statistic (Dice coefficient): $2|S \cap R| / (|S| + |R|)$
– Jaccard similarity J: $|S \cap R| / |S \cup R|$
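These four measures translate directly into code for boolean voxel masks; the following is a sketch, not the authors' implementation.

```python
import numpy as np

def overlap_measures(S, R):
    """Spatial overlap between a detected voxel set S and a reference
    set R, both given as boolean masks of the same shape."""
    s, r = S.astype(bool), R.astype(bool)
    inter = (s & r).sum()
    overlap = inter / r.sum()                # |S ∩ R| / |R|
    fp = (s & ~r).sum() / (~r).sum()         # false positives: |S ∩ R^c| / |R^c|
    kappa = 2 * inter / (s.sum() + r.sum())  # Dice coefficient
    jaccard = inter / (s | r).sum()          # |S ∩ R| / |S ∪ R|
    return overlap, fp, kappa, jaccard
```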
Table 3 and Figure 1 display a quantitative comparison of our approach with ten other algorithms. Six of them are provided with the data [2]. We also included in Table 3 four additional studies which report average results for part of the tasks ([3],[7],[8],[18]). The comparison is based on the J metric reported in these studies. Our average scores for all classes were comparable or superior to previously reported results, and we obtained a significant difference from the other algorithms (for GM and WM, p ≤ 0.005). The results are especially strong on the most difficult cases (i.e., sets 1-5; see Fig. 1). Moreover, the other metrics presented in Table 2 show high detection rates for all categories identified. Figure 2 demonstrates the WM and GM segmentations produced by the method in 2D and 3D views, respectively.

Fig. 1. Overlap scores between manual and automatic segmentations over 20 brain scans with decreasing levels of difficulty (from set index 1 to 20), for the tasks of (a) cerebrospinal fluid (CSF), (b) gray matter (GM) and (c) white matter (WM) detection. Our results are compared with seven other algorithms (trium, amap, bmap, fuzzy, map, mlc, tskmeans).

Fig. 2. WM and GM identification: (a) WM ground truth, (b) WM automatic, (c) GM ground truth, (d) GM automatic. The upper row presents classification results projected on a 2D T1 slice; the lower row demonstrates a 3D view of the results.
4 Discussion
MRI is considered the ideal method for brain imaging. The 3D data and the large number of possible protocols allow the identification of anatomical structures as well as abnormal brain structures. In this work we focus on segmenting normal brains into three tissue classes: WM, GM and CSF. The segmentation pyramid provides a rich, adaptive representation of the image, enabling detection of various anatomical structures at different scales. A key aspect of the approach is the comprehensive set of multiscale measurements applied throughout the segmentation process. These quantitative measures, which take into account the atlas information, can further be used for clinical investigation. For classification we apply an automatic learning procedure based on an SVM algorithm, using data pre-labeled by experts. Our approach is unique in that it combines a rich and tunable set of features emerging from statistical measurements at all scales. Our competitive results, obtained using a standard SVM classifier, demonstrate the high potential of such features. The algorithm was evaluated on real 3D MRI brain scans, demonstrating its ability to detect anatomical brain structures. It imposes no restrictions on the MRI scan protocols and can be generalized to other tasks and modalities, such as detecting evolving tumors or other anatomical substructures. Future work will expand the approach to the detection of internal brain anatomical structures. Extraction of other classes of structures may require additional features and perhaps the more detailed knowledge domain available in brain atlases.
Acknowledgement. Research was supported in part by the Binational Science Foundation, Grant No. 2002/254, by the Israel Institute of Technology and by the European Commission project IST-2002-506766 AIM@SHAPE. Research was conducted at the Moross Laboratory for Vision and Motor Control at the Weizmann Institute of Science.
References
1. Pham, D., Xu, C., Prince, J.: Current methods in medical image segmentation. Annual Review of Biomedical Engineering 2 (2000) 315–337
2. Rajapakse, J., Kruggel, F.: Segmentation of MR images with intensity inhomogeneities. IVC 16(3) (1998) 165–180
3. Shattuck, D.W., Sandor-Leahy, S.R., Schaper, K.A., Rottenberg, D.A., Leahy, R.M.: Magnetic resonance image tissue classification using a partial volume model. NeuroImage 13(5) (2001) 856–876
4. Bezdek, J.C., Hall, L.O., Clarke, L.P.: Review of MRI segmentation techniques using pattern recognition. Medical Physics 20(4) (1993) 1033–1048
5. Sonka, M., Fitzpatrick, J.M., eds.: Handbook of Medical Imaging. SPIE (2000)
6. Zijdenbos, A., Dawant, B.: Brain segmentation and white matter lesion detection in MRI. Critical Reviews in Biomedical Engineering 22 (1994)
7. Zeng, X., Staib, L.H., Schultz, R.T., Duncan, J.S.: Segmentation and measurement of the cortex from 3D MRI using coupled surfaces propagation. IEEE TMI (1999)
8. Pham, D., Prince, J.: Robust unsupervised tissue classification in MRI. IEEE Biomedical Imaging: Macro to Nano (2004)
9. Wells, W.M., Grimson, W., Kikinis, R., Jolesz, F.A.: Adaptive segmentation of MRI data. IEEE TMI 15 (1996) 429–442
10. Zhang, Y., Brady, M., Smith, S.: Segmentation of brain MRI through a hidden Markov random field model and the expectation-maximization algorithm. IEEE TMI 20(1) (2001) 45–57
11. Van Leemput, K., Maes, F., Vandermeulen, D., Colchester, A., Suetens, P.: Automated segmentation of MS by model outlier detection. IEEE TMI 20 (2001) 677–688
12. Akselrod-Ballin, A., Galun, M., Gomori, J.M., Filippi, M., Valsasina, P., Brandt, A., Basri, R.: An integrated segmentation and classification approach applied to multiple sclerosis analysis. CVPR (2006)
13. Sharon, E., Brandt, A., Basri, R.: Fast multiscale image segmentation. CVPR (2000) 70–77
14. Galun, M., Sharon, E., Basri, R., Brandt, A.: Texture segmentation by multiscale aggregation of filter responses and shape elements. ICCV (2003) 716–723
15. Brandt, A., McCormick, S., Ruge, J.: Algebraic multigrid (AMG) for automatic multigrid solution with application to geodetic computations. Institute for Computational Studies, Fort Collins, Colorado (1982)
16. Frackowiak, R.S., Friston, K., Frith, C., Dolan, R., Price, C., Zeki, S., Ashburner, J., Penny, W., eds.: Human Brain Function. Academic Press (2003)
17. Vapnik, V.: The Nature of Statistical Learning Theory. Springer-Verlag (1995)
18. Burkhardt, J.: A Markov chain Monte Carlo algorithm for the segmentation of T1-MR-images of the head. Diploma thesis, Technische Universität München (2003)
19. Gerig, G., Jomier, M., Chakos, M.: Valmet: A new validation tool for assessing and improving 3D object segmentation. MICCAI (2001) 516–523
A Nonparametric Bayesian Approach to Detecting Spatial Activation Patterns in fMRI Data

Seyoung Kim, Padhraic Smyth, and Hal Stern

Bren School of Information and Computer Sciences, University of California, Irvine, CA 92697-3425
{sykim, smyth}@ics.uci.edu,
[email protected]
Abstract. Traditional techniques for statistical fMRI analysis are often based on thresholding of individual voxel values or on averaging voxel values over a region of interest. In this paper we present a mixture-based response-surface technique for extracting and characterizing spatial clusters of activation patterns from fMRI data. Each mixture component models a local cluster of activated voxels with a parametric surface function. A novel aspect of our approach is the use of Bayesian nonparametric methods to automatically select the number of activation clusters in an image. We describe an MCMC sampling method that simultaneously estimates the shape parameters and the number of local activations, and we illustrate the application of the algorithm to a number of different fMRI brain images.
1 Introduction
Functional magnetic resonance imaging (fMRI) is a widely used technique to study activation patterns in the brain while a subject is performing a task. Voxel-level activations collected in an fMRI session can often be summarized as a β map, a collection of β coefficients estimated by fitting a linear regression model to the time series of each voxel. Detection of activation areas in the brain using β maps is typically based on summary statistics such as the mean activation values of particular brain regions, or on detection of significant voxels by thresholding statistics such as p-values computed at each voxel. These approaches do not directly model spatial patterns of activation; detection and characterization of such patterns can in principle provide richer and more subtle information about cognitive activity and its variation across individuals and machines. In earlier work on spatial activation patterns, Hartvig [1] represented the activation surface in fMRI as a parametric function consisting of a superposition of Gaussian-shaped bumps and a constant background level, and used a stochastic geometry model to find the number of bumps automatically. This work focused on extracting activated voxels by thresholding after the model parameters were estimated, rather than characterizing activation patterns directly. Penny and Friston [2] proposed a mixture model similar to that described in this paper,
with each mixture component representing a local activation cluster. In prior work we proposed a response surface model that represents an activation pattern as a superposition of Gaussian-shaped parametric surfaces [3]. While the approaches proposed in [2] and [3] provide richer information about spatial activation than voxel-based methods, they both have the drawback that users have to determine the number of activation clusters manually by looking at individual images. While this may be feasible when analyzing relatively small brain regions in a small number of images, in a large-scale analysis of brain activation patterns automatic detection of the number of bumps becomes an important pragmatic issue.

In this paper we take a nonparametric Bayesian approach and use Dirichlet processes [4] to address the problem of automatically determining the number of activation clusters. Rasmussen [5] illustrated how the Dirichlet process can be used as a prior on the mixing proportions in mixtures of Gaussians to automatically determine the number of mixture components from data in a Bayesian manner. This approach is sometimes referred to as the infinite mixture model, since it does not a priori specify a finite number of mixture components but instead allows the number to be determined by the data. The primary novel contribution of this paper is the application of the general framework of infinite mixtures and Bayesian inference to the specific problem of characterizing spatial activation patterns in fMRI. We model spatial activation patterns in fMRI as a mixture of experts with a constant background component and one or more activation components. An expert assigned to each activation cluster models it with a parametric surface (similar to the surface model we proposed in [3] and detailed in the next section), with free parameters for the heights and center locations of the "bumps." Combining this mixture of experts model with a Dirichlet process prior, we can estimate the shape parameters and the number of bumps at the same time in a statistical manner, using the infinite mixture model framework.

The paper is organized as follows. In Sect. 2, we introduce a mixture of experts model for spatial activation patterns, describe inference procedures for this model, and extend it using an infinite mixture model to find the number of activation clusters automatically from data. In Sect. 3, we demonstrate the performance of our model on fMRI data for two individuals collected at two different sites. Sect. 4 concludes with a brief discussion of future work.
2 Activation Surface Modeling

2.1 The Mixture of Experts Model
We develop the model for the case of 2-dimensional slices of β maps; the 3-dimensional case can be derived as an extension of the 2-dimensional case, but is not pursued in this paper. Under the assumption that the β values $y_i$, $i = 1, \ldots, N$ (where $N$ is the number of voxels), are conditionally independent of each other given the voxel position $x_i = (x_{i1}, x_{i2})$ and the model parameters, we
model the activation $y_i$ at voxel $x_i$ with a mixture of experts model:

$$p(y_i \mid x_i, \theta) = \sum_{c \in C} p(y_i \mid c, x_i)\, P(c \mid x_i), \qquad (1)$$
where $C = \{c_{bg}, c_m,\ m = 1, \ldots, M-1\}$ is a set of $M$ expert component labels for a background component $c_{bg}$ and $M-1$ activation components (the $c_m$'s). The first term on the right-hand side of Equation (1) defines the expert for a given component. We model the expert for an activation component $c_m$ as a Gaussian-shaped surface centered at $b_m$ with width $\Sigma_m$ and height $k_m$:

$$y_i = k_m \exp\!\left(-(x_i - b_m)^{\top} \Sigma_m^{-1} (x_i - b_m)\right) + \varepsilon, \qquad (2)$$

where $\varepsilon$ is zero-mean additive noise. The background component is modeled as having a constant activation level $\mu$ with additive noise. We use the same Gaussian distribution with mean 0 and variance $\sigma^2$ for the noise term $\varepsilon$ for both types of experts.

The second term in Equation (1) is known as a gate function in the mixture of experts framework: it decides which expert should be used to make a prediction for the activation level at position $x_i$. Using Bayes' rule we can write this term as

$$P(c \mid x_i) = \frac{p(x_i \mid c)\, \pi_c}{\sum_{c' \in C} p(x_i \mid c')\, \pi_{c'}}, \qquad (3)$$

where $\pi_c$ is the class prior probability $P(c)$. $p(x_i \mid c)$ is defined as follows. For activation components, $p(x_i \mid c_m)$ is a normal density with mean $b_m$ and covariance $\Sigma_m$; $b_m$ and $\Sigma_m$ are shared with the Gaussian surface model for the experts in Equation (2). This implies that the probability of activating the $m$th expert is highest at the center of the activation and gradually decays as $x_i$ moves away from the center. For the background component, $p(x_i \mid c_{bg})$ is modeled as a uniform distribution of $1/N$ over all positions in the brain. If $x_i$ is not close to the center of any activation, the gate function selects the background expert for the voxel.

Putting this model in a Bayesian framework, we define priors on all of the parameters. The center of activation $b_m$ is a priori uniformly distributed over voxels that have positive β values within a predefined brain region of interest. We use an inverse Wishart$(\nu_0, S_0)$ prior for the width parameter $\Sigma_m$ of an activation, and a Gamma$(a_{0k}, b_{0k})$ prior for the height parameter $k_m$. A normal distribution $\mathcal{N}(0, \sigma_0^2)$ is used as a prior on the background mean $\mu$, and the noise variance $\sigma^2$ is given a Gamma$(a_0, b_0)$ prior. A Dirichlet distribution with a fixed concentration parameter $\alpha$ is used as a prior for the class prior probabilities $\pi_c$.
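Putting Equations (1)-(3) together, the posterior-mean activation surface at a voxel can be evaluated as sketched below; the argument names are illustrative placeholders, not the authors' code.

```python
import numpy as np
from scipy.stats import multivariate_normal

def predicted_activation(x, centers, widths, heights, mu, pis, n_voxels):
    """E[y | x] under the mixture of experts: centers[m] = b_m,
    widths[m] = Sigma_m, heights[m] = k_m, mu = background level,
    pis = class priors with the background component last."""
    M = len(centers)
    gate = np.empty(M + 1)
    mean = np.empty(M + 1)
    for m in range(M):
        # gate density pi_m * p(x | c_m) with a normal density (Equation 3)
        gate[m] = pis[m] * multivariate_normal.pdf(x, centers[m], widths[m])
        d = np.asarray(x, float) - centers[m]
        # Gaussian-shaped expert surface (Equation 2), without the noise term
        mean[m] = heights[m] * np.exp(-d @ np.linalg.inv(widths[m]) @ d)
    gate[M] = pis[M] / n_voxels   # uniform background density 1/N
    mean[M] = mu                  # constant background expert
    gate /= gate.sum()            # normalize the gate function (Equation 3)
    return gate @ mean
```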
2.2 MCMC for Mixture of Experts Model
Because of the nonlinearity of the model described in Sect. 2.1, we rely on MCMC simulation methods to obtain samples from the posterior probability
density of parameters given data. In a Bayesian mixture model framework it is common to augment the unknown parameters with the unknown component labels for the observations and to consider the joint posterior $p(\theta, c \mid y, X)$, where $y$, $X$ and $c$ represent the collections of $y_i$'s, $x_i$'s and $c_i$'s for $i = 1, \ldots, N$, and $\theta = \{\mu, \sigma^2, \{b_m, \Sigma_m, k_m\},\ m = 1, \ldots, M-1\}$. Notice that the mixing proportions $\pi_c$ are not included in $\theta$; they are dealt with separately later in this section. During each sampling iteration the parameters $\theta$ and component labels $c$ are sampled alternately. To obtain samples for the parameters $\theta$, we consider each parameter $\theta_j$ in turn and sample from $p(\theta_j \mid \theta_{-j}, c, y, X) \propto p(y \mid c, X, \theta)\, p(X \mid c, \theta)\, p(\theta_j)$, where the subscript $-j$ indicates all parameters except $\theta_j$. Gibbs sampling is used for the background mean $\mu$. The width parameter $\Sigma_m$ of an activation can be sampled using the Metropolis-Hastings algorithm with an inverse Wishart proposal distribution. For all other parameters the Metropolis algorithm with a Gaussian proposal distribution can be used.

We sample the component label $c_i$ as follows. Given a Dirichlet$(\alpha/M, \ldots, \alpha/M)$ prior for the $\pi_c$'s, we can integrate out the $\pi_c$'s to obtain

$$p(c_i = j \mid c_{-i}, \alpha) = \frac{n_{-i,j} + \alpha/M}{N - 1 + \alpha}, \qquad (4)$$

where $n_{-i,j}$ indicates the number of observations excluding $y_i$ that are associated with component $j$ [5]. This is combined with the likelihood terms to obtain the conditional posterior for $c_i$: $p(c_i \mid c_{-i}, \theta, y, X, \alpha) \propto p(y_i \mid c_i, x_i, \theta)\, p(x_i \mid c_i, \theta, \alpha)\, P(c_i \mid c_{-i}, \alpha)$. We can sample the component label $c_i$ for observation $y_i$ from this distribution.
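One draw of a component label under this finite model can be sketched as follows; the names and array layout are placeholders, not the authors' code.

```python
import numpy as np

def sample_label(counts, alpha, like_y, like_x, rng):
    """Draw c_i from its conditional posterior in the finite model.

    counts : (M,) occupancies n_{-i,j}, with observation i removed
    like_y : (M,) expert likelihoods p(y_i | c_i = j, x_i, theta)
    like_x : (M,) gate densities p(x_i | c_i = j, theta)
    """
    M = counts.size
    N = counts.sum() + 1                             # total observation count
    prior = (counts + alpha / M) / (N - 1 + alpha)   # Equation (4)
    post = prior * like_y * like_x                   # unnormalized posterior
    post /= post.sum()
    return rng.choice(M, p=post)
```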
2.3 The Infinite Mixture of Experts Model
In the previous sections we assumed that the number of components $M$ was fixed and known. For an infinite mixture model with a Dirichlet process prior [5], in the limit as $M \to \infty$ the class conditional probabilities shown in Equation (4) become:

components where $n_{-i,j} > 0$:
$$P(c_i = j \mid c_{-i}, \alpha) = \frac{n_{-i,j}}{N - 1 + \alpha}, \qquad (5)$$

all other components combined:
$$P(c_i \neq c_{i'} \text{ for all } i' \neq i \mid c_{-i}, \alpha) = \frac{\alpha}{N - 1 + \alpha}. \qquad (6)$$
When we sample component label ci for observation yi in a given iteration of MCMC sampling, if there are any observations associated with component j other than yi , Equation (5) is used and this component has a non-zero prior probability of being selected as ci . If yi is the only observation associated with
label j, this component is considered unrepresented, and a new component and its parameters are generated based on Equation (6) and the prior distributions of the parameters. Once all observations are associated with components, the parameters of these components can be sampled in the same way as in the finite mixture of experts model, taking $M$ to be the number of components represented in the current sample of $c$.

The class conditional prior probabilities in the infinite mixture model above (Equations (5) and (6)) do not depend on the input positions $x_i$, whereas the gate function $P(c \mid x_i)$ is a function of $x_i$. To allow for this dependence, we separate the input-independent term from the gate function by writing it as in Equation (3), and apply the infinite mixture model to the second term of Equation (3). To sample from the class conditional posterior, we use Algorithm 7 from Neal [6], which combines a Metropolis-Hastings algorithm with partial Gibbs sampling. In the Metropolis-Hastings step, for each $c_i$ that is not a singleton, we propose a new component with parameters drawn from the prior and decide whether to accept it based on $p(y_i \mid c_i, x_i, \theta)\, p(x_i \mid c_i, \theta)$. If it is a singleton, we consider changing it to one of the other classes represented in the data. This Metropolis-Hastings step is followed by partial Gibbs sampling of the labels to improve efficiency.

At the start of the MCMC sampling, the model is initialized to one background component and one or more activation components. Because we are interested in determining the number of activation components, whenever a decision to generate a new component is made, we assume the new component is an activation component and sample the appropriate parameters from their priors. If the number of voxels associated with a component at a given iteration is 0, the component is removed. It is reasonable to assume that not all regions of the brain are activated during a task and, thus, that there will always be some voxels in the background class. To prevent the sampling algorithm from removing the background component, the algorithm assigns the $n_{labeled}$ voxels with the lowest β values in the image to the background component and performs partially supervised learning with these labeled voxels. In this case we are interested in sampling from the joint posterior of the parameters and the labels of the unlabeled data, $p(c_{unlabeled}, \theta \mid y, X, c_{labeled})$.

In general, the number of components depends on the concentration parameter $\alpha$ of the Dirichlet process prior. It is possible to sample $\alpha$ from its posterior using a gamma prior on $\alpha$ [7]. In our experiments, we found that fixing $\alpha$ to a small value ($\alpha < 5$) worked well.
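The label step of the infinite model can be sketched by weighting represented components with Equation (5) and one fresh component, with parameters drawn from the priors, with Equation (6). This direct draw is a simplification for illustration; the paper uses Neal's Algorithm 7 (Metropolis-Hastings plus partial Gibbs) rather than this scheme, and all names are hypothetical.

```python
import numpy as np

def sample_label_infinite(counts, alpha, like_y, like_x, new_like, rng):
    """counts         : occupancies n_{-i,j} of the represented components
    like_y, like_x : likelihood and gate terms for those components
    new_like       : p(y_i | c_new, x_i) * p(x_i | c_new) under parameters
                     freshly sampled from the prior distributions"""
    N = counts.sum() + 1
    w_old = counts / (N - 1 + alpha) * like_y * like_x   # Equation (5)
    w_new = alpha / (N - 1 + alpha) * new_like           # Equation (6)
    w = np.append(w_old, w_new)
    w /= w.sum()
    j = rng.choice(w.size, p=w)
    return j   # j == counts.size signals: instantiate the new component
```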
3 Experimental Results
We demonstrate the performance of our algorithm using fMRI data collected from two subjects (referred to as Subjects 1 and 2) performing the same sensorimotor task. fMRI data was collected for each subject at multiple sites as part of a large multi-site fMRI study (http://www.nbirn.net); here we look at data from the Stanford (3T)
and Duke (4T) sites. Each run of the sensorimotor task produces a series of 85 scans that can be thought of as a large time series of voxel images. The set of scans for each run is preprocessed in a standard manner using SPM99 with the default settings. The preprocessing steps include correction of head motion, normalization to a common brain shape (SPM EPI canonical template), and spatial smoothing with an 8mm FWHM (full width at half maximum) 3D Gaussian kernel. A general linear model is then fit to the time-series data of each voxel to produce the β maps. The design matrix used in the analysis includes the on/off timing of the sensorimotor stimuli, measured as a boxcar convolved with the canonical hemodynamic response function.

The hyperparameters of the prior distributions are set based on prior knowledge of local activation clusters. $\sigma_0^2$, for the prior on the background mean, is set to 0.1, and the noise parameter $\sigma^2$ is given a Gamma(1.01, 1) prior so that the mode is at 0.01 and the variance is 1. The prior for the height $k_m$ of an activation component is set to Gamma(2, 2), based on the fact that maximum β values in this data are usually around 1. The width $\Sigma_m$ of an activation component is given an inverse Wishart prior with 4 degrees of freedom; the 2 × 2 scale matrix is set to variance 8 and covariance 1. The concentration parameter $\alpha$ of the Dirichlet process prior is set to 1.

To initialize the model for posterior sampling, we use a heuristic algorithm to find candidate voxels for local activation centers and assign a mixture component to each of the candidates. To find candidate voxels, we take all of the positive voxels of a cross section and repeatedly select the largest voxel among the voxels that have not been chosen and that are at least four voxels apart from the previously selected voxels, until no voxels are left. The location and height parameters of each component are set to the position and the β value of the corresponding candidate voxel; the width parameters are set to the mean of the prior on the width. As mentioned earlier, we fix the labels of the $n_{labeled} = 10$ voxels with the lowest β values as background, and perform partially supervised learning with these labeled voxels.
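The greedy peak-picking initialization just described can be sketched as follows; the β map is assumed to be a 2D array, and all names are illustrative rather than taken from the authors' implementation.

```python
import numpy as np

def candidate_centers(beta, min_dist=4.0):
    """Repeatedly pick the largest remaining positive voxel that lies at
    least `min_dist` voxels from every previously selected center."""
    ys, xs = np.nonzero(beta > 0)
    coords = np.stack([xs, ys], axis=1).astype(float)
    order = np.argsort(-beta[ys, xs])     # strongest voxels first
    centers = []
    for idx in order:
        p = coords[idx]
        if all(np.linalg.norm(p - q) >= min_dist for q in centers):
            centers.append(p)
    return np.asarray(centers)
```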
Fig. 1. Results for subject 2, data from the Stanford MRI machine: (a) raw data (β map) for a cross section (z = 48); (b) SPM showing active voxels colored white (p ≤ 0.05); (c) predictive values for activation given the estimated mixture components with height $k_m > 0.1$. The width of the ellipse for each bump is one standard deviation of the width parameter for that component; the thickness of the ellipse indicates the estimated height $k_m$ of the bump.
The MCMC sampler is run for 3000 iterations. The heights of the estimated bumps range between 0.003 and 1.2. Since low heights are likely to correspond to weak background fluctuations, we display only those activation components above a certain threshold (height $k_m \ge 0.1$).

We fit the infinite mixture of experts model to a β map cross section (at z = 48) for subject 2 on the Stanford MRI machine and summarize the results in Fig. 1. After 3000 iterations of the sampling algorithm using the β map shown in Fig. 1(a), we sample posterior predictive values for the β values at each voxel using the last 1000 samples of the parameters. The medians of these samples are shown in Fig. 1(c) as predicted β values. Parameters from a single sample are overlaid as ellipses centered around the estimated activation centers $b_m$. The model was able to find all of the significant clusters of activation in the raw data even though the number of activation components was unknown a priori to the learning algorithm. Most of the larger activation clusters are in the auditory areas in the upper and lower middle parts of the image, consistent with the activation pattern of sensorimotor tasks. For comparison, Fig. 1(b) shows the activated voxels found by thresholding the z map (normalized β map) at p ≤ 0.05. The thresholding method cannot separate the two bumps in the lower middle part of Fig. 1(a), and it completely misses the activations in the center of the image.

Fig. 2. The number of activation components as a function of MCMC iterations for the data in Fig. 1.

In Fig. 2 we assess the convergence of the Markov chain based on the number of active components at each iteration. The initially large number of components (from the heuristic initialization algorithm) quickly decreases over the first 300 iterations and, after around 900 iterations, stabilizes at around 22 to 25 components.
Fig. 3. Results from subject 1 at Duke: (a) visit 1, run 1; (b) visit 2, run 1; (c) visit 2, run 2. β maps for a cross section (z = 53) of the right precentral gyrus and surrounding area are shown on the left. On the right are predictive values for activation given the mixture components estimated from the images on the left. The width of the ellipse for each bump is one standard deviation of the width parameter for that component; the thickness of the ellipse indicates the estimated height $k_m$ of the bump.
Fig. 3 shows results for a cross section (at z = 53) of the right precentral gyrus area for subject 1, collected over two visits. The β maps are shown on the left (Fig. 3(a) for visit 1, and Fig. 3(b),(c) for two runs in visit 2) and the estimated components on the right, given the images on the left. Even though the β maps in Fig. 3(a)-(c) were collected from the same subject using the same fMRI machine, there is variability in activation across visits, such as the bump on the lower left of the β map for visit 2, in addition to the bigger bump at the center of the β maps that is common to both visits. This information is successfully captured in the estimated activation components.
4 Conclusions
We have shown that infinite mixtures of experts can be used to locate local clusters of activated voxels in fMRI data and to model the spatial shape of each cluster, without assuming a priori how many local clusters of activation are present. Once the clusters are identified, the characteristics of the spatial activation patterns (shape, intensity, relative location) can be extracted directly and automatically. This can in turn provide a basis for systematic quantitative comparison of activation patterns in images collected from the same subject over time, from multiple subjects, and from multiple sites.

Acknowledgments. The authors acknowledge the support of the following grants: the Functional Imaging Research in Schizophrenia Testbed, Biomedical Informatics Research Network (FIRST BIRN; 1 U24 RR021992, www.nbirn.net); the Transdisciplinary Imaging Genetics Center (P20 RR020837-01); and the National Alliance for Medical Image Computing (NAMIC; Grant U54 EB005149), funded by the National Institutes of Health through the NIH Roadmap for Medical Research. Author PS was also supported in part by the National Science Foundation under awards IIS-0431085 and SCI-0225642.
References
1. Hartvig, N. (1999) A stochastic geometry model for fMRI data. Research Report 410, Department of Theoretical Statistics, University of Aarhus.
2. Penny, W. & Friston, K. (2003) Mixtures of general linear models for functional neuroimaging. IEEE Transactions on Medical Imaging, 22(4):504-514.
3. Kim, S., Smyth, P., Stern, H. & Turner, J. (2005) Parametric response surface models for analysis of multi-site fMRI data. In Proceedings of the 8th International Conference on Medical Image Computing and Computer Assisted Intervention.
4. Ferguson, T. (1973) A Bayesian analysis of some nonparametric problems. Annals of Statistics, 1(2):209–230.
5. Rasmussen, C.E. (2000) The infinite Gaussian mixture model. In S.A. Solla, T.K. Leen and K.-R. Müller (eds.), Advances in Neural Information Processing Systems 12, pp. 554-560. Cambridge, MA: MIT Press.
6. Neal, R.M. (1998) Markov chain sampling methods for Dirichlet process mixture models. Technical Report 4915, Department of Statistics, University of Toronto.
7. Escobar, M. & West, M. (1995) Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association, 90:577–588.
Fast and Accurate Connectivity Analysis Between Functional Regions Based on DT-MRI

Dorit Merhof¹,²,*, Mirco Richter¹, Frank Enders¹,², Peter Hastreiter¹,², Oliver Ganslandt², Michael Buchfelder², Christopher Nimsky², and Günther Greiner¹

¹ Computer Graphics Group, University of Erlangen-Nuremberg, Germany
² Neurocenter, Dept. of Neurosurgery, University of Erlangen-Nuremberg, Germany
* Corresponding author.
[email protected]
Abstract. Diffusion tensor and functional MRI data provide insight into the function and structure of the human brain. However, connectivity analysis between functional areas is still a challenge for traditional fiber tracking techniques. For this reason, alternative approaches incorporating the entire tensor information have emerged. Building on previous research employing pathfinding for connectivity analysis, we present a novel search grid and an improved cost function, which contribute substantially to more precise paths. Additionally, implementation aspects are considered that make connectivity analysis very efficient, which is crucial for surgery planning. In comparison to other algorithms, the presented technique is by far faster while providing connections of comparable quality. The clinical relevance is demonstrated by reconstructed connections between motor and sensory speech areas in patients with lesions located in between.
1 Introduction

In recent years, medical imaging techniques such as diffusion tensor imaging (DTI) and functional MRI (fMRI) have emerged that enable the exploration of the function and structure of the human brain. DTI measures the diffusion of water, which is anisotropic in tissue with a high degree of directional organization. For the computation of diffusion tensors, diffusion-weighted images for at least six non-collinear gradient directions are acquired. The respective diffusion tensors provide information about the location and orientation of white matter structures in vivo. The localization of active brain areas, such as the motor and sensory speech areas, is accomplished with fMRI.

In neurosurgery, the localization of functional areas on the cortex and their white matter connectivity is of great importance for preoperative planning. With respect to the motor and sensory speech areas, the Broca's and Wernicke's areas located in the cerebral cortex, it is generally accepted that these areas are functionally related in speech processing. To avoid neurological deficits after neurosurgical procedures, eloquent structures such as the speech areas, as well as the connecting white matter structures, have to remain intact. For this reason, analysis of white matter connectivity between functional areas is of high interest for neurosurgery and other disciplines in neuroscience.

Standard techniques for the reconstruction of white matter structures from DTI data rely on the orientation of the major eigenvector of the diffusion tensor [1]. Thereby,
streamline-based techniques are employed to propagate the fiber. These approaches enable the reconstruction of major tract systems such as the pyramidal tract or the corpus callosum. However, the reliability of standard tracking techniques is affected by imaging noise and by partial volume effects in the case of crossing or branching fibers. For this reason, standard tracking algorithms are not suited for a connectivity analysis involving sub-cortical or cortical regions.

First approaches addressing this problem used probabilistic or regularization techniques [2,3]. Over a large number of iterations, the process yields a connectivity probability related to the number of probabilistic streamlines found in a volume element. Another class of algorithms, derived from level set theory, considers the arrival times of diffusion fronts [4,5]. In the most recent work based on level sets, the problem of white matter connectivity is modeled as wavefront evolution based on a cost function that depends on the entire diffusion tensor [5]. Considering the arrival times of the wavefront, connections are derived by minimizing the cumulative travel cost along the path. Another recent approach uses global optimization and dynamic programming for fiber reconstruction [6]: a graph is spanned over the domain with an assigned cost for each edge connecting two voxels, and connections with highest probability are computed by means of dynamic programming.

In this work, we extend our previous research on connectivity analysis based on pathfinding [7]. Pathfinding algorithms are highly efficient techniques commonly used in artificial intelligence for problems associated with a state space search using cost functions; they are applied to derive the minimum-cost path between a start and a target region. To investigate neuronal connectivity, a cost function based on the probability distribution function of each tensor is employed. Similarly to other connectivity algorithms [5,6], the entire tensor controls the pathfinding procedure, circumventing the bias of the major eigenvector in isotropic regions. We present several crucial enhancements of our basic algorithm, such as an improved cost function, a novel search grid and an optimized implementation. As an important result for clinical application, the presented algorithm is considerably faster than other recently presented connectivity algorithms while providing comparable accuracy.
2 Method

Pathfinding algorithms are commonly used in computer science for different types of search problems. Since there are highly efficient solutions such as the A* algorithm [8], it is straightforward to apply pathfinding in the context of white matter connectivity analysis, which can be considered an instance of a minimum-cost path problem [5,6,7].

2.1 Pathfinding

The A* algorithm was designed to efficiently compute the path with lowest cost between a start and a target region. For this purpose, the algorithm builds up a graph with nodes and edges, where each edge is assigned a local cost. In each iteration, the path with lowest cost is expanded until the target region is reached. An important property of the A* algorithm is its optimality [9], which guarantees that the best possible solution is found with the smallest computational effort.
For performing the search, the algorithm maintains two lists: an open list comprising all nodes currently under consideration, and a closed list containing nodes that have already been processed. In the beginning, the open list comprises all nodes of the start region and the closed list is empty. Each node stores the movement cost $g_i$ required to travel along the path to the respective node $i$. The cost function $f_i = g_i + h_i$ is evaluated for each node of the open list to decide which one to process next. Thereby, $h_i$ denotes an estimate of the remaining cost (also called a heuristic) to the target, which may optionally be added to direct the search towards the target, making the algorithm more efficient; otherwise, the search would spread equally in all directions. It is important to note that $h_i$ does not affect the optimality of A* as long as $h_i$ is not higher than the actual cost necessary to reach the target. To compute the minimum-cost path, the algorithm repeatedly selects the node with lowest $f_i$ from the open list, adds all its neighbors to the open list, and moves the selected node to the closed list. These processing steps continue until the target node is added to the open list and the path with the lowest cost is found.

2.2 Grid for Partitioning the Search Space

For navigating in three-dimensional space, the search algorithm requires a grid which should provide a regular structure uniformly covering the search space and a small step size between neighboring nodes. The angle between edges should be small to provide enough flexibility to follow the direction of anisotropic diffusion. For this reason, we use a hexahedral grid connecting each node to 74 neighbors, as shown in Figure 1 (left). In comparison to a grid with 26 neighbors, further directions are provided, reducing the angle between neighboring edges and offering a considerably improved choice of directions, thus fulfilling the flexibility criterion. The grid size has to be chosen sufficiently small to guarantee a small maximum step size, which should not exceed 2 mm, the grid size of the DTI dataset. The resulting grid is of high resolution and enables a dense sampling of the search space, as outlined in Figure 1 (middle, right).
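A compact sketch of this search loop follows, with the open list realized as a binary heap and the closed list as a predecessor map. The neighbor template stands in for the 74-neighbor stencil, the heuristic anticipates Eq. (4) of Sect. 2.5, the FA-threshold masking of Sect. 3 is omitted, and all names are illustrative rather than taken from the authors' implementation.

```python
import heapq
import itertools
import numpy as np

def astar(cost, start, goal, offsets, c_hat, s_max):
    """Minimum-cost path between two voxels on a regular grid.

    cost(u, v) : local edge cost derived from the tensor profile
    offsets    : list of integer (dx, dy, dz) neighbor offsets
    c_hat      : estimated average cost per step; s_max: maximum step length
    """
    def h(u):  # admissible remaining-cost estimate: (d_i / s_max) * c_hat
        return np.linalg.norm(np.subtract(goal, u)) / s_max * c_hat

    tie = itertools.count()               # breaks ties between equal-f entries
    open_list = [(h(start), 0.0, next(tie), start, None)]
    closed = {}                           # node -> predecessor on best path
    while open_list:
        f, g, _, u, parent = heapq.heappop(open_list)
        if u in closed:
            continue                      # node was already expanded more cheaply
        closed[u] = parent
        if u == goal:                     # walk predecessors back to the start
            path = [u]
            while closed[path[-1]] is not None:
                path.append(closed[path[-1]])
            return path[::-1]
        for d in offsets:                 # expand all grid neighbors
            v = tuple(int(a) for a in np.add(u, d))
            if v not in closed:
                g2 = g + cost(u, v)       # accumulated movement cost g_i
                heapq.heappush(open_list, (g2 + h(v), g2, next(tie), v, u))
    return None                           # start and target are not connected
```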
Fig. 1. Octant of a grid with 26 and 74 neighbors (left). Expansion of the search space between Broca’s and Wernicke’s speech areas using a grid with 74 neighbors and a maximum edge length of 1.5 mm in a brain tumor patient (middle). Close-up view showing details of search grid (right).
2.3 Cost Function

The cost function is an integral part of all types of connectivity algorithms since it controls the process of path evolution. To overcome the limitations of fiber tracking arising from the reduction of the tensor information to the principal eigenvector, the cost function for connectivity algorithms has to incorporate the entire tensor information. For this reason, the surface of the tensor ellipsoid, with half-axes aligned to the eigenvectors of the tensor and scaled according to the respective eigenvalues $\lambda_i$, $i = 1, 2, 3$, is commonly used as probability profile [5,6,7]. Thereby, the probability of a fiber following a certain direction corresponds to the distance between the center of the ellipsoid, located at $(0, 0, 0)^{\top}$, and the intersection point $r$ on the surface of the ellipsoid. To obtain probabilities between 0 and 1, the tensor ellipsoid is normalized using $\lambda_1$, resulting in a maximum length of 1 for any segment connecting the center of the ellipsoid with its surface. As a result, the diffusion probability $p_i(r)$ for any direction can be directly obtained from the profile of the normalized ellipsoid:

$$p_i(r) = \frac{r}{\lambda_1}. \qquad (1)$$

However, this probability profile results in a bias towards spherical ellipsoids, since they are traversed more easily due to a high probability for all directions. In [5], this is addressed by incorporating fractional anisotropy (FA) [10] into the cost function. In this work, we propose to model the anisotropic characteristic of a tensor by subtracting the isotropic part, represented by $\lambda_3$, before normalization:

$$p_i(r) = \frac{r - \lambda_3}{\lambda_1}. \qquad (2)$$
This is motivated by considering the probability profile of the tensor. In Figure 2, the probability profiles of a linear and a spherical tensor are plotted as functions of the azimuthal and polar angle of the corresponding ellipsoid. In the case of Equation 1 (left plot in each section), the almost spherical tensor yields very high probabilities for all directions, resulting in a bias towards isotropic tensors. This is circumvented by the probability profile resulting from Equation 2 (right plot in each section), which maintains the shape of the probability profile but cuts off the isotropic fraction. In this way, the probability function is solely based on the tensor probability profile, and no additional term in the cost function is necessary. The resulting cost function is thus defined as $c_i(r) = 1 - p_i(r)$, with $p_i(r)$ derived from Equation 2. Based on this approach, a more comprehensive probability profile is provided which better captures the tensor properties, making the incorporation of FA into the cost function redundant.

Fig. 2. Tensor probability profiles (z-axis) of a linear and an almost spherical tensor for Eq. 1 (left in each section) and Eq. 2 (right in each section), plotted as a function of the azimuthal and polar angle (x- and y-axis) of the corresponding ellipsoid.
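For a single step direction, this cost can be evaluated directly from the eigen-decomposition of the local tensor. The sketch below computes the radius of the tensor ellipsoid along the step direction and applies Equation 2; it is an illustrative reading of the formula, not the authors' code.

```python
import numpy as np

def local_cost(tensor, direction):
    """Local cost c_i(r) = 1 - p_i(r), with p_i(r) from Equation 2."""
    lam, vec = np.linalg.eigh(tensor)     # eigenvalues in ascending order
    lam, vec = lam[::-1], vec[:, ::-1]    # lambda_1 >= lambda_2 >= lambda_3
    u = np.asarray(direction, float)
    u /= np.linalg.norm(u)
    # radius of the ellipsoid with half-axes lambda_i along direction u
    r = 1.0 / np.sqrt(np.sum((vec.T @ u) ** 2 / lam ** 2))
    p = (r - lam[2]) / lam[0]             # normalized anisotropic profile
    return 1.0 - p
```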
2.4 Minimum Cost Path

Based on the cost function, minimum-cost connections representing neuronal structures are derived. For this purpose, minimum-cost paths are determined by summing up all local costs $c_j$ encountered along the path to node $i$, resulting in a global cost $g_i$ for the whole path [5,6,7]:

$$g_i = \sum_{j=1}^{i} c_j \rightarrow \min. \qquad (3)$$
In this way, connections are derived that fulfill the global optimality condition. Thereby, the pathfinding algorithm preferably propagates paths with low global cost by comparing the cost of all nodes on the open list. As a result, the computed connections between all voxels of the start region and the target region are guaranteed to be optimal.

2.5 Efficient Search

To ensure that the search is oriented towards the target, the A* algorithm takes advantage of an estimate $h_i$ of the remaining cost (heuristic). If no heuristic is used, the algorithm expands equally in all directions, resulting in a larger search space and increased computational cost. If $h_i$ is admissible, i.e., it never overestimates the cost to the target, then A* is guaranteed to find the path with lowest cost. For this purpose, the estimate $h_i$ is employed in the cost function $f_i = g_i + h_i$ to direct the search towards the target:

$$h_i = \frac{d_i}{s_{max}} \cdot \hat{c}. \qquad (4)$$

Thereby, $d_i$ denotes the Euclidean distance of node $i$ to the target, which is normalized by the maximum step length $s_{max}$ within the grid. The minimum number of steps to the target, $d_i / s_{max}$, is multiplied by the estimated average cost per step $\hat{c}$, which is determined by sampling the data for short connections in regions with high FA; the smallest cost among the samples is assigned to $\hat{c}$. This additional term of the cost function $f_i$ directs the search towards the target. According to our observations, the heuristic is empirically admissible, since the resulting paths did not differ from paths computed without the heuristic. The computing time for the search was reduced considerably, as outlined in more detail in Section 3.
3 Results and Discussion

For evaluation purposes, two volunteer and three patient DTI datasets (voxel size: 1.875 × 1.875 × 1.9 mm³, 128 × 128 × 60 voxels) were acquired with a Siemens Sonata
1.5 Tesla scanner. For all datasets, seed regions corresponding to the Broca's and Wernicke's speech areas, derived from fMRI, were available. All computations were performed on a PC equipped with an Intel Pentium 4, 3.4 GHz, and 2 GB RAM. For pathfinding, we used the 74-neighbor grid (see Section 2.2) with a maximum step length of 1.5 mm. Similarly to [5,6,7], we employed an FA threshold $T_{FA}$ to restrict the search to regions of white matter only. In all our experiments, $T_{FA}$ was set to 0.3, excluding nodes with an FA value below the threshold from further processing. Our evaluation investigates the quality of the obtained paths, analyzes the computing time, and draws a comparison with other techniques.

Quality Analysis. To investigate the accuracy of the proposed cost function (Equation 2) in comparison to a cost function based on the product of the normalized probability profile (Equation 1) and the local FA value [5], we employed the validity index introduced by Jackowski [5]. The validity index computes the scalar product between the path tangent and the major eigenvector for each segment and returns the average value for the whole path. Accordingly, we also recorded the average probability according to the probability profile of Equation 2 and the average FA value for each path. Table 1 shows the minimum, maximum and average values for the connections derived between the speech areas in two of our datasets. As a result, our cost function achieves better results than the normalized tensor profile commonly employed [5,6,7]. In addition, our approach reaches equal accuracy with respect to the validity index compared to wavefront evolution [5].

Table 1. Evaluation based on validity index, probability profile and FA, represented by the average, minimum and maximum value over all fibers

Cost function: Equation 1, FA        Patient 1            Patient 2
Number of fibers                     17                   21
                                     avg   min   max      avg   min   max
Validity Index                       0.75  0.69  0.87     0.83  0.82  0.86
Probability Profile                  0.68  0.65  0.70     0.63  0.62  0.64
Fractional Anisotropy                0.58  0.54  0.61     0.56  0.55  0.57

Cost function: Equation 2            Patient 1            Patient 2
Number of fibers                     19                   23
                                     avg   min   max      avg   min   max
Validity Index                       0.77  0.70  0.88     0.86  0.85  0.87
Probability Profile                  0.75  0.73  0.82     0.70  0.69  0.70
Fractional Anisotropy                0.61  0.57  0.68     0.56  0.55  0.57
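The validity index can be computed per path as sketched below, assuming the major eigenvector of the tensor field has been sampled at each path segment; the names are illustrative.

```python
import numpy as np

def validity_index(points, major_eigvecs):
    """Average scalar product between path tangents and the major
    eigenvector, following the validity index of [5].

    points        : (n, 3) node positions along the path
    major_eigvecs : (n-1, 3) unit major eigenvectors, one per segment
    """
    seg = np.diff(points, axis=0)
    tangents = seg / np.linalg.norm(seg, axis=1, keepdims=True)
    # fiber orientation has no sign, so use the absolute dot product
    dots = np.abs(np.einsum('ij,ij->i', tangents, major_eigvecs))
    return dots.mean()
```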
For illustration, the local probability according to Equation 2 is visualized at each step by color encoding, assigning red to a low and green to a high value of the probability profile. In Figure 3, color coding is used to compare standard fiber tracking based on streamline propagation [1] (RK-4 integration, step size 0.5 mm) and pathfinding with regard to their exactness. The red segments encountered in the case of fiber tracking indicate that spherical tensors were crossed. In contrast, a more reliable path was obtained using pathfinding, which takes into account the entire probability profile of the local tensor, resulting in paths through anisotropic tensors.

Computing Time. Since the algorithm aims at clinical application requiring fast interaction times, computational cost is of major concern in addition to accuracy (see Table 2).
Fig. 3. Upper row (Patient 1): patient with a cavernoma. Lower row (Patient 2): patient with a glioblastoma multiforme (WHO grade IV) with speech dominance in the right hemisphere. The respective lesion is shown in red for each patient. Pathfinding (left) vs. fiber tracking (right); the color coding shows that pathfinding provides more precise results.

Table 2. Average computing time per path and expansion of the search grid (number of nodes) for the search based on the global cost $g_i$ (Eq. 3) alone and the speed-up obtained with the heuristic $h_i$

Performance           Computing Time            Number of Grid Nodes
                      Patient 1    Patient 2    Patient 1    Patient 2
$f_i = g_i$           42.0 sec     82.8 sec     240,705      318,163
$f_i = g_i + h_i$     14.9 sec     19.5 sec     106,414      163,586
data structures for the open and closed lists, since they are accessed frequently and must manage a large number of nodes. For this reason, we used a combination of buckets and sorted vectors for the open list and a hash map for the closed list (a sketch of this organization is given at the end of this section). Incorporation of the heuristic resulted in a speed-up of approximately 70% compared to pathfinding using the cost function alone. In each case, the resulting paths remained the same, since the heuristic is admissible according to empirical observations.
Comparison with other Approaches. As outlined in Figure 3, standard fiber tracking is inappropriate for connectivity analysis in subcortical areas, requiring alternative approaches. Since neuronal connections are assumed to be kept optimal [11], minimum-cost approaches have been developed to model connectivity [5,6,7]. In comparison to the normalized tensor profiles employed in recent work [5,6,7], connectivity results were significantly improved using our novel cost function characterizing both shape and anisotropy of the local tensor. From the algorithmic point of view, pathfinding is computationally more efficient than other graph-based techniques such as [6], since it can be proven that no other search algorithm that is guaranteed to find the minimum-cost path requires less computational expense than A* [9]. Apart from that, the presented grid structure provides high resolution and a considerably increased number of neighbor nodes to sample the search space very densely, which is superior to the grid used in [6]. Overall, the presented approach provides comparable or even better accuracy and is, at the same time, considerably faster than other current approaches.
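A minimal sketch of the open/closed-list organization discussed above. The paper uses buckets plus sorted vectors for the open list; this sketch substitutes a standard binary heap, and edge_cost (the local cost from Eq. 2) and heuristic (an admissible h) are assumed, hypothetical callables:

```python
import heapq

def astar(start, goal, neighbors, edge_cost, heuristic):
    open_list = [(heuristic(start), 0.0, start)]   # open list: (f = g + h, g, node)
    best_g = {start: 0.0}                          # hash map: cheapest g found so far
    parent = {start: None}
    closed = set()                                 # hash-map-backed closed list
    while open_list:
        f, g, node = heapq.heappop(open_list)
        if node == goal:                           # reconstruct the minimum-cost path
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        if node in closed:
            continue
        closed.add(node)
        for nb in neighbors(node):                 # e.g. the 74-neighbor grid
            g_nb = g + edge_cost(node, nb)
            if nb not in closed and g_nb < best_g.get(nb, float("inf")):
                best_g[nb] = g_nb
                parent[nb] = node
                heapq.heappush(open_list, (g_nb + heuristic(nb), g_nb, nb))
    return None                                    # goal not reachable
```

With an admissible heuristic the first expansion of the goal is optimal, which matches the observation above that the heuristic left the resulting paths unchanged.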
4 Conclusion and Future Work
Based on previous work introducing pathfinding for the problem of neuronal connectivity within the human brain, we presented an improved cost function, a high resolution grid for sampling the search space, and an efficient implementation enabling interactive application. Accurate paths according to different quality measures were obtained. The approach has several advantages over existing methods, such as highly efficient processing, which is important for clinical application. Since the quality of the obtained connections would further benefit from tensor field regularization or higher-order tensor representations derived from high angular resolution diffusion images, future work will aim at incorporating these techniques.
Acknowledgments This work was supported by the Deutsche Forschungsgemeinschaft in the context of SFB 603, Project C9. We are especially grateful to Peter Grummich (Dept. of Neurosurgery, University of Erlangen-Nuremberg, Germany) for speech fMRI.
References
1. Mori, S., Crain, B., Chacko, V., van Zijl, P.: Three-dimensional tracking of axonal projections in the brain by magnetic resonance imaging. Ann Neurol 45 (1999) 265–269
2. Behrens, T., Johansen-Berg, H., Woolrich, M., Smith, S., Wheeler-Kingshott, C., Boulby, P., Barker, G., Sillery, E., Sheehan, K., Ciccarelli, O., Thompson, A., Brady, J., Matthews, P.: Non-invasive mapping of connections between human thalamus and cortex using diffusion imaging. Nat Neurosci 6 (2003) 750–757
3. Björnemo, M., Brun, A., Kikinis, R., Westin, C.F.: Regularized stochastic white matter tractography using diffusion tensor MRI. In: MICCAI. (2002) 435–442
4. O'Donnell, L., Haker, S., Westin, C.F.: New approaches to estimation of white matter connectivity in diffusion tensor MRI: Elliptic PDEs and geodesics in a tensor-warped space. In: MICCAI. (2002) 459–466
5. Jackowski, M., Kao, C., Qiu, M., Constable, R., Staib, L.: White matter tractography by anisotropic wavefront evolution and diffusion tensor imaging. Med Image Anal 9 (2005) 427–440
6. Fout, N., Huang, J., Ding, Z.: Visualization of neuronal fiber connections from DT-MRI with global optimization. In: ACM Symposium on Applied Computing (SAC). (2005) 1200–1206
7. Merhof, D., Enders, F., Hastreiter, P., Ganslandt, O., Fahlbusch, R., Nimsky, C., Stamminger, M.: Neuronal fiber connections based on A*-pathfinding. In: SPIE Medical Imaging. (2006)
8. Hart, P., Nilsson, N., Raphael, B.: A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics 4 (1968) 100–107
9. Dechter, R., Pearl, J.: Generalized best-first search strategies and the optimality of A*. Journal of the ACM 32 (1985) 505–536
10. Basser, P., Pierpaoli, C.: Microstructural and physiological features of tissues elucidated by quantitative diffusion tensor MRI. J Magn Reson B 111 (1996) 209–219
11. Van Essen, D.: A tension-based theory of morphogenesis and compact wiring in the central nervous system. Nature 385 (1997) 313–318
Riemannian Graph Diffusion for DT-MRI Regularization Fan Zhang and Edwin R. Hancock Department of Computer Science, University of York, York, YO10 5DD, UK
Abstract. A new method for diffusion tensor MRI (DT-MRI) regularization is presented that relies on graph diffusion. We represent a DT image using a weighted graph, where the weights of edges are functions of the geodesic distances between tensors. Diffusion across this graph with time is captured by the heat-equation, and the solution, i.e. the heat kernel, is found by exponentiating the Laplacian eigen-system with time. Tensor regularization is accomplished by computing the Riemannian weighted mean using the heat kernel as its weights. The method can efficiently remove noise, while preserving the fine details of images. Experiments on synthetic and real-world datasets illustrate the effectiveness of the method.
1 Introduction
Diffusion tensor MRI is an important tool for non-invasive exploration of the anatomic structure of the white matter in vivo. It endows each voxel with a 3 × 3 symmetric positive-definite matrix, which measures the anisotropic behavior of water diffusion in white matter. Due to the motion of the ventricles and the partial volume effect, DT images are often noisy. Thus, regularization is necessary before fiber tracking can commence. Because of the high dimensionality of the tensor data and constraints imposed by the curved feature space of tensors, traditional scalar image smoothing techniques are often no longer applicable. Generally, tensor field regularization can be realized in one of three ways: smoothing the whole tensors [1,2], separately smoothing the eigenvectors and eigenvalues of tensors [3,4,5], or directly smoothing the diffusion weighted images before estimation of the tensor [6]. The aim of this paper is to show how the heat kernel can be used to develop a graph-spectral method for tensor field regularization by working on the Riemannian feature space of tensors. Unlike prior literature on image smoothing using continuous partial differential equations (PDEs), our method is based on the heat equation for discrete structures [7]. We represent tensor-valued images as weighted undirected graphs, where the edge weights are functions of geodesic distances between tensors. Diffusion across this weighted graph with time is captured by the heat equation, and the solution, i.e. the heat kernel, is found by exponentiating the Laplacian eigen-system with time. Diffusion tensor image regularization is accomplished by convolving the heat kernel with the image,
which is equal to the Riemannian weighted mean due to the fact that tensors are points on a Riemannian manifold [8,9,2]. The next section of this paper summarises the Riemannian space of the tensors, and defines the geodesic distance and Riemannian weighted mean on it, which are necessary for the main algorithm. Section 3 proposes the new graph diffusion-based method for tensor field regularization. Experiments on synthetic and real DT-MRI data are shown in Section 4. Section 5 concludes the paper.
2 Riemannian Space of Tensors
Let Σ(r) be the set of r × r real matrices. Recall that in Σ(r) the Euclidean inner product, known as the Frobenius inner product, is defined as ⟨A, B⟩_F = tr(A^T B), where tr(·) denotes the trace and the superscript T the transpose. For a matrix A whose eigen-decomposition is A = UΛU^T, the exponential of A is given by exp A = Σ_{k=0}^∞ (1/k!) A^k = U exp(Λ) U^T, and its inverse, the logarithm of A, is given by log A = −Σ_{k=1}^∞ (I − A)^k / k = U log(Λ) U^T, where I is the identity matrix. Let S(r) be the space of r × r symmetric matrices and S⁺(r) the space of symmetric positive-definite matrices. Thus, the space M of diffusion tensors is identified with S⁺(3). Through the identity mapping ψ : P ∈ S⁺(r) → (σ_11, ..., σ_ij), i ≤ j, S⁺(r) is isomorphic with an open subset Ω of R^m, where m = r(r + 1)/2. Thus we can consider S⁺(r) as an m-dimensional differential manifold with (Ω, ψ) as the coordinate system. At each point P ∈ S⁺(r) the tangent space T_P S⁺(r) is equal to S(r). So a basis of T_P S⁺(r) can be defined as ∂/∂σ_ij ↔ Π_ij ∈ S(r), i ≤ j, where Π_ij = 1_ii if i = j, Π_ij = 1_ij + 1_ji if i ≠ j, and 1_ij denotes the r × r matrix with 1 at element (i, j) and 0 elsewhere. We can turn S⁺(r) into a Riemannian manifold by introducing a Riemannian metric g at P [10], i.e. g(∂/∂σ_ij, ∂/∂σ_kl) = g(Π_ij, Π_kl) = tr(P⁻¹ Π_ij P⁻¹ Π_kl). This is the same as the affine-invariant inner product used by [8,11,2], i.e. ⟨A, B⟩_P = tr(P⁻¹ A P⁻¹ B), A, B ∈ T_P S⁺(r). Thus, for a smooth curve C : [a, b] → S⁺(r), the length of C(t) can be computed via the invariant metric, i.e.

ℓ(C) = ∫_a^b ||C′(t)||_{C(t)} dt = ∫_a^b √( tr((C(t)⁻¹ C′(t))²) ) dt.

The distance between two points A, B ∈ S⁺(r) is the infimum of the lengths of curves connecting them, i.e. d(A, B) := inf_C { ℓ(C) | C(a) = A, C(b) = B }. A curve attaining this infimum is a geodesic. In S⁺(r) the geodesic with initial point at the identity matrix I and tangent vector X ∈ T_I S⁺(r) is given by exp(tX). Using invariance under group actions [11], an arbitrary geodesic Γ(t) such that Γ(0) = P and Γ′(0) = X is given by Γ_(P,X)(t) = P^{1/2} exp(t P^{−1/2} X P^{−1/2}) P^{1/2}. Thus, the geodesic distance between two points A and B in S⁺(r) is

d_g(A, B) = ||log(A⁻¹B)||_F = √( Σ_{i=1}^n (log λ_i)² ),   (1)

where λ_i are the eigenvalues of A⁻¹B.
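A minimal sketch of Eq. 1, exploiting the fact that the eigenvalues of A⁻¹B coincide with those of the generalized eigenproblem B v = λ A v, which scipy solves in a numerically stable, symmetric way:

```python
import numpy as np
from scipy.linalg import eigh

def geodesic_distance(A, B):
    """d_g(A, B) = ||log(A^{-1} B)||_F for symmetric positive-definite A, B."""
    lam = eigh(B, A, eigvals_only=True)   # eigenvalues of A^{-1} B
    return np.sqrt(np.sum(np.log(lam) ** 2))
```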
We can relate an open subset of the tangent space T_P S⁺(r) with a local neighborhood of P in S⁺(r) using the exponential map Exp : Ω ⊂ T_P S⁺(r) → S⁺(r), which is defined as Exp_P(X) = Γ_(P,X)(1). Geometrically, Exp_P(X) is the point of S⁺(r) obtained by marking out a length equal to |X|, commencing from P, along the geodesic which passes through P with velocity equal to X/|X|. From the geodesic equation above, it follows that

Exp_P(X) = P^{1/2} exp(P^{−1/2} X P^{−1/2}) P^{1/2}.   (2)
Since Exp_P is a local diffeomorphism, it has an inverse map, the so-called logarithmic map Log_P : S⁺(r) → B_ε(0) ⊂ T_P S⁺(r), where Log_P(Γ_(P,X)(t)) = tX. Thus, for a point A near P it follows that

Log_P(A) = P^{1/2} log(P^{−1/2} A P^{−1/2}) P^{1/2}.   (3)

2.1 Riemannian Weighted Mean
Consider a set of n points (or tensors) p_1, ..., p_n residing on the Riemannian manifold M = S⁺(3) of tensors, with corresponding weights a_1, ..., a_n ∈ R such that Σ_{k=1}^n a_k = 1. An efficient way of characterising the points is to compute their weighted mean. If the points are in Σ(3), recall that the arithmetic (Euclidean) weighted mean q can be defined as the linear combination q = Σ_{k=1}^n a_k p_k. From a variational standpoint, the Euclidean weighted mean minimizes the sum of the weighted squared distances f(q) to the given points p_k, i.e. f(q) = ½ Σ_{k=1}^n a_k d_e(q, p_k)², where d_e(q, p_k) = ||q − p_k|| is the Euclidean distance between q and p_k. To generalise this least squares minimization to the manifold M, one intuitive way is to replace d_e(q, p_k) by the shortest geodesic distance d_g(q, p_k) on M between the mean q and the point p_k. Thus, we can define the Riemannian weighted mean as the point q that minimizes the sum of weighted squared geodesic distances f(q) = ½ Σ_{k=1}^n a_k d_g(q, p_k)². The mean q is a critical point of f, i.e. the location where the directional first derivatives of f are all zero. To locate the minimum of f, one widely used and easy to implement numerical method is steepest gradient descent [8,9,2]. At a point q ∈ M, the gradient of f with respect to q is ∇f(q) = −Σ_{k=1}^n a_k Log_q(p_k). The minimum of f is located at the critical point q with ∇f(q) = 0, where the Euclidean weighted mean of the tangent vectors Log_q(p_k) ∈ T_q M is zero. The iteration of the classical gradient descent method can be stated as q^{t+1} = q^t − ∇f, which moves the estimate in the direction of steepest descent. Thus, the intrinsic gradient descent algorithm on M gives the following iteration scheme for estimating the Riemannian weighted mean of the tensor set p_1, ..., p_n:

q^{t+1} = Exp_{q^t}( Σ_{k=1}^n a_k Log_{q^t}(p_k) ),   (4)

where the Exp and Log operators are defined in Eq. 2 and Eq. 3 respectively.
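A minimal sketch of the intrinsic gradient descent of Eq. 4, implementing the Exp and Log maps of Eq. 2 and Eq. 3 through eigendecompositions; SPD inputs and weights summing to one are assumed:

```python
import numpy as np

def _sqrtm_spd(P):
    """Matrix square root of an SPD matrix and its inverse."""
    w, V = np.linalg.eigh(P)
    s = V @ np.diag(np.sqrt(w)) @ V.T
    return s, np.linalg.inv(s)

def exp_map(P, X):
    """Eq. 2: Exp_P(X) = P^{1/2} exp(P^{-1/2} X P^{-1/2}) P^{1/2}."""
    s, si = _sqrtm_spd(P)
    w, V = np.linalg.eigh(si @ X @ si)
    return s @ (V @ np.diag(np.exp(w)) @ V.T) @ s

def log_map(P, A):
    """Eq. 3: Log_P(A) = P^{1/2} log(P^{-1/2} A P^{-1/2}) P^{1/2}."""
    s, si = _sqrtm_spd(P)
    w, V = np.linalg.eigh(si @ A @ si)
    return s @ (V @ np.diag(np.log(w)) @ V.T) @ s

def riemannian_weighted_mean(tensors, weights, iters=10):
    """Iterate Eq. 4 from the Euclidean weighted mean as initialization."""
    q = sum(w * p for w, p in zip(weights, tensors))
    for _ in range(iters):
        grad = sum(w * log_map(q, p) for w, p in zip(weights, tensors))
        q = exp_map(q, grad)
    return q
```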
3 Graph Tensor Regularization
Consider the diffusion tensor image under regularization. Following recent work on graph-spectral methods for image segmentation [12], we abstract a tensor-valued image using a weighted graph G = (V, E, w), where the nodes V of the graph are the voxels of the image, and an edge is formed between every pair of nodes. The weight of each edge, w(i, j) ∈ [0, 1], is a function characterizing the relationship between the voxels i and j. Since we wish to adopt a graph-spectral approach, we introduce the weighted adjacency matrix W for the graph, with elements W(i, j) = w(i, j) if (i, j) ∈ E, and W(i, j) = 0 otherwise. We also construct the diagonal degree matrix D, whose elements are given by D(i, i) = deg(i) = Σ_{j∈V} W(i, j). From the degree matrix and the weighted adjacency matrix we construct the Laplacian matrix L = D − W, i.e. the degree matrix minus the weighted adjacency matrix. The spectral decomposition of the Laplacian matrix is L = ΦΛΦ^T, where Λ = diag(λ_1, λ_2, ..., λ_|V|) is the diagonal matrix with the eigenvalues ordered according to increasing magnitude (0 = λ_1 < λ_2 ≤ λ_3 ...) as elements, and Φ = (φ_1|φ_2|...|φ_|V|) is the matrix with the correspondingly ordered eigenvectors as columns. Since L is symmetric and positive semi-definite, the eigenvalues of the Laplacian are all non-negative. The eigenvector φ_2 associated with the smallest non-zero eigenvalue λ_2 is referred to as the Fiedler vector.
3.1 Heat Kernel on Graphs
We here introduce the heat equation associated with the Laplacian [13], i.e.

∂h_t/∂t = −L h_t,   (5)

where h_t is the heat kernel and t is time. The heat kernel satisfies the initial condition h_0 = I_|V|, where I_|V| is the |V| × |V| identity matrix. The heat kernel can hence be viewed as describing the flow of heat across the edges of the graph with time. The rate of flow is determined by the Laplacian of the graph. The solution to the heat equation is found by exponentiating the Laplacian eigen-spectrum, i.e.

h_t = e^{−tL} = Φ e^{−tΛ} Φ^T.   (6)
The heat kernel is a |V| × |V| matrix, and for the nodes i and j of the graph G the resulting element is h_t(i, j) = Σ_{k=1}^{|V|} e^{−λ_k t} φ_k(i) φ_k(j). When t tends to zero, then h_t ≈ I − Lt, i.e. the kernel depends on the local connectivity structure or topology of the graph. If, on the other hand, t is large, then h_t ≈ e^{−tλ_2} φ_2 φ_2^T, where λ_2 is the smallest non-zero eigenvalue and φ_2 is the associated eigenvector, i.e. the Fiedler vector. Hence, the large time behavior is governed by the global structure of the graph.
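A minimal sketch of Eqs. 5-6: build the combinatorial Laplacian from a symmetric weight matrix and exponentiate its eigen-system:

```python
import numpy as np

def heat_kernel(W, t):
    """h_t = Phi exp(-t Lambda) Phi^T for the graph with weight matrix W."""
    D = np.diag(W.sum(axis=1))       # degree matrix
    L = D - W                        # Laplacian; symmetric, positive semi-definite
    lam, Phi = np.linalg.eigh(L)     # eigenvalues are non-negative
    return Phi @ np.diag(np.exp(-t * lam)) @ Phi.T
```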
3.2 Regularization as Graph Diffusion
It is useful to consider the following picture of the heat diffusion process on graphs. Suppose that we inject a unit amount of heat at the node k of a graph, and allow the heat to diffuse through the edges of the graph. The rate of diffusion along the edge E(i, j) is determined by the edge weight w(i, j). At time t, the value of the heat kernel h_t(k, j) can be interpreted as the amount of heat accumulated at node j. We would like to regularise the tensor image using the heat flow on graph G. To do this we inject at each node of G an amount of heat energy equal to the tensor of the associated voxel. The heat at each node diffuses through the graph edges as time t progresses. The edge weight plays the role of thermal conductivity. If two voxels belong to the same region, then the associated edge weight is large. As a result heat can flow easily between them. On the other hand, if two voxels belong to different regions, then the associated edge weight is very small, and hence it is difficult for heat to flow from one region to another. From the viewpoint of geometry, tensors are points of the Riemannian manifold M = S⁺(3). The above process lets the tensor points move according to their local neighbors. We believe that the difference between two tensor points belonging to the same region is a result of noise only; we therefore want them to move toward each other. Contrarily, if two tensor points belong to different regions, we want them to stay where they are. The prior similarity between two voxels of the image is expressed in terms of the edge weight of G, which depends on two factors: 1) the geodesic distances between the tensors, and 2) the Euclidean distances between the locations of the voxels. We encode all the tensors associated with the voxels of the image as a block column vector I_0, whose elements I_0(i) are 3 × 3 tensors. We would like to minimize the influence of one region on another. This behavior is captured by the weight matrix
W(i, j) = exp(−d_g(I_0(i), I_0(j))² / κ_I²) · exp(−d_e(X(i) − X(j))² / κ_X²)  if d_e(X(i) − X(j)) ≤ r,
W(i, j) = 0  otherwise,   (7)
where the symbols used are as follows: X(i) is the location of the voxel i; d_e(X(i) − X(j)) is the Euclidean distance between voxels i and j; d_g(I_0(i), I_0(j)) is the geodesic distance between tensors I_0(i) and I_0(j), i.e. d_g(I_0(i), I_0(j)) = ||log(I_0(i)⁻¹ I_0(j))||_F, as defined in Eq. 1. The values of κ_I and κ_X are usually set to 10%–20% of the total range of the distance functions d_g and d_e. They have the effect of controlling the velocity of the diffusion process. The heat diffusion process described above is governed by the same PDE as Eq. 5; however, the initial conditions are different. Now the initial heat residing at each node is determined by the corresponding tensor. The evolution of the tensors I_0 of the image follows the equation

∂I_t/∂t = −L I_t.   (8)

In fact, if we view each tensor as composed of six independent components, the above diffusion equation comprises a system of six coupled PDEs. The coupling
results from the fact that the edge weight of G depends on all the components of the tensors. The solution of Eq. 8 is I_t = e^{−tL} I_0 = h_t I_0. As a result the smoothed tensor of the voxel j at time t is

I_t(j) = Σ_{i=1}^{|V|} I_0(i) h_t(i, j).   (9)
This is a measure of the total energy that flows from the remaining nodes to node j during the elapsed time t. When t is small, we have I_t ≈ (I − Lt) I_0. Each row i of the heat kernel h_t satisfies the conditions 0 ≤ h_t(i, j) ≤ 1 for all j and Σ_{j=1}^{|V|} h_t(i, j) = 1. Due to the constraints on the tensors I_0(i) imposed by the Riemannian space M = S⁺(3), the smoothed tensors I_t(j) should always stay on the manifold M. According to Section 2.1, I_t(j) in Eq. 9 is the Riemannian weighted mean of the tensor set {I_0(i) | i = 1, ..., |V|}, with the heat kernel as its weight set {h_t(i, j) | i = 1, ..., |V|}. Thus, I_t(j) for voxel j at time t can easily be found by the intrinsic gradient descent stated in Eq. 4, i.e.

I_t(j)^{k+1} = Exp_{I_t(j)^k}( Σ_{i=1}^{|V|} h_t(i, j) Log_{I_t(j)^k}(I_0(i)) ),   (10)
where Exp and Log operators on M are defined in Eq. 2 and Eq. 3 respectively.
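Putting the pieces together, a minimal sketch of Eqs. 7, 9, and 10, reusing the geodesic_distance, heat_kernel, and riemannian_weighted_mean sketches given earlier; the quadratic loop over voxel pairs is illustrative rather than efficient:

```python
import numpy as np

def build_weights(tensors, coords, r, kappa_I, kappa_X):
    """Eq. 7: Gaussian weights on geodesic and spatial distance, truncated at r."""
    n = len(tensors)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d_e = np.linalg.norm(coords[i] - coords[j])
            if d_e <= r:
                d_g = geodesic_distance(tensors[i], tensors[j])
                W[i, j] = W[j, i] = (np.exp(-d_g**2 / kappa_I**2)
                                     * np.exp(-d_e**2 / kappa_X**2))
    return W

def regularize(tensors, coords, t, r, kappa_I, kappa_X):
    """Eqs. 9-10: smooth each voxel's tensor with heat-kernel weights."""
    h = heat_kernel(build_weights(tensors, coords, r, kappa_I, kappa_X), t)
    return [riemannian_weighted_mean(tensors, h[:, j])
            for j in range(len(tensors))]
```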
4 Experiments
We have applied our Riemannian graph-spectral filter to synthetic and real-world tensor-valued images. In the top row of Fig. 1, we generate a tensor field with three different regions, and add IID additive noise to the eigenvectors and eigenvalues of the tensors respectively. As the figure shows, our graph diffusion restores the coherence of the field inside each structure without destroying the interfaces between regions. In order to better evaluate our algorithm on real-world DT-MRI, we generate a synthetic fiber bundle in the bottom row of Fig. 1, which is again corrupted by additive noise. The result shows that our method recovers the fine details of the structure while smoothing out the noise. Since for real-world DT-MR brain images, typically of size 128 × 128 × n (n > 40), the number of image voxels is very large, it is time and space consuming to find the exact solution of our algorithm. We thus introduce the following approximation scheme to simplify the computations involved. Here we make use of the fact that the elements of the heat kernel h_t(i, j) decay exponentially with the path length (the shortest distance) between nodes i and j. As a result, when we compute I_t(j) for node j, we can ignore the effect of node i if the path length between nodes i and j is larger than a given threshold. Hence, we can restrict our attention to voxels that are close to one another. To do this, we use a smaller n_1 × n_2 × n_3 volume window surrounding each voxel to construct a smaller sub-graph. We then use the heat kernel of this smaller sub-graph to calculate the Riemannian weighted mean I_t(j) of the voxel j at time t.
Fig. 1. Left column: noisy synthetic image. Right column: the corresponding regularized image. Both using parameters r = 5, κI = 0.6, κX = 2.
Fig. 2. (a) A slice of DT-MRI. (b) Graph regularized slice using the approximate scheme with window size 10 × 10 × 5, r = 4, κ_I = 0.4, κ_X = 1.7. (c) and (d) are the corresponding fractional anisotropy maps.
Fig. 2 shows a slice of the results of applying our method to a brain DT-MRI dataset of size 128 × 128 × 58. The top row gives the ellipsoid representation of the sample slice and its regularized counterpart. Regions of the same color have homogeneous structure inside; thus, our method can sufficiently smooth each region. The bottom row gives the fractional anisotropy map [14] of the corresponding slice in the top row. The bright regions are potential places where fibers can be tracked. As the figure shows, discontinuities are preserved and the local coherence of the anisotropy map is enhanced after regularization.
5 Conclusions
We presented a graph-spectral method for diffusion tensor image regularization. The idea of our method is to represent DT-MRI using weighted attributed graphs, where diffusion across the graph structure is captured by the heat kernel. DT-MRI regularization is effected by computing the Riemannian weighted mean using the heat kernel as its weights. The method can effectively eliminate noise and preserve the coherence of structures. Results on synthetic and real-world datasets demonstrate the validity of the proposed algorithm.
References
1. Chefd'Hotel, C., Tschumperle, D., Deriche, R., Faugeras, O.: Constrained flows of matrix-valued functions: Application to diffusion tensor regularization. In: Proc. of ECCV. (2002) 251–265
2. Pennec, X., Fillard, P., Ayache, N.: A Riemannian framework for tensor computing. International Journal of Computer Vision 66(1) (2006) 41–66
3. Poupon, C., Mangin, J., Clark, C., Frouin, V., Regis, J., Le Bihan, D., Bloch, I.: Towards inference of human brain connectivity from MR diffusion tensor data. Medical Image Analysis 5 (2001) 1–15
4. Tschumperle, D., Deriche, R.: Diffusion tensor regularization with constraints preservation. In: Proc. of CVPR. (2001) 948–953
5. Coulon, O., Alexander, D., Arridge, S.: Diffusion tensor magnetic resonance image regularization. Medical Image Analysis 8 (2004) 47–67
6. Wang, Z., Vemuri, B., Chen, Y., Mareci, T.: A constrained variational principle for direct estimation and smoothing of the diffusion tensor field from complex DWI. IEEE Trans. on Medical Imaging 23(8) (2004) 930–939
7. Kondor, R., Lafferty, J.: Diffusion kernels on graphs and other discrete structures. In: Proc. of ICML. (2002)
8. Fletcher, P., Joshi, S.: Principal geodesic analysis on symmetric spaces: Statistics of diffusion tensors. In: Proc. of CVAMIA, ECCV Workshop. (2004) 87–98
9. Batchelor, P., Moakher, M., Atkinson, D., Calamante, F., Connelly, A.: A rigorous framework for diffusion tensor calculus. Mag. Res. in Medicine 53 (2005) 221–225
10. Skovgaard, L.: A Riemannian geometry of the multivariate normal model. Scandinavian Jour. of Statistics 11 (1984) 211–223
11. Moakher, M.: A differential geometric approach to the geometric mean of symmetric positive-definite matrices. SIAM J. Mat. Anal. and App. 26 (2005) 735–747
12. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(8) (2000) 888–905
13. Chung, F.: Spectral Graph Theory. American Mathematical Society (1997)
14. Westin, C., Maier, S., Mamata, H., Nabavi, A., Jolesz, F., Kikinis, R.: Processing and visualization for diffusion tensor MRI. Medical Image Analysis 6 (2002) 93–108
High-Dimensional White Matter Atlas Generation and Group Analysis
Lauren O'Donnell^{1,2} and Carl-Fredrik Westin^{1,3}
1 Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge MA, USA
2 Harvard-MIT Division of Health Sciences and Technology, Cambridge MA, USA
3 Laboratory for Mathematics in Imaging, Brigham and Women's Hospital, Harvard Medical School, Boston MA, USA
[email protected]
Abstract. We present a two-step process including white matter atlas generation and automatic segmentation. Our atlas generation method is based on population fiber clustering. We produce an atlas which contains high-dimensional descriptors of fiber bundles as well as anatomical label information. We use the atlas to automatically segment tractography in the white matter of novel subjects and we present quantitative results (FA measurements) in segmented white matter regions from a small population. We demonstrate reproducibility of these measurements across scans. In addition, we introduce the idea of using clustering for automatic matching of anatomical structures across hemispheres.
1 Introduction
The use of diffusion MRI tractography [1] to select regions of interest (ROIs) in cerebral white matter has recently become a popular technique. The regions of interest, thought to correspond to particular anatomical white matter tracts, have been employed in quantitative analysis of scalar measures derived from the diffusion tensor, such as anisotropy values or mean diffusivities [2,3,4]. Ideally, in order to perform neuroscientific studies using white matter regions of interest, these regions should be automatically identified, correspond across subjects, and have anatomical labels. Our work addresses these three goals. Related work on tractography segmentation includes multiple methods for calculation of tract-specific regions. One class of methods uses trajectories generated via tractography and groups them into regions either interactively or automatically. Manual interactive grouping of trajectories using multiple selection regions of interest (also known as "virtual dissection" [5]) has been performed to create a fiber tract atlas [6] and in several clinical studies [2,3,4]. In fact, Partridge et al. found tractography-based definitions of a pyramidal tract ROI to be more reproducible than manual ROI drawing [4]. Automated trajectory grouping using clustering algorithms has been proposed by Brun et al. [7], Gerig et al. [8], O'Donnell et al. [9], and Corouge et al. [10]. Automated trajectory grouping via atlas-based labeling of tractography was described by Maddah et al. [11], who manually created a tractography atlas and gave a method for transferring its labels to a novel subject. A second class of methods for generation of tract-specific ROIs (including approaches by Behrens et al. [12] and Parker et al. [13]) defines regions based on probabilistic connectivity measures. We present a two-step method which automatically finds white matter structures present in a group of tractography datasets (atlas generation), and has a natural extension to find those structures in new datasets (automatic segmentation). The information that allows labeling of a new dataset is a high-dimensional representation of white matter structure that we call an atlas. However, it differs from traditional digital (voxel-based) atlases because it represents long-range connections from tractography rather than local voxel-scale information.
We acknowledge the following grants: NIH NIBIB NAMIC U54-EB005149, NIH NCRR NAC P41-RR13218, NIH NCRR mBIRN U24-RR021382, and R01 MH 50747. Thank you to Susumu Mori at JHU for the diffusion MRI data (RO1 AG20012-01 / P41 RR15241-01A1). Thanks to Lilla Zollei for the congealing registration code and to Alexandra Golby, Tomas Izo, and Gordon Kindlmann for helpful conversations about this project.
2 Methods
2.1 Population Data
Diffusion tensor MRI (single-shot spin echo EPI diffusion-weighted images, from a 1.5 T Philips scanner with SENSE parallel imaging, acquired along 30 non-collinear gradient directions (b = 700 s/mm²), with five non-diffusion-weighted T2 images, 2.5 × 2.5 × 2.5 mm voxels, and 50–60 slices per subject covering the entire hemispheres and the cerebellum) was analyzed for 15 subjects. Tractography was performed in each subject using Runge-Kutta order two integration, with the following parameters: seeding threshold of c_L = 0.25, stopping threshold of c_L = 0.15, step size 0.5 mm, and minimum total length 25 mm. The quantity c_L is defined as

c_L = (λ_1 − λ_2) / √(λ_1² + λ_2² + λ_3²),

where λ_1, λ_2, and λ_3 are the eigenvalues of the diffusion tensor sorted in descending order [14]. Group registration of subject FA (fractional anisotropy [15]) images was performed using the congealing algorithm [16] to calculate rotation, translation, and scaling (no shear terms). This registration was then applied to the trajectories generated via tractography.
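A minimal sketch of the c_L computation, assuming a 3 × 3 diffusion tensor given as a numpy array:

```python
import numpy as np

def c_linear(tensor):
    """Westin's linear anisotropy measure from sorted tensor eigenvalues."""
    lam = np.sort(np.linalg.eigvalsh(tensor))[::-1]   # lambda_1 >= lambda_2 >= lambda_3
    return (lam[0] - lam[1]) / np.sqrt(np.sum(lam ** 2))
```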
2.2 Step One: Atlas Generation
To find common white matter structures in the population, we used our framework [9] for simultaneous clustering and matching of bundles across subjects. To complete the atlas, each cluster was given an expert anatomical label. Similarity of Fiber Trajectories. Tractography clustering methods assume that trajectories that begin near each other, follow similar paths, and end near each other should belong to the same anatomical structure. Various affinity measures have been proposed in the literature to quantify this assumption [17,7,8,10].
In this study we employed the mean closest point distance [10,18], which is defined as the average distance from each point on one trajectory to the nearest point on another trajectory. We symmetrized this distance by taking the minimum of the two possible distances. Each distance (d_ij) was converted to an affinity (a_ij) via a Gaussian kernel, a_ij = e^{−d_ij²/σ²}, with σ of 30 mm. The clustering and automatic segmentation are insensitive to small registration errors or anatomical differences due to the capture range of the similarity measure.
Similarity Across Hemispheres. To facilitate visual and quantitative comparison of anatomical structures which are present bilaterally, it is useful for the clusters (and their colors for visualization based on spectral embedding coordinates [17]) to correspond across hemispheres. To accomplish this goal, before computing the similarity metric we reflected one side of each brain across the midsagittal plane, such that trajectories with similar shapes and locations in either hemisphere would cluster together, automatically giving anatomical correspondences. (We defined the midsagittal plane as the midsagittal plane of the average group registered FA image.) This method gave better separation of some anatomical structures, for example the inferior parts of the cingulum from the inferior parts of the fornix. The improvement in clustering is due to the fact that reflecting across the midsagittal plane effectively doubles the number of prototype brain examples we have as input. Our atlas creation method is not dependent upon this reflection approach; however, the bilateral matching is a useful additional property which we can obtain. We note that the success of the reflection approach would decrease in subjects with midline shift.
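A minimal sketch of the symmetrized mean closest point distance and the Gaussian affinity, assuming trajectories given as N × 3 arrays of points:

```python
import numpy as np
from scipy.spatial.distance import cdist

def mean_closest_point(a, b):
    """Symmetrized mean closest point distance between two trajectories."""
    d = cdist(a, b)                    # all pairwise point distances
    one_way = d.min(axis=1).mean()     # mean over a of the closest point in b
    other_way = d.min(axis=0).mean()
    return min(one_way, other_way)     # symmetrize by taking the minimum

def affinity(a, b, sigma=30.0):        # sigma in mm, as in the text
    return np.exp(-mean_closest_point(a, b) ** 2 / sigma ** 2)
```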
Clustering Algorithm. As in [9], to find common anatomical structures in the population using the calculated affinities, we performed Normalized Cuts spectral clustering [19,20], a method whose application to tractography was first proposed by Brun et al. [7]. Spectral clustering methods employ eigenvectors of an affinity matrix to group data. We used the Nystrom method [20] to extend the eigenvector solution from a random subproblem (the normalized affinity matrix from a random sample of trajectories) to estimate a solution to the larger problem (eigenvectors for the whole population). The Nystrom method was used to estimate the leading eigenvectors of the normalized population affinity matrix

D^{−1/2} W D^{−1/2},   (1)
where the symmetric matrix W contained affinities from all pairs of trajectories, and D was a diagonal matrix containing the row sums of W. To estimate these eigenvectors, first a random sample of trajectories was compared to all other trajectories to generate a partial population affinity matrix with parts A (the square affinity matrix from the random subproblem) and B (the affinities between the random sample and all other trajectories). See Figure 1. The normalization of the A and B matrices by D was then estimated as [20]:

d̂ = [ a_r + b_r ; b_c + B^T A^{−1} b_r ],   (2)
Fig. 1. Diagram of the parts of the affinity (tract similarity) matrix. A and B are used in atlas construction and S contains affinities for embedding a new subject.
where a_r and b_r are column vectors containing the row sums of A and B, and b_c is the column sum of B. The eigenvectors U and diagonal eigenvalue matrix Λ of the normalized matrix A were calculated, and finally the population eigenvectors Ū were estimated via projection of normalized affinity values in B onto the eigenvector basis from A. Ū was estimated via the following formula [20]:

Ū = [ U ; B^T U Λ^{−1} ].   (3)

Spectral embedding vectors were then calculated for each trajectory as E_j = (1/√D_jj) (Ū_{j,2}, Ū_{j,3}, ..., Ū_{j,n}). This generated a coordinate system, the spectral
embedding space, where each trajectory was represented as a point, and similar trajectories were embedded near each other. K-means clustering was then performed on these embedding vectors, giving k clusters.
Using Expert Knowledge to Label Atlas Clusters. This step introduced high-level anatomical knowledge to label the common white matter structures discovered by the clustering step. The k clusters corresponded across subjects, in the sense that cluster number i represented approximately the same region for each. So providing higher-level anatomical information was reduced to the problem of defining k labels. We interactively labeled one subject, transferred the labels to the next, and worked through each atlas subject in this manner, ensuring that at the end all clusters had a high-level anatomical description. Due to the fact that tractography may cross from one anatomical structure to another, these anatomical labels represented the best approximate description of the regions discovered in atlas creation.
Atlas Contents. The atlas consisted of a set of cluster centroids in the embedding space, plus information for mapping new trajectories into this space. The necessary information was the random sample of input trajectories used in the affinity calculations, plus (as formulas 2 and 3 show) the basis vectors in UΛ^{−1}, the matrix A^{−1}b_r for estimating the row sums, and the row (also column) sum a_r + b_r. In addition the atlas contained per-cluster anatomical labels.
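A minimal sketch of Eqs. 1-3, assuming dense numpy matrices A (m × m, the sample affinities) and B (m × (n − m), sample-versus-rest affinities); the real system clustered 30,000 trajectories, so memory layout matters in practice:

```python
import numpy as np

def nystrom_embedding(A, B, n_eig=10):
    """Approximate spectral embedding vectors via the Nystrom method [20]."""
    m = A.shape[0]
    a_r, b_r, b_c = A.sum(axis=1), B.sum(axis=1), B.sum(axis=0)
    d = np.concatenate([a_r + b_r,
                        b_c + B.T @ np.linalg.solve(A, b_r)])   # Eq. 2
    s = 1.0 / np.sqrt(d)
    A_n = A * np.outer(s[:m], s[:m])          # D^{-1/2} W D^{-1/2}, sample block
    B_n = B * np.outer(s[:m], s[m:])
    lam, U = np.linalg.eigh(A_n)
    lam, U = lam[::-1][:n_eig], U[:, ::-1][:, :n_eig]           # leading eigenpairs
    U_bar = np.vstack([U, B_n.T @ U @ np.diag(1.0 / lam)])      # Eq. 3
    E = U_bar / np.sqrt(d)[:, None]           # rows are per-trajectory E_j
    return E[:, 1:]                           # drop the constant first eigenvector
```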
2.3 Step Two: Automatic Segmentation
By embedding new trajectories as points in the atlas embedding space, we were able to automatically segment tractography from novel subjects. First, affinity values were calculated by comparing the new subject's trajectories to the random sample of trajectories which was saved as part of the atlas. As shown in Figure 1, this generated a new partial affinity matrix S, which then needed to be normalized using row and column sums as in (2). To keep the scaling of the old and new embedding vectors consistent, we estimated the row sum as

d̂_row = s_c + S^T A^{−1} b_r,   (4)

where s_c is the column sum of S. Then we employed the column sum from the original matrix,

d̂_col = a_r + b_r.   (5)

Performing the scaling in this way made sense for two reasons. First, if we were to re-embed a trajectory that we had already seen (whose information was in A or B), it would be mapped to the same location in the embedding space. Second, we would expect that each individual new trajectory would not significantly change the column sum d̂_col of the (entire W) matrix, due to the fact that 30,000 trajectories were used in creation of the original atlas affinity matrices. Thus the scaling applied to a novel trajectory was basically the same as that which would have been applied if it were part of the original clustering problem. After normalization of the S matrix, the embedding vectors were calculated as in (3) for the B matrix. This step successfully embedded new trajectories as points in the clustered and labeled atlas embedding space. To create the automatic tractography segmentation, each new subject's trajectories were labeled according to the nearest cluster centroid.
2.4 Parameters in Atlas Creation and Labeling
To form the atlas, we used tractography from subjects 1 through 10 (of 15) for the clustering and anatomical labeling. 3,000 trajectories were randomly selected from each subject as input to the clustering, giving 30,000 total trajectories to cluster. The size of the random sample used to create the A matrix was 2,500, and k = 200 clusters were generated. Finally, we labeled 10,000 trajectories from each of the 15 subjects using the atlas. These 10,000 trajectories were then used for measurement of FA and for creation of the images in this paper.
2.5 Region-Based Measurements
Any scalar invariant (cL , FA, mode, trace, eigenvalues, etc.) or the tensors themselves can be measured in the white matter ROIs (individual clusters or anatomically-labeled regions) defined by the atlas. To illustrate the method we calculated the FA using tensors sampled at each point on the trajectories. Another possibility would be to label voxels according to the clusters (or fraction thereof) they contain, and to use the original tensors directly for FA calculation.
3 Results
3.1 Automatic Segmentation of Novel Subjects
Five novel subjects (11-15) were labeled using the atlas, and Figure 2 shows tractography from three of these subjects. Note that the labeling is consistent across subjects, and that major anatomical structures such as the corpus callosum, arcuate fasciculus, uncinate fasciculus, etc. are found in each subject.
Fig. 2. Result of automatic segmentation of novel subjects (11-13); panels show S11, a repeat scan of S11, S12, and S13. Selected regions are shown as follows: navy blue, corpus callosum; yellow, corticospinal fibers; purple, arcuate fasciculus/SLF region; orange, uncinate fasciculus; green, inferior longitudinal fasciculus; sky blue, middle cerebellar peduncle; light pink, superior cerebellar peduncle; hot pink, inferior occipitofrontal fasciculus.
3.2 Measurement of Scalar Invariants in the Population
Figure 3 shows the mean FA measured in selected anatomical regions in all subjects. The FA was measured bilaterally (both hemispheres together). The mean FA is relatively similar across subjects (in each structure) yet differs across structures, as would be expected from anatomically consistent group measurements. We have plotted using bar graphs in order to show the pattern across subjects, where some were found to have consistently high or low FA relative to other subjects, seen most clearly in the corpus callosum and corona radiatae graphs.
3.3 Reproducibility Experiment
Using the atlas, we automatically segmented tractography and then measured FA in two subjects (10 and 11) who had each been scanned three times. Automatically segmented trajectories from two scans of subject 11 were shown in Figure 2. The quantitative measurements of FA demonstrate reproducibility and are shown in Figure 4.
Fig. 3. Mean FA in three regions for 15 subjects. (The "corona radiatae" region includes trajectories in the cerebral peduncles, internal capsules, and corona radiatae.) [Bar graphs of mean FA (0-0.5) per subject (1-15): Corpus Callosum FA, Corona Radiatae FA, Uncinate Fasciculi FA.]
Repeat FA (S11)
0.5
0.5
0.45
0.45
0.4
0.4
0.35
0.35 CC
CR
UF
AF
OT
CB
CC
CR
UF
AF
OT
CB
Fig. 4. Mean FA measurements from three repeat scans of two subjects (10 and 11). Abbreviations are: CC, corpus callosum; CR, corona radiatae (plus cerebral peduncles and internal capsules); UF, uncinate fasciculi; AF, arcuate fasciculi; OT, inferior longitudinal (occipitotemporal) fasciculi; and CB, cingulum bundles. The arcuate measurements were made in one bilateral cluster containing the traditional C-shaped fibers.
4 Discussion and Conclusion
An important point when labeling tractography with anatomical names is that the correspondence between tractography and anatomical regions is not always perfect; for instance, one trajectory may traverse part of the arcuate fasciculus and part of the external capsule. An advantage of the clustering approach is that such common trajectories will form clusters, and can be labeled as a mix of structures. However, detection of uncommon trajectories merits investigation. We have introduced a method for high-dimensional white matter atlas creation and automatic tractography segmentation which is robust to small errors in registration and can be used to find automatic region correspondences across hemispheres. In addition, we have shown that the tract-specific regions of interest obtained via automatic segmentation can be used to measure FA with consistency across subjects and reproducibility across scans of a single subject. Our technique enables quantitative neuroscience studies of the white matter in populations.
References
1. Basser, P., Pajevic, S., Pierpaoli, C., Duda, J., Aldroubi, A.: In vivo fiber tractography using DT-MRI data. Magnetic Resonance in Medicine 44 (2000) 625–632
2. Concha, L., Beaulieu, C., Gross, D.W.: Bilateral limbic diffusion abnormalities in unilateral temporal lobe epilepsy. Annals of Neurology 57(2) (2004) 188–196
3. Jones, D.K., Catani, M., Pierpaoli, C., Reeves, S.J., Shergill, S.S., O'Sullivan, M., Golesworthy, P., McGuire, P., Horsfield, M.A., Simmons, A., Williams, S.C., Howard, R.J.: Age effects on diffusion tensor magnetic resonance imaging tractography measures of frontal cortex connections in schizophrenia. Human Brain Mapping (2005)
4. Partridge, S.C., Mukherjee, P., Berman, J.I., Henry, R.G., Miller, S.P., Lu, Y., Glenn, O.A., Ferriero, D.M., Barkovich, A.J., Vigneron, D.B.: Tractography-based quantitation of diffusion tensor imaging parameters in white matter tracts of preterm newborns. Magnetic Resonance in Medicine 22(4) (2005) 467–474
5. Catani, M., Howard, R.J., Pajevic, S., Jones, D.K.: Virtual in vivo interactive dissection of white matter fasciculi in the human brain. NeuroImage 17 (2002) 77–9
6. Mori, S., Wakana, S., Nagae-Poetscher, L.M., van Zijl, P.C.: MRI Atlas of Human White Matter. Elsevier (2005)
7. Brun, A., Knutsson, H., Park, H.J., Shenton, M.E., Westin, C.F.: Clustering fiber traces using normalized cuts. In: MICCAI. (2004) 368–375
8. Gerig, G., Gouttard, S., Corouge, I.: Analysis of brain white matter via fiber tract modeling. In: EMBS. (2004) 426
9. O'Donnell, L., Westin, C.F.: White matter tract clustering and correspondence in populations. In: MICCAI. (2005)
10. Corouge, I., Gouttard, S., Gerig, G.: Towards a shape model of white matter fiber bundles using diffusion tensor MRI. In: ISBI. (2004) 344–347
11. Maddah, M., Mewes, A., Haker, S., Grimson, W.E.L., Warfield, S.: Automated atlas-based clustering of white matter fiber tracts from DTMRI. In: MICCAI. (2005) 188–195
12. Behrens, T., Johansen-Berg, H., Woolrich, M., Smith, S., Wheeler-Kingshott, C., Boulby, P., Barker, G., Sillery, E., Sheehan, K., Ciccarelli, O., Thompson, A., Brady, J., Matthews, P.: Non-invasive mapping of connections between human thalamus and cortex using diffusion imaging. Nature Neuroscience 6 (2003) 750–757
13. Parker, G.J., Wheeler-Kingshott, C.A., Barker, G.J.: Estimating distributed anatomical connectivity using fast marching methods and diffusion tensor imaging. IEEE TMI 21(5) (2002) 505–512
14. Westin, C.F., Maier, S., Mamata, H., Nabavi, A., Jolesz, F., Kikinis, R.: Processing and visualization of diffusion tensor MRI. Medical Image Analysis 6(2) (2002) 93–108
15. Basser, P., Pierpaoli, C.: Microstructural and physiological features of tissues elucidated by quantitative-diffusion-tensor MRI. J. Magn. Reson. Ser. B 111 (1996) 209–219
16. Zollei, L., Learned-Miller, E., Grimson, W.E.L., Wells III, W.M.: Efficient population registration of 3D data. In: ICCV 2005, Computer Vision for Biomedical Image Applications. (2005)
17. Brun, A., Park, H.J., Knutsson, H., Westin, C.F.: Coloring of DT-MRI fiber traces using Laplacian eigenmaps. In: EUROCAST. (2003) 564–572
18. Wang, X., Tieu, K., Grimson, E.: Learning semantic scene models by trajectory analysis. Technical report, MIT CSAIL (2006)
19. Shi, J., Malik, J.: Normalized cuts and image segmentation. PAMI 22(8) (2000) 888–905
20. Fowlkes, C., Belongie, S., Chung, F., Malik, J.: Spectral grouping using the Nystrom method. PAMI 26(2) (2004) 214–225
Fiber Bundle Estimation and Parameterization
Marc Niethammer^{1,2,3}, Sylvain Bouix^{1,2,3}, Carl-Fredrik Westin^{2}, and Martha E. Shenton^{1,3}
1 Psychiatry Neuroimaging Laboratory, Brigham and Women's Hospital, Harvard Medical School, Boston MA, USA
2 Laboratory of Mathematics in Imaging, Brigham and Women's Hospital, Harvard Medical School, Boston MA, USA
3 Laboratory of Neuroscience, VA Boston Healthcare System, Brockton MA, USA
{marc, sylvain, westin, shenton}@bwh.harvard.edu
Abstract. Individual white matter fibers cannot be resolved by current magnetic resonance (MR) technology. Many fibers of a fiber bundle will pass through an individual volume element (voxel). Individual visualized fiber tracts are thus the result of interpolation on a relatively coarse voxel grid, and an infinite number of them may be generated in a given volume by interpolation. This paper aims at creating a level set representation of a fiber bundle to describe this apparent continuum of fibers. It further introduces a coordinate system warped to the fiber bundle geometry, allowing for the definition of geometrically meaningful fiber bundle measures.
1 Motivation and Background
Stroke, Alzheimer's disease, epilepsy, multiple sclerosis, schizophrenia, and various other neuropsychiatric disorders have been linked to white matter abnormalities [1]. Diffusion weighted magnetic resonance imaging (DW-MRI) measures the directional diffusivity of water molecules and has been used to detect white matter bundles. The current resolution of in-vivo DW-MRI scans is roughly in the millimeter range, whereas white matter fiber diameters are on the order of micrometers. Even though fiber tracing algorithms give the impression of extracting individual fibers, these cannot be individually resolved. Many fibers will be contained within an individual voxel. The best one can hope for is thus to follow fiber bundles or sub-bundles. Tractography algorithms need starting points to initialize fiber tracing. These can be individual (user prescribed) points or points contained in regions/volumes of interest defined by manual delineations in voxel space. The latter is used to capture fiber bundles. A fiber bundle can be represented by a finite number of streamlines, sampling its volume. To measure diffusion quantities (e.g., trace, FA, etc.) along a fiber, Ding et al. [2] propose to construct a bundle-centric medial axis. Mean diffusion quantities at a point, p, of the medial axis are then defined as the mean of those quantities at the locations of the streamlines passing through the normal plane of the medial axis at p. Corouge et al. [3] propose a
correspondence of streamline points based on equal arclength from a defined origin. The streamline sampling of a fiber bundle volume is not guaranteed to be uniform. Measuring fiber bundle quantities based on this sampling will thus favor regions of dense sampling unless one accounts for the non-uniformity of the streamline distribution. This paper proposes a methodology to represent the continuum of fibers constituting a fiber bundle implicitly, using three level set functions. In doing so, a local coordinate system is introduced, warped to the geometry of the fiber bundle, which for example allows for the computation of fiber bundle measures over continuous cross sections and may be used for visualization. Brun et al. [4] hinted at the usefulness of such a local coordinate system for fiber parameterization, related to their work on pseudo-coloring a finite number of fiber traces using eigenvectors from a fiber affinity matrix. The approach chosen in this paper is in between streamline- and grid-based algorithms (see [2,3,5,6,7] and their references). It extends current streamline algorithms by associating origin coordinates (where the streamline crosses a predefined origin plane) and traveled arclength (with respect to the origin) with every point of a streamline. This information is subsequently interpolated onto the computational grid to generate the desired implicit representation.
2 Level Sets by Construction
Given an initial seed point, a fiber tract is generated at subvoxel resolution following a vector field, v, aligned with the assumed water diffusion direction (e.g., the direction of the principal eigenvector of an interpolated diffusion tensor). Under mild assumptions on v, generated fiber tracts depend continuously on the seed point, i.e., fiber tracts with proximal seed points will locally stay close to each other. A finite number of streamlines cannot fully represent such a continuum. In the context of deformable curves and surfaces, level set representations have been successful as implicit (continuous by interpolation) representations of geometry, removing the dependency on particle based descriptions. Classically, codimension one objects (e.g., a curve in the plane, surfaces in space, etc.) have been represented and evolved, based on well developed theory, within the level set framework. Evolving level set representations of objects of codimension larger than one is still challenging; however, various ways of representing such objects implicitly exist [8]. Sec. 2.1 describes the proposed level set representation. Unless otherwise noted, what follows presents the approach for bundle tracing in three spatial dimensions.
Local Coordinate System and Implicit Representation
In this paper, curves (fibers) will be represented by the intersection of two level set functions, points on the curve by intersecting with a third level set function. In the context of fiber bundles, this representation induces a coordinate system locally adapted to the fiber bundle geometry.
Given two level set functions Φ(x) : R³ → R and Ψ(x) : R³ → R, with ∇Φ ≠ 0, ∇Ψ ≠ 0, a curve C(φ, ψ) is represented by the intersections of the two level sets

C(φ, ψ) = Φ⁻¹(φ) ∩ Ψ⁻¹(ψ),

where (φ, ψ) denotes the local coordinates of the curve on a pre-specified two-dimensional origin manifold, Θ, and (·)⁻¹ the set-valued inverse. To complete the coordinate representation by parameterizing along a curve, an additional level set function, Σ (Σ(x) = 0 for x ∈ Θ), encodes arclength, s, (||C_s|| = 1):

C(φ, ψ, s) = Φ⁻¹(φ) ∩ Ψ⁻¹(ψ) ∩ Σ⁻¹(s).

See Fig. 1 for an illustration. Sec. 3 describes the construction of the level set functions.
Fig. 1. Level set representation of a fiber. Example fiber point at coordinates (φ, ψ, 0), the origin manifold and its coordinate system. [Diagram labels: Ψ = ψ, Σ⁻¹(0) = Θ, Φ⁻¹(φ), Φ = φ, C(φ, ψ), Ψ⁻¹(ψ), ∂R, s.]
Fig. 2. Local effect of twisting. [(a) Original rod. (b) Twisted rod.]
2.2 Extracting Measures of Geometry and Diffusion
Define y(x) = (Φ(x), Ψ(x))^T and the indicator function

χ_R(y(x)) = 1 if y ∈ R, and 0 otherwise,

where R is a region of interest within the origin manifold (e.g., an area that encompasses a complete cross section of a fiber bundle, see Fig. 1). The volume and local area of a fiber bundle are then

V = ∫_Ω χ_R(y(x)) dΩ,   A(s) = ∫_Ω χ_R(y(x)) δ(Σ(x) − s) dΩ,
where Ω is the domain of integration and δ(·) denotes the Dirac delta function. Mean and variance of a quantity q (e.g., fractional anisotropy) may be defined over the volume,

μ_q = (1/V) ∫_Ω q(x) χ_R(y(x)) dΩ,   σ_q² = (1/V) ∫_Ω (q(x) − μ_q)² χ_R(y(x)) dΩ,   (1)

or locally over an arclength isosurface^1,

μ_q(s) = (1/A(s)) ∫_Ω q(x) χ_R(y(x)) δ(Σ(x) − s) dΩ,
σ_q²(s) = (1/A(s)) ∫_Ω (q(x) − μ_q(s))² χ_R(y(x)) δ(Σ(x) − s) dΩ.   (2)

These local measurements are useful to investigate changes along the fiber bundle (in the fiber direction). Looking at geometric properties (e.g., curvature, torsion) of an individual fiber can be deceiving. Consider twisting a circular rod as depicted in Fig. 2. Its core does not change during the twisting motion, whereas an initially straight line on the outer surface of the rod will turn into a circular helix with constant, non-vanishing curvature and torsion. Visualizing the level sets of Φ and Ψ allows such twisting deformations to be captured. A mean fiber, C̄(s), can be defined as the curve that passes through the centers of gravity of the arclength isocontours^2, i.e., the area-normalized first moment

C̄(s) = (1/A(s)) ∫_Ω x χ_R(y(x)) δ(Σ(x) − s) dΩ.   (3)
3 Implementation
The level set functions Φ, Ψ, and Σ are computed on a grid (e.g., the voxel grid prescribed by the scanner resolution). Given the velocity field v induced by the reconstructed diffusion tensor field, a fiber tract starting at point (φ, ψ) of the origin manifold traces out the curve given by the solution of the ordinary differential equation C_s(x, s) = v(x). In contrast to conventional tractography algorithms, in this paper every point on the fiber tract is associated with an arclength, s, from the origin, as well as the point (φ, ψ) it originated from. Expressed as partial differential equations on the grid, the level set functions need to fulfill

v · ∇Φ = 0, Φ(x) = φ(x), x ∈ Θ;   v · ∇Ψ = 0, Ψ(x) = ψ(x), x ∈ Θ;   v · ∇Σ = 1, Σ(x) = 0, x ∈ Θ.   (4)

^1 Eq. 1 and 2 assume a uniform density distribution of fibers. Their integrands may be locally scaled to incorporate geometric stretch and compression effects. See [9] on how to compute this scaling based on the mean curvature of the arclength level set Σ, which locally quantifies the tendency of fibers to converge or diverge.
^2 A reparameterization is necessary to obtain an arclength-parameterized mean fiber.
The straightforward (standard finite-difference based) implementation of the desirable Eulerian solution of Eqs. 4 suffers from undesired boundary effects, as shown in Fig. 3(b); boundary data pollutes the constructed level set function. Consequently, the reconstructed fibers (solid lines) do not agree with the ground truth (crosses) based on the synthetic vector field of Fig. 3(a). Thus, in this paper, Eqs. 4 are solved by explicitly computing fiber tracts and interpolating the associated data at every fiber integration point onto the fixed grid. This hybrid method agrees well with the theoretical fiber tracts of Fig. 3(c) and readily extends existing streamline-based fiber tracing algorithms. A reseeding strategy is employed to cope with diverging fibers. Secs. 3.1 and 3.2 discuss the methods for interpolation, extrapolation, and reseeding.
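A minimal sketch of the streamline side of this hybrid scheme, carrying the (φ, ψ, s) attributes along a second-order Runge-Kutta trace; the callable v and the seeding on the origin manifold are assumptions, and the scattered-data interpolation onto the grid (Sec. 3.1) is omitted:

```python
import numpy as np

def trace_with_attributes(seed_xyz, phi, psi, v, step=0.5, n_steps=400):
    """RK2 fiber tracing; returns (point, phi, psi, s) samples along the tract."""
    x, s, samples = np.asarray(seed_xyz, float), 0.0, []
    for _ in range(n_steps):
        samples.append((x.copy(), phi, psi, s))   # attributes carried along the tract
        mid = x + 0.5 * step * v(x)               # RK2 midpoint
        direction = v(mid)
        if np.linalg.norm(direction) == 0:        # left the traceable region
            break
        x = x + step * direction
        s += step                                 # arclength from the origin manifold
    return samples
```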
Fig. 3. Constructed fiber level set functions: (a) synthetic vector field, (b) Eulerian solution, (c) hybrid solution.
3.1 Interpolation and Extrapolation
Levelset values at grid points need to be interpolated, since streamlines will mostly not pass through them. Streamline integration points are not uniformly distributed in space, yielding a scattered data interpolation problem. Here, natural neighbor interpolation is employed [10]. Given a set of points S, and its associated Voronoi cells³ V(S), the natural neighbors of a point x are the points whose Voronoi cells, V_p, intersect with the Voronoi cell, V_x, of x based on the point set S ∪ x. Interpolation weights, for a measurement quantity q, are computed based on the intersection areas of the Voronoi cells:

q(x) = Σ_{p∈S} q(p) w_p = Σ_{p∈S} q(p) Area(V_p ∩ V_x) / Area(V_x).
If the interpolation point x does not lie within the convex hull formed by the given data points, values are linearly extrapolated to be able to capture the implicit boundary of the fiber bundle.

³ The Voronoi cell V_x of a point x ∈ R^n, with respect to the point set S ∪ x, is the n-dimensional polytope containing all the points closest to x.
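scipy offers no natural neighbor interpolant, so the sketch below scatters tract samples onto the grid with barycentric linear interpolation as a stand-in inside the convex hull, and nearest-neighbor values as a crude substitute for the paper's linear extrapolation outside it; all function and parameter names are hypothetical.

```python
import numpy as np
from scipy.interpolate import griddata

def scatter_to_grid(points, values, grid_x, grid_y, grid_z):
    """Interpolate scattered fiber-tract samples (points: N x 3, values: N)
    onto a fixed voxel grid. Linear barycentric interpolation stands in for
    natural neighbor interpolation; nearest-neighbor values fill the region
    outside the convex hull in place of linear extrapolation."""
    grid = (grid_x, grid_y, grid_z)
    inside = griddata(points, values, grid, method='linear')   # NaN outside hull
    outside = griddata(points, values, grid, method='nearest')
    return np.where(np.isnan(inside), outside, inside)
```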
3.2 Density Measurements and Reseeding
Seeding fiber tracts from a region of interest will lead to a nonuniform coverage of the fiber bundle by fiber tracts, especially in regions where fibers diverge. Adaptively reseeding fibers within voxels of low density (lower than a given threshold) allows for a more uniform sampling of space. Particle density at x₀ is computed by convolution with a smoothing kernel S [11],

ρ(x₀) = ∫_Ω Σ_{x_i} δ(x − x_i) S(x − x₀) dΩ = Σ_{x_i} S(x_i − x₀),
where x_i are the known locations of the fiber tract integration points. New fiber tracts are only initialized at locations within the convex hull of already known interpolation points (by means of natural neighbor interpolation) to avoid capturing fibers not consistent with the initial seeding. Fig. 4(a) shows the estimated point density before reseeding. Reseeding yields a more uniform density (and thus tract) distribution. Figs. 4(b) and 4(c) show the arclength field and its isocontours before and after reseeding. Values in regions that were previously not computable are filled in through reseeding⁴.
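As one possible reading of the density estimate, the sketch below evaluates ρ at a single grid point with an isotropic Gaussian smoothing kernel; the kernel choice and its width h are assumptions ([11] discusses standard smoothed-particle kernels), not the kernel used by the authors.

```python
import numpy as np

def particle_density(x0, xi, h=1.0):
    """Particle density at grid point x0 from the N x 3 array xi of fiber
    integration-point locations, using an isotropic Gaussian kernel of
    width h (an assumed kernel). Voxels whose density falls below a chosen
    threshold are candidates for reseeding."""
    d2 = np.sum((xi - np.asarray(x0)) ** 2, axis=1)
    S = np.exp(-d2 / (2.0 * h ** 2)) / ((2.0 * np.pi) ** 1.5 * h ** 3)
    return np.sum(S)
```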
Fig. 4. Reseeding effect of the hybrid bundle tracing scheme: (a) original density, (b) original arclength, (c) reseeded arclength.
4 Results and Modalities
The algorithm described in Secs. 2 and 3 was applied to extract parts of the cingulum bundle and the corona radiata. For the cingulum, six diffusion weighted images and one baseline image were acquired on a 1.5 Tesla GE scanner. Scans were performed coronally, with a resolution of 1.7 × 1.7 × 4.0 mm and a 1 mm interslice spacing, using line scan diffusion imaging. For the corona radiata, twenty-five diffusion weighted images and one baseline image were acquired axially, with a resolution of 1.9 × 1.9 × 2.4 mm, using echo planar imaging on a 3.0 Tesla GE scanner. All images were subsequently upsampled to isotropic voxels.

⁴ If the initial seeding is "dense enough", reseeding may not be necessary.
Figs. 5(a) and 5(d) show the boundaries of the traced cingulum bundle and the corona radiata (i.e., the envelopes of χ_R(y(x))), respectively, overlaid on a sagittal brain slice. Fig. 5(e) shows the arclength isosurfaces for the corona radiata. Figs. 5(b) and 5(f) show the respective arclength isosurfaces after reseeding for the cingulum bundle and the corona radiata. Reseeding improved the tracing result. Fig. 5(c) shows a zoom on one of the arclength isosurfaces of the cingulum bundle. It highlights the uneven distribution of streamlines passing through. Statistical quantities based on values computed at the locations of the intersection points of the streamlines with the arclength isosurface can now be replaced by integration over the arclength isosurface itself, removing potential bias towards more densely sampled areas.
Fig. 5. Experimental results for the cingulum bundle and the corona radiata: (a) cingulum, (b) arclength levelsets, (c) nonuniformity, (d) corona radiata, (e) arclength levelsets, (f) after reseeding. (See http://pnl.bwh.harvard.edu/people/marc/miccai2006/ for color figures.)
5 Discussion and Future Work
Moving away from describing a fiber bundle by a finite number of streamlines, this paper introduced an implicit representation of a fiber bundle. The bundle, in this implicit representation, consists of a continuum of fibers. Fiber bundle properties can now be defined as integrals over cross-sectional areas or the bundle volume, instead of relying on the possibly nonuniform sampling of a streamline representation. Further, a complete coordinate system has been introduced, warped to the geometry of the traced fiber bundle. We believe this coordinate system may be useful to establish between-subject correspondences for population studies. Further research will investigate the proposed methodology in the context of multiple subjects and will look at alternative numerical schemes.
Acknowledgements
This work was supported in part by a Department of Veterans Affairs Merit Award (SB, MS, CFW), a Research Enhancement Award Program (MS), and National Institutes of Health grants R01 MH50747 (SB, MS), K05 MH070047 (MS), and U54 EB005149 (MN, SB, MS). The authors thank Dr. Gordon Kindlmann for providing the Teem library used in this research (http://teem.sourceforge.net/).
References
1. Kubicki, M., Westin, C.F., Maier, S.E., Mamata, H., Frumin, M., Ernst-Hirshefeld, H., Kikinis, R., Jolesz, F.A., McCarley, R.W., Shenton, M.E.: Diffusion tensor imaging and its application to neuropsychiatric disorders. Harvard Review of Psychiatry 10 (2002) 324–336
2. Ding, Z., Gore, J., Anderson, A.: Classification and quantification of neuronal fiber pathways using diffusion tensor MRI. Magnetic Resonance in Medicine 49 (2003) 716–721
3. Corouge, I., Gouttard, S., Gerig, G.: Towards a shape model of white matter fiber bundles using diffusion tensor MRI. In: Proceedings of the International Symposium on Biomedical Imaging, IEEE (2004) 344–347
4. Brun, A., Knutsson, H., Park, H.J., Shenton, M.E., Westin, C.F.: Clustering fiber tracts using normalized cuts. In: Seventh International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI'04). Lecture Notes in Computer Science, Rennes - Saint Malo, France (2004) 368–375
5. Campbell, J.S.W., Siddiqi, K., Rymar, V.V., Sadikot, A.F., Pike, G.B.: Flow-based fiber tracking with diffusion tensor and q-ball data: Validation and comparison to principal diffusion direction techniques. NeuroImage (2005) in press
6. Pichon, E., Westin, C.F., Tannenbaum, A.: A Hamilton-Jacobi-Bellman approach to high angular resolution diffusion tractography. In: Eighth International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI'05). Lecture Notes in Computer Science 3749, Palm Springs, CA, USA (2005) 180–187
7. Lenglet, C., Deriche, R., Faugeras, O.: Inferring white matter geometry from diffusion tensor MRI: Application to connectivity mapping. In: Proceedings of the 8th European Conference on Computer Vision (2004) 127–140
8. Osher, S., Fedkiw, R.: Level Set Methods and Dynamic Implicit Surfaces. Springer Verlag (2003)
9. Pichon, E.: Novel Methods for Multidimensional Image Segmentation. PhD thesis, Georgia Institute of Technology (2005)
10. Sibson, R.: A brief description of the natural neighbor interpolant. In Barnett, V., ed.: Interpreting Multivariate Data (1981)
11. Monaghan, J.J.: Smoothed particle hydrodynamics. Reports on Progress in Physics 68 (2005) 1703–1759
Improved Correspondence for DTI Population Studies Via Unbiased Atlas Building

Casey Goodlett¹, Brad Davis¹,², Remi Jean³, John Gilmore³, and Guido Gerig¹,³

¹ Department of Computer Science, University of North Carolina
² Department of Radiation Oncology, University of North Carolina
³ Department of Psychiatry, University of North Carolina
Abstract. We present a method for automatically finding correspondence in Diffusion Tensor Imaging (DTI) from deformable registration to a common atlas. The registration jointly produces an average DTI atlas, which is unbiased with respect to the choice of a template image, along with diffeomorphic correspondence between each image. The registration image match metric uses a feature detector for thin fiber structures of white matter, and interpolation and averaging of diffusion tensors use the Riemannian symmetric space framework. The anatomically significant correspondence provides a basis for comparison of tensor features and fiber tract geometry in clinical studies and for building DTI population atlases.
1 Introduction
Diffusion tensor imaging (DTI) has become increasingly important as a means of investigating the structure and properties of neural white matter. The local diffusion properties of water in the brain can be measured in vivo using diffusion tensor MRI (DT MRI). In brain tissue, water diffuses more easily along myelinated axons which make up the white matter fiber bundles. Acquiring multiple images with different gradient sensitizing directions provides an estimate for the local diffusion tensor at each voxel. The major eigenvector of each tensor corresponds to the direction of the local fiber bundle, and the field of principal eigenvectors can be integrated to produce fiber tracts. Many approaches have been proposed to analyze DTI in clinical studies. For example, derived scalar properties such as fractional anisotropy (FA), relative anisotropy (RA), or mean diffusivity (MD) of the tensors are often compared in regions of interest drawn by experts. Other methods have characterized the geometry of white matter through tractography, as well as quantitative analysis of tractography [1]. However, region of interest approaches require an expert
This work is part of the National Alliance for Medical Image Computing (NA-MIC), funded by the National Institutes of Health through the NIH Roadmap for Medical Research, Grant U54 EB005149. The authors acknowledge support from the NIMH Silvio Conte Center for Neuroscience of Mental Disorders MH064065 as well as grant support from the National Alliance for Autism Research (NAAR) and the Blowitz-Ridgeway Foundation.
Fig. 1. Flowchart of atlas building process
to segment structures of interest, and inter-subject comparison of tractography lacks the correspondence between fiber tracts needed to make statistical comparisons. Wakana et al. [2] have built a fiber-tract atlas in the form of voxel maps of prior probabilities for major fiber bundles. We propose to build an atlas for tensor images to provide a basis for statistical analysis of tensors, tensor-derived measures, and fiber bundle geometry. We use the techniques of registration and atlas-building to provide inter-subject correspondence for statistical analysis of diffusion data as shown in Figure 1. Our metric for optimizing the registration parameters is based on a structural operator of the tensor volumes. An initial alignment is performed by computing the affine transformation between the structural images and applying the transformation to the tensor volumes. An unbiased, deformable atlas-building procedure is then applied which produces mappings between each subject and a common atlas coordinate system using the method of Joshi et al. [3]. We validate our method by showing an improvement over affine registration alone.
2 Registration
Several image metrics have been proposed for inter-subject registration of DTI including metrics based on the baseline images and the full diffusion tensors [4,5]. We propose an intermediate, heuristic solution between using only baseline images and using metrics based directly on the diffusion tensors. Our method is based on a structural operator of the FA image that is more sensitive to major fiber bundles than metrics based only on the baseline images. Given a tensor image I and the corresponding FA image FA, the structural operator C is defined in terms of the maximum eigenvalue of the Hessian,

C = max[eigenvalues(H)],  where  H ≡ ( FA_xx FA_xy FA_xz ; FA_yx FA_yy FA_yz ; FA_zx FA_zy FA_zz ).    (1)

Figure 2 shows the FA image of a tensor field and the corresponding structural image C. Let h_i(x) be a mapping which gives the corresponding point in the subject image I_i for all x in the domain Ω of the atlas image Î.
Fig. 2. The top row shows axial, sagittal, and coronal slices of the FA image from a DTI scan of a 1-year old subject. The bottom row shows the result of the structural operator on the FA image taken at σ = 2.0mm. Major fiber bundles such as the corpus callosum, fornix, and corona radiata are highlighted, while the background noise is muted.
Given two subject images I₁ and I₂, the image match functional that is optimized in the registration process is

M(I₁(x), I₂(h(x))) = ∫_{x∈Ω} [C₁(x) − C₂(h(x))]² dx,    (2)
the mean squared error between C₁ and C₂. We use C over existing methods for two main reasons. First, we observe that C is a good detector of major fiber bundles, which occur as tubular or sheet-like structures. Callosal fibers form a thin swept U; the corona radiata is a thin fan; the cingulum is a tubular bundle; and C serves as a strong feature detector for all of these thin structures. Consequently, C optimizes correspondence of fiber tracts better than the baseline image, because C has the strongest response at the center of major fiber bundles, while the baseline image has the strongest signal in the cerebrospinal fluid (CSF). Secondly, we use C instead of a full tensor metric or the FA itself in order to minimize overfitting the diffeomorphic registration by using the same feature for registration that will be used for statistical comparison. The Hessian at a fixed scale is a first step towards basing the registration on a geometric model of the white matter, and future work will investigate a multi-scale approach to computing C to make the measurement dependent only on the local width of the structure. Using our definition for an image match functional, registration of the images proceeds in two stages. First, a template tensor image is aligned into a standardized coordinate system by affine alignment of the baseline image with a T2 atlas using normalized mutual information. The remaining tensor images are then aligned with the template using an affine transformation and equation 2. In this coordinate system, we can average the tensor volumes to produce an affine atlas.
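A minimal sketch of Eq. 1 using Gaussian derivative filters from scipy.ndimage is given below; note that sigma is expressed in voxel units here, whereas the paper states the scale in millimeters, and the function name is hypothetical.

```python
import numpy as np
from scipy import ndimage

def structural_operator(fa, sigma=2.0):
    """Structural operator C of Eq. 1: the maximum eigenvalue of the
    Hessian of the FA volume, with second derivatives taken at scale
    sigma via Gaussian derivative filtering."""
    H = np.empty(fa.shape + (3, 3))
    for i in range(3):
        for j in range(3):
            order = [0, 0, 0]
            order[i] += 1
            order[j] += 1                  # d^2 FA / dx_i dx_j
            H[..., i, j] = ndimage.gaussian_filter(fa, sigma, order=order)
    # eigvalsh returns eigenvalues in ascending order; keep the largest
    return np.linalg.eigvalsh(H).max(axis=-1)
```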
Fig. 3. The top six images show the correspondence of the affine atlas and the five subject images at a point in the splenium of the corpus callosum. Notice that the corresponding index in the subject images does not necessarily correspond to the same anatomical location. The bottom six images show the deformable atlas and subject images with the same point selected; here the atlas provides better anatomical correspondence, and the deformable atlas has sharper structures.
However, the affine registration does not account for the non-linear variability of the white matter geometry, and many of the white matter structures are blurred in the atlas volume. For this reason, we subsequently apply a deformable registration procedure to obtain anatomical correspondence between the population of images I_i and a common atlas space [3]. This procedure jointly estimates an atlas image Î and a set of diffeomorphic mappings h_i that define the spatial correspondence between Î and each I_i. Figure 3 shows the improved correspondence attained from the deformable registration. The computed transformations are applied to each tensor volume as described in the next section.
3 Tensor Processing
The application of high-dimensional transforms to the DTI volumes must account for the space of valid tensors. The orientation of a diffusion tensor provides a measurement of fiber orientation relative to the anatomical location, and spatial transformations of the tensor fields must account for the reorientation of the tensor. Furthermore, since diffusion tensors are symmetric positive-definite matrices, operations on the images must preserve this constraint.
3.1 Spatial Transformations of Tensor Images
When spatial transformations of diffusion images are performed to align the anatomy of different scans, the tensors must also be transformed to maintain the relationship between anatomy and anisotropy orientation. We use the finite strain approach of Alexander et al. to reorient tensors in a deformation field by decomposing the local linear approximation of the transformation into a rotation and deformation component [6]. The rotation of each tensor is computed by performing singular value decomposition (SVD) on the local linear approximation of the transformation F, where F is given by the Jacobian matrix of the deformation field computed by finite differencing. A local linear deformation F is decomposed into a rotation matrix R and a deformation matrix U, where

F = U R.    (3)

The local transformation of a tensor D is given as

D′ = R D R^T.    (4)
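The finite strain reorientation of Eqs. 3-4 can be sketched as follows; the rotation is recovered from the SVD of the local Jacobian F, and the reflection guard is an added safety assumption rather than part of the cited method.

```python
import numpy as np

def reorient_tensor(D, F):
    """Finite strain reorientation (Eqs. 3-4): extract the rotation R from
    the local linear transform F (the Jacobian of the deformation field)
    via SVD, and apply it to the diffusion tensor D."""
    W, _, Vt = np.linalg.svd(F)      # F = W S Vt; its rotation part is W Vt
    R = W @ Vt
    if np.linalg.det(R) < 0:         # guard against a reflection (assumption)
        W[:, -1] *= -1.0
        R = W @ Vt
    return R @ D @ R.T
```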
3.2 Interpolation and Averaging of Tensor Images
The space of valid diffusion tensors does not form a vector space. Euclidean operations on diffusion tensors such as averaging can produce averages with a larger determinant than the interpolating values, which is not physically sensible. Furthermore, operations on diffusion tensors are not guaranteed to preserve the positive-definite nature of diffusion. The Riemannian framework has been shown to be a natural method for operating on diffusion tensors, which preserves the physical interpretation of the data and constrains operations to remain in the valid space of symmetric positive-definite matrices [7,8]. Further simplifications have shown an efficient method for computation using the Log-Euclidean metric [9]. Interpolation and averaging are treated as weighted sums in the Log-Euclidean framework, defined as

D̂ = exp( Σ_{i=1}^N w_i log(D_i) ),    (5)

where log and exp are the matrix logarithm and exponential functions. To produce the atlas tensor volume, the deformed tensor volumes with locally rotated tensors are averaged per-voxel using the Log-Euclidean scheme,

I^rot(x) = R(x) I(x) R(x)^T,    (6)
Î(x) = exp( (1/N) Σ_{i=1}^N log(I_i^rot(h_i(x))) ).    (7)
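A small sketch of the weighted Log-Euclidean mean of Eq. 5 using scipy's matrix logarithm and exponential; for whole volumes one would exploit the closed-form log/exp of symmetric 3×3 matrices rather than the generic logm/expm used here.

```python
import numpy as np
from scipy.linalg import logm, expm

def log_euclidean_mean(tensors, weights=None):
    """Weighted Log-Euclidean mean of SPD diffusion tensors (Eq. 5):
    average the matrix logarithms, then map back with the matrix
    exponential; the result is guaranteed positive-definite."""
    n = len(tensors)
    w = np.full(n, 1.0 / n) if weights is None else np.asarray(weights)
    log_sum = sum(wi * logm(Di) for wi, Di in zip(w, tensors))
    return expm(log_sum)
```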
4 Experiments and Results
Fig. 4. 4(a) and 4(c) show a slice of the FA for the affine and deformable atlases. 4(b) and 4(d) illustrate a subregion of tensors in the splenium of the corpus callosum. Notice that the FA image of the affine atlas is more blurry, and that the tensors in the splenium are more swollen in the affine atlas.

Our methodology was tested on five datasets of healthy 1-year-olds from a clinical study of neurodevelopment. The images were acquired on a Siemens head-only
3T scanner. Multiple sets of diffusion weighted images were taken for each subject and averaged to improve the signal-to-noise ratio. Each dataset consisted of one baseline image and six gradient direction images with b = 1000 s/mm² using the standard orientation scheme. An image volume of 128×128×65 voxels was acquired with 2×2×2 mm resolution. Imaging parameters were TR/TE = 5200 ms/73 ms. For each set of diffusion weighted images the diffusion tensors were estimated using the method of Westin et al. [10]. The structural image C was computed from the FA volume with σ = 2.0 mm. Affine and deformable alignment were computed using the methods described in Section 2. The warped DTI volumes were averaged to produce an affine atlas and a deformable atlas. Figure 4 shows a comparison between the tensor volumes of the two atlases. Tractography was performed in the atlas space, and the tract bundles were warped to each subject image using h_i. Figure 5 shows the results of tractography in the atlas and the corresponding warped fibers in two subject images.
Fig. 5. Fibers traced in the corpus callosum of the atlas (a) are mapped to corresponding locations in the subject images (b) (c) despite pose and shape changes.
5 Validation
Visual inspection of tractography in the atlas volume shows an initial qualitative validation that the registration and averaging methods provide anatomically sensible results. Histogram comparisons of derived tensor measures in the affine and
diffeomorphic atlases show an initial quantitative validation of the improvement of the deformable registration over affine registration alone. The gradient magnitude of the FA was measured in the whole brain of the affine and deformable atlases. Figure 6 shows the gradient magnitude images and a histogram comparison. At the 90th percentile of the cumulative histogram, the deformable atlas has a gradient magnitude of 684 while the affine atlas has 573, an increase of 20%. This shows that the deformable atlas better preserves thin structures via improved alignment.

Fig. 6. (a) and (b) show an axial slice of the gradient magnitude image. Notice the structures in the marked regions, which are visible in the deformable atlas but not the affine atlas, and the increased sharpness of the splenium of the corpus callosum. (c) and (d) show the histogram and cumulative histogram of gradient magnitude intensities in the whole brain.
6 Discussion and Future Work
We have developed an automatic method for producing correspondence in diffusion tensor images through deformable registration, and a novel image match metric which aligns structural features. We apply the transformation using the finite strain model for tensor rotation and a Riemannian framework for averaging and interpolation. Initial validation of deformable registration is performed by showing improvement in thin structure preservation over affine alignment.
Future validation work will try to quantify the quality of correspondence of fiber tracts. In future work, we intend to build DTI atlases of different populations to compare tract geometry and tensor statistics along tracts.
References
1. Corouge, I., Fletcher, P.T., Gilmore, J.H., Gerig, G.: Fiber tract-oriented statistics for quantitative diffusion tensor MRI analysis. In: MICCAI. Volume 3749 of LNCS (2005) 131–139
2. Wakana, S., Jiang, H., Nagae-Poetscher, L.M., van Zijl, P.C.M., Mori, S.: Fiber tract-based atlas of human white matter anatomy. Radiology 230(1) (2004) 77–87
3. Joshi, S., Davis, B., Jomier, M., Gerig, G.: Unbiased diffeomorphic atlas construction for computational anatomy. NeuroImage (Supplemental issue on Mathematics in Brain Imaging) 23 (2004) S151–S160
4. Zhang, H., Yushkevich, P., Gee, J.: Registration of diffusion tensor images. In: CVPR (2004)
5. Guimond, A., Guttmann, C.R.G., Warfield, S.K., Westin, C.F.: Deformable registration of DT-MRI data based on transformation invariant tensor characteristics. In: ISBI (2002)
6. Alexander, D., Pierpaoli, C., Basser, P., Gee, J.: Spatial transformations of diffusion tensor magnetic resonance images. IEEE Transactions on Medical Imaging 20(11) (2001)
7. Fletcher, P., Joshi, S.: Principal geodesic analysis on symmetric spaces: Statistics of diffusion tensors. In: ECCV 2004, Workshop CVAMIA (2004) 87–98
8. Pennec, X., Fillard, P., Ayache, N.: A Riemannian framework for tensor computing. International Journal of Computer Vision 66(1) (2006) 41–66
9. Arsigny, V., Fillard, P., Pennec, X., Ayache, N.: Fast and simple calculus on tensors in the Log-Euclidean framework. In: MICCAI. Volume 3749 of LNCS (2005) 115–122
10. Westin, C.F., Maier, S.E., Mamata, H., Nabavi, A., Jolesz, F.A., Kikinis, R.: Processing and visualization of diffusion tensor MRI. Medical Image Analysis 6(2) (2002) 93–108
Diffusion k-tensor Estimation from Q-ball Imaging Using Discretized Principal Axes

Ørjan Bergmann¹,², Gordon Kindlmann¹, Arvid Lundervold², and Carl-Fredrik Westin¹

¹ Laboratory of Mathematics in Imaging, Harvard Medical School, Boston MA, USA
² University of Bergen, Postbox 7800, N-5020 Bergen, Norway
Abstract. A recurring theme in the diffusion tensor imaging literature is the per-voxel estimation of a symmetric 3×3 tensor describing the measured diffusion. In this work we attempt to generalize this approach by calculating 2, 3, or up to k diffusion tensors for each voxel. We show that our procedure can more accurately describe the diffusion, particularly when crossing fibers or fiber-bundles are present in the datasets.
1 Introduction
Diffusion tensor Magnetic Resonance Imaging (DTI) has recently become an important tool in the analysis of the central nervous system. For quantitative analysis of possible white matter anomalies, parameters derived from the diffusion tensor can be associated with tissue microstructure and compared between health and disease. Diseases in which white matter abnormality is known or hypothesized (e.g. multiple sclerosis, Alzheimer's disease and schizophrenia) have successfully been investigated using a single diffusion tensor model [1,2]. Upon inspection of histology, however, much of the white matter in primates has been shown to be composed of multiple interdigitating fibers. On the spatial scale of MRI, many voxels contain more than one fiber direction. The estimated diffusion tensor will then describe an average of the actual diffusion. Depending on the degree of intra-voxel heterogeneity, the single tensor fit may lead to inaccuracies in the diffusion tensor and derived quantities, such as fractional anisotropy (FA), linear, planar and spherical measures [3], and fiber trajectories. In this work we have investigated multi-tensor estimation, in which we approximate the diffusion per voxel with more than one diffusion tensor. In particular, our approach allows the estimation of a user-specified number k of tensors per voxel. We will show that our method for k-tensor estimation can better describe the actual diffusion in both real and synthetic datasets.

1.1 Related Work
In a related work, Tuch [4] is able to make 2-tensor estimates by solving the multi-tensor model presented in Section 2.1. In order to solve this set of equations,
however, he assumes prior knowledge of the eigenvalues of the diffusion tensors about to be estimated. Jansons et al. [5] solve the same problem in the Fourier domain using a maximum-entropy parameterization, although their numerical approximations are sensitive to local minima and are in general not guaranteed to converge to a global optimum. Peled et al. [6] use the coordinate frame of a single tensor fit to estimate two tensors residing in the resulting plane. It is not reported, however, if this approach can be generalized to k larger than 2.
2 Background
In diffusion-weighted imaging, the image contrast is related to the local mobility of water molecules in living tissue, where the measurements can be made sensitive to water self-diffusion along n distinct spatial directions g₁, ..., g_n, specified by the image acquisition protocol, such that for each voxel we obtain a collection of n measurements S₁, ..., S_n. These measured values S_i are related to the 3 × 3 diffusion tensor X as described by the Stejskal-Tanner equations [3]

S_i = S₀ exp(−b g_i^T X g_i),    (1)
i
where
log(S0 ) − log(Si ) b These di values are in the literature commonly referred to as the apparent diffusion coefficients (ADC), and we will let the term ADC profile denote the surface spanned by the di gi . One common way to visualize a diffusion tensor is by the letting the eigenvectors define the orientation of an ellipsoid, with per-axis scaling determined by the corresponding eigenvalues. Figure 1 shows an estimated diffusion tensor and the corresponding ADC profile for a synthetic example. We observe that the orientation of the ADC profile seems to correspond well with the orientation of the diffusion tensor in that they both have their longest extent along the x-axis. di =
2.1
Multi-Tensor Model
Equation (1) describes the relationship between one diffusion tensor X and the measured MRI signals Si . However, sometimes the measured S-values are not
270
Ø. Bergmann et al.
z
0.4 0.2 z
0.2 0.1 0 −0.1 −0.2
0 −0.2
0.5 −0.4 0.5
0 0.4
0.2 −0.2
0
0.2
−0.5
0
x
y y
(a) The diffusion tensor
0 −0.2 −0.4
−0.5 x
(b) The corresponding ADC profile
Fig. 1. A synthetic diffusion tensor with principal diffusion along the x-axis and with eigenvalues λ = {1, 13 , 13 }, with b = 700, S0 = 1, using n = 642 uniformly distributed gradient directions
well modeled by a single diffusion tensor, due to a more heterogeneous distribution of fiber orientations, as arises with crossing fibers. To cope with this situation, Tuch [4] has modeled the relationship between the k tensors X_j and the measured S_i values as

S_i = S₀ Σ_{j=1}^k f_j exp(−b g_i^T X_j g_i),    (3)
where the f_j are non-negative weights which sum to 1, for j = 1, ..., k and i = 1, ..., n. Note that this model reduces to that of (1) when k = 1. Fig. 2 shows a synthetic example with k = 2. S-values calculated according to (3) produce the ADC profile in Fig. 2(b). Note the apparent 45° shift of the main axes of the ADC profile relative to the orientation of the original tensors. As the angle between the two synthetic tensors varies, the relative size and angle between the two main extent directions of the ADC profile changes. This nonintuitive behavior motivates a different diffusion description, the Q-ball.
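For experimentation, the forward model of Eq. 3 is easy to simulate; the sketch below synthesizes signals for arbitrary k, e.g. the two-tensor configuration of Fig. 2 (the function name and argument layout are assumptions).

```python
import numpy as np

def multi_tensor_signal(S0, b, g, tensors, f):
    """Synthesize DWI signals from the k-tensor model of Eq. 3.
    g: n x 3 gradient directions; tensors: list of k 3x3 SPD matrices;
    f: k non-negative weights summing to 1."""
    S = np.zeros(len(g))
    for fj, Xj in zip(f, tensors):
        # g_i^T X_j g_i for every direction i at once
        S += fj * np.exp(-b * np.einsum('ni,ij,nj->n', g, Xj, g))
    return S0 * S
```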
2.2 Approximating the Q-ball
The basic idea of Q-ball imaging [4,7] is to transform the measured diffusion signal to an orientation distribution function directly on the sphere. Considering each point on the sphere as a pole, the Q-ball transform assigns the value at the pole to be the integral over the associated equator. In our implementation we approximate the Q-ball by taking a given g_i as a plane normal, and let the associated Q-ball value q_i be a weighted sum of S_j values where the weights associated with S_j are inversely proportional to the distance from g_j to the plane. Specifically,

q_i = (1/S₀) Σ_{j=1}^n S_j cos((π/2) g_i^T g_j)^p,    (4)
where p is a constant; in the current implementation p = 5. We define the Q-ball profile as the surface spanned by q_i g_i, shown in Fig. 2(c). Observe that the Q-ball profile extrema now correspond better with the extrema of the two original tensors.

Fig. 2. k = 2 synthetic diffusion tensors with principal diffusion along the x-axis and y-axis, respectively, and both with eigenvalues λ = {1, 1/3, 1/3}: (a) two diffusion tensors, (b) ADC profile, (c) Q-ball profile. We let f = {1/2, 1/2} and selected the other parameters as in Fig. 1.
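Eq. 4 amounts to one matrix-vector product once the pairwise direction dot products are tabulated, as in the sketch below; the vectorized form is an implementation choice, not taken from the paper.

```python
import numpy as np

def qball_profile(S, S0, g, p=5):
    """Approximate Q-ball values of Eq. 4. For each direction g_i (an
    equator-plane normal), sum the signals S_j weighted by proximity of
    g_j to that plane: the weight cos(pi/2 * g_i . g_j)^p is largest when
    g_j lies on the equator of g_i and vanishes when they are parallel."""
    dots = np.clip(g @ g.T, -1.0, 1.0)      # g_i^T g_j for all pairs
    W = np.cos(0.5 * np.pi * dots) ** p
    return (W @ S) / S0
```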
3 Methods
We estimate k diffusion tensors by segmenting the S-values. To simplify our notation, we identify each of the values S_i simply by its index i. We segment these n values into k disjoint sets Z_l for l ∈ {1, 2, ..., k}. A diffusion tensor X_l is then estimated from each set Z_l, using (2) with i ∈ Z_l. Note that when k = 1, all samples are segmented into the same set Z₁ = {1, 2, ..., n}, and the approach reduces to the standard single tensor estimation (Section 2).

3.1 Segmentation
The Q-ball profile is used to segment the samples. Each triangle vertex in Fig. 2(c) corresponds to one q_i g_i, calculated from the sample i. Thus segmentation of the Q-ball vertices corresponds to a segmentation of the original samples. The vertices of the Q-ball can naturally be segmented into k = 2 sets in Fig. 2(c): one belonging to each of the two principal (directions, extents, or simply) axes of the Q-ball. In the following we define an axis j to be a line going through the origin and a unique point p_j taken from a uniformly distributed set of points on the unit sphere. However, since we do not know in advance what the principal axes are, we make the following simplification: we will only consider m discretely sampled axes in our approach, and attempt to choose the k which match the vertices best. We will then attempt to pair each of the vertex points (identified by the index i) with exactly one of the k axes. Assigning a vertex i to axis j will contribute an error (or cost) c_ij ≥ 0, and we define this error to be the shortest distance from vertex i to axis j. Orthogonality conditions give

c_ij = ‖p_j × (−q_i g_i)‖₂,
where × denotes the vector cross-product. The assignment of vertices to axes should minimize the total cost of the assignments. We can now formulate the problem of assigning vertices to axes as a binary integer program (BIP). In order to do so we first define our variables. We let

x_ij = { 1, iff vertex i is assigned to axis j; 0, otherwise }

and also define

y_j = { 1, iff any vertex has been assigned to axis j; 0, otherwise }.

Having done so we write the following minimization problem:

min_{x_ij} Σ_i Σ_j c_ij x_ij
s.t.  Σ_j x_ij = 1    (5)
      x_ij ≤ y_j    (6)
      Σ_j y_j = k    (7)
      x_ij ∈ {0, 1},  y_j ∈ {0, 1}

where i = 1, 2, ..., n and j = 1, 2, ..., m. Here (5) ensures that each vertex is assigned to exactly one axis. Equation (6) ensures that if any vertex is assigned to axis j, then axis j is labeled as "in use". Finally, (7) ensures that exactly k out of the m axes are in use. One important observation learned from our BIP formulation is that once the choice of which axes to use has been made (that is, the variables y_j have been chosen), the remaining variables can be selected in a straightforward manner: let J denote the set {j : y_j = 1}. Then it is optimal to let

x_ij = { 1, iff c_ij is the smallest value in the set {c_ip : p ∈ J}; 0, otherwise }

for all i when j ∈ J. For all other i and j, x_ij = 0. When k is small, this observation suggests that we can solve the BIP by exhaustion: enumerate all the possible choices of y, select x as suggested above, and save the solution which has the minimum objective-function value.
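The exhaustion strategy just described can be sketched directly; given the n × m cost matrix c, each candidate axis subset is scored by letting every vertex pick its cheapest chosen axis (a minimal sketch, with hypothetical names).

```python
import numpy as np
from itertools import combinations

def solve_bip_exhaustive(c, k):
    """Solve the axis-assignment BIP by exhaustion: for every choice of k
    of the m candidate axes (the y variables), assign each vertex to its
    cheapest chosen axis (the optimal x given y) and keep the subset with
    minimum total cost. c is the n x m matrix of distances c_ij."""
    n, m = c.shape
    best_cost, best_J = np.inf, None
    for Jc in combinations(range(m), k):
        J = list(Jc)
        cost = c[:, J].min(axis=1).sum()   # optimal x given y, per the text
        if cost < best_cost:
            best_cost, best_J = cost, J
    labels = c[:, best_J].argmin(axis=1)   # index into best_J for each vertex
    return best_J, labels, best_cost
```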
4 Results
Fig. 3. The leftmost column shows k = 2 synthetic diffusion tensors with principal diffusion directions 30° and 45° apart. The middle column shows the Q-ball profile and the estimated principal axes. The last column shows the estimated diffusion tensors.

Fig. 4. Two-tensor approximation in each voxel of a real DTI dataset. The upper right hand panel shows the position of the regions (a)-(c) visualized in the upper left, lower left and lower right panels, respectively.

Figure 3 shows results for a synthetic test case which varies the angle between the originating principal diffusion directions. The estimated diffusion tensors
correspond well with the original ones. Specifically, the sum of error norms Σ_{l=1}^k ‖X_l^original − X_l^estimated‖_F for the 30° and 45° cases is 8.24 × 10⁻² and 5.45 × 10⁻², respectively, letting other parameters be as in Fig. 2. Using the same setup, our algorithm was unsuccessful in computing two tensors when the angle between principal diffusion directions became smaller than 30°. One of the estimated principal axes became orthogonal to the xy-plane, resulting in a bad segmentation. However, as the angle between the two principal diffusion directions decreases, the single tensor fit in fact becomes more appropriate, so we currently have not made attempts to alleviate this problem. Figure 4 shows an axial slice from a healthy human volunteer DTI scan¹. Two tensors were approximated for each voxel and their estimates were visualized as described in [8]. The background images are direction-encoded FA maps where red indicates right-left, green indicates anterior-posterior, and blue indicates a superior-inferior primary diffusion direction of the single tensor fit. We observe spatially consistent orientations of two diffusion directions that are plausible given the known predominant fiber directions in these regions.
5 Discussion
In this work we have presented a novel way to estimate several diffusion tensors per voxel. To implement this approach no extra data needs to be acquired other than what is routinely available from clinical scans. This fact enables larger clinical studies to be performed with already existing DTI datasets, although we have not attempted to do so thus far. We also point out that since we rely on Q-ball imaging, we expect the procedure to give better results when many gradient diffusion directions are used in the acquisitions. Our approach differs from those outlined in [4,6] in that we do not explicitly try to solve (3). Therefore we do not get the weighting parameters f_j of that model either; however, by substituting our estimated tensors into (3), these values can be estimated by solving the resulting constrained linear problem. Deciding how many tensors one should try to fit to the data is in itself a difficult problem. As [4,6] point out, there is some evidence to suggest that 2 tensors might be appropriate when an estimated single tensor is planar in form. One measure of this oblateness is the difference between the second and the third largest eigenvalues of the tensor, i.e. λ₂ − λ₃, being large. Ultimately, the choice of the parameter k depends on what is being studied, and we have not considered how to choose it as a part of this work. Other segmentation strategies based on the Q-ball profile are possible, such as k-means clustering. This is potentially faster, but k-means clustering does not in general guarantee optimality of the final segmentation, given its dependence on an initial (often random) configuration. Since our current focus has been to ensure a good segmentation rather than a fast one, we have not considered k-means clustering any further, but hope to do so in the future.

¹ DTI data was estimated from 30 DWIs at b = 700 s/mm² and 5 non-DWI T2s, from a 1.5 T Philips scanner, with resolution 0.94 × 0.94 × 2.5 mm.
Acknowledgments
This work was supported by NIH NIBIB T32-EB002177 and NIH NCRR P41-RR13218 (NAC). We also acknowledge support from the Norwegian Research Council (ISBILAT). DWI data courtesy of Dr. Susumu Mori, Johns Hopkins University, supported by NIH R01-AG20012-01 and P41-RR15241-01A1.
References
1. Ramnani, N., Behrens, T., Penny, W., Matthews, P.: New approaches for exploring anatomical and functional connectivity in human brain. Biological Psychiatry 56(9) (2004) 613–619
2. Sundgren, P., Dong, Q., Gomez-Hassan, D., Mukherji, S., Maly, P., Welsh, R.: Diffusion tensor imaging of the brain: Review of clinical applications. Neuroradiology 46(5) (2004) 339–350
3. Westin, C.F., Maier, S., Mamata, H., Nabavi, A., Jolesz, F.A., Kikinis, R.: Processing and visualization for diffusion tensor MRI. Medical Image Analysis 6 (2002) 93–108
4. Tuch, D.S.: Diffusion MRI of Complex Tissue Structure. PhD thesis, Massachusetts Institute of Technology (2002)
5. Jansons, K., Alexander, D.: Persistent angular structure: New insights from diffusion magnetic resonance imaging data. Inverse Problems 19 (2003) 1031–1046
6. Peled, S., Westin, C.F.: Geometric extraction of two crossing tracts in DWI. In: Proceedings of 13th ISMRM (2005)
7. Tuch, D.S.: Q-ball imaging. Magnetic Resonance in Medicine 52 (2004) 1358–1371
8. Kindlmann, G.: Superquadric tensor glyphs. In: Joint Eurographics - IEEE TCVG Symposium on Visualization, The Eurographics Association (2004)
Improved Map-Slice-to-Volume Motion Correction with B0 Inhomogeneity Correction: Validation of Activation Detection Algorithms Using ROC Curve Analyses

Desmond T.B. Yeo¹,², Roshni R. Bhagalia¹,², and Boklye Kim¹

¹ Department of Radiology, University of Michigan Medical School, MI 48109, USA
{tbyeo, rbhagali, boklyek}@umich.edu
² Department of Electrical Engineering and Computer Science, University of Michigan, MI 48109, USA
Abstract. Head motion is a significant source of error in fMRI activation detection and a common approach is to apply 3D volumetric rigid body motion correction techniques. However, in 2D multislice fMRI, each slice may have a distinct set of motion parameters due to inter-slice motion. Here, we apply an automated mutual information based slice-to-volume rigid body registration technique on time series data synthesized from a T2 MRI brain dataset with simulated motion, functional activation, noise and geometric distortion. The map-slice-to-volume (MSV) technique was previously applied to patient data without ground truths for motion and activation regions. In this study, the activation images and area under the receiver operating characteristic curves for various time series datasets indicate that the MSV registration improves the activation detection capability when compared to results obtained from Statistical Parametric Mapping (SPM). The effect of temporal median filtering of motion parameters on activation detection performance was also investigated.
1 Introduction

Head motion is a significant source of error in fMRI activation detection, particularly in regions with strong intensity edges. In fMRI, activation detection techniques typically rely on test statistics that quantify voxel intensity differences between activated and rest states. Since the intensity change due to the BOLD effect is only about 1% to 4%, any motion-induced intensity change can severely degrade the activation detection performance. This limitation may yield misleading functional mapping and preclude patients with severe motor control pathologies from participating in fMRI experiments. Several retrospective motion correction techniques have been applied to multislice fMRI data [1,2] and most of them involve 2D to 2D, 3D volumetric or slice-stack registration where 3D inter-slice motion is ignored. In the map-slice-to-volume (MSV) approach [3], each single-shot EPI slice in the time series is spatially repositioned into an anatomical reference space. In previous work [3], it was shown that the accuracy of registration parameters attained by MSV on noiseless simulated T2 International Consortium of Brain Mapping (ICBM) data without geometric distortion is in the range of -0.10±0.07° and -0.07±0.03 mm (mean ± standard error of the mean). In addition, test results of human subjects performing the unilateral motor task
suggest improved localization of functional activation compared to the slice-stack registration approach. However, the activation detection performance could not be quantified since there was no known ground truth in the clinical data used. In this work, we seek to evaluate the accuracy of MSV in activation detection of noiseless and noisy time series images with simulated motion and activation regions. The results are compared with Statistical Parametric Mapping (SPM) activation results from the same datasets. We also investigate the effects of temporal filtering of motion parameter estimates on activation detection performance in the presence and absence of field inhomogeneity induced geometric distortion. For comprehensive motion correction and activation detection, several synthetic T2 time series datasets with simulated motion, functional activation regions, noise and geometric distortion were simulated with known ground truths for the activated regions and the six degree-of-freedom (DOF) rigid body motion parameters for each slice. Statistical activation maps of the datasets with and without MSV motion correction and temporal filtering are computed and used to generate receiver operating characteristic (ROC) curves. The area under each ROC curve quantifies the activation detection accuracy.
2 Methods

2.1 Map Slice-to-Volume Motion Correction

The MSV motion correction technique models the 3D motion of multislice EPI data by allowing each slice to have its own six DOF rigid body motion [3]. Each slice f_l is registered with a 3D anatomical reference volume g_ref using the rigid body transform T_αl, where α_l denotes a vector of six MSV rigid body motion parameters t_x, t_y, t_z, θ_x, θ_y, θ_z for slice l. To estimate α_l, the negated mutual information (MI) cost function

Ψ₁(α_l) = −MI(g_ref, f_l(T_αl(r))),    (1)
which measures the dissimilarity between f_l and g_ref, is minimized using the Nelder-Mead downhill simplex optimization algorithm. Registration using an MI-based cost function is especially appropriate for multi-modality datasets, i.e. T2*-weighted EPI slices registered with a T1 anatomical reference volume. After registration, each set of optimized motion parameters is used to transform and interpolate (trilinear) its respective slice f_l into a volume with the same spatial coordinates as the reference volume. The motion of each slice is thus computed independently of the other slices. Intra-slice motion is ignored since it is typically negligible for single-shot acquisitions. Given that head motion is typically correlated in time and that MSV may generate outlier estimates due to local minima in the cost function, it may be helpful to perform temporal filtering on the raw MSV motion parameter estimates prior to reconstructing the slices into the reference volume. As an example, Fig. 1 shows raw and median filtered MSV estimates of a rotation parameter θ_x from a T2 time series dataset using a filter window of nine samples. The ground truth is 0° for all slices.
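A histogram-based estimate of the negated MI cost of Eq. 1 might look as follows; the bin count and the use of a joint intensity histogram are implementation assumptions, and in practice this function would be wrapped in a simplex search over the six rigid-body parameters (e.g. scipy.optimize.minimize with method='Nelder-Mead').

```python
import numpy as np

def negated_mi(ref_samples, slice_samples, bins=32):
    """Negated mutual information between intensities of the transformed
    slice and the reference-volume values it overlaps, estimated from a
    joint histogram (binning is an assumed implementation detail)."""
    pxy, _, _ = np.histogram2d(ref_samples, slice_samples, bins=bins)
    pxy /= pxy.sum()                       # joint probability estimate
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0                           # avoid log(0) terms
    mi = np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz]))
    return -mi                             # minimized during registration
```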
Fig. 1. Recovered rotation parameter θx from raw MSV estimates (RMSE: 4.00°) and median filtered MSV estimates (RMSE: 0.04°) from a dataset with no motion in θx
2.2 Activation Detection with Random Permutation Test (RPT) After reconstructing all the slices into volumes, MSV yields time series volumes that may have empty voxels. This results in variable temporal sample sizes for different voxels for statistical analysis. The non-parametric statistical method of voxel-wise random permutation, using the averaged difference between activation and rest images as the test statistic, was used for significance testing of differences in voxel intensities in the simulated datasets [3]. This statistical technique is simple, robust and independent of sample size variability [6]. Random draws of 2000 permutations of activated and rest periods were used to form a permutation distribution for each voxel from which activated regions are identified by testing the null hypothesis of no activation at a fixed threshold of P = 0.001. To obtain ROC curves, we vary the threshold P values from 10-4 to 1.0 to obtain a set of activation maps and, together with the ground truth activation map, compute the true positive and false alarm rates. The area under each ROC curve measures how accurately the activation regions have been detected. 2.3 Iterative Field-Corrected Image Reconstruction To perform EPI geometric distortion correction, we use an iterative field-corrected reconstruction method that, unlike many non-iterative image reconstruction methods, does not assume a smooth field-map [4]. The continuous object f l and field-map Δω are parameterized into a sum of weighted rect functions φ(r-rn) where r is the vector of spatial coordinates. Ignoring spin relaxation and assuming uniform receiver coil sensitivity, the matrix-vector form of the parameterized MR signal equation for slice l in an EPI time series can be written as follows:
u l = Al f l + ε l
(2)
with elements of the system-object matrix Al expressed as
aml , n = Φ (k (tm ))e − jΔω n t m e − j 2π ( k (t m )⋅ rn ) l
(3)
Improved MSV Motion Correction with B0 Inhomogeneity Correction
279
where εl denotes white Gaussian noise, Φ(k(tm)) denotes the Fourier transform of φ(r-rn), ∆ωnl is field-inhomogeneity value at rn. To estimate the unknown slice f l from the k-space data, the iterative conjugate gradient algorithm is used to minimize
Ψ2 ( f l ) = u l − A l f l
2
+ β Cf l
2
(4)
where C is the first-order difference matrix and β is a parameter that controls the tradeoff between obtaining a data-consistent estimate and a smoothed estimate. In order to achieve accurate field-corrected reconstruction of the time series images with head motion, every slice of observed data ul should be paired with a dynamic field map slice ∆ωl that tracks the motion-induced field-inhomogeneity changes during the fMRI experiment. However, since only a static field-map, ∆ωstatic, is typically available, we use re-sampled slices of ∆ωstatic in place of ∆ωl to perform the geometric distortion correction. This does not take into account field map changes in the presence of head motion during the fMRI experiment. 2.4 Motion, Activation and Geometric Distortion Data Simulation To simulate the datasets, two perfectly registered T1- and T2-weighted image datasets (matrix size: 256×256×124, voxel size: 0.8mm×0.8mm×1.5mm) derived from ICBM data were used. The T1 volume is used as the anatomical reference for MSV registration and the T2 volume forms the “baseline” volume from which the time series datasets are simulated. To simulate functional activation, an “activated” T2 volume was created by increasing the T2 ICBM dataset intensity by 5% in selected ground truth ellipsoidal regions as shown in Fig. 4(c). In addition, a simulated brain static field map was created by adding three 3D Gaussian blobs located at the inferior frontal and inferior temporal lobes to a 3D third order polynomial. The field inhomogeneity map was scaled to have a maximum value of 5ppm at 1.5T. Six baseline-activation cycles, each formed by concatenating ten baseline volumes and ten activated T2 volumes, were assembled to form a 120-volume time series. Temporally continuous motion in three rotation parameters, θx, θy and θz, was applied to the T2 time series as well as the simulated field map volumes and sequential 5.6mm thick slices were then re-sampled to form 14-slice volumes (matrix size: 256×256×14). Each re-sampled slice has its own set of motion parameters. The simulated motion has maximum values of 5.0°, 8.6° and 8.1° for θx, θy and θz respectively. Four such time series datasets with motion were simulated with slice acquisition interleaving. Different combinations of simulated noise, field-inhomogeneity induced geometric distortion and distortion correction were then applied to these datasets. To forward distort the T2 images, simulated Cartesian blipped EPI k-space data of the distorted images was generated with the field map time series with motion using Eq. 2. The distorted images were then reconstructed from this k-space data using a system-object matrix with a field map set to zero [5]. The simulated readout time was 43.8ms and the pixel bandwidth in the phase encode direction was 22.8Hz. The final four simulated time series datasets are denoted by D1 for T2 without noise, D2 for T2 with additive 5% Rayleigh noise (σ=2.83), D3 for T2 without noise but with geometric distortion with a simulated field map (5ppm) with motion, and D4 for dataset D3 after iterative field-corrected image reconstruction with a static field map. Fig. 2
280
D.T.B. Yeo, R.R. Bhagalia, and B. Kim
summarizes how the datasets were generated. It is noted that dataset D3 was simulated with the assumption that the field map moves with a rigid body transformation function with the head and does not model local field map changes with out-of-plane motion.
Add Rayleigh noise (σ=2.83) T2 ICBM (256×256×14)
Form six 20-volume ON/OFF cycles to obtain 120-volume time series (D1)
Simulate geometric distortion (D3)
Add slice-wise simulated motion T2 with simulated activation (256×256×14) T2 with motion Field map with motion
Form six 20-volume ON/OFF cycles to obtain 120-volume time series (D2) Simulated static field map (256×256×14)
Add slice-wise simulated motion
Iterative field-corrected image reconstruction with static field map (D4)
Fig. 2. Flowchart of simulated time series datasets D1, D2, D3 and D4
3 Experiments and Results In the first set of experiments, the objective is to quantify the improvement in activation detection with MSV corrected data compared to SPM generated results. Thus, MSV motion correction was applied to datasets D1 and D2 and a random permutation test with 2000 random permutations of activated and rest states was used to generate activation maps for various P values from which ROC curves were computed. ROC curves were also computed from t-statistic maps generated by SPM on the same datasets. In the second set of experiments, the objective is to investigate if temporal median filtering improves activation detection for noiseless datasets D1, D3 and D4. Thus, each dataset is corrected with raw MSV and median filtered MSV motion parameters to yield two separate time series datasets. Activation detection is then performed on these two datasets using the random permutation test. 3.1 Activation Detection Performance with MSV-RPT and SPM Table 1 summarizes the area under the ROC curve (AUC) values for both sets of experiments. For datasets D1 and D2, it is observed that the AUC values of raw MSV corrected data (first column) are higher than those from SPM corrected data (third
Improved MSV Motion Correction with B0 Inhomogeneity Correction
281
column). The increase is 0.041 and 0.070 for noiseless and noisy T2 data respectively. This improvement is reflected in the ROC plots in Fig. 3(a). Table 1. Area under ROC curves for time series data processed with various methods
Dataset (D1) Noiseless T2 (D2) Noisy T2 (D3) Distorted T2 (D4) Distortion corrected T2
Area Under ROC Curves (AUC) Raw MSV MSV with median filtering 0.967 0.967 0.958 0.955 0.909 0.915 0.930 0.930
SPM 0.926 0.888 -
Fig. 4 shows several slices of the ground truth activation together with detected activation regions for MSV corrected data and SPM processed data. It is observed that MSV corrected data yielded a significantly larger number of true positives. 1
1
0.9
0.9 raw MSV (D1 - T2 without noise) SPM (D1 - T2 without noise)
0.8
0.8
raw MSV (D1 - T2 without noise)
raw MSV (D2 - T2 with Rayleigh noise)
median filtered MSV (D1 - T2 without noise)
SPM (D2 - T2 with Rayleigh noise)
0.7
0.7
raw MSV (D3 - distorted T2)
True positive
True Positives
median filtered MSV (D3 - distorted T2) 0.6
0.5
0.4
median filtered MSV (D4 - corrected distortion) 0.5
0.4
0.3
0.3
0.2
0.2
0.1
0.1
0 0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
raw MSV (D4 - corrected distortion)
0.6
0 0
0.1
0.2
0.3
0.4
0.5
0.6
False Alarm
False alarm
(a)
(b)
0.7
0.8
0.9
1
Fig. 3. ROC curves for time series datasets (a) D1 and D2 to compare MSV and SPM, (b) D1, D3 and D4 to observe effects of temporal median filtering on ROC curves
3.2 Temporal Filtering of MSV Motion Parameters

Table 2 shows that the RMSE values for the various datasets decrease significantly with median filtering. However, Table 1 shows that the AUC values of datasets corrected with median filtered MSV estimates are not much different from those of datasets corrected with raw MSV estimates, with the exception of the geometrically distorted dataset D3. Fig. 3(b) shows the respective three pairs of ROC curves.
Fig. 4. Detected activation regions for (a) noiseless T2 with motion (D1), (b) T2 with motion and Rayleigh noise (D2). Rows show activation maps of the respective T2 time series datasets processed with SPM (top) and MSV (bottom) with the random permutation test. (c) Ground truth simulated activation regions of the two sample slices.

Table 2. RMSE of motion parameter estimates with and without median filtering
                            RMSE (mm and °)
          MSV without median filtering      MSV with median filtering
Dataset   tx   ty   tz   θx   θy   θz       tx   ty   tz   θx   θy   θz
D1        1.2  1.0  0.7  2.2  2.4  2.0      1.2  1.0  0.7  0.2  0.2  0.1
D2        1.2  1.0  0.7  0.3  0.2  0.1      1.2  1.0  0.7  0.2  0.2  0.1
D3        2.0  7.6  1.9  2.9  1.6  1.0      1.4  7.8  1.0  1.0  0.5  0.4
D4        1.2  3.3  0.6  0.3  0.4  0.2      1.1  3.2  0.5  0.2  0.2  0.2
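For reference, each RMSE entry in Table 2 is the per-parameter root-mean-square error of the estimated motion against the simulated ground truth; a one-function sketch:

```python
import numpy as np

def motion_rmse(estimated, truth):
    """RMSE of each motion parameter over a time series.

    estimated, truth: (T, 6) arrays, translations in mm and rotations
    in degrees; returns a length-6 vector matching one row of Table 2.
    """
    return np.sqrt(np.mean((estimated - truth) ** 2, axis=0))
```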
4 Discussion and Conclusions

In this paper we evaluated the activation detection performance when a voxel-wise random permutation test is applied to MSV corrected datasets and compared it to the performance of SPM. We also investigated the effects of temporal filtering of MSV estimates for various types of datasets. Table 1 shows that the AUC values for MSV corrected data are higher than those from SPM corrected datasets, especially in the presence of image noise. It is highly likely that the improvement arises largely because the built-in motion correction algorithm in SPM does not model 3D inter-slice motion. This quantified observation provides additional evidence that activation detection accuracy can be improved by applying retrospective MSV motion correction to fMRI time series images prior to statistical testing. It was also observed that temporal median filtering does not change the AUC values significantly except in the distorted dataset D3 (Table 1). This could be due to the larger number of outlier estimates, as reflected in the higher variance and, in part, higher RMSE values of the motion parameter estimates in the presence of geometric distortion in dataset D3 (Table 2). In the other datasets without geometric distortion, outlier motion estimates occur much less frequently and thus median filtering may
not have much effect on activation detection, since the intensity errors due to reconstructed slices with outlier motion parameter estimates tend to be averaged out when computing the mean difference test statistic in the random permutation test. The results suggest that while temporal filtering may yield estimates that are closer to the physical head motion, it may improve activation detection only in slices with significant field inhomogeneity related distortions, e.g., the temporal cortex language area. Future work will investigate specific conditions in which temporal filtering of MSV estimates will benefit activation detection. A comparison of SPM AUC values for D3 and D4 will also be done to evaluate MSV for geometrically distorted datasets.
Acknowledgements

This work was supported in part by National Institutes of Health grants 1P01 CA87634 and 8R01 EB00309. We are grateful to Dr. Jeffrey A. Fessler for his iterative field-corrected image reconstruction software and to the anonymous reviewers for their constructive comments.
References

1. Friston, K.J., Ashburner, J., Frith, C.D., Poline, J.B., Heather, J.D., Frackowiak, R.S.J.: Spatial registration and normalization of images. Hum. Brain Map. 2 (1995) 165-189
2. Jiang, A.P., Kennedy, D.N., Baker, J.R., Weisskoff, R., Tootell, R.B.H., Woods, R.P., Benson, R.R., Kwong, K.K., Brady, T.J., Rosen, B.R., Belliveau, J.W.: Motion detection and correction in functional MR imaging. Hum. Brain Map. 3 (1995) 224-235
3. Kim, B., Boes, J.L., Bland, P.H., Chenevert, T.L., Meyer, C.R.: Motion correction in fMRI via registration of individual slices into an anatomical volume. Magn. Reson. Med. 41 (1999) 964-972
4. Sutton, B.P., Noll, D.C., Fessler, J.A.: Fast, iterative image reconstruction for MRI in the presence of field inhomogeneities. IEEE Trans. Med. Imag. 22 (2003) 178-188
5. Yeo, D.T.B., Fessler, J.A., Kim, B.: Concurrent geometric distortion correction in mapping slice-to-volume (MSV) motion correction of fMRI time series. In: Barillot, C., Haynor, D., Hellier, P. (eds.): Proc. of MICCAI'04. Lecture Notes in Computer Science, Vol. 3217. Springer-Verlag, Berlin Heidelberg New York (2004) 752-760
6. Good, P.: Permutation Tests. Springer-Verlag, Berlin Heidelberg New York (1994)
Hippocampus-Specific fMRI Group Activation Analysis with Continuous M-Reps

Paul A. Yushkevich1, John A. Detre2, Kathy Z. Tang2, Angela Hoang2, Dawn Mechanic-Hamilton2, María A. Fernández-Seara2, Marc Korczykowski2, Hui Zhang1, and James C. Gee1

1 Penn Image Computing and Science Laboratory, University of Pennsylvania
2 Center for Functional Neuroimaging, University of Pennsylvania
Abstract. A new approach to group activation analysis in fMRI studies that test hypotheses focused on specific brain structures is presented and used to analyze hippocampal activation in a visual scene encoding study. The approach leverages the cm-rep method [10] to normalize hippocampal anatomy and project intra-subject hippocampal activation maps into a common reference space, eliminating normalization errors inherent in whole-brain approaches and guaranteeing that peaks detected in the random effects activation map are indeed associated with the hippocampus. When applied to real fMRI data, the method detects more significant hippocampal activation than the established whole-brain method.
1 Introduction
Functional MRI has revolutionized many fields of science, with numerous studies published each year. Although fMRI studies address a variety of hypotheses and use a variety of experimental designs, most employ a fairly rigid image processing and statistical analysis pipeline, defined by the Statistical Parametric Mapping (SPM) paradigm [4]. In SPM, group activation analysis involves normalizing each subject's brain anatomy to a whole-brain template, performing random effects (RFX) analysis in the space of this template, and assigning anatomical labels to supra-threshold clusters and peaks in the resulting statistical map. This paper proposes an alternative analysis pipeline aimed at studies that test hypotheses focused on specific brain structures. In particular, we focus on analyzing activation in the hippocampus, a frequently studied structure responsible for episodic memory encoding. Using the cm-rep method [10], we fit a common shape-based coordinate system to the hippocampus in a manner that is consistent across subjects. This system allows us to project each subject's hippocampal activation values into a common reference frame. By performing RFX analysis in this frame, we eliminate normalization errors stemming from whole-brain registration and ensure that clusters and peaks in the RFX map can be labeled as hippocampal activation with high degree of certainty, as opposed to SPM clusters and peaks, whose localization accuracy depends on the quality of the normalization to an anatomical atlas.
Table 1. Protocol implemented on the 3 Tesla Siemens Trio for the fMRI study

SCAN         Resolution (mm³)      TR (ms)   TE (ms)   FA (°)
T1 MP-RAGE   0.977 × 0.977 × 1     1620      3.87      15
EPI I        3.0 × 3.0 × 3.0       3000      30        90
EPI II       2.5 × 2.5 × 2.5       3000      30        90
Other alternatives to SPM have also been proposed, including, but not limited to, methods that study activation on the cortical surface [3, 8], and the hippocampus unrolling approach by Zeineh et al. [11], which is related to our approach, but differs from it significantly, since it uses MR images with highly anisotropic voxels and produces 2D hippocampal activation maps.
2 Materials and Methods

2.1 Subjects, Experimental Design and Data Acquisition
In a recent block design fMRI study conducted by the Center for Functional Neuroimaging, structural and functional brain scans were acquired from 30 healthy young adults at 3 Tesla. Table 1 summarizes the protocol, which includes a high-resolution T1 scan and two EPI scans with isotropic voxels 2.5 mm and 3 mm wide, acquired in randomized order in order to study the tradeoff between signal-to-noise ratio and partial volume effects. During each EPI scan, subjects were presented with novel complex visual scenes (photographs of people engaged in various activities, industrial landscape, etc.) and asked to remember them for a subsequent test. Scenes were presented in blocks of ten, with 3600 ms per stimulus and 400 ms inter-stimulus interval. To provide a baseline for analysis, subjects were shown scrambled scenes after each block, and asked to examine them as they would a normal picture. In the testing phase, subjects were shown another set of scenes, some previously seen and others novel, and asked to classify them as such; all subjects responded correctly more than 80% of the time.
2.2 Intra-subject Activation Analysis
SPM2 and VoxBo software were used to generate intra-subject statistical maps. fMRI sequences were motion-corrected and co-registered with T1 data (analysis was also repeated without such co-registration). fMRI data were spatially smoothed with 9 mm FWHM Gaussian kernels. Time series were smoothed temporally to remove frequencies above the task frequency and convolved with a canonical model of the hemodynamic response function. Box car task reference functions were also convolved with the hemodynamic response function and used as covariates in the general linear model, along with a global signal covariate, to produce statistical parametric maps. For each subject, maps of z-statistics corresponding to voxel-wise one-tail t-tests were computed.
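A minimal sketch of this step, reduced to an HRF-convolved box-car regressor plus an intercept in a general linear model, is shown below. The double-gamma HRF shape parameters are standard defaults, and the temporal smoothing and global signal covariate described above are omitted.

```python
import numpy as np
from scipy.stats import gamma

def canonical_hrf(tr, duration=32.0):
    """Double-gamma canonical HRF sampled at the scan TR (standard defaults)."""
    t = np.arange(0.0, duration, tr)
    return gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0

def task_zmap(ts, boxcar, tr):
    """Crude GLM fit: HRF-convolved box-car regressor plus an intercept.

    ts: (T, V) voxel time series; boxcar: (T,) 0/1 task reference function.
    Returns a t-like statistic per voxel for the task regressor.
    Assumes a full-rank design (otherwise lstsq returns empty residuals).
    """
    reg = np.convolve(boxcar, canonical_hrf(tr))[:len(boxcar)]
    X = np.column_stack([reg, np.ones_like(reg)])
    beta, rss, _, _ = np.linalg.lstsq(X, ts, rcond=None)
    sigma2 = rss / (ts.shape[0] - X.shape[1])
    c = np.array([1.0, 0.0])          # contrast selecting the task regressor
    var_c = c @ np.linalg.inv(X.T @ X) @ c
    return (c @ beta) / np.sqrt(sigma2 * var_c)
```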
2.3 Random Effects Analysis with Whole-Brain Normalization
RFX analysis is used to make inferences about a population on the basis of fMRI data from a group of subjects [6]. RFX statistical maps are computed by (1) normalizing each subject's brain to a template, (2) using the normalization parameters to warp intra-subject activation maps into the space of the template, and (3) performing a second-level t-test at each voxel in this space. Normalization was implemented, as in many studies, by warping each T1 image to the Montreal Neurological Institute template [2] using SPM2, which first estimates a 12-parameter affine transformation, followed by a non-linear registration based on the discrete cosine basis [1]. Limitations of this approach include suboptimal normalization of the hippocampus due to its relatively low contribution to the registration objective function, which is integrated over the whole brain, as well as the uncertainty associated with assigning anatomical labels to peaks in the RFX statistical map. The cm-rep approach aims to overcome these limitations.
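Step (3) reduces to a one-sample t-test across subjects at each voxel of the template space; a minimal sketch:

```python
import numpy as np
from scipy import stats

def rfx_tmap(subject_maps):
    """Second-level (random effects) one-sample t-test at each voxel.

    subject_maps: (S, V) array of per-subject statistics (e.g. z-maps)
    already warped into a common space. Returns (t, one-tailed p).
    """
    s = subject_maps.shape[0]
    mean = subject_maps.mean(0)
    sem = subject_maps.std(0, ddof=1) / np.sqrt(s)
    t = mean / sem
    p = stats.t.sf(t, df=s - 1)      # one-tailed upper-tail p-value
    return t, p
```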
2.4 Hippocampus-Centric Random Effects Analysis with CM-Reps
The cm-reps approach is the continuous analog of the m-rep method by Pizer et al. [7]. Anatomical structures are represented by fitting a deformable model to binary segmentations of the structure (theoretically, fitting directly to anatomical images is also possible, but not implemented in this paper). The deformable model has a special property: it describes a geometric object in terms of the relationship between its skeleton and its boundary, and it preserves the branching topology of the skeleton during deformation. Specifically, a cm-rep model defines the skeleton as a single manifold with boundary m along which a certain scalar field ρ is defined. Both m and ρ are defined parametrically as

    m(u) = Σ_{i=0}^{N} m_i f_i(u);    ρ(u) = Σ_{i=0}^{M} ρ_i f_i(u),
where {f_i} is some basis (e.g., b-splines, Fourier harmonics) and m_i ∈ R³ and ρ_i ∈ R are basis function coefficients that can be used to deform m and change ρ. The parameter vector u = (u_1, u_2) is defined on some regular domain Ω ⊂ R². From m and ρ, a radius function R is derived by solving the following partial differential equation:

    △_m R² = ρ    subject to    R(1 − ‖∇_m R‖²) = 0 on ∂Ω,                (1)

where △_m and ∇_m denote the Laplace-Beltrami and gradient operators on the manifold m. The boundary condition in (1) corresponds to the equality constraint that is satisfied by single-manifold skeletons of generic objects [9]. The PDE is well-behaved in practice, and solutions, as well as their gradients, can be found efficiently using the Finite Differences Method. Given the manifold m and having solved for the field R, we can apply inverse skeletonization to derive a parametric expression for the boundary of the object
Fig. 1. Three steps of constructing a cm-rep. a. Medial manifold m(u) with a color plot of the scalar field ρ(u). b. The radius scalar field R(u) computed by solving the Poisson PDE (1) on the medial manifold. c. Boundary surface b± (u) resulting from inverse skeletonization.
O whose skeleton is {m, R} (the skeleton is the set of centers and radii of all maximal inscribed balls in the object). In work that has not yet been published, we give a list of sufficient conditions for which we prove that the inverse skeletonization problem is well-posed; these include the boundary condition in (1) and some inequality constraints. When these constraints hold, the solution to inverse skeletonization is given by the following formula:

    b± = m + R U±,    where    U± = −∇_m R ± √(1 − ‖∇_m R‖²) N_m,          (2)
where N_m is the unit normal to m. Manifolds b+ and b− given by the above expression are two surface patches that lie on the opposite sides of m and together form the boundary of the object O. For every u ∈ Ω, the ball with center m(u) and radius R(u) is a maximal inscribed ball in O and is tangent to ∂O at the points b−(u) and b+(u). Vectors U± project from the center of the ball towards these points of tangency and have unit length; hence they form the unit normal vector field to ∂O. It is easy to verify that the boundary condition in (1) ensures that U+ and U− coincide for u ∈ ∂Ω and that patches b+ and b− together form a closed surface. An example cm-rep model is shown in Fig. 1, first as the inputs m and ρ, next as the field R obtained by solving (1), and last as the boundary surface computed by inverse skeletonization. We refer the reader to [9, 10] for the details of the deformable modeling approach used to fit cm-rep models to actual structures. It follows the Bayesian framework, with an image-driven likelihood term and prior terms that enforce inequality constraints that permit inverse skeletonization, impose geometric correspondence by minimizing distortion in the area element of m, and incorporate a shape prior. Since the deformable model is restricted to have a single-manifold skeleton, it matches real-world objects with some inherent level of error. In [9] we report high accuracy when fitting cm-rep models to hippocampi from an 82-subject schizophrenia study. In more recent unpublished results, the average overlap is 95.0% and the average boundary displacement error is 0.168 mm.

A key feature of cm-reps discussed in [10] is the ability to impose a shape-based coordinate system on the interiors of structures. Every point x inside the cm-rep boundary can be assigned a triple of values (u_1, u_2, ξ) that satisfy

    x(u, ξ) = m(u) + |ξ| R(u) U^{sgn(ξ)}(u)                                 (3)
Fig. 2. Selected sagittal slices through the RFX activation map computed using whole-brain normalization, superimposed over the average anatomical image. Please see also the corresponding color Quicktime movie: whole_brain_rfx.mov.
The assignment is unique, except in the case u ∈ ∂Ω, when x(u, ξ) = x(u, −ξ). This assignment can be used to map the interior of the object into a reference space Ω × [−1, 1]. This mapping has an important feature: the distance from x(u, ξ) to the cm-rep boundary is given by (1 − |ξ|)R(u), so line segments normal to the cm-rep boundary are mapped to vertical line segments in the reference space. Points on the cm-rep skeleton are mapped to the plane ξ = 0 in the reference space and points on the boundary are mapped to the planes ξ = ±1. This coordinate system was used to perform hippocampus-specific RFX analysis of activation data. The right hippocampus was outlined manually in T1 images of 18 subjects. A cm-rep template was fitted to the resulting binary segmentations. Intra-subject activation z-maps were sampled in the cm-rep shape-based coordinate system, i.e., transformed into the (u_1, u_2, ξ) space. RFX analysis was performed by computing a one-tail t-test on the z-statistics at each point in the coordinate system. Thus, the analysis was limited just to the hippocampus region. For visualization purposes, the t-statistic field resulting from this RFX analysis was projected back into the image space of one of the subjects.
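The forward mapping of Eq. (3), which relates reference-space coordinates (u, ξ) to image-space points, might be sketched as follows (the medial point, radius and spoke vectors are hypothetical samples taken from a fitted cm-rep model):

```python
import numpy as np

def shape_to_space(m_u, R_u, U_plus, U_minus, xi):
    """Evaluate Eq. (3): x(u, xi) = m(u) + |xi| R(u) U^{sgn(xi)}(u).

    m_u: (3,) medial point at parameter u; R_u: radius there;
    U_plus/U_minus: the two unit spoke vectors of Eq. (2).
    xi = 0 maps to the skeleton, |xi| = 1 to the boundary; the distance
    from the result to the cm-rep boundary is (1 - |xi|) R(u).
    """
    U = U_plus if xi >= 0 else U_minus
    return np.asarray(m_u) + abs(xi) * R_u * np.asarray(U)
```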
3 Results
Fig. 2 shows the RFX t-map obtained using whole-brain analysis. The map is superimposed over a ‘mean anatomy’ image, computed by averaging the subjects’ T1 images following SPM2 normalization. The fuzziness of this image stems
Fig. 3. Selected sagittal slices through the hippocampal RFX activation map computed in the cm-rep coordinate system, projected back into the anatomical space of one subject and superimposed over this subject's anatomical image. Please also see the corresponding color Quicktime movie: cmrep_rfx.mov.
from whole-brain normalization error, which is significant: the average pairwise overlap between subjects' hippocampus segmentations warped into the space of the template is 53%. The RFX map in Fig. 2 is thresholded at t = 5, and peaks near the right hippocampus are highlighted. The locations and labels of the peaks, looked up in the Talairach atlas, are listed in Table 2. Such labeling carries a degree of uncertainty because of normalization error and the low resolution of the atlas. For each peak in Table 2, we indicate the probability that it lies in the hippocampus, based on the number of hippocampus masks (warped into the template space) overlapping its location. The fact that these probabilities do not always agree with the anatomical labels underscores the uncertainty associated with anatomical localization of whole-brain RFX peaks. To address possible concerns about the use of rigid registration for EPI/T1 alignment, the above experiment was repeated without registration. In 3 mm³ fMRI data, the maximal t-value observed in the RFX map increased from 16.08 to 16.62 with registration, and the size of the supra-threshold region t ≥ 6.0 increased from 102 cm³ to 112 cm³. The results for 2.5 mm³ data are analogous. The results of cm-rep-based RFX analysis for 3 mm³ fMRI are shown in Fig. 3. The underlying anatomical image is a T1 scan of one of the subjects. The RFX t-map, computed in the cm-rep reference space, was projected back into the subject's anatomical space for visualization. The peaks of the t-map shown here have larger t-values than the peaks in and around the hippocampus shown in Fig. 2. The locations of these peaks within the hippocampus are given in Table 2. Notably, the maximum t-values observed in the hippocampus with cm-rep analysis exceed those detected around the hippocampus with whole-brain analysis. The latter, however, may or may not indicate hippocampal activation, due to the errors associated with atlas-based anatomical labeling. The picture is similar for 2.5 mm³ data: cm-rep normalization yields larger hippocampal activation values than SPM, with the former generating peaks t = 7.83, 7.43, 7.18, 6.22 in the right hippocampus and the latter reporting peak t = 6.90 in the right hippocampus and t = 7.48, 7.47, 7.37 in the right parahippocampal gyrus.
Table 2. Peak locations in the RFX t-map computed for 3 mm³ fMRI data using whole-brain and cm-rep based normalization. RC = right cerebrum; GM = gray matter; WM = white matter; PHG = parahippocampal gyrus. The column p(H) gives the fraction of hippocampus masks warped to the space of the template which overlap the given peak location.

Peaks in the Whole-Brain RFX t-Map
Tal. Coord.    Talairach Labels                                  t-Stat   p(H)
16, -31, -2    RC, limbic lobe, sub-gyral, *, *                  7.69     0
20, -29, 8     RC, sub-lobar, thalamus, GM, pulvinar             6.89     0
31, -22, -9    RC, temporal lobe, sub-gyral, GM, hippocampus     6.67     0.94
33, -6, 27     RC, limbic lobe, uncus, WM, *                     6.47     0
31, -20, -13   RC, limbic lobe, PHG, GM, hippocampus             6.43     0.82
27, -21, -13   RC, limbic lobe, PHG, GM, Brodmann area 6         6.12     0.94
24, -17, -17   RC, limbic lobe, PHG, WM, *                       6.01     0.94

Peaks in the Hippocampus-Specific CM-Rep RFX Map
Location in R. Hippocampus   t-Stat     Location in R. Hippocampus   t-Stat
Head, Lateral                9.99       Tail, Lateral                8.04
Tail, Medial                 7.55       Body, Lateral                6.45

4 Discussion and Conclusions
The question of whether better normalization of anatomy can improve the statistical power of fMRI analysis is not easily answered. Normalization error, it can be argued, is only one source of error that may fade in comparison to other sources, such as susceptibility artifacts. The fact that in an experiment where hippocampal activation is expected we detected more significant hippocampal activation with explicit normalization of the hippocampus than with whole-brain normalization suggests that normalization error indeed had a significant impact on RFX statistics, and that methods which minimize it may increase the sensitivity and specificity of functional neuroimaging studies. We do not presume that the cm-rep method is the only approach capable of reducing normalization error in fMRI group studies. Indeed, non-parametric registration techniques, especially when driven by expert-placed landmarks (e.g., [5]), tend to register subcortical structures more accurately than parametric registration driven only by image forces. However, the cm-rep method offers certain advantages for fMRI analysis that registration techniques do not. It makes it straightforward to associate voxel-wise features, such as functional activation, with shape features, such as the distance to the boundary. Thus, for instance, gray matter thickness can easily be used as a covariate in functional activation analysis, since functional voxels located in regions where gray matter is thicker are less likely to suffer from partial volume effects. The shape-based cm-rep coordinate system also can be used to define anisotropic smoothing kernels that are thinner in the direction across the structure of interest, and longer in
the direction along the structure. Such kernels can lead to structurally cohesive smoothing, which would reduce the amount of mixing of fMRI signal originating from different structures, and may strengthen the signal at locations adjacent to structures that do not activate. Finally, the cm-rep deformable modeling approach, which allows incorporation of shape and intensity priors, may lead to an automatic segmentation algorithm, which would eliminate the need for binary masks. Our future work will focus on leveraging these potential advantages.
Acknowledgement

This work was supported by the NIH/NINDS R01 NS045839. We thank Prof. Charles L. Epstein, Prof. Jean Gallier and Marcelo Siqueira (Penn) for insightful discussions. We are indebted to Profs. Guido Gerig, Stephen M. Pizer, Sarang Joshi, and Martin Styner (UNC) for providing data and inspiration for this work.
References

1. J. Ashburner and K. Friston. Nonlinear spatial normalization using basis functions. Human Brain Mapping, 7(4):254–266, 1999.
2. A. Evans, D. Collins, S. Mills, E. Brown, R. Kelly, and T. Peters. 3D statistical neuroanatomical models from 305 MRI volumes. In IEEE Nuclear Science Symposium and Medical Imaging Conference, pages 1813–1817, 1993.
3. B. Fischl, M. I. Sereno, and A. M. Dale. Cortical surface-based analysis II: inflation, flattening, and a surface-based coordinate system. NeuroImage, 9(2):195–207, 1999.
4. R. Frackowiak, K. Friston, C. Frith, R. Dolan, and J. Mazziotta, editors. Human Brain Function. Academic Press USA, 1997.
5. J. Haller, A. Banerjee, G. Christensen, M. Gado, S. Joshi, M. Miller, Y. Sheline, M. Vannier, and J. Csernansky. Three-dimensional hippocampal MR morphometry by high-dimensional transformation of a neuroanatomic atlas. Radiology, 202:504–510, 1997.
6. A. Holmes and K. Friston. Generalisability, random effects and population inference. NeuroImage, 7:754, 1998.
7. S. M. Pizer, P. T. Fletcher, S. Joshi, A. Thall, J. Z. Chen, Y. Fridman, D. S. Fritsch, A. G. Gash, J. M. Glotzer, M. R. Jiroutek, C. Lu, K. E. Muller, G. Tracton, P. Yushkevich, and E. L. Chaney. Deformable m-reps for 3D medical image segmentation. International Journal of Computer Vision, 55(2):85–106, Nov 2003.
8. D. Tosun and J. L. Prince. Cortical surface alignment using geometry driven multispectral optical flow. In International Conference on Information Processing in Medical Imaging, pages 480–492, 2005.
9. P. Yushkevich, H. Zhang, and J. Gee. Parametric medial shape representation in 3-D via the Poisson partial differential equation with non-linear boundary conditions. In G. Christensen and M. Sonka, editors, Information Processing in Medical Imaging, pages 162–173, 2005.
10. P. A. Yushkevich, H. Zhang, and J. C. Gee. Statistical modeling of shape and appearance using the continuous medial representation. In Medical Image Computing and Computer-Assisted Intervention, MICCAI, volume 2, pages 725–732, 2005.
11. M. M. Zeineh, S. A. Engel, P. M. Thompson, and S. Y. Bookheimer. Dynamics of the hippocampus during encoding and retrieval of face-name pairs. Science, 299(5606):577–580, Jan 2003.
Particle Filtering for Nonlinear BOLD Signal Analysis

Leigh A. Johnston1,2, Eugene Duff1,3, and Gary F. Egan1

1 Howard Florey Institute & Centre for Neuroscience, Melbourne, Australia
2 Dept. of Electrical & Electronic Engineering, University of Melbourne, Australia
3 Dept. of Mathematics & Statistics, University of Melbourne, Australia
{l.johnston, e.duff, g.egan}@hfi.unimelb.edu.au
Abstract. Functional Magnetic Resonance imaging studies analyse sequences of brain volumes whose intensity changes predominantly reflect blood oxygenation level dependent (BOLD) effects. The most comprehensive signal model to date of the BOLD effect is formulated as a continuous-time system of nonlinear stochastic differential equations. In this paper we present a particle filtering method for the analysis of the BOLD system, and demonstrate it to be both accurate and robust in estimating the hidden physiological states including cerebral blood flow, cerebral blood volume, total deoxyhemoglobin content, and the flow inducing signal, from functional imaging data.
1 Introduction
Functional Magnetic Resonance (fMR) imaging methods map brain activation through the acquisition and analysis of a sequence of scanned volumes. Changes in the measured fMR signal are primarily due to the blood oxygenation level dependent (BOLD) effect, reflecting hemodynamic response to neuronal activity. The most common fMR analysis approach is based on regression of a General Linear Model, in which BOLD effect predictors are formed by convolution of a predefined hemodynamic response function with the input stimulus sequence, and the estimated predictor weights are taken as a measure of neuronal activity. The relationship between the input stimulus sequence and output BOLD signal is recognised, however, as being both subject and session dependent, and nonlinearly related to underlying physiological variables that include cerebral blood flow, cerebral blood volume, and total deoxyhemoglobin content [1], thus rendering the GLM insufficient for a detailed analysis of the system. One of the most comprehensive nonlinear models of the BOLD signal, presented by Riera et al. [2], models the black-box system between the known input stimulus sequence and measured output BOLD signal by a continuous-time, stochastic dynamical system, in which the physiological variables form the hidden states. This model incorporates previous BOLD signal models, including the balloon-windkessel approach of [3,4] and the hemodynamic model linking the stimulus sequence and flow-inducing signal [5].
Any model is a purely descriptive tool unless methods exist with which to infer hidden system knowledge based on known inputs and outputs. The General Linear Model, for example, is useful because regression (equivalently least squares optimisation) provides optimal estimates of the activity-related weights that are subsequently processed to form statistical parameter maps. Riera et al. propose a method based on linearisation for estimating the physiological state variables in their nonlinear BOLD signal model [2]. In this paper we propose the novel application of particle filtering to the state estimation of the nonlinear, stochastic BOLD signal model. Particle filtering is a sequential Monte Carlo method in which probability densities are represented by a population of point masses, propagated through time according to the evolving system dynamics [6]. Applications of particle filters range from radar tracking [7] to image registration [8]. Unlike the linearisation approach [2], the particle filter maintains the nonlinearities present in the signal model, and is therefore an ideal candidate for application to fMR data analyses. We demonstrate this to be a robust and accurate method for estimation of the BOLD signal model.
2 The BOLD Signal Model
The balloon model of Buxton et al. [3], and subsequent windkessel formulation of Mandeville et al. [4], provide a mechanistic model of the dependence of the BOLD signal, y, on cerebral blood flow, f, cerebral blood volume, v, and deoxyhaemoglobin content, q. In this model, the venous compartment is considered to be a balloon-like structure, inflated by increased blood flow resulting from an increase in local neuronal activity. The blood flow increases more than the cerebral metabolic rate of oxygen, thus locally reducing the deoxyhaemoglobin content, which in turn causes a slight increase in the local MR signal, creating the BOLD effect [1]. The balloon model equations are as follows (for a full derivation, see [1,5]):

    y_t = h(v_t, q_t)                                                       (1)
    h(v_t, q_t) = V_0 [ k_1(1 − q_t) + k_2(1 − q_t/v_t) + k_3(1 − v_t) ]    (2)
    v̇_t = (1/τ_0) ( f_t − v_t^{1/α} )                                      (3)
    q̇_t = (1/τ_0) [ (f_t/E_0)(1 − (1 − E_0)^{1/f_t}) − q_t/v_t^{1−1/α} ]   (4)
where E0 is the resting oxygen extraction fraction, V0 is the resting blood volume fraction, and {k1 = 7E0 , k2 = 2, k3 = 2E0 − 0.3} are generally accepted coefficient values [5]. The parameter τ0 is the mean transit time of the venous compartment, and α is the stiffness component. The relationship between an input stimulus sequence, u, the flow-inducing signal, s, and the cerebral blood flow, f , is a topic of ongoing research and
is difficult to verify experimentally [1]. Friston et al. [5] propose the following parsimonious model linking stimulus to cerebral blood flow:

    ṡ_t = ε u_t − s_t/τ_s − (f_t − 1)/τ_f                                  (5)
    ḟ_t = s_t                                                              (6)
Here τ_s is the signal decay time constant, τ_f the autoregulatory time constant, and ε the neuronal efficacy. Equations (1)–(6) describe a system of ordinary differential equations that deterministically relate input, u_t, to output, y_t, via a set of state variables, x_t = [v_t, q_t, s_t, f_t]ᵀ. Riera et al. [2] recently extended the above model to include a noise process in the state dynamics,

    dx_t = F(x_t, u_t) dt + g dω_t                                          (7)

and similarly for the observation process (1). F(·) encompasses the right-hand sides of Equations (3)–(6), g is a vector of weighting terms, and ω_t is a Wiener process. Let Θ = {ε, τ_s, τ_f, τ_0, α, E_0, V_0, g, σ_e²} denote the complete set of BOLD signal parameters.
3 Method
We describe the acquisition methods for simulated and experimental fMR data (Sections 3.1–3.2), followed by details of state estimation by particle filtering (Section 3.3) and by the linearisation approach of [2] (Section 3.4). A joint parameter and state estimation algorithm is then outlined in Section 3.5.

3.1 Simulated Data
Ground truth against which to test the state estimation algorithms is not attainable through experimental fMR data, for which reason we simulate the BOLD signal model. Simulation requires discretisation of the continuous-time state dynamics (7), achieved by applying the Euler-Maruyama method with sampling at time instants t_k = kΔt [9]. The resultant discrete-time dynamics,

    x_{k+1} = x_k + F(x_k, u_k)Δt + g w_k,    w_k ∼ N(0, Δt)                (8)
converge in the mean square sense to the continuous-time dynamics as Δt → 0. The input stimulus sequence, u_k, is simulated by a binary sequence representing the on/off conditions of a block design, or by Gaussian impulses of duration 200 ms for an event design [2]. The output BOLD signal (1) is observed every TR seconds, t_k = k TR, where TR is the repetition time of the scanning session:

    y_k = h(v_k, q_k) + e_k,    e_k ∼ N(0, σ_e²)                            (9)
The noise process, ek , accounts for measurement and instrumentation noise encountered when acquiring the signal.
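A minimal Euler-Maruyama simulation of Eqs. (3)–(9) under the parameter values of Sec. 4.1 is sketched below. For simplicity a noisy observation is generated at every Δt; in the text observations are subsampled every TR. Noise enters the flow-inducing signal only, i.e. g = [g, 0, 0, 0], and f is assumed to stay positive.

```python
import numpy as np

def simulate_bold(u, dt=0.1, eps=0.5, tau_s=1.25, tau_f=2.5,
                  tau0=1.0, alpha=0.3, E0=0.3, V0=0.2,
                  g=0.01, sigma_e=np.sqrt(1e-3), rng=None):
    """Euler-Maruyama simulation of the stochastic balloon model.

    u: (K,) input stimulus sequence sampled at dt.
    Returns the state trajectory x = (s, f, v, q) and noisy BOLD output y.
    """
    rng = rng or np.random.default_rng(0)
    k1, k2, k3 = 7 * E0, 2.0, 2 * E0 - 0.3       # standard coefficients [5]
    s, f, v, q = 0.0, 1.0, 1.0, 1.0              # resting-state initial values
    x, y = [], []
    for uk in u:
        ds = eps * uk - s / tau_s - (f - 1.0) / tau_f          # Eq. (5)
        df = s                                                 # Eq. (6)
        dv = (f - v ** (1.0 / alpha)) / tau0                   # Eq. (3)
        dq = ((f / E0) * (1.0 - (1.0 - E0) ** (1.0 / f))
              - q / v ** (1.0 - 1.0 / alpha)) / tau0           # Eq. (4)
        s += ds * dt + g * rng.normal(0.0, np.sqrt(dt))        # Eq. (8)
        f += df * dt
        v += dv * dt
        q += dq * dt
        x.append((s, f, v, q))
        y.append(V0 * (k1 * (1 - q) + k2 * (1 - q / v) + k3 * (1 - v))
                 + rng.normal(0.0, sigma_e))                   # Eqs. (1)-(2), (9)
    return np.array(x), np.array(y)
```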
3.2 Experimental Data
Data was acquired from four consecutive fMRI scans of a single subject. In each scan, the subject performed two two-minute blocks of a repetitive left-handed finger tapping task. Blocks were interspaced by two-minute periods of eyes-open rest. During each scan 240 sixteen-slice brain volumes were acquired at a rate of 0.5 Hz, using a T2*-weighted gradient-echo, echo-planar-imaging sequence. Slice thickness was 6 mm, and in-plane voxel resolution 3.2 × 3.2 mm². Activation maps were generated with the FSL software package (www.fmrib.ox.ac.uk/fsl/), which uses a General Linear Modeling approach to detect regions with significant responses during the tapping task. The experimental data was obtained by detrending and then averaging, across the four scans, the responses of voxels from a region of interest in the right primary motor cortex, centered at the most strongly responding voxel, as depicted in Fig. 1.
Fig. 1. A region of interest (ROI) is formed from a cluster of voxels in the right primary motor cortex, centred at the most strongly responding voxel. (a) Activation map. (b) ROI: black square.
3.3 State Estimation Via Particle Filtering
Particle filters are a powerful class of techniques for implementing recursive Bayesian filters when the dynamics of the system under consideration are nonlinear and/or non-Gaussian [6]. The particle filter constructs the posterior density, p(x_k | y_1, …, y_k), by a weighted sum of N particles distributed over the support of x_k (refer to Algorithm 4, [6]):

    p(x_k | y_1, …, y_k, Θ) = Σ_{i=1}^{N} w_k^i δ(x_k − x_k^i)             (10)
The particles and weights are composed and propagated through time according to the following recursions:
    x_k^i ∼ p(x_k | x_{k−1}^i, Θ),    w_k^i ∝ p(y_k | x_k^i, Θ)            (11)

State estimates, X̂ = [x̂_1, …, x̂_T], are formed by ergodic averages. The noise processes in the discrete-time state and observation equations (8)–(9) define the prior and likelihood densities for use in (11):

    p(x_k | x_{k−1}) ∼ N(F(x_k, u_k), ggᵀ),    p(y_k | x_k) ∼ N(h(x_k), σ_e²)    (12)
3.4 State Estimation Via Local Linearisation
Linear functions, F(·,·) and h(·), in (8) and (9) would render the Kalman filter the optimal state estimator, in the minimum variance sense [10]. No such optimal filter exists for the nonlinear dynamical system defining the hemodynamic response to a stimulus (8)–(9), hence the application of Monte Carlo methods such as the particle filter. An alternative method commonly employed to deal with system nonlinearity is to locally linearise the system nonlinearities. This is the approach taken in [2] to obtain estimates of the states x_k = [v_k, q_k, s_k, f_k]ᵀ, resulting in the linearised state equation,

    x_{k+1} = x_k + [Δt r_0(J_f^x(x_k, u_k), Δt) − r_1(J_f^x(x_k, u_k), Δt)] J_f^t(x_k, u_k) + r_0(J_f^x(x_k, u_k), Δt) F(x_k, u_k) + ξ_k    (13)
where J_f^x is the Jacobian matrix, and J_f^t the time derivative, of F(x_t, u_t), r_n(M, a) = ∫_0^a e^{Mu} u^n du, and ξ_k is a Gaussian noise process (see [2] for details). The linearisation filtering algorithm proceeds according to an extended Kalman filter type recursion through propagation of mean and covariance estimates [10] of the linearised dynamics.

3.5 Joint Parameter and State Estimation
We employ an iterative coordinate descent approach [11] to simultaneously estimate the states and parameters of the BOLD signal model. At iteration n, given the current parameter estimates, Θ^{(n−1)}, the particle filter is applied to optimise the posterior,

    X^{(n)} = arg max_X p(X | y_1, …, y_T, Θ^{(n−1)})                      (14)

Updated parameter estimates at iteration n are subsequently calculated by optimising the posterior, given the updated state sequence estimates, equivalently a maximum likelihood calculation:

    Θ^{(n)} = arg max_Θ p(X^{(n)} | y_1, …, y_T, Θ)                        (15)
The optimisation (15) is carried out using MATLAB’s fminsearch function. The iterative algorithm terminates when a maximum number of iterations is exceeded, or estimates are deemed to have converged.
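The alternation of (14)–(15) can be sketched as follows, with SciPy's Nelder-Mead standing in for MATLAB's fminsearch; run_pf and neg_log_post are hypothetical interfaces, not part of the paper's software.

```python
import numpy as np
from scipy.optimize import minimize

def joint_estimation(y, u, run_pf, neg_log_post, theta0, n_iter=10):
    """Iterative coordinate descent of Sec. 3.5 (hypothetical interfaces).

    run_pf(y, u, theta)        -> state sequence estimate X, Eq. (14)
    neg_log_post(theta, X, y)  -> negative log posterior of Eq. (15)
    """
    theta = np.asarray(theta0, dtype=float)
    X = None
    for _ in range(n_iter):
        X = run_pf(y, u, theta)                          # state update
        res = minimize(neg_log_post, theta, args=(X, y),
                       method="Nelder-Mead")             # parameter update
        converged = np.allclose(res.x, theta, rtol=1e-3)
        theta = res.x
        if converged:
            break
    return theta, X
```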
4 Results

4.1 Simulated Data
We demonstrate the accuracy and robustness of the particle filter through application to two sets of simulated fMR data, the first being an event response, the second a block design.
(a) Event design, TR = 0.1 sec. (b) Event design, TR = 1 sec. (c) Block design, TR = 1 sec.
Fig. 2. State estimation of the BOLD signal model. Axes (top to bottom): u_k, y_k, s_k, f_k, v_k, q_k. Solid line: true state/signal. Dashed line: particle filter estimate. Dotted line: linearisation filter estimate. Where not visible, the particle filter coincides with the true state/signal (solid line).
Consider the BOLD signal response to a single stimulus event, u_t, at t = 2 seconds. The system parameters are chosen to be in their typical ranges [5]: ε = 0.5, τ_s = 1.25, τ_f = 2.5, τ_0 = 1, α = 0.3, E_0 = 0.3, with output scaling factor V_0 = 0.2. The observation noise variance is σ_e² = 10⁻³, and g = [0.01, 0, 0, 0]ᵀ. The discrete-time state equation (8) is sampled at Δt = 0.1 seconds, while the observation sampling rate, TR, is varied. Fig. 2(a) shows that for TR = Δt = 0.1 seconds, the particle filter is more accurate than the linearisation algorithm. This difference is only exemplified as the TR increases to a more realistic sampling interval of TR = 1 second, as shown in Fig. 2(b). The particle filter maintains accurate state estimates while the linearisation filter fails to track the state sequences. Fig. 2(c) depicts the results of the particle filtering and linearisation algorithms for a block design simulation. The system parameters are the same as for the
event response. Again, the particle filter demonstrates superior performance to the linearisation method, and is able to track the changes in the hidden state variable sequences.

4.2 Experimental Data
The observed signal from the right primary motor cortex of a subject during a block-design finger-tapping task (see Sec. 3.2) and the experiment paradigm are shown in the top two axes of Fig. 3(a). We apply the joint state and parameter estimation algorithm detailed in Section 3.5 to this input/output pair of sequences. Fig. 3(a) displays the resultant state sequence estimates, Fig. 3(b) the parameter estimate histories. All parameters converge within a few iterations and were found to be robust to reasonable changes in initial conditions. To simplify computation, the parameters of the coupling between the input stimulus and the cerebral blood flow (5)–(6) are assumed known: ε = 0.5, τ_s = 1.25 and τ_f = 2.5, with no detriment to the estimation procedure.

(a) State estimates (axes u_k, y_k, s_k, f_k, v_k, q_k). Solid line: observed signal. Dashed line: particle filter estimate. (b) Parameter estimates (α, g, σ_e², V_0) over iterations.
Fig. 3. Results of the joint parameter and state estimation algorithm (Sec. 3.5) applied to the observed right primary motor cortex signal during the finger-tapping task
5 Discussion and Conclusions
The pathway from input stimulus sequence to output BOLD effect signal via modulation of physiological variables including deoxyhemoglobin content, cerebral blood flow and cerebral blood volume, is both nonlinear and stochastic. Our proposed particle filtering approach is therefore ideally suited to the task, given
its inherent ability to maintain signal nonlinearity and propagate uncertainty. Superior performance of the particle filtering approach over the linearisation method has been demonstrated, while the proposed joint state and parameter estimation algorithm is successful in application to experimental fMR data. It is expected that nonlinear filtering methods, such as the particle filtering algorithm proposed in the paper, will become increasingly important in fMR analyses as limitations of linear and linearised modelling are reached.
References

1. Buxton, R.B., Uludağ, K., Dubowitz, D.J., Liu, T.T.: Modeling the hemodynamic response to brain activation. NeuroImage 23 (2004) S220–S233
2. Riera, J.J., Watanabe, J., Kazuki, I., Naoki, M., Aubert, E., Ozaki, T., Kawashima, R.: A state-space model of the hemodynamic approach: nonlinear filtering of BOLD signals. NeuroImage 21 (2004) 547–567
3. Buxton, R.B., Wong, E.C., Frank, L.R.: Dynamics of blood flow and oxygenation changes during brain activation: the Balloon model. Magnetic Resonance in Medicine 39 (1998) 855–864
4. Mandeville, J.B., Marota, J.J., Ayata, C., Zararchuk, G., Moskowitz, M.A., Rosen, B., Weisskoff, R.M.: Evidence of a cerebrovascular postarteriole windkessel with delayed compliance. Journal of Cerebral Blood Flow and Metabolism 19 (1999) 679–689
5. Friston, K.J., Mechelli, A., Turner, R., Price, C.J.: Nonlinear responses in fMRI: The Balloon model, Volterra kernels, and other hemodynamics. NeuroImage 12 (2000) 466–477
6. Arulampalam, M.S., Maskell, S., Gordon, N., Clapp, T.: A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing 50(2) (2002) 174–188
7. Gilks, W.R., Berzuini, C.: Following a moving target – Monte Carlo inference for dynamic Bayesian models. Journal of the Royal Statistical Society, Series B 63(1) (2001) 127–146
8. Ma, B., Ellis, R.E.: Unified point selection and surface-based registration using a particle filter. In: Proceedings of MICCAI 2005. (2005) 75–82
9. Gard, T.C.: Introduction to Stochastic Differential Equations. Marcel Dekker (1988)
10. Anderson, B.D.O., Moore, J.B.: Optimal Filtering. Prentice-Hall (1979)
11. Luenberger, D.G.: Linear and Nonlinear Programming. 2nd edn. Addison-Wesley (1984)
Anatomically Informed Convolution Kernels for the Projection of fMRI Data on the Cortical Surface

Grégory Operto1, Rémy Bulot1, Jean-Luc Anton2, and Olivier Coulon1

1 Laboratoire LSIS, UMR 6168, CNRS, Marseille, France
2 Centre IRMf de Marseille, Marseille, France
Abstract. We present here a method that aims at producing representations of functional brain data on the cortical surface from functional MRI volumes. Such representations are required for subsequent cortical-based functional analysis. We propose a projection technique based on the definition, around each node of the grey/white matter interface mesh, of convolution kernels whose shape and distribution rely on the geometry of the local anatomy. For one anatomy, a set of convolution kernels is computed that can be used to project any functional data registered with this anatomy. The method is presented together with experiments on synthetic data and real statistical t-maps.
1 Introduction

The cerebral cortex is known to generate most of the activity measured by functional techniques, such as functional magnetic resonance imaging (fMRI). Despite its sheet-like, nearly two-dimensional structure, methods for functional data analysis still widely consider the cortex in its original three-dimensional grid. The surface-based approach is nonetheless attractive for intersubject matching [3,5] and statistical analysis [4,9,6,2], coping with the cortex's highly folded nature. However, a problem which has remained largely ignored in the literature concerns the projection of functional values onto the cortical surface, although a relevant description of the volume-based signal on a two-dimensional model is essential to a subsequent cortical-based analysis. Some methods propose to interpolate intensities along a normal direction or inside a sphere centered at each node of the mesh representing the cortical surface (e.g. when using the free package Brainvisa [1]), or to assign each node the value of its containing voxel [13], or to compute trilinear interpolations [2]. On the other hand, other methods try to embed some explicit anatomical information into the process. For example, [14] defines each node's influence scope through voxel distances to and along the cortical mesh. In [7], a geodesic Voronoï diagram is computed so that each node value is integrated within an associated set of voxels defined by the local anatomical geometry. From these methods stems a common interest in delineating 3D areas, sometimes overlapping, onto which the signal emanating from the nodes would possibly be dispersed, taking local anatomical features into account. This
consideration was a starting point for developing a new method, based on an expected, anatomically related, signal distribution in BOLD-EPI volumes, leading to surface-based representations of the cortical ribbon activity. This paper details in section 2 the model on which the method relies. Section 3 presents the method itself. Results and discussion are then detailed in section 4.
2 Method

2.1 Model of the Expected Signal Distribution
We consider two distinct phenomena, with physiological and image-related motivations, in order to assess the expected distribution of BOLD signal around a single node of the cortical mesh. First, as a consequence of the columnar architecture of neurons inside the cortex [12], we expect each cortical column to show homogeneous intrinsic activity, suggesting that this activity and the signal should be constant in the column orientation, i.e. normal to the surface. Then, since neighborhood connections between columns lead to strong interactions along the surface, induced decaying activity around a functional activation focus can be measured [8]. Besides this, the image acquisition process inevitably induces the signal to sparsely spill outside the cortex, possibly mixing up sources and bringing signal to anomalous areas. These observations inspired our method, as it aims to figure out in which proportions neighbouring voxels can be associated with the expected signal at a surface node, and hence in which proportions their intensities should weigh on the interpolated node value. In accordance with our assumptions, the distribution of activity on the cortical surface is thus depicted as a function of distance parameters along two main directions, normal and parallel (geodesic distance) to the surface, as shown on figure 1.
Fig. 1. Expected distribution of activity around a given point of the cortical surface presented with iso-influence curves
2.2 Computing the Convolution Kernels
After extracting the white/gray matter interface triangulation from the MR T1 anatomical image using the Brainvisa package [10] and operating registration between functional volumes and anatomy, the first processing step deals with
the evaluation, around each node nc, of influence factors for any surrounding voxel. The process is done on isotropic voxels so as to cope with the anisotropy of the MRI spatial point spread function. Since influences are here related to distances, we compute two distances, geodesic and normal, between the node nc and each voxel vi of a neighborhood Vc around nc. These two distances are then used as inputs of two distinct weight functions, both of them describing a linear influence decay as distance increases. This results in associating a specific convolution mask with each node. Computing a 15 × 15 × 15 voxel mask at anatomical resolution (1 × 1 × 1 mm), the covered area is wide enough that any outer voxel has a null weight. The computation of normal distances relies on the propagation of proximity information, starting from the voxel vc containing the current node nc. To perform this, we use a Fast Marching-like distance propagation algorithm adapted from the method described in [11]. The idea is to propagate a front that attributes to each surrounding voxel v its normal distance to the surface dv, and its closest node on the surface nv (with dv being approximated by the Euclidean distance between v and nv). The voxel vcurrent of the front which is closest to the surface is considered processed and its neighbors vi are added to the front, except those already processed. For each vi, it is assumed that the associated nvi is in a close neighborhood of nvcurrent. Once nvi is found, dvi is computed and both are associated with vi. The other neighbors of vcurrent are processed in a similar manner. Once all neighbors are processed, the algorithm is iterated with the voxel having the lowest dv. The algorithm is initialized with the voxel vc nearest to the considered node nc. This process is illustrated on figure 2.
Fig. 2. (left) Computation of normal (black lines) and geodesic (red line) distances using a Fast Marching-like algorithm [11] - (right) Flowchart of the algorithm
Once every surrounding voxel has been assigned its proximity index to the surface, each of them gets an estimated geodesic distance between nc and the node nv from which the normal distance was first computed. A shortest-path algorithm creates from nc a geodesic distance map covering the whole area under the influence of its activity.
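A sketch of the Fast Marching-like normal-distance propagation described above is given below, with hypothetical voxel/mesh interfaces; the real method searches a neighbourhood of nodes around the current voxel's matched node, whereas this simplification reuses the matched node directly, and the maximum distance bound is an added assumption.

```python
import heapq
import numpy as np

def propagate_normal_distances(nodes, seeds, neighbors, position, max_dist=15.0):
    """Assign each voxel a normal distance and a matched surface node.

    nodes: (N, 3) mesh node coordinates; seeds: (voxel, node_index) pairs
    initialising the front at the surface; neighbors(v) yields adjacent
    voxels; position(v) gives a voxel's 3D centre.
    Returns {voxel: (normal_distance, matched_node_index)}.
    """
    dist, heap = {}, []
    for v, n in seeds:
        d = float(np.linalg.norm(position(v) - nodes[n]))
        dist[v] = (d, n)
        heapq.heappush(heap, (d, v))
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v][0]:
            continue                      # stale heap entry
        n_v = dist[v][1]                  # closest node found for v
        for w in neighbors(v):
            # assume w's closest node lies near v's (here: equal to it)
            d_w = float(np.linalg.norm(position(w) - nodes[n_v]))
            if d_w <= max_dist and (w not in dist or d_w < dist[w][0]):
                dist[w] = (d_w, n_v)
                heapq.heappush(heap, (d_w, w))
    return dist
```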
The voxels characterized by two distances are then given weights by the combination of the two functions described on figure 3. Geodesically, we estimate that a node located more than 6 mm from nc is no longer under its influence: aside from its physiological justification, this user-specified distance allows one to handle the smoothness of the final projection, similar to the variance of the basis functions defined by [9]. On the other hand, the normal distance weight depends directly on the anatomical structure the voxel is located in. The normal weight of a voxel is maximal if located in the cortical ribbon and decreases outside to reach zero at a user-specified normal distance of 3 mm (figure 3). As a result, each voxel v gets assigned a value ω(v) corresponding to the product of its two weights. Finally, each mask is normalized so that the sum of all weights equals 1, for energy and intensity range preservation purposes.
Fig. 3. Geodesic (left) and normal (right) weight functions
Fig. 4. Convolution kernels obtained for a node on a flat surface mesh with superimposed simulated cortical mask
Fig. 5. Convolution kernels obtained for nodes on real cortical meshes with superimposed cortical masks
Figure 4 displays a convolution kernel computed on a flat synthetic surface mesh, showing a plateau corresponding to the cortical ribbon. Figure 5 shows, on real cortical meshes, three kernels with shapes strongly influenced by the local anatomy.
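The combination of the two decay profiles into a voxel weight ω(v) can be sketched as follows, using the linear decays with the 6 mm geodesic and 3 mm normal cut-offs stated above (before the final per-mask normalization):

```python
def kernel_weight(d_geo, d_norm, in_cortex, geo_max=6.0, norm_max=3.0):
    """Influence weight of one voxel on node n_c, before normalization.

    d_geo:  geodesic distance (mm) from n_c to the voxel's matched node
    d_norm: normal distance (mm) from the voxel to the surface
    in_cortex: True inside the cortical ribbon, where the normal weight
    is maximal; outside, it decays linearly to zero at norm_max.
    """
    w_geo = max(0.0, 1.0 - d_geo / geo_max)
    w_norm = 1.0 if in_cortex else max(0.0, 1.0 - d_norm / norm_max)
    return w_geo * w_norm
```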
2.3 Projecting the Functional Data
Once the convolution kernels have been computed for the whole mesh, we integrate the voxel intensities onto the surface nodes: since any surrounding voxel has an estimated influence on the node, the value s(n_s) assigned to the node n_s is the result of a linear combination expressed as follows:

    s(n_s) = Σ_{v_i ∈ V_i} ω_s(v_i) · I(v_i)                                (1)
where V_i is the voxel area within which influences on n_s were evaluated, ω_s(v_i) is the computed weight for voxel v_i, and I(v_i) is v_i's intensity in the functional image. The method is designed so that the costly step of computing the masks is done only once per subject; any functional image can then immediately be projected onto the cortical surface.
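Applied to a whole mesh, Eq. (1) is a sparse weighted sum per node; since the kernels depend only on anatomy, they are computed once and reused for every registered functional volume. A minimal sketch with a hypothetical kernel storage format:

```python
import numpy as np

def project_volume(kernels, volume):
    """Project a functional volume onto mesh nodes via Eq. (1).

    kernels: per-node list of (voxel_indices, weights), weights summing
    to 1 (precomputed once per subject); volume: functional image as a
    flat (n_voxels,) array. Returns one value per mesh node.
    """
    flat = np.ravel(volume)
    return np.array([np.dot(w, flat[idx]) for idx, w in kernels])
```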
3 Results and Discussion
The method is applicable either to raw functional data, opening the way to cortical surface-based statistical analysis, or to the results of a standard fMRI analysis, such as volumic t-maps of activation detection. We tested our algorithm on images from a somatotopy experiment in which 17 subjects performed 6 different sensorimotor tasks, involving foot, elbow, thumb, index, little finger and tongue. EPI scans were then analyzed using SPM'99 with task/rest contrasts. Figure 6 presents a projection of an SPM volume-based t-map from elbow activation onto its corresponding cortical mesh, also displayed on an inflated mesh, and then illustrates the differences between our projection and the results of a classical method which interpolates values within spheres centered at each node. In comparison to methods which interpolate values along normal directions, or within voxels intersecting the cortical ribbon [13], our projection takes more voxels into account, in accordance with the local anatomy, and their intensity influences are differentiated within the interpolation area. Relating voxels to surface nodes using geodesic distances, rather than Euclidean distances, is a relevant way to deal with the folded nature of the cortex. Furthermore, the propagation of distance information as we perform it, giving each voxel a matching node on the surface from which geodesic and normal distances are computed, implies a geometrical coherence between these nodes and therefore robustness to errors in registering the functional volumes to the anatomy. On the contrary, the use of a closest-point criterion (through geodesic distance calculation as in [14], or Voronoï diagram definition as in [7]) makes methods sensitive to these errors, since very slight
Fig. 6. Projection of an SPM t-map onto its matching cortical mesh, presented on (top) original and inflated surfaces. (bottom) Detailed view of the same area: (left) using sphere interpolation, (middle) using convolution kernels, (right) difference texture between the two methods' results.
translations between functional volumes and anatomy can assign particular voxels entirely different matching nodes. This is observed in certain regions, e.g. near sulcal bottom lines, where curvature varies greatly and where several nodes are quasi-equidistant to some voxels. In these cases, our method gives implicit preference to nodes on the same side of the sulcus as the node at the center of the current kernel. Assigning each voxel weights results in convolution kernels that overlap from one node to another, which is different from parcelling the cortical ribbon into distinct cells. In our model, we consider that a voxel's scope is not limited to one node, even its closest one. On this point, our technique combines the advantages of overlapping interpolation spheres and anatomically informed Voronoï cells. In figure 6, a volumic SPM t-map was projected using our algorithm, displaying relevant motor activations, similar to the results with interpolation within spheres. The difference texture between the two projections shows that our method assigns smaller values in the region (colored in blue) between the two depicted activations, which are actually located on either side of a gyrus. This illustrates its ability to separate geodesically distant activations and to avoid mixing up signals from different sources.
4 Conclusion and Further Work
In this paper, we propose a method allowing the projection of functional images onto the cortical surface. In application to raw data, the produced surface-based representations of the BOLD images are essential in the framework of cortical-restricted statistical analysis methodologies, such as Cortical Surface Mapping [2]. Applying it to activation maps can serve visualisation purposes, as well as cortical localisation of activation foci through the use of a surface-based coordinate system [5,3], in comparison to localisation in 3D normalized spaces. This work will be extended with validation stages, e.g. evaluating the method's robustness to functional/anatomical registration errors and its dependence on the mesh spatial resolution and on node positions, after which it will take part in future cortical-based functional localisation experiments.
References

1. http://brainvisa.info.
2. A. Andrade, F. Kherif, J.-F. Mangin, K. Worsley, A.-L. Paradis, O. Simon, S. Dehaene, and J.-B. Poline. Detection of fMRI activation using cortical surface mapping. Hum. Brain Mapp., 12:79–93, 2001.
3. C. Clouchoux, O. Coulon, D. Riviere, A. Cachia, J.-F. Mangin, and J. Régis. Anatomically constrained surface parameterization for cortical localization. In J. Duncan and G. Gerig, editors, LNCS - Medical Image Computing and Computer-Assisted Intervention - MICCAI 2005: 8th International Conference, volume 3750, pages 344–351. Springer-Verlag Berlin Heidelberg, October 2005.
4. T.G.M. Van Erp, V.Y. Rao, H.L. Tran, K.M. Hayashi, T.D. Cannon, A.W. Toga, and P.M. Thompson. Surface-based analysis of functional magnetic resonance imaging data. In J. Duncan and G. Gerig, editors, LNCS - Medical Image Computing and Computer-Assisted Intervention - MICCAI 2004: 7th International Conference. Springer-Verlag Berlin Heidelberg, 2004.
5. B. Fischl, M.I. Sereno, R. Tootell, and A.M. Dale. Cortical surface-based analysis, II: Inflation, flattening, and a surface-based coordinate system. NeuroImage, 9:195–207, 1999.
6. R. Goebel and W. Singer. Cortical surface-based statistical analysis of functional magnetic resonance imaging data. NeuroImage, 9:S64, 1999.
7. C. Grova, S. Makni, G. Flandin, P. Ciuciu, J. Gotman, and J.-B. Poline. Anatomically informed interpolation of fMRI data on the cortical surface. NeuroImage, in press, 2005.
8. P.B. Johnson, S. Ferraina, and R. Caminiti. Cortical networks for visual reaching. Experimental Brain Research, 97:361–365, 1993.
9. S.J. Kiebel, R. Goebel, and K.J. Friston. Anatomically informed basis functions. NeuroImage, 11(6.1):656–667, 2000.
10. J.-F. Mangin, V. Frouin, I. Bloch, J. Régis, and J. López-Krahe. From 3D magnetic resonance images to structural representations of the cortex topography using topology preserving deformations. Journal of Mathematical Imaging and Vision, 5:297–318, 1995.
11. S. Mauch and D. Breen. A fast algorithm for computing the closest point and distance function. Unpublished technical report, CalTech, 2000.
12. V.B. Mountcastle. An organizing principle for cerebral function: The unit module and the distributed system. In The Mindful Brain: Cortical Organization and the Group-Selective Theory of Higher Brain Function, page 7, 1978.
13. Z. Saad, C. Reynolds, B. Argall, S. Japee, and R.W. Cox. SUMA: An interface for surface-based intra- and inter-subject analysis with AFNI. In Proc. IEEE Intl. Symp. Biomedical Imaging, page 1510, 2004.
14. J. Warnking, M. Dojat, A. Guérin-Dugué, C. Delon-Martin, S. Olympieff, N. Richard, A. Chéhikian, and C. Segebarth. fMRI retinotopic mapping, step by step. NeuroImage, 17(4):1665–1683, December 2002.
A Landmark-Based Brain Conformal Parametrization with Automatic Landmark Tracking Technique

Lok Ming Lui1, Yalin Wang1,2, Tony F. Chan1, and Paul M. Thompson2

1 Department of Mathematics, UCLA
2 Laboratory of Neuroimaging, Department of Neurology, UCLA
{malmlui, ylwang, chan}@math.ucla.edu, [email protected]
Abstract. In this paper, we present algorithms to automatically detect and match landmark curves on cortical surfaces to obtain an optimized brain conformal parametrization. First, we propose an automatic landmark curve tracing method based on the principal directions of the local Weingarten matrix. Our algorithm obtains hypothesized landmark curves using the Chan-Vese segmentation method, which solves a Partial Differential Equation (PDE) on a manifold with a global conformal parameterization. Based on the global conformal parametrization of a cortical surface, our method adjusts the landmark curves iteratively on the spherical or rectangular parameter domain of the cortical surface along its principal direction field, using umbilic points of the surface as anchors. The landmark curves can then be mapped back onto the cortical surface. Experimental results show that the landmark curves detected by our algorithm closely resemble manually labeled curves. Next, we applied these automatically labeled landmark curves to generate an optimized conformal parametrization of the cortical surface, in the sense that homologous features across subjects are made to lie at the same parameter locations in a conformal grid. Experimental results show that our method can effectively help in automatically matching cortical surfaces across subjects.
1 Introduction
Parametrization of cortical surfaces is an important process in brain mapping research for data comparison. In this paper, we propose an automatic way to parametrize the cortical surface that matches important anatomical features. Important anatomical features on the cortical surface are usually represented by landmark curves, called sulcal/gyral curves. It is extremely time-consuming to label these landmark curves manually, especially when large datasets must be analyzed. We therefore propose an algorithm to detect these feature curves automatically. Given a global conformal parametrization of the cortical surface, we fix two endpoints, called the anchor points, based on umbilic points of the curvature field. We then trace the landmark curve by iteratively adjusting its path in the spherical or rectangular parameter domain of the cortex, along one of the two
principal direction fields. Using the parameterization, the landmark curves can be mapped back onto the cortical surface in 3D. To speed up the iterative scheme, we propose a method to obtain a good initialization by extracting high-curvature regions on the cortical surface using the Chan-Vese segmentation method [1]. This involves solving a PDE (the Euler-Lagrange equation) on the cortical manifold using the global conformal parametrization. Finally, we use these automatically labelled landmark curves to create an optimized brain conformal mapping, which can match important anatomical features across subjects. This is based on the minimization of a combined energy functional $E_{new} = E_{harmonic} + \lambda E_{landmark}$.
2 Previous Work
Automatic detection of sulcal landmark curves on the brain has been widely studied by different research groups. Prince et al. [2] proposed a method for automated segmentation of major cortical sulci on the outer brain boundary. It is based on a statistical shape model, which includes a network of deformable curves on the unit sphere, seeks geometric features such as high-curvature regions, and labels such features via a deformation process that is confined within a spherical map of the outer brain boundary. Lohmann et al. [3] proposed an algorithm that can automatically detect and attribute neuroanatomical names to the cortical folds using image analysis methods applied to magnetic resonance data of human brains. The sulcal basins are segmented using a region-growing approach. Zeng et al. [4] proposed a method for automatic intrasulcal ribbon finding, using cortex segmentation with coupled surfaces via a level set method, where the outer cortical surface is embedded as the zero level set of a high-dimensional distance function. Optimization of surface diffeomorphisms by landmark matching has been studied intensively. Gu et al. [5] proposed to optimize the conformal parametrization by composing an optimal Möbius transformation so that it minimizes the landmark mismatch energy. The resulting parameterization remains conformal. Glaunès et al. [6] proposed to generate large deformation diffeomorphisms of the sphere onto itself, given the displacements of a finite set of template landmarks. The diffeomorphism obtained can match the geometric features significantly, but it is, in general, not a conformal mapping. Leow et al. [7] proposed a level-set-based approach for matching different types of features, including points and 2D or 3D curves represented as implicit functions. Cortical surfaces were flattened to the unit square. Nine sulcal curves were chosen and represented by the intersection of two level set functions. These were used to constrain the warp of one cortical surface onto another. The resulting transformation was interpolated using a large deformation momentum formulation in the cortical parameter space, generalizing an elastic approach for cortical matching developed in Thompson et al. [8].
3 Basic Mathematical Theory
First, a diffeomorphism $f : M \to N$ is a conformal mapping if it preserves the first fundamental form up to a scaling factor (the conformal factor).
Mathematically, this means that $ds^2_M = \lambda f^*(ds^2_N)$, where $ds^2_M$ and $ds^2_N$ are the first fundamental forms on surfaces $M$ and $N$, respectively, and $\lambda$ is the conformal factor [9]. Next, the normal curvature $\kappa_n$ of a Riemann surface in a given direction is the reciprocal of the radius of the circle that best approximates a normal slice of the surface in that direction, and it varies with direction. It follows that
$$\kappa_n = \mathbf{v}^T II\,\mathbf{v} = \mathbf{v}^T \begin{pmatrix} e & f \\ f & g \end{pmatrix} \mathbf{v}$$
for any unit tangent vector $\mathbf{v}$. $II$ is called the Weingarten matrix and is symmetric. Its eigenvalues and eigenvectors are called principal curvatures and principal directions, respectively. The sum of the eigenvalues is the mean curvature. A point on the Riemann surface at which the Weingarten matrix has two equal eigenvalues is called an umbilic point [10].
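As a concrete illustration (not taken from the paper), the principal curvatures and directions follow from a symmetric eigendecomposition of the Weingarten matrix; the coefficient values below are made up for the example.

```python
import numpy as np

# Hypothetical Weingarten matrix of one surface point, expressed in an
# orthonormal tangent basis (e, f, g are the second-fundamental-form
# coefficients from the text).
e, f, g = 0.8, 0.3, 0.2
II = np.array([[e, f],
               [f, g]])

# Symmetric eigendecomposition: eigenvalues are the principal curvatures,
# eigenvectors are the principal directions (in the tangent basis).
kappa, directions = np.linalg.eigh(II)
k1, k2 = kappa                      # principal curvatures (ascending)
mean_curvature = k1 + k2            # "mean curvature" as defined in the text
is_umbilic = np.isclose(k1, k2)     # umbilic iff both eigenvalues coincide

print(k1, k2, directions[:, 0], is_umbilic)
```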
Fig. 1. Left: Conformal parametrization of the cortical surface onto the 2D rectangle. Right: A single face (triangle) of the triangulated mesh.
4 Computation of Conformal Parameterization
It is well known that any genus-zero Riemann surface can be mapped conformally to a sphere. For the diffeomorphism between two genus-zero surfaces, we can obtain a conformal map by minimizing the harmonic energy [5]. For high-genus surfaces, Gu et al. [11] proposed an efficient approach to parameterize surfaces conformally using a set of connected 2D rectangles. They compute a holomorphic 1-form on the Riemann surface, using concepts from homology and cohomology group theory, and Hodge theory (see Figure 1, left; [12]).
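For the genus-zero case, harmonic-energy minimization can be sketched as repeated tangential Laplacian smoothing followed by reprojection onto the sphere. This is a deliberately simplified illustration with uniform (umbrella) weights and no Möbius normalization, not the exact method of [5] or [11]:

```python
import numpy as np

def spherical_harmonic_map(verts, neighbors, n_iter=500, step=0.5):
    """Crude harmonic map of a genus-zero mesh to the sphere.

    verts     : (N, 3) vertex positions
    neighbors : list of index lists; neighbors[i] = 1-ring of vertex i
    """
    # Initialize with a Gauss-map-like projection: normalize positions
    # about the centroid.
    u = verts - verts.mean(axis=0)
    u /= np.linalg.norm(u, axis=1, keepdims=True)

    for _ in range(n_iter):
        for i, nbrs in enumerate(neighbors):
            lap = u[nbrs].mean(axis=0) - u[i]       # umbrella Laplacian
            # Keep only the tangential component so the update reduces
            # the harmonic energy without leaving the sphere.
            lap -= np.dot(lap, u[i]) * u[i]
            u[i] += step * lap
        u /= np.linalg.norm(u, axis=1, keepdims=True)  # reproject
    return u
```

In practice, cotangent weights instead of uniform ones are needed for the map to be conformal rather than merely harmonic in the combinatorial sense.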
5 Algorithm for Automatic Landmark Tracking

In this section, we discuss our algorithm for automatic landmark tracking.

5.1 Computation of Principal Direction Fields from the Global Conformal Parametrization
Denote the cortical surface by $C$. Let $\phi : D \to C$ be the global conformal parametrization of $C$, where $D$ is a rectangular parameter domain, and let $\lambda$ be the conformal factor of $\phi$. Similar to Rusinkiewicz's work [13], we can compute the principal directions and represent them on the parameter domain $D$. This is based on the following three steps:
Step 1: Per-Face Curvature Computation. Let $u = \begin{pmatrix} 1/\sqrt{\lambda} \\ 0 \end{pmatrix}$ and $v = \begin{pmatrix} 0 \\ 1/\sqrt{\lambda} \end{pmatrix}$ be the directions of an orthonormal coordinate system for the tangent plane (represented in the parameter domain $D$). We can approximate the Weingarten matrix $II$ for each face (triangle). For a triangle with three well-defined directions (edges) together with the differences in normals in those directions (see Figure 1, right), we have a set of linear constraints on the elements of the Weingarten matrix, which can be determined using the least-squares method.

Step 2: Coordinate System Transformation. After we compute the Weingarten matrix on each face in the $(u_f, v_f)$ coordinate system, we can average it with contributions from adjacent triangles. This is done by transforming the Weingarten matrix tensor into the vertex coordinate frame.

Step 3: Weighting. For each face $f$ adjacent to the vertex $p$, we take the weighting $w_{f,p}$ to be the area of $f$ divided by the squares of the lengths of the two edges that touch the vertex $p$.
Fig. 2. Left: Sulcal curve extraction on the cortical surface by Chan-Vese segmentation. Right: Umbilic points, chosen as the end points of the landmark curves, are located in each sulcal region.
5.2 Variational Method for Landmark Tracking

Given the principal direction field $\vec{V}$ corresponding to the smaller eigenvalue on the cortical surface $C$, we propose a variational method to trace the sulcal landmark curve iteratively, after fixing two anchor points ($a$ and $b$) on the sulcus. Let $\phi : D \to C$ be the conformal parametrization of $C$, $\langle \cdot, \cdot \rangle_M$ its Riemannian metric, and $\lambda$ its conformal factor. We propose to locate a curve $\vec{c} : [0,1] \to C$ with endpoints $a$ and $b$ that minimizes the following energy functional:
$$E_{principal}(\vec{c}) = \int_0^1 \left| \frac{\dot{\vec{c}}}{\sqrt{\langle \dot{\vec{c}}, \dot{\vec{c}} \rangle_M}} - \vec{V} \circ \vec{c} \right|_M^2 dt = \int_0^1 \left| \frac{\dot{\vec{\gamma}}}{|\dot{\vec{\gamma}}|} - \vec{G}(\vec{\gamma}) \right|^2 dt$$
where $\vec{\gamma} = \phi^{-1} \circ \vec{c} : [0,1] \to D$ is the corresponding iteratively defined curve on the parameter domain; $\vec{G}(\vec{\gamma}) = \lambda(\vec{\gamma})\,\vec{V}(\vec{\gamma})$; $|\cdot|^2_M = \langle \cdot, \cdot \rangle_M$; and $|\cdot|$ is the usual length defined on $D$. By minimizing the energy $E$, we minimize the difference between the tangent vector field along the curve and the principal direction field $\vec{V}$. The resulting minimizing curve is the curve that is closest to the curve traced along the principal direction. We can locate the landmark curves iteratively using the steepest descent algorithm.
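A minimal discrete version of this steepest descent, in which the curve is a polyline with fixed endpoints and the gradient of the discretized energy is estimated by central differences, might look as follows (our own discretization sketch, not the authors' code):

```python
import numpy as np

def energy(curve, G):
    """Discrete E_principal: sum of |unit tangent - G(x)|^2 over segments."""
    t = np.diff(curve, axis=0)
    t = t / np.linalg.norm(t, axis=1, keepdims=True)
    mid = 0.5 * (curve[:-1] + curve[1:])
    return np.sum((t - G(mid)) ** 2)

def trace_landmark(curve, G, n_iter=200, step=1e-2, h=1e-4):
    """Steepest descent on interior points; endpoints (anchors) stay fixed.

    curve : (K, 2) polyline in the parameter domain D
    G     : callable mapping (M, 2) points to (M, 2) target directions,
            e.g. the conformal-factor-weighted principal direction field
            interpolated on D
    """
    curve = curve.copy()
    for _ in range(n_iter):
        grad = np.zeros_like(curve)
        for i in range(1, len(curve) - 1):       # interior points only
            for d in range(2):                   # numerical gradient
                c_plus = curve.copy();  c_plus[i, d] += h
                c_minus = curve.copy(); c_minus[i, d] -= h
                grad[i, d] = (energy(c_plus, G) - energy(c_minus, G)) / (2 * h)
        curve -= step * grad
    return curve
```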
5.3 Landmark Hypothesis by Chan-Vese Segmentation
In order to speed up the iterative scheme, we obtain a good initialization by extracting the high-curvature regions on the cortical surface using the Chan-Vese (CV) segmentation method [1,14]. We can extend CV segmentation from $\mathbb{R}^2$ to an arbitrary Riemann surface $M$ such as the cortical surface. We propose to minimize the following energy functional:
$$F(c_1, c_2, \psi) = \int_M (u_0 - c_1)^2 H(\psi)\, dS + \int_M (u_0 - c_2)^2 (1 - H(\psi))\, dS + \nu \int_M |\nabla_M H(\psi)|_M\, dS,$$
where $\psi : M \to \mathbb{R}$ is the level set function and $|\cdot|_M = \sqrt{\langle \cdot, \cdot \rangle}$. The Euler-Lagrange equation becomes:
$$\frac{\partial \zeta}{\partial t} = \lambda\, \delta(\zeta) \left[ \nu\, \frac{1}{\lambda}\, \nabla \cdot \left( \sqrt{\lambda}\, \frac{\nabla \zeta}{\|\nabla \zeta\|} \right) - (w_0 - c_1)^2 + (w_0 - c_2)^2 \right]$$
Now, the sulcal landmarks on the cortical surface lie at locations with relatively high curvature. To formulate the CV segmentation, we consider the intensity term as being defined by the mean curvature. Sulcal locations can then be circumscribed by first extracting the high-curvature regions. Fixing two anchor points inside the extracted region, we can get a good initialization of the landmark curve by looking for the shortest path inside the region that joins the
Fig. 3. Automatic landmark tracking using a variational approach. Left: we trace the landmark curves on the parameter domain along the edges whose directions are closest to the principal direction field, which gives a good initial guess of the landmark curve. The landmark curve is then evolved to a deeper region using our variational approach. Right: Ten sulcal landmark curves are automatically traced using our algorithm.
two points. Alternatively, we can take the umbilic points inside the region as anchor points. By definition, an umbilic point on a manifold is a location where the two principal curvatures are equal. Therefore, we can fix the anchor points inside the region by extracting sub-regions with a small difference between the principal curvatures.
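In discrete terms, anchor candidates can be found by thresholding the difference of the per-vertex principal curvatures inside a sulcal region; the tolerance value in this sketch is illustrative.

```python
import numpy as np

def umbilic_anchors(k1, k2, region_mask, tol=0.05):
    """Pick umbilic-like vertices inside a sulcal region.

    k1, k2      : (N,) principal curvatures per vertex (k1 <= k2)
    region_mask : (N,) bool, True for vertices in the extracted
                  high-curvature (sulcal) region
    tol         : illustrative threshold on |k2 - k1|
    """
    umbilic_like = np.abs(k2 - k1) < tol
    candidates = np.flatnonzero(umbilic_like & region_mask)
    # Rank by how close each point is to being exactly umbilic.
    return candidates[np.argsort(np.abs(k2 - k1)[candidates])]
```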
6 Optimization of Brain Conformal Parametrization
In brain mapping research, cortical surface data are often mapped conformally to a parameter domain such as a sphere, providing a common coordinate system for data integration [5,15,16]. As an application of our automatic landmark tracking algorithm, we use the automatically labelled landmark curves to generate an optimized conformal mapping on the surface, in the sense that homologous features across subjects are made to lie at the same parameter locations in a conformal grid. This matching of cortical patterns improves the alignment of data across subjects. It is done by minimizing the compound energy functional $E_{new} = E_{harmonic} + \lambda E_{landmark}$, where $E_{harmonic}$ is the harmonic energy of the parameterization and $E_{landmark}$ is the landmark mismatch energy. Here, the automatically traced (continuous) landmark curves are used, and the correspondences are obtained using the unit-speed reparametrization. Suppose $C_1$ and $C_2$ are two cortical surfaces we want to compare. We let $f_1 : C_1 \to S^2$ be the conformal parameterization of $C_1$ mapping it onto $S^2$. Let $\{p_i : [0,1] \to S^2\}$ and $\{q_i : [0,1] \to S^2\}$ be the automatically labelled landmark curves, represented on the parameter domain $S^2$ with unit-speed parametrization, for $C_1$ and $C_2$, respectively. Let $h : C_2 \to S^2$ be any homeomorphism from $C_2$ onto $S^2$. We define the landmark mismatch energy of $h$ as
$$E_{landmark}(h) = \frac{1}{2} \sum_{i=1}^{n} \int_0^1 \| h(q_i(t)) - f_1(p_i(t)) \|^2\, dt,$$
where the norm represents distance on the sphere. By minimizing this energy functional, the Euclidean distance between the corresponding landmarks on the sphere is minimized.
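For curves sampled at a fixed number of unit-speed points, the mismatch energy reduces to an average of squared pointwise distances. A small sketch in our own notation:

```python
import numpy as np

def landmark_mismatch(h_of_q, f1_of_p):
    """Discrete E_landmark for n landmark curves sampled at T points.

    h_of_q  : (n, T, 3) images under h of the curves q_i on the sphere
    f1_of_p : (n, T, 3) images under f1 of the template curves p_i
    Both inputs are assumed to be unit-speed samples, so the integral
    over t becomes a simple average over the T samples.
    """
    sq = np.sum((h_of_q - f1_of_p) ** 2, axis=2)   # squared distances
    return 0.5 * np.sum(sq.mean(axis=1))           # (1/2) sum_i of mean ||.||^2
```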
Fig. 4. Left: The value of $E_{principal}$ at each iteration. The energy reached its steady state within 30 iterations, showing that our algorithm is efficient when the CV model is used as the initialization. Right: Numerical comparison between automatically and manually labelled landmarks, measured by the Euclidean distance $E_{difference}$ (on the parameter domain) between the automatically and manually labelled landmark curves.
Fig. 5. Optimization of brain conformal mapping using automatic landmark tracking. In (A) and (B), two different cortical surfaces are mapped conformally to the sphere. In (C), we map one of the cortical surfaces to the sphere using our algorithm. (D), (E) and (F) show the average map of the optimized conformal parametrization using the variational approach with the automatically traced landmark curves. Observe that the important sulcal landmarks are clearly visible.
7 Experimental Results
In one experiment, we tested our automatic landmark tracking algorithm on a set of left-hemisphere cortical surfaces extracted from brain MRI scans, acquired from normal subjects at 1.5 T (on a GE Signa scanner). In our experiments, 6 landmarks were automatically located on the cortical surfaces. Figure 2 (left) shows how we can effectively locate the initial landmark guess areas on the cortical surface using Chan-Vese segmentation. Notice that the contour evolved to the deep sulcal region. In Figure 2 (right), we locate the umbilic points in each sulcal region, which are chosen as the anchor points. Our variational method for locating landmark curves is illustrated in Figure 3. With the initial guess given by the Chan-Vese model (we choose the two extreme points in the located area as the anchor points), we trace the landmark curves iteratively based on the principal direction field. In Figure 3 (left), we trace the landmark curves on the parameter domain along the edges whose directions are closest to the principal direction field; the corresponding landmark curves on the cortical surface are also shown. Figure 3 (left) shows how the curve evolves to a deeper sulcal region with our iterative scheme. In Figure 3 (right), ten sulcal landmarks are located using our algorithm. Our algorithm is quite efficient given the good initial guess from the CV model (see Fig. 4, left). To compare our automatic landmark tracing results with the manually labeled landmarks, we measured the Euclidean distance $E_{difference}$ (on the parameter domain) between the automatically and manually labelled landmark curves. Figure 4 (right) shows the value of $E_{difference}$ at different iterations for different landmark curves. Note that the value becomes smaller as the iterations proceed, meaning that the automatically labeled landmark curves more closely resemble those defined manually as the iterations continue. Figure 5 illustrates an application of our automatic landmark tracking algorithm: the optimization of conformal mapping using the automatically traced landmark curves. Figure 5 (a) and (b) show two different cortical surfaces being mapped conformally to the sphere. Notice that the alignment of the sulcal landmark curves is not consistent. In Figure 5 (c), the same cortical surface as in (b) is mapped to the sphere using our method. Notice
that the landmark curves closely resemble those in (a), meaning that the alignment of the landmark curves is more consistent with our algorithm. To visualize how well our algorithm improves the alignment of the important sulcal landmarks, we took the average of the 15 optimized conformal maps [14]. Figure 5 shows the average maps at different angles. In (d) and (e), sulcal landmarks are clearly preserved inside the green circle, where landmarks are manually labelled. In (f), the sulcal landmarks are averaged out inside the green circle, where no landmarks are automatically detected. This means that our algorithm can help improve the alignment of anatomical features.
8 Conclusion and Future Work
In this paper, we propose a variational method to automatically trace landmark curves on cortical surfaces, based on the principal directions. To accelerate the iterative scheme, we initialize the curves by extracting high-curvature regions using Chan-Vese segmentation. This involves solving a PDE on the cortical manifold. The landmark curves detected by our algorithm closely resembled those labeled manually. Finally, we use the automatically labeled landmark curves to create an optimized brain conformal mapping, which matches important anatomical features across subjects. Surface averages from multiple subjects show that our computed maps can consistently align key anatomical landmarks. In the future, we will perform a more exhaustive quantitative analysis of our algorithm's performance, mapping errors, and improvement in registration across multiple subjects on a regional basis.
References

1. Vese, L., Chan, T.: International Journal of Computer Vision 50 (2002) 271–293
2. Tao, X., Prince, J., Davatzikos, C.: IEEE TMI 21 (2002) 513–524
3. Lohmann, G., Kruggel, F., von Cramon, D.: IPMI 1230 (1997) 368–374
4. Zeng, X., Staib, L., Schultz, R., Win, L., Duncan, J.: 3rd MICCAI (1999)
5. Gu, X., Wang, Y., Yau, S.T.: IEEE TMI 23 (2004) 949–958
6. Glaunès, J., Vaillant, M., Miller, M.: J. Math. Imaging and Vision 20 (2004) 179–200
7. Leow, A., Yu, C., Lee, S., Huang, S., Protas, H., Nicolson, R., Hayashi, K., Toga, A., Thompson, P.: NeuroImage 24 (2005) 910–927
8. Thompson, P., Woods, R., Mega, M., Toga, A.: Human Brain Mapping 9 (2000) 81–92
9. Schoen, R., Yau, S.: International Press (1997)
10. Cipolla, R., Giblin, P.J.: Cambridge University Press (2000)
11. Gu, X., Yau, S.: ACM Symp. on Geom. Processing (2003)
12. Wang, Y., Gu, X., Hayashi, K.M., Chan, T.F., Thompson, P.M., Yau, S.T.: In: MICCAI, Volume II, Palm Springs, CA, USA (2005) 26–29
13. Rusinkiewicz, S.: Symp. on 3D Data Processing, Vis., and Trans. (2004)
14. Lui, L., Wang, Y., Chan, T.F.: VLSM, ICCV (2005)
15. Drury, H., Essen, D.V., Corbetta, M., Snyder, A.: Academic Press (1999)
16. Fischl, B., Sereno, M., Tootell, R., Dale, A.: Human Brain Mapping, Volume 8 (1999) 272–284
Automated Topology Correction for Human Brain Segmentation

Lin Chen and Gudrun Wagenknecht

Central Institute for Electronics, Research Center Juelich, Juelich, Germany
Abstract. We describe a new method to reconstruct human brain structures from 3D magnetic resonance brain images. Our method provides a fully automatic topology correction mechanism, thus avoiding tedious manual correction. Topological correctness is important because it is an essential prerequisite for brain atlas deformation and surface flattening. Our method uses an axis-aligned sweep through the volume to locate handles. Handles are detected by successively constructing and analyzing a directed graph. A multiple local region-growing process, which acts simultaneously on the foreground and the background, is used to isolate handles and tunnels. The sizes of handles and tunnels are measured, and handles are then removed or tunnels filled based on their sizes. The method was applied to 256 T1-weighted MR volumes.
1 Introduction
Representing the shape of complex anatomical regions such as human brain structures has always been a challenge. Recent advances allow detailed anatomical information to be derived from magnetic resonance imaging (MRI), but due to imaging noise artifacts, partial volume effects and intensity inhomogeneities, automatic extraction of topologically correct brain structures remains difficult, especially for regions in the cortex. Xu et al. [1] introduced an automatic surface topology correction method which uses fuzzy segmentation to produce an isosurface representation of the WM/GM interface. If the surface is not homeomorphic to a sphere, a median filter is used to iteratively smooth the fuzzy segmentation until a surface with the correct topology is found. One drawback of this approach is that using a 3×3×3 median filter does not guarantee the correct topology. Several methods for correcting the topology of the cortical surface have recently been developed. The region-growing model proposed by Kriegeskorte et al. [2] has been adopted as a topology correction method. It starts from an initial point with the greatest distance to the surface, and then grows a region by adding points that will not change the topology (without self-touching). One drawback of this approach is that the result strongly depends on the order in which the points are grown from the set of growing points; another is that it considers only the foreground, without any size criterion. Dale et al. [3] flattened a cortex isosurface by ignoring small handles, identifying and fixing topology defects on the inflated cortical surface, but leaving some large handles to be edited manually. More
recently, Fischl et al. [4] inflated the cortex isosurface to a sphere and identified the overlapping triangles on the spherical surface which corresponded to handles, then tessellated a new mesh as the topologically correct cortical surface. Shattuck and Leahy [5] and Han et al. [6] introduced graph-based methods for topology correction. Shattuck and Leahy examined the connectivity of 2D segmentations between adjoining slices to detect topological defects and minimally correct them by changing as few voxels as possible. One drawback of this approach is that the set of voxels which must be changed to correct the topology is necessarily oriented along the Cartesian axes. This set is henceforth also termed a "cut". Building on their work, Han et al. developed an algorithm to remove all handles from a binary object under any connectivity. Successive morphological openings correct the segmentation at the smallest scale. This method is effective for small handles, but large handles may need to be edited manually. Wood et al. [7] proposed a different approach. Handles in the tessellation are localized by simulating wavefront propagation on the tessellation and are detected where the wavefronts meet twice. The size of a handle is the shortest non-separating cut along such a cycle, which helps retain as much fine geometrical detail of the model as possible. One drawback of this method is that it requires the arbitrary vertex tessellation to be converted to a volumetric image in order to correct the defect. Our method provides a fully automatic topology correction mechanism. It is similar to Shattuck and Leahy's [5], being based on connectivity and graphs from sequences of slices. The innovation is that we provide a very fast iterative approach, which shows that corrections need not be oriented along cardinal axes. Correction can thus be done under any 3D connectivity (6, 18, 26) without cutting the whole plane, changing as few voxels as possible by using a threshold.
2 Background and Definitions
Some basic concepts of digital topology are outlined in this section. Please refer to Bertrand [8] for details. The initial segmentation is a binary digital image $X \subset \mathbb{Z}^3$ composed of a foreground object $F$ and an inverse background object $B = \overline{F}$. According to the conventional definition of adjacency, three types of connectivity can be considered: 6-, 18- and 26-connectivity. In order to avoid topological paradoxes, different connectivities, $n$ and $\bar{n}$, must be used for $F$ and $B$. The following four pairs of compatible connectivities are generally used: (6,26), (6,18), (26,6) and (18,6). Considering a digital object, the computation of two numbers suffices to check whether modifying one voxel will affect the topology. These topological numbers, denoted $T_n$ and $T_{\bar{n}}$, were introduced by Bertrand [8] as an elegant way to classify the topology type of a given voxel and will be used by the algorithm. The object is henceforth termed $X$ and its inverse object $\overline{X}$. Furthermore, the following definitions will also be used:

Definition 1 (Simple point). A point $x \in X$ that can be added or removed without changing the topology of an object. It is characterized by $T_n(x, X) = T_{\bar{n}}(x, \overline{X}) = 1$.
Definition 2 (Border point). A point $x \in X$ is called an $n$-border point if and only if at least one of its $n$-neighbors $x'$ satisfies $x' \in \overline{X}$. It is called an $\bar{n}$-border point if and only if $x \in \overline{X}$ and at least one of its $\bar{n}$-neighbors $x'$ satisfies $x' \in X$.

Definition 3 (Residue and body point). The algorithm generates different connected components and assigns the voxels of each component a different label. Residue points are voxels belonging to a component whose removal will change the topology of the object. If their component will not change the topology, they are termed body points.
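For the common (26,6) connectivity pair, the simple-point test of Definition 1 can be implemented by counting connected components in the 3×3×3 neighborhood, with the background count restricted to the geodesic 6-neighborhood of order 2 (i.e., the 18-neighborhood), following Bertrand's characterization [8]. The sketch below is our own illustration:

```python
import numpy as np
from scipy import ndimage

# Structuring elements for 26- and 6-connectivity.
S26 = np.ones((3, 3, 3), dtype=int)
S6 = ndimage.generate_binary_structure(3, 1)

# Offsets inside a 3x3x3 block centered at (1,1,1): the 18-neighborhood
# (L1 distance 1 or 2) and the 6-neighborhood (L1 distance 1).
_idx = np.indices((3, 3, 3)).reshape(3, -1).T
N18_MASK = np.array([0 < np.abs(p - 1).sum() <= 2 for p in _idx]).reshape(3, 3, 3)
N6_MASK = np.array([np.abs(p - 1).sum() == 1 for p in _idx]).reshape(3, 3, 3)

def is_simple_26_6(nbhd):
    """Simple-point test for a 3x3x3 boolean neighborhood `nbhd`
    (True = foreground) of a foreground center voxel, under (26,6)."""
    fg = nbhd.copy()
    fg[1, 1, 1] = False                        # exclude the center
    _, t26 = ndimage.label(fg, structure=S26)  # T_26(x, X)

    bg = (~nbhd) & N18_MASK                    # background in the geodesic
    lab, _ = ndimage.label(bg, structure=S6)   # 6-neighborhood of order 2
    # Count only components that are 6-adjacent to the center voxel.
    t6 = len(np.unique(lab[N6_MASK & (lab > 0)]))
    return t26 == 1 and t6 == 1
```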
3 Methods
A method is described whereby the topology of a volumetric binary image can be altered so that an isosurface algorithm can generate a surface with the topology of a sphere, i.e., of genus zero. The genus of a surface can be determined from its Euler characteristic [6], which is a topological invariant of the generated isosurface. However, it provides only a global criterion for topology correction, with no information about the position and size of the handles and tunnels. Therefore, the size and the location of both handles and tunnels need to be found. To that end, the topology of the object is analyzed together with its geometry. Two directed graphs are formed, one for foreground and one for background voxels, which contain information about the volume. Based on the approach described below, this makes it possible to determine whether the foreground object is topologically equivalent to a sphere by counting the number of cycles in the graphs. The cycles also provide information about the size and location of handles or tunnels; namely, if the number of cycles of the foreground graph and the background graph is zero, then the generated isosurface of the object has a spherical topology. Our approach is based on this criterion and can be summarized in the following four steps:

1. Sweep through the volume to encode the topology in two directed graphs for foreground and background.
2. Isolate handles using the directed graphs.
3. For each handle found, find the minimum number of voxels (minimal cut) which must be changed to correct the topology.
4. Run iterative multi-axis correction.

3.1 Connectivity Graph
A foreground connectivity graph $G = \{N, E\}$ is created which contains important information about the structure of the binary image foreground. The object is examined along a selected cardinal axis, identifying the connected components within each slice perpendicular to the axis, using 2D 4-connectivity if the foreground uses 6-connectivity in 3D, and 2D 8-connectivity if the foreground uses 18- or 26-connectivity in 3D. Each in-slice connected component becomes a node in the graph. Next, it is determined how each node is connected to the nodes
Fig. 1. An 8 × 3 × 5 binary volume with a torus, the corresponding slices and its connectivity graphs for foreground and background
Fig. 2. A 5 × 5 × 4 binary volume with a torus, the corresponding slices and its connectivity graphs for foreground and background
in the next slice. Two nodes are defined as having a directed connection if they are $n$-connected. Each connection in the object is represented by an edge in the graph. Note that one node may be connected to two or more nodes, and that these nodes may in turn be connected to one node in the next slice (Fig. 1). In the simplest case, one node will have multiple edges (separate connections) connecting it to one node in the next slice in the foreground. Such a representation is the simplest form of a topological handle existing in the object (Fig. 2). In this case, the handle is detected as the smaller of these connections. The background connectivity graph is created likewise, using $\bar{n}$-connectivity (Figs. 1 and 2).
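A sketch of how this slice-by-slice construction could be implemented with scipy (our own illustration; names and the diagonal-adjacency simplification are ours):

```python
import numpy as np
from scipy import ndimage

def build_connectivity_graph(vol, axis=2, n3d=26):
    """Sweep `vol` (3D bool) along `axis`; return per-slice node labels
    and directed edges (slice_index, node_a, node_b) between slices.
    2D connectivity follows the 3D choice: 6 -> 4-conn, 18/26 -> 8-conn.
    """
    struct2d = ndimage.generate_binary_structure(2, 1 if n3d == 6 else 2)
    vol = np.moveaxis(vol, axis, 0)
    labels, edges = [], []
    for z in range(vol.shape[0]):
        lab, _ = ndimage.label(vol[z], structure=struct2d)
        labels.append(lab)
        if z > 0:
            prev = labels[z - 1]
            both = (prev > 0) & (lab > 0)
            # Each connected component of the overlap region is one
            # separate connection (edge); two components between the same
            # node pair signal the simplest kind of handle (Fig. 2).
            # (For 18/26-connectivity, diagonal between-slice adjacency
            # would also have to be checked; omitted here for brevity.)
            comp, n_comp = ndimage.label(both, structure=struct2d)
            for c in range(1, n_comp + 1):
                ys, xs = np.nonzero(comp == c)
                edges.append((z, prev[ys[0], xs[0]], lab[ys[0], xs[0]]))
    return labels, edges
```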
3.2 Isolate Handles Using the Directed Graph
Sweeping through the graph, cycles corresponding to handles are detected successively. The approach is as follows: for each node which is connected with two or more nodes in the next slice, different labels are assigned to the following nodes. A cycle is detected by running a breadth-first search; a cycle forms when two nodes with these different labels in the previously examined slice are connected to a single node in the next slice (Fig. 3). Note that one cycle has two sides with two different labels. When a cycle with two sides is detected, the corresponding handle in the object must be isolated. Choosing the smaller side as the handle comes naturally (Fig. 4 a); i.e., in the real world, the handle of a cup or a teapot, not the body, is what we call the handle. Note that a node with $k > 2$ subsequently connected nodes may have $k$ child cycles. Since the local genus of the surface is $k - 1$, we only need to cut the $k - 1$ smaller handles to correct the topology. The advantage of thus
Fig. 3. A sample illustration of handle isolation, from (a) to (c)
choosing handles is that the object need not be separated into several pieces or overlapping cycles. As per Definition 3, the detected handles divide the object into two kinds of components: the handles, consisting of points called residue points, and the rest, whose points are called body points.
3.3 Handle Cutting by Cycle Breaking
The basic idea of topology correction is to divide cycles into body and residue parts and discard the smallest residue piece of each cycle in 3D. When such voxels are removed from the foreground, they are removed from the object: this is called a handle cut. Conversely, when voxels are removed from the background, they are added to the object, and hence a tunnel is filled. The handles are good candidates for these residue parts. However, the thinnest cut of a cycle may not fall within the handle. Hence, we introduce residue part expansion (RPE), which is intended to catch the thinnest cut by iteratively growing the handle. This is done as follows (see Def. 1-3):

Algorithm 1 (Residue Part Expansion, RPE):
1. Find the handle set S of residue points that are n-adjacent to the body point set X (Fig. 4 a-b).
2. Find the top and bottom connection sets X1 and X2, where for each point x ∈ X1 or x ∈ X2, x ∈ X and x is n-adjacent to S (Fig. 4 c).
3. Add X1 to S.
4. Find the set T which is n-adjacent to X1, where for each point x ∈ T, x ∈ X \ S and x is n-adjacent to X1.
5. Compare the cardinalities of sets X1 and T. If #(X1) > #(T), add T to S, replace X1 by T, and go to step 4. Otherwise go to the next step (Fig. 4 d).
6. Deal with X2 in the same way as with X1 (steps 3-5) (Fig. 4 c-d).
7. Deal with B in the same way as with F.

Note that after RPE growing, each residue part S is an n-connected component which is n-adjacent to the body part X, and a cut of it reduces the genus by one. This is due to the fact that growing by border points is equivalent to growing by adding simple points, so the topology is retained after growing.
Fig. 4. Illustration of RPE (a) - (d) and TCB (d) - (h) with 6-connectivity for the foreground
The main idea of cutting the residue part is to transfer as many points as possible from the residue back to the body without changing its topology. Specifically, the sets X1 and X2 are the two cuts of the residue part after RPE, but they might not be the smallest ones. The body set is grown by successively detecting border points of X1 or X2 which lie within the residue, but only adding those points (simple points) which come from the residue and do not adversely affect the topology (see Def. 1-3). This is done as follows (see Fig. 4 d-h):

Algorithm 2 (Topology Cycle Breaking, TCB):
1. Compare the cardinalities of sets X1 and X2. If #(X1) > #(X2), find the set T which is n-adjacent to X1, where for each point x ∈ T, x ∈ S \ X1 and x is n-adjacent to X1, and add X1 to the body set X. Otherwise go to step 4.
2. For each point x ∈ T, if x ∈ X2, label x as a cut point and set #(X2) = #(X2) − 1, #(T) = #(T) − 1.
3. If #(T) > 0, replace X1 by T and go to step 1. Otherwise go to step 5.
4. Deal with X2 in the same way as with X1 (steps 1-3).
5. Deal with B in the same way as with F.

Note that at each iteration, the largest set of simple points is added to the body. The labeling of cut points ensures that the final residue points are not added back to the body and are positioned at the thinnest parts of the handles.
3.4 Iterative Multi-axis Correction
The nature of the changes described in the previous section depends on the axis along which the object is analyzed. In order to ensure that only the smallest changes are made to the object, corrections are applied iteratively along each axis. A threshold is used, and only corrections smaller than or equal to the threshold are made. This multi-axis approach dramatically reduces the number of voxels which are added to or removed from the object set. The algorithm is outlined below.
Algorithm 3 (Iterative Multi-axis Correction, IMC):
1. Set the threshold value to one.
2. Starting with the z axis, compute a foreground directed connectivity graph G = {N, E}.
3. Isolate each handle using the directed graph.
4. Find the handle cutting set T for each handle using RPE and TCB.
5. Remove the handle cutting set from the object if the cardinality of T is less than or equal to the threshold.
6. Repeat steps 2-5 for the x-axis and y-axis.
7. Repeat steps 2-6 for the background.
8. Repeat steps 2-7 until no further changes can be made at this threshold.
9. Iterate steps 2-8 with an increasing threshold until no further corrections can be made.
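In outline, the driver loop might look like the following sketch (our own pseudocode-level Python; `correct_axis` stands for steps 2-5 and is an assumed helper, not part of the paper):

```python
def iterative_multi_axis_correction(vol, correct_axis, max_threshold=100):
    """vol: 3D boolean volume, modified in place.

    correct_axis(vol, axis, foreground, threshold) applies steps 2-5
    along one axis and returns how many voxels it changed (assumed
    helper, not defined here)."""
    threshold = 1
    while threshold <= max_threshold:
        changed = 0
        for foreground in (True, False):      # handles, then tunnels
            for axis in (2, 0, 1):            # z, then x, then y
                changed += correct_axis(vol, axis, foreground, threshold)
        if changed == 0:
            threshold += 1                    # escalate only when stuck
    return vol
```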
4 Results
The topological correction algorithm was applied to 256 T1-weighted MR volumes. Image size after preprocessing was 160 × 200 × 160 voxels. Processing time for each volume was between 25 and 86 seconds (average 45 seconds) on an Intel Pentium IV 3.0-GHz CPU when the simplest case (Fig. 2) was not considered. When this case was considered, processing time for each volume was between 4 and 10 minutes, but fewer voxels were changed for topology correction. It should be noted that aspects of the implementation of this approach, such as frequent graph
Fig. 5. WM/GM surface before and after topology correction

Fig. 6. Histogram of the average number of handles versus detected handle size (number of changed voxels, i.e., threshold) for the 256 data sets examined
recalculation, can be optimized, providing substantial decreases in computation costs. Each white matter volume was tessellated after topological correction and the genus verified using the Euler-Poincaré formula [6]. In each case, the method produced a final volume whose tessellation had an Euler characteristic of two, corresponding to a genus of zero and homeomorphism to a sphere. Fig. 5 shows one part of a WM/GM surface before and after topology correction. Note that the algorithm was able to correct all volumes without subsampling and with very few changes to the membership set. The algorithm changed between 0.0086% and 0.202% of the voxels for each of the 256 volumes, with an average of 0.020%; 94.14% of the volumes changed less than 0.05% of their voxels.
Acknowledgements. We thank Dr. Gabriele Lohmann from the Max Planck Institute for Human Cognitive and Brain Sciences, who kindly provided us with the brain MR images. Thanks are also due to Alejandro Rodón for language editing.
References

1. Xu, C., Pham, D.L., Rettmann, M.E., Yu, D.N., Prince, J.L.: Reconstruction of the human cerebral cortex from magnetic resonance images. IEEE Trans. Med. Imag. 18(6) (1999) 467–480
2. Kriegeskorte, N., Goebel, R.: An efficient algorithm for topologically correct segmentation of the cortical sheet in anatomical MR volumes. NeuroImage 14 (2001) 329–346
3. Dale, A.M., Fischl, B., Sereno, M.I.: Cortical surface-based analysis I: Segmentation and surface reconstruction. NeuroImage 9 (1999) 179–194
4. Fischl, B., Liu, A., Dale, A.M.: Automated manifold surgery: Constructing geometrically accurate and topologically correct models of the human cerebral cortex. IEEE Trans. Med. Imag. 20 (2001) 70–80
5. Shattuck, D., Leahy, R.M.: Automated graph-based analysis and correction of cortical volume topology. IEEE Trans. Med. Imag. 20(11) (2001) 1167–1177
6. Han, X., Xu, C., Neto, U., Prince, J.: Topology correction in brain cortex segmentation using a multiscale, graph-based algorithm. IEEE Trans. Med. Imag. 21(2) (2002) 109–121
7. Wood, Z., Hoppe, H., Desbrun, M., Schroeder, P.: Removing excess topology from isosurfaces. ACM Transactions on Graphics 23 (2004) 190–208
8. Bertrand, G.: Simple points, topological numbers and geodesic neighborhoods in cubic grids. Pattern Recognition Letters 15 (1994) 1028–1032
A Fast and Automatic Method to Correct Intensity Inhomogeneity in MR Brain Images

Zujun Hou1, Su Huang2, Qingmao Hu2, and Wieslaw L. Nowinski2

1 Dept. of Interactive Media, Institute for Infocomm Research, Singapore 119613
2 Biomedical Imaging Lab, Singapore Bioimaging Consortium, Singapore 138671
Abstract. This paper presents a method that improves the semi-automatic method for intensity inhomogeneity correction by Dawant et al. by introducing a fully automatic approach to reference point generation, which is based on order statistics and integrates information from the fine- to coarse-scale representations of the input image. The method has been validated and compared with two popular methods, N3 and BFC, and its advantages are demonstrated.
1 Introduction
Intensity inhomogeneity (IIH) inherent in Magnetic Resonance (MR) images can substantially reduce the accuracy of segmentation and registration. With the advent of multichannel phased-array coils and 3T units, the importance and prevalence of this problem have greatly increased, and near-real-time IIH correction is of primary importance. A wealth of techniques has been devised to correct this artifact. The correction has been dealt with prospectively by phantom scanning [1] or theoretical modeling [2,3]. More often, the issue is addressed retrospectively by deriving the solution from the image data alone. Most of the mathematical modeling methods for IIH correction fall into the following three categories: low-pass filtering [4,5], surface fitting [6,7] and statistical modeling [8,9]. For a recent review, the interested reader is referred to [10]. In [11], a comprehensive comparison among six algorithms was made, concluding that none of the six methods performs ideally under all the experiments that were carried out. In this paper, we present an improvement of the semi-automatic technique of [6], which fits a surface model to a few reference points selected from the white matter (WM). Sled et al. [12] compared this simple method with the N3 [9] and the expectation maximization (EM) [8] methods using simulated T1-, T2- and PD-weighted images. It was found that this method is comparable with the N3 method and superior to the EM method; in particular, its performance is encouraging on T1-weighted images. Nevertheless, the requirement of expert supervision for reference point generation significantly reduces the efficiency of this method. As an attempt to enhance this simple
This work was mainly done during the first author's stay with the Biomedical Imaging Lab of the Singapore Bioimaging Consortium.
method, an approach is proposed in this paper to automatically generate the reference points. The proposed method is demonstrated to be as fast as low-pass filtering methods, while being as accurate as two popular methods, N3 and BFC. The rest of the paper is organized as follows. The method is described in Section 2, where an approach based on order statistics for automatically generating the reference points is detailed. Validation of the method as well as comparison with state-of-the-art methods are presented in Section 3. Finally, the paper concludes in Section 4.
2 An Approach to Automatic Reference Points Generation
In this section, we present an approach to the automatic generation of the reference points. The method is based on order statistics and integrates information from fine to coarse scale. Similar to many other IIH correction methods, the input to the system is a brain MR image with the skull/scalp removed. Several public domain software packages are available for this, such as the BrainSuite package (http://neuroimage.usc.edu/). Fig. 1, Row 1 shows an example, where on the left is an original scan and in the middle is the tissue after skull/scalp removal. The subsequent operations are limited to the region where tissue is present. To offset the impact of noise and the partial volume effect, the reference points are selected from a low-resolution image and their intensities are determined from a high-resolution image. It is assumed that a point in the low-resolution image corresponds to a block of points in the high-resolution image. Thus, the identification of reference points in the low-resolution image is equivalent to identifying reference blocks in the high-resolution image.

2.1 Within-Block Homogeneity and the Trimmed Range Statistic
The first problem is to test the following hypothesis:

H0: For a given block, there is only one type of tissue.

The test is based on order statistics, as described below. The image is raster-scanned and the trimmed range statistic is calculated block by block. Within a block, suppose that there are $L$ tissue pixels/voxels. First, rank the pixels/voxels in non-decreasing order and denote them by $x_{\{l\}}$, $l = 1, \dots, L$. Next, the samples are normalized as follows:
$$\tilde{x}_{\{i\}} = x_{\{i\}} / (2 x_{\{L/2\}}). \qquad (1)$$
Let $1 < j < k < L$; then the trimmed range statistic $r_{j,k}$ is computed by
$$r_{j,k} = \tilde{x}_{\{k\}} - \tilde{x}_{\{j\}}. \qquad (2)$$
326
Z. Hou et al.
Fig. 1. An illustration of basic steps of the proposed method. Row 1: original image (left), result after skull/scalp removal (middle) and result after within block homogeneity testing (right); Row 2: result after inter-block homogeneity testing (left), result after between block homogeneity testing (middle) and result after downsampling (right).
If rj,k < Trange for a given threshold Trange , then H0 will be accepted, i.e., the block is dominated by one tissue and its center is considered as a reference point candidate. To make the statistic tolerant with possible outliers due to noise, tissue abnormality, limited tissue voxels of other types, etc., the values at two ends are excluded in computation, which is similar to the trimmed mean in deriving a robust mode estimate from samples. Readers interested in robust statistics as well as its application to image processing can refer to [13] for more detail. For the image in Row 1, Fig. 1, the result after this step is shown in Row 1 (right). 2.2
Inter-blocks’ Median Homogeneity
Since the IIH field is slowly varying, it is reasonable to assume continuity between the median of a block and those of its neighboring blocks. The constraint on median continuity analyzes the image in the neighborhood of a block, which in scale space is coarser than the scale characterized by the block size. At this coarser scale, the variation of IIH is assumed to be small as well; thus, oscillation of the median of a block with respect to those of its neighbors is attributed to the intensity variation of tissues: small variation suggests a possibly common tissue type, while large oscillation indicates different tissue types. The former blocks are preserved and the latter suppressed by thresholding. For the sample image in Fig. 1, the result after inter-block homogeneity testing is displayed in Row 2 (left) of the same figure. Compared with the image in Row 1 (right), some dark blocks adjacent to bright blocks have been removed. Note that there remain some blocks which look "grey" and adjoin some bright blocks. Examination of the neighboring slices reveals that most voxels in these blocks look quite "white"; thus, the actual median oscillation within these blocks is not as significant as it appears in this slice.
2.3 Between-Block Homogeneity
As mentioned above, a single candidate block is dominated by a single type of tissue, but different blocks may contain tissue of different types. For the block candidates remaining after the last step, large oscillations between neighboring blocks have been suppressed, but separated blocks with distinct tissue types may have been kept. To identify these tissue types and extract a sample of candidate blocks with the same type of tissue, one can examine the distribution of these blocks' medians. To extract the primary component, one can use a trimming procedure, as was used in the previous section to test the tissue homogeneity within a block. Given a median histogram, let $h(i)$, $i = 0, \dots, 255$, denote the frequency of each gray level. The $Pr_L$ percentile to the left is defined by $Pr_L = \sum_{i=0}^{T_{modeL}} h[i]$, and similarly the $Pr_R$ percentile to the right is $Pr_R = \sum_{i=T_{modeR}}^{255} h[i]$. The mode analysis then determines the range $(T_{modeL}, T_{modeR})$ for given parameters $Pr_L$ and $Pr_R$. In this investigation, the parameters are fixed: $Pr_R = 0$, $Pr_L = 0.5$ for T1-weighted images, and $Pr_R = 0.1$, $Pr_L = 0$ for T2- or PD-weighted images. As an instance, the median histogram of the example in Fig. 1 is shown in Fig. 2. After thresholding, the blocks whose median falls outside the range are removed. The remaining blocks are displayed in Fig. 1, Row 2 (middle). Clearly, most blocks belong to WM, and the IIH effect is also visible in the extracted blocks.
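A small sketch of this mode-range extraction (the histogram is assumed normalized to sum to one, which is our assumption):

```python
import numpy as np

def mode_range(h, pr_left=0.5, pr_right=0.0):
    """Find (T_modeL, T_modeR) so that the trimmed tails of the median
    histogram h (length 256, normalized to sum to 1) hold the requested
    probability mass pr_left / pr_right."""
    c = np.cumsum(h)
    # Smallest T_modeL with cumulative mass >= pr_left.
    t_left = int(np.searchsorted(c, pr_left))
    # Largest T_modeR with tail mass >= pr_right.
    c_rev = np.cumsum(h[::-1])
    t_right = 255 - int(np.searchsorted(c_rev, pr_right))
    return t_left, t_right
```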
0.02
h(r)
0.015
0.01
0.005
0
0
50
100
150
200
250
300
r
Fig. 2. The histogram of blocks’ median for the example in Fig. 1 after inter-block homogeneity testing
2.4 Downsampling
After the above processing, the identified blocks usually constitute a large portion of a contiguous tissue. However, a few sampled points suffice to approximate the smooth IIH function. The number of blocks can therefore be further reduced by downsampling, which also speeds up the computation for the data regression in the following section. This process starts with a raster scan: whenever a candidate block is encountered, its immediately neighboring (26-neighborhood in 3D) candidate blocks are deleted. On the bottom right
of Fig. 1 is the result after this step. It should be pointed out that the image is 3D and only one slice is shown in this picture. Many blocks in the lower left are deleted because neighboring blocks in neighboring slices are preserved.
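The retained block centers then serve as the reference points for the surface model of [6]. The sketch below assumes, for illustration only, a quadratic polynomial bias field and a multiplicative correction; the actual surface model of the original method may differ:

```python
import numpy as np

def fit_bias_field(ref_xyz, ref_vals, shape):
    """Fit a quadratic polynomial IIH field to reference points and
    evaluate it over the whole volume (illustrative model choice).

    ref_xyz  : (P, 3) reference point coordinates
    ref_vals : (P,) intensities at the reference points
    shape    : volume shape
    """
    def design(pts):
        x, y, z = pts.T
        return np.column_stack([np.ones_like(x), x, y, z,
                                x*y, x*z, y*z, x*x, y*y, z*z])

    coef, *_ = np.linalg.lstsq(design(ref_xyz), ref_vals, rcond=None)
    grid = np.indices(shape).reshape(3, -1).T.astype(float)
    return (design(grid) @ coef).reshape(shape)

def correct_volume(vol, bias):
    """Multiplicative correction: divide out the normalized bias field."""
    bias = bias / bias.mean()
    return vol / np.maximum(bias, 1e-6)
```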
3 Experiments and Results
To evaluate the performance of the proposed method, a number of experiments have been carried out using simulated and real scans. The simulated data are from the normal brain database BrainWeb [14]. The simulated sequences used in this study are T1-, T2- and PD-weighted, with noise varying from 0 to 9% and IIH at the 0, 20% and 40% levels. The images are of size 181 × 217 × 181, 8 bits per voxel and 1 mm slice thickness. The proposed method has been compared with N3 [9] and BFC [15]. The parameters for both methods were tuned according to those suggested in the software, so that the results are consistent with those published by other groups. To quantify the performance of an IIH correction algorithm, the commonly used criteria, the coefficient of variation (cv) and the coefficient of joint variation (cjv) [7], are adopted in this study. As mentioned above, it is assumed that the presence of IIH increases the tissue variance; thus, an algorithm attaining a smaller value of cv or cjv is expected to be better. Table 1 shows the cv of GM, the cv of WM and the cjv between WM and GM for the three methods on the BrainWeb data sets. For T2- and PD-weighted images, only results with 3% noise are displayed. One can observe that the performance of H3 is generally comparable
0
3
9
T2
PD
IIH 0 20 40 0 20 40 0 20 40 0 20 40 0 20 40
Orig. 9.05 10.25 12.89 9.91 11.01 13.48 15.06 15.70 17.40 17.71 18.44 20.30 4.49 6.27 9.72
cv(GM)(%) N3 BFC 9.03 9.09 9.07 9.14 9.11 10.23 9.91 9.96 9.91 10.01 9.93 11.34 15.28 15.15 15.45 15.16 15.45 15.56 17.72 18.14 18.23 18.80 19.81 20.63 4.77 5.52 4.81 5.15 4.95 6.24
H3 9.02 9.04 9.11 9.9 9.91 10.07 15.05 15.03 15.05 17.90 18.45 19.47 5.32 5.58 5.69
Orig. 4.28 6.05 8.78 5.22 6.74 9.24 10.12 10.76 12.29 11.95 12.03 13.32 4.88 5.49 7.50
cv(WM)(%) N3 BFC H3 4.29 4.20 4.20 4.30 4.31 4.32 4.43 5.38 4.42 5.18 5.16 5.13 5.20 5.25 5.29 5.24 6.4 5.57 10.23 10.14 10.08 10.09 10.00 9.99 9.86 10.18 9.83 11.93 12.01 11.87 11.89 12.08 11.75 12.93 13.75 12.13 4.88 5.89 5.17 4.78 4.97 5.11 4.76 5.05 5.13
cjv(GM,WM)(%) Orig. N3 BFC H3 44.85 44.95 44.90 44.98 52.20 45.14 44.85 45.26 66.02 45.56 50.32 46.75 51.56 51.81 51.84 52.30 57.05 51.67 51.29 52.03 69.33 51.67 57.76 53.29 88.04 96.35 89.13 89.71 86.64 98.00 86.30 87.66 91.76 96.47 84.67 88.79 83.19 83.58 87.79 85.85 90.88 89.28 94.71 90.30 106.39 102.81 111.83 99.03 64.89 71.29 76.38 82.68 94.97 71.96 74.24 86.07 163.00 74.16 91.38 87.42
with that of N3 or BFC. It is also noted that the findings here are consistent with those in [12], where the Dawant et al. method [6] was compared with N3. Thus, it can be concluded that, at least for the BrainWeb data set, the automatically generated reference points approximate the IIH map very closely to that obtained by manual selection. Since N3 and H3 were run on the same platform, the system CPU time of the two methods has been compared. For the images tested above, the average CPU time of H3 is 0.9 seconds with a deviation of 0.03 seconds, which indicates that H3 attains nearly real-time IIH correction regardless of data quality. Correspondingly, N3 runs on average in 17 seconds with a deviation of about 33 seconds. Including time for file loading and output, H3 generally executes in less than 2 seconds.
Orig BFC H3 N3
0.04
0.035
0.03
h(r)
0.025
0.02
0.015
0.01
0.005
0
0
50
100
150
200
250
300
r
Fig. 3. Comparison of histograms on a real scan. The flat histogram due to IIH before correction and the effect of histogram sharpening by the three methods are clearly visible.
Besides the simulated images, the proposed method has been applied to about 21 real scans affected by IIH artifact. The data are acquired from Singapore and Japan using T1-weighted spin echo (SE) and spoiled gradient recalled echo (SPGR) sequences. The voxel size ranges from 0.898 × 0.898 × 1.238 mm3 to 1.0 × 1.0 × 3.5 mm3 . In general, image contrast can substantially be improved by the three IIH correction methods, and their effects are very similar. Due to space limit, only one example is shown here. Fig. 3 plots the histogram of a volume (star dotted line), where the overlapping between the GM and the WM components is obvious, as results from the increase of tissue variances (the variance of GM and the variance of WM) due to IIH. In the histograms after IIH correction by three methods, BFC (solid line), H3 (dashed line) and N3 (circle dash-dotted line), the reduction on tissue variances is visible and the multi-modal shape for the tissue distribution is recovered. As an attempt to quantify the correction for real data, a number of WM regions in the corona radiata and corpus callosum have been segmented manually on 52 slices for this volume. Fig. 4 plots the variation of segmented WM mean value over slices (from superior to inferior) before and after IIH correction by
Fig. 4. Variation of segmented white matter mean value vs slices (from superior to inferior) before (star dotted line) and after IIH correction by H3 (dashed line), BFC (solid line) and N3 (circle solid line). H3 attains minimal reduction on the mean value variation across slices due to intensity inhomogeneity.
three methods. One can see that there is little improvement in the variation pattern by N3 (circle solid line) or BFC (solid line), whereas the variation becomes much smoother after H3 correction (dashed line).
4
Conclusion
We have presented a method to correct the intensity inhomogeneity arising in brain MR images. The method automatically identifies reference points at a coarse scale based on order-statistics information, followed by surface fitting and inhomogeneity correction. It has been validated using simulated T1-, T2- and PD-weighted images as well as real scans. The experiments show that the presented method is capable of achieving intensity inhomogeneity correction in nearly real time, as realized by the simple low-pass filtering methods, but with accuracy comparable to more complex methods such as N3 and BFC.
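As a rough illustration of the correction pipeline summarized above, the sketch below fits a low-order polynomial surface to a set of already-selected reference points and divides the slice by the fitted field. The slice-wise 2-D fit, the polynomial order and the multiplicative bias model are assumptions made for brevity; the order-statistics selection of the reference points itself is not reproduced here.

```python
import numpy as np

def fit_bias_surface(points, values, shape, order=2):
    """Least-squares fit of a 2-D polynomial surface to reference points.

    points : (N, 2) array of (x, y) reference-point coordinates
    values : (N,) intensities at those points
    shape  : (ny, nx) of the slice to correct
    """
    x, y = points[:, 0], points[:, 1]
    # Design matrix with all monomials x^i * y^j, i + j <= order.
    terms = [(i, j) for i in range(order + 1) for j in range(order + 1 - i)]
    A = np.column_stack([x**i * y**j for i, j in terms])
    coef, *_ = np.linalg.lstsq(A, values, rcond=None)
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    return sum(c * xx**i * yy**j for c, (i, j) in zip(coef, terms))

def correct_slice(img, points, values):
    # Multiplicative bias model: divide by the fitted field,
    # normalized so the mean intensity is preserved.
    field = fit_bias_surface(points, values, img.shape)
    field = field / field.mean()
    return img / np.maximum(field, 1e-6)
```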
References

1. Wicks, D. A. G., Barker, G. J. and Tofts, P. S.: Correction of intensity nonuniformity in MR images of any orientation. Magnetic Resonance Imaging 11 (1993) 183-196.
2. Alecci, M., Collins, C. M., Smith, M. B. and Jezzard, P.: Radio frequency magnetic field mapping of a 3 Tesla birdcage coil: experimental and theoretical dependence on sample properties. Magnetic Resonance in Medicine 46 (2001) 379-385.
3. Sutton, B. P., Noll, D. C. and Fessler, J. A.: Fast, iterative image reconstruction for MRI in the presence of field inhomogeneities. IEEE Trans. Medical Imaging 22 (2003) 178-188.
4. Brinkmann, B. H., Manduca, A. and Robb, R. A.: Optimized homomorphic unsharp masking for MR grayscale inhomogeneity correction. IEEE Trans. Medical Imaging 17 (1998) 161-171.
5. Gispert, J. D., Reig, S., Pascau, J., Vaquero, J. J., Garcia-Barreno, P. and Desco, M.: Method for bias field correction of brain T1-weighted magnetic resonance images minimizing segmentation error. Human Brain Mapping 22 (2004) 133-144.
6. Dawant, B. M., Zijdenbos, A. P. and Margolin, R. A.: Correction of intensity variations in MR images for computer-aided tissue classification. IEEE Trans. Medical Imaging 12 (1993) 770-781.
7. Likar, B., Viergever, M. A. and Pernus, F.: Retrospective correction of MR intensity inhomogeneity by information minimization. IEEE Trans. Medical Imaging 20 (2001) 1398-1410.
8. Wells, III, W. M., Grimson, W. E. L., Kikinis, R. and Jolesz, F. A.: Adaptive segmentation of MRI data. IEEE Trans. Medical Imaging 15 (1996) 429-442.
9. Sled, J. G., Zijdenbos, A. P. and Evans, A. C.: A nonparametric method for automatic correction of intensity nonuniformity in MRI data. IEEE Trans. Medical Imaging 17 (1998) 87-97.
10. Hou, Z.: A review on MR image intensity inhomogeneity correction. International Journal of Biomedical Imaging, special issue on mathematical modeling (in press).
11. Arnold, J. B., Liow, J., Schaper, K. A., Stern, J. J., Sled, J. G., Shattuck, D. W., Worth, A. J., Cohen, M. S., Leahy, R. M., Mazziotta, J. C. and Rottenberg, D. A.: Qualitative and quantitative evaluation of six algorithms for correcting intensity nonuniformity effects. NeuroImage 13 (2001) 931-943.
12. Sled, J. G., Zijdenbos, A. P. and Evans, A. C.: A comparison of retrospective intensity non-uniformity correction methods for MRI. Lecture Notes in Computer Science 1230 (1997) 459-464.
13. Hou, Z. and Koh, T. S.: Robust edge detection. Pattern Recognition 36 (2003) 2083-2091.
14. Collins, D. L., Zijdenbos, A. P., Sled, J. G., Kabani, N. J., Holmes, C. J. and Evans, A. C.: Design and construction of a realistic digital brain phantom. IEEE Trans. Medical Imaging 17 (1998) 463-468.
15. Shattuck, D. W., Sandor-Leahy, S. R., Schaper, K. A., Rottenberg, D. A. and Leahy, R. M.: Magnetic resonance image tissue classification using a partial volume model. NeuroImage 13 (2001) 856-876.
A Digital Pediatric Brain Structure Atlas from T1-Weighted MR Images

Zuyao Y. Shan 1, Carlos Parra 2, Qing Ji 1, Robert J. Ogg 1, Yong Zhang 1, Fred H. Laningham 3, and Wilburn E. Reddick 1,2

1 Division of Translational Imaging Research, Department of Radiological Sciences, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
2 Department of Biomedical Engineering, The University of Memphis, Memphis, TN 38152, USA
3 Division of Diagnostic Imaging, Department of Radiological Sciences, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
[email protected]
Abstract. Human brain atlases are indispensable tools in model-based segmentation and quantitative analysis of brain structures. However, adult brain atlases do not adequately represent the normal maturational patterns of the pediatric brain, and the use of an adult model in pediatric studies may introduce substantial bias. Therefore, we proposed to develop a digital atlas of the pediatric human brain in this study. The atlas was constructed from the T1-weighted MR data set of a 9-year-old, right-handed girl. Furthermore, we extracted and simplified the boundary surfaces of 25 manually defined brain structures (cortical and subcortical) based on surface curvature. We constructed a 3D triangular mesh model for each structure by triangulation of the structure's reference points. Kappa statistics (cortical, 0.97; subcortical, 0.91) indicated substantial similarity between the mesh-defined and the original volumes. Our brain atlas and structural mesh models (www.stjude.org/brainatlas) can be used to plan treatment, to conduct knowledge- and model-driven segmentation, and to analyze the shapes of brain structures in pediatric patients.
1 Introduction

Atlases based on MR images provide more useful information than do anatomy books in activities such as the planning of neurosurgical interventions, because the medical team can visualize the brain three-dimensionally, because the data are available electronically, and because anatomic shape is based on in vivo images rather than fixed-brain slices [1]. Several atlases of human brain structures based on MR images have been developed for different applications, for example, an early digital three-dimensional brain atlas that was commercially available for the teaching of neuroanatomy [2]. Because this atlas was intended primarily for teaching, it was relatively difficult to use interactively, and the data set from which the atlas was constructed was not accessible for further development. Furthermore, the images that were used had been derived from a patient with localized pathology [3]. Also available commercially was a computerized brain atlas developed specifically for
brain mapping [4]. For use in surgical planning, researchers at the Surgical Planning Laboratory at Brigham and Women's Hospital (Boston) developed a web-interactive and highly detailed brain atlas, which is very good for visualizing brain structures (splweb.bwh.harvard.edu). One of the most prominent examples of brain atlases is the one being constructed by the International Consortium for Brain Mapping (ICBM). In the final product of this project, a human brain atlas will be constructed from MR images, CT data, and post-mortem frozen sections sliced with giant cryomicrotomes. Although the labor-intensive labeling is not yet finished, the consortium has provided an anatomic template based on the MR images of a single subject (www.loni.ucla.edu/ICBM). Due to the limited length of the paper, other brain atlases, including the atlas based on a single-subject brain from the Montreal Neurological Institute (MNI) [5], are not reviewed here. Many atlases have been developed for different applications; however, to the best of our knowledge, no digital atlas of pediatric brain structures has been available until now. A pediatric atlas is needed because the human brain continues to develop throughout childhood and adolescence [6], and the use of an adult template in pediatric neuroimaging studies may introduce significant bias [7]. Kates et al. [8] revised the original Talairach atlas grid to increase its applicability to pediatric brains. Revising the sectors to fit the brains of children resulted in the same or better accuracy; for instance, the sensitivity for lobes improved to 0.79-0.95 compared with manual tracing. However, for small structures such as the brain stem, the mean volume difference between automatic and manual results was 59.4%. A possible reason is that the accuracy of the Talairach system is limited by its fixed grid numbers. Therefore, studies such as the ICBM project described above have been performed to construct adult brain atlases other than the Talairach system. Similarly, we proposed to develop a pediatric brain atlas in this study.
2 Materials and Method

2.1 MR Image

A three-dimensional T1-weighted MR image of a Caucasian, 9-year-old, right-handed girl was used in this study. The weight and height of the girl were 27.27 kg and 132 cm, which place her at the 50th percentile for height and the 40th percentile for weight in the healthy children population (www.cdc.gov/growthcharts). The girl had no history of psychiatric, neurological, or learning problems. The MR imaging had been conducted under an approved protocol and with written informed consent from the patient, parent, or guardian, in accordance with guidelines of the St. Jude Children's Research Hospital Institutional Review Board, the National Cancer Institute, and the Office for Human Research Protection. Retrospective use of the MR data for our study was approved by our Institutional Review Board. The MR data were acquired by using a 1.5-T Symphony (Siemens Medical System, Iselin, NJ) whole-body imager with the standard circular polarized volume head coil. A three-dimensional MPRAGE T1-weighted sequence was used. Sagittal images were acquired by using the following imaging parameters: TR = 1800 ms, TE = 2.74 ms, flip angle = 15˚, FOV = 210 mm × 210 mm, number of slices (contiguous) = 128, slice thickness = 1.25 mm, and matrix = 512 × 512.

We spatially aligned the MR images to a pediatric template by using an affine registration algorithm adopted from the affine registration method provided by SPM2, which is freely available on the web (www.fil.ion.ucl.ac.uk/spm). The objective function used was normalized mutual information [9], maximized using Powell's golden search algorithm [10]. The pediatric brain template was available from Cincinnati Children's Hospital Medical Center (www.irc.cchmc.org) [7]. Templates are available for three age groups: young children (aged 5-9.5 years), older children (aged 9.6-12.9 years), and adolescents (aged 13-18.8 years). We used the pediatric template for young children. Finally, we resampled the MR images into isotropic voxels of 1 mm × 1 mm × 1 mm by using a trilinear algorithm.
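The registration criterion can be made concrete with a short sketch. The following estimates normalized mutual information from the joint intensity histogram, using the common formulation NMI(A, B) = (H(A) + H(B)) / H(A, B); this is an illustration of the criterion, not SPM2's actual implementation, and the bin count is an arbitrary choice.

```python
import numpy as np

def normalized_mutual_information(a, b, bins=64):
    """NMI = (H(A) + H(B)) / H(A, B), estimated from the joint histogram.

    a, b : two images sampled on the same grid (numpy arrays).
    """
    hist, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pab = hist / hist.sum()
    pa, pb = pab.sum(axis=1), pab.sum(axis=0)

    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    # Maximal when the two images are well aligned.
    return (entropy(pa) + entropy(pb)) / entropy(pab.ravel())
```

A search over the affine parameters (e.g. with Powell's direction-set method, scipy.optimize.minimize(..., method='Powell')) would then maximize this score; resampling the moving image under each candidate transform is omitted here.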
2.2 Brain Structure Delineation

Brain structures were manually delineated by an experienced investigator with the help of anatomic books [11], previous reports [12, 13] and web-based interactive atlases of adult brains (www9.biostr.washington.edu/da.html and splweb.bwh.harvard.edu). The custom in-house software package used to delineate the structures allowed us to draw structures on sagittal and transverse slices and to view them in three orthogonal slices simultaneously. All 25 brain structures were then reviewed by an experienced radiologist for accuracy and modified where necessary. The left and right hemispheres were divided by the longitudinal fissure. Boundaries for subcortical structures were identical to those described in the references. However, white matter (WM) was included in the lobe segmentation, resulting in boundaries that differ slightly from those previously reported. Our rationale for including the WM in the lobe segmentation is addressed in the Discussion.

Frontal Lobe. The boundaries for the frontal lobe were defined as follows. The lateral and anterior boundaries were defined by the natural limits on every image section. On sections of the superior surface where the central sulcus intersects the longitudinal fissure, the posterior boundary was set at the central sulcus, and the inner boundary was defined by the longitudinal fissure (Fig. 1a). On sections inferior to those listed above but superior to the sections containing the ventricles, the posterior boundary was defined by the central sulcus and a vertical imaginary plane that connected the end of the central sulcus and the longitudinal fissure or cingulate cortex; the inner boundary was defined by the longitudinal fissure or cingulate cortex (Fig. 1b). On sections inferior to those listed above but superior to the sections containing the insula, the boundaries were similar to those in the previous sections except where the corpus callosum or ventricle starts. In these sections, the inner boundary was defined by the longitudinal fissure, cingulate cortex, corpus callosum, and ventricle (Fig. 1c). On sections containing the insula, the inner and posterior boundaries were defined by the insula, an imaginary plane that connected the anterior tip of the insula and the posterior tip of the corpus callosum, the cingulate cortex, and the longitudinal fissure (Fig. 1 d-g).

Parietal Lobe. The parietal lobe (PL) was defined by using the following boundaries. The anterior boundary was defined by the central sulcus and an imaginary plane that dropped vertically from the end of the central sulcus (Fig. 1 b, c, and d). The superior
Fig. 1. Manual delineation of brain structures. The manually defined volumes of interest are illustrated on selected transverse slices. The representative slices are selected from 256 contiguous slices, and the structures on the left hemisphere were selected for illustration purposes.
boundary was defined by the natural limit on every image section. The inferior boundary was defined by the lateral sulcus and an imaginary plane that extended horizontally from the point where the lateral sulcus changes from a horizontal to a vertical orientation (Fig. 1e). The posterior boundary was defined by the parietal-occipital sulcus (Fig. 1 b and c) and an imaginary plane that projected from the superior tip of the parietal-occipital sulcus to the preoccipital notch on image sections where the parietal-occipital sulcus disappears (Fig. 1 d-f). Because the preoccipital notch was very difficult to identify on the MR image, we used the posterior tip of the inferior temporal gyrus as an approximation of its location. The inner boundary of the PL was defined by the longitudinal fissure (Fig. 1a), the cingulate cortex (Fig. 1b), the ventricle and corpus callosum (Fig. 1c), and the insula and an imaginary plane that projected from the posterior tip of the insula to the ventricle (Fig. 1d).

Temporal Lobe. The temporal lobe was defined by using the following boundaries. The superior and posterior boundaries were defined by the inferior boundary of the PL described above. The inner boundary was defined by the insula, an imaginary plane that projected from the posterior tip of the insula to the ventricle (Fig. 1e), the ventricle, the corpus callosum, and the hippocampus. The inferior boundary was defined by the natural limit within each imaging section.

Occipital Lobe. The occipital lobe was defined by using the following boundaries. The anterior boundary was defined by the posterior boundary of the PL described
above. The superior, posterior, and inferior boundaries were defined by the natural limit within each imaging section.

2.3 Triangular Mesh Models

The manually defined structures were further developed into individual triangulated mesh models. The boundaries of the structures were extracted by using a 26-member neighborhood. The boundaries were then simplified on the basis of surface curvature. The boundary points of each structure were scanned in order (X, Y, and Z coordinates). The first voxel encountered was defined as the vertex of the boundary and as the first focal point. For each focal point s, the other boundary points (N = N − {s}, where N is the set of all boundary points) were first divided into groups of neighborhoods with different radii r_i, defined by the surface distance from the points to the focal point. Then, beginning with the group with the smallest radius, if the normal angle of the group with radius r_i (N_g) was smaller than the threshold angle, all points in this group were removed (the boundary points N were updated here, N = N − N_g) and the group with radius r_{i+1} was evaluated. If the normal angle of the group was equal to or larger than the threshold angle, the point with the smallest X, Y, and Z coordinates in this group was selected as the next focal point. These steps were repeated until a focal point was reached that had no neighbors (N = Ø), regardless of the distance. The normal angle of a group is defined as the average angle between the directions of any pair of normals within this group:
$$ \alpha_g = \frac{1}{C_m^2} \sum_{i,j=1}^{m} \cos^{-1}\!\left[ \frac{(\vec{sn}_i \times \vec{sn}_{i+1}) \cdot (\vec{sn}_j \times \vec{sn}_{j+1})}{\|\vec{sn}_i \times \vec{sn}_{i+1}\| \, \|\vec{sn}_j \times \vec{sn}_{j+1}\|} \right] \qquad (1) $$
where s is the focal point, n_i and n_j are points in this group, m is the number of points in the group, ‖·‖ is the vector length operator, and C_m^2 is the number of possible angles between normal directions. The threshold angle determines the number of reference points and the model's representation of the manually defined structures. Larger threshold angles result in fewer reference points and lower accuracy, and vice versa. We determined the threshold angle experimentally; 30 degrees was selected in this study because this threshold angle resulted in a smaller number of reference points while maintaining acceptable accuracy (kappa index of 0.90 or higher). The simplified reference points were then triangulated by using a process similar to the Voronoi graph method [14]. To evaluate how well the triangular mesh models represented the true structures, we defined the volumes of the structures based on the mesh models. The similarities between mesh-defined volumes and manually delineated volumes were evaluated by calculation of a kappa index for each individual structure model:
$$ \kappa(S_a, S_b) = \frac{2\,|S_a \cap S_b|}{|S_a| + |S_b|} \qquad (2) $$
where Sa refers to the structure defined by the mesh models and Sb refers to the manually
delineated structure.
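Equation (2) is the familiar Dice-style overlap and can be evaluated directly on binary volumes; a minimal sketch (array names are illustrative):

```python
import numpy as np

def kappa_index(mesh_volume, manual_volume):
    """Kappa index of Eq. (2) between two binary volumes."""
    a = mesh_volume.astype(bool)
    b = manual_volume.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())
```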
3 Results

Figure 1 clearly demonstrates the dependencies, relationships, and positioning of the boundary conditions defined for each of the 25 cortical and subcortical regions. A semi-transparent surface rendering of these manually delineated brain structures is provided in Figure 2. The intra-rater similarity between two sessions for the left frontal lobe was 0.98. These manually delineated structures formed the basis for the individual triangular mesh models. After validation and refinement of these models by the neuroradiologist, we generated the triangular mesh models shown in Figure 3. We then visually inspected the resulting triangular mesh models to check the validity of node assignment and triangle formation as representative of the true surface, which had been manually delineated.
Fig. 2. Semitransparent surface of manually defined brain structures. The structures on either hemisphere were selected for illustration purposes. (a) Right lateral view with all structures. (b) Left lateral view with all structures. (c) Right lateral view with subcortical structures (insula was excluded to make it easier to see the putamen and thalamus).
Fig. 3. Triangular mesh models of brain structures. The structures on either hemisphere were selected for the purpose of illustration. (a) Right lateral view with all structures. (b) Left lateral view with all structures.
The similarities between manually defined structures and models were assessed by calculation of the kappa statistic. The average kappa index for cortical structures (frontal lobe, parietal lobe, temporal lobe, occipital lobe, cerebellum, and brain stem) was 0.97; it ranged from a low of 0.92 in the brain stem to 0.97 or 0.98 in all other cortical regions. The average kappa index for the subcortical structures (amygdala, caudate, hippocampus, thalamus, insula, putamen, cingulate cortex, and corpus callosum) was 0.91; it ranged from a low of 0.87 in the hippocampus to highs of 0.95 in the thalamus and cingulate cortex.
4 Discussion

We have included the WM in the lobes during manual tracing of the brain, although anatomic textbooks define the lobes as gray matter [11]. Our rationale was two-fold. First, if future studies require exclusion of the WM from the lobes, the WM can be automatically removed by tissue segmentation methods. In contrast, manually tracing the WM/GM convolution would have been a tedious procedure and one prone to error. Second, WM is generally associated with specific lobes during the planning of radiation therapy and in neuroimaging studies. WM tissue is made up of fibers that convey impulses to and from the lobes (projection fibers), between different lobes (association fibers), and to the same lobe in the contralateral hemisphere (commissural fibers). The inclusion of WM in the definition of the lobes does not consider the integrity of WM fibers; however, it does provide a rough estimate of the WM fibers associated with individual lobes. We are developing a knowledge-guided active model (KAM) method for cortical structure segmentation on pediatric MR images based on this pediatric brain atlas. Triangular mesh models will be transformed to the images of a given subject by maximizing entropy (normalized mutual information), and then actively slithered to the boundaries of structures by minimizing enthalpy. Our experience has suggested that the atlas can be applied for rough cortical structure segmentation by a registration method and for more accurate individualized segmentation by an active model method [15]. In summary, we have constructed a digital pediatric brain atlas and its triangular mesh models for 25 cortical and subcortical brain structures, which are publicly available on the Internet (www.stjude.org/brainatlas). The atlas differs from previous brain atlases in that it was developed specifically for young children and in that the WM has been roughly divided into WM associated with a specific lobe, WM that connects different regions, WM that interconnects the two hemispheres, and WM that connects all lobes with subcortical structures. The atlas and its mesh models can be used in knowledge-based pediatric brain structure segmentation, shape analysis, and treatment planning.

Acknowledgements. This work is supported by a Thrasher Research Fund award to Dr. Zuyao Shan, by the Cancer Center Support Grant (CA21765) from the National Cancer Institute, and by the American Lebanese Syrian Associated Charities (ALSAC).
References

1. Ganser K.A., Dickhaus H., Metzner R., et al: A deformable digital brain atlas system according to Talairach and Tournoux. Medical Image Analysis 8 (2004) 3-22
2. Pommert A., Riemer M., Schiemann T., et al: Knowledge-based and 3D imaging systems in medical education. Information Processing '94, Vol. II, IFIP Trans. A - Computer Sci. and Tech. 52 (1994) 525-532
3. Kikinis R., Shenton M.E., Iosifescu D.V., et al: A digital brain atlas for surgical planning, model-driven segmentation, and teaching. IEEE Trans. Visualization and Computer Graphics 2 (1996) 232-241
4. Thurfjell L., Bohm C., Bengtsson E.: CBA - an atlas-based software tool used to facilitate the interpretation of neuroimaging data. Computer Methods and Programs in Biomedicine 47 (1995) 51-71
5. Tzourio-Mazoyer N., Landeau B., Papathanassiou D., et al: Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage 15 (2002) 273-289
6. Paus T., Zijdenbos A., Worsley K., et al: Structural maturation of neural pathways in children and adolescents: in vivo study. Science 283 (1999) 1908-1911
7. Wilke M., Schmithorst V.J., Holland S.K.: Normative pediatric brain data for spatial normalization and segmentation differs from standard adult data. Magnetic Resonance in Medicine 50 (2003) 749-757
8. Kates W.R., Warsofsky I.S., Patwardhan A., et al: Automated Talairach atlas-based parcellation and measurement of cerebral lobes in children. Psychiatry Research: Neuroimaging 91 (1999) 11-30
9. Maes F., Collignon A., Vandermeulen D., et al: Multimodality image registration by maximization of mutual information. IEEE Trans. Med. Imaging 16 (1997) 187-198
10. Press W.H., Vetterling W.T., Flannery B.P.: Numerical Recipes in C, 2nd ed. Cambridge University Press, New York (1992)
11. Carpenter M.B.: Core Text of Neuroanatomy, 3rd ed. Williams & Wilkins, Baltimore, MD (1985)
12. Kim J.J., Crespo-Facorro B., Andreasen N.C., et al: An MRI-based parcellation method for the temporal lobe. Neuroimage 11 (2000) 271-288
13. Crespo-Facorro B., Kim J.J., Andreasen N.C., et al: Human frontal cortex: an MRI-based parcellation method. Neuroimage 10 (1999) 500-519
14. de Berg M., van Kreveld M., Overmars M., et al: Computational Geometry: Algorithms and Applications, 2nd rev. ed. Springer-Verlag, Berlin, Germany (2000)
15. Shan Z.Y., Parra C., Ji Q., Jain J., Reddick W.E.: A knowledge-guided active model method of cortical structure segmentation on pediatric MR images. J Magn Reson Imaging, in press
Discriminative Analysis of Early Alzheimer's Disease Based on Two Intrinsically Anti-correlated Networks with Resting-State fMRI

Kun Wang 1, Tianzi Jiang 1, Meng Liang 1, Liang Wang 2, Lixia Tian 1, Xinqing Zhang 2, Kuncheng Li 2, and Zhening Liu 3

1 National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100080, China
[email protected]
2 Department of Radiology, Neurology, Xuanwu Hospital of Capital University of Medical Science, Beijing 100053, China
3 Institute of Mental Health, Second Xiangya Hospital, Central South University, Changsha 410011, Hunan, China
Abstract. In this work, we proposed a discriminative model of Alzheimer's disease (AD) on the basis of multivariate pattern classification and functional magnetic resonance imaging (fMRI). This model used the correlation/anti-correlation coefficients of two intrinsically anti-correlated networks in resting brains, which have been suggested by two recent studies, as the classification feature. Pseudo-Fisher Linear Discriminative Analysis (pFLDA) was then performed on the feature space and a linear classifier was generated. Using leave-one-out (LOO) cross-validation, our results showed a correct classification rate of 83%. We also compared the proposed model with another one based on whole-brain functional connectivity. Our proposed model outperformed the other one significantly, which implies that the two intrinsically anti-correlated networks may be a more susceptible part of the whole brain network in the early stage of AD.
1
Introduction
Alzheimer's disease (AD) is the most common type of dementia associated with aging. The increasing number of people suffering from AD makes it a major public health concern. If the disease is diagnosed at an earlier stage, AD patients can live an average of eight to twenty years. Therefore, a high-quality and objective discriminative approach distinguishing early AD patients from healthy aging individuals may be of great importance for the clinical early diagnosis of AD. Clinical observations of AD patients revealed that they often had great difficulty in performing everyday tasks at the very early stage of the disease. They were often described as being distractible and unable to concentrate when performing tasks that were previously easily done. These observations suggested that AD patients have attention deficits, and the attention deficits may
be one of the possible factors underlying other cognitive deficits at the early stage of the disease [6,19]. From this perspective, attention deficits may also be seen as an early feature of AD. Considering that the attention process not only exists when people perform specific tasks but is also maintained during the resting state, it is reasonable to hypothesize that the attention deficits of AD patients may have some expression even during the resting state. Recently, using resting-state fMRI, two pioneering studies by Fox et al. and Fransson identified two diametrically opposed brain networks on the basis of both correlations within each network and anti-correlations between networks [13,16]. They suggested that one of the two networks was a "task-positive" network associated with focused attention/goal-directed behavior, and the other was a "task-negative" network associated with stimulus-independent thought. They also proposed that the anti-correlations between the two networks might be interpreted as competition between focused attention and processes subserving stimulus-independent thought. From this perspective, their findings offered a network model of attention in the human brain and suggested that it is an intrinsic organization of brain function. In this paper, we used these two anti-correlated networks as a discriminative model of the brain functional deficits in early AD patients. In this work, we hypothesized that the two anti-correlated networks may be abnormal in early AD patients and that their alterations could distinguish AD patients from elderly healthy controls. To test this hypothesis, we first obtained the two intrinsically anti-correlated networks using an anatomically labeled template. Then we analyzed the alterations of the two networks in early AD patients compared with the elderly healthy controls. In the end, we took the correlation/anti-correlation coefficients of all pairs of nodes within the two networks as the classification feature, and proposed a discriminative approach based on Fisher Linear Discriminative Analysis (FLDA) to distinguish early AD patients from the elderly controls. MRI is noninvasive and has become more and more widely available in hospitals. In addition, resting-state fMRI has the advantage of easier manipulation relative to task-driven methods, especially for patients. Hence a discriminative approach based on resting-state fMRI may have potential in clinical applications.
2
Materials
To reduce the impact of head motion on the calculation of functional connectivity, we excluded subjects with greater than 1 mm maximum displacement in any of the x, y, or z directions or greater than 1 degree of angular rotation. 14 AD patients and 14 elderly healthy controls remained for further analysis. The AD patients and the elderly healthy controls were matched in gender, age and education level. However, the Mini-Mental State Exam (MMSE) scores of the AD patients were significantly lower than those of the healthy controls (23.1±2.6 for patients, 28.8±1.0 for controls). The diagnosis of the AD patients fulfilled the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition criteria for dementia
(DSM-IV) [1], and the National Institute of Neurological and Communicative Disorders and Stroke/Alzheimer's Disease and Related Disorders Association (NINCDS-ADRDA) criteria for AD [7]. According to the results of a clinical progression and neuropathological study by Morris et al. [8], all the patients used here can be considered to be in the early stage of AD. The AD patients were free of other diseases, and the healthy controls were free of any medical, neurological or psychiatric disorders. During the data acquisition, subjects were instructed to keep their eyes closed, relax their minds, move as little as possible, and stay awake at all times. The imaging of the AD patients and elderly healthy controls was performed on a 1.5 Tesla scanner. Echo Planar Imaging (EPI) Blood Oxygen Level Dependent (BOLD) images of the whole brain were acquired axially with the following parameters: 2000/60 ms (TR/TE), 20 slices, 90 degree (flip angle), 24 cm (FOV), 5/2 mm (thickness/gap), 64×64 (resolution). The fMRI scanning lasted for 6 minutes and 180 volumes were obtained. For each subject, the first 10 volumes were discarded to allow for scanner stabilization. Preprocessing of the fMRI signals included time alignment across slices, motion correction, spatial normalization, and resampling of the voxels to 3×3×3 mm3. All the above preprocessing steps were performed using SPM2 [12]. We also used a linear regression process to further remove the effects of head motion and other possible sources of artifacts [13]: (1) the six motion parameters, (2) the whole-brain signal averaged over the entire brain, (3) the linear drift. In the end, the fMRI waveform of each voxel was temporally band-pass filtered (0.01 Hz < f < 0.08 Hz) using AFNI (http://afni.nimh.nih.gov/). As described in Section 3, we pre-selected some regions in an anatomically labeled template to construct the two intrinsically anti-correlated networks. In order to test the validity of these regions, we also used another dataset of 17 young healthy participants (12 males and 5 females, age 25.71±5.62 years). The imaging was likewise performed on a 1.5 Tesla scanner, and the data acquisition parameters and the preprocessing procedure were similar to those of the AD patients and elderly healthy controls.
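For illustration, the nuisance-removal and band-pass steps described above can be sketched as follows; the actual analysis used SPM2 and AFNI, so this code, its matrix layout and the filter order are assumed re-expressions of those steps rather than the authors' implementation.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def remove_nuisance(ts, motion):
    """Regress out the six motion parameters, the whole-brain mean
    signal and a linear drift from a (time x voxels) matrix."""
    T = ts.shape[0]
    X = np.hstack([np.ones((T, 1)),                      # intercept
                   motion,                               # (T, 6) parameters
                   ts.mean(axis=1, keepdims=True),       # global signal
                   np.arange(T, dtype=float).reshape(-1, 1)])  # drift
    beta, *_ = np.linalg.lstsq(X, ts, rcond=None)
    return ts - X @ beta

def bandpass(ts, tr=2.0, low=0.01, high=0.08):
    """Temporal band-pass (0.01-0.08 Hz) along the time axis; TR = 2 s."""
    nyq = 0.5 / tr
    b, a = butter(2, [low / nyq, high / nyq], btype='band')
    return filtfilt(b, a, ts, axis=0)
```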
3
The Two Intrinsically Anti-correlated Networks
In order to automatically obtain the two intrinsically anti-correlated networks, we used an anatomically labeled template previously reported by Tzourio-Mazoyer et al. [15]. This template divides the whole brain into 116 regions: 90 regions in the cerebrum and 26 regions in the cerebellum. According to the results of Fox et al. and Fransson [13,16], we selected 5 pairs of bilateral homologous regions in the "task-positive" network and 6 pairs of bilateral homologous regions in the "task-negative" network. The regions and their abbreviations are listed in Table 1. We obtained the mean time series of each of the 22 regions by averaging the fMRI time series over all voxels in the region. Correlation coefficients were computed between each pair of regions. Then Fisher's r-to-z transformation was applied to improve the normality of these correlation coefficients [14,21].
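Concretely, the feature construction reduces to region averaging, pairwise correlation, and Fisher's r-to-z transform z = arctanh(r). A sketch, assuming `ts` is the preprocessed (time × voxels) matrix from the previous section and `atlas` assigns one of the 22 selected region labels to each voxel (both names are illustrative):

```python
import numpy as np

def connectivity_features(ts, atlas, labels):
    """Correlation/anti-correlation features between labelled regions."""
    # Mean time series per region.
    means = np.column_stack([ts[:, atlas == l].mean(axis=1) for l in labels])
    r = np.corrcoef(means, rowvar=False)      # (22, 22) correlation matrix
    iu = np.triu_indices(len(labels), k=1)    # 231 unique region pairs
    return np.arctanh(r[iu])                  # Fisher r-to-z transform
```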
Table 1. The selected regions in the two anti-correlated networks and their abbreviations used in this paper

Task-positive network
  Precentral gyrus           PRECG
  Superior frontal gyrus     SFG
  Middle frontal gyrus       MFG
  Supplementary motor areas  SMA
  Inferior parietal gyrus    IPG

Task-negative network
  Anterior cingulate gyrus   ACC
  Posterior cingulate gyrus  PCC
  Hippocampus                HIP
  Parahippocampal gyrus      PHIP
  Angular gyrus              ANG
  Precuneus                  PCU
Because we used a labeled template to automatically obtain the two intrinsically anti-correlated networks, these regions were unlikely to exactly match those suggested by Fox et al. and Fransson [13,16]. Therefore, we had to test whether the selected regions could represent the two intrinsic networks on the basis of both correlations within each network and anti-correlations between networks. Taking into account possible alterations of the two networks due to aging, we first tested the validity of the automatically obtained networks on the data of the 17 young healthy participants. The results are shown in Figure 1 and Table 2. The correlation coefficients between each pair of regions were entered into a one-sample two-tailed t test to determine the significant correlations (p<0.05, corrected for multiple comparisons [22]). Within the "task-positive" network, all 23 significant correlations were positive. Within the "task-negative" network, all 31 significant correlations were positive. Between the "task-positive" network and the "task-negative" network, 46 correlations were significantly negative, 5 correlations were significantly positive, and the others were not significant. This result suggested that the two automatically obtained networks were mainly positively correlated within each network and negatively correlated between networks. Therefore, the two networks formed by the selected regions could be used as a representation of the two intrinsically anti-correlated networks proposed by Fox et al. and Fransson [13,16]. We then used these pre-selected regions to construct the two intrinsically anti-correlated networks in the AD patients and elderly healthy controls, respectively. As shown in Table 2, the number of significant correlations/anti-correlations in both AD patients and elderly healthy controls was clearly smaller than in the young healthy participants. As to the AD dataset, the number of significant correlations within the "task-positive" network in AD patients was larger than that of the elderly healthy controls. This is possibly because the regions in the task-positive network are mainly distributed in the prefrontal lobe. Many studies have reported increased functional connectivity associated with the prefrontal regions in AD patients compared with elderly controls [2,3,4]. The increase of prefrontal functional connectivity has been interpreted as a compensatory reallocation or recruitment of cognitive resources [13]. The number of significant
Fig. 1. Graphic representations of the significant correlations within the "task-positive" network/"task-negative" network and the anti-correlations between the two networks in the dataset of young healthy participants. (A) The significant correlations within the "task-positive" network. (B) The significant correlations within the "task-negative" network. (C) The significant anti-correlations between the two networks. P<0.05 (corrected for multiple comparisons using the method described by Benjamini and Yekutieli [22]) was used to determine the significant correlations/anti-correlations.

Table 2. The number of significant positive correlations within each network and significant negative correlations between the two networks

Subjects                     NSPC in the          NSPC in the          NSNC between
                             task-positive        task-negative        the two
                             network              network              networks
Young healthy participants   23                   31                   46
Elderly healthy controls     16                   19                   22
Early AD patients            19                   18                   16

NSPC: the number of significant positive correlations; NSNC: the number of significant negative correlations.
correlations within the "task-negative" network in AD patients was similar to that of the elderly healthy controls, but the number of significant anti-correlations between the two networks was smaller than that of the elderly healthy controls. Fox and colleagues suggested that the anti-correlations of the two networks might be interpreted as competition between focused attention and processes subserving stimulus-independent thought [13]. Fransson proposed that the two anti-correlated networks reflect a recurring switch between an introspectively versus an extrospectively oriented attentional state of mind [16]. From this view, the decreased number of significant anti-correlations indicates that when AD patients try to concentrate on something they may have difficulty in effectively inhibiting other random thoughts, which may partly explain the attention deficits of AD patients. The above results indicated that the number of correlations/anti-correlations of the two intrinsic networks was altered in early AD patients compared with elderly healthy controls. In the next section we use the correlation/anti-correlation coefficients of the two networks as the classification feature to discriminate early AD patients from elderly healthy controls.
4
Pseudo-Fisher Linear Discriminative Analysis
The primary purpose of Fisher Linear Discriminative Analysis (FLDA) is to discriminate samples of different groups by maximizing the ratio of between-class separability to within-class variability [5,17,18]. The between-class scatter matrix is defined as formula (1) and the within-class scatter matrix as formula (2):

$$ S_b = (m_1 - m_2)(m_1 - m_2)^T \qquad (1) $$

$$ S_w = \sum_{i=1}^{N_1} (x_i^1 - m_1)(x_i^1 - m_1)^T + \sum_{i=1}^{N_2} (x_i^2 - m_2)(x_i^2 - m_2)^T \qquad (2) $$

where $x_i^1$ and $x_i^2$ are the n-dimensional feature vectors of each sample. In this study the feature is the vector of the correlation coefficients and anti-correlation coefficients of the two intrinsic networks obtained in Section 3; $m_1$ and $m_2$ are the mean feature vectors of each group; $N_1$ and $N_2$ are the sample sizes of each group. The main objective of FLDA is to find a projective direction $\omega$ that maximizes the objective function (3):

$$ J(\omega) = \frac{\omega^T S_b \omega}{\omega^T S_w \omega} \qquad (3) $$

Theoretically, the optimal direction can be determined by:

$$ \omega^* = S_w^{-1}(m_1 - m_2) \qquad (4) $$

This is the standard FLDA procedure. However, in many brain image analyses, the number of training observations is very limited compared to the dimension of the feature space ($N_1 + N_2 \ll n$). In this condition, computing the inverse matrix of $S_w$ is an ill-posed problem, and the standard FLDA may therefore give unreliable results. To solve this problem, we used Pseudo-Fisher Linear Discriminative Analysis (pFLDA), which first applies a Principal Component Analysis (PCA) to the sample feature $x \in \mathbb{R}^n$ to obtain a low-dimensional feature $x' \in \mathbb{R}^{n'}$ ($n' = N_1 + N_2 - 1$). The standard FLDA procedure is then used in the low-dimensional feature space to find $\omega^* \in \mathbb{R}^{n'}$ [11,20]. In the end, we can project each sample $x' \in \mathbb{R}^{n'}$ onto the direction $\omega^*$ to obtain a discriminative score $z$:

$$ z = \omega^{*T} x' \qquad (5) $$

The classification threshold $z_0$ is determined by:

$$ z_0 = (N_1 m_{z1} + N_2 m_{z2})/(N_1 + N_2) \qquad (6) $$

where $m_{z1}$ and $m_{z2}$ are the mean values of the discriminative scores of the two classes in the training set.
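For concreteness, Eqs. (1)-(6) can be combined into a compact sketch. This is an illustrative reimplementation, not the authors' code; the function names are ours, and the pseudo-inverse used in place of $S_w^{-1}$ (the usual resolution of the remaining rank deficiency in pseudo-Fisher analysis) is our choice.

```python
import numpy as np

def train_pflda(X1, X2):
    """pFLDA training. X1: (N1, n) class-1 features; X2: (N2, n) class-2."""
    X = np.vstack([X1, X2])
    mean = X.mean(axis=0)
    # PCA via SVD; keep n' = N1 + N2 - 1 components.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    P = Vt[: X.shape[0] - 1]
    Y1, Y2 = (X1 - mean) @ P.T, (X2 - mean) @ P.T
    m1, m2 = Y1.mean(axis=0), Y2.mean(axis=0)
    # Within-class scatter, Eq. (2); pseudo-inverse in place of S_w^-1.
    Sw = (Y1 - m1).T @ (Y1 - m1) + (Y2 - m2).T @ (Y2 - m2)
    w = np.linalg.pinv(Sw) @ (m1 - m2)                       # Eq. (4)
    z1, z2 = Y1 @ w, Y2 @ w                                  # Eq. (5)
    z0 = (len(z1) * z1.mean() + len(z2) * z2.mean()) / (len(z1) + len(z2))  # Eq. (6)
    return mean, P, w, z0, z1.mean() > z0

def predict_pflda(model, x):
    """Classify one feature vector as class 1 or 2."""
    mean, P, w, z0, class1_above = model
    z = ((x - mean) @ P.T) @ w
    return 1 if (z > z0) == class1_above else 2
```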
5
Experimental Results
The generalization performance of the classifier was evaluated by using a leave-one-out (LOO) cross-validation approach. The leave-one-out approach has been widely used as a reliable estimator of the true generalization performance, especially when the sample size is very limited. The classification results are listed in the top row of Table 3. The correct prediction rates for the AD patients and the elderly healthy controls were 93% and 72%, respectively, and the total correct prediction rate reached 83%.

Table 3. Classification results

Discriminative model                          LOO test correct rate
                                              Elderly controls   AD patients   Total
Two intrinsically anti-correlated networks    72%                93%           83%
The whole brain network                       78%                43%           61%
We then compared the discriminative ability of proposed model with the whole brain network model which used the correlation/anti-correlation coefficients of all pairs of the 116 regions as the feature. pFLDA was also applied to the new feature space and the results were shown in the second row of Table 3. We can see that its classification rate (61%) was obviously lower than that of the proposed classifier (83%). This result suggested that the discriminative model based on the two intrinsically anti-correlated networks was more effective than the whole brain network model. More importantly, this implied that the two intrinsically anti-correlated networks may be a more susceptible part of the whole brain network of AD patients in their early stage of the disease.
6
Conclusions
In this paper, we used a template to automatically obtain the two intrinsically anti-correlated networks that have been suggested by Fox et al. and Fransson [13,16]. Though not exactly the same, our results confirmed the existence of the two intrinsically anti-correlated networks, on the basis of correlations within each network and anti-correlations between networks, in a different set of subjects. In addition, we found alterations of the correlations/anti-correlations associated with the two networks in early AD patients compared with age-matched elderly healthy controls. We then used the correlation/anti-correlation coefficients as the feature to discriminate the AD patients from the healthy controls. The results indicated that the proposed classification method based on the two intrinsically anti-correlated networks may be an effective and promising tool for the discrimination of early AD patients. As resting-state fMRI is non-invasive, objective and more easily administered, especially for patients, it may have the potential to improve the current diagnosis and treatment evaluation of AD. Future work will involve the evaluation of the proposed method
with larger sample sizes and multi-center imaging data. Moreover, we believe that the combination with other features may add more valuable information and possibly yield better classification performance.

Acknowledgement. This work was partially supported by the Natural Science Foundation of China, Grant Nos. 30425004 and 60121302, and the National Key Basic Research and Development Program (973), Grant No. 2003CB716100.
References

1. American Psychiatric Association: DSM-IV: Diagnostic and Statistical Manual of Mental Disorders, 4th ed. Am. Psychiatric Assoc. Press, Washington, DC, 1994.
2. B. Horwitz, A. R. McIntosh, J. V. Haxby et al. Neuroreport, 6: 2287-2292, 1995.
3. C. L. Grady, A. R. McIntosh, S. Beig et al. J Neurosci, 23: 986-993, 2003.
4. C. L. Grady, M. L. Furey, P. Pietrini et al. Brain, 124: 739-756, 2001.
5. C. Z. Zhu, Y. F. Zang, M. Liang et al. MICCAI 2005, LNCS 3750: 468-475, 2005.
6. D. A. Balota, M. Faust. Attention in dementia of the Alzheimer's type. In: Handbook of Neurology, Volume 6, Aging and Dementia, 2nd ed., F. Boller and S. Cappa, eds. Elsevier, Amsterdam, pp. 51-80, 2001.
7. G. McKhann, D. Drachman, M. Folstein et al. Neurology, 34: 939-944, 1984.
8. J. C. Morris, M. Storandt, J. P. Miller et al. Arch Neurol, 58: 397-405, 2001.
9. J. L. Woodard, S. T. Grafton, J. R. Voltaw et al. Neuropsychology, 12: 491-504, 1998.
10. J. T. Becker, M. A. Mintun, K. Aleva et al. Neurology, 46: 692-700, 1996.
11. J. Yang and J. Y. Yang. Pattern Recognition, 36: 563-566, 2003.
12. K. J. Friston, A. P. Holmes, K. J. Worsley et al. Hum Brain Mapp, 2: 189-210, 1995.
13. M. D. Fox, A. Z. Snyder, J. L. Vincent et al. Proc Natl Acad Sci USA, 102: 9673-9678, 2005.
14. M. J. Lowe, B. J. Mock, J. A. Sorenson. NeuroImage, 7: 119-132, 1998.
15. N. Tzourio-Mazoyer, B. Landeau, D. Papathanassiou et al. Neuroimage, 15: 273-289, 2002.
16. P. Fransson. Hum Brain Mapp, 26: 15-29, 2005.
17. P. N. Belhumeur, J. P. Hespanha and D. J. Kriegman. IEEE Trans. PAMI, 19: 711-720, 1997.
18. R. Duda, P. Hart, and D. Stork. Pattern Classification. John Wiley & Sons, New York, 2001.
19. R. J. Perry, J. R. Hodges. Brain, 122: 383-404, 1999.
20. S. Raudys and R. P. W. Duin. Pattern Recognition Letters, 19: 385-392, 1998.
21. W. H. Press, S. A. Teukolsky, W. T. Vetterling et al. Numerical Recipes in C, 2nd ed. Cambridge University Press, Cambridge, U.K., 1992.
22. Y. Benjamini, Y. Yekutieli. Ann Statist, 29: 1165-1188, 2001.
Rawdata-Based Detection of the Optimal Reconstruction Phase in ECG-Gated Cardiac Image Reconstruction

Dirk Ertel 1, Marc Kachelrieß 1, Tobias Pflederer 2, Stephan Achenbach 2, Robert M. Lapp 3, Markus Nagel 1, and Willi A. Kalender 1

1 Institute of Medical Physics, University of Erlangen-Nürnberg, Germany
[email protected]
2 Department of Internal Medicine II, University of Erlangen-Nürnberg, Germany
3 VAMP GmbH, Erlangen, Germany
Abstract. In order to achieve diagnostically useful CT (computed tomography) images of the moving heart, the standard image reconstruction has to be modified to a phase-correlated reconstruction, which takes the motion phase of the heart into account and generates a quasi-static image in one defined motion phase. For this purpose a synchronization signal is needed, typically a concurrent ECG recording. Commonly, the reconstruction phase is adapted by the user to the patient-specific heart motion to improve image quality and thus diagnostic value. The purpose of our work is to automatically identify the optimal reconstruction phase for cardiac CT imaging with respect to motion artifacts. We provide a solution for a patient- and heart rate-independent detection of the phase in the cardiac cycle that shows a minimum of cardiac movement. We validated our method by correlating it with the reconstruction phase selected visually on the basis of ECG-triggering and used for the medical diagnosis. The mean difference between the two reconstruction phases was 12.5% with respect to a whole cardiac motion cycle, indicating a high correlation. Additionally, reconstructed cardiac images are shown which confirm the results expressed by the correlation measurement and in some cases even indicate an improvement using the proposed method.
1 Introduction

In general, the possibility of imaging moving objects in a quasi-static way is strongly dependent on the temporal resolution of the imaging system. In the special case of cardiac CT, the limited temporal resolution leads to motion artifacts when standard image reconstruction algorithms are used (cp. Figure 1). Under the assumption of a quasi-periodic motion of the object, the reconstruction algorithm can be synchronized with the object's movement [1,2,3,4]. Here, only projection data of the same motion phase are used for the image reconstruction, while the projection data acquired during other phases do not contribute to the image [2]. Periodically recurring states, designated as synchronization points, are detected. With respect to these synchronization points, a phase is defined which indicates the motion phase in which the heart is being imaged. The projection data within a temporal window around the defined phase are used for the image reconstruction. In this way, the temporal resolution can be increased artificially and the object can be imaged in a quasi-static way. For cardiac imaging usually the ECG is used
Fig. 1. Example of cardiac images using standard reconstruction (a) and phase-correlated reconstruction (b)
as a synchronization function representing the cardiac cycle, i.e. the periodic contraction [2]. As an alternative synchronization signal, the kymogram function can be used (cp. Figure 2 (a)). It is generated from the temporal variation of the projected mass [5,6,7]. Due to the cardiac contraction, the mass of the projected slices varies with time and correlates with the heart rate. Thus, the kymogram can be computed without additional devices, directly from the CT rawdata. In contrast to the ECG, it captures the true mechanical motion of the heart rather than an indirect electrical signal. Since the heart's contractile motion is a non-uniform and only quasi-periodic motion, not all phase intervals in the cardiac cycle are equally suited for the phase-correlated reconstruction algorithm. The systolic phase corresponds to the contraction of the heart, and the myocardium shows a high velocity. After the following relaxation, the heart is at rest for a moment, which corresponds to the diastolic phase. The diastolic phase represents the optimal cardiac reconstruction phase for most cases [8], since the temporal window for the data acquisition covers the complete period of rest. With increasing heart rate, the systolic phase more frequently provides optimal image quality [9]. However, a general prediction of the optimal reconstruction phase showing a minimum of motion is not possible. On the one hand, the possibility of a reconstruction in either the systolic or the diastolic phase precludes such a prediction; on the other hand, a slight patient-dependent variation of the typical ECG phases makes the prediction uncertain. Hence, it is common practice to approach the optimal image quality adaptively by iterative image reconstruction at different phase points. An image-based method to detect the optimal reconstruction phase already exists [10]. There, the optimal reconstruction phase is computed by a similarity measure over the image stacks reconstructed at different phase points, leading to the best achievable image quality. Both approaches, the user-based iterative way and the automatic image-based detection, require multiple image stacks, and therefore a large number of image reconstructions has to be performed. Thus, they are very time consuming and computationally expensive. Moreover, for the user-based approach, all image stacks have to be visually assessed. We propose a fully automatic computer-assisted detection of the optimal reconstruction phase using the rawdata-based kymogram signal. Our approach intends to provide optimal image quality without the need for performing more than the final image reconstruction. This may allow substantial reductions in the time necessary for processing cardiac CT data sets in the medical workflow.
2 Materials and Methods

The proposed method detects the optimal phase for a phase-correlated cardiac image reconstruction using the rawdata-based kymogram signal [6,7], which represents the heart's motion function. The kymogram signal is generated by the standard algorithm of Kachelrieß et al. [6,7]. The algorithm computes the center-of-mass (COM) point of the projected slice, which varies over time due to the heart's motion. The two-dimensional kymogram signal

$$ r_c(t) = \begin{pmatrix} x_c(t) \\ y_c(t) \end{pmatrix} \qquad (1) $$

represents the motion of the COM point $r_c$ of a transversal slice in the directions $x$ and $y$ with time $t$. For a detailed description of the quasi-periodic heart motion we use the motion phase $p$. It indicates the relative point in time within one period of the moving heart by normalization to the length of the period $T$. Covering the whole scan time $t \in [t_{Start}, t_{End}]$, we obtain the modular phase $p(t) \in [0, 1]$

$$ p(t) = \frac{t - t_n}{T_n} \qquad (2) $$

Since we cover more than one period, we have to assume a variable heart rate $1/T_n = 1/(t_{n+1} - t_n)$ with the synchronization points $t_n$, $t_{n+1}$ and $t_n \le t < t_{n+1}$. Figure 2 (b) shows an example of the modular phase $p(t)$ and the corresponding ECG signal, where the R-peaks are defined as the synchronization points $t_n$. A grey-coded two-dimensional plot of $r_c(t)$ is shown in Figure 3 (a). The grey-coding is defined with respect to the current heart motion phase $p(t)$ shown in Figure 2 (b). The grey-coded scatterplot in Figure 3 (a) already shows a correlation between the location of the COM points and the heart motion phases, as was indicated in [6]. The light-grey points corresponding to the systolic phase $p(t) = 20\%-40\%$ are mainly located in the left bottom corner of Figure 3 (a). The dark points corresponding to the diastolic phase $p(t) = 60\%-80\%$ are mainly located in the middle of the scatterplot. Since the heart's motion function is quasi-periodic, in the following we are only interested in the patient-specific phase motion function $r_c(p)$ of one heart motion cycle with respect to the motion phase $p$. For this purpose, we observe each motion cycle of length $T_n$ separately within the continuous time signal $r_c(t)$. We obtain the phase motion function $r_c(p)$ by averaging the heart's motion function $r_c(t)$ over the whole scan time with respect to the phase $p$. Using a modified histogram function, we can extract all values of $r_c(t)$ dedicated to the phase $p$ and compute the mean value

$$ r_c(p) = \frac{1}{\int_{t_{Start}}^{t_{End}} \delta(p(t) - p)\,dt} \int_{t_{Start}}^{t_{End}} \delta(p(t) - p) \cdot r_c(t)\,dt. \qquad (3) $$

Thus, we generate a representative motion function of the COM points for one heart motion cycle. An example of the phase motion function $r_c(p)$ of a chosen patient is shown in Figure 3 (b). The velocity of the heart with respect to the motion phase can be identified by the distance between two neighboring phase-based COM points $r_c(p)$
b) Fig. 2. ECG-signal (solid) and corresponding kymogram function (dashed) (a); RR-cycle of an ECG-signal and corresponding modular phase p(t) (b)
Fig. 3. Example (a) of a motion function r_c(t) over the whole scan time and (b) of a phase motion function r_c(p) of a chosen patient
and r_c(p + Δp). For a constant phase-based sampling rate Δp, two neighboring COM points have a larger distance |r_c(p + Δp) − r_c(p)| at a higher velocity than at a lower velocity. As explained in the introduction, projection data from a temporal window around the selected phase are used for the image reconstruction. Hence, we detect the optimal reconstruction phase p_algo by identifying the phase point with the lowest mean velocity in the considered period (temporal window). Therefore, we minimize the distance of a phase-based COM point r_c(p) at the selected phase p to the neighboring points r_c(p + p_i) in a relevant range 2·p_w:

p_algo = argmin_p ∫_{−p_w}^{+p_w} ||r_c(p + p_i) − r_c(p)|| dp_i.    (4)
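A discrete counterpart of Eqs. (3) and (4) could be sketched as follows. The bin count, the cyclic treatment of the phase and the assumption that every phase bin receives samples (plausible for a scan covering several cardiac cycles) are our choices, not values from the paper.

```python
import numpy as np

def phase_mean_com(p, com, n_bins=50):
    """Discrete form of Eq. (3): average the COM samples com (N x 2 array)
    falling into each bin of the phase values p in [0, 1). Assumes every
    bin is populated, which holds for dense multi-cycle acquisitions."""
    bins = np.minimum((np.asarray(p) * n_bins).astype(int), n_bins - 1)
    return np.array([com[bins == b].mean(axis=0) for b in range(n_bins)])

def optimal_phase(rc, p_w=0.1):
    """Discrete form of Eq. (4): the phase whose neighbouring phase-binned
    COM points within +/- p_w of the cycle move least (lowest mean velocity)."""
    n = len(rc)
    w = max(1, int(round(p_w * n)))
    cost = np.empty(n)
    for b in range(n):
        idx = np.arange(b - w, b + w + 1) % n        # the phase is cyclic
        cost[b] = np.linalg.norm(rc[idx] - rc[b], axis=1).sum()
    return np.argmin(cost) / n                       # p_algo in [0, 1)
```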
This relevant range is defined by the relative temporal resolution of the phase-correlated cardiac reconstruction algorithm [3,11] contributing to the image reconstruction. For validation, we used data of 65 consecutive patients scanned with standard protocols, a collimation of 2 · 32 × 0.6 mm, a rotation time of 0.33 s and concurrent ECG recording (Sensation 64, Siemens Medical Solutions, Forchheim, Germany). For all patients, a phase-correlated image reconstruction had already been performed, with the reconstruction phase determined iteratively by a cardiologist using visual assessment to identify the optimal cardiac phase. We assume that this reconstruction phase leads to images of optimal quality. We defined the quality measure Q as the cyclic absolute difference between the identified optimal reconstruction phase p_algo and the reconstruction phase used for the medical diagnosis p_med:

Q = |p_algo − p_med|           for |p_algo − p_med| ≤ 50%,
    100% − |p_algo − p_med|    for |p_algo − p_med| > 50%.    (5)
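The cyclic difference of Eq. (5) is a one-liner; the sketch below reproduces the Q = 41% that Eq. (5) yields for the patient of Fig. 5 (p_med = 70%, p_algo = 29%).

```python
def quality_measure(p_algo, p_med):
    """Cyclic phase difference Q of Eq. (5); phases in percent (0-100)."""
    d = abs(p_algo - p_med) % 100.0
    return d if d <= 50.0 else 100.0 - d

print(quality_measure(29.0, 70.0))   # 41.0, the patient shown in Fig. 5
```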
3 Results

The mean value of the quality measure over all patients was Q_I = 13.9% with a standard deviation of σ = 11.1%. For 4 patients the kymogram-based algorithm proposed the systolic phase as the optimal reconstruction phase, while the diastolic phase had been chosen by the medical expert for the medical diagnosis. In detail, our algorithm predicted a reconstruction phase between 23% and 41%, while a reconstruction phase of 65% or 70% had been determined by the cardiologist. These 4 patients degrade the quality measure strongly, although the diagnostic relevance of this disagreement remains to be proven. After excluding these patients, the mean difference decreases to Q_II = 12.5% with a standard deviation of σ = 9.8%. Figure 4 shows the histogram functions of both quality measures Q_II and Q_I.
Fig. 4. Histogram function of the quality measure Q from (a) the selected 61 patients and (b) all consecutive 65 patients including the mean values (dashed line)
Fig. 5. Example 1: Phase-correlated cardiac images (fH = 75.1 bpm) in coronal view (top) and two different transaxial views (middle and bottom) with pmed = 70% (left, medical expert) and palgo = 29% (right, algorithm) (C 0/W 1000)
Figure 5 serves as an example to illustrate our decision to exclude the predicted systolic phases from the objective quality measure when the diastolic phase had been used by the cardiologist. For this patient, p_algo = 29% was predicted and p_med = 70% had been used. One coronal view and two transaxial slices are shown for both reconstruction phases. The images reconstructed at p_med (left images) suffer from motion artifacts. The left circumflex (LCX) artery shown in the coronal view (top) and the left anterior descending (LAD) artery shown in the first transaxial slice (middle) are strongly blurred. The right coronary artery (RCA) shown in the second transaxial slice (bottom) has no sharp delineation between the vessel and the surrounding structures, which is a typical motion artifact. A reconstruction at p_algo (right images) clearly improves the image quality. Blurring of the LCX and the LAD is reduced and the RCA shows a sharper delineation. Also the calcification in the LAD is more visible.
Fig. 6. Example 2: Phase-correlated cardiac images (fH = 63.8 bpm) in coronal view (top) and transaxial view (bottom) with pmed = 60% (left, medical expert) and palgo = 76% (right, algorithm) (C 0/W 1000)
Moreover, Figure 6 shows an example with a higher-than-average deviation between the proposed phase p_algo = 76% (right images) and the used phase p_med = 60% (left images), where the image quality nevertheless improves. Again, a coronal view and a transaxial slice are shown for both reconstruction phases. The LCX shown in the coronal view (top) is blurred and the RCA shown in the transaxial slice (bottom) has no sharp delineation. Both effects are reduced for a reconstruction at p_algo. Using the proposed case-specific optimal reconstruction phase in diastole, an improvement in image quality was achieved.
4 Discussion

We presented a kymogram-based approach to automatically identify the optimal reconstruction phase and thereby obtain motion-free cardiac CT images. The proposed method was implemented on a standard PC with a computation time of under 2 minutes per patient and can easily be made available on medical workstations. Multiple time-consuming image reconstructions, as required by the user-based iterative approach and the proposed image-based methods, may become obsolete. We achieved good correlation of the predicted reconstruction phase with the reconstruction phase chosen by the medical expert; the mean difference of both reconstruction phases was Q_I = 13.9% and Q_II = 12.5%, respectively. Moreover, in two examples we have shown that the image quality can even be improved using the predicted reconstruction phase compared to the empirically chosen phase used for the medical diagnosis. We conclude that the presented kymogram-based method can improve the medical workflow by providing optimal image quality with the initial image reconstruction. Furthermore, it detects the optimal reconstruction phase, which depends on the heart rate and also
on the patient. Since empirically chosen reconstruction phases provide suboptimal image quality in some cases, our computer-assisted method has the potential to improve image quality.
References
1. De Man, B., Edic, P., Basu, S.: An iterative algorithm for time-resolved reconstruction of a CT scan of a beating heart. In: Proceedings of the Eighth International Meeting on Fully Three-Dimensional Image Reconstruction in Radiology and Nuclear Medicine (2005) 356–359
2. Kachelrieß, M., Kalender, W.A.: Electrocardiogram-correlated image reconstruction from subsecond spiral computed tomography scans of the heart. Medical Physics 25 (1998) 2417–2431
3. Kachelrieß, M., Ulzheimer, S., Kalender, W.A.: ECG-correlated image reconstruction from subsecond multi-slice spiral CT scans of the heart. Medical Physics 27 (2000) 1881–1902
4. Kubo, H., Hill, B.: Respiratory gated radiotherapy treatment: A technical study. Physics in Medicine and Biology 41 (1996) 83–91
5. Bruder, H., Maguet, E., Stierstorfer, K., Flohr, T.: Cardiac spiral imaging in computed tomography without ECG using complementary projections for motion detection. Proc. SPIE (2003)
6. Kachelrieß, M., Sennst, D.A., Maxlmoser, W., Kalender, W.A.: Kymogram detection and kymogram-correlated image reconstruction from subsecond spiral computed tomography scans of the heart. Medical Physics 29 (2002)
7. Kalender, W.A., Kachelrieß, M.: Computertomograph mit objektbezogener Bewegungsartefaktreduktion und Extraktion der Objektbewegungsinformation (Kymogramm). European Patent Office, patent pending, no. 99111708.6-1522 (1999)
8. Achenbach, S., Ropers, D., Holle, J., Muschiol, G., Daniel, W.G., Moshage, W.: In-plane coronary arterial motion velocity: Measurement with electron-beam CT. Radiology 216 (2000) 457–463
9. Shanneik, K., Kachelrieß, M., Kalender, W.A.: Heart rate dependency of image quality using multi-segment reconstruction algorithms. In: Supplement to Radiology, RSNA 2005 conference proceedings, Radiological Society of North America (2005) 479
10. Manzke, R., Köhler, T., Nielsen, T., Hawkes, D., Grass, M.: Automatic phase determination for retrospectively gated cardiac CT. Medical Physics 31 (2004)
11. Kachelrieß, M., Knaup, M., Kalender, W.A.: Phase-correlated imaging from multi-threaded spiral cone-beam CT scans of the heart. In: Proceedings of the 8th International Meeting on Fully 3D Image Reconstruction (2005) 159–162
Sensorless Reconstruction of Freehand 3D Ultrasound Data R. James Housden, Andrew H. Gee, Graham M. Treece, and Richard W. Prager University of Cambridge, Department of Engineering, Trumpington Street, Cambridge, CB2 1PZ, UK Abstract. Freehand 3D ultrasound can be acquired without a position sensor by finding the separations of pairs of frames using information in the images themselves. Previous work has not considered how to reconstruct entirely freehand data, which can exhibit irregularly spaced frames, non-monotonic out-of-plane probe motion and significant in-plane motion. This paper presents reconstruction methods that overcome these limitations and are able to robustly reconstruct freehand data. The methods are assessed on freehand data sets and compared to reconstructions obtained using a position sensor.
1 Introduction
Freehand 3D ultrasound is a technique that produces 3D data from a series of 2D ultrasound images [1]. In order to do this, the position of each individual scan must be known relative to the others. This is normally achieved using a position sensor, but the extra sensor is an inconvenience. A more serious limitation is the small scale accuracy of the sensor: over a short range, the positioning errors can be significant. Also, a position sensor gives positions relative to a fixed reference, which relies on the scanning subject remaining stationary during the scan. An alternative is image-based positioning, which uses information in the images themselves. The offset in the plane of the images can be found using standard image registration and motion tracking techniques [2,3]. Out of the plane of the image, speckle decorrelation [4,5,6] can be used to estimate the absolute separation between two patches of data. The idea is that the correlation between the patches varies with their separation. By calculating the correlation, the value can be converted to a distance using a precalibrated decorrelation curve. There are several complications that must be overcome in order to apply this basic theory to a typical freehand data set. First, the calibrated decorrelation curve is only valid for data consisting of fully developed speckle, but there is very little of this in scans of real tissue. A solution to this problem is to use the adaptive speckle decorrelation method presented in [7]. Secondly, there are other causes of speckle decorrelation that might be confused for elevational motion. While most are a matter for future research, the effects of in-plane motion have been studied in detail. In [8], it is shown how to interpolate the in-plane motion estimates to sub-sample precision and then correct the elevational correlation values by allowing for the sub-sample in-plane motion.
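The conversion from patch correlation to an absolute elevational distance via a precalibrated decorrelation curve, described above, might look like the sketch below; the curve samples are illustrative placeholders, not the calibration used by the authors.

```python
import numpy as np

# Illustrative decorrelation curve: correlation rho versus elevational
# separation in mm (placeholder values, NOT the authors' calibration).
_sep_mm = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
_rho    = np.array([1.0, 0.92, 0.75, 0.55, 0.36, 0.20, 0.08])

def correlation_to_distance(rho):
    """Invert the monotonically decreasing decorrelation curve. Note that
    only the magnitude of the separation is recovered, not its sign."""
    rho = np.clip(rho, _rho[-1], _rho[0])
    # np.interp requires increasing x, hence the flipped arrays.
    return float(np.interp(rho, _rho[::-1], _sep_mm[::-1]))

print(correlation_to_distance(0.6))   # about 0.28 mm on this toy curve
```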
In addition, the speckle decorrelation method only gives the absolute value of the separation between two patches. Previous work has ignored this problem and instead constrained the motion to be monotonic with no intersecting frames, in which case all the offset estimates are in the same direction. In this paper, we present for the first time sensorless reconstructions of truly unconstrained freehand data. Building on preliminary work [9] with regularly spaced frames, we present algorithms to allow for arbitrary hand-held probe motion, including cases where the probe reverses direction part-way through the scan. The process of reconstructing a freehand 3D data set is as follows:

1. Each image frame is divided into a grid of 12 × 8 patches.
2. For each frame A, we determine the set of frames Bi within range of A. By "within range", we mean that the majority of the patches on Bi are offset by an amount within the search range of the in-plane alignment and are close enough elevationally to correlate to some extent with a corresponding patch on A. Frames that are elevationally too distant from A will be totally decorrelated and not allow any meaningful estimation of elevational or in-plane separation.
3. For all frame pairs within range of each other, the offset between corresponding patches is estimated as three translational degrees of freedom (two in-plane and one out-of-plane). This is done as accurately as possible, making use of the methods in [7,8].
4. Using this elevational separation information, several interleaved sets of coarsely spaced frames are selected from the irregularly spaced sequence of frames, using the method described in Section 2.
5. The coarse sets of frames are reconstructed independently and then combined into a final reconstruction, as described in Section 3.

It is assumed that most of the frame pairs will not intersect at the coarse scale, so there is no need to correct for intersections. This is a reasonable assumption for linear scanning protocols. Other scanning protocols are also possible, for example axial rotation, where there will certainly be intersections. Reconstructing these data sets is the subject of related work [10].
2 Frame Selection
A well-known problem in sensorless reconstruction is drift error from accumulating offsets through a sequence of frames. It is caused by biases in the elevational separation estimates, which are most significant when the frames are closely spaced. Our frame selection algorithm therefore has two objectives. First, a coarse set of frames must be widely spaced to minimise drift errors. Also, the multiple coarse sets must use different frames, so that they give independent reconstructions. It is not always possible to achieve both of these objectives, in which case the independence objective is considered to be more important. Consider initially the problem of choosing only one set of frames from the sequence, so that independence is not an issue. The algorithm builds up a coarse
set of frames one frame at a time, using range information from the two previous frames in the coarse sequence. Consider Figure 1, where frames A and B are already chosen and the next frame C is to be found. Clearly, C must lie within range of B, giving the rmax(B) limit. However, in order to detect non-monotonic motion, we need to ensure that each frame is within range of not one but two frames further along the sequence (the reason for this will become clear in Section 3). Ideally, one of these frames would be at the maximum range and the other half way between. C must therefore be positioned within the rhalf(B) range and must also be within range of A. The ideal choice for C is therefore the smaller of A + rmax(A) and B + rhalf(B). This method would generate a single suitably spaced sequence. However, with multiple coarse sequences, it is possible that a frame would be the ideal next frame for more than one sequence. It is then necessary to adjust the frames used by each set to attempt to avoid repetitions. Figure 2 shows an example of this situation. Three sets of frames are being constructed together. We have reached a point where two coarse sets (1 and 3) both have the same ideal next frame,
Fig. 1. Choosing the next frame in the sequence. The ideal next frame depends on two maximum limits: it must be within range of both the preceding frames, A and B, and leave room for another frame within range of B.
Fig. 2. Repeated frame in coarse sequences. Sets 1 and 3 both expect the same frame as their next frame. The algorithm adjusts them if there is room to do so.
so we need to try to adjust the selections. Since we wish to change no more than the last frame in each set, the only slots available are those after frame B3, labelled P1 to P3 in the figure. There are two frames to adjust, so we initially consider just P1 and P2. However, as frame C2 is already in position P2, we find that we actually need to adjust all three frames, using all three available slots. In more extreme cases, there might not be enough slots to play with. We must then accept a repeated frame as unavoidable, but continue to adjust any further repetitions. Having established how many frames to adjust, we distribute the frames evenly into the available slots, minimising the number of times each position is used. We order the frames according to the order of the preceding frames in the sets. In our example, this means that both frames C2 and C1 are moved back one position, while C3 is unchanged.
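A minimal sketch of the greedy rule for building a single coarse set follows (cf. Fig. 1). The signature, and representing rmax and rhalf as per-frame counts estimated from the decorrelation data, are our assumptions; the multi-set repetition-avoidance step of Fig. 2 is omitted.

```python
def select_coarse_set(n_frames, r_max, r_half, start=0):
    """Greedily pick frames for one coarse set: the next frame C is the
    smaller of A + rmax(A) and B + rhalf(B), so that C stays within range
    of both preceding frames and leaves room for a further frame."""
    if start + r_half[start] >= n_frames:
        return [start]
    seq = [start, start + r_half[start]]    # two frames seed the rule
    while True:
        a, b = seq[-2], seq[-1]
        c = min(a + r_max[a], b + r_half[b])
        if c <= b or c >= n_frames:         # no valid frame left
            break
        seq.append(c)
    return seq
```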
3 Robust Reconstruction
Consider reconstructing a sequence of frames. The out-of-plane offset between two frames can be found by solving a simple least squares problem d = Xw, where d is a vector of patch-wise distance estimates and w is a vector of three out-of-plane offset parameters (centre elevational offset and gradients across and down the frame). X is a coefficient matrix. The direction ambiguity in the elevational distance estimates means that it is not immediately possible to see when the probe reverses direction. Given three frames A, B and C, the AB and BC offsets can be found independently, but their relative directions are not known. By making use of the distances between A and C, an expanded least squares problem can be solved for the three frames together, disambiguating the two possible directions of BC relative to AB:

( d_AB )   ( X  0 )
( d_AC ) = ( X  X ) ( w_AB )    (1)
( d_BC )   ( 0  X ) ( w_BC )

Figure 3(a) shows how we use this multiframe approach to reconstruct the coarsely spaced sequences of frames. We consider three consecutive frames in a sequence as an independent block and position them using equation (1). We then consider another block of three frames that overlaps the first, as shown in the figure. These independently positioned blocks are joined together by averaging the offset in the overlapping range. By repeating this process, we can build up an entire sequence of frames. This can easily give an incorrect reconstruction, as it only needs one spurious change of direction to make the entire sequence useless. The important next step is therefore to exploit the redundancy afforded by the multiple coarse sets (Figure 3(b)). We look for consistency between the sets by finding the maximum difference in their accumulated offsets. Reconstructions that disagree on changes of direction will have a much larger maximum difference than similar reconstructions. Sequences with a difference of less than 0.5 mm are taken as consistent.
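Equation (1) is an ordinary stacked least squares problem; a NumPy sketch of the solve is given below. Since the measured patch distances are unsigned, in practice both sign hypotheses for d_BC would be fitted and the lower-residual one kept; only the solve itself is shown here.

```python
import numpy as np

def solve_triple(X, d_ab, d_ac, d_bc):
    """Least-squares solution of the multiframe system of Eq. (1).
    X    : (patches x 3) coefficient matrix
    d_*  : per-patch elevational distance estimates for each frame pair
    Returns w_AB and w_BC (centre offset plus two gradients each)."""
    Z = np.zeros_like(X)
    A = np.block([[X, Z],
                  [X, X],
                  [Z, X]])
    d = np.concatenate([d_ab, d_ac, d_bc])
    w, *_ = np.linalg.lstsq(A, d, rcond=None)
    k = X.shape[1]
    return w[:k], w[k:]
```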
Fig. 3. The reconstruction algorithm: (a) coarse sets are reconstructed in blocks of 3 frames; (b) sets are compared to find incorrect reconstructions; (c) final reconstruction
Having rejected the incorrect reconstructions, a final scale is determined by averaging the maximum lengths of the remaining sets. This averaging is intended to reduce any non-systematic drift error. The final reconstruction consists of the frames of one of the sets, scaled to match the average length (Figure 3(c)). Note that there will be one direction ambiguity remaining in the final reconstruction, so it could be the mirror image of the correct reconstruction. There is currently no solution to this problem other than to manually determine which is correct either from the scanning protocol or from features in the images.
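The consistency test described in this section could be realised as below, under our simplifying assumption that the accumulated centre offsets of all coarse sets have been resampled to common frame positions; the 0.5 mm tolerance is the value quoted above.

```python
import numpy as np

def consensus_sets(offsets, tol=0.5):
    """Return the indices of the largest group of mutually consistent coarse
    reconstructions. offsets: list of equal-length 1D arrays of accumulated
    centre offsets (mm); two sets agree if their maximum difference < tol."""
    groups = []
    for i in range(len(offsets)):
        groups.append([j for j in range(len(offsets))
                       if np.max(np.abs(offsets[i] - offsets[j])) < tol])
    return max(groups, key=len)
```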
4 Experiments and Results
Ultrasound data was recorded with a 5–10 MHz linear array probe connected to a Dynamic Imaging Diasus ultrasound machine (http://www.dynamicimaging.co.uk). The depth setting was 4 cm with a single focus at 2 cm. Analogue radio frequency (RF) ultrasound signals were digitised after receive focusing and time-delay compensation, but before log-compression and envelope detection, using a Gage Compuscope CS14200 14-bit digitiser (http://www.gage-applied.com). The system works in real time, with acquisition rates of about 30 frames per second. The signal was sampled at 66.67 MHz, synchronous with the ultrasound machine's internal clock. The RF vectors were filtered with a 5–10 MHz filter, then envelope detected using the Hilbert transform. The resulting 127 × 3818 frames of backscatter amplitude data formed the basis of all further computation: speckle decorrelation and lateral interpolation [8] require RF amplitude data rather than rasterised B-scans. The frame positions were recorded using a Polaris optical tracking system (http://www.ndigital.com). Several 3D data sets were recorded by scanning beef joints by hand in both monotonic and non-monotonic sweeps. The only deliberate motion was elevational, but being freehand, there were also small in-plane and rotational motions between each frame pair. The data had very dense frame spacing, so that the
frame selection algorithm was able to select five independent sequences from each. Ten data sets were considered, five with no deliberate change of direction (although there could be some jitter in the probe motion) and five with the probe deliberately reversed once. Figure 4 shows the reconstruction in terms of the offset at the centre of the frames, accumulated through the data set, for each set of data considered. All five of the coarse sets are shown separately, along with the equivalent values from the position sensor.
Fig. 4. Reconstruction of freehand data sets. Each figure shows the coarse scale accumulated centre offsets for the five independent sets of frames. The solid lines (—) are the sets that contribute to the consensus. The dashed lines (- -) are those that are rejected. The centre offsets as given by the position sensor are also shown (-·-). The first five data sets are monotonic and the rest have one deliberate turning point.
362
R.J. Housden et al.
A meaningful comparison between the two reconstructions would require the beef to remain stationary with respect to the Polaris system’s reference frame. To this end, the probe was separated from the beef by a thick layer of coupling gel, to minimise the chance of any contact force displacing the beef. Note that this precaution is solely for the benefit of the sensor-based reconstructions. Sensorless reconstructions are not sensitive to rigid movement of the scanning subject. The results in Figure 4 demonstrate how the algorithm can robustly reconstruct a data set. In all except the last example there is a clear consensus among the coarse reconstructions, with at most one set different. In addition, the consensus sets give very similar reconstructions, despite using independent distance estimates for most, if not all, of their length. This suggests that uncertainty in the distance estimates does not seriously affect the accuracy of the overall reconstruction. In the worst case, data set 10, the final reconstruction relied on a consensus of two from five. Data set 10 featured two points with uncertain monotonicity, whereas most data sets have none.
Fig. 5. Reslices. The figure shows a part of a reslice along the length of data set 1. The left reslice is derived from a position sensor reconstruction, whereas the right one is from the image-based reconstruction.
The sensorless reconstructions differ from the position sensor reconstructions in two ways: small scale jitter and large scale drift. The position sensor has significant small scale inaccuracy, which affects the quality of the reslice image shown in Figure 5. In contrast, the sensorless reslice is much smoother. We have used coarsely spaced frames in order to minimise the large scale drift error, but it is clear from Figure 4 that we have not eliminated it entirely. The lateral interpolation between RF vectors [8] and the adaptive speckle decorrelation [7] both contribute to the bias. A further source of bias is transducer rotation between frames. This would result in systematic overestimation of elevational separation — as is evident in many of the results in Figure 4 — and helps explain the somewhat larger drift for, say, data set 2 compared with previous results obtained with purely translational probe motion [7,8].
5 Conclusions
This paper has presented results showing the viability of sensorless reconstruction of freehand 3D ultrasound data. The method has been shown to be
Sensorless Reconstruction of Freehand 3D Ultrasound Data
363
robust to the complex motions in freehand data, particularly non-monotonic motion. Image-based reconstruction has advantages over the usual position sensor method, particularly for small scale accuracy. Also, it shows movement relative to the scanning subject, which is an advantage if the subject is free to move. However, unavoidable drift limits the accuracy of any large scale measurements based on the sensorless reconstructions.
References
1. Fenster, A., Downey, D.B., Cardinal, H.N.: Three-dimensional ultrasound imaging. Physics in Medicine and Biology 46 (2001) R67–R99
2. Weng, L., Tirumalai, A.P., Lowery, C.M., Nock, L.F., Gustafson, D.E., Behren, P.L.V., Kim, J.H.: US extended-field-of-view imaging technology. Radiology 203(3) (1997) 427–441
3. Geiman, B.J., Bohs, L.N., Anderson, M.E., Breit, S.M., Trahey, G.E.: A novel interpolation strategy for estimating subsample speckle motion. Physics in Medicine and Biology 45 (2000) 1541–1552
4. Chen, J.F., Fowlkes, J.B., Carson, P.L., Rubin, J.M.: Determination of scan plane motion using speckle decorrelation: theoretical considerations and initial test. International Journal of Imaging Systems and Technology 8 (1997) 38–44
5. Li, M.: System and method for 3-D medical imaging using 2-D scan data. United States patent 5,582,173, application number 529778 (1995)
6. Tuthill, T.A., Krücker, J.F., Fowlkes, J.B., Carson, P.L.: Automated three-dimensional US frame positioning computed from elevational speckle decorrelation. Radiology 209 (1998) 575–582
7. Gee, A.H., Housden, R.J., Hassenpflug, P., Treece, G.M., Prager, R.W.: Sensorless freehand 3D ultrasound in real tissue: Speckle decorrelation without fully developed speckle. Medical Image Analysis 10(2) (2006) 137–149
8. Housden, R.J., Gee, A.H., Treece, G.M., Prager, R.W.: Sub-sample interpolation strategies for sensorless freehand 3D ultrasound. Technical Report CUED/F-INFENG/TR 545, Cambridge University Department of Engineering (2006)
9. Housden, R.J., Gee, A.H., Prager, R.W., Treece, G.M.: Sensorless freehand 3D ultrasound for non-monotonic and intersecting frames. In: Proceedings of the 9th Annual Conference on Medical Image Understanding and Analysis, Bristol, UK (2005) 127–130
10. Housden, R.J., Gee, A.H., Treece, G.M., Prager, R.W.: Sensorless reconstruction of unconstrained freehand 3D ultrasound data. Technical Report CUED/F-INFENG/TR 553, Cambridge University Department of Engineering (2006)
Motion-Compensated MR Valve Imaging with COMB Tag Tracking and Super-Resolution Enhancement Andrew W. Dowsey1, Jennifer Keegan2, Mirna Lerotic1, Simon Thom3, David Firmin2, and Guang-Zhong Yang1 1
Royal Society / Wolfson Foundation Medical Image Computing Laboratory, Department of Computing, Imperial College London, SW7 2BZ, UK {andrew.dowsey, m.lerotic, g.z.yang}@imperial.ac.uk 2 Cardiovascular Magnetic Resonance Unit, National Heart and Lung Institute, Imperial College London, Royal Brompton and Harefield NHS Trust, SW3 6NP, UK
[email protected],
[email protected] 3 International Centre for Circulatory Health, National Heart and Lung Institute, Imperial College London, St. Mary’s NHS Trust, W2 1LA, UK
[email protected]
Abstract. MR imaging of the heart valve leaflets is a challenging problem due to their large movements throughout the cardiac cycle. This paper presents a motion-compensated imaging approach with COMB tagging for valve imaging. It involves an automatic method for tracking the full 3D motion of the valve plane so as to provide a motion-tracked acquisition scheme. Super-resolution enhancement is then applied to the slice-select direction so that the partial volume effect is minimised. In vivo results have shown that in terms of slice positioning, the method has equivalent accuracy to that of a manual approach whilst being quicker and more consistent. The use of multiple parallel COMB tags will permit adaptive imaging that follows tissue motion. This will have significant implications for quantification of myocardial perfusion and tracking anatomy, functions that are traditionally difficult in MRI.
1 Introduction

Heart valve disease represents one of the major causes of heart failure. The valves govern the passage of the blood flow within and through the heart. The properties of the valve leaflets are unique: they have high tensile strength, being virtually inextensible under pressure, yet are flexible and coordinate forward flow during the cardiac cycle. Understanding valve morphology and interaction with the blood flow could unveil localised factors that contribute to heart diseases. Over the years, different non-invasive 3D imaging techniques have been proposed for assessing valve structure and function. Extensive development in ultrasound imaging has made 3D transthoracic and transesophageal echocardiography an important clinical tool for valve assessment. The alternative of cardiovascular MR offers additional versatility and accuracy in 3D localisation. This, however, has received limited success, mainly due to the difficulty in capturing the moving valve leaflets.
Fig. 1. End systolic (left) and end diastolic (right) frames from 2 orthogonal long axis planes through the aortic valve. The position of the valve is marked on both frames (solid line). The position of the valve at end systole is superimposed on the end diastolic images (dotted line).
As an example, the aortic valve may be imaged in cross-sections near the base of the left ventricle. However, the basal ventricular plane moves rapidly through the cardiac cycle and so may be displaced by as much as 15–20 mm between systole and diastole [1], as shown in Fig. 1. This results in the aortic valve moving in and out of the imaging plane, leading to poor visualisation of the valve leaflets during different phases of the cardiac cycle. To avoid this problem, it is necessary to adaptively track the position of the valve plane during data acquisition so that global cardiac motion is removed to reveal the morphological changes of the valve leaflets. The importance of motion-tracked imaging has been recognised by the MR community for a number of years. Kozerke et al. [2] first introduced tag tracking to perform motion-compensated MR cine velocity mapping through the aortic valve. The basal plane is labelled and then imaged from the four-chamber view plane at regular intervals through the cardiac cycle. The user then marks the septal and left ventricular walls on an arbitrary frame before an automatic technique uses pattern matching to track motion between frames. The technique has been demonstrated to be useful for quantifying mitral and aortic regurgitation [3]. The disadvantage of this method is that whilst it successfully localises the aortic valve, motion is only tracked on one axis and so it cannot compensate for the complex lateral and torsional motion of the heart. The purpose of this paper is to develop and validate an automatic scheme to track the full 3D motion of a region of interest (ROI) in the basal ventricular plane to guide the slice position in subsequent motion-tracked acquisition. With this approach, global motion is removed so that the characteristics of the valve leaflets may be clearly observed throughout the cardiac cycle. Super-resolution enhancement is then applied in the slice-select direction so that the partial volume effect is minimised.
2 Method

2.1 COMB Tagging

The prerequisite of motion-compensated imaging is to tag the target anatomical ROI so that its 3D position is highlighted. In this study, a cine gradient-echo echo-planar sequence with a tagging pre-pulse [2] was employed. The tagging pre-pulse consists of a selective and non-selective 90° pulse pair, with the selective component involving a COMB excitation [4], which enables the option of simultaneous tagging of multiple
parallel planes within the same acquisition. The tagged images are generated by the complex subtraction of two datasets, with the phase of the selective component of the tagging pulse pair being reversed between the two. Interleaved vertical long axis (VLA) and horizontal long axis (HLA) image planes were acquired during a single breath-hold, with tagging performed perpendicular to the long axis planes. The resulting images show the motion of the tagged slice through the cardiac cycle in both planes, as illustrated in Fig. 2, such that automated slice tracking can then be applied.
Fig. 2. The basal short axis plane was excited by COMB tagging and VLA (top) and HLA (bottom) acquisitions interleaved through the cardiac cycle so that only motion of the tag plane remains. The left ventricle, which reflects the movement of the aortic valve, is then tracked (solid line). (a) shows tagged blood artefacts; (b) the displacement (dotted line) between successive frames; (c) the position of the tag at maximum displacement; (d) tag fading at end-diastole.
2.2 Adaptive Motion Tracking

To determine the required motion-compensation of the valve plane at different phases of the cardiac cycle, multi-resolution image registration of the acquired HLA/VLA image series was performed [5]. Initially, the left ventricle ROI was manually delineated on one chosen VLA and HLA reference frame. The registration process was then performed between neighboring frames, starting with the reference frame and propagating both forwards and backwards. At each stage the initial transformation was computed as the cumulative motion since the reference frame. Due to the removal of all high frequency anatomical information and gradual spatial attenuation of the COMB pre-pulse, the COMB acquisitions provide smooth and continuous derivative data ideal for image registration methods based on quasi-Newtonian optimisation. For this study, the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm [6] was used to optimise a piecewise bilinear mapping to maximise the normalised cross-correlation (NCC) between reference and source images. NCC was employed to compensate for the temporal attenuation of the COMB excitation over time. Performance and convergence were further improved by utilising the closed-form partial derivatives of the NCC with respect to the parameters Θ of the transformation mapping m:
∂/∂Θ ncc(I_r, I_t) = [ σ(I_t) · ∂/∂Θ σ(I_r, I_t) − σ(I_r, I_t) · ∂/∂Θ σ(I_t) ] / [ σ(I_r) · σ(I_t)² ],    (1)

where I_t = I_s ∘ m,   ∂/∂Θ σ(I_t) = σ(I_t, ∂/∂Θ I_t) / σ(I_t),   ∂/∂Θ σ(I_r, I_t) = σ(I_r, ∂/∂Θ I_t),   and   ∂/∂Θ I_t = (∂I_t/∂m) · (∂m/∂Θ).
I_r and I_s are the reference and source images respectively, σ(I1) represents variance and σ(I1, I2) covariance. To avoid convergence to local maxima, a Haar multiresolution pyramid approach was used. The optimisation was performed on Gaussian sub-sampled images from 8×8 to 128×128 pixels so that at each stage more and more localised motion was accounted for. This combination of multi-resolution, bilinear mapping and sampling, closed-form derivatives, BFGS optimisation and C++ implementation was capable of motion estimating a 32 frame COMB acquisition (30 registrations) in 16 seconds on a 2 GHz PC, making it suitable for online processing. The tracked end-points of the ROI are cubic spline interpolated between timepoints so they can be used to derive the offset and orientation of the tracked slice at any point in the cardiac cycle. It is important to note that due to noise, the points are unlikely to be exactly co-planar, so the orthogonal distance regression plane was obtained. The slice offset (position) vector was defined as the centroid of the ROI end-points, and the slice orientation (normal) vector was derived using singular value decomposition (SVD) as the eigenvector of M^T M corresponding to its smallest eigenvalue, where M is the matrix of ROI end-points each subtracted by the centroid.

2.3 Super-Resolution Enhancement

In 2D multi-slice MRI, the signal to noise ratio (SNR) decreases proportionally with voxel size, since the thinner the slice to be excited by the RF pulse, the sharper the edges of the slice excitation profile and so the longer the pulse duration. In order for the motion-tracked cine acquisition to attain an acceptable SNR, the slice thickness has to be relatively large, which is detrimental to clear depiction of the valve leaflets due to the partial volume effect. Super-resolution is a retrospective method to increase resolution and/or SNR. This technique reconstructs a high resolution volume from multiple low resolution images with sub-voxel shifts. Due to the band-limited nature of MRI, in-plane resolution cannot be increased retrospectively [7]; therefore super-resolution aids only SNR. In this study, super-resolution was performed in the slice-select direction [8]. Overlapping slices were acquired for the motion-compensated acquisition and super-resolved by using a projection onto convex sets (POCS) approach [9]. The stack of voxels at each in-plane location were conceptually grouped into m non-overlapping input vectors a, and modelled such that:
(2)
where b is the super-resolved output vector to be reconstructed, W is the transformation matrix mapping each voxel in b to its constituent voxels in a1…m, and n is noise. With POCS, a priori knowledge is incorporated via deterministic constraints. Let Ci = {u} be the set of images satisfying constraint i and {û} the set that do not. If Ci is closed and convex, the constraint can be defined by a non-expansive projection
operator P_i that maps each u to itself and each û onto a u so that |P_i û − û| ≤ |u − û| for all u. POCS then defines the recurrence relation

u_{n+1} = P_n P_{n−1} · · · P_2 P_1 u_n,    (3)
which converges weakly from any starting estimate u_0 to a u in the intersection of all constraints C_1...C_n. Based on the data model in (2), the data consistency constraint set is represented for each pixel y in y_k as [10]:

C_k(y) = { x : |r_x(y)| ≤ δ(y) },   where   r_x(y) = y − Σ_{x∈x} W_k[y; x] · x.    (4)
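A minimal one-dimensional sketch of a POCS data-consistency sweep over Eqs. (2)-(4) follows. The boxcar slice profile, the uniform spreading of the excess residual and all names are our assumptions; the actual W_k and δ are more elaborate than this.

```python
import numpy as np

def pocs_pass(b, slices, first, thick, delta):
    """One sweep of projections onto the data-consistency sets C_k, 1-D case.
    b      : high-resolution estimate along the slice-select direction
    slices : measured thick-slice values a_k
    first  : index of the first high-res voxel covered by slice k
    thick  : voxels averaged per slice (boxcar W_k -- an assumption)
    delta  : confidence bound on the residual, as in Eq. (4)"""
    for k, y in enumerate(slices):
        sl = slice(first[k], first[k] + thick)
        r = y - b[sl].mean()            # residual r_x(y) under the boxcar W
        if abs(r) > delta:
            # project onto C_k: shrink the residual to the delta band by
            # spreading the excess uniformly over the contributing voxels
            b[sl] += r - np.sign(r) * delta
    return b
```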
A benefit of this approach is that δ represents the statistical confidence we have in the observation, which is set empirically to smooth out aliasing caused by noise and motion artefacts between slices.

2.4 Data Acquisition and In Vivo Verification

For in vivo verification, a cine gradient-echo echo-planar sequence with COMB pre-pulse was implemented on a Siemens Sonata 1.5 Tesla scanner (Erlangen, Germany). The parameters for the imaging sequence were as follows: slice thickness = 12 mm, field of view = 300 mm × 225 mm, matrix size = 128 × 96, echo-planar readout factor = 8, breath-hold duration = 20 cardiac cycles, temporal resolution (per interleaved cine pair) = 55 ms, starting flip angle = 25°, incrementing to 90° for the last phase in the cardiac cycle. For the motion-tracked breath-hold acquisitions of the aortic valve, a breath-hold cine segmented FLASH sequence was used. The imaging parameters were slice thickness = 8 mm, in-plane resolution = 1.2 mm × 2.4 mm, temporal resolution = 50 ms, and breath-hold duration = 16 cardiac cycles. A total of 20 parallel acquisitions were performed with the slice position offset by 2 mm on each acquisition (super-resolution slice overlap = 6 mm), resulting in full coverage of the aortic root.
3 Results

To demonstrate the effect of slice tracking, Fig. 3 compares a cine acquisition of the aortic valve with and without motion-compensation for one of the 8 subjects studied. In Fig. 3(a), the whole cine was acquired conventionally without slice-tracking. It is evident that cardiac motion has caused the aortic valve to move out of the imaging plane early in the cardiac cycle. In Fig. 3(b), the base of the aortic root was tagged, and its motion through the cardiac cycle determined. The resulting image shows the valve plane within the image slice in all frames of the acquisition, clearly depicting the valve opening and closing during the cardiac cycle. The corresponding POCS super-resolution results are shown in Fig. 3(c), demonstrating the increased SNR and reduced partial-volume effect. To assess the accuracy of the COMB tracking technique, the derived automatic motion-compensation results were compared with manual tracking of the COMB VLA/HLA images. Shapiro-Wilk tests confirmed that the residual error between the two approaches was normally distributed. Table 1 summarises the results for the 8 collated COMB tag acquisitions from 5 subjects. The mean, maximum and standard deviation
Fig. 3. Cine frames showing the opening and closing of the aortic valve at different phases of the cardiac cycle (0, 50, 100, 350 and 400 ms), where (a) corresponds to image acquisition without motion-tracking, (b) with motion tracking, and (c) motion tracking with super-resolution enhancement.
for the orientation angle and offset distance error between the generated tracked slices for automatic and manual tracking were calculated for each dataset. The manual delineation was highly subjective due to blood artefacts and COMB tag fading (see Fig. 2). In contrast, in the automatic approach only the VLA/HLA pair offering the clearest delineation needs to be processed manually. Over the 8 datasets the motion-compensation was consistent, failing on only 4 frames due to tag fading. In each case, the mis-tracking could be easily identified and interpolated. The automatic approach with initial ROI delineation and tracking verification took 20-30 seconds to perform, whilst the manual tracking took an average of 3 minutes. Bland-Altman plots comparing the manual and automatic approaches are presented in Fig. 4(a)(b) for each separate slice parameter in dataset 1 from Table 1. Aortic valve motion varied by 7.3 mm and 4.8° in the sagittal axis, 5.4 mm and 7.7° in the coronal axis and 12.2 mm and 7.5° in the transverse axis, whereas 95% of residuals were bounded within 1.4 mm and 2.6° in the sagittal axis, 1.4 mm and 3.6° in the coronal axis and 1.2 mm and 2.7° in the transverse axis. In a minority of cases, such as Fig. 4(b)(z), the paired Student's t-test found bias due to the subjective nature of manual ROI delineation. Fig. 4(c) shows the results for a manual vs manual experiment, to gauge the subjective reproducibility. In both manual vs manual and manual vs automatic comparisons, orientation variance was more pronounced than offset variance, though when taken in the context of a typical aortic valve diameter, an atypical 3° error would be needed to cause a maximum voxel displacement error of 1 mm.
Table 1. Statistics of the tracking accuracy for the 8 datasets, illustrating both position and orientation errors between the automatic and manual approaches

                                   Slice offset error (mm)   Orientation error (degrees)
Dataset  Timepoints  Mistracked   mean    sd     max         mean    sd     max
1        32          0            0.55    0.22   0.97        1.22    0.71   3.23
2        24          1            0.87    0.50   2.31        1.34    0.79   4.00
3        22          0            0.67    0.30   1.24        1.20    0.39   1.87
4        34          2            1.29    0.91   4.03        1.46    0.67   2.77
5        22          0            0.79    0.33   1.42        0.94    0.20   1.70
6        28          1            1.07    0.95   4.63        1.07    0.71   3.41
7        24          0            0.80    0.33   1.46        0.88    0.25   1.88
8        24          0            0.57    0.25   1.01        1.26    0.65   3.01
Fig. 4. Bland-Altman plots for dataset 1 in Table 1, comparing the average motion in the sagittal (x), coronal (y) and transverse (z) axes against the difference between: a) automatic and manual tracking of slice offset; b) automatic and manual tracking of slice orientation; c) two manual delineations of slice offset. The mean (dotted line) and two standard deviations either side of the mean (dashed line) are shown.
4 Conclusions

In this study, we have demonstrated that a motion-compensated imaging scheme with COMB tagging provides an effective means of aortic valve imaging. In terms of slice positioning, the method achieves accuracy equivalent to a manual approach but with greater consistency and speed. The slice-tracked valve images were acquired with a segmented FLASH imaging sequence, which can easily be modified to allow phase velocity mapping and thereby slice-followed flow measurements throughout the cardiac cycle. The use of multiple parallel COMB tags will permit adaptive imaging that follows tissue motion. This will have significant implications for improved tracking of cardiac motion, a task that has hitherto been difficult with MRI. Such tracking applied to the coronary arteries will facilitate investigation of the progression of atherosclerosis.

Acknowledgement. This work was supported by British Heart Foundation grant PG/04/078/17370.
References
1. Karwatowski, S.P., Mohiaddin, R., Yang, G.-Z., Firmin, D.N., Sutton, M.S., Underwood, S.R., Longmore, D.B.: Assessment of regional left ventricular long-axis motion with MR velocity mapping in healthy subjects. J. Magn. Reson. Imaging 4 (1994) 151-155
2. Kozerke, S., Scheidegger, M.B., Pedersen, E.M., Boesiger, P.: Heart motion adapted cine phase-contrast flow measurements through the aortic valve. Magn. Reson. Med. 42 (1999) 970-978
3. Kozerke, S., Schwitter, J., Pedersen, E.M., Boesiger, P.: Aortic and mitral regurgitation: Quantification using moving slice velocity mapping. J. Magn. Reson. Imaging 14 (2001) 106-112
4. Dumoulin, C.L., Doorly, D.J., Caro, C.G.: Quantitative measurement of velocity at multiple positions using COMB excitation and Fourier velocity encoding. Magn. Reson. Med. 29 (1993) 44-52
5. Veeser, S., Dunn, M.J., Yang, G.-Z.: Multiresolution image registration for two-dimensional gel electrophoresis. Proteomics 1 (2001) 856-870
6. Nocedal, J.: Updating quasi-Newton matrices with limited storage. Math. Comp. 35 (1980) 773-782
7. Greenspan, H., Oz, G., Kiryati, N., Peled, S.: MRI inter-slice reconstruction using super-resolution. Magn. Reson. Imaging 20 (2002) 437-446
8. Peeters, R.R., Kornprobst, P., Nikolova, M., Sunaert, S., Vieville, T., Malandain, G., Deriche, R., Faugeras, O., Ng, M., Van Hecke, P.: The use of super-resolution techniques to reduce slice thickness in functional MRI. Int. J. Imaging Syst. Tech. 14 (2004) 131-138
9. Stark, H., Oskoui, P.: High-resolution image recovery from image-plane arrays, using convex projections. J. Opt. Soc. Amer. A 11 (1989) 1715-1726
10. Tekalp, A.M., Ozkan, M.K., Sezan, M.I.: High-resolution image reconstruction from lower-resolution image sequences and space-varying image restoration. Proc. IEEE ICASSP 3 (1992) 167-172
Recovery of Liver Motion and Deformation Due to Respiration Using Laparoscopic Freehand 3D Ultrasound System Masahiko Nakamoto1 , Hiroaki Hirayama1, Yoshinobu Sato1 , Kozo Konishi2 , Yoshihiro Kakeji2 , Makoto Hashizume2 , and Shinichi Tamura1 1
Division of Image Analysis, Graduate School of Medicine, Osaka University, Japan 2 Graduate School of Medical Sciences, Kyushu University, Japan
Abstract. This paper describes a rapid method for intraoperative recovery of liver motion and deformation due to respiration by using a laparoscopic freehand 3D ultrasound (US) system. Using the proposed method, 3D US images of the liver can be extended to 4D US images by acquiring several additional sequences of 2D US images during a couple of respiration cycles. Time-varying 2D US images are acquired on several sagittal image planes and their 3D positions and orientations are measured using a laparoscopic ultrasound probe to which a miniature magnetic 3D position sensor is attached. During the acquisition, the probe is assumed to move together with the liver surface. In-plane 2D deformation fields and respiratory phase are estimated from the time-varying 2D US images, and then the time-varying 3D deformation fields on the sagittal image planes are obtained by combining the 3D positions and orientations of the image planes. The time-varying 3D deformation field of the volume is obtained by interpolating the 3D deformation fields estimated on several planes. The proposed method was evaluated by in vivo experiments using a pig liver.
1 Introduction
In laparoscopic liver surgery, 3D ultrasound (3D US) is a useful modality for intraoperative 3D imaging of internal structures of the liver such as vessels or tumors due to its realtime and non-invasive nature. Aiming at safe and accurate surgery, the internal structures can be virtually seen through by superimposing the 3D US onto the real laparoscope view. Motion and deformation of the liver due to respiration is a major problem in maintaining accurate superimposition. Although breath holding during superimposition is often performed, it not only disturbs the smooth operation but may affect the patient's condition if it is performed many times. Olbrich et al. proposed a gating method for visualizing the superimposition only at the expiration phase, which is regarded as the most stable phase in the respiratory cycle [1]. However, its visualization is not continuous but intermittent. On the other hand, studies on recovery of liver motion from preoperative MR images have been reported [2][3][4]. There are the following disadvantages in intraoperative use of these methods: (1) Preoperative
images may not capture actual intraoperative dynamic motion of the liver in the laparoscopic setup, in which the abdominal cavity is filled with air. (2) Acquisition of time-varying 3D MR images in acceptable spatiotemporal resolution is time-consuming. In this paper, we describe a method for intraoperative recovery of respiratory motion and deformation of the liver using a laparoscopic freehand 3D US system for the purpose of augmented reality visualization of 4D US. Advantages of the proposed method are as follows:

– The data acquisition protocol is simple and rapid so that it can be performed intraoperatively. Only the acquisition of several cross-sectional time-varying 2D US images is required in addition to conventional 3D US image acquisition using a freehand probe.
– Intraoperative dynamic motion and deformation of the liver in the laparoscopic setup are recovered.

The proposed method utilizes the assumptions on the characteristics of liver motion indicated in previous work [2]. Its main component lies in the cranio-caudal (5-25 mm) and anterior-posterior (1-12 mm) directions, and the component in the left-right direction is sufficiently small (1-3 mm) [2]. That is, we assume that the nonrigid deformation mainly occurs in the sagittal planes and is negligible in the left-right direction (although a rigid motion component across the sagittal planes is incorporated in our method). In addition, we assume that the liver motion is periodic because the patient is under general anesthesia and the breathing is controlled by the respirator. The proposed method combines rigid 3D motions of the image planes, corresponding to the US probe motion, with time-varying in-plane 2D deformation fields estimated from time-varying 2D US images acquired at several (quasi-)sagittal planes to recover the 4D deformation field of the liver. Since the laparoscopic US probe is in contact with the liver surface during the US image acquisition, we assume that the US probe motion corresponds to the liver surface motion under moderate contact pressure. We perform animal experiments to evaluate and demonstrate the feasibility of the proposed method.
2 Methods

2.1 System Overview and Data Acquisition
We employ a freehand 3D ultrasound system described in [5]. The position and orientation of the probe tip are measured using a miniature magnetic tracker attached to the tip of the ultrasound probe (Fig. 1). Spatial relationship between the tracker and the US image is determined by preoperative calibration. Magnetic field distortion caused by metallic objects is corrected in a rapid manner by using the method described in [6] just before US image acquisition. Time-varying 2D US image sequences of several (quasi-)sagittal planes as well as their positions and orientations are acquired during two or more respiratory
cycles. We assume that the probe tip is put directly on the liver surface with moderate contact pressure so that it moves together with the liver surface. The time-varying data set D_i of the i-th sagittal plane is defined as

D_i = { (I_i(t), T_i(t), R_i(t)) | t = 1, ..., N_i },    (1)

where I_i(t) denotes the time-varying 2D US images, and T_i(t) and R_i(t) are the positions and orientations of the US images, respectively; t denotes a frame number and N_i is the total number of acquired frames.
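The tuple of Eq. (1) maps naturally onto a small container; the sketch below is our own illustration, since the paper defines only the mathematical tuple and not a data structure.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class SagittalSequence:
    """Time-varying data set D_i of Eq. (1) for one (quasi-)sagittal plane."""
    images: np.ndarray        # I_i(t): N_i x H x W stack of 2D US frames
    positions: np.ndarray     # T_i(t): N_i x 3 image-plane positions
    orientations: np.ndarray  # R_i(t): N_i x 3 x 3 rotation matrices

    def __len__(self) -> int:
        return len(self.images)   # N_i, the number of acquired frames
```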
Fig. 1. Intraoperative position measurement and laparoscopic ultrasound image acquisition of the liver
2.2 Estimation of Respiratory Cycle
Temporal registration among the acquired time-varying data sets is necessary. To do so, the reference points of time in the respiratory cycle are determined for the time-varying data sets of the different sagittal planes. As reference points, we determine the start and end points of deformation as well as the inspiration and expiration points for each time-varying data set by analyzing the positions of the US image planes and the time-varying similarities between temporally adjacent US images. Normalized cross correlation (NCC) is used as a similarity measure, which is considered to be in inverse proportion to the amount of deformation. Figure 2 shows an example of time-varying similarities and positions of the image plane. The respiratory cycle is divided into static and dynamic phases based on the magnitude of the similarity. In the static phase, the similarity has a nearly constant and large value. The start point of deformation is determined as the point where the similarity begins to decrease (point D in Fig. 2). The end point of deformation is determined as the point where the similarity returns to the large value (point B in Fig. 2). Using points D and B, the dynamic and static phases are divided. The inspiration point is determined as the peak point during the dynamic phase (point A, denoted by circles in Fig. 2), and is used as a reference point for temporal registration since it can be localized stably and accurately. The point where the change in both the US images and the position is smallest is determined as the expiration point (point C in Fig. 2). The 2D US image at the expiration point is used as a reference image for the nonrigid registration described in the next subsection, since it is regarded as the most stable state.
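One plausible realisation of the reference-point analysis is sketched below, using the image-similarity signal only; the paper additionally analyses the probe position, and the static/dynamic threshold here is a guess rather than a reported value.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross correlation of two equally sized frames."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))

def respiratory_reference_points(frames, static_thresh=0.95):
    """Locate, for each dynamic phase, the start of deformation (D), the
    inspiration point (A, the deepest similarity dip) and the end of
    deformation (B), by thresholding adjacent-frame NCC."""
    s = np.array([ncc(frames[k], frames[k + 1])
                  for k in range(len(frames) - 1)])
    dynamic = s < static_thresh
    points, k = [], 0
    while k < len(s):
        if dynamic[k]:
            d0 = k                               # similarity starts to drop
            while k < len(s) and dynamic[k]:
                k += 1
            a = d0 + int(np.argmin(s[d0:k]))     # inspiration point
            points.append((d0, a, k))            # (D, A, B) frame indices
        k += 1
    return points
```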
[Figure 2 plot: similarity (left axis) and image-plane position in mm (right axis) versus time in seconds, with reference points A–D marked.]
Fig. 2. Analysis of respiratory cycle. A: Inspiration point. B: Start point of deformation. C: Expiration point. D: End point of deformation.
We confirmed that the above four reference points could be stably localized by 1D signal analysis of the time-varying similarities and positions shown in Fig. 2 (although the details of the methods are not described here). The respiratory cycle length is determined as the interval between adjacent inspiration points.

2.3 Estimation of In-Plane Deformation
To estimate time-varying in-plane deformation fields from the US image sequences, we perform intensity-based nonrigid registration [7]. NCC is used as the similarity measure. The difference between the temporally adjacent $k$-th and $(k+1)$-th deformations is regarded as small, since respiratory motion and deformation are smooth and continuous. Therefore, the estimated $k$-th deformation is used as the initial deformation for the estimation of the $(k+1)$-th deformation. The $k$-th 2D deformation field $m_i(k)$ is defined as the field maximizing

$$NCC\big(F(I_i(t_{e,i}), m_i(k)),\ I_i(t_{e,i} + k - 1)\big), \qquad (2)$$

where $NCC(I, J)$ denotes the NCC of images $I$ and $J$, $F(I, m)$ denotes the image $I$ deformed by the deformation field $m$, and the $t_e$-th frame corresponds to the expiration point (point C in Fig. 2). To accurately estimate the deformation of the liver, the liver region is manually segmented beforehand, in the $t_e$-th frame only; this is not time-consuming, since only one short curve is specified at the bottom of the liver.
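A minimal sketch of this chained estimation follows; `register_to_reference` is a hypothetical stand-in for the intensity-based free-form deformation registration of [7], not a real library call, and the array shapes are assumptions:

```python
import numpy as np

def estimate_deformation_chain(frames, ref_index, register_to_reference):
    """Estimate one in-plane deformation field per frame, warm-starting
    each registration with the previously estimated field.

    frames:  (N, H, W) US image sequence; frames[ref_index] is the
             expiration-point reference image.
    register_to_reference(ref, moving, init_field) -> (H, W, 2) field
    Returns a list of per-frame deformation fields.
    """
    ref = frames[ref_index]
    h, w = ref.shape
    field = np.zeros((h, w, 2))  # start from the identity deformation
    fields = []
    for k in range(ref_index, len(frames)):
        # The k-th estimate initializes the (k+1)-th registration.
        field = register_to_reference(ref, frames[k], field)
        fields.append(field)
    return fields
```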
2.4 Recovery of 4D Motion and Deformation
The recovery process for the 4D liver motion and deformation consists of three stages. Firstly, temporal registration among the time-varying data sets is performed by aligning the inspiration points (point A in Fig. 2), and the frame offset γi of each time-varying data set is obtained. The normalized frame number t′ = ti − γi represents the same respiratory phase across the time-varying data sets. Secondly, by combining the position and orientation of a US image
$(T_i(t'), R_i(t'))$ and the in-plane deformation field $m_i(t')$, the 3D deformation vector $\tilde{m}_i(t')$ at an arbitrary position $p$ on the US image plane is estimated as

$$\tilde{m}_i(p, t') = R_i(t')\, m_i(p, t') + \big(R_i(t') - R_i(t_0)\big)\, p + \big(T_i(t') - T_i(t_0)\big). \qquad (3)$$
Finally, the deformation vector at an arbitrary 3D position in the whole volume covering the 3D US image is obtained by interpolating the 3D deformation fields defined on the 2D US image planes. To interpolate the 3D vectors, which are scattered owing to the freehand image acquisition, we employ a multilevel B-spline fitting method [8].
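The interpolation step could be sketched as follows; SciPy's radial basis function interpolator is used here as a simpler stand-in for the multilevel B-spline fitting of [8], and the smoothing value is an assumption:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def interpolate_deformation(sample_points, sample_vectors, query_points):
    """Interpolate scattered 3D deformation vectors over a volume.

    sample_points:  (n, 3) positions on the acquired US image planes
    sample_vectors: (n, 3) deformation vectors at those positions
    query_points:   (q, 3) positions where the field is wanted
    Returns a (q, 3) array of interpolated deformation vectors.
    """
    rbf = RBFInterpolator(sample_points, sample_vectors, smoothing=1e-3)
    return rbf(query_points)
```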
3 Experimental Results
3.1 Experimental Conditions
We performed in vivo experiments using a pig in the laparoscopic environment to evaluate the proposed method. The pig was under general anesthesia, and its breathing was controlled by the respirator; the ventilation volume was 400 cc. An SSD-5500 (ALOKA Co., Japan) was employed as the ultrasound system. The field of view of the US images was 27 × 56 mm², and the frame rate was 30 frames per second. A microBIRD (Ascension Technology Co., Burlington, VT) was employed to magnetically track the US probe tip, and a Polaris (Northern Digital Inc., Waterloo, Ontario, Canada) was employed for correction of the magnetic field distortion of the microBIRD using a magneto-optic hybrid tracker [6]. We acquired two data sets around the porta hepatis of the pig, which covered volumes of 27 × 56 × 60 mm³ and 27 × 56 × 90 mm³, respectively. These data sets consisted of six and seven 2D US sagittal planes with average intervals of 12.6 mm and 15.7 mm, respectively.

3.2 Recovered Motion and Deformation
Figure 3 shows the motion and deformation of the pig liver recovered from the data set. The estimated respiratory cycle length was 6.0 seconds. The amount of recovered motion was largest in the cranio-caudal direction, next largest in the anterior-posterior direction, and smallest in the left-right direction; it was maximal at the inspiration point in all three directions. The recovered 4D deformation fields were combined with the 3D US images of the same regions as those covered by the data sets acquired for the motion and deformation recovery, in order to reconstruct the 4D US images. Figure 4 shows the 3D liver vessel model reconstructed from the 3D US images, together with the US image frames of the sagittal planes used for the recovery of the motion and deformation. The 3D US images were acquired at the expiration phase during breath holding. Figure 5 shows the 4D liver vessel model reconstructed by combining the 3D liver vessel model and the recovered 4D deformation field. Distinctive motion was observed from the lateral view, where the vessels moved up and down along different paths during the expiration and inspiration phases, respectively.
[Figure 3 plot: liver movement (mm) versus time (sec) over one respiratory cycle; curves show the total, C-C, A-P, and L-R components.]
Fig. 3. Magnitude of recovered motion of the liver along the body axis (Data set 1). C-C: cranio-caudal movement. A-P: anterior-posterior movement. L-R: lateral movement.
Fig. 4. 3D US liver vessel model and US image frames of sagittal planes (Data set 1). C-C: the cranio-caudal axis. A-P: the anterior-posterior axis. L-R: the lateral axis.
Fig. 5. Reconstructed 4D liver vessel model (Data set 1)
3.3 Accuracy Evaluation of Recovered 4D Deformation Field
To evaluate the effect of the sagittal plane intervals on the accuracy, we performed leave-$N$-out cross validation. The deformation field was recovered with $N$ selected planes removed, and the deformation fields on the
$N$ removed planes were used for validation, by comparing them with the recovered deformation field of the whole volume. The validation planes were selected under three conditions: (a) a single plane, (b) every other plane, and (c) every two out of three planes. The average interval was smallest under condition (a) and largest under condition (c). The error $E_{i,t}$ was defined as

$$E_{i,t} = \sqrt{\frac{\sum_{x,y,z \in R_i} |Val(x, y, z, t) - Est(x, y, z, t)|^2}{n}}, \qquad (4)$$

where $n = \sum_{x,y,z \in R_i} 1$ and $R_i$ is the point set on the selected planes. $Val(x, y, z, t)$ and $Est(x, y, z, t)$ are the original and recovered deformation vectors at position $(x, y, z)$ and time $t$, respectively. Here, the original deformation vectors are the 3D deformation fields estimated in the second step of Sect. 2.4. Table 1 summarizes the results. The error increased during the period from the expiration point to the inspiration point and then decreased toward the next expiration point; the maximum error was observed around the inspiration point. The magnitude of the error over time was correlated with that of the respiratory motion, and the error remained 1.0–1.5 mm at the end of the cycle. The maximum error under condition (b) during one cycle was comparable to that under condition (a), at around 2.0 mm (Table 1).

Table 1. Results of the accuracy evaluation for the recovery of deformation vectors in the whole volume

                                        (a)    (b)    (c)
Data set 1  Average interval (mm)      15.8   25.2   31.3
            Maximum error (mm)          1.9    1.8    2.2
            Average error (mm)          1.3    1.2    1.2
Data set 2  Average interval (mm)      15.7   18.8   42.2
            Maximum error (mm)          2.4    2.3    2.7
            Average error (mm)          1.0    1.0    1.1

4 Discussion and Conclusions
We have described a method for intraoperative recovery of the respiratory motion and deformation of the liver using a laparoscopic freehand 3D US system. We confirmed that plausible motion and deformation of the pig liver were recovered from several time-varying 2D US images whose 3D positions and orientations were measured. The characteristics of the recovered motion were qualitatively consistent with those reported previously. The results of the cross validation performed to evaluate the interpolation accuracy showed that the estimation error of the deformation field was around 2 mm, even at the inspiration point, using the additional acquisition of three 2D US image sequences during a couple of respiratory cycles for volumes of around 27 × 56 × 60–90 mm³. The additional image acquisition time was two or three minutes. Therefore, the 3D US images of the liver can be
extended to 4D US images by spending only a little extra time on acquiring additional data sets. The computation time was around 30 minutes in the current implementation, but it could be reduced significantly by introducing GPU-based techniques. To recover the 3D deformation field from the 2D deformation fields on several sagittal planes, we assume that nonrigid deformation in the lateral direction is negligible, although rigid translation in the lateral direction is incorporated as motion of the whole image frame. If large deformation in the lateral direction occurred, the estimation of the in-plane deformation would not be accurate. However, the motion in the lateral direction was less than 2 mm in the recovered deformation field. We validated the in-plane deformation using several localizable feature points, and its accuracy was better than 1 mm on average. Another assumption was that the liver motion is periodic. We also validated that the repeatability of the pig liver motion was much higher in the laparoscopic environment than in the open surgery environment. Although there is a concern that the trocar may restrict the US probe movement, the US probe tip could be moved smoothly together with the liver surface during several respiratory cycles under laparoscopy. In addition, all of the time-varying 2D US images could be aligned almost parallel to the sagittal plane. We conclude that the potential usefulness of the proposed method was demonstrated based on the above considerations. Future work will include accuracy evaluation using markers embedded in the pig liver.
References
1. Olbrich, B., Traub, J., Wiesner, S., et al.: Respiratory motion analysis: Towards gated augmentation of the liver. In: Proceedings of Computer Assisted Radiology and Surgery (CARS 2005), Berlin, Germany (2005) 248–253
2. Rohlfing, T., Maurer Jr., C.R., O'Dell, W.G., Zhong, J.: Modeling liver motion and deformation during the respiratory cycle using intensity-based nonrigid registration of gated MR images. Medical Physics 31 (2004) 427–432
3. Penney, G., Blackall, J., Hawkes, D.J.: Registration of freehand 3D ultrasound and magnetic resonance liver images. Medical Image Analysis 8 (2004) 81–91
4. von Siebenthal, M., Cattin, P., Gamper, U., et al.: 4D MR imaging using internal respiratory gating. In: Lecture Notes in Computer Science, 3750 (Proc. MICCAI 2005, Part II), Palm Springs, CA, USA (2005) 336–343
5. Anonymous: 3D ultrasound system using a magneto-optic hybrid tracker for augmented reality visualization in laparoscopic liver surgery. In: Lecture Notes in Computer Science, 2489 (Proc. MICCAI 2002, Part II), Tokyo, Japan (2002) 148–155
6. Anonymous: A rapid method for magnetic tracker calibration using a magneto-optic hybrid tracker. In: Lecture Notes in Computer Science, 2879 (Proc. MICCAI 2003, Part II), Montreal, Canada (2003) 285–293
7. Rueckert, D., Sonoda, L.I., Hayes, C., et al.: Nonrigid registration using free-form deformations: Application to breast MR images. IEEE Trans. Med. Imag. 18 (1999) 712–721
8. Lee, S., Wolberg, G., Shin, S.Y.: Scattered data interpolation with multilevel B-splines. IEEE Trans. Visualization and Computer Graphics 3 (1997) 228–244
Numerical Simulation of Radio Frequency Ablation with State Dependent Material Parameters in Three Space Dimensions

Tim Kröger1, Inga Altrogge1, Tobias Preusser1, Philippe L. Pereira2, Diethard Schmidt2, Andreas Weihusen3, and Heinz-Otto Peitgen1,3

1 CeVis – Center for Complex Systems and Visualization, University of Bremen, Germany {kroeger, altrogge, preusser, peitgen}@cevis.uni-bremen.de
2 Dept. of Diagnostic Radiology, Eberhard Karls University, Tübingen, Germany {philippe.pereira, diethard.schmidt}@med.uni-tuebingen.de
3 MeVis – Center for Medical Diagnostic Systems and Visualization, Bremen, Germany {weihusen, peitgen}@mevis.de

Abstract. We present a model for the numerical simulation of radio frequency (RF) ablation of tumors with mono- or bipolar probes. This model includes the electrostatic equation and a variant of the well-known bio-heat transfer equation for the distribution of the electric potential and the induced heat. The equations are nonlinearly coupled by material parameters that change with temperature, dehydration and damage of the tissue. A fixed point iteration scheme for the nonlinear model and the spatial discretization with finite elements are presented. Moreover, we incorporate the effect of evaporation of water from the cells at high temperatures using a predictor-corrector like approach. The comparison of the approach to a real ablation concludes the paper.
1 Introduction
Minimally invasive treatment of tumors has become a promising alternative to other forms of therapy. In particular, for cases where surgical resection is too complex or even impossible, thermo-therapies such as radio frequency (RF) ablation are applied: one or more probes are placed in the tumor, producing an electric current that heats the neighboring tissue. When a critical temperature (≈ 54 °C) is exceeded, proteins denature and the tissue is destroyed. To increase the coagulation volume, much higher temperatures are generated near the probes. This leads to evaporation of water in the cells (as can be seen through ultrasound monitoring of the ablation process) and to dehydration of the tissue. For the planning of RF ablation, the simulation of this treatment is an indispensable tool. The results of such simulations can provide information about a suitable probe position, power setup, and ablation time. The simulation of RF ablation (and related thermo-therapies) has been considered by several authors [1,2,3,4,5]. Many approaches assume constant material parameters, exclude evaporation of water [2,3], and use commercial grid generators and finite element software for the discretization of the resulting linear equations. Stein [1]
presented an approach which takes into account non-constant material parameters and evaporation but assumes cylindrical symmetry around the probe. His discretization uses finite differences and explicit time-stepping. Several authors have investigated the effect and modeling of blood flow in RF ablation [3,6,7]. The results presented in this paper are part of a collaboration between CeVis, MeVis, and a variety of clinical partners working together in publicly funded research projects¹. The main goal of these projects is the development and evaluation of clinically useful computer-assisted systems for the planning, monitoring, and assessment of tumor ablation. A complete destruction of the tumor tissue is essential for treatment success, similar to R0 resection in liver tumor surgery. Here, the numerical simulation of RF ablation for procedure planning can contribute considerably to a successful ablation. At CeVis, a group headed by the third author is developing advanced numerical simulation procedures. Our current model of RF ablation combines features of several existing models: it is a three-dimensional, nonlinear finite element model incorporating material parameters that change with temperature, dehydration, and damage. For the evaporation, we consider a simple two-phase model. This enables us to model the nonlinear behavior of the electric generator as a consequence of the dehydration (an effect that is known to change the ablation process essentially). Patient-individual segmented vascular systems are considered as well. Moreover, the discretization allows the simulation of several simultaneously applied mono- or bipolar RF probes in any geometric constellation; a novelty is the use of Robin boundary conditions for the electric potential at the domain boundary.
2 A Mathematical Model of Radio Frequency Ablation
Let us begin with a description of the geometrical setting: we consider a cuboid-shaped domain Ω ⊂ R³ with (outer) boundary Γout = ∂Ω. We assume (cf. Fig. 1) that one or more probes are applied within Ω, the union of which will be denoted by Ωpr. The probes consist of insulating material and electrodes which are either positively or negatively charged. We let Ω+, Ω− ⊂ Ωpr denote the union of all positive and negative electrodes, respectively, and Ω± = Ω+ ∪ Ω−. The mathematical model for the simulation of RF ablation consists of three interacting parts: the first part models the energy applied to the tissue by the probe and the electric generator. The second part considers the heat distribution in the tissue, including sources and sinks and the phase change. The third part tackles the effect of the heat on the (malignant) cells and their destruction. These parts of the ablation model depend on a variety of material parameters, which themselves depend on the various states of the tissue. Thus, denoting the time-space computational domain by W = I × Ω, where I = [0, tmax], tmax > 0, and by C(D, E) the space of continuous functions D → E, we are looking for the following five states:

• electric potential ϕ ∈ C(W, R) of the probe induced by the generator,
• temperature distribution T ∈ C(W, R),
• relative content of fluid water FW ∈ C(W, [0, 1]),       (1)
• relative content of vapor FV ∈ C(W, [0, 1]),
• coagulation state FC ∈ C(W, [0, 1]),

and the following four material parameters:

• electric conductivity σ ∈ C(W, R),
• density ρ ∈ C(W, R),                                     (2)
• specific heat capacity c ∈ C(W, R),
• thermal conductivity λ ∈ C(W, R).

¹ LITT / RFITT funded by the German Research Foundation (Pe 199/11-1 and Pe 199/15-2) and the national research networks VICORA and FUSION funded by the Federal Ministry of Education and Research (01EZ0401 and 01IBE03C).
Fig. 1. Schematic geometrical setting. Note that Ω+ ∪ Ω− = Ω± ⊂ Ωpr ⊂ Ω.
Let us give some remarks on the state variables FW and FV: the evaporation of water is an important effect which leads to a drying-up of the tissue and consequently to a reduced conductivity for the electric current. The quantities FW and FV specify how much of the cell volume available for water is actually filled by fluid water or vapor. When water evaporates, we assume that the surplus vapor will pass off due to the pressure and condense in remote regions. On the other hand, if condensation takes place (e.g., if electrodes are switched off), the tissue will contract temporarily and water will only slowly rediffuse to fill the free space. Hence, instead of the intuitive relation FW + FV = 1, we only assume

FW(t, x) + FV(t, x) ≤ 1   for all (t, x) ∈ W.       (3)
Material Parameters. Experimental investigations [1] have shown that all of the material parameters (2) depend nonlinearly on T, FW, and FC. This dependence is not known in every detail, and moreover it may vary individually for each patient. However, Stein [1] has shown that model equations can be used that are linear in each of T, FW, and FC, i.e.

$$\sigma(t, x) = \sigma_1 \cdot \big(1 + \sigma_2 (T(t, x) - T_{body})\big)\big(1 + \sigma_3 (1 - F_W(t, x))\big)\big(1 + \sigma_4 F_C(t, x)\big),$$

and analogously for ρ, c, and λ. Note that we abuse notation and write σ(t, x) instead of σ(T(t, x), FW(t, x), FC(t, x)). Following Stein, we use the constants

σ1 = 0.21 A/Vm,   σ2 = 0.013 K⁻¹,      σ3 = −1,      σ4 = 1.143,
ρ1 = 1080 kg/m³,  ρ2 = −0.00056 K⁻¹,   ρ3 = −0.657,  ρ4 = 0,
c1 = 3455 J/kgK,  c2 = 0,              c3 = −0.376,  c4 = 0,
λ1 = 0.437 W/Km,  λ2 = 0.0025 K⁻¹,     λ3 = 0,       λ4 = 0

(σ3 = −1 models the effect of drying-up on the conductivity: σ → 0 as FW → 0).
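As an illustration, the conductivity model could be evaluated as in the following sketch (a minimal illustration, not the authors' code; the kelvin reference temperature is an assumption, and the same linear-in-each-state form applies to ρ, c, and λ):

```python
T_BODY = 310.15  # body temperature, 37 deg C expressed in K (assumed)

def sigma(T, F_W, F_C, s1=0.21, s2=0.013, s3=-1.0, s4=1.143):
    """Electric conductivity in A/Vm, linear in each of T, F_W, F_C.

    T is in kelvin, F_W and F_C in [0, 1]. With s3 = -1, the factor
    (1 + s3*(1 - F_W)) tends to 0 as the tissue dries up (F_W -> 0).
    """
    return (s1
            * (1.0 + s2 * (T - T_BODY))
            * (1.0 + s3 * (1.0 - F_W))
            * (1.0 + s4 * F_C))
```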
Electric Energy. The energy applied by the probe is given by the time integral of the power p(t, x) = σ(t, x)|∇ϕ(t, x)|². For the frequencies used in RF ablation, the electric potential ϕ is quasistatic [1] and is modeled by

$$-\nabla \cdot (\sigma \nabla \varphi) = 0 \quad \text{in } I \times (\Omega \setminus \Omega_\pm), \qquad (4a)$$
$$\varphi = \pm 1 \quad \text{on } I \times \Omega_\pm \text{ (fixed potential on electrodes)}, \qquad (4b)$$
$$n \cdot \nabla \varphi = \frac{n \cdot (s - x)}{|s - x|^2}\, \varphi \quad \text{on } I \times \Gamma_{out} \text{ (Robin boundary conditions)}. \qquad (4c)$$
Condition (4c) (where n denotes the outer normal to Γout) enables us to model bipolar as well as monopolar probes without prescribing a neutral potential on the boundary Γout as, e.g., in [3]. The condition corresponds to the assumption that on Γout (i.e., far away from the probe), the potential behaves as if it were induced by a point load at the barycenter s of the union of all probes. The values ±1 on Ω± are chosen arbitrarily. The correct potential induced by the generator is obtained by a scaling, which also takes into account the nonlinear behavior of the generator due to the tissue resistance Rtis(t). Denoting the setup power of the generator by psetup (typically in the range 20–200 W), we have

$$p_{total}(t) = \int_\Omega p(t, x)\, dx, \qquad p_{eff}(t) = \frac{4\, p_{setup}\, R_{tis}(t)\, R_I}{(R_{tis}(t) + R_I)^2}, \qquad R_{tis}(t) = \frac{U^2}{p_{total}(t)},$$

where RI is the inner resistance of the generator and U is the difference of the potential ϕ on the two electrodes (U = 2 V for bipolar and U = 1 V for monopolar probes). From these quantities, we compute Qrf(t, x) = p(t, x) · peff(t)/ptotal(t).

Temperature. The model equation for the temperature distribution is

$$\partial_t(\rho c T) - \nabla \cdot (\lambda \nabla T) = Q \quad \text{in } I \times (\Omega \setminus \Omega_{pr}), \qquad (5a)$$
$$T = T_{body} \quad \text{on } I \times \Omega_{pr} \text{ (cooled probe)}, \qquad (5b)$$
$$n \cdot \nabla T = 0 \quad \text{on } I \times \Gamma_{out} \text{ (natural boundary conditions)}, \qquad (5c)$$
which is known as the heat transfer equation, with the initial value T(t = 0) ≡ Tbody = 37 °C. The right-hand side is Q(t, x) = Qrf(t, x) + Qperf(t, x) + Qpc(t, x), where Qrf has been introduced above. Qperf models the cooling effect due to perfusion. Various authors have studied the modeling of perfusion in the context of RF ablation [6,8]. In our approach, we consider the capillary blood flow as having diffusive effects on the temperature only. Thus, we use an extended version of Pennes' approach [9] and set Qperf(t, x) = ν(x)ρblood cblood(Tbody − T(t, x)) if FC(t, x) < F*C := 1 − exp(−1), and Qperf(t, x) = 0 otherwise. However, we use a spatially varying perfusion coefficient ν(x), which is 0.05 s⁻¹ inside the hepatic and portal veins, 0.1 s⁻¹ inside the artery, and 0.01765 s⁻¹ elsewhere [1]. Qpc is the heat source/sink due to the phase change of water. Modeling the thermodynamics of a phase change would involve further differential equations
for the pressure and thus increase the computational complexity. Instead, we use a predictor-corrector like approach, which is described in the next section.

Coagulation. The denaturation of tissue due to coagulation is modeled following the Arrhenius formalism (cf. [10,1]). This leads to

$$F_C = 1 - \exp(-W), \qquad \partial_t W = A_A \exp\big(-E_A (RT)^{-1}\big), \qquad W(t_0) = 0, \qquad (6)$$

where R is the universal gas constant and A_A = 9.4 · 10¹⁰⁴ s⁻¹ and E_A = 6.7 · 10⁵ J/mol are the corresponding Arrhenius parameters.
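A minimal sketch of integrating (6) explicitly over a given temperature history (the fixed step size and the history itself are illustrative assumptions):

```python
import numpy as np

R_GAS = 8.314462  # universal gas constant, J/(mol K)
A_A = 9.4e104     # Arrhenius frequency factor, 1/s
E_A = 6.7e5       # activation energy, J/mol

def coagulation_state(temperatures, dt):
    """Integrate dW/dt = A_A * exp(-E_A / (R T)) with forward Euler and
    return F_C = 1 - exp(-W).

    temperatures: iterable of temperatures in kelvin, sampled every dt seconds.
    """
    W = 0.0
    for T in temperatures:
        W += dt * A_A * np.exp(-E_A / (R_GAS * T))
    return 1.0 - np.exp(-W)
```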
3 Discretization
After having introduced our model, we now explain how the differential equations are discretized. This section consists of three parts: we first present an overview of our overall discretization scheme in space and time, then supply the details in the second part. The last part deals with the discrete phase change model. For the spatial discretization, we work with the weak forms of (4) and (5):

$$0 = \int_{\Omega \setminus \Omega_\pm} \sigma \nabla\varphi \cdot \nabla v \, dx - \int_{\Gamma_{out}} \frac{n \cdot (s - x)}{|s - x|^2}\, \sigma \varphi v \, ds(x) \quad \forall\, v \in H_\pm := H^1_{\Omega_\pm}, \qquad (7)$$
$$0 = \int_{\Omega \setminus \Omega_{pr}} \big( \partial_t(\rho c T)\, v + \lambda \nabla T \cdot \nabla v - Q v \big)\, dx \quad \forall\, v \in H_{pr} := H^1_{\Omega_{pr}}. \qquad (8)$$
Dividing Ω uniformly into a finite number of cuboids, we introduce the finite element space $V^h$ of globally continuous, cell-wise trilinear functions. Next, we introduce our overall time discretization. Let $X = (V^h)^5$ denote the state space (i.e., any $x \in X$ consists of all five states (1) in their respective finite-dimensional function spaces at a particular time level). For the solution of our nonlinear system, we have to deal with an $X$-valued initial value problem: find $x : I \to X$ such that $\partial_t x = F(x)$ and $x(t_0) = x_0$. We approximate the solution by a sequence of discrete time levels $x^n \approx x(t_n)$, $t_n = t_{n-1} + \tau_n$, computed by the implicit Euler scheme $x^n = x^{n-1} + \tau_n F(x^n)$. This equation is solved using the fixed point iteration

$$x^{n,k+1} = x^{n,k} + \tau_n G(x^{n-1}, x^{n,k}, \tau_n), \qquad x^{n,0} = x^{n-1}, \qquad (9)$$
where $G(x^{n-1}, x^{n,k}, \tau_n)$ is a semi-implicit approximation to $F(x^n)$, to be explained below. When $x^{n,k} - x^{n,k-1}$ is below a given threshold for some $k$, we set $x^n = x^{n,k}$ for this $k$. If this fails to happen up to some given $k_{max}$, we multiply $\tau_n$ by 1/2 and restart (9). This is repeated until convergence is achieved.

Fixed Point Iteration. We will now define our fixed point iteration function $G$ by explaining how $(\varphi, T, F_W, F_V, F_C)^{n,k+1}$ is computed. We use a semi-implicit approach, i.e., $\varphi^{n,k+1}$ and $T^{n,k+1}$ are chosen such that
$$0 = \int_{\Omega \setminus \Omega_\pm} \sigma^{n,k} \nabla\varphi^{n,k+1} \cdot \nabla v \, dx - \int_{\Gamma_{out}} \frac{n \cdot (s - x)}{|s - x|^2}\, \sigma^{n,k} \varphi^{n,k+1} v \, ds(x), \qquad (10)$$
$$0 = \int_{\Omega \setminus \Omega_{pr}} \Big( \frac{\rho^{n,k} c^{n,k}}{\tau_n}\, (T^{n,k+1} - T^{n-1})\, v + \lambda^{n,k} \nabla T^{n,k+1} \cdot \nabla v - Q^{n,k} v \Big)\, dx \qquad (11)$$
Phase Change. Note that FW → 0 implies Qrf → 0, i. e. there will not be any heat source any more, and hence, the temperature is always bounded by Tboil . It is important to let the temperature increase up to Tboil without evaporation (i. e. Qpc = 0). If this point has been reached, the temperature has to be fixed as long as there is still a heat source or FV (t, x) > 0. We perform the whole phase change procedure as a post-processing step: We first solve a modified heat equation in which Qpc is omitted, denoting the result as T˜ n,k+1 , and then compute Qn,k pc such that T (t, x) is fixed at Tboil whenever this is necessary: FVn,k (t, x)Evap qvap ρn,k (t, x)cn,k (t, x) n,k+1 ˜ = min Tboil − T (t, x), . τn cn,k (t, x) n,k Then, we set T n,k+1 (t, x) = T˜ n,k+1 (t, x) + τn Qn,k (t, x)cn,k (t, x) and pc (t, x)/ ρ Qn,k pc (t, x) n,k+1 n−1 ˜ FW (t, x) = max FW (t, x) + τn ,0 , (12) Evap ρn,k (t, x) Qn,k pc (t, x) ˜ n,k+1 (t, x) , (13) FVn,k+1 (t, x) = min FVn−1 (t, x) − τn , 1 − F W Evap qvap ρn,k (t, x) n,k+1 τA F˜W (t, x) + τn (1 − FVn,k+1 (t, x)) n,k+1 FW (t, x) = . (14) τA + τn Qn,k pc (t, x)
Equations (12) and (13) model the change of the water and vapor content which is due to the phase change; here, Evap = 2.256 · 106 J/kg is the specific enthalpy of evaporation and qvap = 1/332 is the density ratio of vapor and water. The second term in (13) asserts (3). Finally, (14) describes an exponential flow-back (with speed determined by τA = 4 s) of water after the condensation process [1].
4 Numerical Results
For an initial evaluation of our algorithm, we compared the results to a data set from the University Hospital of Tübingen. Vascular system and tumor were
Fig. 2. Top: Overlay of the computed coagulation after 2, 5, and 8 minutes (dark) and the segmented coagulation (light). Bottom: Three-dimensional view including segmented vessel system and probes. (Images can be found at www.mevis.de/~tim/.)
[Figure 3 plot: tissue impedance (Ω) versus time (s), 0–480 s.]
Fig. 3. Computed tissue impedance during the ablation
segmented from a pre-interventional CT scan [12]; the ablation parameters were set up as recorded; and the coagulation was compared with a post-interventional CT scan, which, however, had been performed two months after the intervention. The ablation parameters were a "Cool-Tip™ RF CTCM-1525" cluster (three monopolar RF applicators in a fixed geometrical arrangement), a Radionics generator (set to its maximum power psetup = 200 W at a frequency of 480 kHz, with an inner resistance of RI = 80 Ω), and 8 minutes of ablation time. Our computational domain of approximately 60 × 40 × 40 mm was covered by interleaved grids with grid sizes of about 0.2 mm, 0.4 mm, and 0.8 mm. We chose a time step size of 1 s, which was reduced to 0.5 s (by the implicit time stepping scheme with kmax = 5) in 231 cases and to 0.25 s in 2 cases. The computation took 6 hours. Figure 2 shows the simulation results. Since the post-interventional CT scan had been made two months later, the liver had already begun to regenerate, which might explain why the segmented lesion is slightly too small. Note also that the
'handle' in the segmented lesion was produced by leaving the generator switched on while pulling out the probes, a process not included in the simulation. Figure 3 shows the computed tissue impedance progression. After about 230 s, the impedance increases drastically, because a layer of dried-up tissue has formed around the probes. For models that do not incorporate evaporation of water, the impedance would stay constant (dashed line), a non-physical behavior.
5 Discussion
We have presented a mathematical model of RF ablation and its discretization, combining the advantages of several different approaches. Also, to our knowledge, the proposed handling of monopolar probes and our evaporation model are new. In a preliminary application to patient data, the scheme yielded reasonable results. Future work will cover a more systematic evaluation as well as extensions such as error estimation, unstructured grids, and blood advection.

The authors thank the VICORA team, and in particular T. Stein and A. Roggan from Celon AG, for valuable hints and fruitful discussions on the topic. We would also like to thank the team from MeVis, especially S. Zentis and C. Hilck, for preprocessing the CT scans.
References
1. Stein, T.: Untersuchungen zur Dosimetrie der hochfrequenzstrominduzierten interstitiellen Thermotherapie in bipolarer Technik. Volume 22 of Fortschritte in der Lasermedizin. Müller and Berlien (2000)
2. Tungjitkusolum, S., Staelin, S.T., Haemmerich, D., et al.: Three-dimensional finite-element analyses for radio-frequency hepatic tumor ablation. IEEE Trans. Biomed. Eng. 49 (2002) 3–9
3. Welp, C., Werner, J.: Large vessel cooling in hepatic radiofrequency ablation: investigation on the influence of blood flow rate. IEEE Trans. Biomed. Eng., submitted.
4. Deuflhard, P., Weiser, M., Seebaß, M.: A new nonlinear elliptic multilevel FEM applied to regional hyperthermia. Comput. Vis. Sci. 3 (2000) 115–120
5. Roggan, A.: Dosimetrie thermischer Laseranwendungen in der Medizin. Volume 16 of Fortschritte in der Lasermedizin. Müller and Berlien (1997)
6. Jain, M.K., Wolf, P.D.: A three-dimensional finite element model of radiofrequency ablation with blood flow and its experimental validation. Ann. Biomed. Eng. 28 (2000) 1075–84
7. Preusser, T., Weihusen, A., Peitgen, H.O.: On the modelling of perfusion in the simulation of RF-ablation. In: Proceedings of Simulation and Visualization (SimVis), Magdeburg (2005)
8. Jain, R.K., Grantham, F.H., Gullion, P.M.: Blood flow and heat transfer in Walker 256 mammary carcinoma. J. Natl. Cancer Inst. 69 (1979) 927–931
9. Pennes, H.H.: Analysis of tissue and arterial blood temperatures in a resting forearm. J. Appl. Physiol. 1 (1948) 93–122
10. Arrhenius, S.: Über die Reaktionsgeschwindigkeit bei der Inversion von Rohrzucker durch Säuren. Z. Phys. Chem. 4 (1889) 226–248
11. LeVeque, R.J.: Wave propagation algorithms for multidimensional hyperbolic systems. J. Comp. Phys. 131 (1997) 327–353
12. Schenk, A., Prause, G., Peitgen, H.O.: Efficient semiautomatic segmentation of 3D objects in medical images. In: Proc. MICCAI. Volume 1935 of Springer LNCS. (2000) 186–195
Towards a Multi-modal Atlas for Neurosurgical Planning

M. Mallar Chakravarty1, Abbas F. Sadikot1,2, Sanjay Mongia2, Gilles Bertrand1,2, and D. Louis Collins1

1 McConnell Brain Imaging Centre
2 Department of Neurosurgery
Montreal Neurological Institute, 3801 University Street, Montreal, Quebec, Canada, H3A 2B4
Abstract. Digital brain atlases can be used in conjunction with magnetic resonance imaging (MRI) and computed tomography (CT) for planning and guidance during neurosurgery. Digital atlases are advantageous since they can be warped nonlinearly to fit each patient's unique anatomy. Functional neurosurgery with implantation of deep brain stimulating (DBS) electrodes requires accurate targeting, and has become a popular surgical technique in Parkinsonian patients. In this paper, we present a method for integrating postoperative data from subthalamic (STN) DBS implantation into an anatomical atlas of the basal ganglia and thalamus. The method estimates electrode positions from post-operative magnetic resonance imaging (MRI) data. These electrodes are then warped back into the atlas space and are modelled in three dimensions. The average of these models is then taken to show the region where the majority of STN DBS electrodes were implanted. The group with more favourable post-operative results was separated from the group which responded to the STN DBS implantation procedure less favourably, to create a probabilistic distribution of DBS electrodes in the STN.
1 Introduction
Deep brain stimulation (DBS) has become a popular method to treat movement disorders such as Parkinson's disease [2,8,15,20]. Despite recent improvements in imaging techniques [2,20], magnetic resonance imaging alone cannot suggest the final location of surgical targets. For intra-operative guidance, neurosurgeons typically use a combination of pre-operative imaging, atlas integration, stimulation of the internal capsule and subcortical nuclei, and electrophysiological recording in order to properly localize these surgical targets. Originally, print atlases were used to guide these surgical procedures [18,21]. Advances in medical image processing have enabled digital atlases representing the subcortical anatomy to be warped to match the individual patient anatomy, and these have been used to aid the visualization of surgical targets [1,4,5,9,19]. The use of multi-modal atlases has grown in popularity to aid in
target localization [9,10,12,13,16]. The integration of functional recordings (such as electrode stimulation recordings and electrophysiological recordings of the internal capsule and subcortical nuclei [9,11,12,13]) and of the post-operative positions of active DBS electrodes in the subthalamic nucleus, using linear and nonlinear mappings [9,10,16], has been used to create a new class of atlases called probabilistic functional atlases. In this paper we propose a new technique to create a probabilistic functional atlas based on an estimation of electrode position in post-operative data from Parkinson's patients who underwent STN DBS implantation. In the creation of our probabilistic functional atlas, each electrode is modelled as a three-dimensional structure based on the geometry of each electrode in the DBS lead. All electrodes are then warped towards a common template using nonlinear atlas-to-patient warping techniques based on [5]. The functional information has been added to an anatomical atlas of the basal ganglia and thalamus derived from a high-resolution set of serial histology [4]. This atlas contains 105 structures and has been nonlinearly warped to fit a high-resolution MRI template known as the Colin27 MRI average [14]. The anatomical atlas can be seen in Fig. 1. The accuracy of this atlas-to-template warp was validated in [5].
Fig. 1. First three panels: Voxel-label atlas warped to fit the Colin27 MRI average. Bottom Right: 3D geometric atlas.
In this paper we will explain the methods used to create the probabilistic functional atlas (PFA) using postoperative data in Section 2. The results of the creation of the PFA are given in Section 3.
2 Probabilistic Functional Atlas Creation Techniques
The electrode model used in this paper is based on the actual geometry of the electrodes in the DBS lead. Details of the 3D model are given in Section 2.1. The estimation of the electrode position in postoperative data from patients who underwent STN DBS electrode implantation is described in Section 2.2. The method used to warp the electrode positions into a final common atlas space is described in Section 2.3. Once all DBS electrodes are warped to the atlas, the probabilistic atlas is created (see Section 2.4). The electrode position was estimated on post-operative MRI data from 22 patients who underwent STN DBS implantation.

2.1 3D Electrode Model
Each electrode in the dataset is modelled as a cylinder in a 3 × 3 × 3 cm binary volume on a grid with 0.1 mm isotropic voxel spacing. All voxels considered to belong to the electrode are set to an intensity value of 1, while all voxels outside the electrode are set to 0. The diameter of the cylinder was 1.3 mm and the height was 1.5 mm (based on the actual geometry of the electrodes). A chamfer distance map [3] is then estimated so that the region around the electrode can be modelled as a decaying electric field using a simple 1/r² model. This is an approximation of the influence of the electrode and is discussed further in the conclusions presented in Section 4. The model is then blurred with a 2 mm isotropic Gaussian kernel to account for the uncertainty inherent in the model. A slice through the volume of the initial cylinder, the 1/r² map, and the blurred 1/r² map can be seen in Fig. 2.
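A sketch of this construction follows; SciPy's exact Euclidean distance transform is used in place of the chamfer map of [3], the Gaussian sigma of 2 mm is an assumption about the blur, and the grid is smaller than the paper's to keep the example light:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt, gaussian_filter

VOXEL = 0.1  # mm, isotropic voxel spacing

def electrode_volume(shape=(100, 100, 100), radius=0.65, height=1.5):
    """Binary cylinder (1.3 mm diameter, 1.5 mm height) with a 1/r^2
    falloff outside it, blurred with an isotropic Gaussian."""
    z, y, x = np.indices(shape)
    c = (np.array(shape) - 1) / 2.0
    r_xy = np.hypot((x - c[2]) * VOXEL, (y - c[1]) * VOXEL)
    h_z = np.abs(z - c[0]) * VOXEL
    cyl = (r_xy <= radius) & (h_z <= height / 2.0)

    # Distance (mm) from each outside voxel to the electrode surface.
    dist = distance_transform_edt(~cyl, sampling=VOXEL)

    # Regularized 1/r^2 falloff; 1 inside the electrode itself.
    field = np.where(cyl, 1.0, 1.0 / (1.0 + dist) ** 2)

    return gaussian_filter(field, sigma=2.0 / VOXEL)
```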
Fig. 2. Electrode model. From left to right: a top view of the initial binary electrode model, a view of the modelled 1/r² electric field around the electrode, and the electric field blurred with a 2 mm isotropic Gaussian kernel.
2.2 Electrode Position Estimation
We assume that any MRI artefact caused by the DBS lead is homogeneous in all three dimensions. For each subject, two points were manually identified: the first point, T, identified the top of the tract of the STN DBS lead, and the second, B, identified the bottom. The midpoint M of the line segment BT was calculated, and a unit row vector u was defined along BT in the direction of T. Using M and u, the positions of the four electrodes in each DBS lead are estimated using simple vector algebra: E = Pu + M, where E = [E3, E2, E1, E0]ᵀ is a column vector of the final electrode positions, and P = [P3, P2, P1, P0]ᵀ = [2.25, 0.75, −0.75, −2.25]ᵀ is a column vector of the signed distances (in mm) of each electrode from M, based on the actual geometry of the DBS lead. The result of the electrode estimation can be seen in Fig. 3.
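The vector algebra is small enough to sketch directly (in any usage, the coordinates would come from the manually identified landmarks):

```python
import numpy as np

def electrode_positions(T, B):
    """Estimate the four electrode centers E3..E0 along the DBS tract.

    T, B: 3D coordinates of the top and bottom of the tract."""
    T, B = np.asarray(T, float), np.asarray(B, float)
    M = (T + B) / 2.0                         # midpoint of segment BT
    u = (T - B) / np.linalg.norm(T - B)       # unit vector toward T
    P = np.array([2.25, 0.75, -0.75, -2.25])  # signed offsets (mm)
    return M + P[:, None] * u                 # E = Pu + M, one row per electrode
```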
Fig. 3. Electrode estimation. From left to right: sagittal and coronal views of the DBS electrode position estimation procedure. The top and bottom landmarks of the tract are clearly visible in the sagittal view. Note that the DBS electrodes are not necessarily visible in the coronal or sagittal planes and that the electrode positions are projected onto these slices.
2.3 Nonlinear Patient-to-Atlas Warping
Once the electrode positions have been estimated on the post-operative data, they need to be warped into a common atlas space. The post-operative MRI data is first aligned, using a rigid-body transformation (3 translations and 3 rotations) [7], to the pre-operative MRI of each subject. The nonlinear patient-to-atlas transformation is then estimated from the pre-operative data to the atlas. This is done to eliminate the effect of the artefacts caused by the DBS electrodes in the post-operative data on the nonlinear transformation procedure described next. The nonlinear warping technique uses a modified version of the ANIMAL algorithm [6]. ANIMAL is an iterative algorithm which estimates a 3D deformation field matching a source volume to a target volume. The transformation estimation consists of two steps: the first is calculating a deformation at each node which maximizes the local similarity measure, and the second is a smoothing step to ensure that a continuous deformation field is estimated. For further details on ANIMAL's parameters, the reader is referred to [4,6,17]. In this implementation, similar to that in [5], we do not use blurred data, as we are estimating a high-resolution deformation field. A mask of the region of the subcortical nuclei is used in both the source and target data. Each estimated nonlinear transformation maximizes the similarity between the template and patient volume. The
final nonlinear transformation is estimated in three steps: larger deformations are estimated first and become the input transformation when estimating more refined deformations. The details of the ANIMAL parameters used at each step are shown in Table 1. For each step in the transformation estimation, the same weight, stiffness, and similarity values are used (1, 1, and 0.3, respectively, as determined in the optimization done by Robbins et al. [17]), together with the cross-correlation objective function.

Table 1. ANIMAL parameters used for warping patient MRI data to the Colin27 template

Step   Step Size (mm)   Sub-Lattice Diameter (mm)   Sub-Lattice   Iterations
1      4                8                           8             15
2      2                6                           8             15
3      1                6                           6             15
2.4 Final Functional Atlas Creation
Using the final concatenated patient-to-atlas transformation, each electrode is warped into the atlas space. In the atlas space, the roll, pitch, and yaw angles of each electrode in each DBS lead are estimated by averaging the two vectors defined by the current electrode and the electrodes above and below it. The resulting rotation matrix is applied to the active electrode models. Once the models of all the active electrodes are in the atlas space, they are averaged to create a PFA. Active electrodes are defined by the neurosurgical/neurological team caring for the patient. The total group was subdivided into two groups using preoperative versus postoperative UPDRS motor score improvement; the two groups were defined as those subjects that showed greater than 30% and greater than 50% improvement.
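A sketch of the orientation estimate and the final averaging; the neighbour handling at the ends of the lead and all names are illustrative assumptions:

```python
import numpy as np

def electrode_axis(E, j):
    """Axis of electrode j as the average of the unit vectors defined by
    the electrodes above and below it (end electrodes reuse their single
    available neighbour)."""
    up = E[j + 1] - E[j] if j + 1 < len(E) else E[j] - E[j - 1]
    dn = E[j] - E[j - 1] if j > 0 else E[j + 1] - E[j]
    v = up / np.linalg.norm(up) + dn / np.linalg.norm(dn)
    return v / np.linalg.norm(v)

def probabilistic_functional_atlas(volumes):
    """Voxel-wise mean of the warped, oriented active-electrode volumes."""
    return np.mean(np.stack(volumes), axis=0)
```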
3 Results of Atlas Creation
The results of the final multi-modal (anatomical and PFA) atlas are shown in Fig. 4. The figure is organized as follows: on the left, four panels are shown. From left to right on the top panel, the anatomical atlas in the region of the STN is shown, followed by the average of all active electrodes over all subjects. In the bottom panel, from left to right, we see the group subdivided into those patients who had greater than 30% improvement (17 subjects) and those who had greater than 50% improvement (11 subjects). On the right we see the 3D geometric atlas defined in [4] in the region of the subthalamic nucleus. The blue wire-frame structure is the STN, and the red surface is an isosurface defined from the PFA of patients showing greater than 50% improvement. Using this type of volumetric data allows improved visualization of the STN area. In the bottom panels we see an expected narrowing of the probabilistic cluster which has been defined. In addition, we see that this cluster approaches
Fig. 4. First four panels, left to right, top to bottom: the atlas definition of the zona incerta (ZI, in green), the STN (in orange), and the substantia nigra (in white), warped to the Colin27 MRI template; the PFA created from all the active electrodes; the PFA created from all patients who had a 30% postoperative improvement in UPDRS; and the PFA created from all patients who had a 50% improvement in postoperative UPDRS. Right: sagittal view of the 3D geometric atlas of the STN in blue, and the region of highest probability taken from the average of patients who had a 50% improvement in UPDRS (shown as the volume in red). Part of the ZI can be seen in green.
the structural definition of the STN on the anatomical atlas. The 3D geometric view of this data is also very useful for anatomical planning, as it gives an a priori volumetric region for surgical targeting in the STN.
4 Conclusions and Future Work
In this paper we have demonstrated a method for the creation of a multi-modal atlas using a combination of anatomical and functional data. The PFA creation is based on an estimation of the final positions of STN DBS electrodes in postoperative MRI data. These final electrode locations in the atlas are estimated by matching the postoperative patient data to the preoperative patient data with a rigid-body transform. A nonlinear transformation matching the preoperative data to the atlas template is then estimated, and the concatenated postoperative-to-template transformation is used to transform the electrode positions to the atlas template. In the atlas template, all electrodes are represented by a volumetric model. All active electrodes are averaged together and then grouped with respect to the patient's postoperative response to the surgical procedure. The results demonstrate a strong relationship between the locations of the group averages and the location of the STN in the anatomical atlas. In addition, this information can be used intraoperatively in three dimensions in order to aid targeting during STN DBS implantation surgeries. Future work will include the addition of active electrode data, as well as the results of intraoperative stimulation and electrophysiological recordings in the STN region. The combination of these data will provide a powerful anatomical and functional visualization tool for preoperative and intraoperative use. The
electrode model must be further explored to better account for tissue conductivity, as well as the frequency and voltage settings of each STN DBS electrode. However, it has been shown in this paper that the proposed model provides a reasonable approximation of the STN DBS electrode and the surrounding field.
References
1. E. Bardinet, D. Dormont, G. Malandain, M. Bhattacharjee, B. Pidoux, C. Saleh, P. Cornu, N. Ayache, Y. Agid, and J. Yelnik. Retrospective cross-evaluation of an histological and deformable 3D atlas of the basal ganglia on a series of Parkinsonian patients treated by deep brain stimulation. In Seventh International Conference on Medical Image Computing and Computer Assisted Intervention MICCAI 2005, volume 2 of Lecture Notes in Computer Science, pages 385–393, Palm Springs, USA, October 2005. Springer.
2. A.L. Benabid, A. Koudsie, A. Benazzouz, J.F. Le Bas, and P. Pollak. Imaging of subthalamic nucleus and ventralis intermedius of the thalamus. Movement Disorders, 17:S123–S129, 2002.
3. G. Borgefors. Distance transformations in arbitrary dimensions. Computer Vision, Graphics, and Image Processing, 27:321–345, 1984.
4. M.M. Chakravarty, G. Bertrand, C.P. Hodge, A.F. Sadikot, and D.L. Collins. The creation of a brain atlas for image guided neurosurgery using serial histological data. NeuroImage, 30(2):359–376, 2006.
5. M.M. Chakravarty, A.F. Sadikot, J. Germann, G. Bertrand, and D.L. Collins. Anatomical and electrophysiological validation of an atlas for neurosurgical planning. In Seventh International Conference on Medical Image Computing and Computer Assisted Intervention MICCAI 2005, volume 2 of Lecture Notes in Computer Science, pages 394–401, Palm Springs, USA, October 2005. Springer.
6. D.L. Collins and A.C. Evans. ANIMAL: validation and application of non-linear registration-based segmentation. International Journal of Pattern Recognition and Artificial Intelligence, pages 1271–1294, December 1997.
7. D.L. Collins, P. Neelin, T.M. Peters, and A.C. Evans. Automatic 3D intersubject registration of MR volumetric data in standardized Talairach space. Journal of Computer Assisted Tomography, 18(2):192–205, March 1994.
8. E. Cuny, D. Guehl, P. Burbard, C. Gross, V. Dousset, and A. Rogier. Lack of agreement between direct magnetic resonance imaging and statistical determination of a subthalamic target: the role of electrophysiological guidance. Journal of Neurosurgery, pages 591–597, 2002.
9. P.F. D'Haese, E. Cetinkaya, P.E. Konrad, C. Kao, and B.M. Dawant. Computer-aided placement of deep brain stimulators: From planning to intraoperative guidance. IEEE Transactions on Medical Imaging, 24(11).
10. P.F. D'Haese, S. Pallavaram, K. Niermann, J. Spooner, C. Kao, P.E. Konrad, and B.M. Dawant. Automatic selection of DBS target points using multiple electrophysiological atlases. In Seventh International Conference on Medical Image Computing and Computer Assisted Intervention MICCAI 2005, volume 2 of Lecture Notes in Computer Science, pages 427–434, Palm Springs, USA, October 2005. Springer.
11. E.G. Duerden, K.W. Finnis, T.M. Peters, and A.F. Sadikot. A method for analysis of electrophysiological responses obtained from the motor fibers of the human internal capsule. In Fifth International Conference on Medical Image Computing and Computer Assisted Intervention MICCAI 2003, Lecture Notes in Computer Science, pages 50–57, Montreal, Canada, November 2003. Springer.
12. K.W. Finnis, Y.P. Starrveld, A.G. Parrent, A.F. Sadikot, and T.M. Peters. Three-dimensional database of subcortical electrophysiology for image-guided stereotactic functional neurosurgery. IEEE Transactions on Medical Imaging, 22(1):93–104, January 2003.
13. T. Guo, K.W. Finnis, A.G. Parrent, and T.M. Peters. Development and application of functional databases for planning deep-brain neurosurgical procedures. In Seventh International Conference on Medical Image Computing and Computer Assisted Intervention MICCAI 2005, volume 1 of Lecture Notes in Computer Science, pages 835–842, Palm Springs, USA, October 2005. Springer.
14. C.J. Holmes, R. Hoge, L. Collins, R. Woods, A.W. Toga, and A.C. Evans. Enhancement of MR images using registration for signal averaging. Journal of Computer Assisted Tomography, 22(2):324–333, 1998.
15. M. Krause, W. Fogel, W. Hacke, M. Bonsanto, C. Trenkwalder, and V. Tronnier. Deep brain stimulation for the treatment of Parkinson's disease: subthalamic nucleus versus globus pallidus internus. Journal of Neurology, Neurosurgery, and Psychiatry, 70:464–470, 2001.
16. W.L. Nowinski, D. Belov, and A.L. Benabid. An algorithm for rapid calculation of a probabilistic functional atlas of subcortical structures from electrophysiological data collected during functional neurosurgical procedures. NeuroImage, 18:143–155, 2003.
17. S. Robbins, A.C. Evans, D.L. Collins, and S. Whitesides. Tuning and comparing spatial normalization methods. Medical Image Analysis, 8(3):311–323, 2004.
18. G. Schaltenbrand and W. Wahren. Atlas for Stereotaxy of the Human Brain. Georg Thieme Verlag, Stuttgart, Germany, 1977.
19. P. St-Jean, A.F. Sadikot, L. Collins, D. Clonda, R. Kasrai, A.C. Evans, and T.M. Peters. Automated atlas integration and interactive three-dimensional visualization tools for planning and guidance in functional neurosurgery. IEEE Transactions on Medical Imaging, 17(5):854–866, May 1998.
20. P.A. Starr, J.L. Vitek, M. DeLong, and R.A. Bakay. Magnetic resonance imaging-based stereotactic localization of the globus pallidus and subthalamic nucleus. Neurosurgery, 44(2):303–314, 1999.
21. J. Talairach and P. Tournoux. Co-Planar Stereotaxic Atlas of the Human Brain. Georg Thieme Verlag, Stuttgart, Germany, 1988.
Using Registration Uncertainty Visualization in a User Study of a Simple Surgical Task

Amber L. Simpson1, Burton Ma1,2, Elvis C.S. Chen1, Randy E. Ellis1,2,3, and A. James Stewart1,2

1 School of Computing, Queen's University, Kingston, Canada
2 Human Mobility Research Centre, Kingston General Hospital, Kingston, Canada
3 Surgical Planning Laboratory, Brigham and Women's Hospital, Boston, USA
{simpson, jstewart}@cs.queensu.ca

Abstract. We present a novel method to visualize registration uncertainty and a simple study to motivate the use of uncertainty visualization in computer-assisted surgery. Our visualization method resulted in a statistically significant reduction in the number of attempts required to localize a target, and a statistically significant reduction in the number of targets that our subjects failed to localize. Most notably, our work addresses the existence of uncertainty in guidance and offers a first step towards helping surgeons make informed decisions in the presence of imperfect data.
1 Introduction

Computer-assisted surgery all but ignores the visualization of uncertainty in data. Uncertainty might be ignored because of the inherent difficulty in expressing and computing uncertainty during surgery. A second reason could be the lack of meaningful methods of visualizing uncertainty and data. Errors and uncertainty are introduced when data is acquired (through an imaging modality or tracking system), transformed (by registration or segmentation), and rendered (see Figure 1). We contribute a method for visualizing registration uncertainty and a user study that evaluates the visualization method for osteoid osteoma excision.

1.1 Uncertainty Visualization

Only a few techniques have been proposed to visualize uncertainty, and attempts to quantify the effectiveness of these techniques are virtually nonexistent (see [1] and the references therein). Perhaps this is because it is difficult to define a task that is representative of typical use and that can be statistically analyzed. Methods of visualizing uncertainty in medical applications include brain tumor typing [2], analyzing EEG data of neonatal seizures [3], determining brain fiber orientation in MRI [4], and determining radiation therapy dosing [5]. However, to our best knowledge, no research has investigated the use of uncertainty visualization in computer-assisted surgery.

1.2 Osteoid Osteoma Excision

Our study of uncertainty visualization is based on osteoid osteoma excision. An osteoid osteoma is a small, painful, benign bone lesion. In the U.S., osteoid osteomas account
Fig. 1. Illustration of sources of uncertainty in computer–assisted surgery systems. Errors are introduced preoperatively in image acquisition, segmentation, 3D model generation and planning. Intraoperatively, errors occur in registration, tool calibration, tool deformation (deflection) and tracking. Uncertainty from all sources propagates through the entire surgical procedure.
for 12.1% of benign tumors and 2.9% of all tumors [6]. In some cases, it may be necessary to remove the tumor. Successful treatment requires complete removal of the nidus surgically. Our affiliated hospital has had success excising these tumors [7] by registering the patient to a pre-operative CT scan and using a tracked drill to expose the nidus.
2 Methodology

This section describes how we compute the effect of registration uncertainty on a linear path, visualize the resulting uncertainty distribution, and utilize the visualization in a user study that mimics tumor excision.

2.1 Registration Uncertainty

Previous work in registration uncertainty [8] has suggested that the distribution of the registration parameters is approximately symmetric, anisotropic, and leptokurtic (strongly peaked) with heavy tails. In this study, we investigated the distributions of the registration parameters and the resulting effects on a planned linear path. We built an isosurface model of the proximal end of a cadaver femur, and identified a linear path between the lateral cortex of the femur and the medial inferior cortex of the femoral neck, such as might be followed in an osteoid osteoma excision (Figure 2). A rapid prototyping machine was used to produce a plastic model of the isosurface model. A calibrated stylus and a Polaris tracking system (Northern Digital Inc., Waterloo, Canada) were used to digitize five distinct landmarks and an additional 50 points from the entire surface of the plastic model; these 55 points were used to compute a good estimate of the ground-truth registration transformation T0. To generate registration point sets for learning the
Fig. 2. (Left) Model of the proximal femur showing the registration region in gray and the planned path between the entry point on the lateral cortex and the target point on the medial inferior neck. (Middle and right) The TRE error distribution of the target point under the 10,000 training registrations; the z component of TRE was similar to the x component.
registration parameter distributions, we collected 10 registration points from each of 16 separate regions near the lateral entry point. We generated sets of 16 points each by randomly drawing one point from each of the 16 groups of points; 10,000 sets were generated and registered to the surface model using ICP [9] to produce a set of registration transformations $\{T_i \mid i = 1, \ldots, 10000\}$. A computation of the 10,000 target registration errors (TREs), using the virtual tumor location as the target, produced a distribution similar to that observed by Ma and Ellis [8] (Figure 2).

2.2 Visualization Method

We applied each of the difference transformations $\Delta_i = T_i T_0^{-1}$ to the planned path to produce the empirical distribution of paths under registration uncertainty. The simplest visualization technique is to render the paths as individual lines. However, this does not produce an adequate representation of the spatial distribution of the paths. To convey information about the spatial distribution of the paths, we chose to use volume rendering. With volume rendering, we can view the path distribution as a 3D volume or as 2D cross sections, and we can choose which features to accentuate or attenuate by modifying the color and opacity transfer functions. Our path distribution volume had its long axis aligned with the mean path direction. We computed each slice of the volume at 1 mm intervals (with slices oriented perpendicular to the long axis) by computing the intersection of every path with the slice. Given that the TRE distribution was approximately symmetric with a significant central spike and broad tails, we fitted a two-component mixture of Gaussians to the intersection points on each slice; one Gaussian distribution represented the central spike and the other represented the tails. We used a greedy EM (Expectation-Maximization) algorithm to fit the mixture of Gaussians [10]. Figure 3 illustrates our volume construction method. All of our volume visualizations were performed with the Visualization Toolkit (Kitware Inc., New York, USA) using texture-based volume rendering [11]. Figure 4 shows the volume rendered path distribution from two different views. Figure 5 shows a twisting of the path distribution made apparent by tuning the opacity transfer function. Figure 6 shows the volume and its relationship to the femoral model.
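To make the construction concrete, the following sketch outlines the two computational steps just described: forming the difference transformations and fitting the per-slice two-component mixture. This is an illustrative reconstruction, not the authors' code; it assumes transformations stored as 4x4 homogeneous matrices and substitutes scikit-learn's standard EM for the greedy EM of [10].

```python
# Illustrative sketch (not the authors' code): difference transforms and
# the per-slice 2-component Gaussian mixture fit. Assumes 4x4 homogeneous
# matrices; scikit-learn's standard EM stands in for greedy EM [10].
import numpy as np
from sklearn.mixture import GaussianMixture

def difference_transforms(T_list, T0):
    """Delta_i = T_i * T0^{-1} for each training registration."""
    T0_inv = np.linalg.inv(T0)
    return [T @ T0_inv for T in T_list]

def fit_slice_mixture(points_2d):
    """Fit a central-spike + heavy-tail mixture to the (N x 2) array of
    path/slice intersection points of one slice."""
    gmm = GaussianMixture(n_components=2, covariance_type="full").fit(points_2d)
    return gmm.means_, gmm.covariances_, gmm.weights_
```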
Fig. 3. Illustration of the path distribution volume. (Left) The planned path under each difference transformation Δi is intersected with each slice of the volume, and a 2-component mixture of Gaussians is fitted to the intersection points. (Middle and right) Path intersection locations at the target slice (middle) and entry slice (right). The ellipses are centered at the means of the Gaussians; their size and orientation reflect the structure of the covariance matrices. Note the anisotropic distribution at the target location.
Fig. 4. The volume rendered path distribution from (top) the frontal view and (bottom) the top-down view
2.3 User Study

Our study mimics the excision of a deep bone tumor. In this task, the surgeon must expose a blind target (tumor) by drilling through the bone and is immediately aware of success because the tumor is readily visible through the drilled hole. We quantify the success of this task by measuring the number of attempts required to hit the target (tumor).

Hypothesis. Instrument placement succeeds more often when uncertainty information is visualized and the registration error is large enough to cause the subject to miss the target.

Subjects. Fifteen non-expert subjects performed the experiment. The experiment took approximately one hour per subject. The task was explained to each subject, who was trained using a preliminary trial consisting of five attempts at each of the two tasks.

Apparatus. Our apparatus consisted of a Polaris tracking system, a calibrated digitizing stylus, a passive dynamic reference body, and an irregular array of 10 metal targets mounted on an optical bench (Figure 7). The flat circular targets were 5 mm in diameter
Fig. 5. (Left) Rendering of the path distribution volume from different viewpoints rotating about the long axis of the volume. (Right) Transverse slices through the volume starting at the entry point of the path (top) taken at 10mm intervals along the path until the target (bottom) is reached. Note the twist in the path distribution that would not be evident without visualization of the volume.
and wired to a buzzer that sounded when the stylus touched the target. We placed 2.5 cm thick modeling clay in front of the target array to hide the targets and to provide support for the stylus as the subjects tried to locate the targets. A computer monitor located in the line of sight of the subject was used to display information to the subject during the experiment.

Stimuli. We performed another 1,000 registrations in a manner similar to that described in Section 2.1, using a different set of candidate registration points. We randomly selected 10 registrations that produced a TRE greater than 4 mm at the target location; these registrations were guaranteed to cause the subject to miss a target if the plan was followed accurately. We emphasize that these registrations were not drawn from the learning distribution used in Section 2.1, nor were they chosen to ensure that they were faithful to the learning distribution; doing either would unfairly bias the results in favor of our hypothesis. These registration transformations were transferred to the array of targets by aligning the planned path on the femoral model to each target. Subjects were presented with a 3D visualization of a trajectory and a point representing the path of the stylus and the location of the target, respectively. Their task was to use the visualization to guide the stylus through the modeling clay to touch the target. The subject repeatedly inserted the instrument into the clay until the target was touched.
Fig. 6. Model of the proximal femur showing the uncertainty visualization of the planned path between the lateral cortex and the target point (location of tumor) on the medial inferior neck
Fig. 7. Illustration of the apparatus. (Left) Subject guiding instrument to target. (Right) Screen capture of stimulus shown to subject. The top-left of the screen is the view along the x-axis (left-right movement of instrument), the bottom-left is the view in the y direction (up-down movement of instrument), and the right of the screen is the view along the z-axis (position of the tip of the instrument).
If the subject failed to localize the target, the instrument was completely withdrawn. Side-to-side movement of the probe was forbidden. The subject was supervised in all tasks. When the subject touched the target successfully, the buzzer sounded. We recorded the number of attempts the subject needed to touch the target and the position of the tip of the stylus for each missed attempt. The subject was given 10 attempts to locate the target.

Experiment. Each subject was trained using five preliminary trials in order to reduce learning effects. Each of the fifteen subjects was presented with 10 tasks without visualization of uncertainty information and the same 10 tasks with uncertainty information visualized. Within each group of 10, the order in which tasks were presented to subjects was randomized.
3 Results

We measured the number of attempts it took for the subject to touch the target successfully, then analyzed the data using the Wilcoxon signed-ranks test, a paired non-parametric test that measures within-subject variability [12]. The Wilcoxon test showed that the number of attempts before success was significantly lower when uncertainty information was visualized (p = 0.0065). We also investigated whether uncertainty information enabled subjects to locate targets that could not be found using ten or fewer attempts. We analyzed the number of incomplete tasks with and without uncertainty information using the Mann-Whitney non-parametric U test, which compares two independent samples [12]. The Mann-Whitney test showed that the visualization of uncertainty significantly reduced the number of incomplete tasks, by 1.91 attempts per task (p = 0.0281). The total number of incomplete tasks was 43 without uncertainty and 26 with uncertainty.
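For readers reproducing this analysis, both tests are available in SciPy; the sketch below shows the calls on hypothetical per-subject attempt counts (the arrays are placeholders, not the study data).

```python
# Hypothetical attempt counts for 15 subjects (placeholders, not the study data).
import numpy as np
from scipy.stats import wilcoxon, mannwhitneyu

attempts_without = np.array([4, 7, 10, 3, 6, 9, 10, 5, 8, 6, 10, 4, 7, 9, 5])
attempts_with    = np.array([2, 4,  6, 3, 3, 5,  7, 4, 5, 4,  6, 3, 4, 6, 3])

# Paired non-parametric within-subject comparison
_, p_paired = wilcoxon(attempts_without, attempts_with)

# Independent-samples comparison of incomplete-task counts per subject
incomplete_without = np.array([5, 3, 4, 2, 3, 4, 3, 2, 5, 3, 2, 1, 2, 2, 2])
incomplete_with    = np.array([2, 1, 3, 1, 2, 2, 1, 1, 3, 2, 1, 1, 2, 2, 2])
_, p_indep = mannwhitneyu(incomplete_without, incomplete_with)

print(f"Wilcoxon p = {p_paired:.4f}, Mann-Whitney p = {p_indep:.4f}")
```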
4 Discussion

We demonstrated a simple method to determine the variation caused by registration uncertainty in a planned linear path, and we visualized the uncertainty with a path uncertainty volume. Our visualization method resulted in a statistically significant reduction in the number of attempts required to localize a target, and a statistically significant reduction in the number of targets that the pool of subjects failed to localize. We believe the major source of error in the study is that all subjects were shown the two tasks in the same order, which may have favored the second task. One way to rectify this would be to administer the test to 15 new subjects with the uncertainty visualized first. Note that we elected to limit the number of attempts, which may have led us to underestimate the statistical significance of our results, since it is likely that subjects would have required more attempts to successfully locate the target. We are currently examining how subjects interpreted our visualizations, using the recorded position of the tip of the stylus for each missed attempt. Visualization of registration uncertainty lets us see unusual and perhaps unexpected features in the distribution of the paths. For example, Figure 4 shows that the distribution has an hourglass shape and that the distribution elongates near the target (see also Figure 3). As another example, Figure 5 demonstrates a twist in the path distribution. There are many interacting sources of uncertainty in a computer-assisted surgical system, and we have only considered registration uncertainty in this article. There are many opportunities for investigating how the various sources of uncertainty affect navigational guidance. The task defined in our user study may not be representative of how uncertainty would be used in the operating room. Indeed, when we asked a surgical colleague what he would do if he missed an osteoid osteoma in a computer-assisted procedure, his reply was that he would use a bigger drill bit. In practice, we believe that a surgeon would be given a visualization of the uncertainty in the procedure (including all other sources of uncertainty) and decide whether or not to proceed with the procedure as is. We believe strongly that uncertainty visualization in computer-assisted surgery will help surgeons make more informed decisions in the presence of imperfect data.
Acknowledgments. This research was supported in part by the Canadian Institutes of Health Research and the Natural Sciences and Engineering Research Council of Canada.
References

1. Laidlaw, D.H., et al.: Comparing 2D vector field visualization methods: A user study. IEEE Transactions on Visualization and Computer Graphics 11 (2005) 59-70
2. Vellido, A., Lisboa, P.J.: Handling outliers in brain tumor MRS data analysis through robust topographic mapping. Computers in Biology and Medicine (2006) To appear
3. Karayiannis, N.B., et al.: Quantifying and visualizing uncertainty in EEG data of neonatal seizures. In: Proc. of the 26th Annual International Conference of the IEEE EMBS (2004)
4. Jones, D.K.: Determining and visualizing uncertainty in estimates of fiber orientation from diffusion tensor MRI. Magnetic Resonance in Medicine 49 (2003) 7-12
5. McCormick, T., et al.: Target volume uncertainty and a method to visualize its effect on the target dose prescription. International Journal of Radiation Oncology Biology Physics 60 (2004) 1580-1588
6. Dahlin, D.C., Unni, K.: Bone Tumors: General Aspects and Data on 8,542 Cases. 4th edn. Thomas, Springfield, Ill. (1986)
7. Ellis, R.E., Kerr, D., Rudan, J.F., Davidson, L.: Minimally invasive excision of deep bone tumors. In Niessen, W.J., Viergever, M.A., eds.: MICCAI, LNCS #2208, Springer (2001)
8. Ma, B., Ellis, R.E.: Surface-based registration with a particle filter. In Barillot, C., Haynor, D., Hellier, P., eds.: MICCAI, LNCS #3216, Springer (2004) 566-573
9. Besl, P., McKay, N.: A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 14 (1992) 239-256
10. Vlassis, N., Likas, A.: A greedy EM algorithm for Gaussian mixture learning. Neural Processing Letters 15 (2002) 77-87
11. Fernando, R., ed.: Volume Rendering Techniques. In: GPU Gems: Programming Techniques, Tips, and Tricks for Real-Time Graphics. Addison-Wesley, New York (2004)
12. Sheskin, D.J.: Handbook of Parametric and Nonparametric Tests. 3rd edn. Chapman & Hall, Boca Raton (2004)
Ultrasound Monitoring of Tissue Ablation Via Deformation Model and Shape Priors

Emad Boctor1, Michelle deOliveira2, Michael Choti2, Roger Ghanem3, Russell Taylor1, Gregory Hager1, and Gabor Fichtinger1

1 Engineering Research Center, Johns Hopkins University, Baltimore, MD, USA
2 Department of Surgery, Johns Hopkins Hospital, Baltimore, MD, USA
3 Department of Aerospace and Mechanical Eng., Univ. of Southern California, CA, USA
[email protected]
Abstract. A rapid approach to monitor ablative therapy through optimizing shape and elasticity parameters is introduced. Our motivating clinical application is targeting and intraoperative monitoring of hepatic tumor thermal ablation, but the method translates to the generic problem of encapsulated stiff masses (solid organs, tumors, ablated lesions, etc.) in ultrasound imaging. The approach involves the integration of the following components: a biomechanical computational model of the tissue, a correlation approach to estimate/track tissue deformation, and an optimization method to solve the inverse problem and recover the shape parameters in the volume of interest. Successful convergence and reliability studies were conducted on simulated data. Ex-vivo studies were then performed on 18 bovine liver samples previously ablated under ultrasound monitoring in a controlled laboratory environment. While B-mode ultrasound does not clearly identify the development of necrotic lesions, the proposed technique can potentially segment the ablation zone. The same framework can also yield both partial and full elasticity reconstruction.
1 Introduction

Primary and metastatic liver cancer represents a significant source of morbidity and mortality in the United States and worldwide [1]. Increasing interest has been focused on treatment using thermal ablative approaches, particularly radiofrequency ablation (RFA). These approaches utilize image-guided placement of a probe within the target area in the liver parenchyma. Heat created around an electrode is conducted into the surrounding tissue, causing coagulative necrosis at temperatures between 50°C and 100°C [2]. Key problems with this approach include: 1) localization/targeting of the tumor and 2) monitoring of the ablation zone. The first problem has been previously addressed by developing a robotic 3DUS system for guidance of liver ablation. The second problem, the subject of this paper, is monitoring the zone of necrosis during ablative therapy. Monitoring the ablation process in order to document adequacy of margins during treatment is a significant problem. Current approaches often result in either local failure or excessively large zones of liver ablation. Some ablative devices employ temperature monitoring using thermistors built into the ablation probes. However, these temperatures provide only a crude estimate of the zone of ablation. Magnetic
resonance imaging (MRI) can monitor temperature changes (MR thermometry), but is expensive, not available at many sites, and difficult to use intraoperatively [3]. Ultrasonography (US) is the most common modality for target imaging and is also used for ablation monitoring. However, the conventional ultrasonographic appearance of ablated tumors only reveals hyperechoic areas due to microbubbles and outgassing, and cannot adequately visualize the margin of tissue coagulation. Currently, ablation adequacy is only estimated at the time of the procedure, based primarily on the probe position rather than the true ablation zone. Accordingly, ultrasound elasticity imaging (USEI), known as elastography and first introduced by Ophir [11], has emerged as a potentially useful augmentation to conventional ultrasound imaging. The use of USEI in monitoring ablation [4, 13-16] was made possible by the following observations: (1) different tissues may have significant differences in their mechanical properties, and (2) the information encoded in the coherent scattering (a.k.a. speckle) may be sufficient to calculate these differences following a mechanical stimulus. However, producing a real-time elasticity map using 3D ultrasound data is an exigent task that requires extensive computation and has its own limitations. Although strain images have better signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR) than US images [10, 12] (Fig. 1), they still suffer from artifacts related to their formation theory or to US artifacts. These artifacts are primarily attributable to decorrelation due to out-of-plane motion, large deformation, shadowing, or other causes. Moreover, speckle decorrelation occurs due to the shadowing/attenuating effects underneath the hot ablation zone, as seen in the B-mode image (Fig. 1, left). Accurate segmentation of strain images is essential in planning and monitoring ablations [4, 13-16]. More generally, our method is directly relevant to a large family of interventions that require targeting, tracking, and monitoring an encapsulated stiff mass suspended in a softer background. In this paper, we report a generic and rapid approach to segment stiff lesions based on tissue deformation (i.e., displacements), without needing to estimate strain images.
Fig. 1. B-mode image showing ex-vivo liver boundaries embedded in a gel-based medium. It is not possible to differentiate the ablated area in the B-mode image. Strain is generated by differentiating a displacement map in the axial direction. Strain provides clear evidence of the presence of a hard lesion, in agreement with the gross pathology picture.
2 Strain Imaging Segmentation

Strain imaging holds the promise of playing a significant role in planning and monitoring ablative therapy [4, 13-16]. However, reliable and rapid semi-automatic or automatic
segmentation of noisy strain images is needed during ablation interventions. Comparing ablated regions in both gross pathology and strain images in the 18 ex-vivo bovine liver samples, strain imaging immediately after ablation tends to overestimate ablation coverage 33.3% of the time, due to a decorrelation region beneath the ablated lesion [15, 16]. To appreciate this fact, refer to the strain image in Fig. 1, which agrees strongly with gross pathology laterally but shows an obvious overestimation axially. Only 13.3% of the time did the ablation coverage in strain images match the corresponding gross pathology images. Accurate segmentation should help the surgeon by monitoring the extent of the applied ablative therapy in 3D, planning optimal overlapping/multiple ablations to cover the original target tumor, and deciding when to terminate the procedure. To extract ablation boundaries, one approach is to apply conventional image segmentation techniques directly to strain images. To obtain these strain images, we start from RF data as input; create the displacement and correlation images; then differentiate the axial displacement field and estimate the axial strain tensor (ε11). Obviously, to have a reliable standalone segmentation module, the inputs (strain images) to this module should meet some expected standard of consistent quality. However, obtaining local strain values with high accuracy depends on precise measurement of local tissue displacement. A significant problem is the loss of similarity (correlation) between the pre- and post-deformed images. During the last few years, several groups have investigated this problem [15] and have come up with various strategies for increasing the reliability of the cross-correlation function, including: 1) choice of the processing parameters, kernel length and amount of kernel overlap [5]; 2) RF-data tracking instead of envelope-detected data when small displacements are involved [6]; 3) temporal stretching, including adaptive local and global companding [7]; and 4) axial and lateral RF-data interpolation. In addition to these enhancements for displacement estimation, there has been active research in improving the procedure for deriving strain from displacement images, including: 1) a least squares strain estimator (LSQSE) [8]; 2) multi-step compression [9] to increase SNR; and 3) average and median filtering [10]. Besides the perennial problem of decorrelation, the following issues need to be addressed. Displacement estimation: the majority of rapid techniques assume a constant distribution of scatterers before and after a small introduced axial compression (<3% axial strain). However, this assumption is not valid during ablative therapy proximal to the ablation zone, where vaporization and bubbles affect speckle appearance and matching. Strain estimation: the procedure for estimating the axial strain tensor is based on differentiating the displacement field. However, differentiation also amplifies errors and noise in the displacement measurements. Unfortunately, all current solutions to suppress noise in strain imaging require least squares estimation or averaging multiple compression steps; both add considerable computational cost and/or time delay. Time performance: the entire chain of displacement estimation, strain image reconstruction, and strain image segmentation takes far too long for 3D datasets. Prior knowledge and redundancy: clearly, not all displacement/strain measurements are required to locate a lesion.
At the same time, in our motivating ablative therapy application, there is well-defined a priori knowledge about the expected shape and size of the ablation. Utilizing this a priori information effectively should reduce redundant computational cost and time. Elasticity reconstruction: segmentation helps in defining the location and boundary of a lesion but
fails to reconstruct the Young's modulus in the region of interest. It would be favorable if one could augment the segmentation pipeline to solve the inverse problem with minimal effort. To address all these issues, we propose a novel framework to segment and track ablated lesions based on tissue deformations and shape priors. Furthermore, elasticity reconstruction, i.e., retrieval of the Young's modulus, is possible with minimal alteration of the framework.
Fig. 2. Iterative tracking and segmentation pipeline
3 Elasticity Model Based on Tissue Deformations

The key insight of our approach is the integration of prior geometrical knowledge, a physical elasticity model, and direct estimation of tissue deformation. The proposed approach is explained in Fig. 2. The process starts, in the lower branch of the workflow, from a prior geometrical model. In our example, the ablated lesion is modeled as a simple ellipse. In liver ablation, we have a very reliable initial guess of the location of the lesion, because we know the position of the ablator. Next we solve the forward problem of estimating the theoretical displacement from a geometric mesh representation of the initial model, boundary conditions, and an assumed elasticity model. The computational method of choice to solve this forward problem is the Finite Element Method (FEM). In the upper branch, we start by calculating the correlation map and then rapidly estimate the displacement field. The lower and upper branches meet in the shape optimization loop, where we iteratively solve the inverse problem to optimize our parametric geometric model and locate the region of interest.

3.1 Tissue Displacement Estimation
We acquire RF ultrasound data from a tissue in both rest and stressed states, and then estimate the induced deformation distribution by tracking speckle motion. Our
implementation is based on maximization of the normalized cross-correlation between pre- and post-compressed RF signals. In the example shown in Figure 2, decorrelation was caused by shadowing and attenuation effects underneath the hot ablation zone, as seen in the B-mode image. Fortunately, decorrelation artifacts can be detected from the normalized correlation map associated with the displacement image. Figure 3 shows a typical color-coded correlation image, where dark red indicates perfect correlation (unity) and dark blue stands for perfect decorrelation (zero). In the same figure, there are four different binary maps based on thresholds 0.95, 0.90, 0.85, and 0.80, all showing the spatial coverage of these correlation regions. The white region associated with correlations greater than or equal to 0.95 is small compared to the region associated with correlations ≥ 0.80. Knowing the locations of better-correlated regions helps us formulate a better-conditioned shape optimization framework, as shown in Section 3.3.

Fig. 3. Correlation in the displacement map, thresholded at different correlation levels

3.2 Theoretical Displacement Estimation

To calculate theoretical displacements, a tissue elasticity model needs to be considered. We assume a linear elastic, homogeneous, and isotropic material. Moreover, liver tissue, being mainly water, is incompressible, and hence Poisson's ratio is nearly 0.5. We also apply uniform pressure with the transducer face in a quasi-static manner, so the deformation is considered plane strain: a type of deformation in which there are no displacements in the z-direction and the displacements in the x- and y-directions are functions of x and y only. For the plane strain problem, the assumption is εz = εzx = εzy = 0. Given boundary conditions and a finite element mesh, the theoretical displacements u = (u, v) can be calculated by solving the following Navier's equation:
$$\rho \frac{\partial^2 \mathbf{u}}{\partial t^2} - \nabla \cdot (c\, \nabla \mathbf{u}) = K \qquad (1)$$
where ρ is the material density, K are the body forces, and c is a tensor whose entries are functions of G (shear modulus), E (Young's modulus), and ν (Poisson's ratio).

3.3 Shape Optimization

Here we compare observed tissue displacements with simulated displacements, using the correlation map as a weighting criterion. This weighting method puts a high premium on the "most trusted" displacement areas and drives the evaluation of the domain points in the mesh at the corresponding locations. This approach reduces the resources needed for updating the mesh and effectively utilizes the available displacement information with minimal redundancy. Next, in an iterative optimization cycle over the lesion location and shape parameters S, we adjust the hypothetical displacement field until it fits the actual displacement field. When the two fields are sufficiently similar, the deformed model yields the contours of the lesion. Thus, this inverse framework solves a non-linear optimization problem with the objective function:

$$\hat{S} = \arg\min_{S} \left\{ \Im(S) = \sum_{i=1}^{N} \sum_{j=1}^{M} W(i,j)\, \big\| u(i,j) - \hat{u}(i,j;S) \big\|_{L_1} \right\} \qquad (2)$$
where Ŝ denotes the estimated shape parameters, u the estimated tissue motion, ℑ(S) the objective function, N and M the number of rows and columns, respectively, in the displacement maps, and W the correlation map, which acts as a weighting function to minimize the effects of distrusted tissue displacements. The experimental implementation currently utilizes a modified version of the simplex search algorithm and the Levenberg-Marquardt (LM) method.
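A minimal sketch of this optimization loop is given below, assuming a forward FEM solver `fem_displacement(S)` is available (a hypothetical callable, not the paper's solver); SciPy's Nelder-Mead routine stands in for the modified simplex search.

```python
# Minimal sketch of the weighted-L1 shape optimization of Eq. (2).
# `fem_displacement` is a hypothetical stand-in for the FEM forward solver.
import numpy as np
from scipy.optimize import minimize

def objective(S, u_measured, W, fem_displacement):
    """S = (cx, cy, a, b, theta): ellipse center, semi-axes, orientation."""
    u_model = fem_displacement(S)            # theoretical displacement map (N x M)
    return np.sum(W * np.abs(u_measured - u_model))

def estimate_shape(S0, u_measured, W, fem_displacement):
    res = minimize(objective, S0, args=(u_measured, W, fem_displacement),
                   method="Nelder-Mead")     # simplex search, as in the paper
    return res.x
```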
4 Experiments and Results

A thorough simulation study was conducted independently of any US imaging. The study involves running the finite element model to generate a displacement map for which the ground-truth shape parameters are known. We then use this displacement map as the simulated tissue deformation input. We have five parameters to optimize: ellipse location (x, y), size (a, b), and orientation. We tested the robustness of the method under different conditions and using different optimization schemes. The tests included partitioned optimization, solving for x and y together or separately and then solving for the size parameters and orientation, while applying different objective functions and different search constraints.
Fig. 4. Strain and displacement images reflecting tissue deformations. The model displacement image represents the FEM theoretical deformation. A small circle at the top-left corner marks the initial guess, and another small circle shows the estimate after 8 iterations approaching the final ellipse. The gross pathology picture is included at the right.
The second study was conducted using a robotic ultrasound acquisition system. We used a Siemens Antares US scanner (Siemens Medical Solutions USA, Inc. Ultrasound Division, Issaquah, WA) with an ultrasound research interface (URI) to access raw RF data. A Siemens VF 10-5 linear array was used to acquire data. The tracking
beams were standard B-mode pulses (6.67 MHz center frequency, F/1.5 focal configuration, apodized, pulse repetition frequency (PRF) of 10.6 kHz, with a pulse length of 0.3 μs). We collected data immediately after ablation of 18 ex-vivo fresh bovine liver samples, divided into three groups of 4-min, 6-min, and 8-min ablations using a Radionics device. All samples were soaked in degassed water to remove air pockets and then embedded in gel material to support the liver during ablation and to maintain the assumed incompressibility of living tissue. Figure 4 shows the application of our approach to one ex-vivo liver sample ablated for 8 min. Both the strain and the axial displacement from tissue deformation are included in the figure. The strain image gives a good indication of the ablation zone. The FEM theoretical axial displacement at the last iteration agrees well with the corresponding tissue displacement. We also included the initial ellipse at the top-left corner; in 8 iterations it approaches the final shape, which was estimated in 15 more iterations. To evaluate the resulting segmentation, we use two independent ground truths. The first concerns the ellipse location, relying on the ablator's tip location in the B-mode image before ablation. We have excellent agreement, within 1 mm laterally. It is hard, however, to get a corresponding location axially, due to the unmeasured axial motion of the US probe before collecting strain data. The second ground truth concerns the ellipse size, using the gross pathology picture to obtain an accurate measure of the ellipse size. In the axial direction, our approach suggests 76.1 pixels (10.49 mm), where the pathology picture gives about 11 mm. In the lateral direction, however, our approach gives 140 pixels (18 mm), where pathology gives 15 mm. This suggests that the axial displacement is very accurate in locating axial features, but that we probably need to regularize the objective function with lateral displacement estimation. Figures 5 and 6 present convergence studies for the scale parameters and center location, respectively. In both figures, and for all four parameters, 10 iterations were enough to lock onto a value with variations below 10^-4.
Fig. 5. Scale convergence of the ellipse
Fig. 6. Location convergence of the ellipse. It also shows the effect of K on optimizing the shape parameters, where K is the ratio of the Young's modulus of the ablated lesion to that of normal liver.
We chose a suggested literature value of 1.5 kPa for the Young's modulus of bovine liver, and tested the convergence of the FEM model to the center location using values of the Young's modulus for the ablated tissue of 7.5, 15, 30, and 60 kPa. The results are shown in Figure 6, where the K parameter refers to the ratio of the ablated-tissue modulus to that of normal liver. Convergence was insensitive to variations of K in the 10-40 range; in other words, regardless of how hard we have cooked the tissue or how poorly we estimate its Young's modulus, the model robustly converges. Even when we picked an unreasonable value, such as K = 5, the shape optimization algorithm was biased by only 35 pixels (0.7 mm).
5 Conclusions

We have developed and tested a novel shape optimization approach based on tissue deformation and shape priors. The results for the 18 samples will be reported in a detailed journal publication. Extending this framework to 3D is the next step for this project, which promises to increase the number of measurements while reducing the unknowns to just 9 parameters for an ellipsoid. Finally, the authors acknowledge financial support from NSF #EEC 9731478 and Siemens Corporate Research.
References

1. Nakakura, E.K., Choti, M.A.: Oncology (Huntingt). 2000 Jul;14(7):1085-98
2. Buscarini, L., Rossi, S.: Semin Laparosc Surg 1997;4:96-101
3. Graham, S.J., Stanisz, G.J., Kecojevic, A., et al.: Magn Reson Med 1999;42(6):1061-71
4. Kallel, F., Stafford, R.J., Price, R.E., et al.: Ultrasound Med Biol 25(7), pp. 1099-1113, 1999
5. Bilgen, M., Insana, M.F.: J. Acoust. Soc. Am. 101:1147-1154, 1997
6. Ramamurthy, B.S., Trahey, G.E.: Ultrasonic Imaging 13:252-268, 1991
7. Chaturvedi, P., Insana, M.F., Hall, T.J.: IEEE Trans. Ultrason. Ferroelec. Freq. Control 45:179-191, 1998
8. Kallel, F., Ophir, J.: Ultrasonic Imaging 19:195-208, 1997
9. O'Donnell, M., Skovoroda, A.R., Shapo, B.M., Emelianov, S.Y.: IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control 41:314-325, 1994
10. Doyley, M.M.: "An investigation into methods for improving the clinical usefulness of elasticity imaging." Institute of Cancer Research, University of London, Thesis, 1999
11. Ophir, J., Cespedes, I., Ponnekanti, H., et al.: Ultrasonic Imaging 1991 Apr;13(2):111-34
12. Varghese, T., Ophir, J.: IEEE Trans. Ultrason. Ferroel. Freq. Cont., pp. 164-172, 1997
13. Varghese, T., Techavipoo, U., Liu, W., et al.: AJR Am J Roentgenol. 2003 Sep;181(3):701-7
14. Curiel, L., Souchon, R., Rouvière, O., et al.: "Elastography for the follow-up of high-intensity focused ultrasound prostate cancer treatment." Ultrasound Med Biol. 2005 Nov;31:1461-8
15. Kolen, A.: "Elasticity imaging for monitoring thermal ablation in liver." Institute of Cancer Research, University of London, Thesis, 2004
16. Varghese, T., Techavipoo, U., Zagzebski, J.A., et al.: J Ultrasound Med. 2004 Apr;23(4):535-44; quiz 545-6
Assessment of Airway Remodeling in Asthma: Volumetric Versus Surface Quantification Approaches

Amaury Saragaglia, Catalin Fetita, and Françoise Prêteux

ARTEMIS Project Unit, INT, Groupe des Ecoles des Télécommunications, 9 rue Charles Fourier, 91011 Evry Cedex, France
Abstract. This paper develops a volumetric quantification approach for the airway wall in multi-detector computed tomography (MDCT), exploiting a 3D segmentation methodology based on a patient-specific deformable mesh model. A comparative study carried out with respect to a reference 2D/3D surface quantification technique underlines the clinical interest of the proposed approach in assessing airway remodeling in asthmatics and in evaluating the efficiency of therapeutic protocols.
1 Introduction

Automated quantification of airway responses to pharmacologic agents is the key issue in establishing and evaluating therapeutic protocols in severe and mild asthma or chronic obstructive pulmonary disease (COPD). The inflammation of the airways is known to induce airway remodeling, an indicator commonly used in clinical practice to confirm the diagnosis of asthma. The current challenge for CT in asthma is to assess airway reactivity and wall remodeling in response to an excitation agent (e.g., bronchoprovocation, bronchodilatation, response to changes in lung volumes), that is, to accurately quantify bronchial lumen and wall variations. Efficient imaging methods are required to provide accurate and reproducible measurements of the inner and outer edges of the airway walls. This can be challenging for small-caliber, thin-wall airways, and even for normal-size bronchi, because of the inherent partial volume effect in the image acquisition. Semi-automated or automated methods have been developed for airway wall segmentation. They apply to section images of bronchi and can be roughly classified into three categories: full-width at half-maximum (FWHM) [1], pattern-based [2,3,4], and shape-independent approaches. FWHM-based methods estimate the inner/outer contour of the wall by investigating the grey-level profile of rays cast from the bronchus center. However, it has been shown in [5] that FWHM can yield inaccurate measurements for small airways or for those having very thin walls (10% to 15% of lumen diameter). Pattern-based approaches automatically detect circular or ellipsoidal shape patterns. These methods are fast and seem efficient in phantom studies and excised animal lungs for circular and elliptical shape detection, but do not apply accurately to general clinical data.
More recent methods exploit morphological operators to achieve shape-independent, accurate quantification for bronchi with irregular walls. In [6], such a technique is applied in a 2D/3D framework to 2D image sections orthogonal to the bronchus axis. Here, the medial axis of the bronchi is computed from a 3D segmentation of the airway lumen. In this way, the quantification is independent of the CT acquisition plane and can theoretically be obtained at any desired spatial location, which is essential for investigating diffuse diseases such as asthma. Note however that, even when performed in a cross-sectional plane, bronchial lumen/wall surface quantification methods may suffer from several drawbacks, related either to the uncertainty of medial axis estimation accuracy or to CT acquisition parameters, discussed further on in this paper. 3D quantification makes it possible to overcome such limitations by investigating the bronchus wall volume variation caused by airway remodeling. This paper develops a volumetric quantification approach for the airway wall in MDCT, relying on the 3D segmentation of the inner/outer bronchus wall surface using patient-specific deformable mesh models. The method is then evaluated on phantom images and real CT data, and compared with the 2D/3D technique developed in [6]. The paper is organized as follows. Section 2 summarizes the clinical investigation framework and the reference 2D/3D method, while Section 3 is dedicated to a detailed description of the developed volumetric approach. A comparative discussion of volumetric versus surface quantification accuracy is developed and illustrated in Section 4 for phantom and clinical data. Concluding remarks are stated in Section 5.
2 Clinical Framework

2.1 Data Acquisition and Preprocessing
Low-dose MDCT acquisitions were performed in severe and mild asthmatics in the framework of a therapeutic follow-up study conducted in a large hospital group in Paris. Images obtained at 65% of the total lung capacity with 0.6 mm collimation and 0.3 mm reconstruction interval focused on the right lung in an 18-20 cm field of view, resulting in quasi-isotropic volume data free from cardiac motion artifacts [9]. The objective was to evaluate bronchial wall remodeling on (sub-)subsegmental bronchi at several locations spread throughout the lung.

2.2 Reference 2D/3D Quantification Method
In order to evaluate the performance and clinical interest of a volumetric quantification approach, a surface quantification method was selected as a reference from the recent literature [6]. Achieving an accurate estimation of bronchial parameters (lumen and wall areas) requires performing the surface quantification analysis in a cross-section plane orthogonal to the bronchus axis. A preprocessing step first performs the 3D airway lumen segmentation from the MDCT data using multi-resolution morphological filtering and energy-based modeling [7]. Then, a full description of this arborescent structure is achieved by
computing the medial axis [8] (Fig. 1(a)). The medial axis provides the navigation and interaction tools required for selecting bronchial segments and generating cross-section images at appropriate spatial resolution and sampling intervals along the bronchus axis (Fig. 1(b)). Only axial images outside the subdivision zones are selected for analysis. The quantification of the bronchial parameters is performed independently on each cross-section image. It exploits a segmentation method combining mathematical morphology operators and energy-based contour matching [6]. The quantification results are then averaged for each bronchus. Fig. 1(c) shows a bronchial inner/outer wall segmentation result. As reported in [9,10], when cross-tested on synthetic and natural databases, the selected 2D segmentation approach outperformed the FWHM and pattern-based methods in terms of quantification accuracy and robustness with respect to the presence of the homologous artery. Therefore this method is chosen as the reference for evaluating the performance of the newly developed volumetric segmentation and quantification approach.
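As an illustration of the cross-section reconstruction step (our sketch under the assumption of an isotropic volume, not the authors' implementation), a plane orthogonal to the local medial-axis tangent can be resampled by tri-linear interpolation:

```python
# Sketch: resample a cross-section image orthogonal to the bronchus medial axis.
# Assumes an isotropic volume; illustrative only, not the authors' implementation.
import numpy as np
from scipy.ndimage import map_coordinates

def cross_section(volume, center, axis_dir, size=64, spacing=0.5):
    t = np.asarray(axis_dir, dtype=float)
    t /= np.linalg.norm(t)
    u = np.cross(t, [1.0, 0.0, 0.0])          # in-plane basis orthogonal to t
    if np.linalg.norm(u) < 1e-6:
        u = np.cross(t, [0.0, 1.0, 0.0])
    u /= np.linalg.norm(u)
    v = np.cross(t, u)
    r = (np.arange(size) - size / 2) * spacing
    gu, gv = np.meshgrid(r, r)
    pts = (np.asarray(center, dtype=float)[:, None, None]
           + u[:, None, None] * gu + v[:, None, None] * gv)
    return map_coordinates(volume, pts, order=1)   # tri-linear sampling
```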
Fig. 1. Airway tree and 2D/3D quantification of bronchial parameters: (a) 3D Segmentation and corresponding medial axis, (b) Cross section image reconstruction orthogonally to the medial axis at the selected sampling locations, (c) 2D wall segmentation on sample data
3 The Developed Volumetric Approach

The volumetric quantification approach developed in this paper relies on the 3D segmentation of the bronchial wall using a patient-specific deformable mesh model. Starting from the 3D segmentation of the airway lumen obtained in § 2.2, a mesh model of the inner airway wall is built up. In order to preserve the topology and geometry of the small bronchi involved in the quantification process and to obtain a regular mesh, a restricted Delaunay triangulation approach is applied [11,12] with a specific adaptive distance criterion [13]. The outer surface of the bronchial wall is then segmented by deforming the inner wall mesh model (Fig. 2(a)) subject to image data and shape regularity
constraints. A motion equation is assigned to each vertex of the mesh, allowing it to move in a force field governed by internal and external forces. In order to maintain regularity during deformation, each vertex is displaced by a small step k, x_i = x_{i-1} + k n, conditional on the force field at x_i satisfying F_ext(x_i) ≥ F_int(x_i), where n denotes the mean normal vector in the neighborhood of x_i (Fig. 2(b)).

External Forces. To perform the shape recovery, we introduce two external forces at each vertex, F_ext = F_I + F_∇. They represent the influence of the image on the embedded surface. The force F_I guides the surface toward high image intensity values, while the force F_∇ drives the surface to regions of strong gradient. Both forces are normalized by the invariant maximum grey-scale value Î. External forces are computed from the raw data by transforming the discrete volumetric image I into a continuous scalar field using tri-linear interpolation. The attraction toward an intensity value in this field is expressed by

$$F_I(x_i) = \frac{I_{HE}(x_i)}{\hat{I}} \qquad (1)$$
where I_HE denotes the histogram-equalized image I. The force F_I aims to inflate the model locally at high-intensity regions. As grey-level values decrease when approaching the outer surface of the bronchial wall, F_ext is strengthened by a force which drives the surface along the local gradient of the image:

$$F_\nabla(x_i) = \frac{|\nabla_{in}(x_i, I)|}{\hat{I}} \qquad (2)$$
where I denotes the original image intensity and ∇_in the lateral gradient computed with respect to the inner local neighborhood of x_i, N(x_{i-1}) (Fig. 2(b)). The model is attracted to strong contours of the wall guided by the force F_∇, thus allowing it to match wall irregularities.

Internal Forces. In order to equilibrate the external force, F_int is defined as a composition of two forces, F_int = F_δ + F_r: an elastic force F_δ, which linearly penalizes local wall thickness variations, and a regularization force F_r, which locally smoothes the shape. The elastic force is defined as the distance from the vertex x_i to the inner wall surface mesh, δ(x_i, M_0), weighted by the mean distance computed over the vertices situated in a cross-section slab S_⊥(x_i) including x_i and orthogonal to the bronchus axis (Fig. 2(c)):

$$F_\delta(x_i) = \frac{\delta(x_i, M_0)}{\mu_{axial} + \sigma_{axial}}, \qquad \mu_{axial} = \frac{1}{|S_\perp(x_i)|} \sum_{x_j \in S_\perp(x_i)} \delta(x_j, M_0) \qquad (3)$$
where μ_axial denotes the mean distance value to M_0 and σ_axial the corresponding standard deviation. The shape of the external surface of the bronchus at the level of the vessel-bronchus contact zone is mainly constrained by the elastic component to follow the inner wall contour shape.
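For clarity, the per-vertex elastic force of Eq. (3) can be written directly (a sketch under the stated definitions; `dist_to_inner`, which computes δ(x, M_0), is assumed given):

```python
# Sketch of the elastic force of Eq. (3); `dist_to_inner` computes delta(x, M0).
import numpy as np

def elastic_force(x_i, slab_vertices, dist_to_inner):
    """slab_vertices: vertices of the cross-section slab S_perp(x_i)."""
    d = np.array([dist_to_inner(x) for x in slab_vertices])
    return dist_to_inner(x_i) / (d.mean() + d.std())
```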
The regularization term, F_r, exploits the local normal curvature, computed according to the differential geometry definition [14]: F_r = ⟨−∇(A)/(2A), n⟩, where A denotes the local area around x_i and ⟨·, ·⟩ the inner product. ∇(A) can be discretized for triangulated surfaces by the formula described in [14], leading to

$$F_r(x_i) = -\frac{1}{4A} \left\langle \sum_{j \in star(i)} (\cot \alpha_{ij} + \cot \beta_{ij})\,(x_i - x_j),\; \mathbf{n} \right\rangle \qquad (4)$$

where star(i) denotes the subset of neighbor vertices of x_i, and α_ij, β_ij are the two angles opposite to the edge x_i − x_j, as illustrated in Fig. 2(d).
Fig. 2. Illustration of internal constraints associated with the deformable model: (a) Inner wall mesh, M0 , (b) Vertex motion, where {xj }j = N (xi−1 ), (c) Elastic force definition, (d) Regularization term definition
Mesh Resolution and Topology Adaptation. As a result of vertex displacement, the inter-vertex distance changes, leading either to skipping important image features when the distance is too large, or to increased computational cost when vertex distributions become too dense. In order to maintain adequate surface mesh resolution and high computational efficiency, the edge length is constrained during the deformation process according to ξ ≤ d_E(u, v) ≤ λξ, where d_E(u, v) denotes the Euclidean distance between two vertices u and v, ξ determines the global resolution of the surface, and λ is the allowed ratio between the longest and the shortest edges. Edges violating these constraints are removed or subdivided by methods used in progressive meshing [15]. The vertex displacement step between two iterations, k, as well as the mesh density ξ, are set according to the highest CT data spatial resolution. They are iteratively decreased during the deformation process as the solution is approached, in order to reach the optimal mesh resolution at convergence. Mesh deformation may also lead to surface self-intersections, especially in the case of bronchus subdivision. To prevent this problem, auto-collision is checked for each vertex [16].
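The overall iteration can be summarized as below (our condensed sketch of the stated rules, with the per-vertex force evaluations assumed given):

```python
# Condensed sketch of one deformation iteration and the edge-length test.
# F_ext and F_int are assumed given as per-vertex scalar force magnitudes.
import numpy as np

def deform_step(vertices, normals, F_ext, F_int, k):
    moved = vertices.copy()
    for i, (x, n) in enumerate(zip(vertices, normals)):
        if F_ext(i) >= F_int(i):      # displacement condition at vertex x_i
            moved[i] = x + k * n      # x_i = x_{i-1} + k n
    return moved

def edge_ok(u, v, xi, lam):
    d = np.linalg.norm(u - v)
    return xi <= d <= lam * xi        # otherwise collapse or subdivide the edge
```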
4 Results and Discussion
The bronchial quantification was cross-validated against the 2D/3D reference method with respect to a 3D image model simulating a cylindrical bronchus subdivision geometry for different calibers and lumen/wall textures [6] (Fig. 3, top). Only regions providing reliable quantification (outside the subdivision zone) were taken into account when applying the 2D/3D method (Fig. 3(a)), whereas the quantification was applied to the whole model using the 3D method (Fig. 3(b)). The same cross-section planes were used for both the 3D and the 2D/3D methods for quantification evaluation purposes. The percent error between the quantification provided by the developed approach and the model ground truth (1.12% ± 1.18% for lumen area, 1.18% ± 1.72% for wall area) was not meaningfully different from that obtained with the 2D/3D method (0.27% ± 0.57% for lumen area, 0.24% ± 0.58% for wall area) and was always under 3%. Comparisons carried out on real data for different distal bronchi, at sampling points providing good discrimination of the bronchus wall in the cross-section plane, showed similar results for the volumetric and the 2D/3D methods (relative percent error: 1.08% ± 1.57% for lumen area, 1.11% ± 1.64% for wall area; Fig. 3, bottom). These results tend to prove that the volumetric method performs similarly to the reference 2D/3D method in ideal situations and simple bronchus configurations. However, the volumetric approach is expected to provide higher accuracy and robustness
Fig. 3. Surface (a)(c)(d) vs. volumetric (b)(e)(f) quantification of the bronchial wall on bronchus image phantom (top) and real data (bottom).
Fig. 4. Limitations of surface quantification approaches: (a) Restriction of the measurement zones, (b) Local deformation of the bronchus axis inducing estimation errors
in clinical situations by overcoming several limitations inherent to surface quantification methods, namely: (1) a strong dependency on the correct estimation of the medial axis: deviations from the real tangent may cause wall area quantification errors; (2) the choice of sampling location on the bronchus: wall thickness is generally irregular along the bronchial segment, and the solution retained in the 2D/3D method is to provide an average value over each bronchus; (3) measurement area dependency: only regions of the bronchus outside a subdivision area can be considered with the 2D/3D method, while the volumetric approach can capture wall thickness variations in these zones (Fig. 4(a)); (4) local bronchus irregularity, which may cause large local variations in the sampling plane orientation, resulting in asymmetric and tilted sampling of wall regions (Fig. 4(b)); (5) dependency on lung inflation during CT acquisition: due to their elastic properties, bronchi elongate at deep inspiration, which results in bronchus wall thinning. In such cases, surface quantification will underestimate wall area while volumetric assessment will not be affected.
5 Concluding Remarks

Volumetric quantification of the bronchial wall is recommended over surface quantification for assessing small changes in airway wall remodeling in severe and mild asthmatics before/after a therapeutic protocol. Due to the diffuse nature of asthma, such follow-up studies have to investigate a large number of distal bronchi spread throughout the lungs. Another advantage of the volumetric approach is that it can be applied to several bronchus generations, thus leading to more accurate measurements than any surface approach (cf. § 4). The developed volumetric method provides the investigation tools for designing new therapeutic protocols and assessing their efficiency with respect to various airway diseases. This technique is currently the subject of a large-scale evaluation in the framework of a therapy follow-up study, performed jointly with the reference 2D/3D technique.
References

1. Matsuoka, S., Kurihara, Y., Nakajima, Y., Niimi, H., Ashida, H., Kaneoya, K.: "Serial Change in Airway Lumen and Wall Thickness at Thin-Section CT in Asymptomatic Subjects", Radiology, 234(2), pp. 595-603, 2005.
2. Chabat, F., Hu, X.P., Hansell, D.M., Yang, G.Z.: "ERS transform for the automated detection of bronchial abnormalities on CT of the lungs", IEEE Transactions on Medical Imaging, 20, 2001.
3. King, G.G., Muller, N.L., Whittall, K.P., Xiang, Q.S., Pare, P.D.: "An Analysis Algorithm for Measuring Airway Lumen and Wall Areas from High-Resolution Computed Tomographic Data", Am. J. Respir. Crit. Care Med., 161, pp. 574-580, 2000.
4. Wiemker, R., Blaffert, T., Bülow, T., Renisch, S., Lorenz, C.: "Automated assessment of bronchial lumen, wall thickness and bronchial tree using high-resolution CT", International Congress Series, 1268, pp. 967-972, 2004.
5. Hoffman, E.A., Reinhardt, J.M., Sonka, M., et al.: "Characterization of the interstitial lung diseases via density-based and texture-based analysis of computed tomography images of lung structure and function", Acad. Radiology, 10, pp. 1104-1118, 2003.
6. Saragaglia, A., Fetita, C., Prêteux, F., Brillet, P.Y., Grenier, P.: "Accurate 3D quantification of bronchial parameters in MDCT", Proceedings SPIE Conference on Mathematical Methods in Pattern and Image Analysis, 5916, pp. 323-334, 2005.
7. Fetita, C., Prêteux, F., Beigelman-Aubry, C., Grenier, P.: "Pulmonary airways: 3D reconstruction from multi-slice CT and clinical investigation", IEEE Transactions on Medical Imaging, 23(11), pp. 1353-1364, 2004.
8. Perchet, D., Fetita, C., Prêteux, F.: "Physiopathology of Pulmonary Airways: Automated Facilities for Accurate Assessment", MICCAI, 2, pp. 234-242, 2004.
9. Brillet, P.Y., Fetita, C., Beigelman-Aubry, C., Perchet, D., Prêteux, F., Grenier, P.: "Automatic segmentation of airway wall area for quantitative assessment at MDCT: preliminary results in asthmatics", European Congress of Radiology, 2005.
10. Brillet, P., Saragaglia, A., Fetita, C., Beigelman-Aubry, C., Dreuil, S., Prêteux, F., Grenier, P.: "Automatic quantification of bronchi using MDCT: Comparison of FWHM and EDCE methods on computerized simulations and in vivo bronchi", European Congress of Radiology, 2006.
11. Boissonnat, J.-D., Oudot, S.: "Provably Good Surface Sampling and Approximation", Eurographics Symposium on Geometry Processing, 2003.
12. Chew, L.P.: "Guaranteed-Quality Mesh Generation for Curved Surfaces", Proc. 9th Annu. ACM Sympos. Comput. Geom., pp. 274-280, 1993.
13. Perchet, D.: "Modelisation in-silico des voies aeriennes : reconstruction morphologique et simulation fonctionnelle", PhD Thesis, Univ. Paris 5, November 2005.
14. Desbrun, M., Meyer, M., Alliez, P.: "Intrinsic Parameterizations of Surface Meshes", Computer Graphics Forum, 21(3), pp. 209-218, 2002.
15. Hoppe, H., DeRose, T., Duchamp, T., McDonald, J., Stuetzle, W.: "Mesh optimization", Computer Graphics (SIGGRAPH 93 Proceedings), pp. 19-26, 1993.
16. Lachaud, J.-O., Taton, B.: "Deformable model with adaptive mesh and automated topology changes", Proc. 4th Int. Conference on 3D Digital Imaging and Modeling, pp. 12-19, IEEE, 2003.
Asymmetry of SPECT Perfusion Image Patterns as a Diagnostic Feature for Alzheimer's Disease

Vassili A. Kovalev1, Lennart Thurfjell2, Roger Lundqvist2, and Marco Pagani3

1 Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, Surrey GU2 7XH, United Kingdom
2 Centre for Image Analysis, Uppsala University, SE-752 37 Uppsala, Sweden
3 Institute of Cognitive Sciences and Technologies, CNR, 00185 Rome, Italy
Abstract. In this paper we propose a new diagnostic feature for Alzheimer's Disease (AD) which is based on assessment of the degree of inter-hemispheric asymmetry using Single Photon Emission Computed Tomography (SPECT). The asymmetry measure used represents differences in 3D perfusion image patterns between the cerebral hemispheres. We start from the simplest descriptors of brain perfusion, such as the mean intensity within pairs of brain lobes, gradually increasing the resolution up to five-dimensional co-occurrence matrices. Evaluation of the method was performed using SPECT scans of 79 subjects, including 42 patients with a clinical diagnosis of AD and 37 controls. It was found that the combination of intensity and gradient features in co-occurrence matrices captures significant differences in asymmetry values between AD patients and normal controls (p < 0.00003 for all cerebral lobes). Our results suggest that the asymmetry feature is useful for discriminating AD patients from normal controls as detected by SPECT.
1 Introduction

PET and SPECT have been widely used to investigate alterations of functional patterns in patients with Alzheimer's Disease (AD). A decrease in regional cerebral blood flow (rCBF) in the parieto-temporal cortex is considered a diagnostic sign of AD. Nevertheless, in the early stage of the disease the typical bilateral parieto-temporal rCBF deficit is seldom described, and there is a certain degree of overlap in hypo-perfusion distribution between AD patients and the normal elderly. In the past, the diagnosis of AD was often based on visual examination of the images or on other subjective approaches, such as analysis of manually placed regions of interest (ROIs) [1]. Since the introduction of semi-quantitative and 3D standardisation methods in SPECT image analysis, visual reading has been less and less utilised by research groups. Objective methods reported in the literature include methods based on pixel-by-pixel comparison of a patient's scan to data from normal volunteers [2], [3], the analysis of ROIs or VOIs (volumes of interest) defined by a stereotactic atlas registered to the data (e.g., [4]), and the most advanced method, SPM [5]. These methods require a preceding spatial registration of the data (e.g., [3], [6]). Moreover, the data must be normalized for
global activity. Different approaches have been used; for most methods the main assumption is to normalize the data against the activity in areas that are unaffected by the disease (see, for example, [7]). In order to overcome these difficulties we propose to use a measure that capitalises on the fact that, although a bilateral flow reduction in the parieto-temporal cortex is an important diagnostic feature for AD, there is normally an asymmetry in the degree of involvement [3]. AD often starts in one hemisphere and gradually becomes more bilateral during the progression of the disease. We therefore hypothesize that if a healthy person becomes affected by AD, one would expect low asymmetry to start with; the asymmetry index would then gradually increase during the early phase, when the disease mainly involves one hemisphere [8], [9], [10], and finally decrease once there is bilateral involvement. Hence, an asymmetry index that captures specific differences in the spatial structure of bilateral three-dimensional (3D) perfusion patterns could be considered as a diagnostic feature of AD. Previous work on perfusion asymmetry is mainly restricted to the computation of ratios of VOIs, where an asymmetry index is computed as |right − left|/(right + left), with right and left indicating values within VOIs placed in the two hemispheres [4]. More recently, a new kind of multi-sort co-occurrence matrix has been suggested to properly describe the three-dimensional spatial structure (texture) of medical images [11]. These descriptors were first used for the analysis of MRI datasets [12], [13] and later for direct inter-subject comparisons of SPECT scans [14]. In this paper, we systematically investigate and compare the usefulness of new asymmetry features derived from SPECT perfusion scans. We start from the simplest and most obvious characteristics of brain perfusion, such as the mean intensity within different anatomical regions, gradually increasing the resolution through vector (histogram) features, and going up to five-dimensional intensity/gradient co-occurrence matrices. All co-occurrence matrices used are independent of translation, rotation, reflection, and size of the 3D regions they describe [11]. The analysis was performed using the Greitz Computerized Brain Atlas [15] to standardize the scans, define the regions of interest (cerebral lobes), and normalize the image data. To the best of our knowledge, this is the first work examining the diagnostic abilities of asymmetry indices obtained by means of comparison of 3D perfusion image patterns using co-occurrence matrices.
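As a point of reference for what follows, the minimal sketch below (Python with NumPy; the function names and inputs are illustrative, not taken from the paper) contrasts the conventional VOI asymmetry index with the feature-based generalization used here, where asymmetry is the Euclidean distance between left- and right-lobe descriptors.

```python
import numpy as np

def voi_asymmetry(left_mean, right_mean):
    """Classical VOI asymmetry index |right - left| / (right + left),
    as used in earlier rCBF studies (cf. [4])."""
    return abs(right_mean - left_mean) / (right_mean + left_mean)

def feature_asymmetry(left_features, right_features):
    """Generalized asymmetry index: the Euclidean distance between
    arbitrary feature vectors (histograms or flattened co-occurrence
    matrices) computed for homologous left/right lobes."""
    l = np.asarray(left_features, dtype=float).ravel()
    r = np.asarray(right_features, dtype=float).ravel()
    return np.linalg.norm(r - l)
```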
2
Method
2.1
Subjects and SPECT Scans
Alzheimer's patients were selected from consecutive patients referred to the Department of Nuclear Medicine of Karolinska Hospital, Stockholm, for suspected AD. The AD group consisted of 42 patients (22 males and 20 females, mean age 74 ± 7 years) with cognitive decline suggestive of AD. The diagnosis of AD was determined clinically, fulfilling the NINCDS-ADRDA criteria
Fig. 1. Left panel: examples of original SPECT scans (axial, sagittal, and coronal sections are shown from left to right). Right panel: pairs of brain lobes.
for possible AD, with an MMSE score below 20, and was also supported by Computed Tomography (CT) and electroencephalographic (EEG) changes. The rCBF values of these patients were compared to those of 37 normal controls (18 males, 19 females, mean age 63 ± 12 years) chosen from the library of normal scans available at the Department in order to match the age and gender of the AD patients as closely as possible. Patients and normal volunteers were scanned under the same physiological conditions, at rest and with the eyes closed. The slight difference in mean age, due to the difficulty of recruiting older control subjects, did not significantly affect the rCBF in the VOIs under study. About 1000 MBq of 99mTc-HMPAO was prepared and administered intravenously to the subjects within 5 minutes of preparation. SPECT brain imaging was performed using a three-headed gamma camera (TRIAD XLT 20, Trionix Research Laboratory Inc., Twinsburg, OH, USA) equipped with a low-energy high-resolution (LEHR) collimator. The projection data were acquired for 30 seconds per projection at 90 equal angles of a complete revolution. The SPECT images were reconstructed by a filtered back projection algorithm using a ramp filter with a cut-off frequency of 0.6 cycles/cm. The data were smoothed using a Hamming filter with a cut-off frequency of 2.25. The attenuation correction was based on a four-point ellipse. Data were projected into a 128×128 pixel matrix, which resulted in an isotropic voxel size of 2.2 mm. Examples of original scans are shown in the left panel of Fig. 1.
2.2
Brain Regions and Asymmetry Evaluation Procedure
The asymmetry features were obtained by comparing certain measures within pairs of anatomical regions. For definition of these anatomical regions (VOIs), the Greitz Computerized Brain Atlas [15] was used. For registration of the atlas to each of the individual SPECT scans, a fully automated method was used to register a SPECT template volume previously reformatted to atlas space with the individual SPECT scan. Details of the registration procedure can be found
in [14]. The brain regions considered in this work were the frontal, parietal, temporal, occipital and cerebellar lobes (Fig. 1). Similar to a previous asymmetry study in a normal population using MRI [12], asymmetry was regarded here as the dissimilarity of 3D perfusion patterns, as detected by SPECT, within pairs of lobes in the left and right brain hemispheres. It should be stressed that there is no direct relationship between the asymmetry of 3D volumetric image patterns and the shape of the 3D surfaces by which they are bounded. For all kinds of features under consideration, we measure the degree of dissimilarity as the Euclidean distance between these features. Comparison of the image features computed for a pair of brain lobes thus results in a single dissimilarity value treated as an asymmetry index. The primary goal was to evaluate the usefulness of different image features (descriptors) and the relative importance of different anatomical regions for discrimination of AD and controls. We start from the simplest and most obvious characteristics of brain perfusion, such as the mean intensity within different anatomical regions, gradually increasing the resolution through vector (histogram) features, and going up to five-dimensional intensity/gradient co-occurrence matrices. The list of features examined in this study includes: (a) the mean intensity and 3D gradient [16] values, (b) the intensity and gradient magnitude histograms (16 bins each), (c) the three-dimensional intensity co-occurrence matrices W1, (d) the three-dimensional gradient co-occurrence matrices W2, and (e) the combined, five-dimensional intensity and gradient co-occurrence matrices W3, which provide a detailed description of the spatial structure of 3D perfusion patterns in brain lobes. The combined W3 matrix of an image region is a five-dimensional array [14]. The matrix axes are associated with elementary features of voxel pairs. All possible neighbors around each voxel, with no repetition, were considered for a given distance range, so that the descriptors are reflection- and rotation-invariant. More formally, let us consider an arbitrary voxel pair (i, k) defined in 3D by voxel indices i = (xi, yi, zi) and k = (xk, yk, zk) and with inter-voxel distance d(i, k). Let us denote their intensities by I(i) and I(k) and their local gradient magnitudes by G(i) and G(k). Then the W3 co-occurrence matrix can be defined as: W3 = ||w(I(i), I(k), G(i), G(k), d(i, k))||, where w denotes the number of voxel pairs with the characteristics given in brackets (a computational sketch is given after the list of evaluation steps below). The intensity gradient at each voxel position was computed using a 3D filter with the small 3×3×3 window suggested in [16]. Further implementation details can be found in [11]. The three-dimensional matrices W1 and W2 are defined in a similar way and may be viewed as particular cases of the more general W3 matrix. The intensity and gradient values were quantized down to 16 bins, and the inter-voxel distance varied within 1–5 bins of 2.2 mm each. The feature evaluation procedure includes the following three steps: (a) fractional ranking of the tested asymmetry measure values across all SPECT scans to the range 0–100 to allow direct inter-region comparison, (b) evaluation of the significance of the asymmetry features using t-tests and K-means clustering,
(c) visualization and analysis of the subjects' scattering in asymmetry feature space using reduced, two-dimensional scatter plots obtained with the multidimensional scaling method [17]. These steps were performed for all types of features given above. The t- and p-values were used as quantitative estimates of the ability of a tested feature to capture the asymmetry property in the image. In all analyses, the co-occurrence matrices were treated as the corresponding feature vectors.
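The following sketch illustrates one way the W3 matrices defined above could be accumulated for a masked lobe (Python with NumPy/SciPy; the quantization helper, the use of a k-d tree to enumerate voxel pairs, and the symmetrization step are our assumptions rather than the authors' implementation):

```python
import numpy as np
from scipy.spatial import cKDTree

def w3_matrix(vol, mask, n_bins=16, max_d=5):
    """Accumulate W3 = ||w(I(i), I(k), G(i), G(k), d(i,k))|| over all
    unordered voxel pairs of a masked region, for inter-voxel distances
    of 1..max_d voxels (a sketch; subsample very large regions in practice)."""
    vol = vol.astype(float)
    grad = np.linalg.norm(np.stack(np.gradient(vol)), axis=0)  # |3D gradient|

    def quantize(x):  # uniform quantization to n_bins levels
        x = (x - x.min()) / (np.ptp(x) + 1e-12)
        return np.minimum((x * n_bins).astype(int), n_bins - 1)

    I, G = quantize(vol), quantize(grad)
    idx = np.argwhere(mask)                      # voxel coordinates in region
    pairs = cKDTree(idx).query_pairs(r=max_d, output_type='ndarray')
    pi, pk = idx[pairs[:, 0]], idx[pairs[:, 1]]  # unique, unordered pairs
    d = np.linalg.norm(pi - pk, axis=1)
    d_bin = np.clip(np.rint(d).astype(int), 1, max_d) - 1
    Ii, Ik = I[tuple(pi.T)], I[tuple(pk.T)]
    Gi, Gk = G[tuple(pi.T)], G[tuple(pk.T)]
    W3 = np.zeros((n_bins,) * 4 + (max_d,), dtype=np.int64)
    np.add.at(W3, (Ii, Ik, Gi, Gk, d_bin), 1)
    # Voxel pairs are unordered, so symmetrize over the (i, k) labelling
    # to keep the descriptor reflection- and rotation-invariant.
    return W3 + W3.transpose(1, 0, 3, 2, 4)
```

The asymmetry index for a lobe pair is then the Euclidean distance between the flattened W3 arrays of the left and right lobes, as in the generalized index sketched earlier.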
3
Results
Our results suggest that the mean intensity within the brain lobes did not provide any useful information for discriminating AD and normal controls based on their asymmetry. In a two-tailed t-test, no pair of lobes showed any statistical significance: t-values were unacceptably small (on the order of 0.3), with very high p-levels ranging from 0.41 for the occipital lobes to as high as 0.90 for the frontal lobes. Asymmetry values computed using mean gradient magnitudes gave slightly better results, but p-levels were still unacceptably high (from 0.06 for occipital lobes to 0.73 for temporal lobes). The t-tests performed on intensity and gradient histogram features indicated that the best region was the occipital lobe (p-values of 0.11 for intensity and 0.04 for gradient). However, no region demonstrated a significant difference in asymmetry between AD and controls with these features. The results summarized in Table 1 provide evidence of differing asymmetry between AD patients and controls. This is the case for asymmetry computed using both kinds of matrices, W1 and W2. A comparison of t-values and p-levels suggests that the most important anatomical regions for measuring asymmetry were the occipital and temporal lobes, while the cerebellum did not provide any useful information at all. Furthermore, results obtained using the combined 5D intensity/gradient matrices (see Table 1) confirmed the existence of a significant difference in asymmetry values for AD as compared to normal controls for all lobes except the cerebellum. The data reported in the last two columns of the table suggest that the most useful regions for discriminating AD from normal controls are the temporal lobes. This is further illustrated by the mean and STD values summarized in Table 2 (the cerebellum is omitted as insignificant).

Table 1. Significance of asymmetry differences between AD patients and controls

brain lobe pair   W1 (intensity)       W2 (gradient)        W3 (combined)
                  t-value  p-value     t-value  p-value     t-value  p-value
Frontal           2.15     0.034       4.29     <0.001      5.15     <0.00003
Parietal          3.50     0.001       5.72     <0.001      6.20     <0.00001
Occipital         6.36     <0.001      7.05     <0.001      8.51     <0.00001
Temporal          5.21     <0.001      7.00     <0.001      9.36     <0.00001
Cerebellum        1.10     0.274       0.86     0.090       0.92     0.33
Table 2. Mean and STD values of lobular asymmetry (combined matrices W3)

brain lobe pair   normal controls      AD patients
                  mean     STD         mean     STD
Frontal           35.0     25.0        64.4     25.4
Parietal          32.9     22.7        66.3     24.9
Occipital         29.2     20.4        69.5     21.5
Temporal          28.2     18.8        70.4     20.0
Fig. 2. Scatter plot of Alzheimer's cases and normal controls in asymmetry feature space. The original four-dimensional space constituted by asymmetry indices computed for four pairs of brain lobes is reduced to two conditional dimensions using the multidimensional scaling method. The "neutral zone" where AD patients and controls are mixed is bounded by the rectangle.
The asymmetry values computed for the frontal, parietal, temporal, and occipital lobe pairs of each subject form a four-dimensional feature space. A scatter plot of the asymmetry of all AD cases and controls, obtained by reducing the four-dimensional feature space down to two dimensions using the multidimensional scaling method, is shown in Fig. 2. Interestingly, when an unsupervised K-means clustering method was applied to partition the data into three groups, the following classes were obtained: class 1 (27 cases, AD patients only), class 2 (21 cases, normal controls only), and class 3 (31 cases, 15 AD and 16 control cases mixed). These results are in good agreement with the data presented in Fig. 2. Thus, with this unbiased classification strategy, 48 cases (60.8%) were categorized into the correct classes and the remaining 31 cases fell into an "uncertain" class rather than being misclassified.
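The clustering and embedding reported above can be reproduced in outline as follows (a scikit-learn based sketch; the file name and array shapes are placeholders, not artifacts of the study):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import MDS

# Ranked asymmetry indices for the four significant lobe pairs
# (frontal, parietal, occipital, temporal); one row per subject.
X = np.loadtxt("asymmetry_indices.txt")        # placeholder, shape (79, 4)

# 2D embedding of the 4D feature space for the scatter plot (cf. Fig. 2).
X2d = MDS(n_components=2, random_state=0).fit_transform(X)

# Unsupervised partition into three groups: in the paper these emerged
# as AD-only, control-only, and a mixed "uncertain" class.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
```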
4
Discussion
Our results suggest that the mean and histogram features of intensity and gradient magnitude computed for brain lobes do not provide enough resolution to describe the brain activity asymmetry needed for discriminating AD from control scans. This is not surprising, because these features do not utilize the spatial information constituted by the specific intensity distribution within brain lobes. Nevertheless, the simplest of these features, namely the mean intensity within predefined regions, is the most commonly used feature in previous work on AD asymmetry in this context [1], [4]. The choice of large 3D VOIs (entire lobes in our case) instead of 2D ROIs renders the analysis more sensitive to changes in perfusion (and hence to asymmetry) and less dependent on errors in outlining and positioning. The results obtained using specific co-occurrence matrices for intensity and gradient values within the different brain lobes gave much better group separation (p-values ranging from 0.034 down to 0.001). The main reason for the improvement is probably that the matrices also take spatial information into account. When comparing intensity and gradient information, gradient information tends to be more important for asymmetry estimation. This trend was observed across all the tests performed on individual features. From our results, it seems that the rate of change in intensity is more useful than the intensity level itself for discrimination between AD and controls. The best results were obtained when intensity and gradient features were combined into 5D co-occurrence matrices. The differences in asymmetry values between patients and controls were found to be highly significant (p < 0.00003 for all four cerebral lobes). Thus, our results support the research hypothesis about the importance of a 3D asymmetry index as a diagnostic feature of AD. A similar trend of asymmetry increase in AD was also found using high-resolution anatomical MRI datasets [11]. The decrease of asymmetry at the late stages with bilateral involvement needs to be studied separately using additional SPECT material. In conclusion, we believe that the high-resolution asymmetry measures we propose provide very useful information for discrimination of AD and normal controls. They may not be the ultimate feature but are likely to be more objective and robust than previous approaches. Furthermore, the asymmetry measure is independent of the intensity scale of the SPECT datasets, and all inter-subject comparisons are made on the level of asymmetry features only. Therefore there should be no need for normalization of global activity. However, further blind studies are needed to clinically validate the new methodological approach proposed in this paper. Finally, we would like to emphasize that no asymmetry measure alone is sufficient as a diagnostic feature for discriminating AD from controls. Instead, it gives additional information and should be used in conjunction with more traditional methods for discriminating AD from controls, e.g., through comparison with a database of normals.
References
1. Ishii, K., Sasaki, M., Yamaji, S., Sakamoto, S., Kitagaki, H., Mori, E.: Paradoxical hippocampus perfusion in mild-to-moderate Alzheimer's disease. J Nucl Med 39 (1998) 293–298
2. Bartenstein, P., Minoshima, S., Hirsch, C., Buch, K., Willoch, F., Mosch, D., Schad, D., Schwaiger, M., Kurz, A.: Quantitative assessment of cerebral blood flow in patients with Alzheimer's disease by SPECT. J Nucl Med 38 (1997) 1095–1101
3. Minoshima, S., Frey, K., Koeppe, R., Foster, N., Kuhl, D.: A diagnostic approach in Alzheimer's disease using three-dimensional stereotactic surface projections of fluorine-18-FDG PET. J Nucl Med 36 (1995) 1238–1248
4. Borght, T., Minoshima, S., Giordani, B., Foster, N., Frey, K., Berent, S., Albin, R., Koeppe, R., Kuhl, D.: Cerebral metabolic differences in Parkinson's and Alzheimer's disease matched for dementia severity. J Nucl Med 38 (1997) 797–801
5. Ashburner, J., Friston, K.J.: Nonlinear spatial normalization using basis functions. Hum Brain Mapp 7(4) (1999) 254–266
6. Andersson, J., Thurfjell, L.: Implementation and validation of a fully automatic system for intra- and inter-individual registration of PET brain scans. J Comput Assist Tomogr 21(1) (1997) 136–144
7. Andersson, J.L.: How to estimate global activity independent of changes in local activity. NeuroImage 6(4) (1997) 237–244
8. Celsis, P., Agniel, A., Cardebat, D., Demonet, J.F., Ousset, P.J., Puel, A.: Age related cognitive decline: a clinical entity? A longitudinal study of cerebral blood flow and memory performance. J Neurol Neurosurg Psych 62 (1997) 601–608
9. Talbot, P.R., Snowden, J.S., Lloyd, J.J., Neary, D., Testa, H.J.: The contribution of single photon emission tomography to the clinical differentiation of degenerative cortical brain disorders. J Neurol 242(9) (1995) 579–586
10. Small, G.W., Mazziotta, J.C., Collins, M.T., Baxter, L.R., Phelps, M.E., Mandelkern, M.A., Kaplan, A., Rue, A.L., Adamson, C.F., Chang, L.: Apolipoprotein E type 4 allele and cerebral glucose metabolism in relatives at risk for familial Alzheimer disease. JAMA 273(12) (1995) 942–947
11. Kovalev, V.A., Kruggel, F., Gertz, H.J., von Cramon, D.Y.: Three-dimensional texture analysis of MRI brain datasets. IEEE TMI 20(5) (2001) 424–433
12. Kovalev, V.A., Kruggel, F., von Cramon, D.Y.: Gender and age effects in structural brain asymmetry as measured by MRI texture analysis. NeuroImage 19 (2003) 895–905
13. Kovalev, V.A., Petrou, M., Suckling, J.: Detection of structural differences between the brains of schizophrenic patients and controls. Psych Res 124 (2003) 177–189
14. Pagani, M., Kovalev, V.A., Lundqvist, R., Jacobsson, H., Larsson, S.A., Thurfjell, L.: A new approach for improving diagnostic accuracy of Alzheimer's disease and Frontal Lobe Dementia utilising the intrinsic properties of the SPET datasets. Eur J Nuc Med Molec Imag 30(11) (2003) 1481–1488
15. Greitz, T., Bohm, C., Holte, S., Eriksson, L.: A computerized brain atlas: construction, anatomical content and some applications. J Comput Assist Tomogr 15 (1991) 26–38
16. Zucker, S.W., Hummel, R.A.: A 3D edge operator. IEEE PAMI 3 (1981) 324–331
17. Cox, T.F., Cox, M.A.A.: Multidimensional Scaling. 2nd edn. Volume 88 of Monographs on Statistics and Applied Probability. Chapman and Hall, Boca Raton, Florida (2000)
Predicting the Effects of Deep Brain Stimulation with Diffusion Tensor Based Electric Field Models

Christopher R. Butson1, Scott E. Cooper2, Jaimie M. Henderson3, and Cameron C. McIntyre1,2

1 Department of Biomedical Engineering, Cleveland Clinic Foundation, Cleveland, OH
[email protected]
2 Center for Neurological Restoration, Cleveland Clinic Foundation, Cleveland, OH
3 Department of Neurosurgery, Stanford School of Medicine, Stanford, CA
Abstract. Deep brain stimulation (DBS) is an established therapy for the treatment of movement disorders, and has shown promising results for the treatment of a wide range of other neurological disorders. However, little is known about the mechanism of action of DBS or the volume of brain tissue affected by stimulation. We have developed methods that use anatomical and diffusion tensor MRI (DTI) data to predict the volume of tissue activated (VTA) during DBS. We co-register the imaging data with detailed finite element models of the brain and stimulating electrode to enable anatomically and electrically accurate predictions of the spread of stimulation. One critical component of the model is the DTI tensor field that is used to represent the 3-dimensionally anisotropic and inhomogeneous tissue conductivity. With this system we are able to fuse structural and functional information to study a relevant clinical problem: DBS of the subthalamic nucleus for the treatment of Parkinson's disease (PD). Our results show that inclusion of the tensor field in our model caused significant differences in the size and shape of the VTA when compared to a homogeneous, isotropic tissue volume. The magnitude of these differences was proportional to the stimulation voltage. Our model predictions are validated by comparing the predicted spread of activation to the observed effects of oculomotor nerve stimulation in a PD patient. Thus, the 3D tissue electrical properties of the brain play an important role in regulating the spread of neural activation generated by DBS.
1
Introduction
Deep brain stimulation (DBS) is an effective, proven therapy for the treatment of Parkinson’s Disease (PD), essential tremor and dystonia. The realization that
The authors would like to thank Ashu Chaturvedi, Barbara Wolgamuth, and Christopher Maks for their technical assistance on this project. This work was supported by grants from the American Parkinson Disease Association, the Ohio Biomedical Research and Technology Transfer Partnership, and the National Institutes of Health (NS-50449 & NS-52042).
high-frequency DBS generates clinical benefits analogous to those achieved by surgical lesioning has transformed the use of functional neurosurgery for the treatment of movement disorders [1,2,3]. Thalamic DBS for intractable tremor has virtually replaced ablative lesions of the thalamus [4], and DBS of the subthalamic nucleus (STN) or globus pallidus internus has largely replaced pallidotomy in the treatment of the cardinal motor features of PD (resting tremor, rigidity, bradykinesia) [5]. In addition, multiple pilot studies have begun to examine the utility of DBS for dystonia, epilepsy, obsessive-compulsive disorder, and depression. However, the clinical successes of DBS are tempered by our limited understanding of the effects of DBS on the nervous system, and the therapeutic mechanisms of DBS remain unknown [6]. The fundamental purpose of DBS is to modulate neural activity with extracellular electric fields, but the technology necessary to accurately predict and visualize the neural response to DBS has not previously been available. Consequently, current state-of-the-art pre-operative targeting strategies and post-operative parameter selection processes do not take into account the influence of the electric field generated by the DBS electrode on the surrounding neural structures. DBS systems provide thousands of possible stimulation configurations, resulting in great flexibility in the customization of therapy for individual patients, but at the expense of time-consuming clinical evaluation to determine therapeutic stimulation settings [7]. As a result, DBS patient management is primarily based on trial-and-error as opposed to quantitative understanding. To address these limitations, we have developed techniques to predict the volume of tissue activated (VTA) during DBS [8]. We have previously shown that the electrode-tissue interface and the encapsulation layer surrounding the electrode can cause substantial differences in the VTA volume. The electrode-tissue interface interacts with the time course of the stimulation waveform such that longer pulse widths are more strongly attenuated [9]. Similarly, high electrode impedances are correlated with greater electrode encapsulation, which also reduces the VTA volume [10]. However, these previous studies were conducted using a homogeneous, isotropic tissue medium, which is useful for parameterizing the effects of DBS but does not take into account the complex properties of neural tissue. In this paper we explore the effects of 3-dimensional tissue electrical properties on the VTA generated by DBS. Specifically, tissue conductivity is extracted from diffusion tensor MRI (DTI) data and coregistered with pre-operative and post-operative anatomical MRIs. With this method we are able to create a realistic volume conductor for use in a finite element field model of the brain [11,12]. Until now there have been few methods available to predict or visualize the effects of DBS on individual patient brains. The innovation of this paper is to predict the volume of tissue activated during stimulation using quantitative forward field models. This type of model is potentially useful to the EEG/MEG community because it provides a forward field solution to brain dynamics that can be used to validate the inverse models often used for localizing neural activity. The results presented in this paper are the first clinically validated, patient-specific models of DBS for predicting neural activation.
2
Methods
The DBS system consists of a quadripolar electrode lead, an implantable pulse generator (IPG) and an extension wire connecting the two (Figure 1). The electrode lead is implanted during stereotactic surgery; the work presented in this paper focuses on STN DBS. Each electrode contact is 1.27 mm in diameter and 1.5 mm in height, and the contacts are separated by either a 1.5 mm (3387 model) or 0.5 mm (3389 model) insulation gap. The DBS system can be operated in multi-polar or monopolar configurations with any combination of anodes and cathodes; during monopolar stimulation the return electrode is the IPG case. Typical settings for monopolar STN DBS are -3 V, 90 μsec pulse width and 130 Hz. For therapeutic stimulation settings, symptomatic relief of PD symptoms is normally achieved within a few seconds. We have developed techniques to predict the spread of activation during DBS by combining several imaging methods into a single coherent framework (Figure 1). First, a pre-operative T1 MRI (collected on a Siemens 1.5T scanner) was segmented using SNT software (Surgical Navigation Technologies, a division of Medtronic, Minneapolis, MN, USA) based on the non-linear warping techniques of [13] to produce closed surfaces for several basal ganglia nuclei: STN, thalamus, globus pallidus, putamen, caudate and accumbens (only the STN and thalamus are shown in the figures). Second, a post-operative T1 MRI was used to determine the position of the electrode. Electrode localization was performed by isosurfacing progressively smaller values of the electrode artifact until the surface converged onto a volume containing the 4 electrode contacts. The pre-operative and post-operative MR volumes were then co-registered with fractional anisotropy values from a DTI atlas brain using Analyze 6.0 (AnalyzeDirect, Lenexa, KS, USA). The DTI atlas brain is explained in detail in [14]. Briefly, images were acquired on a 1.5-T Philips Gyroscan MR using single-shot echo-planar imaging sequences. The imaging matrix was 112 x 112 with a field of view of 246 x 246 mm. Transverse sections of 2.2 mm thickness were acquired for a total of 55 slices. Diffusion weighting was encoded along 30 independent orientations, and the b-value was 700 s/mm2. Imaging was repeated six times to enhance the signal-to-noise ratio. The six independent elements of the 3 x 3 diffusion tensor were calculated for each pixel by using a multivariate linear fitting method. Accurate representation of the stimulating electrodes and the surrounding electric field is made possible with two techniques: multiresolution meshes and dynamic interpolation. Finite element methods are used to model geometric features and boundary conditions for which analytical solutions do not exist. A finite element model of the DBS electrode and surrounding tissue medium was constructed using FEMLAB 3.1 (Comsol Inc., Burlington, MA, USA). Multi-resolution meshes allow node density to be highest where the electric field changes fastest. The second technique, dynamic interpolation, allows the electrode to be easily moved to arbitrary positions and orientations within the brain volume without repeating the computationally costly and time-consuming step of creating a new mesh. With these methods, DTI tensors are dynamically interpolated onto the finite element mesh, an operation that takes 10-15 seconds for approximately
Fig. 1. Methods. The DBS system is implanted as shown. Pre-operative, post-operative and DTI MRIs are coregistered, the stimulation waveform is applied to the electrode, and the system is solved using the Fourier FEM solver. The resulting time- and space-dependent voltages are used to calculate activating function thresholds for fibers of passage, and the VTA is determined using these thresholds.
215,000 nodes and 1.9e6 DTI tensors on an 8-processor SGI Prism with 32 GB RAM. Lastly, boundary conditions for the stimulation parameters at each electrode contact are dynamically interpolated onto the brain mesh. The conductivity tensor is solved for directly at each voxel using a simple linear transform [15]: σ = (σe/de)D, where σe is the effective extracellular conductivity, de is the effective extracellular diffusivity and D is the diffusion tensor. The ratio (σe/de) has been determined from experimental and empirical data to be ~0.7 (S·s)/mm3 [12,15]. This value is then scaled to match the electrode impedance recorded from the IPG of the given patient.
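A voxel-wise version of this linear transform might look as follows (a NumPy sketch; the impedance scaling factor and the array layout are assumptions, not details from the paper):

```python
import numpy as np

SIGMA_OVER_D = 0.7  # (S*s)/mm^3, empirical ratio sigma_e/d_e from [12,15]

def conductivity_from_dti(D, impedance_scale=1.0):
    """Map diffusion tensors to conductivity tensors, sigma = (sigma_e/d_e) D.
    D: array of shape (..., 3, 3) holding the symmetric diffusion tensor
    (mm^2/s) at each voxel; impedance_scale matches the patient's measured
    IPG electrode impedance (value here is a placeholder)."""
    return impedance_scale * SIGMA_OVER_D * np.asarray(D, dtype=float)

def tensor_from_six(dxx, dyy, dzz, dxy, dxz, dyz):
    """Rebuild the full symmetric 3x3 tensor from its six unique elements,
    as computed per pixel by the multivariate linear fit."""
    return np.array([[dxx, dxy, dxz],
                     [dxy, dyy, dyz],
                     [dxz, dyz, dzz]])
```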
VTAs are calculated with stimulation prediction techniques that integrate finite element based electric field solutions with multi-compartment cable models of myelinated axons [16,17,11]. The Poisson equation is solved with a Fourier FEM solver [9] to determine voltage as a function of time and space within the tissue medium:

∇ · (σ∇Φ) = −I    (1)

where Φ is voltage, I represents injected current and σ is a complex stiffness matrix that includes the DTI conductivities and the reactive components of the electrode-tissue interface. The voltage solution is subsequently interpolated onto the model neurons to determine stimulation thresholds for action potential generation. Activating function values are then defined from the second difference of the voltage solution (∂²Ve/∂x²) and used to provide a spatial map for VTA prediction. Simulations and visualization were conducted using BioPSE [18].
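As a concrete illustration of the last step, the sketch below evaluates the activating function along one model axon from potentials sampled out of the FEM solution (Python/NumPy; the compartment spacing, file name and threshold value are hypothetical, since the actual thresholds come from the multi-compartment cable models [17]):

```python
import numpy as np

def activating_function(ve, dx):
    """Second spatial difference of the extracellular potential Ve along a
    fiber, (Ve[i-1] - 2*Ve[i] + Ve[i+1]) / dx**2; large positive values
    predict action potential initiation at compartment i."""
    ve = np.asarray(ve, dtype=float)
    return (ve[:-2] - 2.0 * ve[1:-1] + ve[2:]) / dx**2

# Ve sampled from the Fourier FEM voltage solution at fiber compartments.
ve = np.loadtxt("fiber_potentials.txt")    # placeholder file, volts
af = activating_function(ve, dx=0.5)       # assumed compartment spacing (mm)
fiber_activated = af.max() > 25.0          # hypothetical threshold (V/mm^2)
```

The VTA is then the spatial envelope of all fibers flagged as activated at the given stimulation parameters.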
3
Effects of DTI-Based Tissue Conductivity on Bioelectric Field Modeling
We found that the use of DTI-based tissue conductivities caused substantial differences in the activation of anatomical targets when compared to a homogeneous, isotropic tissue medium (Figure 2a). First, the VTAs calculated using
Fig. 2. Results. A) VTAs are compared between isotropic and DTI-based tissue media, illustrating the difference in the size and shape of the VTAs (leftmost images) as well as the differential activation of the thalamus, STN, ZI/H2 and internal capsule (images at right; activation of subvolumes shown in light green). B) These results are validated by comparing predicted thresholds to those observed during activation of the oculomotor nerve in a PD patient (VTA at -6 V shown; fibers in red are activated).
the two different methods have very different shapes. As expected, the VTA generated with the isotropic medium is nearly spherical. In contrast, the VTA generated using the DTI conductivities is non-spherical and asymmetric. The differences in shape are due to the 3D tissue properties and cannot be explained by a change in the mean conductivity of the isotropic volume. Second, we observed that the VTAs generated under the two conditions caused differential activation of the thalamus, STN, zona incerta and fields of Forel (ZI/H2), and internal capsule. These structures have direct relevance for STN DBS: activation of the thalamus and internal capsule is believed to be correlated with side effects, while activation of the STN and ZI/H2 is correlated with therapeutic effects. The magnitude of the difference was dependent on both the shape of the VTA and its total volume, which is directly proportional to the stimulation parameters (voltage and pulse width).
4
Activation of Oculomotor Nerve in Parkinson’s Disease Patient
We compared model predictions to patient outcomes by looking for activation of discrete target structures with clear clinical signs. One such structure that is relevant to STN DBS is the oculomotor nerve (cranial nerve III), which is attractive for several reasons: it is visible on MR (Figure 2b); it has well-defined borders; and stimulation of this nerve manifests itself with clear clinical signs (ocular deviations ipsilateral to the side of the brain being stimulated). We conducted these experiments on a 51-year-old male PD patient with DBS electrodes implanted bilaterally in the STN (Medtronic 3387 electrodes and Soletra IPGs). The patient gave informed consent to participate in the study, and the protocol was approved by the Institutional Review Board at the Cleveland Clinic. The oculomotor nerve on the right side of the patient's brain was localized on a parasagittal slice along a line from the point where it exits the midbrain to midway through the superior colliculus. The position of the fibers within this oblique plane was determined with a 3D atlas [19]. While PD patients are normally stimulated at therapeutic frequencies over 100 Hz, in these experiments the patient was stimulated at 2 Hz, which has the advantage of producing neither paresthesias nor fused muscle contractions. Hence, muscle twitches are directly observable by the clinician and in some cases can be sensed by the patient. The patient was stimulated with the electrode on the right side of the brain at 2 Hz and 120 μsec pulse width while the voltage magnitude was increased from 0 V to -7 V in -1 V increments. Thresholds for activation of the oculomotor nerve during monopolar stimulation of contact 0 (the most distal contact) were detected by an observer at -4 V, and were sensed by the patient at -6 V. Corresponding simulations indicate good agreement with our experimental results (Figure 2b). With the DTI volume, the VTA was just touching the border of the oculomotor fibers at -4 V. At higher voltages, progressively more of the fibers were activated, as indicated by the intersection of the VTA with the fiber bundles. In contrast, the threshold for activation with the isotropic medium was -1 V.
5
Discussion
Existing DBS systems were adapted from cardiac pacing technology ~20 years ago. The original design of the Medtronic 3387/3389 DBS electrode was hindered by limited knowledge of the neural stimulation objectives of the device. Recent advances in neural engineering design tools, such as the computer modeling techniques used in this study, were not available when DBS was invented. In addition, the therapeutic mechanisms of action remain an issue of controversy [6]. Until direct neurophysiological relationships can be determined between the stimulation and therapeutic benefit, true engineering optimization of DBS devices will remain difficult. However, the neurostimulation prediction tools used in this study provide a new technology platform that can be used to advance three important areas of DBS research. First, patient-specific models of DBS can be used to advance our scientific understanding of the effects of DBS [11,16]. Development of correlations between VTA spread into various anatomical nuclei under therapeutic and non-therapeutic stimulation conditions provides the basis for defining an anatomical and electrical target for the stimulation. Second, patient-specific models of DBS can be used to improve clinical adjustment of the stimulation parameters [20]. Development of clinician-friendly software incorporating 3D visualization graphics of the MRI data, anatomical nuclei, DBS electrode, and VTA as a function of the stimulation parameters provides the basis for decreasing the level of intuitive skill necessary to perform clinical DBS parameter selection. Third, anatomically and electrically accurate models of DBS can be used for the development of novel electrode designs that optimize the shape of the VTA to maximize therapeutic outcome [8]. Development of theoretically optimal electrode designs for a given stimulation target area in the brain and neurological disorder has the potential to minimize side effects generated by the stimulation and maximize therapeutic benefit from DBS. This paper demonstrates results from a clinically validated, patient-specific model of DBS. The gold standard for this type of experimental model would use microelectrode recordings in addition to the physiological results presented in the paper. However, microelectrode recordings can only be made very briefly and in areas close to the electrode target, due to patient safety considerations. Nonetheless, the results of this study show that when 3D tissue conductivities derived from DTI data are incorporated into our model of neurostimulation, they substantially impact the VTA. Only with the DTI-based conductivities were we able to achieve good agreement between our model predictions and clinical results. In addition, we have previously shown that electrode capacitance and electrode encapsulation also play important roles in defining the VTA. Accordingly, attempts to make quantitative correlations between stimulation parameters and behavioral measurements on a patient-by-patient basis should explicitly account for the effects of electrode capacitance, electrode impedance, and the 3D electrical properties of the tissue medium surrounding the DBS electrode.
References
1. Benabid, A.L., Pollak, P., Louveau, A., Henry, S., de Rougemont, J.: Combined (thalamotomy and stimulation) stereotactic surgery of the VIM thalamic nucleus for bilateral Parkinson disease. Appl Neurophysiol 50 (1987) 344–346
2. Benabid, A., Pollak, P., Gervason, C., Hoffmann, D., Gao, D., Hommel, M., Perret, J., de Rougemont, J.: Long-term suppression of tremor by chronic stimulation of the ventral intermediate thalamic nucleus. Lancet 337 (1991) 403–406
3. Gross, R.E., Lozano, A.M.: Advances in neurostimulation for movement disorders. Neurol Res 22 (2000) 247–258
4. Benabid, A.L., Pollak, P., Gao, D., Hoffmann, D., Limousin, P., Gay, E., Payen, I., Benazzouz, A.: Chronic electrical stimulation of the ventralis intermedius nucleus of the thalamus as a treatment of movement disorders. J Neurosurg 84 (1996) 203–214
5. Kumar, R., Lang, A.E., Rodriguez-Oroz, M.C., Lozano, A.M., Limousin, P., Pollak, P., Benabid, A.L., Guridi, J., Ramos, E., van der Linden, C., Vandewalle, A., Caemaert, J., Lannoo, E., van den Abbeele, D., Vingerhoets, G., Wolters, M., Obeso, J.A.: Deep brain stimulation of the globus pallidus pars interna in advanced Parkinson's disease. Neurology 55 (2000) S34–S39
6. McIntyre, C.C., Savasta, M., Kerkerian-Le Goff, L., Vitek, J.L.: Uncovering the mechanism(s) of action of deep brain stimulation: activation, inhibition, or both. Clin Neurophysiol 115 (2004) 1239–1248
7. Obeso, J., Olanow, C., Rodriguez-Oroz, M., Krack, P., Kumar, R., Lang, A.: Deep-brain stimulation of the subthalamic nucleus or the pars interna of the globus pallidus in Parkinson's disease. N Engl J Med 345 (2001) 956–963
8. Butson, C., McIntyre, C.C.: Role of electrode design on the volume of tissue activated during deep brain stimulation. J Neural Eng 3 (2006) 1–8
9. Butson, C.R., McIntyre, C.C.: Tissue and electrode capacitance reduce neural activation volumes during deep brain stimulation. Clin Neurophysiol 116 (2005) 2490–2500
10. Butson, C.R., Maks, C.B., McIntyre, C.C.: Sources and effects of electrode impedance during deep brain stimulation. Clin Neurophysiol 117 (2006) 447–454
11. McIntyre, C.C., Mori, S., Sherman, D.L., Thakor, N.V., Vitek, J.L.: Electric field and stimulating influence generated by deep brain stimulation of the subthalamic nucleus. Clin Neurophysiol 115 (2004) 589–595
12. Tuch, D.S., Wedeen, V.J., Dale, A.M., George, J.S., Belliveau, J.W.: Conductivity tensor mapping of the human brain using diffusion tensor MRI. Proc Natl Acad Sci U S A 98 (2001) 11697–11701
13. Christensen, G.E., Joshi, S.C., Miller, M.I.: Volumetric transformation of brain anatomy. IEEE Trans Med Imaging 16 (1997) 864–877
14. Wakana, S., Jiang, H., Nagae-Poetscher, L.M., van Zijl, P.C., Mori, S.: Fiber tract-based atlas of human white matter anatomy. Radiology 230 (2004) 77–87
15. Haueisen, J., Tuch, D.S., Ramon, C., Schimpf, P.H., Wedeen, V.J., George, J.S., Belliveau, J.W.: The influence of brain tissue anisotropy on human EEG and MEG. Neuroimage 15 (2002) 159–166
16. Butson, C., Hall, J., Henderson, J., McIntyre, C.: Patient-specific models of deep brain stimulation: 3D visualization of anatomy, electrode and volume of activation as a function of the stimulation parameters. In: Society for Neuroscience. Volume 30:1011.11. (2004)
17. McIntyre, C.C., Richardson, A.G., Grill, W.M.: Modeling the excitability of mammalian nerve fibers: influence of afterpotentials on the recovery cycle. J Neurophysiol 87 (2002) 995–1006
18. BioPSE: Scientific Computing and Imaging Institute (SCI), http://software.sci.utah.edu/biopse.html, 2002.
19. Haines, D.E.: Neuroanatomy: An Atlas of Structures, Sections, and Systems. 5th edn. Lippincott Williams & Wilkins, Philadelphia (2000)
20. Butson, C., Maks, C., Cooper, S.E., Henderson, J.M., McIntyre, C.C.: Deep brain stimulation interactive visualization system. In: Society for Neuroscience. Volume 898.7., Washington, D.C. (2005)
CFD Analysis Incorporating the Influence of Wall Motion: Application to Intracranial Aneurysms

Laura Dempere-Marco1, Estanislao Oubel1, Marcelo Castro2, Christopher Putman3, Alejandro Frangi1, and Juan Cebral2

1 Department of Technology, Pompeu Fabra University, 08003 Barcelona, Spain
{laura.dempere, estanislao.oubel, alejandro.frangi}@upf.edu
2 School of Computational Sciences, George Mason University, 22030 Fairfax, USA
{mcastro3, jcebral}@gmu.edu
3 Interventional Neuroradiology, Inova Fairfax Hospital, 22042 Falls Church, USA
[email protected]
Abstract. Haemodynamics, and in particular wall shear stress, is thought to play a critical role in the progression and rupture of intracranial aneurysms. A novel method is presented that combines image-based wall motion estimation obtained through non-rigid registration with computational fluid dynamics (CFD) simulations in order to provide realistic intra-aneurysmal flow patterns and understand the effects of deforming walls on the haemodynamic patterns. In contrast to previous approaches, which assume rigid walls or ad hoc elastic parameters to perform the CFD simulations, wall compliance has been included in this study through the imposition of measured wall motions. This circumvents the difficulties in estimating personalized elasticity properties. Although variations in the aneurysmal haemodynamics were observed when incorporating the wall motion, the overall characteristics of the wall shear stress distribution do not seem to change considerably. Further experiments with more cases will be required to establish the clinical significance of the observed variations.
1 Introduction

Intracranial aneurysms are pathological dilatations of cerebral arteries, which tend to occur at or near arterial bifurcations, mostly in the circle of Willis. The optimal management of unruptured aneurysms is controversial, and current decision-making is mainly based on considering their size and location, as derived from the International Study of Unruptured Intracranial Aneurysms (ISUIA) [1]. However, it is thought that the interaction between haemodynamics and wall mechanics plays a critical role in the formation, growth and rupture of aneurysms. Although there is little doubt that arterial and aneurysmal walls do move under physiologic pulsatile flow conditions [2], there is no accurate information on the magnitude and other motion characteristics required for understanding the interaction between haemodynamics and wall biomechanics. Visualization of aneurysmal pulsation seems to have become possible with the advent of 4DCTA imaging techniques [3,4]. Confounding these observations, a number of imaging artifacts were present related to motion of bony
structures [3]. Since there are no reliable techniques for measuring flow patterns in vivo, various modeling approaches have been considered in the past [5,6]. Hitherto, most computational fluid dynamics (CFD) methods have assumed rigid walls due to a lack of information regarding both wall elasticity and thickness. Moreover, in order to perform simulations that account for the fluid-structure interaction, it is also necessary to prescribe the intra-arterial pressure waveform, which is not normally acquired during routine clinical exams. In this paper, wall motion is quantified by applying image registration techniques to dynamic X-ray images. To study the effects of wall compliance on the aneurysmal haemodynamics, the obtained wall motion is directly imposed on the 3D model derived from the medical images.
2 Wall Motion Estimation Through Image Registration

Three case studies have been considered in this paper. Each patient underwent conventional transfemoral catheterization and cerebral angiography using a Philips Integris Biplane angiography unit. As part of this examination, a rotational acquisition was performed using a six-second constant injection of contrast agent and a 180-degree rotation at 15 frames per second over 8 seconds. These images were transferred to the Philips Integris Workstation and reconstructed into 3D voxel data using the standard proprietary software, which was used for generating a 3D anatomical model. Biplanar dynamic angiography at 2 Hz was subsequently performed using a six-second contrast injection for a period of at least 6 seconds. In addition, an expert neuroradiologist (CP) measured the diameters D1 and D2 (maximum height and width, respectively) on these projection views. This information was then used to establish the pixel size and to quantify the wall motion. In order to estimate the wall motion, image registration, which establishes correspondences between points in two different images, was applied to the series of 2D images. To this end, a 2D version of the non-rigid registration algorithm proposed by Rueckert et al. [7] was applied. This method is based on free-form deformations, which are modeled through the 2D tensor product of 1D cubic B-splines. By moving a set of control points originally distributed on a regular lattice, a smooth and continuous transformation is obtained that is subsequently used for deforming one image into the other. The control points are moved in order to maximize the similarity between the two images. The similarity metric used in this study to characterize the alignment between the two images is the Normalized Mutual Information (NMI) [8]. For each series of sequential X-ray projection images, a set of landmarks was manually delineated in the first frame and subsequently propagated by using the transformations derived from the image registration procedure. The complete series was registered to the initial reference frame. Thus, wall motion was estimated from the propagated landmarks by calculating the distance between corresponding points. A distribution of displacement vectors was obtained for the complete set of landmarks. The wall motion was estimated under two assumptions: differential motion (i.e. different amplitude for different regions) and uniform motion (i.e. same amplitude for all regions). Thus, the statistical characterization of the displacement field was performed on the whole aneurysm and also on several different anatomical regions. In order to characterize the behavior of each region, both the median and the
90th percentile of the displacement vector modulus were calculated. By considering the 90th percentile, an upper bound on the wall motion amplitude in the simulations is obtained while excluding outliers. A condensed sketch of this estimation pipeline is given below.
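The following sketch outlines the registration and landmark propagation steps (a SimpleITK-based Python sketch; Mattes mutual information stands in for the NMI metric of [8], and the control point mesh size, file names and frame count are placeholders rather than values from the paper):

```python
import numpy as np
import SimpleITK as sitk

def register_to_reference(fixed, moving, mesh_size=(8, 8)):
    """2D free-form (cubic B-spline) registration of one angiographic
    frame to the reference frame."""
    tx = sitk.BSplineTransformInitializer(fixed, list(mesh_size))
    reg = sitk.ImageRegistrationMethod()
    reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=32)
    reg.SetOptimizerAsLBFGSB(gradientConvergenceTolerance=1e-5,
                             numberOfIterations=200)
    reg.SetInitialTransform(tx, inPlace=True)
    reg.SetInterpolator(sitk.sitkLinear)
    return reg.Execute(fixed, moving)

fixed = sitk.ReadImage("frame00.png", sitk.sitkFloat32)  # reference frame
landmarks = np.loadtxt("landmarks_frame00.txt")          # (N, 2), physical units
n_frames = 12                                            # placeholder count

for k in range(1, n_frames):
    moving = sitk.ReadImage("frame%02d.png" % k, sitk.sitkFloat32)
    tx = register_to_reference(fixed, moving)
    # Propagate the landmarks and measure the displacement moduli.
    moved = np.array([tx.TransformPoint(tuple(float(c) for c in p))
                      for p in landmarks])
    d = np.linalg.norm(moved - landmarks, axis=1)
    print(k, np.median(d), np.percentile(d, 90))  # per-frame statistics
```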
3 Haemodynamics Modeling Including Wall Compliance

In order to compute the intra-aneurysmal flow patterns, personalized models of blood vessels were constructed from 3D rotational angiography (3DRA) images. The segmentation procedure was based upon the use of deformable models, by first smoothing the image through a combination of blurring and sharpening operations, followed by a region growing segmentation and isosurface extraction. This surface was then used to initialize a deformable model under the action of internal smoothing forces and external forces from the gradients of the original unprocessed images [5]. The anatomical model was subsequently used as a support surface to generate a finite element grid with an advancing front method that first re-triangulates the surface and then marches into the domain generating tetrahedral elements [9]. The blood flow was mathematically modeled by the unsteady Navier-Stokes equations for an incompressible fluid:

∇ · v = 0,    ρ(∂v/∂t + v · ∇v) = −∇p + ∇ · τ    (1)
where ρ is the density, v is the velocity field, p is the pressure, and τ is the deviatoric stress tensor. The fluid was assumed Newtonian with a viscosity of μ = 4 cPoise and a density of ρ = 1.0 g/cm3. The blood flow boundary conditions were derived from PCMR images of the main branches of the circle of Willis obtained from a normal volunteer. Traction-free boundary conditions were imposed at the model outflows. At the vessel walls, no-slip boundary conditions were applied. Since the velocity of the wall was estimated through imaging techniques, wall compliance is implicitly included in the simulation process. As it was not possible to determine the shape of the distension waveform at low sampling rates, it was assumed, as a first-order approximation, that it followed the flow waveform. This waveform was scaled locally to achieve the measured displacement amplitude in each of the regions considered. The walls were assumed to move in the normal direction, and this motion was directly imposed on the 3D mesh derived from the volumetric medical images. The grid was updated at each time step by a non-linear smoothing of the wall velocity into the interior of the computational domain [10].
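The imposed wall motion can be summarized as in the sketch below: each wall node is displaced along its outward normal by the regionally measured amplitude, modulated by the assumed flow-shaped distension waveform (NumPy; the waveform shape shown is only a unit-peak placeholder for the scaled flow waveform, and the function signature is ours):

```python
import numpy as np

def wall_position(nodes, normals, amplitude, waveform, t):
    """Displace wall nodes along their unit normals.
    nodes: (N, 3) rest positions (mm); normals: (N, 3) outward unit normals;
    amplitude: (N,) per-node displacement amplitude (mm), e.g. the median or
    90th percentile value assigned to each node's region (differential motion)
    or a single value for all nodes (uniform motion)."""
    return nodes + (amplitude * waveform(t))[:, None] * normals

# First-order assumption from the text: the distension waveform follows the
# flow waveform; a smooth unit-peak placeholder shape is used here.
waveform = lambda t, T=1.0: 0.5 * (1.0 - np.cos(2.0 * np.pi * t / T))
```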
4 Experiments and Results

In Fig. 1(a), the results from the segmentation of the 3DRA medical images for the three cases considered in this study are displayed. To establish whether the detected motion could be discriminated from the intra-observer variability in delineating contours, a manual segmentation was performed by an expert observer in the first frame by selecting closely located points on the boundary of the aneurysm and subsequently fitting a spline to the series of points. By considering also the independently selected
landmarks for this same frame, the distributions of distances derived from: a) the original landmarks to the spline, and b) the propagated landmarks to the original landmarks, were compared. To this end, paired Student t-tests and Wilcoxon rank sum tests were performed. As established before, the landmarks were tracked through the complete series in order to quantify the wall motion. Fig. 1(b) illustrates the landmark propagation from which the wall deformation estimates follow. The amplitude of the motion is characterized in Fig. 1(c) by both the median and the 90th percentile. For patient #1, statistically significant differences were found between the two distributions for 8/11 frames (P_t-test < 0.04, P_Wilcoxon < 0.02). In all of these cases, the average distance between propagated and original landmarks was larger than that due to intra-observer variability. Thus, although small deformation fields were obtained for all the images within the temporal series (see Fig. 1(c)), these differ in a statistically meaningful way from the error made in the manual delineation of the contours. Pairwise one-way ANOVA analyses were also performed on the series defined by the median of the displacement of the landmarks for all the frames, and statistically meaningful differences (P < 0.004) were found between the distribution corresponding to the lobule and the rest of the distributions. For patient #2, the deformations recovered by the registration algorithm required subpixel accuracy to be detected. In fact, statistically significant differences were found between the two distributions for 3/6 frames (P_t-test < 0.02, P_Wilcoxon < 0.04); however, the intra-observer variability was found to be larger than the average wall motion detected. Thus, it was concluded that the deformation field was so small that it could not be reliably quantified given the available image resolution (i.e. a maximum mean wall deformation of 0.07 mm in the temporal series). For patient #3, the deformation fields obtained yielded statistically significant differences for 6/9 frames (P_t-test < 0.04, P_Wilcoxon < 0.03), with the average wall deformation larger than the error due to intra-observer variation. The amplitude of the maximum displacement observed was 0.25 mm, i.e. about 3% of the aneurysm diameter, with no significant differences between regions. A total of five simulations were carried out for patient #1, using different vessel wall velocity conditions: 1) maximum displacement (i.e. 90th percentile), differential motion; 2) maximum displacement, uniform motion; 3) median displacement, differential motion; 4) median displacement, uniform motion; and 5) rigid walls. For patients #2 and #3, a compliant and a rigid simulation were carried out. Since for patient #2 it was not possible to detect a significant wall deformation, a uniform wall motion with amplitude just below the resolution of the technique was used for the compliant model. For patient #3, only the maximum uniform wall motion was imposed in the elastic model, to assess the maximum effect that wall motion would have. In order to compare the results obtained with the different vessel wall movements, the following characterization of the wall shear stress (WSS) was used: the necks of the aneurysms were manually selected on the anatomical models, and a region of approximately 1 cm of the parent vessel proximal to the neck of the aneurysm (denoted the proximal parent vessel) was identified (see Fig. 2).
At each time instant, the average WSS magnitude was computed in the proximal parent vessel region and used to normalize the instantaneous WSS in the aneurysm and to identify regions of elevated WSS (NWSS>1). Table 1 shows the area of these regions, the percent of the total aneurysm shear force
Fig. 1. (a) Segmented models corresponding to the 3DRA medical images considered in this study (from left to right: patient #1, patient #2, and patient #3); (b) illustration of the propagation of the landmarks between different frames for [left] patient #1, frame #1, [middle] a zoom of the region containing the lobule, and [right] patient #1, frame #2; and (c) estimation of wall deformation discriminated from intra-observer variability: [top] median of the modulus of the displacement vectors, and [bottom] 90th percentile of the modulus of the displacement vectors for patient #1. The sign of the wall deformation indicates whether dilation (+) or contraction (-) is obtained, and a null deformation has been imposed on those frames for which wall deformation cannot be discriminated from intra-observer variability.
applied over these regions, and a “shear force concentration factor” (percentage of the shear force applied in these regions divided by the percent area of the regions), for each of the wall motion regimes considered and for each of the three patients. Visualizations of the instantaneous WSS distribution at five selected instants during the cardiac cycle and for all the wall motion conditions of patient #1 are presented in Fig. 3. These visualizations and the data presented in Table 1 reveal a region of elevated WSS in the dome of the aneurysm. Although this region covers a small area of the aneurysm (~1.4%), it contributes to a significant fraction of the total shear force (~8%), i.e. it is subject to a “concentrated shear force”. A graph of the WSS obtained with the rigid wall model and the compliant model with maximum differential wall motion is shown in Fig. 4. This figure shows that rigid models tend to overestimate the WSS compared to the compliant models. However, the overall WSS distributions obtained with the different wall models have similar appearances and characteristics in spite of small local deviations. Visualizations of the instantaneous WSS distributions at the five selected instants during the cardiac cycle for patients #2 and #3 are presented in Fig. 5. The results obtained with both compliant (maximum uniform wall deformation) and rigid models are shown. The visualizations reveal a stable region of elevated WSS in the dome of the aneurysm for patient #2. For patient #3, the region of elevated WSS expands from the neck to the dome of the aneurysm.
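As a rough illustration of this characterization, the sketch below computes the normalized WSS, the percent area of the elevated-WSS region, and the shear force concentration factor from per-element WSS magnitudes on a surface mesh. The array names and the mesh representation are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def shear_concentration(wss, area, sac_mask, parent_mask):
    """Percent elevated-WSS area, percent shear force and concentration factor."""
    # Average WSS over the proximal parent vessel, used for normalization
    wss_parent = np.average(wss[parent_mask], weights=area[parent_mask])
    nwss = wss[sac_mask] / wss_parent          # normalized WSS on the sac
    a = area[sac_mask]
    elevated = nwss > 1.0                      # region of elevated WSS
    pct_area = 100.0 * a[elevated].sum() / a.sum()
    force = nwss * a                           # shear force per element (up to a constant)
    pct_force = 100.0 * force[elevated].sum() / force.sum()
    return pct_area, pct_force, pct_force / pct_area
```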
Table 1. Percent of the area of the aneurysm under elevated wall shear stress, percent of the shear force applied in this region, and concentration factor (CF) for each patient under different wall motion conditions. The values shown are time averages over the cardiac cycle.

#   case        %area   %force    CF
1   max diff     1.31    7.46     5.8
1   max unif     1.40    8.06     6.0
1   med diff     1.44    8.34     6.0
1   med unif     1.41    8.11     5.9
1   rigid        1.12    6.09     5.7
2   elastic      2.04    6.37     3.3
2   rigid        1.37    4.71     3.5
3   elastic      2.16   19.48     9.4
3   rigid        1.82   18.86    10.4

Fig. 2. (a) Aneurysm subdivision for differential wall motion estimation (vessel, sac and lobulation), and (b) flow waveform (flow rate in ml/s against time t in s) used to prescribe pulsatile velocity conditions at the inlet boundary of the models. The dots indicate the instants during the cardiac cycle at which flow visualizations are presented.

Fig. 4. WSS obtained with different wall motions (rigid vs. compliant walls) when 90th percentile uniform motion is prescribed for patient #1
[Color scale common to Fig. 3 and Fig. 5: WSS (dyne·cm^-2), ranging from 0.0 to 150.0]
Fig. 3. Instantaneous WSS distributions obtained with different wall motions: (Rows from top to bottom) 90th percentile differential, 90th percentile uniform, median differential, median uniform, and rigid walls. Each column corresponds to a time instant highlighted in Fig. 2(b).
Fig. 5. Instantaneous WSS distributions obtained for patient #2 (top panel) and #3 (bottom panel), using compliant walls (top row) and rigid walls (bottom row). Columns correspond to the time instants indicated in Fig. 2.
The regions of elevated WSS (i.e. WSS larger than the average WSS over the proximal parent vessel) cover approximately 2% of the area of the aneurysms for both patients #2 and #3. It is interesting to note that compliant models tend to yield slightly larger areas of elevated WSS. This may be due to lower WSS values in the proximal parent artery for distending vessels.
5 Discussion and Conclusions

The purpose of this study was to introduce a novel way of combining image-based motion estimation with CFD analysis to better understand the effects of aneurysm wall compliance on intra-aneurysmal blood flow patterns. It has been shown that it is possible to determine wall motion from dynamic X-ray imaging through image registration. Furthermore, the reported global and regional estimates provide a basis for potentially more accurate compliant-wall CFD simulations. This study adds further evidence to the results reported in [3,4], where wall motion was observed and found to be different in the bleb.
The methodology for wall motion estimation can be improved in a number of ways, such as image acquisition at higher frame rates and the use of larger catheters and higher contrast injection rates. Ideally, the images could also be gated to the heart rate or temporally registered with arterial pressure waveforms. In this study, only low frame rate acquisitions were available. It is expected that with higher frame rates the complete distension waveform will be reproduced, thus avoiding the need to assume any particular temporal function. Moreover, the structural differences between frames due to the passage of the bolus will be minimized. In fact, a potential limitation of this technique is that homogeneous mixing of the contrast agent cannot be assured at the current injection rates and catheter sizes. It is also expected that further technological advances will improve the spatial resolution, so that more subtle variations can be detected.
When assessing the influence of wall motion on the haemodynamics, relatively small changes in the distribution of WSS were observed when imposing different wall motions. Point-wise comparisons of the WSS magnitudes obtained with the rigid and compliant models showed that rigid models tended to overestimate the WSS magnitude; this effect was less pronounced when the wall motion was small. This result is in agreement with those reported by Shojima et al. [11]. In contrast, a number of haemodynamic characteristics did not exhibit substantial variations. For all patients, the compliant and rigid models provided consistent estimates of the location and size of the flow impaction zone. In particular, the area of the aneurysm under elevated wall shear stress with respect to the average WSS in the proximal parent vessel, the contribution of this region to the total shear force, and the shear force concentration factor were relatively unaffected by the wall motion. In addition, changing the amplitude of the wall motion, or imposing differential rather than uniform deformations, did not have a considerable effect on these variables either. Although WSS is thought to play an important role in the mechanobiology of the arterial wall, further investigation is required to determine which haemodynamic variables are clinically relevant and how wall motion affects them. Finally, it is widely believed that the aneurysmal wall is always compliant to some extent; however,
the range of variability of wall motion among patients is still unknown. Studies with larger numbers of aneurysms are required to determine how common measurable wall motions are in the population at large and if differences in compliance can be related to clinical outcomes such as aneurysmal growth and rupture.
Acknowledgments This research was partially supported by the grant @neurIST (IST-2004-027703). The authors LDM, EO and AF are supported by the Spanish Ministry of Science and Technology through a Juan de la Cierva Research Fellowship (LDM), FPU grant (EO), and a Ramon y Cajal Research Fellowship (AF).
References
1. D.O. Wiebers, J.P. Whisnant, J. Huston 3rd, I. Meissner, R.D. Brown Jr, D.G. Piepgras, G.S. Forbes, K. Thielen, D. Nichols, W.M. O'Fallon, J. Peacock, L. Jaeger, N.F. Kassell, G.L. Kongable-Beckman, J.C. Torner, "International Study of Unruptured Intracranial Aneurysms Investigators. Unruptured Intracranial Aneurysms: Natural History, Clinical Outcome, and Risks of Surgical and Endovascular Treatment," Lancet, 362(9378): 103-110, 2003.
2. F.B. Meyer, J. Huston 3rd, S.S. Riederer, "Pulsatile increases in aneurysm size determined by cine phase-contrast MR angiography," J. Neurosurg., 78(6): 879-883, 1993.
3. F. Ishida, H. Ogawa, T. Simizu, T. Kojima, and W. Taki, "Visualizing the Dynamics of Cerebral Aneurysms with Four-Dimensional Computed Tomographic Angiography," Neurosurgery, 57: 460-471, 2005.
4. M. Hayakawa, K. Katada, H. Anno, S. Imizu, J. Hayashi, K. Irie, M. Negoro, Y. Kato, T. Kanno, and H. Sano, "CT Angiography with Electrocardiographically Gated Reconstruction for Visualizing Pulsation of Intracranial Aneurysms: Identification of Aneurysmal Protuberance Presumably Associated with Wall Thinning," Am. J. Neuroradiol., 26: 1366-1369, 2005.
5. J.R. Cebral, M.A. Castro, S. Appanaboyina, C. Putman, D. Millan, A.F. Frangi, "Efficient Pipeline for Image-Based Patient-Specific Analysis of Cerebral Aneurysm Haemodynamics: Technique and Sensitivity," IEEE T. Med. Imaging, 24(4): 457-467, 2005.
6. T.M. Liou and S.N. Liou, "A review of in vitro studies of hemodynamic characteristics in terminal and lateral aneurysm models," Proc. Nat. Sci. Council (B), 23(4): 133-148, 1999.
7. D. Rueckert, L.I. Sonoda, C. Hayes, D.L.G. Hill, M.O. Leach, and D.J. Hawkes, "Nonrigid registration using free-form deformations: Application to breast MR images," IEEE T. Med. Imaging, 18(8): 712-721, 1999.
8. C. Studholme, D.L.G. Hill, and D.J. Hawkes, "An overlap invariant entropy measure of 3D medical image alignment," Pattern Recogn., 32(1): 71-86, 1999.
9. R. Löhner, "Automatic unstructured grid generators," Finite Elem. Anal. Des., 25: 111-134, 1997.
10. R. Löhner, C. Yang, "Improved ALE Mesh Velocities for Moving Bodies," Comm. Numer. Meth. En., 12: 599-608, 1996.
11. M. Shojima, M. Oshima, K. Takagi, R. Torii, M. Hayakawa, K. Katada, A. Morita and T. Kirino, "Magnitude and role of wall shear stress on cerebral aneurysm: Computational fluid dynamic study of 20 middle cerebral artery aneurysms," Stroke, 35: 2500-2505, 2004.
A New CAD System for the Evaluation of Kidney Diseases Using DCE-MRI
Ayman El-Baz1, Rachid Fahmi1, Seniha Yuksel1, Aly A. Farag1, William Miller1, Mohamed A. El-Ghar2, and Tarek Eldiasty2
1 Computer Vision and Image Processing Laboratory, University of Louisville, Louisville, KY 40292, {elbaz, rachidf, esen, farag}@cvip.uofl.edu, http://www.cvip.louisville.edu
2 Urology and Nephrology Department, University of Mansoura, Mansoura, Egypt
Abstract. Acute rejection is the most common cause of graft failure after kidney transplantation, and its early detection is crucial for the survival of the transplanted kidney function. In this paper, we introduce a new approach for the automatic classification of normal and acute rejection transplants from Dynamic Contrast Enhanced Magnetic Resonance Imaging (DCE-MRI). The proposed algorithm consists of three main steps. The first step isolates the kidney from the surrounding anatomical structures by evolving a deformable model based on two density functions: the first describes the distribution of gray levels inside and outside the kidney region, and the second describes the prior shape of the kidney. In the second step, a new nonrigid registration approach is employed to account for the motion of the kidney due to patient breathing. To validate our registration approach, we use a simulation of deformations based on biomechanical modelling of the kidney tissue using the finite element method (F.E.M.). Finally, the perfusion curves that show the transport of the contrast agent into the tissue are obtained from the cortex and used to classify normal and acute rejection transplants. Applications of the proposed approach yield promising results that could, in the near future, replace the use of current technologies such as nuclear imaging and ultrasonography, which are not specific enough to determine the type of kidney dysfunction.
1 Introduction
In the United States, approximately 12,000 renal transplants are performed annually [1], and considering the limited supply of donor organs, every effort is made to salvage the transplanted kidney [2]. Currently, the diagnosis of rejection is done via biopsy, which has the downside of subjecting patients to risks such as bleeding and infections. Moreover, the relatively small needle biopsies may lead to over- or underestimation of the extent of inflammation in the entire graft [3]. Therefore, a noninvasive and repeatable technique is not only helpful but also needed in the diagnosis of acute renal rejection. In DCE-MRI, a
contrast agent called Gd-DTPA is injected into the bloodstream, and as it perfuses into the organ, the kidneys are imaged rapidly and repeatedly. During the perfusion, Gd-DTPA causes a change in the relaxation times of the tissue and creates a contrast change in the images. As a result, the pattern of the contrast change gives functional information, while MRI provides good anatomical information, which helps in distinguishing diseases that affect different parts of the kidneys. However, even with an imaging technique like DCE-MRI, several problems remain: (i) the spatial resolution of the dynamic MR images is low due to fast scanning; (ii) the images suffer from motion induced by patient breathing, which necessitates advanced registration techniques; (iii) the intensity of the kidney changes non-uniformly as the contrast agent perfuses into the cortex, which complicates the segmentation procedures. To the best of our knowledge, there has been limited work on dynamic MRI to overcome the problems of registration and segmentation. For the registration problem, Gerig et al. [4] proposed using the Hough transform to register the edges in an image to the edges of a mask, and Giele et al. [5] introduced a phase-difference movement detection method to correct for kidney displacements. Both of these studies required building a mask manually by drawing the kidney contour on a 2D DCE-MRI image, followed by the registration of the time frames to this mask. Most of these efforts used healthy transplants in the image analysis, for which edge detection algorithms were sufficient. However, in the case of acute rejection patients, the uptake of the contrast agent is decreased, so edge detection fails to give connected contours. For this reason, we consider the combined use of gray level and prior shape information to obtain better results.
2 Methods
In this paper we introduce a novel and automated technique (i) to segment the kidney and (ii) to correct for the motion artifacts caused by breathing and patient motion. The details of these techniques are given below.
2.1 Segmentation
The first step of the proposed approach is to extract the kidney tissues from the DCE-MRI, as shown in Fig. 1. The segmentation algorithm is based on a deformable model guided by a stochastic force which represents the intensity and shape priors of the kidney. Details of the algorithm are presented in [6].
2.2 Model for the Local Deformation
In DCE-MRI sequences, the registration problem arises because of patient and breathing movements. To solve this problem, we propose a new approach to handle the kidney motion. The proposed approach is based on deforming the segmented kidney over evolving closed equispaced contours (i.e. iso-contours) to closely match the prototype. The evolution of the iso-contours is guided by
Fig. 1. Segmentation results using the approach proposed in [6], with errors of 0.3820% and 0.3997% w.r.t. the radiologist's segmentation
an exponential speed function in the directions minimizing the distances between corresponding pixel pairs on the iso-contours of the objects to be registered. The normalized cross-correlation is used as an image similarity measure, which is insensitive to intensity changes (e.g. due to tissue motion in medical imagery and to the contrast agent). Unlike free-form deformation approaches based on B-splines, our technique is computationally less expensive. The first step of the proposed registration approach is to use the fast marching level set method [7] to generate the distance map inside the kidney regions, as shown in Fig. 2(a)-(b). The second step is to use this distance map to generate equally spaced contours (iso-contours), as shown in Fig. 2(c)-(d). Note that the number of iso-contours depends on the accuracy and the speed required by the user. The third step of the proposed approach is to use normalized cross-correlation to find the correspondences between the iso-contours. Since we start with aligned images, we limit the search space to a small window (e.g. 10×10) to improve the speed of the proposed approach. The final step is the evolution of the iso-contours; here, our goal is to deform the iso-contours in the first image (target image) to match the iso-contours in the second image (source image). Before discussing the details of the evolution algorithm, let us define the following:
– φ^A_{n_iso}(·, ν) are the iso-contours in the target image A, where n_iso = 1, ..., N_iso is the index of the iso-contour and ν the iteration step;
– φ^B_{m_iso}(·) are the iso-contours in the source image B, where m_iso = 1, ..., M_iso is the index of the iso-contour;
– S(h, γ_h) denotes the Euclidean distance between an iso-contour point h in image A and its corresponding iso-contour point γ_h in image B. Note that γ_h is searched for within a local window centered at h's position in image B, and that γ_h may be the same for different h's;
– S^A_{n_iso, n_iso−1}(h) is the Euclidean distance between φ^A_{n_iso}(h, ν) and φ^A_{n_iso−1}(h, ν) at each iteration ν;
– V(·) is the propagation speed function.
The most important step in the model propagation is the selection of the propagation speed function V(·). This selection must satisfy the following conditions:
1. V(h) = 0 if S(h, γ_h) = 0;
2. V(h) ≤ min(S(h, γ_h), S^A_{n_iso, n_iso−1}(h), S^A_{n_iso, n_iso+1}(h)) if S(h, γ_h) > 0; this is the smoothness constraint, which prevents the current point from crossing the closest neighbor contour, as shown in Fig. 2(e).
The following speed function satisfies the above conditions:

    V(h) = e^{β(h) S(h, γ_h)} − 1,    (1)

where β(h) is the propagation constant, with the upper bound

    β(h) ≤ ln( min(S(h, γ_h), S^A_{n_iso, n_iso−1}(h), S^A_{n_iso, n_iso+1}(h)) + 1 ) / S(h, γ_h).

Based on this speed function, we deform the iso-contours using the following equation, as shown in Fig. 2(f):

    φ^A(h, ν + 1) = [V(h) / S(h, γ_h)] φ^B_{m_iso}(γ_h) + [(S(h, γ_h) − V(h)) / S(h, γ_h)] φ^A_{n_iso}(h, ν),    (2)
where h = 1, ..., H denotes a point on the iso-contours of image A, and γ_h its corresponding point in image B. To show the quality of the proposed approach, we fused the two kidney images in a checkerboard visualization in Fig. 3. It is clear from Fig. 3(b) that the connectivity between the two images, both at the edges and inside the kidney region, is improved after applying the proposed deformation model.
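The update of Eq. (2) is straightforward to implement once the correspondences are known. The sketch below assumes β(h) is chosen at its upper bound, in which case V(h) reduces to the smoothness bound itself; the array names are hypothetical.

```python
import numpy as np

def evolve_isocontour(pts_a, corr_b, d_prev, d_next):
    """One evolution step of Eq. (2) for a single iso-contour.

    pts_a : (H, 2) points on an iso-contour of the target image A
    corr_b: (H, 2) corresponding points in the source image B, found by
            normalized cross-correlation within a local search window
    d_prev, d_next: (H,) distances to the neighboring iso-contours of A
    """
    s = np.linalg.norm(corr_b - pts_a, axis=1)           # S(h, gamma_h)
    # With beta(h) at its upper bound, V(h) equals the smoothness bound
    # min(S, S_prev, S_next), satisfying both conditions above.
    v = np.where(s > 0, np.minimum(s, np.minimum(d_prev, d_next)), 0.0)
    w = np.divide(v, s, out=np.zeros_like(s), where=s > 0)
    return w[:, None] * corr_b + (1.0 - w)[:, None] * pts_a
```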
Fig. 2. The distance maps of two kidneys (a, b), the iso-contours (c, d), the model constraints (e), and the model evolution (f)
Fig. 3. Checkerboard images showing the quality of the approach, before non-rigid registration (a) and after non-rigid registration (b)
2.3 Validation of Our Registration Approach Using the F.E.M.
In this section we validate our deformable registration approach using the finite element (F.E.) method. Given a 2D image of the kidney, we simulate a deformation using a biomechanical model of the kidney tissue. The pair of images (the deformed and the non-deformed one) is then used to test our algorithm. The Abaqus/CAE (Ver. 6.5, www.abaqus.com) environment was used to generate a cubic spline fit
Fig. 4. Validation of the non-rigid registration. F.E. mesh (a) before and (b) after deformation. (c) Meshes (a) and (b) overlaid. (d), (e) Samples of the selected correspondence pairs determined from the F.E. meshes before and after deformation. (f) Results of the proposed non-rigid registration approach. (g) The displacement field.
to the points representing the outer contour of the kidney object, from which a 2D F.E. model was then built. Figure 4(a), (b) and (c) show the 2D mesh before deformation, the mesh after deformation, and the overlay of the two meshes, respectively. For the sake of generating a deformed shape only, we assumed the kidney tissue to be an isotropic and homogeneous elastic material with a Young modulus E = 2500 Pa and a Poisson ratio ν = 0.4. Note that this model does not reflect the results of any rheological experiments conducted on kidney tissue. A uniformly distributed pressure P = 100 N was applied normal to the boundary of the kidney. The points on this boundary are allowed to move freely in the x and y directions, but are constrained to rotate around the z-axis. The mesh consists of 1253 3-node linear plane stress triangular elements. The average displacement of the induced deformation is 4.75 mm, the minimum is 1.19 mm, and the maximum is 6.9 mm. The accuracy of the registration method is assessed by registering the simulated deformed image to the original one and comparing the recovered point displacements with the biomechanically simulated ones. The average registration error is about 1.54 mm, with a maximum of 2.7458 mm, a minimum of 0.0221 mm, and a standard deviation of 0.5517 mm. This demonstrates the accuracy of the non-rigid registration technique used in this work.
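A minimal sketch of the error computation behind these statistics, assuming the simulated and recovered displacement fields are given as per-node 2D vectors:

```python
import numpy as np

def registration_error_stats(u_simulated, u_recovered):
    """Mean/min/max/std of per-node displacement errors (in mm)."""
    err = np.linalg.norm(u_recovered - u_simulated, axis=1)
    return err.mean(), err.min(), err.max(), err.std()
```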
2.4 Cortex Segmentation
Strake et al. [8] showed that, for the acute rejection problem, the most important signal characteristics come from the cortex of the kidney. Therefore, the final step of our approach is to segment the cortex from the segmented kidney. To achieve this task, we use the same segmentation approach as in [6], but based only on intensity. At this step, since all the kidneys are aligned together, we select seed points in the medulla regions and evolve the deformable model based only on intensity. After we extract the medullary regions, the remainder is cortex, which is used as a mask and propagated over the whole sequence to plot the average cortex intensity. Fig. 5 shows the cortex segmentation results. In Fig. 5(a), we manually initialize several deformable models inside the medulla regions, and we let the deformable models evolve in these regions with gray
Fig. 5. The segmentation of the cortex from the kidney images. Several medullary seeds are initialized (a), and the deformable models grow from these seed points (b). After the medulla is extracted from the kidney, the cortex mask is propagated over the whole sequence of images, as shown in (c)-(e).
Fig. 6. Normalized cortex signals from 4 subjects w.r.t. scan number. Subjects 1 and 2 are acute rejection cases, subject 3 is normal, and subject 4 has chronic glomerulopathy, proven by biopsy.
level information, as shown in Fig. 5(b). The cortex mask is then applied to the rest of the sequence, as shown in Fig. 5(c)-(e).
3 Results and Conclusion
The ultimate goal of the proposed algorithms is to construct a renogram (mean intensity signal curves) from the DCE-MRI sequences, showing the behavior of the kidney as the contrast agent perfuses into the transplant. In acute rejection patients, the DCE-MRI images show a delayed perfusion pattern and a reduced cortical enhancement. We tested the above algorithms on thirty patients, four of whom are shown in Fig. 6. The normal patient shows the expected abrupt increase to the higher signal intensities and a valley with a small slope. The acute rejection patients show a delay in reaching their peak signal intensities. From these observations, we conclude that the relative peak signal intensity, the time to peak signal intensity, the slope between the peak and the first minimum, and the slope between the peak and the signal measured from the last image in the sequence are the four major features in the renograms of the segmented kidney for classification. To distinguish between normal and acute rejection, we use a Bayesian supervised classifier that learns statistical characteristics from a training set of normal and acute rejection cases. The density estimation required by the Bayes classifier is performed for each feature using a linear combination of Gaussians (LCG) with positive and negative components. The parameters of the LCG components are estimated using a modified EM algorithm [9]. In our approach, we used 50% of the data for training and the other 50% for testing. On the testing data, the Bayes classifier correctly classifies 13 out of 15 cases (86.67%). On the training data, the Bayes classifier classifies all cases correctly, so the overall accuracy of the proposed approach is 93.3%.
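A sketch of how these four features might be extracted from a normalized cortex signal is shown below; approximating the "first minimum" by the global minimum after the peak is an assumption made here for simplicity.

```python
import numpy as np

def renogram_features(signal):
    """Four renogram features from a normalized cortex signal (one value per scan)."""
    peak_idx = int(np.argmax(signal))
    peak = float(signal[peak_idx])                  # relative peak signal intensity
    time_to_peak = peak_idx                         # in scan numbers
    # first minimum after the peak, approximated by the global minimum
    min_idx = peak_idx + int(np.argmin(signal[peak_idx:]))
    slope_peak_min = (signal[min_idx] - peak) / max(min_idx - peak_idx, 1)
    slope_peak_end = (signal[-1] - peak) / max(len(signal) - 1 - peak_idx, 1)
    return peak, time_to_peak, slope_peak_min, slope_peak_end
```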
In this paper, we presented a framework for the detection of acute renal rejection from DCE-MRI which includes segmentation of the kidneys from the abdomen images, non-rigid registration and Bayes classification. Our future work will include testing on more patients; the results of the proposed framework are promising and might replace the current nuclear imaging tests or the invasive biopsy techniques.
References
1. U.S. Department of Health and Human Services, Annual report of the U.S. scientific registry of transplant recipients and the organ procurement and transplantation network: transplant data: 1990-1999. Bureau of Health Resources Department, Richmond, VA, 2000.
2. M. Neimatallah, Q. Dong, S. Schoenberg, K. Cho, and M. Prince, "Magnetic resonance imaging in renal transplantation," Journal of Magnetic Resonance Imaging, vol. 10(3), pp. 357-368, Sep 1999.
3. D. Yang, Q. Ye, M. Williams, Y. Sun, T.C.C. Hu, D.S. Williams, J.M.F. Moura, and C. Ho, "USPIO-Enhanced Dynamic MRI: Evaluation of Normal and Transplanted Rat Kidneys," Magnetic Resonance in Medicine, vol. 46, pp. 1152-1163, 2001.
4. G. Gerig, R. Kikinis, W. Kuoni, G.K. van Schulthess, and O. Kubler, "Semiautomated ROI analysis in dynamic MRI studies: Part I: image analysis tools for automatic correction of organ displacements," IEEE Transactions on Image Processing, vol. 11(2), pp. 221-232, 1992.
5. E. Giele, "Computer methods for semi-automatic MR renogram determination," Ph.D. dissertation, Department of Electrical Engineering, Eindhoven University of Technology, Eindhoven, 2002.
6. A. El-Baz, S.E. Yuksel, A.A. Farag, H. Shi, T. El-Diasty, and M. Ghoneim, "2D and 3D Shape Based Segmentation Using Deformable Model," Proc. of International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2005), Palm Springs, California, USA, October 26-29, 2005.
7. J.A. Sethian, "Fast marching level set method for monotonically advancing fronts," Proceedings of the National Academy of Sciences, USA, vol. 93, pp. 1591-1595, Feb. 1996.
8. L. Strake, L. Kool, L. Paul, A. Tegzess, J. Weening, J. Hermans, J. Doornbos, R. Bluemm, and J. Bloem, "Magnetic resonance imaging of renal transplants: its value in the differentiation of acute rejection and cyclosporin A nephrotoxicity," Journal of Clinical Radiology, vol. 39(3), pp. 220-228, May 1988.
9. A.A. Farag, A. El-Baz, and G. Gimel'farb, "Precise Segmentation of Multi-modal Images," IEEE Transactions on Image Processing, vol. 15, no. 4, pp. 952-968, April 2006.
Generation and Application of a Probabilistic Breast Cancer Atlas Daniel B. Russakoff and Akira Hasegawa Fujifilm Software (California), Inc., San Jose, California
[email protected]
Abstract. Computer-aided detection (CAD) has become increasingly common in recent years as a tool for catching breast cancer in its early, more treatable stages. More and more breast centers are using CAD as studies continue to demonstrate its effectiveness. As the technology behind CAD improves, so do its results and its impact on society. In trying to improve the sensitivity and specificity of CAD algorithms, a good deal of work has been done on feature extraction, the generation of mathematical representations of mammographic features which can help distinguish true cancerous lesions from false ones. One feature that physicians rely on in making their decisions, but that is not currently seen in the literature, is location within the breast. This is a difficult feature to calculate, as it requires a good deal of prior knowledge as well as some way of accounting for the tremendous variability present in breast shapes. In this paper, we present a method for the generation and implementation of a probabilistic breast cancer atlas. We then validate this method on data from the Digital Database for Screening Mammography (DDSM).
1 Introduction
Computer-aided detection (CAD) of breast cancer in mammography has been shown to be a very effective tool in the fight against breast cancer. One recent study of a regional mammography screening program with and without CAD showed an increase in the cancer detection rate, a decrease in the mean age at screening detection, and a much earlier detection of invasive cancers using CAD [1]. These results corroborate the findings of a number of other, earlier studies that indicate the usefulness of CAD [2,3]. The basic algorithms for CAD systems in mammography are well documented [4] and well understood. The pipeline consists of three steps:
1. Detection of potential lesion candidates.
2. Feature extraction from this pool of candidates.
3. Classification of these candidates based on the extracted features.
While steps 1 and 3 are certainly important, step 2 is the most vital, as it is where, more than in the others, the algorithm mimics the experience of the radiologist. In general, the features extracted in step 2 are all characteristics of cancerous lesions that radiologists themselves have described as being useful for detection
of cancer [5]. Among the features typically calculated in feature extraction are those local to a neighborhood around the candidate lesion, including degree of spiculation, shape of the border, strength of the margin, pixel density, contrast with the background, and texture [4]. Also calculated are global features such as left-right symmetry, temporal stability, and multiple-view correlation [4]. None of these features, however, takes into account one of the most important cues a radiologist uses, namely location information and its influence on cancer probabilities. There is a great deal of well-documented data indicating that certain areas of the breast are much more likely to harbor cancer than others [6,7]. One oft-cited study, due to Hanley [8], suggests that over 50% of cancers can be found in the upper half of the breast. CAD algorithms hoping to mimic experienced radiologists must take this information into account. To do so, we propose to generate a set of probabilistic breast cancer atlases. There exist, however, many difficulties that must be overcome before we can effectively generate this information and add it into an algorithm. The rest of this paper details a method for handling these issues on the way towards a functioning probabilistic atlas. Section 2 details our use of shape modelling to generate a common reference space in which to create our atlases. Section 3 further describes our shape parameterization together with our atlas generation procedure. Finally, Section 4 details how we use the atlas in a CAD algorithm and presents our results on the Digital Database for Screening Mammography (DDSM) [9].
2 Shape Analysis
One of the biggest challenges in defining a breast cancer atlas is the large amount of variability associated with the shapes and sizes of breasts. To overcome this, we must find a way to equate, normalize, and account for this variation. Here we make our first simplifying assumption, namely that we are only interested in accounting for the variations in the silhouettes of the breasts as opposed to their appearances. We also assume that we have a reliable algorithm to segment the breast area from the background in a standard mammogram; though a nontrivial problem in and of itself, there are a number of effective algorithms in the literature that can be used for this purpose (e.g. [10]). Since a breast silhouette is a closed contour, we can parameterize its shape as a chain code together with a center of mass, as shown in Figure 3b. Now, in order to generate an atlas in some common reference frame, we need a method that can quickly bring two disparate breast shapes into alignment. Here, we take advantage of the popular shape modelling framework of Cootes et al. [11] to characterize the space of all breast silhouettes. In the typical shape modelling approach, corresponding landmarks are selected from a large set of training images. Unfortunately, in our case, there are no reliable landmarks on a breast silhouette. Recent techniques in shape modelling, however, allow us to proceed despite this setback. Several groups have begun using non-rigid registration to generate shape models using a pre-labelled
Fig. 1. Basic flow of the use of non-rigid registration for automatic generation of landmark points for shape modelling
Fig. 2. Mean breast silhouette shape for the MLO view plus the first three principal modes of variation
template [12,13]. The basic idea is depicted in Figure 1, which shows how one canonical breast silhouette of the mediolateral oblique (MLO) view can be seeded with pseudo-landmarks which can then be transferred to any other training image via non-rigid registration. The process starts with a typical breast shape upon which some landmarks are placed (column 3). A training set of breast silhouettes (column 1) is then aligned with the canonical shape using non-rigid registration, yielding a series of non-rigid transformations (column 2). We can then project the landmarks from the canonical shape back onto the original shape by reversing the corresponding non-rigid transformation (column 4). Given these pseudo-landmark correspondences, we can use the standard principal component analysis paradigm to generate the mean shape and the principal modes of variation, as shown in Figure 2. This figure was generated using a shape parameterization S = {(s_11, s_12), (s_21, s_22), ..., (s_n1, s_n2)} consisting of n = 50
2D points around the breast contour. A training set of 4971 breast silhouettes was used to automatically generate the shape model. Here we see that the first mode corresponds roughly to changes in breast size, the second to length, and the third to general shape. Together, these three modes account for almost 80% of the variance in the training set; to account for 99% of the variance, we need only 17 modes in total. We also generated a similar model of the cranio-caudal (CC) view of the breast, trained with 5592 images. Owing to the smaller range of variability among CC views, we need only 6 modes to capture 99.9% of the training set's variance for the CC view.
3 Atlas Generation
In order to generate a consistent atlas, we must start with a database of mammograms in which instances of cancer have been previously marked. For every mammogram with known cancer, we need to be able to quickly warp it into the atlas shape and record the cancer location in atlas space. To do this, we need to define a transformation function T that can bring different breast shapes into correspondence with the atlas shape. Here, we must not only align the breast silhouettes, we must also be able to match any pixel within the new breast silhouette with a reasonable corresponding pixel in the atlas. To do this we use a two-step approach:
1. We use our shape model to bring the new silhouette into alignment with the atlas.
2. We use a very simple triangular mesh together with bilinear interpolation to perform the pixel correspondence.
We will now discuss these two steps in more detail. For simplicity, for a given view (say, MLO), we choose the atlas shape to be μ_MLO, our mean breast shape. Any new breast silhouette can be matched to the mean shape using our shape model, which restricts the search space of possible shapes to only those that are probable. Put another way, given k modes of variation Φ = (φ_1, φ_2, ..., φ_k), we can generate a new MLO breast shape S_MLO using the equation

    S_MLO = p + μ_MLO + Σ_{i=1}^{k} α_i φ_i
where p is simply an offset to μ_MLO accounting for a rigid translation of the entire shape. The free parameters in this expression, p = (p_x, p_y) and α = (α_1, α_2, ..., α_k), define the deviations of this shape from the mean along the axes associated with the principal modes. It stands to reason, then, that we can perform an optimization over these k + 2 free parameters to find S*_MLO, the optimal matching breast silhouette for any given input. All we need now is an optimization technique and a cost function. For optimization, we used the simple and popular Nelder-Mead downhill simplex method [14]. Our cost function is slightly more complex. Illustrated in Figure 3a, it
Fig. 3. (a) Illustration of the cost function for our shape fitting optimization method. (b) Illustration of our point correspondence procedure.
can be described in words as follows. Given a new breast silhouette to fit, we can easily define a chain code C around its perimeter. Given a test shape S^t_MLO = {(s^t_11, s^t_12), (s^t_21, s^t_22), ..., (s^t_n1, s^t_n2)} during the optimization, for each point S^t_MLO(i) = (s^t_i1, s^t_i2) we define d_i as the distance between S^t_MLO(i) and the intersection of C with the line defined by S^t_MLO(i) and COM, the center of mass of the points. The full cost function, therefore, is Σ_{i=1}^{n} d_i. During the optimization, we add one more constraint. As part of the machinery of the PCA, we are given not only the principal modes of variation Φ, but also their standard deviations Σ. We constrain the search for the α's such that |α_i| < 3σ_i. This restricts the search space more stringently, to look only at probable breast shapes.
We now need an apparatus to bring individual pixels inside one breast shape into correspondence with another. To do this, we turn each breast into a very simple triangular mesh involving the center of mass and each of the shape points, as shown in Figure 3b. Once we have matched two different breast contours, all pixels within one contour can be mapped into the other, as shown in the following example. The point p_i in Figure 3a can be mapped to its corresponding spot in Figure 3b by first identifying the mesh triangle containing p_i and then finding its equivalent in the other breast shape. Next, point p_i is mapped into its proper place in the new mesh triangle using trilinear interpolation of the vertices of the corresponding triangles.
Now, given a set of mammograms in which the instances of cancer have been previously marked by radiologists, we can generate a cancer atlas in the following way. First, we define the atlas shape for a specific view (e.g. MLO or CC) to be the corresponding shape model's mean value, μ_MLO. Next, for each mammogram M_i, we perform these steps:
1. Generate M_i's corresponding silhouette image and associated closed contour.
2. Run the shape fitting optimization to find the p* and α* which correspond to S*_MLO, the optimal fitting shape.
3. Map all pixels corresponding to cancer from M_i into the new atlas space.
4. For each cancer pixel mapped into atlas space, increment a counter corresponding to that location.
5. The final, normalized output of the counters constitutes our probabilistic atlas.
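A compact sketch of the shape-fitting cost under the |α_i| < 3σ_i constraint is shown below. It substitutes a nearest-neighbor distance for the ray-based d_i of the paper, so it is only an approximation, and all variable names are assumptions; the returned function can be handed directly to a Nelder-Mead optimizer.

```python
import numpy as np
from scipy.spatial import cKDTree

def make_cost(mu, phi, sigma, contour):
    """Cost for fitting the shape model to a target contour.

    mu: (2n,) mean shape; phi: (2n, k) principal modes; sigma: (k,) std devs;
    contour: (m, 2) densely sampled target chain-code contour.
    """
    tree = cKDTree(contour)
    def cost(params):
        p, alpha = params[:2], params[2:]
        if np.any(np.abs(alpha) >= 3.0 * sigma):   # |alpha_i| < 3 sigma_i
            return np.inf
        shape = (mu + phi @ alpha).reshape(-1, 2) + p
        # The paper measures d_i along the ray through the center of mass;
        # the nearest-neighbor distance is a simpler surrogate.
        d, _ = tree.query(shape)
        return d.sum()
    return cost

# e.g. scipy.optimize.minimize(make_cost(mu, phi, sigma, contour),
#                              x0=np.zeros(2 + k), method="Nelder-Mead")
```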
Fig. 4. (a) Atlas MLO. (b) Atlas CC.
In generating atlases for both the CC and MLO views, we used 784 digitized and annotated film mammograms from the Digital Database for Screening Mammography (DDSM) [9] as well as 420 digital mammograms from our own database. From the DDSM, we used images from volumes cancer 01, cancer 02, and cancer 05 through cancer 15. All cancers used in the atlas generation were selected by radiologists and biopsy proven. In building this atlas, we focused only on cancerous masses, ignoring any mammograms deemed malignant solely because of microcalcifications. Figure 4 shows the results of atlas generation for both the MLO and CC views. In each of these images, the mean breast shape is superimposed in green; red corresponds to a very high likelihood of cancer, followed by orange, yellow and green, with blue indicating a region very unlikely to contain cancer. The first thing to notice about these images is that there do, indeed, seem to be areas of the breast more susceptible to cancer than others. This information is quite useful when added to a CAD algorithm, as we will see in the next section.
4 Results
Applying the knowledge encoded in our atlas to a CAD algorithm is relatively straightforward, as we can simply add it as a feature to be used in the final classification. More specifically, given a breast lesion candidate detected by step 1 of the CAD framework discussed in Section 1, we can define the a priori probability that it is cancer using the atlas of the corresponding view. Once the breast silhouette containing this new candidate has been aligned with the atlas, the probability can simply be read off by projecting the candidate's location into atlas space. This a priori cancer probability is a powerful feature when combined with the other intensity- and morphology-based features, and it can be used either before, after, or during the classification process. To test this algorithm, we implemented a CAD scheme very similar to the one in [5], a well-known paper in the CAD field. In this work we are looking only for cancerous masses and not microcalcifications, so the candidate detection scheme is based on a combination of pixel intensities and degree of spiculation. Once we
have the candidates, we segment the mass regions from the background using a contour-based optimization technique similar to the one in [5]. For the feature generation step of the CAD pipeline, the authors of [5] calculate seven intensity-based features and two density-based features. They then calculate how effective each feature is at classifying candidate masses into benign vs. malignant by combining each feature in turn with a measure of malignancy and a location-based feature, namely the distance of the candidate to the skin line. In this framework, distance to the skin line is the only location-based knowledge used in the classification. Our hypothesis is that while distance to the skin line does add some useful knowledge, when physicians look at a mammogram they use much more powerful location cues built up over years of experience. We are trying to encode this knowledge as a set of prior probabilities in our cancer atlas. To isolate the effects of location on the classification, we replace the distance-to-skin-line feature with a feature generated from our atlas: the mean a priori probability inside the mass contour based on our probabilistic atlases.
We evaluated our feature using the same basic framework from [5]. We trained a support vector machine (SVM) classifier [15] using each of the nine intensity and density features together with the location-based distance-to-skin-line feature. We performed the same experiment substituting our probabilistic atlas feature for the location information. Given this two-class problem, where class 1 is normal and class 2 is cancer, we set the SVM's C1 parameter to 1 and C2 to the ratio of the number of normal training samples to the number of cancer training samples. We used radial basis functions as the kernel, with the γ parameter set to the software defaults [15]. We trained each SVM with 2430 digitized mammograms, 1204 of which contained a biopsy proven cancer (the same database we used to train the cancer atlas itself). We then tested each SVM in turn using a different database, drawn entirely from the DDSM. This time, we used images both from the DDSM cancer volumes and from the normal volumes. Our testing database consisted of 1732 digitized mammograms, 372 of which contained a biopsy proven cancer. Our testing measure was Az, the area under the ROC curve measuring classification performance. The results are shown below:

                           con1  con2  con3  con4  con5  con6  con7  den1  den2
feature alone              78.3  53.9  78.6  79.0  75.9  55.3  82.4  78.5  76.4
feature + skin line dist.  79.4  57.6  79.2  78.0  76.4  59.6  82.6  78.8  76.9
feature + atlas prob.      82.4  66.2  80.4  80.4  78.1  67.4  84.5  80.1  77.8

Here, the seven contrast features from [5] are labelled con1-con7, followed by the two density features. These results suggest that our hypothesis is, indeed, correct. In every case above, the classification score (Az) improved when using the atlas-based feature, often by a large amount. In the future, we would like to take this idea further into other CAD fields such as lung cancer and colon cancer. We would also like to explore the improvements we can make by augmenting our shape models with some internal
anatomical landmarks such as the pectoral muscle. Another area to work on is our very simple internal pixel correspondence model: we would like to look into the improvements we can make by using a more accurate and more complex polygonalization of the inside of the breast. It is also worth noting that this shape correspondence apparatus can be used in a number of other applications, such as left-right subtraction of two corresponding breast views and segmentation of the breast region from the background of the mammogram.
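For concreteness, here is a hedged sketch of the SVM setup described above, using scikit-learn's wrapper around LIBSVM; the label encoding (0 for normal, 1 for cancer) and the gamma setting are assumptions, since the paper used LIBSVM's own defaults.

```python
from sklearn.svm import SVC

def train_weighted_svm(features, labels):
    """RBF-kernel SVM with per-class costs C1 = 1 (normal) and
    C2 = n_normal / n_cancer (cancer). `labels` is a NumPy array."""
    n_normal = int((labels == 0).sum())
    n_cancer = int((labels == 1).sum())
    clf = SVC(kernel="rbf", C=1.0,
              class_weight={0: 1.0, 1: n_normal / n_cancer})
    return clf.fit(features, labels)
```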
References
1. Cupples, T.E., Cunningham, J.E., Reynolds, J.C.: Impact of computer-aided detection in a regional screening mammography program. Am. J. Roentgenol. 185 (2005) 944-950
2. Karssemeijer, N., Otten, J.D.M., Verbeek, A.L.M., Groenewoud, J.H., de Koning, H.J., Hendriks, J.H.C.L., Holland, R.: Computer-aided detection versus independent double reading of masses on mammograms. Radiology 227 (2003) 192-200
3. Ikeda, D.M., Birdwell, R.L., O'Shaughnessy, K.F., Sickles, E.A., Brenner, R.J.: Computer-aided detection output on 172 subtle findings on normal mammograms previously obtained in women with breast cancer detected at follow-up screening mammography. Radiology 230 (2004) 811-819
4. Giger, M.L., Huo, Z., Kupinski, M.A., Vyborny, C.J.: Computer-aided diagnosis in mammography. In Fitzpatrick, J.M., Sonka, M., eds.: SPIE Handbook on Medical Imaging, Volume II: Medical Image Processing and Analysis. SPIE (2000) 915-1004
5. te Brake, G.M., Karssemeijer, N., Hendriks, J.H.C.L.: An automatic method to discriminate malignant masses from normal tissue in digital mammograms. Physics in Medicine and Biology 45 (2000) 2843-2857
6. Tabár, L.: Teaching course in diagnostic breast imaging. Mammography Education, Inc., Cave Creek, AZ (2004)
7. Tabár, L., Tot, T., Dean, P.B.: Breast Cancer: The Art and Science of Early Detection with Mammography. Thieme Medical Publishers (2005)
8. Hanley, R.S.: Carcinoma of the breast. Ann. R. Coll. Surg. Engl. 57 (1975) 59-66
9. Heath, M., Bowyer, K.W., Kopans, D.: Current status of the digital database for screening mammography. In: Digital Mammography. Kluwer Academic Publishers (1998) 457-460
10. Wirth, A., Stapinski, M.: Automated segmentation of digitized mammograms. Academic Radiology 2 (1995) 1-9
11. Cootes, T.F., Cooper, D., Taylor, C.J., Graham, J.: Active shape models: their training and application. Computer Vision and Image Understanding 61 (1995) 38-59
12. Rueckert, D., Frangi, A.F., Schnabel, J.A.: Automatic construction of 3D statistical deformation models of the brain using non-rigid registration. IEEE Transactions on Medical Imaging 22 (2003) 1014-1025
13. Heitz, G., Rohlfing, T., Maurer Jr., C.R.: Automatic generation of shape models using nonrigid registration with a single segmented template mesh. In: Proc. of the Vision, Modelling and Visualization Conf. (2004) 73-80
14. Press, W.H., et al.: Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press (1992) 657-706
15. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines (2001). Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
Hierarchical Part-Based Detection of 3D Flexible Tubes: Application to CT Colonoscopy
Adrian Barbu1, Luca Bogoni2, and Dorin Comaniciu1
1 Siemens Corporate Research, Princeton, NJ
2 Siemens CAD, Malvern, PA
Abstract. In this paper, we present a learning-based method for the detection and segmentation of 3D free-form tubular structures, such as the rectal tubes in CT colonoscopy. This method can be used to reduce the false alarms introduced by rectal tubes in current polyp detection algorithms. The method is hierarchical, detecting parts of the tube in increasing order of complexity, from tube cross sections and tube segments to the whole flexible tube. To increase the speed of the algorithm, candidate parts are generated using a voting strategy. The detected tube segments are combined into a flexible tube using a dynamic programming algorithm. Testing the algorithm on 210 unseen datasets resulted in a tube detection rate of 94.7% and 0.12 false alarms per volume. The method can be easily retrained to detect and segment other tubular 3D structures.
1 Introduction
Prevention of colon cancer can be achieved by detecting and surgically removing the polyps from the colon wall. However, the colonoscopy procedure used for detecting the polyps is time consuming and produces great discomfort for the patient. Virtual colonoscopy is an increasingly popular alternative, in which the patient's colon is inflated with air through a rectal tube and then one or two CT scans of the abdomen are performed. A polyp detection algorithm is run on the CT scans and the detection results are reported to the doctor for inspection. The current polyp detection algorithms exhibit a relatively large number of false alarms due to the rectal tube used to inflate the colon. These false alarms can be reduced by detecting and segmenting the rectal tube and discarding any potential positives that are very close to it. A rectal tube detection algorithm should be fast and have a very low false alarm rate, since false alarms can decrease the detection rate of the overall polyp detection system. In this paper we present a supervised learning approach to rectal tube detection, where the detector is trained from manually annotated data to obtain greater robustness. As one can see in Figure 1, rectal tubes have a great degree of variability in length, shape and appearance. A previous method for RT detection [2] handled the appearance by template matching, which is a relatively rigid method for detection, and the shape variability by tracking 2D slices. The tracking assumes that the tube is relatively
perpendicular to one of the axes, which is often not true, as shown in Figure 1. The method in [2] handled only two types of rectal tubes and was validated on a relatively small number of cases (80 datasets). It also involved a large number of potentially time consuming morphological operations (region growing).
Fig. 1. The rectal tubes are flexible and have variable shape and appearance
The other method [4] for reducing false positives due to rectal tubes used a massive training artificial neural network (MTANN) to distinguish between polyps and rectal tubes, which raises questions about the degree of control over the generalization power of the system. In this paper we take a hierarchical, part-based learning approach to rectal tube detection and segmentation, which provides a large degree of control against overfitting the data. Tube parts are detected using learned discriminative models, starting from simpler structures (tube cross-sections) with a small number of parameters and progressing to more complex structures (short tube segments). A top-down algorithm based on dynamic programming constructs the final tube segmentation from the detected sub-parts. A voting strategy is used to greatly reduce the search over the location, direction and size of the cross-sections. The training algorithm is based on a Probabilistic Boosting Tree [5], which yields a robust detector with good generalization power. The features used are not based directly on intensity, but on gradient and 3D curvature. All features are invariant to axial rotation, to avoid overfitting the data.
2 Description of Our Method
The input of our system is a 512x512xN, N ∼ 500, isometric CT volume and a set of query locations that are the output of a first stage of polyp detection.
The rectal tube detection algorithm assigns a label "RT"/"non-RT" to each query location, stating whether the query location is within 5 mm of a rectal tube or not. The query locations labeled "non-RT" are passed through a second stage of polyp detection using a more involved algorithm, and the remaining locations are reported to the physician as detected polyps.
2.1 Preprocessing
For each CT volume there are on average 50 query locations given as input to our rectal tube detector. Of those, about 40 locations are automatically labeled as negative because they lie outside a predefined box of size 200×350×200 in the caudal region of the CT volume, in which all rectal tubes of the training data have been found to reside. The remaining locations are clustered by computing connected components in a graph in which there is an edge between two locations whenever they are less than 35 voxels apart, as sketched below. For each cluster (connected component of the graph), the bounding box is computed and enlarged by 30 voxels on each side; the corresponding subvolume is cropped and the tube segmentation algorithm described below is run on it. In this way, candidate locations that are clustered together are processed at the same time. Any location closer than 5 mm to the segmented tube is labeled "RT", and the rest "non-RT". The rest of the paper concentrates on the detection and segmentation of the rectal tube in a cropped subvolume.
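A minimal sketch of this clustering step, assuming one 3D coordinate per query location; the function name is hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def cluster_candidates(locations, max_dist=35.0):
    """Cluster query locations: two locations are joined by an edge when
    they are less than `max_dist` voxels apart."""
    d = squareform(pdist(np.asarray(locations, dtype=float)))
    n_clusters, labels = connected_components(csr_matrix(d < max_dist),
                                              directed=False)
    return n_clusters, labels
```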
Fig. 2. In our hierarchical learning-based method, increasingly complex parts are gradually constructed, starting with 3D circles (tube cross-sections), which are combined into short tubes, which are finally combined into a long flexible tube
2.2 Overview of Our Framework
Using a trained classifier to detect rectal tubes provides a convenient way to manage the generalization power and the false alarm rate of the system. The price to pay is the computational expense of searching over all possible parameters that define the classifier. It is practically impossible to detect the entire rectal tube using a single classifier, because there are too many parameters to search,
the tube being flexible, as shown in Figure 1. Instead, we use a part-based approach, starting with simpler and more rigid shapes (tube cross-sections) and connecting them into short tubes and then into a long free-form tube. To detect the tube cross-sections, or ”3D circles”, the algorithm should apply the trained detector at all possible locations X = (x, y, z), directions D = (dx , dy , dz ) and radii R, which is computationally prohibitive. Instead, we choose to restrict the application of the detector to promising locations, given by a voting strategy. Our rectal tube segmentation method proceeds as follows: 1. Candidate tube cross-sections with parameters C = (X, D, R) (location, direction and radius respectively) are found using a voting scheme, as described in subsection 2.3. 2. A trained 3D Circle detector, described in 2.4, is used to keep only the most promising cross-sections of the 3D tube. 3. Short tubes are detected from 3D circle pairs, as described in 2.5. First, for each detected 3D circle, the 10 best aligned 3D circles on each side are found. Then a trained short tube detector is used to prune away pairs that are not supported by the data. 4. A dynamic programming algorithm uses the short tubes and their adjacency graph to find the best tube segmentation. The following subsections describe in more detail each of the above steps. 2.3
2.3 Proposing 3D Circles by Voting
Inside the cropped subvolume (typically of size 60x60x60), the gradient at all locations is computed. At the places where the gradient magnitude is larger than 100, the 3D curvatures and the principal directions of the curvature are computed.
Fig. 3. The voting strategy gives possible locations for the axes of the detected tubes
The voting proceeds as follows. Each voxel x casts one vote at the location v(x), displaced in the direction of the gradient g_x by a distance equal to the inverse of the largest curvature k(x). That is, the vote is cast at location v(x) = x + (g_x/|g_x|) · (1/k(x)). For a tubular structure, all locations on a tube cross-section will vote for the center of the cross-section. The votes for two input tubes are shown in Figure 3, white representing 5 votes. At locations y with at least 5 votes, the tube direction is
computed as the median of the second principal directions at the locations x that voted for y, i.e. v(x) = y. In that direction, the most promising 3D circles C_y(R) are obtained by computing the voting number

N_y(R) = |{x ∈ C_y(R), 0.5 ≤ R·k(x) ≤ 2}| / |C_y(R)|        (1)

for some discretization of C_y(R). For a perfect tube of radius R and y on its axis, all x ∈ C_y(R) would have curvature k(x) = 1/R and the voting number N_y(R) would be π. In reality, we keep the candidate circles C_y(R) having N_y(R) ≥ 1.3.
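A sketch of the vote-casting step under these definitions (the dense accumulator and the names below are our assumptions; the gradients and curvatures would come from the preprocessing described above):

    import numpy as np

    def cast_votes(shape, coords, gradients, curvatures):
        # One vote per high-gradient voxel at v(x) = x + (g_x/|g_x|) * 1/k(x)
        votes = np.zeros(shape, dtype=np.int32)
        for x, g, k in zip(coords, gradients, curvatures):
            gnorm = np.linalg.norm(g)
            if gnorm <= 100.0 or k <= 0:       # gradient threshold from the text
                continue
            v = np.round(x + g / gnorm / k).astype(int)
            if np.all(v >= 0) and np.all(v < shape):
                votes[tuple(v)] += 1
        return votes                            # keep locations y with >= 5 votes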
2.4 Training the 3D Circle Detector
The 3D circle detector is specialized in detecting cross-sections of the rectal tube. The parameters of a 3D circle are (see Fig. 4):
– the center location X = (x, y, z),
– the direction D = (dx, dy, dz), |D| = 1, normal to the plane of the 3D circle,
– the radius R of the 3D circle.
Fig. 4. Left: The model and parameters of a 3D circle. Right: the features are computed on 12 circles (3 locations and 4 radii) relative to the 3D circle.
To avoid overfitting the data, all the features are invariant to rotation about the 3D circle axis. The features are obtained as 8 types of axial invariant statistics: mean, variance, central symmetry mean, central symmetry variance, the 25, 50 and 75 percentiles, and the voting number. Each invariant statistic is computed on one of 4x3 = 12 circles, having one of 4 radii (R/3, R, 5R/3, 7R/3) and one of 3 locations along the circle direction (X and X ± 2D). Each of the 12 circles is discretized and subsampled, and one of the 8 types of statistics is computed for one of 70 different combinations of gradient, curvature and principal directions (sum, difference, product, etc.). In total there are 8x4x3x70 = 6720 features. For training, the rectal tubes of 154 CT volumes are annotated using a generalized cylinder model. There are 3 different types of tubes in the training data. We use a semiautomatic algorithm based on dynamic programming to compute a tube annotation given two manually marked endpoints. The algorithm produces circular sections of the tube spaced 10 voxels apart, starting from one endpoint of the tube and ending at the other. The circle locations and radii are manually corrected to obtain the best possible alignment. The annotations of three volumes are shown in Figure 5.
Fig. 5. Manual annotation of different types of rectal tubes from our dataset
For training of the 3D circle detector, 15000 positive examples are generated from the manual annotations by interpolation, excluding the region close to the tip of the tube where there are lateral holes. From the candidate locations obtained by voting, 207000 samples that are at a distance of at least 35 voxels from the tube annotations are chosen as negative examples. The training algorithm used in this paper is the Probabilistic Boosting Tree (PBT) [5], which learns a binary tree of strong classifiers, each node trained by AdaBoost [3], starting from the root. At each node, after training, the positives and negatives are run through the detector of that node; the detected positives and false alarms are passed as positives and negatives to the right subtree, while the rejected positives and negatives are passed as training data to the left subtree. After training, the PBT can assign to any new sample the learned probability that it is a positive example. A PBT with 6 levels is trained using 15 weak classifiers per node, with the first two levels enforced as a cascade. The detection rate on the training samples was 95.6% and the false alarm rate 1.7%. The 3D circle detector usually misses the part of the tube that is not circular, where there are lateral holes in the tube. This is corrected by the next level.
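The PBT recursion can be sketched as follows (scikit-learn's AdaBoost is a stand-in for the boosted node classifiers of [5], and the cascade enforcement of the first two levels is omitted):

    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier

    def train_pbt(X_pos, X_neg, depth=0, max_depth=6, n_weak=15, thresh=0.5):
        # Each node is a boosted strong classifier; samples the node accepts
        # go to the right subtree, rejected samples go to the left subtree.
        if depth == max_depth or len(X_pos) == 0 or len(X_neg) == 0:
            return None
        X = np.vstack([X_pos, X_neg])
        y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_neg))])
        node = AdaBoostClassifier(n_estimators=n_weak).fit(X, y)
        accept = node.predict_proba(X)[:, 1] >= thresh
        return {"clf": node,
                "right": train_pbt(X[accept & (y == 1)], X[accept & (y == 0)],
                                   depth + 1, max_depth, n_weak, thresh),
                "left": train_pbt(X[~accept & (y == 1)], X[~accept & (y == 0)],
                                  depth + 1, max_depth, n_weak, thresh)}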
2.5 Detecting Short Tubes from the 3D Circles
The short tubes are the parts from which the dynamic programming algorithm described in the next subsection constructs the final segmentation. For good performance, there should be approximately the same number of short tubes starting at each of the detected circles. For that, the short tubes are detected in two steps. In the first step, 10 candidate tubes are found on each side of every detected 3D circle. For each 3D circle C1 = (X1, D1, R1), the 10 neighboring circles C2 = (X2, D2, R2) with the smallest alignment cost A(C1, C2) are found. The alignment cost depends on the relative position of the circles and on their radii (see also Fig. 6):

A(C1, C2) = α² + β² + 0.1(d − 10)² + 0.2(R1 − R2)² − 0.5(R1 + R2)        (2)
Fig. 6. Left: The parameters of a short tube. Right: for each pair of aligned 3D circles C1 = (X1 , D1 , R1 ), C2 = (X2 , D2 , R2 ) a short tube T = (X1 , X2 , R1 , R2 ) is constructed.
where α, β < π/2 are the angles between the axis X1X2 and D1, respectively D2, and d is the distance between the circle centers. This way, preference is given to circles C2 that are best aligned in direction, have similar and large radii, and lie at a distance close to 10 voxels. The parameters of a short tube are T = (X1, R1, X2, R2). For each pair of aligned 3D circles C1 = (X1, D1, R1), C2 = (X2, D2, R2) found as above, a candidate short tube is constructed, with parameters T = (X1, R1, X2, R2), using only the radii and positions of the 3D circles, as illustrated in Figure 6. The second step validates the constructed short tubes using the input data. For that, we train a short tube detector and keep only the short tubes whose probability is greater than a threshold (0.35 in our case). The short tube detector has the same features as the 3D circle detector, with the difference that the 12 circles on which the feature statistics are computed have positions X1, (X1 + X2)/2 and X2 and radii R/3, R, 5R/3, 7R/3 with R = R1, (R1 + R2)/2, R2 respectively. For training, 13700 positive samples 10 voxels long were created from the manual annotations. Also, 40000 negatives were obtained at locations, directions, and radii given by voting, all of length 10 and with identical radii R1 = R2. Another 9000 negatives were obtained from aligned pairs of 3D circles that are at least 35 voxels away from the manual annotations. A PBT is trained with 6 levels, 20 weak classifiers per node, and the first two levels enforced as a cascade. The detection rate on the training samples was 95.1% and the false alarm rate was 3.6%. We trained another "pure" short tube detector, with the same positives but with 198000 negatives of length 10 and identical radii R1 = R2. The detection rate on the training samples was 97.5% and the false alarm rate was 0.1%.
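A direct transcription of the alignment cost (2) (interpreting the constraint α, β < π/2 as measuring unsigned angles, which is our reading of the text):

    import numpy as np

    def alignment_cost(X1, D1, R1, X2, D2, R2):
        axis = np.asarray(X2, float) - np.asarray(X1, float)
        d = np.linalg.norm(axis)               # distance between circle centers
        u = axis / d
        alpha = np.arccos(np.clip(abs(np.dot(u, D1)), 0, 1))   # angle to D1
        beta = np.arccos(np.clip(abs(np.dot(u, D2)), 0, 1))    # angle to D2
        return (alpha**2 + beta**2 + 0.1 * (d - 10)**2
                + 0.2 * (R1 - R2)**2 - 0.5 * (R1 + R2))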
2.6 Tube Segmentation by Dynamic Programming
From the previous steps, we obtained a set of short tubes T = {T1, ..., Tn} and a graph G = (T, E) whose nodes are the short tubes T. Two short tubes Ti and Tj are connected through a graph edge Eij ∈ E if they share one endpoint X and have the same radius R at that endpoint, e.g. Ti = (A, R1, X, R) and Tj = (X, R, B, R2). All edges Eij for which the 3D angle αij = ∠AXB is not close to π (i.e. αij < 5π/6 or αij > 7π/6) are removed. The weight of an edge is a measure of the good continuation of the tubes:

Eij = |αij − π| tan |αij − π|        (3)
There is also a unary cost for each short tube T = (X1, R1, X2, R2):

c(T) = −ln(P(T)) + 0.2(R2 − R1)²        (4)
where P(T) is the probability given by the trained short tube classifier. In our dynamic programming framework, we denote by C^k_{ij} the cost of the best chain of k short tubes starting with Ti and ending in Tj. Then we have the following recurrence formula:

C^{k+1}_{ij} = min_s [C^k_{is} + E_{sj} + c(Tj)]        (5)
For each k we find the chain of short tubes S_k corresponding to the smallest C^k_{ij}. The chain S_k with the largest length (in voxels) is the segmentation result.
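A compact sketch of the recurrence (5) (the base case, a chain of one tube with cost c(T_i), is our reading of the text; adj maps each tube index s to its outgoing edges E_sj):

    def best_chains(unary, adj, k_max):
        # C[k][(i, j)]: cost of the best chain of k short tubes from T_i to T_j
        n = len(unary)
        C = {1: {(i, i): unary[i] for i in range(n)}}
        for k in range(1, k_max):
            nxt = {}
            for (i, s), cost in C[k].items():
                for j, e_sj in adj.get(s, []):
                    cand = cost + e_sj + unary[j]       # C^k_is + E_sj + c(T_j)
                    if cand < nxt.get((i, j), float("inf")):
                        nxt[(i, j)] = cand
            C[k + 1] = nxt
        return C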
3 Results
To illustrate the segmentation quality of our method, we show in Figure 7 a few segmentation results obtained on unseen data (not used for training). Testing has been performed on 210 datasets not used in training, containing about 5 types of rectal tubes. To validate our system, we present testing results for the system components and for the whole system, all using the voting step:
– Using only the 3D circle detector, the recognition rate was 99.1%, with a false alarm rate of 2.1 per volume and a running time of 3.3 seconds/volume.
– Using only the short tube detector, with the same radius at both ends, the detection rate was 92.2%, the false alarm rate was 0.28/volume and the running time was 4.1 seconds/volume.
– Using only the "pure" short tube detector (described in Section 2.5), the detection rate was 95.0%, the false alarm rate was 0.20/volume and the running time was 4.0 seconds/volume.
– Using only the "pure" short tube detector, but searching for radius differences of up to 3, the detection rate was 97.5%, the false alarm rate was 0.99/volume and the running time was 5.5 seconds/volume.
– Finally, using the whole system, the detection rate was 94.7%, the false alarm rate was 0.12/volume and the running time was 5.3 seconds/volume.
Moreover, none of the 26 false alarms from the 210 datasets was a polyp.
Fig. 7. Some segmentation results on unseen data
4 Discussion
In this paper, we presented a hierarchical, part-based method for segmenting 3D tubes, for example rectal tubes in CT data. The method is learning based, which provides strong generalization power, and part-based, for better computational efficiency.
References
1. Bernstein, E., Amit, Y.: Part-based statistical models for object classification and detection. CVPR 2005
2. Iordanescu, G., Summers, R.M.: Reduction of false positives on the rectal tube in computer-aided detection for CT colonography. Med. Phys. 31(10) (2004) 2855–2862
3. Schapire, R.E.: The boosting approach to machine learning: An overview. In: Denison, D.D., Hansen, M.H., Holmes, C., Mallick, B., Yu, B. (eds.): Nonlinear Estimation and Classification. Springer (2003)
4. Suzuki, K., Yoshida, H., Nappi, J.J., Armato, S.G., Dachman, A.H.: False-positive reduction in computer-aided detection of polyps in CT colonography based on multiple massive training artificial neural networks. RSNA 2005
5. Tu, Z.: Probabilistic boosting-tree: Learning discriminative models for classification, recognition, and clustering. ICCV 2005
Detection of Protrusions in Curved Folded Surfaces Applied to Automated Polyp Detection in CT Colonography

Cees van Wijk1, Vincent F. van Ravesteijn1,2, Frank M. Vos1,3, Roel Truyen2, Ayso H. de Vries3, Jaap Stoker3, and Lucas J. van Vliet1

1 Quantitative Imaging Group, Delft University of Technology, The Netherlands
2 Philips Medical Systems, Best, The Netherlands
3 Department of Radiology, Academic Medical Center, Amsterdam, The Netherlands
Abstract. Over the past years many computer aided diagnosis (CAD) schemes have been presented for the detection of colonic polyps in CT colonography. The vast majority of these methods (implicitly) model polyps as approximately spherical protrusions. Polyp shape and size varies greatly, however, and is often far from spherical. We propose a shape and size invariant method to detect suspicious regions. The method works by locally deforming the colon surface until the second principal curvature is smaller than or equal to zero. The amount of deformation is a quantitative measure of the 'protrudeness'. The deformation field allows for the computation of various additional features to be used in supervised pattern recognition. It is shown that only a few features are needed to achieve 95% sensitivity at 10 false positives (FP) per dataset for polyps larger than 6 mm.
1 Introduction
CT colonography is a modern, noninvasive method to inspect the large bowel. It enables screening for colorectal polyps by way of images rendered from an endoluminal perspective. Polyps are well-known precursors to colon cancer. Unfortunately, current colonographic visualization techniques are still rather time consuming. More importantly, large polyps are sometimes missed. Therefore, methods have been proposed to support the inspection by way of computer aided diagnosis (CAD). A large number of such schemes have been proposed in the literature [1,2,10,8,4]. Like most CAD systems, automated polyp detection usually consists of three basic steps: (1) segmentation of the colon wall; (2) candidate generation; and (3) supervised pattern recognition. To find candidate locations on the colon, Summers et al. [8,7] propose to use methods from differential geometry. In [7] a triangle mesh is extracted from 3D CT data, after which principal curvatures are computed by fitting a 4th-order B-spline to local neighborhoods with a 5 mm radius. Candidates are generated by selecting regions with a positive¹ mean curvature. Yoshida et al. [11,4] use
¹ Throughout this paper we assume that the surface normal is pointing into the colon.
the shape index and curvedness to find candidate objects on the colon wall. Alternatively, Kiss et al. [3] generate candidates by searching for convex regions on the colon wall. Their method fits a sphere to the surface normal field. Simply selecting regions on the colon that protrude inwards yields too many candidates. Therefore, thresholds on mean curvature, principal curvatures, sphericity ratio and/or shape index are used as restrictive criteria. Unfortunately, their values are sensitive to e.g. the CT image noise level and the size of the local neighborhood used to compute them. Generally, the thresholds are set conservatively in order not to harm the sensitivity. All of the above CAD schemes are based on the modelling of an approximately spherical polypoid shape, although many polyps are far from symmetric, let alone spherical. Therefore, the candidate generation step of these schemes is characterized by low specificity, and much effort is needed to improve specificity while preserving a high sensitivity. The problems associated with modelling polyps as spherical protrusions are presented in figure 1. It shows the (κ1, κ2)-space. The horizontal axis shows the first principal curvature, κ1. The vertical axis shows the second principal curvature, κ2. All convex points on the colon lie inside the region given by κ2 > 0 and κ1 ≥ κ2. Points on perfectly spherical protrusions lie on the line κ1 = κ2 (SI = 1). On the other hand, perfect cylindrical folds lie on the line κ2 = 0 (SI = 0.75). Objects with a larger radius yield smaller principal curvature values and therefore show up closer to the origin. Additionally, larger polyps tend to be more asymmetric and therefore more fold-like when described by the shape index. Both large and small polyps are found close to the borders defined by the thresholds on SI and CV as indicated in figure 1. As a consequence, the criteria used to limit the number of candidate detections are at least as stringent for the larger polyps as for the smaller ones. Notice that this behaviour is in conflict with clinical decision making, which dictates that large polyps are more important than smaller ones, since the former have a larger probability to develop into colon cancer.
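For reference, the standard definitions of shape index (SI) and curvedness (CV), which are not spelled out in the text but are consistent with the values quoted above (SI = 1 on the line κ1 = κ2 > 0, SI = 0.75 on κ2 = 0), can be sketched as follows, assuming κ1 ≥ κ2:

    import numpy as np

    def shape_index(k1, k2):
        # 1 for spherical caps (k1 = k2 > 0), 0.75 for cylindrical ridges (k2 = 0)
        return 0.5 + np.arctan2(k1 + k2, k1 - k2) / np.pi

    def curvedness(k1, k2):
        # overall curvature magnitude; small for large, gently curved objects
        return np.sqrt((k1 ** 2 + k2 ** 2) / 2.0)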
Fig. 1. Left: the four thick lines enclose a region in the (κ1, κ2)-space. The curved lines represent thresholds on the curvedness, while the straight lines enclose the region given by 0.9 < SI ≤ 1. The center image shows the κ1 and κ2 values as measured on candidate objects (see Results section). The light grey objects are false positives. The circles are TP that have a largest shape index of at least 0.9. The black squares represent TP as well, but have a largest shape index smaller than 0.9. The right image shows a typical polyp shape.
In this paper, we introduce a method to estimate the physiological background shape of the colon wall. Polyp candidates are detected as a local deviation from the background. Our method locally flattens/deforms the colon wall in order to 'remove' the protrusions. The amount of displacement needed for this deformation is used as a measure of 'protrudeness'. Regions where this measure has a high value are considered as candidate polyps.
2 Methods
A typical polypoid shape is shown in Figure 1 (right). Suppose that the points on convex parts of the polyp (the polyp head) are iteratively moved inwards. In effect this will 'flatten' the object. At a certain amount of deformation the surface flattening is such that the complete protrusion is removed. That is, the surface looks as if the object had never been there. This is the key concept on which the method is based. A more formal presentation follows from the description of the surface shape using the principal curvatures. Protrusions are defined as those regions on the surface where the second principal curvature is larger than zero (this implies, of course, that the first and largest principal curvature is larger than zero as well). The method then deforms the surface until the second principal curvature is smaller than or equal to zero. Clearly, this will only affect structures that are curved in two directions, like polyps, and will not deform curved structures like folds. Folds typically bend in one direction only and have a first principal curvature larger than zero and a second principal curvature around zero. One might argue that this will lead to many more candidates than obtained by thresholding e.g. the shape index. This is inarguably true. We will show, however, that the proposed method does not require any thresholds other than κ2 > 0. Moreover, the deformation method described below leads to a quantitative measure of the polyp protrudeness and therefore permits ordering of the generated candidate objects in a way intuitive to the radiologist. In effect this will lead to far fewer candidates.

2.1 Surface Evolution
The method employs surface evolution on triangle meshes [6]. The triangle mesh is generated by the marching cubes algorithm applied to the 3D CT data using a threshold of -750 HU. A typical mesh consists of around 10^6 vertices. In [6] a method was presented to rapidly remove rough features (noise) from irregularly triangulated data. It was based on the diffusion equation:

∂X_i/∂t = λ L(X_i),   with   L(X_i) = (1/N_1) Σ_{j ∈ 1-ring} X_j − X_i        (1)
where L(X_i) is a discrete (1-ring) estimate of the Laplacian at vertex i, X are the positions of the mesh points, N_1 is the number of vertices in the 1-ring neighborhood of vertex X_i, and λ is the diffusion coefficient. The solution at
time t was found using a backward Euler method, which translates the problem into a matrix-vector equation

(I − λ dt L) X̄^{t+1} = X̄^t        (2)
The matrix M = I − λ dt L is sparse and its structure is given by the mesh one-ring relations; X̄ is a vector containing all mesh points and I is the identity matrix. This system can be solved efficiently using the bi-conjugate gradient method. In [6] the diffusion was applied to all mesh points. A well known effect of prolonged diffusion on the complete mesh is global mesh shrinking, and in [6] a solution was proposed by compensating for the reduction of the mesh volume. We, however, apply the diffusion only to a limited number of mesh points, namely the points where κ2 > 0. The majority of points have negative or zero second principal curvature and remain at their original position. They provide the boundary conditions for the other points. Therefore, in contrast to the method suggested in [6], global shrinking is not an issue and we can search for the steady-state solution of the diffusion equation:

∂X_i/∂t = L(X_i) = 0        (3)
The discrete Laplacian estimates the new position of vertex X_i by a linear combination of its 1-ring neighbors, X_j. Rewriting equation 3 then yields a matrix-vector equation:

(1/N_1) Σ_{j ∈ 1-ring} X_j − X_i = 0,   i.e.   M X̄ = 0        (4)

Fortunately, M is sparse and its structure is given by the 1-ring mesh relations. The number of non-zero elements on each row equals the number of 1-ring member vertices. Like the backward Euler formulation, this equation can also be solved efficiently using the bi-conjugate gradient method. It is well known that the solution to the Laplace equation minimizes the membrane energy subject to the imposed boundary conditions. However, our objective is not to minimize the mean curvature, but to minimize the second principal curvature. Therefore, we extend the above equation by introducing a 'force' term. The resulting equation is a Poisson equation:

L(X̄) = F̄(κ2)        (5)
This equation reads as follows: the new positions of the mesh points are found by initially moving each mesh vertex to a position as prescribed by the Laplacian operator. Subsequently, the term on the right-hand side 'pushes back' the point such that the resulting second principal curvature is zero. The force term F̄ is designed to depend on κ2. The 'force field' F̄ initially balances the displacement prescribed by the Laplacian and is updated after solving equation 5. In other words, we solve (5) iteratively:

F̄^{t+1} = F̄^t − (A_{1ring}/2π) κ2^t n̄,   with   F̄^{t=0} = L(X̄)        (6)
where A_{1ring} is the surface area of the 1-ring neighborhood and n̄ is the vertex normal. The last term can be interpreted as a correction term. Note that if κ2 is positive, F̄ should be relaxed. On the other hand, the magnitude of the reduction term additionally depends on the sampling density of the mesh. If the sampling is dense and A_{1ring} small, the magnitude of the correction term should be small. Since κ2 equals the reciprocal of the radius of the surface tangent circle in the κ2-direction (R = 1/κ2), the term 2π/κ2² is half of the area of the fitting sphere. Therefore, the displacement R needed to remove the curvature in the second principal direction is normalized by the ratio of these two areas. The estimated displacement is given by:

d_est = R · A_{1ring}/(2π/κ2²) = κ2 A_{1ring}/(2π)        (7)
The resulting displacement of the mesh points yields a deformed mesh which is an estimate of how the colon wall would look in the absence of protrusions. The amount of displacement of each mesh point (e.g. in millimeters) is a quantitative measure of the 'protrudeness'. Candidate objects are generated by applying a threshold on the displacement field.
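A minimal sketch of the steady-state solve of Eqs. (3)-(4) with fixed boundary vertices (the κ2-dependent force term of Eqs. (5)-(6) is omitted here, and the names are ours):

    import numpy as np
    from scipy.sparse import lil_matrix
    from scipy.sparse.linalg import bicg

    def flatten(V, rings, free):
        # V: (n, 3) vertex positions; rings[i]: 1-ring neighbours of vertex i;
        # free[i]: True where kappa_2 > 0 (vertex may move), else fixed.
        n = len(V)
        A, b = lil_matrix((n, n)), np.zeros((n, 3))
        for i in range(n):
            if free[i]:
                A[i, i] = -1.0
                for j in rings[i]:
                    A[i, j] = 1.0 / len(rings[i])   # mean of 1-ring minus X_i = 0
            else:
                A[i, i], b[i] = 1.0, V[i]           # boundary condition X_i = V_i
        A = A.tocsr()
        Xnew = np.column_stack([bicg(A, b[:, c])[0] for c in range(3)])
        # deformed mesh and per-vertex displacement ('protrudeness' measure)
        return Xnew, np.linalg.norm(Xnew - V, axis=1)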
3 Results
Automatic polyp detection was executed in three steps: (1) segmentation of the colon wall via the marching cubes algorithm; (2) candidate generation using protrudeness; (3) supervised pattern recognition involving a linear classifier and only a few features.

3.1 Experimental Data
Clinical data of 249 consecutive patients at increased risk for colorectal cancer were included in a previous study [9]. These patients underwent CT colonography before colonoscopy, which served as the gold standard. All patients were scanned in both prone and supine position. The size of a polyp was identified during CT colonography as well as colonoscopy (see figure 3 (left)). 13 patients were selected that contain 1/3 of the polyps larger than 5 mm from the complete study. This yielded 64 polyps larger than 5 mm. 34 of the 64 polyps could be identified in both the prone and the supine CT scan, and 30 were identified on one scan but not on the other. Consequently, there were 98 example objects in total. 42 polyps are smaller than 5 mm and are considered clinically unimportant. 28 polyps are in the range [5,6] mm, 63 are in the range (6,10] mm and 32 are larger than 10 mm.

3.2 Candidate Generation
Typical results are given in figure 2. The top row shows three renderings of the colon wall surface. In the left picture an isosurface volume rendering of a 7 mm large polyp is shown. The polyp is situated on a folded colon structure.
Fig. 2. Top row: example of candidate segmentation. Bottom row, left: example of coherent segmentation of irregular objects.
The middle picture shows the deformed mesh (visualized by a mesh rendering). The protrusion is 'removed', demonstrating how the colon may have looked in the absence of the polyp. The right image shows the original mesh with the segmentation obtained by thresholding the protrusion measure at a value of 0.1 mm. The bottom row shows an example of the more coherent segmentations obtained with the proposed method. The left picture shows an isosurface volume rendering. In the center of the picture a large (14 mm) non-spherical polyp is situated between two folds. The middle picture shows the segmentation as obtained by hysteresis thresholding the shape index using values 0.8 and 0.9 and curvedness values in the range: 0.05
3.3 CAD Performance
The candidate generation step was used as input to supervised pattern recognition. In figure 3 we show ROC curves based on a linear classifier applied to
Fig. 3. Left two: Polyp size distribution (left) and ROC curve for the supervised pattern recognition based on protrusion plus three additional features (right). Right two: Histogram of the number of false positives per patient presented to an expert before a true positive polyp is found (ordering by the posterior probabilities).
four features: the maximum protrusion found on a candidate object; the size of the object, measured as proposed in [5]; the volume enclosed between the original mesh and the deformed mesh; and the percentage of SI values for the vertices on the segmented surface patch that lie within the range 0.65 < SI < 0.85. The latter value is expected to attain high values on folded structures. The three lines in figure 3 (b) show the performance of the system for different size classes. The data was generated in a leave-one-patient-out manner. The large polyps (>10 mm) are found with 100% sensitivity at a FP rate of 13 per dataset and 90% sensitivity at 2 FP per dataset. The ROC for polyps larger than 6 mm (including those larger than 10 mm) shows that 80% sensitivity is obtained at the cost of 4 FP per dataset. A 95% sensitivity is obtained at the cost of 10 FP per dataset. The results for polyps larger than 5 mm are similar to the results for polyps larger than 6 mm. However, 100% sensitivity is not reached. This is due to the fact that in the candidate generation step 3 polyps between 5 mm and 6 mm have been missed. The right two plots in figure 3 demonstrate that the specificity of the CAD system is even higher if the sole purpose of the system is diagnostic, namely to decide whether a patient has any polyps. The third plot shows how many false positives have been assigned a higher posterior probability than the first large polyp (≥10 mm). It follows that for 7 out of 13 patients the first object is a polyp. For one patient three false positives have a higher posterior probability than the first polyp. Note that the total number of patients is 11 (and not 13). This is due to the fact that 2 patients did not have polyps larger than 10 mm. The right plot shows the number of false positives with a higher posterior probability than any of the polyps larger than 6 mm (including those larger than 10 mm). Note that including the smaller polyps improves the results. Apparently some of the small polyps are assigned a higher posterior probability than the large ones.
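A sketch of such a leave-one-patient-out evaluation (the paper does not name the linear classifier; scikit-learn's LDA is a stand-in, and the feature and label arrays are assumed given):

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import LeaveOneGroupOut

    def lopo_posteriors(X, y, patient_id):
        # X: candidate features (protrusion, size, volume, SI fraction); y: 0/1
        post = np.zeros(len(y))
        for tr, te in LeaveOneGroupOut().split(X, y, groups=patient_id):
            clf = LinearDiscriminantAnalysis().fit(X[tr], y[tr])
            post[te] = clf.predict_proba(X[te])[:, 1]   # posterior probabilities
        return post                                      # threshold to trace the ROC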
4 Conclusions
We have presented a method to detect protruding objects on curved surfaces. It was used to generate candidate objects for automated polyp detection. The method works by locally flattening the colon wall in order to ‘remove’
protrusions. Actually, the colon surface is deformed until the second principal curvature is smaller than or equal to zero. Therefore, only structures that are curved in two directions, like polyps, are affected. Folds remain unaltered. The amount of displacement needed for this flattening/deformation is used as a measure of the 'protrudeness' of the object. A threshold on the deformation field is the only parameter needed for candidate generation. This is a clear advantage over methods that involve many restrictive criteria. Another advantage is that the deformation field immediately allows for the computation of additional features such as the object's volume. We have shown that a simple linear classifier involving only four features already yields 95% sensitivity at the cost of about 10 FP per dataset. Clearly, the algorithm must be tested more extensively. We realize that an investigation on more data, involving more complex classifiers, is needed to be conclusive on the overall improvement of polyp detection. However, the current results give an indication that the protrusion measure may enhance polyp detection schemes.
References
1. B. Acar et al. Using optical flow fields for polyp detection in virtual colonoscopy. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention Conference (MICCAI), October 2001.
2. B. Gokturk et al. A statistical 3-D pattern processing method for computer-aided detection of polyps in CT colonography. IEEE Transactions on Medical Imaging, 20(12):1251–1260, 2001.
3. G. Kiss et al. Computer aided detection for low-dose CT colonography. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention Conference (MICCAI), pages I–859, 2005.
4. H. Yoshida et al. Computerized detection of colonic polyps at CT colonography on the basis of volumetric features: Pilot study. Radiology, 222:327–336, 2002.
5. J.J. Dijkers et al. Segmentation and size measurement of polyps in CT colonography. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention Conference (MICCAI), pages I–712, 2005.
6. M. Desbrun et al. Implicit fairing of irregular meshes using diffusion and curvature flow. In SIGGRAPH 99, 1999.
7. R. Summers et al. Polypoid lesions of airways: Early experience with computer-assisted detection by using virtual bronchoscopy and surface curvature. Radiology, 208:331–337, 1998.
8. R. Summers et al. Automated polyp detection at CT colonography: Feasibility study. Radiology, 216:284–290, 2000.
9. R.E. Van Gelder et al. Computed tomographic colonography compared with colonoscopy in patients at increased risk for colorectal cancer. Gastroenterology, 127(1):41–48, 2004.
10. J. Nappi and H. Yoshida. Automated detection of polyps with CT colonography: Evaluation of volumetric features for reduction of false-positive findings. Acad. Radiol., 9:386–397, 2002.
11. H. Yoshida and J. Nappi. Three-dimensional computer-aided diagnosis scheme for detection of colonic polyps. IEEE Transactions on Medical Imaging, 20(12):1261–1274, 2001.
Part-Based Local Shape Models for Colon Polyp Detection

Rahul Bhotika1, Paulo R.S. Mendonça1, Saad A. Sirohey2, Wesley D. Turner3, Ying-lin Lee1, Julie M. McCoy1, Rebecca E.B. Brown1, and James V. Miller1

1 GE Global Research, One Research Circle, Niskayuna, NY 12309, USA
{bhotika, mendonca, leeying, mccoy, book, millerjv}@research.ge.com
2 GE Healthcare, 3200 N Grandview Boulevard, Waukesha, WI 53188, USA
[email protected]
3 Kitware Inc., 28 Corporate Drive, Suite 204, Clifton Park, NY 12065, USA
[email protected]
Abstract. This paper presents a model-based technique for lesion detection in colon CT scans that uses analytical shape models to map the local shape curvature at individual voxels to anatomical labels. Local intensity profiles and curvature information have been previously used for discriminating between simple geometric shapes such as spherical and cylindrical structures. This paper introduces novel analytical shape models for colon-specific anatomy, viz. folds and polyps, built by combining parts with simpler geometric shapes. The models better approximate the actual shapes of relevant anatomical structures while allowing the application of model-based analysis on the simpler model parts. All parameters are derived from the analytical models, resulting in a simple voxel labeling scheme for classifying individual voxels in a CT volume. The algorithm’s performance is evaluated against expert-determined ground truth on a database of 42 scans and performance is quantified by free-response receiver-operator curves.
1 Introduction

According to the GLOBOCAN 2002 survey [1], colorectal cancer was the second leading cause of cancer deaths in 2002, with more than half a million deaths worldwide. The American Cancer Society estimates that 56000 people will die of the disease in the United States alone in 2005 [2]. Screening for the early detection and removal of polyps before they become cancerous is widely believed to help reduce mortality rates. However, less than half of the recommended population in the US has undergone screening, in part because of the invasive nature and risks of optical colonoscopy procedures. Virtual colonoscopy or CT colonography is emerging as an alternative, non-invasive option and may help reduce patient discomfort and thus increase compliance. Thin-slice CT scanners with their high spatial resolutions, while enabling the radiologist to see smaller structures in greater detail, pose an enormous data burden for radiologists. This data explosion motivates the need for automated systems that assist radiologists in reviewing CT colon scans with greater efficiency and improved detection sensitivity. Several computer-aided diagnosis (CAD) systems have been developed in the last decade to find polyps [3,4,5,6] in CT volumes. In general, these methods extract features such as surface curvature and intensity contrast on the colon surface to identify
polyp-like candidates, and use a trained classifier to classify the candidates. In [6], principal curvatures were used to compute a shape index at each voxel, followed by fuzzy clustering to generate candidates and classification using discriminant analysis. A support vector machine classifier trained on 3 orthogonal cross-sectional planes sampled for each candidate was used in [3]. In [7], polyp shapes were modeled using spherical harmonics, and thresholds for the shape descriptors were learned from training data. In all of these methods, training and test datasets are very similar, and although high performance is reported, testing is on a small number of cases, making the results inconclusive. A recent large study on 1186 scans from 3 institutions [8] evaluated the algorithm described in [5]. The method consists of extracting the colon wall surface, removing fluid if present, and segmenting candidates using surface features followed by classification. It achieved sensitivities of 89.3% at 2 false positives (FP)/case for polyps larger than 10 mm and 61% at 8 FP/case for polyps above 6 mm. Approximately one-third of the cases were used for training and the rest for testing. While the study size is impressive, results on the clinically significant medium-sized polyps (6-9 mm) [9] show the limitations of the learning-based CAD system. This paper introduces a technique for highlighting polyps in CT colon scans. The algorithm is based on 3D analytical models of the local shape of colon-specific anatomical structures such as haustral folds and sessile and pedunculated polyps. Each of these structures is modeled by combining simpler shapes such as ellipsoids and tori. Principal curvatures of the implicit isosurface at each voxel are computed directly from the image function without explicitly extracting the isosurfaces. Analytical models are then used to derive constraints and decision boundaries in curvature space to label individual voxels as belonging to polyps, folds, or neither. No model fitting or training steps are required since thresholds are obtained directly from the models. Since all operations are performed locally at each voxel, the proposed method can be interpreted as a filter for highlighting relevant anatomical structures and can be easily incorporated into different viewing formats used by radiologists to read CT colon scans, such as traditional 2D slices, 3D virtual fly-throughs, or virtual dissection views.
2 Computation of Principal Curvatures

The curvature of the intensity isosurface passing through a voxel x can be computed directly from the implicit image function [10,11]. The principal curvatures κ1 and κ2 are independent of any transformation I′(x) = aI(x) + b applied to the image volume I(x). In fact, κ1 and κ2 at a voxel x depend only on the shape of the associated isosurface and not on the specific isovalue. Therefore only the local shapes of the relevant anatomical structures have to be considered, not the exact intensity profiles.
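One standard way to realize this computation (our sketch, not necessarily the implementation of [10,11]) is to project the image Hessian onto the tangent plane of the isosurface and normalize by the gradient magnitude; the two eigenvalues in the tangent directions are the principal curvatures, and both a and b cancel out for a > 0:

    import numpy as np

    def principal_curvatures(g, H):
        # g: image gradient (3,), H: image Hessian (3, 3) at the voxel
        gnorm = np.linalg.norm(g)
        n = g / gnorm
        P = np.eye(3) - np.outer(n, n)             # tangent-plane projector
        S = -P @ H @ P / gnorm                     # shape operator (sign follows
        w, V = np.linalg.eigh(S)                   #  the normal convention)
        tangent = np.argsort(np.abs(V.T @ n))[:2]  # drop the eigenvector along n
        k1, k2 = sorted(w[tangent], reverse=True)
        return k1, k2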
3 Shape Models for Colon Structures

Given the principal curvatures at a voxel, analytical models for the shapes of relevant anatomical structures can be used to set thresholds and classify the local shape at a voxel as ellipsoidal or toroidal [11]. Two major contributions of this work are in building such analytical models for colon-specific anatomy and in addressing more complicated structures by combining simpler model parts.
Fig. 1. Colon structures of interest. From left to right: 2D and 3D rendered views of a typical pedunculated polyp, and 2D and 3D views of a typical sessile polyp. The 3D views also show haustral folds running across the colon cross-section.
After bowel cleansing and insufflation of the colon, the colon wall will typically display a circular cross-section and for all purposes can be considered locally flat compared to the attached structures. The normal colon also has haustral folds, produced by the shortening of colon segments, which form ridge-like structures running circumferentially, as seen in Fig. 1. Colonic polyps, on the other hand, are protrusions from the wall of the colon into the lumen and generally fall into three categories based on shape: pedunculated, sessile, and flat. This work introduces analytical models for haustral folds and for pedunculated and sessile polyp lesions.

3.1 Pedunculated Polyps

Pedunculated polyps are mushroom-like capped structures attached by a thin stalk to the colon wall, as shown in Fig. 1. The shape of the polyp cap is locally ellipsoidal and therefore well-approximated by the ellipsoidal patches used for the nodule model in [11], without having to explicitly model the stalk.

3.2 Sessile Polyp Model

Sessile polyps are relatively flat polyps with a broad base and no stalk. This configuration can be geometrically modeled by combining a half ellipsoid with a base radius c and height a, and a semi-toroidal rim with radius r, as depicted in Fig. 2(a). The analysis of the local shape of sessile polyps can be carried out independently on the individual ellipsoidal and toroidal parts and then combined as shown below. The ellipsoidal part of the sessile polyp, M_{SP,E}, can be parameterized as

M_{SP,E}: Π × Θ × Φ → R³,   (ρ, θ, φ) → x = (ρ c cos θ cos φ, ρ c sin θ cos φ, ρ(r + a sin φ))        (1)
where Π = [0, 1], Θ = [0, 2π), Φ = [0, π/2], with each choice of ρ ∈ Π defining a different isosurface within the volume of the sessile polyp model. The model parameters m_SP = (a, c, r) determine the shape of the isosurfaces. A cross-sectional side-view of the outermost isosurface of the sessile polyp model is shown in Fig. 2(b). The model parameters m_SP lie in the domain M_SP = {(a, c, r) ∈ R³ | 0 < a ≤ c, 0 < r}.
Fig. 2. Sessile Polyp Model (a) Three-dimensional representation of the sessile polyp. (b) Parameters of the sessile polyp model: the ellipsoidal cap of the sessile polyp has semi-axes a ≤ b = c, and the toroidal contact has an inner radius of r. The full surface can be obtained by rotating the dark curve around the a direction. (c) Shaded region shows support of the sessile polyp model in κ space.
Let E(x) = x²/c² + y²/c² + z²/a². The equation E(x) = ρ defines an isosurface. Solving for κ = (κ1, κ2) and transforming into polar coordinates yields
κ1 = a / (ρ c (a² − (a² − c²) sin² φ)^{1/2})   and   κ2 = a c / (ρ (a² − (a² − c²) sin² φ)^{3/2})        (2)
Enforcing simple constraints such as 0 ≤ κ1 ≤ κ2, 0 ≤ sin φ ≤ 1 with φ ∈ [0, π/2], and 0 ≤ ρ ≤ 1, one can determine the region of support K^{SP,E}_{a,c} for M_{SP,E} in κ space where (2) holds, as shown in Fig. 2(c). The pedunculated polyp model has a very similar region of support (not shown). The toroidal rim (or "neck") of the sessile polyp, M_{SP,T}, can similarly be parameterized as

M_{SP,T}: Π × Θ × Φ → R³,   (ρ, θ, φ) → x = (ρ(c + r − r cos φ) cos θ, ρ(c + r − r cos φ) sin θ, ρ(r + r sin φ))        (3)
where Π = [0, 1],Θ = [0, 2π ), Φ = [−π /2, 0). It can be shown that the principal curvatures at an isosurface level ρ ∈ Π for MSP,T are given by
κ1 = −1/(ρ r)   and   κ2 = cos φ / (ρ(c + r(1 − cos φ)))        (4)
The bounds for the region of support K^{SP,T}_{a,c,r} in κ space where (4) holds can also be derived as before. The combined support region for the sessile polyp model is given by the domain K^{SP}_{a,c,r} = K^{SP,E}_{a,c} ∪ K^{SP,T}_{a,c,r} and is shown in Fig. 2(c).
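The support region can be traced numerically by sampling (2) and (4) over φ (a sketch; only the outermost isosurface ρ = 1 is sampled here, and sweeping ρ ∈ (0, 1] fills the region):

    import numpy as np

    def sessile_boundary(a, c, r, n=200, rho=1.0):
        phi = np.linspace(0.0, np.pi / 2, n)             # ellipsoidal cap, Eq. (2)
        den = a * a - (a * a - c * c) * np.sin(phi) ** 2
        k1_e = a / (rho * c * np.sqrt(den))
        k2_e = a * c / (rho * den ** 1.5)
        psi = np.linspace(-np.pi / 2, 0.0, n, endpoint=False)  # toroidal rim, Eq. (4)
        k1_t = np.full(n, -1.0 / (rho * r))
        k2_t = np.cos(psi) / (rho * (c + r * (1.0 - np.cos(psi))))
        return (k1_e, k2_e), (k1_t, k2_t)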
Fig. 3. Haustral Fold model (a) Three-dimensional representation of the haustral fold with toroidal ridge and neck parts. (b) Parameters of the fold model: the torus representing the fold ridge has radius r1, and the toroidal contacts forming the neck each have radius r2. All three tori have large radius R. The full surface is obtained by rotating the black curve along a direction parallel to the colon centerline. (c) Shaded region shows support of the fold model in κ space.
3.3 Haustral Fold Model

Following a procedure similar to that for sessile polyps, haustral folds can be modeled by combining three different tori: one with radius r1 for the ridge, and one on either side of the ridge with radius r2 to model the neck connecting the fold ridge and the colon wall, as depicted in Fig. 3(a). The fold curves around the colon wall, rotating with a radius R about the colon centerline. The model parameters m_F = (R, r1, r2) determine the shape of the isosurfaces of the fold and lie in the domain M_F = {(R, r1, r2) ∈ R³ | 0 < r1 ≤ r2, 0 < R}. A cross-sectional side-view of the outermost isosurface of the haustral fold model is shown in Fig. 3(b). The torus representing the ridge of the fold can be parameterized as

M_{F,R}: Π × Θ × Φ → R³,   (ρ, θ, φ) → x = ((R − ρ r2 − ρ r1 sin φ) cos θ, (R − ρ r2 − ρ r1 sin φ) sin θ, ρ r1 cos φ)        (5)
where Π = [0, 1], Θ = [0, θ0 ], Φ = (0, π /2], with ρ representing the different levels of the model isosurfaces and θ and φ representing the location on a particular isosurface. Solving at a particular isosurface level ρ for κ = (κ1 , κ2 ) and transforming into polar coordinates using the parameterization in (5) yields
κ1 = −sin φ / (R − ρ r2 − ρ r1 sin φ)   and   κ2 = 1/(ρ r1)        (6)
Similarly, the neck tori each follow the parameterization given by

M_{F,N}: Π × Θ × Φ → R³,   (ρ, θ, φ) → x = ((R − ρ r2(1 + sin φ)) cos θ, (R − ρ r2(1 + sin φ)) sin θ, ρ(r1 + r2(1 − cos φ)))        (7)
with Π = [0, 1], Θ = [0, θ0 ], Φ = [−π /2, 0]. The torus on the other side of the fold is exactly the same except Φ = [π , 3π /2]. The principal curvatures for the fold neck are given by
κ1 = −1/(ρ r2)   and   κ2 = sin φ / (R − ρ r2(1 + sin φ))        (8)
The bounds for the regions of support K^{F,R}_{R,r1,r2} and K^{F,N}_{R,r1,r2} in κ space where (6) and (8) are respectively valid for the fold ridge and neck can also be extracted using model constraints, as was done for the sessile polyp. The combined support region for the fold model is given by the domain K^F_{R,r1,r2} = K^{F,R}_{R,r1,r2} ∪ K^{F,N}_{R,r1,r2}, and is shown in Fig. 3(c).
4 Implementation and Experimental Results

Implementation. The algorithm was implemented using the open source Insight Toolkit (ITK) [12]. The volume image was smoothed by convolution with a Gaussian kernel to reduce the effect of noise. The colon volume was automatically extracted as follows. All voxels in the CT volume were labeled as non-body or body using an adaptive threshold between air and body tissue. Running connected components on the non-body voxels yielded a set of air regions. Frequently, colon CT scans also contain the lower part of the lungs close to the diaphragm. Regions outside the body or in the lung were discarded automatically, leaving only air-filled regions of the colon. The air-filled regions were dilated and the original regions were then subtracted from the dilated ones, giving a thin layer of voxels consisting of just the colon wall and attached structures. All subsequent processing was restricted to this region-of-interest (ROI). The colon segmentation assumes that the colon was prepared through cleansing and insufflation. The principal curvatures at every voxel in the ROI were computed as described in [10,11]. The pedunculated and sessile polyp models and the haustral fold model provide regions of support in κ space. Given the principal curvatures computed at a voxel x, it was labeled as belonging to a polyp, a fold, or neither, depending on where the κ1(x) and κ2(x) values lay in κ space. The decision boundaries depend on the model parameters of K^{SP}_{a,c,r} and K^F_{R,r1,r2} and were chosen either based on representative values for clinically relevant target lesions or on knowledge of anatomical structures. For example, the radius R in the fold model was approximated by the radius of the colon cross-section. The fact that all model parameters can be specified in physical units, and that the ITK-based implementation can maintain and process data in physical space, allowed all model constraints to be easily propagated throughout the algorithm. Since the curvature computations are susceptible to noise and the analytical models are idealized, morphological operations were applied on the polyp and fold responses to remove spurious small polyp responses occurring in isolation or in small pockets within large fold response regions. These final polyp and fold responses can be presented to radiologists as overlays on top of the original CT volume, either on the 2D axial slices or on a 3D rendering as shown in Fig. 4 (left). The algorithm takes about 30 seconds to run on a 3 GHz Xeon processor with 2 GB RAM.
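The voxel labeling itself then reduces to a point-in-region test in κ space (a sketch; polyp_region and fold_region stand for indicator functions of K^{SP}_{a,c,r} and K^F_{R,r1,r2} built from the model-derived bounds, and are hypothetical callables):

    def label_voxel(k1, k2, polyp_region, fold_region):
        # polyp_region/fold_region: callables (k1, k2) -> bool encoding the
        # decision boundaries derived from the analytical models
        if polyp_region(k1, k2):
            return "polyp"
        if fold_region(k1, k2):
            return "fold"
        return "background"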
Fig. 4. Algorithm results The two images on the left show polyp responses generated by the algorithm overlaid on a 2D CT slice and a 3D rendering, respectively. On the right are FROC curves showing the algorithm's performance with ground truths containing polyps > 6 mm (light, solid curve) and polyps > 4 mm (dark, dashed curve).

Table 1. Comparison of proposed method with CAD evaluation from [8]: The voxel-labeling scheme proposed in this paper does not require any training and compares favorably in running time and performance, though it has been validated on a much smaller dataset

Algorithm          Training cases  Testing cases  Sensitivity @ FP (polyps > 6 mm)  Sensitivity @ FP (polyps > 4 mm)  Running time/case
Proposed method    0               42             31/38 = 81.6% @ 10.4              43/57 = 75.4% @ 10.4              30 sec
Summers et al [8]  394             792            73/119 = 61.3% @ 8.0              -                                 20 min
Ground Truth and Validation. To quantify the algorithm's performance, neighboring voxel responses were grouped and counted as a single detection. The validation dataset consisted of 42 CT scans. A low-dose screening protocol with 120 mAs, 120 kVp, and 0.6 mm slice spacing was used. Ground truth annotations, including polyp location and diameter, were provided by an expert radiologist. Current screening protocols consider all polyps above 6 mm in diameter as clinically relevant. The algorithm achieved a sensitivity of 81.6% at 10.4 FP/case or 76.3% at 6.2 FP/case for the 38 polyps in the ground truth with diameter above 6 mm. The complete free-response receiver operator curve (FROC) is shown in Fig. 4 (right). Radiologists currently do not report findings of "diminutive" polyps (between 4-6 mm in diameter) [9]. However, a follow-up repeat screening in 5 years is generally recommended in such cases. Therefore, the algorithm was also evaluated against all polyps above 4 mm in the ground truth (57 polyps) and achieved a sensitivity of 77.2% at 12.9 FP/case or 75.4% at 10.4 FP/case, as shown in Fig. 4 (right). It is difficult to perform a fair comparison of CAD methods because of differences in test databases and validation techniques. Also, very few CAD evaluations on large databases are available. Recently, a study evaluating a classifier-based CAD algorithm on 1186 cases from 3 institutions (1 mm slice spacing, 100 mAs, 120 kVp) was published [8]. Table 1 attempts to compare the method proposed here to the testing infrastructure and results in that study. The relatively lower performance of the learning-based CAD algorithm used in [8] shows that the model-based approach presented in this paper has promise, although it has been tested on a much smaller database.
5 Conclusion

This paper introduces a novel voxel labeling scheme for highlighting lesions in CT colon scans. Geometric models of colon-specific anatomical structures, namely pedunculated and sessile polyps and haustral folds, were developed by combining ellipsoids and tori. Constraints derived from the models are used to classify each voxel based on the principal curvatures computed at that voxel. The method requires neither segmentation of the relevant anatomical structures nor sophisticated classifiers trained over a set of image features. It can be viewed as a simple, fast filter that highlights voxels belonging to specific anatomical shapes and can aid the radiologist in reviewing CT colon exams with greater efficiency and sensitivity. Validation results show the promise of the model-based approach taken in this paper. Future work includes modeling flat polyps and extending the models to screening protocols with different patient preparations.
References
1. Ferlay, J., Bray, F., Pisani, P., Parkin, D.M.: GLOBOCAN 2002: Cancer incidence, mortality and prevalence worldwide. Technical report, IARC CancerBase No. 5, version 2.0, IARCPress, Lyon, France (2004) http://www-dep.iarc.fr/
2. American Cancer Society: Colorectal cancer: Facts and figures, special edition (2005)
3. Göktürk, S.B., Tomasi, C., Acar, B., Beaulieu, C.F., Paik, D.S., Jeffrey Jr., R.B., Yee, J., Napel, S.: A statistical 3-D pattern processing method for computer-aided detection of polyps in CT colonography. IEEE Trans. Medical Imaging 20(12) (2001) 1251–1260
4. Paik, D.S., Beaulieu, C.F., Rubin, G.D., Acar, B., Jeffrey, Jr., R.B., Yee, J., Dey, J., Napel, S.: Surface normal overlap: A computer-aided detection algorithm with application to colonic polyps and lung nodules in helical CT. IEEE Trans. Medical Imaging 23(6) (2004) 661–675
5. Summers, R.M., Johnson, C.D., Pusanik, L.M., Malley, J.D., Youssef, A.M., Reed, J.E.: Automated polyp detection at CT colonography: Feasibility assessment in a human population. Radiology 219 (2001) 51–59
6. Yoshida, H., Näppi, J.: Three-dimensional computer-aided diagnosis scheme for detection of colonic polyps. IEEE Trans. Medical Imaging 20(12) (2001) 1261–1274
7. Kiss, G., Van Cleynenbreugel, J., Drisis, S., Bielen, D., Marchal, G., Suetens, P.: Computer aided detection for low-dose CT colonography. In: Medical Image Computing and Computer-Assisted Intervention. Number 3749 in Lecture Notes in Computer Science, Springer-Verlag (2005) 859–867
8. Summers, R.M., Yao, J., Pickhardt, P.J., et al.: Computed tomographic virtual colonoscopy computer-aided polyp detection in a screening population. Gastroenterology 129 (2005) 1832–1844
9. Pickhardt, P.: Target lesion: The radiologist's perspective. In: Sixth International Symposium on Virtual Colonoscopy, Boston, MA (2005) 60–62
10. Vos, F.M., Serlie, I.W.O., van Gelder, R.E., Post, F.H., Truyen, R., Gerritsen, F.A., Stoker, J., Vossepoel, A.M.: A new visualization method for virtual colonoscopy. In Niessen, W.J., Viergever, M.A., eds.: Medical Image Computing and Computer-Assisted Intervention. Number 2208 in Lecture Notes in Computer Science, Berlin, Springer-Verlag (2001) 645–654
11. Mendonça, P.R.S., Bhotika, R., Sirohey, S., Turner, W.D., Miller, J.V., Avila, R.S.: Model-based analysis of local shape for lesion detection in CT scans. In Duncan, J., Gerig, G., eds.: Medical Image Computing and Computer-Assisted Intervention. Number 3749 in Lecture Notes in Computer Science, Palm Springs, CA, USA, Springer-Verlag (2005) 688–695
12. Ibáñez, L., Schroeder, W., Ng, L., Cates, J.: The ITK Software Guide. Kitware Inc. (2003)
An Analysis of Early Studies Released by the Lung Imaging Database Consortium (LIDC)

Wesley D. Turner, Timothy P. Kelliher, James C. Ross, and James V. Miller

GE Research Center, Niskayuna, NY 12309, USA
Abstract. Lung cancer remains an ongoing problem resulting in substantial deaths in the United States and the world. Within the United States, cancers of the lung and bronchus are the leading cause of fatal malignancy and make up 32% of the cancer deaths among men and 25% of the cancer deaths among women. Five-year survival is low (14%), but recent studies are beginning to provide some hope that we can increase the survivability of lung cancer provided that the cancer is caught and treated in early stages. These results motivate revisiting the concept of lung cancer screening using thin-slice multidetector computed tomography (MDCT) protocols and automated detection algorithms to facilitate early detection. In this environment, resources that help Computer Aided Detection (CAD) researchers rapidly develop and harden detection and diagnostic algorithms may have a significant impact on world health. The National Cancer Institute (NCI) formed the Lung Imaging Database Consortium (LIDC) to establish a resource for detecting, sizing, and characterizing lung nodules. This resource consists of multiple CT chest exams containing lung nodules that several radiologists manually contoured and characterized. Consensus on the location of the nodule boundaries, or even on the existence of a nodule at a particular location in the lung, was not enforced, and each contour is considered a possible nodule. The researcher is encouraged to develop measures of ground truth to reconcile the multiple radiologist marks. This paper analyzes these marks to determine radiologist agreement and to apply statistical tools to the generation of a nodule ground truth. Features of the resulting consensus and individual markings are analyzed.
1 Introduction

Despite warnings stretching back 40 years – the initial United States Surgeon General's Report on Smoking was issued in 1964 and warning labels have been required on cigarettes sold in the United States since 1969 – lung cancer remains a serious health threat in the world, with an estimated 1 million deaths in 2000 [1]. For the United States, lung cancer is the single largest fatal malignancy and is second only to heart disease in yearly fatalities. There are few effective treatments and five-year survival is approximately 14% [2,3], due largely to the advanced stage of the disease at diagnosis. Despite this, lung cancer screening is not recommended, even for at-risk populations, based largely
This publication was supported by the DOD and the Medical University of South Carolina under DOD Grant No. W81XWH-05-1-0378. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the Department of Defense or the Medical University of South Carolina.
However, imaging advances over recent years – most notably the advent of thin slice MDCT – have prompted revisiting the notion of screening. Recent studies of stage I cancers demonstrate that the diagnosis of lung cancer prior to metastases can give five-year survival rates as high as 70% [4,1], and other studies [5] equate small, subtle cancers with early stage disease. But before screening becomes widely accepted, issues involving overdiagnosis, radiation dose, and appropriate population selection must be addressed. However, a dose-limiting screening protocol that enables early nodule detection while limiting overdiagnosis appears within reach. One of the promising vehicles for early detection is the low dose, Multi-Detector Computed Tomography (MDCT) scan. MDCT has been shown to be an effective method for detecting small (<1cm) nodules in the lungs that may be cancerous or pre-cancerous lesions. However, the rigors of reading a large number of MDCT chest scans, with upwards of 500 images per scan, combined with the subtlety of some of the nodules of interest, argue for computer aided detection (CAD) as a necessity for a rational screening program. In this environment, the National Cancer Institute (NCI) formed the Lung Imaging Database Consortium (LIDC) to study the barriers to effective CAD development and to develop a database as a national resource that can be used to expedite development [6]. One of the major barriers to the development and evaluation of effective CAD devices is the absence of rigorous ground truth (GT). Unlike other cancer screening applications, the determination of malignancy in a discovered lung lesion is not undertaken for small, subtle lesions due to the likelihood of morbidity during the procedure. Although histological analysis of biopsy samples provides the most reliable assessment of malignancy, radiological diagnosis based on medical images currently serves as a surrogate for an actual malignancy determination. Performance of CAD algorithms tends to be very sensitive to the choice of GT. Consider the chart in Figure 1, which represents a set of lung scans read by three radiologists. The data were generated based on lung cancer screening exams in the GE Global Research cancer database. The y-axis shows how many nodules were detected by one, two, or all three radiologists reading the exams. Depending on whether single radiologist GT, majority GT, or unanimous GT is chosen, the performance of a typical CAD system will change drastically. As an example, a CAD algorithm with 100% sensitivity on unanimous GT could have much lower sensitivity on single reader GT, and an algorithm with 100% specificity for single reader GT could gain hundreds of false positives using unanimous GT. To get the best possible GT under these constraints, and to capture nodule characteristics such as spiculations, lobulations, and density, the LIDC asked multiple radiologists to perform a series of blinded and unblinded reads on a number of CT lung images. Each radiologist first read the lung cases without knowledge of how his/her colleagues marked the exams and placed contours around the periphery of the nodules found. The radiologists also described the nodules with the features in Table 1. Once all initial reads were completed, a further round of unblinded reads was performed in which the radiologists could modify their own contours based on the contours of their colleagues.
In the end, all four sets of final contours were saved and provided as annotations to the exams. This allows CAD and cancer researchers to determine an appropriate method to combine the GT for their research so as to maximize the detection and diagnostic capabilities of their algorithms.
Fig. 1. The variability of nodule ground truth based on radiological consensus

Table 1. Nodule characteristics captured in the LIDC datasets

Characteristic   1                2             3             4             5
Subtlety         Extremely        Moderately    Fairly        Moderately    Obvious
Texture          Non/GGO          -             Part/Mixed    -             Solid
Margin           Poorly Defined   -             -             -             Sharp
Sphericity       Linear           -             Ovoid         -             Round
Spiculation      Marked           -             -             -             None
Lobulation       Marked           -             -             -             None

(Grades without a printed descriptor in the original definition are shown as "-".)
In this paper, we investigate the radiologist GT in the first twenty-nine LIDC datasets released. We use statistical tools such as Simultaneous Truth and Performance Level Estimation (STAPLE) [7], available in the Insight Segmentation and Registration Toolkit (ITK) [8], both to extract estimates of GT contours from the multiple radiologist marks and to characterize reader performance.
2 Methods

The LIDC, in cooperation with the NCI, is collecting and distributing a database of lung MDCT scans with annotations of nodules and nodule characteristics. The public database currently stands at 30 datasets, of which we were able to process 29. The dataset size is growing and will reach 400 cases before the end of the project. We downloaded the cases from the LIDC public ftp site [9] (the data are also available at [10]). The data are stored in an anonymized DICOM format that eliminates patient identification elements from the images, but preserves other data annotations such as dosage and reconstruction parameters as well as the make and model of the scanner for each study. The image data are accompanied by an XML file with information from the experts' reads. The XML details nodules smaller than 3mm, nodules larger than 3mm, and non-nodules. In this analysis we have focused on nodules larger than 3mm.
Using ITK we read each image series and the corresponding XML data, which gives a list of points defining a 2D contour at a specific Z-location within the CT image stack. We convert these contours to label images in which only the interior voxels are labeled. This is in keeping with the instructions given to the radiologists, who were told to mark the nodules one pixel outside the border. The next step is to ensure that the same nodule is labeled consistently in each of the reader maps. This is done by comparing the bounding boxes of the nodules: if the centroid of one reader's nodule bounding box falls within the bounding box of a nodule marked by a different radiologist, we consider the two marks to represent the same nodule (a sketch of this test is given below). After the nodules are cross-compared, each resulting unique nodule is assigned a label and the original label maps are adjusted to use the new labeling. The result of this processing is a set of volumes labeled with consistent nodule numberings across all of the readers for each series from the database. We write these volumes back to disk as DICOM series. While this is not the most compact storage choice, it has the advantage that all of the original DICOM information is carried with the label maps, minimizing the opportunity for error later in the process and allowing the nodule contours to be viewed with standard DICOM viewing tools. The result is a set D of label maps where $D_{m,n}$ is the set of voxels from reader m with label n.
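The following sketch illustrates this correspondence test. It is our reading of the procedure just described rather than released code; the use of binary per-reader masks and the function names are assumptions.

import numpy as np

def bounding_box(mask):
    """Axis-aligned bounding box (min corner, max corner) of a binary nodule mask."""
    coords = np.argwhere(mask)
    return coords.min(axis=0), coords.max(axis=0)

def same_nodule(mask_a, mask_b):
    """Two readers' marks denote the same nodule when the centroid of one
    reader's bounding box falls inside the other reader's bounding box."""
    lo_a, hi_a = bounding_box(mask_a)
    lo_b, hi_b = bounding_box(mask_b)
    centroid_a = (lo_a + hi_a) / 2.0
    centroid_b = (lo_b + hi_b) / 2.0
    return (np.all((centroid_a >= lo_b) & (centroid_a <= hi_b)) or
            np.all((centroid_b >= lo_a) & (centroid_b <= hi_a)))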
2.1 Calculating Ground Truth

Given the consistently labeled nodule volumes representing a set of expert opinions, we combine the individual information into a common ground truth. There are a number of ways in which this can be done; we compute several and then contrast the results. The most expansive combination that we compute for a given nodule with label n is the union of all reads,

$$T_{\max,n} = \bigcup_{m} D_{m,n}. \quad (1)$$

For this case any voxel labeled as part of nodule n by any of the readers is considered to be a valid part of n. This is equivalent to saying that all of the readers have perfect specificity, while some may have reduced sensitivity; that is, all marked voxels are valid parts of the nodules and no marks represent false positive voxels. Conversely, we also compute the smallest estimate of ground truth by considering only those voxels that were selected by the entire set of readers,

$$T_{\min,n} = \bigcap_{m} D_{m,n}. \quad (2)$$

In this case, there are no false negative voxels; readers are accorded perfect sensitivity while potentially suffering from reduced specificity. Together $T_{\max}$ and $T_{\min}$ form the bounds on that which we will consider to be truth. For some questions these two estimates may be sufficient representations of the truth. Questions concerning whether the case needs further consideration may rely solely on the $T_{\max}$ estimate. Other questions that look at the reproducibility of results may consider $T_{\min}/T_{\max}$; as this ratio approaches 1, the agreement on nodule definition approaches consensus.
The size and shape variation of nodules that fall within these bounds, however, is too wide to use in comparing algorithms that attempt to segment nodules for the purposes of computing measures of nodule features such as lobulation and spiculation. To arrive at a more accurate estimate of ground truth, suitable for use in scoring segmentation algorithms, we must consider truth not as just a binary decision, as in (1) and (2), but as a continuum from which we will select. To arrive at this narrower ground truth, we start by marking all the voxels in the complement of $T_{\max}$ as having zero probability of being part of the truth, and at the same time mark the voxels that are elements of $T_{\min}$ as having a 100% probability of being part of the truth. This leaves $T_{\max} - T_{\min}$ as those voxels whose probability is uncertain. There are two methods in common use for selecting the probabilities to assign to these voxels. The first method considers each radiologist marking as a vote for including the voxel in the nodule [11]:

$$v^{n}_{i,j,k} = \frac{1}{R} \sum_{m=1}^{R} V_{m,n}(i,j,k), \quad (3)$$

$$V_{m,n}(i,j,k) = \begin{cases} 1 & \text{if } I_{i,j,k} \in D_{m,n} \\ 0 & \text{otherwise,} \end{cases} \quad (4)$$
where $v^{n}_{i,j,k}$ is a voxel in the pmap for nodule n at image position (i, j, k), $I_{i,j,k}$ is the voxel in the label map, and R is the number of radiologists who provided annotations. This voting method results in the creation of a pmap representing the probability that a voxel is part of a nodule. As presented, the voting method treats all readers' opinions as equally valid; while it is possible to make a simple extension that weights readers differently based on their relative expertise, this has not been the practice in this domain. The second method for computing probabilities for the uncertain voxels is STAPLE [7]. STAPLE simultaneously computes the consensus ground truth and the specificity and sensitivity of each of the readers, employing an Expectation Maximization (EM) algorithm to jointly optimize these values. As with the voting method, a pmap is produced, with values of 0 for voxels definitely outside the nodule and 1 for voxels definitely inside, based on the available manual markings; voxels where certainty cannot be achieved are marked in the interval (0, 1). Furthermore, STAPLE can be preconditioned with the expected sensitivities and specificities of the readers to guide the optimization. Both the voting and STAPLE pmaps represent a continuum of possible ground truths. They can be thresholded at a particular point to force a binary decision on what will be considered truth for a particular calculation; in practice, the 50% level is often used as the dividing line between regions in computer vision applications.
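A minimal sketch of both probability-assignment methods follows, assuming each reader's annotation for a given nodule is available as a binary voxel mask. voting_pmap implements (3)-(4); staple_binary is a compact EM re-derivation of binary STAPLE following [7], written here for illustration rather than the ITK implementation actually used in this study.

import numpy as np

def voting_pmap(reader_masks):
    """Eq. (3)-(4): per-voxel fraction of readers who marked the voxel."""
    votes = np.stack([m.astype(np.float64) for m in reader_masks])
    return votes.mean(axis=0)

def staple_binary(reader_masks, n_iter=50, prior=None):
    """Alternate between a soft truth estimate W (E-step) and per-reader
    sensitivity p / specificity q (M-step). D is (R readers, V voxels).
    In practice one would run this on a region around the nodule rather
    than the full volume."""
    D = np.stack([m.astype(np.float64).ravel() for m in reader_masks])
    R, V = D.shape
    if prior is None:
        prior = D.mean()                       # overall foreground fraction
    p = np.full(R, 0.9)                        # initial sensitivities
    q = np.full(R, 0.9)                        # initial specificities
    for _ in range(n_iter):
        a = prior * np.prod(np.where(D == 1, p[:, None], 1 - p[:, None]), axis=0)
        b = (1 - prior) * np.prod(np.where(D == 0, q[:, None], 1 - q[:, None]), axis=0)
        W = a / np.maximum(a + b, 1e-12)       # P(voxel is truly nodule)
        p = (D @ W) / np.maximum(W.sum(), 1e-12)
        q = ((1 - D) @ (1 - W)) / np.maximum((1 - W).sum(), 1e-12)
    return W.reshape(reader_masks[0].shape), p, q

Thresholding the returned W at 0.5 reproduces the 50% dividing line mentioned above.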
3 Results

Table 2 shows a comparison of ground truthing techniques for several representative cases from the LIDC database. In each column the total number of nodule-labeled voxels is given. The discrepancy between the Tmin and Tmax values makes it evident that inter-observer variability is an issue. As expected, the thresholded pmaps produce totals that lie between the corresponding Tmin and Tmax values.
The fact that the total number of voxels remains unchanged as the pmaps are thresholded at different levels indicates that the probability values are tightly grouped either below the 25% level or above the 75% level.

Table 2. Comparison of Ground Truth Approaches

Case   Tmax   Tmin   pmap25   pmap50   pmap75
1       244     88      160      160      160
2       596    175      276      276      276
3      3049   1100     1630     1630     1630
4       953    233      484      484      484
5      6325   2356     4092     4092     4092
Figure 2 shows models generated using the Visualization Toolkit’s discrete marching cubes implementation [12,13]. Not only is inter-observer variability clear, but it is also evident that for some cases substantial step artifacts exist between slices.
Fig. 2. Models of the same nodule segmented by different readers
Figure 3 shows reader segmentations for a representative nodule, as well as the corresponding pmap generated by STAPLE. Variability across readers is clear. STAPLE-computed sensitivity averages 66% over the 24 nodule-positive cases. Specificity is nearly 100% (99.9993%) over the same cases, but this is to be expected because the vast majority of voxels in the cases are not marked as part of any nodule. STAPLE-computed sensitivity on a per-nodule basis is higher (82%) for the 26 unanimously detected nodules, reflecting radiologist disagreement about the actual presence of some subtle nodules.
4 Conclusion

Diagnosis of lung nodules and recommendation of potentially costly follow-up depend on the accurate assessment of nodule characteristics. Inter- and intra-observer variability in such assessments indicates the complexity of the issue and motivates the development of CAD techniques to assist human readers. The LIDC database provides an excellent resource against which CAD algorithms can be scored. For accurate evaluation, however, a notion of ground truth must be derived from the hand segmentations provided by the expert radiologists.
Fig. 3. From left to right: (a) zoomed CT region around a nodule, (b) reader 0 segmentation, (c) reader 1 segmentation, (d) reader 2 segmentation, (e) reader 3 segmentation, and (f) log(pmap) scaled to range from 0 to 255 in order to highlight probability drop-offs. The probabilities represented in this pmap are 100%, 99.9%, 6.1%, 4.7%, and 1.2%. Reader sensitivity and specificity for this nodule as determined by STAPLE are, respectively: (0.903, 0.999), (0.876, 1), (0.781, 1), (0.846, 1). Mean sensitivity and specificity are (0.851, 1).
In this paper we have proposed and compared the use of several ground-truthing techniques: the union of all reads, the intersection of all reads, and STAPLE. We performed the STAPLE analysis on a per-case basis to better leverage the detection variability of the cases, but better segmentation results might be obtained by running it on a per-nodule basis. We currently plan to release the pmap data to a public ftp site for free download. It would be interesting to correlate dosage and reconstruction settings with human segmentation results. Such an analysis of segmentation contours could be performed by investigating the pmaps produced by the STAPLE algorithm. Image acquisition parameters that result in pmaps tightly grouped around 0% or 100% probability suggest a protocol that mitigates human variability in contour delineation. As another next step we plan to develop automated 2D/3D nodule measures and correlate them with radiologist-reported measures. Such measures would address the issue of human variability. Hand segmentations performed on 2D slices can lead to "discontinuous" segmentations as viewed in 3D – no doubt as a function of the oftentimes amorphous nature of nodules as well as the slice thickness of CT images and partial voluming. In order to mitigate step-like artifacts, we propose the use of additional hand segmentation tools that would augment human contouring. One idea would be to provide a 3D model immediately after the complete segmentation of a nodule for evaluation. Performing segmentations in coronal and sagittal reconstructions, in addition to axial, could also prove beneficial, as could segmentations on thinner-slice protocols.
References
1. Humphrey, L., Teutsch, S., Johnson, M.: Lung cancer screening with sputum cytology examination, chest radiography, and computed tomography: An update for the U.S. Preventive Services Task Force. Ann Intern Med (2004) 740–753
2. Fry, W.A., Menck, H.R., Winchester, D.P.: The national database report on lung cancer. Cancer 77 (1996) 1947–1955
3. Dodd, L., Wagner, R., Armato III, S., McNitt-Gray, M.F., Beiden, S., Chan, H.P., Gur, D., McLennan, G., Metz, C., Petrick, N., Sahiner, B., Sayre, J., The Lung Imaging Database Consortium Group: Assessment methodologies and statistical issues for computer-aided diagnosis of lung nodules in computed tomography: Contemporary research topics relevant to the Lung Image Database Consortium. Acad Radiol (2004) 462–475
4. Henschke, C.I., Naidich, D.P., Yankelevitz, D.F., McGuinness, G., McCauley, D.I., Smith, J.P., Libby, D., Pasmantier, M., Vazquez, M., Koizumi, J., Flieder, D., Altorki, N., Miettinen, O.S.: Early Lung Cancer Action Project: Initial findings on repeat screenings. Cancer 92 (2001) 153–159
5. Suzuki, K., Kusumoto, M., Watanabe, S., Tsuchiya, R., Asamura, H.: Radiologic classification of small adenocarcinoma of the lung: Radiologic-pathologic correlation and its prognostic impact. Ann Thorac Surg (2006) 413–420
6. Armato III, S., McLennan, G., McNitt-Gray, M.F., Meyer, C., Yankelevitz, D., Aberle, D., Henschke, C.I., Hoffman, E.A., Kazerooni, E.A., MacMahon, H., Reeves, A.P., Croft, B.Y., Clarke, L.P., For the Lung Image Database Consortium Research Group: Lung Image Database Consortium: Developing a resource for the medical imaging research community. Radiology, RSNA (2004) 739–748
7. Warfield, S.K., Zou, K.H., Wells, W.M.: Simultaneous truth and performance level estimation (STAPLE): An algorithm for the validation of image segmentation. IEEE Transactions on Medical Imaging 23 (2004) 903–921
8. Ibáñez, L., Schroeder, W., Ng, L., Cates, J., The Insight Software Consortium: The ITK Software Guide, Second Edition. http://www.itk.org (2005)
9. NCI/LIDC: ftp://ncicbfalcon.nci.nih.gov/lidc/lidc release 0001/ (2006)
10. NCI/LIDC: http://ncia.nci.nih.gov (2006)
11. NCI/LIDC: http://imaging.cancer.gov/reportsandpublications/reportsandpresentations/firstdataset (2006)
12. Lorensen, W., Miller, J.V., Padfield, D., Ross, J.C.: Creating models from segmented medical images. In: Medicine Meets Virtual Reality. (2005)
13. Schroeder, W., Martin, K., Lorensen, W.: The Visualization Toolkit. Kitware Inc. (2004)
Detecting Acromegaly: Screening for Disease with a Morphable Model

Erik Learned-Miller1, Qifeng Lu1, Angela Paisley2, Peter Trainer2, Volker Blanz3, Katrin Dedden4, and Ralph Miller5

1 University of Massachusetts, Amherst, Massachusetts, USA {elm, qlu}@cs.umass.edu
2 Christie Hospital, Manchester, UK [email protected], [email protected]
3 Universität Siegen, Germany [email protected]
4 Max-Planck Institut für Informatik
5 University of Kentucky Medical Center, Lexington, Kentucky, USA [email protected]
Abstract. Acromegaly is a rare disorder which affects about 50 of every million people. The disease typically causes swelling of the hands, feet, and face, and eventually permanent changes to areas such as the jaw, brow ridge, and cheek bones. The disease is often missed by physicians and progresses further than it would if it were identified and treated earlier. We consider a semi-automated approach to detecting acromegaly, using a novel combination of support vector machines (SVMs) and a morphable model. Our training set consists of 24 frontal photographs of acromegalic patients and 25 of disease-free subjects. We modelled each subject's face in an analysis-by-synthesis loop using the three-dimensional morphable face model of Blanz and Vetter. The model parameters capture many features of the 3D shape of the subject's head from just a single photograph, and are used directly for classification. We report encouraging results of a classifier built from this training set of real human subjects.
1 Introduction

Figure 1 shows two men of the same age. For most observers, seeing either one of them alone on the street would not prompt any particular reaction. The man on the left might be identified as slightly overweight but otherwise unremarkable, while the man on the right appears perfectly healthy. In fact these men are identical twins. This photograph appeared recently in the New England Journal of Medicine as one of the journal's periodic "Medical Mysteries" with the caption, "Which twin is the patient?" [10]. The man on the left has a condition known as acromegaly, which is difficult for most laypeople, and even for most physicians, to diagnose. Acromegaly is a rare disorder of the endocrine system which affects roughly 50 of every million people in the general population. Early detection is important in treating the disease successfully, but it is often missed because the signs are subtle and the condition is rare.
Fig. 1. These men are identical twins. Which one is sick, and what is his condition? By fitting the 3D morphable face model of Blanz and Vetter [2] to each of these faces (see Figure 2), the system described correctly identified the man on the left as an acromegalic, and the man on the right as healthy. This “Medical Mystery” first appeared in the New England Journal of Medicine [10].
Since many of its symptoms affect facial appearance, the disease can in many cases be detected by specialists (endocrinologists) from a normal frontal photograph. If a patient's appearance suggests presence of the disease, blood tests may be performed to confirm it. Because the tests are expensive and time-consuming, it would clearly be valuable to have an inexpensive and automatic prescreening method. The ultimate goal of our research is to develop an automated, voluntary prescreening system for acromegaly. For example, when obtaining a photograph for a driver's license, one could voluntarily choose to be screened for various conditions such as acromegaly. We believe that such systems could make a significant contribution to early detection (and hence to the timely treatment) of this disease [5]. In this work, we present a new way of combining morphable models [2] and support vector machines for supervised learning. We use this method in a semi-automated approach to detecting the presence of acromegaly. We built a training set of images by taking 24 frontal photographs of acromegalic patients and 25 frontal photographs of disease-free subjects. Because the recognition of acromegaly depends upon subtle features which are difficult to detect locally in the image, we decided upon a global method of modelling that uses information across the entire photograph. In particular, we modelled each subject's face in an analysis-by-synthesis loop using the morphable models of Blanz and Vetter [2]. The parameters of the morphable model capture many features of the three-dimensional shape of a subject's head from just a single photograph and are excellent features with which to classify subjects as either acromegalic or not. We use the parameters recovered for each test face in an SVM to classify a subject as having acromegaly or not. We report encouraging results of a classifier built from the training set of real human subjects.
Fig. 2. A 3D morphable model [2] was adapted in an analysis-by-synthesis loop to match each of the men in Figure 1. Although the models are not perfect replicas of the photographs, they capture important properties of the faces which prove to be effective in distinguishing between patients that do or do not have acromegaly. For example, the 3D model captured the swelling of the nose of the man on the left, a strong indicator of acromegaly which is difficult to detect as an image feature using traditional feature detectors operating on the image. Our classifier correctly classified the man on the left as acromegalic and the man on the right as negative for the disease.
While a previous paper by Huang et al. [7] also used a morphable model in conjunction with SVMs for classification, their approach to combining these two tools differs significantly from our own. In that paper, the morphable model is used to generate additional sample images of a subject's face under varying lighting conditions and poses. These additional images can improve recognition accuracy in tasks like subject identification. Unlike our work, the previous paper uses the SVM to analyze these synthetic images of subjects' faces. In our approach, the SVM operates on the parameters of the morphable model directly. This is a key point, as the parameters of the morphable model, being natural parameters of face shape variation, are likely to be more discriminative than the pixel-valued parameters used in the previous work. A separate paper [3] uses model parameters directly for subject identification, but with a nearest neighbor classifier. Thus, to the best of our knowledge, this paper is the first to use SVM classification directly on morphable model parameters. In addition, it is the first paper to use such techniques in medical diagnosis.
2 The Training Data

Because acromegaly is uncommon, photographs of acromegalics are not readily available. Thus, a major portion of the effort in this work was in acquiring a database of images of 24 acromegaly patients, and a matching set of images from 25 healthy subjects. Three acromegalics from our database are shown in Figure 3. The leftmost patient shows clear signs of the disease. Features such as a large jaw, protruding brow, frontal bossing (protrusion of the forehead), swollen nose, prominent cheekbones, enlarged lips, and prominent naso-labial folds (creases in the skin of the cheek) make such a patient a strong candidate for further evaluation by an endocrinologist.
Fig. 3. Examples of acromegalics from our database. The symptoms decrease in severity from left to right. (See text.)
While no one of these symptoms alone would indicate acromegaly, taken together they are a strong indication of disease. The middle patient in the figure still has clear, although more subtle, signs of acromegaly. He is an example of a patient that would likely not be detected by a lay person, and might often be missed by a physician who is not an acromegaly specialist. The patient has an enlarged jaw, mild naso-labial folds, and slight enlargement of the cheekbones. On the right of the figure is a patient with very minor, if any, signs of the disease visible even to the expert. It seems unlikely that any visual test will diagnose the most difficult cases; our goal rather is to flag patients in the second category. One of the dangers in building a binary classifier is that details of the image acquisition environment will be leveraged by the classifier to "recognize" a condition such as acromegaly. That is, if some simple feature of the acquisition environment differs between one class and another, this may be used improperly to aid the classification. This was an especially important concern for us since we acquired images of acromegalics in one location and images of normals in another location. To minimize the probability that some exogenous factor affected the classification performance, we developed a protocol for taking photos. Our protocol specified the camera make and model, the background for the photograph, the expression of the subject (relaxed and neutral expression, including a closed mouth and open eyes), the orientation of the patient (front-facing), and the general lighting conditions. Ten pictures of each patient were taken so that images with closed eyes, accidental non-neutral facial expressions, blur due to movement, and other anomalies could be removed. One of the defect-free photographs of each subject was chosen manually for inclusion in the final database. The color and texture of a patient's face were not used directly in the final classification. This minimizes two important effects which could cause biases in the classification: that of lighting, which is difficult to control carefully, and that of skin tone, which happens to be biased slightly toward darker tones in our set of acromegalics relative to our set of normals. There are three properties of our database that are not ideal, and which we plan to address in the near future. As noted above, all of the acromegalic photos were taken in one locale and all of the normals in another.
Despite our precautions, this could possibly lead to hidden biases in the classifier. Another issue is that the photos of acromegalics were taken somewhat closer, or with a slightly different zoom, on average, than the photos of normals. While the morphable model software explicitly incorporates estimates of distance and zoom into its estimate of 3D shape (minimizing the effect of differences in perspective or fish-eye distortion), these are nevertheless systematic differences in the database which should be eliminated in the future. Finally, our database of normals consists only of white males, while our acromegalics are a significantly more diverse population. While we do not believe that these issues had a significant impact on the classifier, we plan to remedy them by improving the database in the future. Next, we turn to the task of modelling the faces in our database.
3 Modelling Faces

Many of the symptoms of acromegaly are difficult to capture using traditional local image features such as edges and image derivatives. Symptoms such as swelling of the nose and lips, protrusion of the brow and cheekbones, and growth of the jaw are very difficult to detect locally. Our initial work on this project focussed on measuring distances between various facial landmarks and computing the relative size of landmarks not correlated with disease, such as iris diameter, and landmarks like jaw width, that would be expected to be larger, on average, given that the patient had the disease. One problem with this approach is choosing landmarks that lead to consistent measurements of a patient's face. It is difficult to define in a repeatable fashion measurements such as the "width of the jaw". Developing software to do this automatically is even more difficult. Another problem with the landmark method is that it does not use all of the information available in the photograph. A patient's nose may be a normal width, according to a landmark-based specification, and yet it may be clear to any observer that the patient's nose is actually swollen. Thus, landmark-based methods may not capture a significant symptom which is obvious to an observer. It became clear that a system which could model the true three-dimensional shape of each subject's head could alleviate many of these problems. The 3D morphable models of Blanz and Vetter [2] seemed to be an ideal tool to tackle such a problem. In previous work, Blanz and Vetter developed a linear statistical model of the 3D geometry and texture (or surface color) of human heads from a set of 3D Cyberware (TM) laser scans of 200 individuals (100 male and 100 female). In the morphable face model, facial surface data that were recorded with the laser scanner as a triangular mesh are represented in shape vectors that combine the x, y, and z coordinates of all vertices: $v = (x_1, y_1, z_1, \ldots, x_n, y_n, z_n)^T \in \mathbb{R}^{3n}$. Sampled at a spacing of less than one millimeter, each surface is represented by n = 75972 vertices.
Linear combinations of shape vectors will only produce realistic novel faces if corresponding points, such as the tip of the nose, are represented by the same vector components across all individual shape vectors. This is achieved by establishing dense correspondence between different scans, and forming the vectors $v_i$ in a consistent way. Along with shape, the morphable face model also represents texture, but texture is discarded in our work here. There are two key features of the morphable model work that made it an ideal tool for our application. First, the 3D head model represents shape variations in terms of the common modes (principal components) of shape deformation of healthy human heads. Each parameter of the geometry part of the model describes a common mode of variation in (densely) registered human heads. While parameters such as jaw size are not explicitly coded into the model, the statistical model must be able to represent such variability as linear combinations of its parameters in order to achieve good approximations of the subjects in the original Cyberware scan database. That is, as long as there were some significant relative differences in jaw size among the initial subjects, the model would need to encode this variability in some form to make good fits. We hypothesized that the analysis of these sorts of natural parameters of face variation would allow us to sort subjects into groups of acromegalics and healthy patients. The second appealing feature of the morphable model work is that its developers showed how an analysis-by-synthesis method could produce the approximate 3D shape and texture of a person's head from a single photograph. After a manual initialization process (described below), a fully automatic procedure adapts the parameters of the morphable model until a rendered image of the model matches a given photograph as closely as possible, under a soft constraint that makes the parameters of the resulting 3D head as likely as possible under the statistical morphable model. That is, the analysis-by-synthesis loop strives to minimize an error of the form

$$E(\alpha) = \sum_{x,y} \left\| I_{\mathrm{input}}(x, y) - I_{\mathrm{model}}(x, y; \alpha) \right\|^2 + f(\alpha),$$
where α is a vector of parameters estimated for the particular head under consideration and f(α) is the negative log likelihood of the parameters under the original linear statistical model of heads. We used the software of Blanz and Vetter to develop 3D models from our database photographs. To initialize the model fitting process, the user clicks on a number (seven to 12) of feature points in the image and the corresponding features on the 3D model using an interactive tool. These point correspondences are enforced in the first iterations of the algorithm; their weight is gradually reduced to zero during the optimization. The pose, illumination, shape, and texture coefficients are all set to default values at the beginning of the fitting process. Additional details of the initialization and fitting process are described elsewhere [3]. In addition to global estimates of head shape, a secondary procedure was used in which smaller parts of the face were extracted and estimated separately. This procedure gives greater detail and accuracy for the following facial regions, or groups of regions: the nose; the eyes, eyebrows, and brow; the mouth; and
all other features, such as the cheeks, chin, forehead, ears, and neck. Thus, in addition to a single global geometric shape estimate for each head, there were four additional “parts” estimates, yielding a total of five individual geometric 3D shape estimates per face. In our experiments, these additional shapes improved the performance of the classifier significantly.
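As an illustration of the two components just described, the following sketch shows the linear shape model and the form of the fitting energy E(α). It is a sketch under stated assumptions: the renderer is a hypothetical stand-in, and f(α) is written as a diagonal-Gaussian negative log likelihood, whereas the actual fitting used the software of Blanz and Vetter [2].

import numpy as np

def synthesize_shape(mean_shape, modes, alpha):
    """Linear morphable model: a head in R^{3n} as the average shape plus a
    weighted sum of principal deformation modes (the columns of `modes`)."""
    return mean_shape + modes @ alpha

def fitting_energy(input_image, render, alpha, sigma):
    """E(alpha): squared image mismatch plus f(alpha). Here f(alpha) is an
    assumed diagonal Gaussian prior with per-mode standard deviations sigma;
    `render` stands in for the renderer that images the current 3D model."""
    mismatch = np.sum((input_image - render(alpha)) ** 2)
    f_alpha = 0.5 * np.sum((alpha / sigma) ** 2)
    return mismatch + f_alpha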
4 Experiments
After fitting the morphable model to each face in our database, and to each of the subregions described above, we retrieved 199 "geometry" parameters for each of the five pieces of the head estimate, for a total of 995 parameters per photograph. Parameters measuring texture were discarded, as it was decided in advance that the small benefit they might add would be outweighed by the general increase in variance of the results. We used the first 99 parameters from each part of the geometric head shape estimate, for a total of 495 parameters per head in our final experiments. These parameters represent the components of shape with greatest variability in the original database of Blanz and Vetter. To see how this is used in acromegaly classification, consider a feature such as jaw size. Normal variation in jaw size will be expressed by the variability of some parameter of the original "normal" morphable model. If an acromegalic has an enlarged jaw, then fitting the morphable model will give a large coefficient for this particular parameter. By examining these coefficients over the full set of acromegalics we can detect trends in these unusually large coefficients and select them as features for classification, rather than as simply statistical outliers. With only 24 examples from the acromegalic class and 25 examples from the normal class, we needed to mitigate overfitting as much as possible. As a result we decided on a leave-one-out classification paradigm, in which we train on all examples except one, and then use the trained classifier to classify the remaining single sample. It is important to note that we are not using leave-one-out cross-validation [4], across multiple testing points, to train our classifier. In other words, the classification results on one data point have no influence on the training of the classifier for another data point. To do this would be "fitting to the test data", and would invalidate our results. While the variance of our results may be high due to the small amount of training and test data, we have no reason to believe that we are overfitting or that our results are biased. When working with such a small data set, it is important not only to control the capacity of the classifier being used, but also to limit the number of classifiers tried on the test data. Since we could not afford (given the limited training set size) to set aside a validation set with which to experiment, we chose almost all of the classifier parameters up front. We decided in advance to use 99 parameters from each part of the geometric shape estimate, and to use linear or quadratic SVMs. While we show results (in Figure 4) using varying numbers of model components to suggest a general trend, we chose, before seeing the training data, to select the one with 99 parameters as our final accuracy estimate.
Fig. 4. Leave-one-out classification accuracy of a linear kernel SVM versus the number of morphable model features used. The blue curve shows performance using only the global estimate of the head, while the green curve shows the performance using components from each of the five geometric shape estimates (one global and four parts based estimates).
We did allow ourselves the luxury of experimenting with quadratic and linear kernels, and found that linear kernels performed better. We used the publicly available SVM package, SVM Light [8]. Using a leave-one-out paradigm, we trained an SVM using all but one of the training samples, and tested on the remaining sample. The accuracy with all 495 parameters from the five geometric parts estimates was 85.7%. Interestingly, none of the normals were classified as acromegalics, while seven of the acromegalics were classified as normals. The sensitivity of the test was hence 17/24 ≈ 71%. Five of the seven misclassified patients had very subtle signs of the disease that might easily be missed even by an expert. Two misclassified examples showed obvious signs of the disease; the morphable model for one of these was a poor fit to the patient's photograph, which may explain that error. There are a number of efforts in the medical community to statistically analyze face shape (e.g. [6,1]). These methods require a specialized apparatus to acquire three-dimensional face information; stereo methods [1] and full 3D scans [6] are common methods for such acquisitions. A key feature of our work is that while the original statistical morphable model required 3D laser scans, it can be applied to other databases, like ours, consisting only of regular two-dimensional photographs. Such systems could in theory be widely deployed, analysing standard photographs to screen for a variety of conditions. To make such a system practical, the remaining manual initialization of the morphable model to a photograph will probably need to be automated. While there is an enormous literature on face recognition, it is less common to use face databases to sort images into categories. An example is the classification of faces as male or female [9]. We are not aware of any previous work in trying to identify acromegaly. The visual nature of many of the symptoms and the benefits from early diagnosis [5] make this disease an ideal candidate for this type of analysis.
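The evaluation protocol is simple enough to state precisely. The sketch below reproduces the leave-one-out loop with a linear SVM; we used SVM Light [8], so the scikit-learn classifier here is a stand-in, and the array names are illustrative.

import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC

def leave_one_out_accuracy(X, y):
    """X: (49, 495) morphable-model shape coefficients (5 shape estimates
    x 99 parameters per subject); y: 1 for acromegalic, 0 for healthy.
    Each subject is classified by an SVM trained on all other subjects."""
    correct = 0
    for train_idx, test_idx in LeaveOneOut().split(X):
        clf = SVC(kernel="linear")   # quadratic variant: kernel="poly", degree=2
        clf.fit(X[train_idx], y[train_idx])
        correct += int(clf.predict(X[test_idx])[0] == y[test_idx][0])
    return correct / len(y)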
References
1. Nicola D'Appuzo, "Measurement and modeling of human faces from multi-images," International Archives of Photogrammetry and Remote Sensing, Volume 34, Number 5, Commission V, WG V/6, pages 241–246, 2003.
2. Volker Blanz and Thomas Vetter, "A morphable model for the synthesis of 3D faces." SIGGRAPH '99 Conference Proceedings, 1999.
3. Volker Blanz and Thomas Vetter, "Face recognition based on fitting a 3D morphable model." IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 25, No. 9, pp. 1063–1074, 2003.
4. Richard Duda, Peter Hart, and David Stork. "Pattern Classification." Second edition. John Wiley and Sons, Inc., New York, 2001.
5. Pamela U. Freda, "Advances in the diagnosis of acromegaly." The Endocrinologist, Volume 10, pages 237–244, 2000.
6. Peter Hammond, Tim J. Hutton, Michael A. Patton, and Judith E. Allanson. "Delineation and visualization of congenital abnormality using 3D facial images." Intelligent Data Analysis in Medicine and Pharmacology at MEDINFO 2001, 2001.
7. Jennifer Huang, Volker Blanz, and Bernd Heisele. "Face recognition using component-based SVM classification and morphable models." Proceedings of the 4th International Conference on Audio- and Video-Based Biometric Person Authentication, Surrey, UK, 2003.
8. Thorsten Joachims. "Learning to Classify Text Using Support Vector Machines: Methods, Theory, and Algorithms." Dissertation. Kluwer, 2002.
9. B. Moghaddam and M.-H. Yang. "Learning gender with support faces." IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 24, 5, pp. 707–711, 2002.
10. Pieters Nieuwlaat, "A medical mystery – Which twin is the patient?" New England Journal of Medicine, Vol. 351, page 68, 2004.
A Boosting Cascade for Automated Detection of Prostate Cancer from Digitized Histology

Scott Doyle1, Anant Madabhushi1, Michael Feldman2, and John Tomaszeweski2

1 Dept. of Biomedical Engineering, Rutgers Univ., Piscataway, NJ 08854, USA
2 Dept. of Surgical Pathology, Univ. of Pennsylvania, Philadelphia, PA 19104, USA

Abstract. Current diagnosis of prostatic adenocarcinoma is done by manual analysis of biopsy tissue samples for tumor presence. However, the recent advent of whole slide digital scanners has made histopathological tissue specimens amenable to computer-aided diagnosis (CAD). In this paper, we present a CAD system to assist pathologists by automatically detecting prostate cancer from digitized images of prostate histological specimens. Automated diagnosis on very large high resolution images is done via a multi-resolution scheme similar to the manner in which a pathologist isolates regions of interest on a glass slide. Nearly 600 image texture features are extracted and used to perform pixel-wise Bayesian classification at each image scale to obtain corresponding likelihood scenes. Starting at the lowest scale, we apply the AdaBoost algorithm to combine the most discriminating features, and we analyze only pixels with a high combined probability of malignancy at subsequent higher scales. The system was evaluated on 22 studies by comparing the CAD result to a pathologist's manual segmentation of cancer (which served as ground truth) and found to have an overall accuracy of 88%. Our results show that (1) CAD detection sensitivity remains consistently high across image scales while CAD specificity increases with higher scales, (2) the method is robust to choice of training samples, and (3) the multi-scale cascaded approach results in significant savings in computational time.
1 Introduction

There will be an estimated 234,000 new cases of prostate cancer in the US in 2006, and approximately 27,000 men will die on account of it (Source: American Cancer Society). Trans-rectal ultrasound (TRUS) guided biopsy of the prostate followed by histological analysis under a microscope is currently the gold standard for prostate cancer diagnosis [1]. Up to twenty biopsy samples may be taken from a single TRUS procedure, making manual inspection time-consuming and labor-intensive. Computer-aided diagnosis (CAD), the use of computers to assist clinical diagnosis, has been traditionally applied to radiological images. Madabhushi et al. [2] presented a powerful CAD system to automatically detect prostatic adenocarcinoma from high-resolution prostate MRI studies. The recent advent of high resolution whole slide digital scanners, however, has made histopathology amenable to CAD as well.
In the context of prostate histology, CAD methods have been proposed which utilize image features such as color, texture, and wavelets [3], textural second-order statistics [4], and morphometric attributes [5] to characterize and detect cancer. However, in these studies the image analysis operations are applied at arbitrarily chosen image scales. This is contrary to the multi-scale approach employed by pathologists, who obtain most of the information needed for a definitive diagnosis at the coarser image scales, with the finer or higher scales usually serving to confirm their diagnoses. An effective CAD system to automatically detect prostatic adenocarcinoma should therefore incorporate the spirit of this hierarchical, multi-scale paradigm. In [6] Viola and Jones proposed a computationally efficient "Boosting Cascade" in which the AdaBoost classification algorithm [7] was used to quickly classify image regions using a small number of image features. The process is repeated using an increasingly larger number of image features and an increasing classification threshold at each iteration.
Fig. 1. A multi-scale representation of digitized human prostate histopathology
In this work, we propose a fully automated CAD system to extract and then combine multiple texture features within a Boosting Cascade framework [6] to detect prostatic adenocarcinoma from digitized histology. Pyramidal decomposition [8] is first applied to reduce the image into its constituent scales (Figure 1). At each image scale, we extract nearly 600 texture features at every image pixel. A likelihood scene corresponding to each texture feature is generated, in which the intensity at every pixel represents its probability of malignancy. A Boosting Cascade scheme is used to efficiently and accurately combine the different likelihood scenes at each image scale. Only pixels identified as adenocarcinoma with a pre-determined confidence level at each specific scale are analyzed further at the subsequent higher image scales. The novelty of our work lies in the following: • The method is fully automated and involves extraction of nearly 600 texture features at multiple scales and orientations to discriminate between benign and malignant tissue regions. • The use of a multi-scale classification framework to accurately and efficiently analyze very large digitized specimens (> 2 GB). Hence, only those pixels determined as adenocarcinoma with high probability at a given scale are considered for further analysis at the subsequent higher scales. The rest of this paper is organized as follows. In Section 2 we describe the methodology and in Section 3 we present our results. Concluding remarks and future directions are presented in Section 4.
2 Methodology

2.1 Data Description and System Overview
Human prostate tissue samples cut into 6 μm slices are scanned into the computer at 40× optical magnification. Typical image file sizes were between 1-2 GB. We represent each digitized image by a pair C = (C, f), where C is a 2D grid of image pixels c and f is the intensity at each pixel c ∈ C. The set of image scales for C is denoted as $S(C) = \{C^1, C^2, \cdots, C^n\}$, where n is the total number of image scales and $C^j = (C^j, f^j)$ is the representation of C at scale j, for 1 ≤ j ≤ n. Hence, $C^1$ represents the image at the coarsest scale and $C^n$ at the finest scale. Our methodology is outlined in the flowchart in Figure 2. Digital scenes are acquired from a whole slide digital scanner and are decomposed into n constituent scales using Burt's pyramidal scheme [8]; a sketch of this decomposition is given below. At each scale, feature extraction is performed to create a series of likelihood scenes using Bayes Theorem [9]. An expert pathologist manually segmented cancer regions from S(C) for each of 22 images. During the training stage (off-line), probability density functions (pdf's) for cancer for each of the texture features are generated using the cancer masks determined by the expert. Following feature extraction, Bayesian classification via the feature pdf's is used to generate cancer likelihood scenes for each feature. At each scale j the various likelihood scenes are combined via the AdaBoost algorithm [7]. Only regions determined as cancer at scale j with a pre-specified confidence level are considered for analysis at scale j + 1.

Fig. 2. Outline of our methodology
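A minimal sketch of such a decomposition follows; it uses plain Gaussian smoothing and dyadic subsampling as a stand-in for Burt's scheme [8], and the parameter values are illustrative.

import numpy as np
from scipy.ndimage import gaussian_filter

def image_pyramid(image, n_scales):
    """Decompose an image into S(C) = {C^1, ..., C^n}: repeatedly smooth and
    subsample by two, then order the levels from coarsest (C^1) to finest (C^n)."""
    levels = [np.asarray(image, dtype=np.float64)]
    for _ in range(n_scales - 1):
        smoothed = gaussian_filter(levels[-1], sigma=1.0)
        levels.append(smoothed[::2, ::2])   # keep every other row and column
    return levels[::-1]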
2.2 Feature Extraction

Each image C is first converted from the RGB color space to the HSI space. We obtain a set of K feature scenes $F_\gamma^j = (C^j, g_\gamma^j)$, for γ ∈ {1, 2, ..., K}, from each $C^j \in S(C)$, where for any $c^j \in C^j$, $g_\gamma^j(c^j)$ is the value of feature $\Phi_\gamma$ at scale j and pixel $c^j$. The choice of features was motivated by the textural appearance of prostatic adenocarcinoma at the 3 scales ($C^1$, $C^2$, $C^3$) considered for analysis. A total of 594 texture features from the following three classes of texture operators were extracted. First-Order Statistics: A total of 117 first-order statistical features, corresponding to average, median, standard deviation, difference, derivatives along the X, Y, and Z axes, 3 Kirsch filter features, and 3 Sobel filter features, were extracted from each image at three different pixel neighborhood sizes (3 × 3, 5 × 5, 15 × 15).
Co-occurrence Features: A total of 117 Haralick features [10], corresponding to angular second moment, contrast, correlation, variance, inverse difference moment, entropy, sum average, sum variance, sum entropy, difference variance, difference entropy, and two measurements of correlation, were extracted for three different pixel neighborhoods (3 × 3, 5 × 5, 7 × 7). Wavelet Features: The phase and orientation values of the result of applying a family of 360 Gabor filters were obtained at every image pixel [2]. A Gabor wavelet is a Gaussian function modulated by a sinusoid [11]. The modulating function G for the family of 2D Gabor filters is given as

$$G(x, y, \theta, \kappa) = e^{-\frac{1}{2}\left[ \left( \frac{x'}{\sigma_x} \right)^2 + \left( \frac{y'}{\sigma_y} \right)^2 \right]} \cos(2\pi\kappa x'), \quad (1)$$

where $x' = x\cos(\theta) + y\sin(\theta)$, $y' = y\cos(\theta) + x\sin(\theta)$, κ is the filter scale factor, θ is the filter phase, $\sigma_x$ and $\sigma_y$ are the standard deviations along the X and Y axes, and x and y are the 2D Cartesian coordinates of each image pixel. We convolved the Gabor kernel with the image at 3 pixel neighborhood sizes (3 × 3, 5 × 5, 15 × 15), using five different scale parameter values κ ∈ {0, 1, ..., 4} and eight orientation parameter values $\theta = \ell\pi/8$, where $\ell \in \{0, 1, \cdots, 7\}$. Some representative feature images for the digitized histopathological image in Figure 3(a) are shown in Figures 3(b)-(f).
Fig. 3. (a) Original digitized prostate histopathological image with the manual segmentation of cancer overlaid (black contour), and 5 feature scenes generated from (a), corresponding to (b) correlation (7 × 7), (c) sum variance (3 × 3), (d) Gabor filter (θ = 5π/8, κ = 2, 3 × 3), (e) difference (3 × 3), and (f) standard deviation (15 × 15)
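A sketch of the Gabor kernel of Eq. (1) is given below. Note that the standard image-axis rotation convention uses y′ = y cos(θ) − x sin(θ), which is what the sketch implements; the σ values are free parameters not fixed above.

import numpy as np

def gabor_kernel(size, theta, kappa, sigma_x, sigma_y):
    """Gabor kernel per Eq. (1): a Gaussian envelope times a cosine carrier
    along the rotated x-axis. size is the odd neighborhood width (3, 5, or 15);
    theta = l*pi/8 for l in {0,...,7}; kappa in {0,...,4}."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = y * np.cos(theta) - x * np.sin(theta)   # standard rotation convention
    envelope = np.exp(-0.5 * ((x_r / sigma_x) ** 2 + (y_r / sigma_y) ** 2))
    return envelope * np.cos(2.0 * np.pi * kappa * x_r)

The kernel can then be convolved with each image channel, e.g. with scipy.ndimage.convolve, to produce one feature scene per (θ, κ, neighborhood) combination.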
2.3 Training and Determining Cancer Ground Truth
Ground truth for the cancer class was generated by an expert pathologist who manually traced cancer regions on the digitized images at each image scale. The set of pixels marked by the pathologist as ground truth is denoted $E(C^j)$ at scale j. The feature values $g_\gamma^j$ of pixels $c^j \in E(C^j)$ are used to generate pdf's $p_\gamma^j(c^j, g_\gamma^j | \omega_T)$ at each scale for the cancer class ($\omega_T$), for each texture feature $\Phi_\gamma$. In this study, we use 3 images to generate the pdf's. Figure 4 shows pdf's for 3 different texture features for the cancer and non-cancer classes at the lowest image scale (j = 1).
0.05
0.05
0.045
0.045
0.03
0.025
0.04
0.04
0.035
0.035
0.03
0.03
0.025
0.025
0.02
0.015
0.01
0.02
0.02
0.015
0.015
0.01
0.01
0.005 0.005 0
0
50
100
150
200
250
300
0
0.005
0
50
(a)
100
150
(b)
200
250
300
0
0
50
100
150
200
250
300
(c)
Fig. 4. Pdf's for cancer (red dots) and non-cancer regions (black circles) corresponding to (a) Gabor filter (θ = 6π/8, κ = 4, 15 × 15), (b) difference entropy (3 × 3), and (c) correlation (3 × 3) at the lowest image scale j = 1
2.4 Feature Classification
For each scene C = (C, f), Bayes Theorem [9] is employed to obtain a series of likelihood scenes $L_\gamma = (C, l_\gamma)$, for γ ∈ {1, 2, ..., K}, where for each pixel c ∈ C, $l_\gamma(c)$ is the posterior conditional likelihood $P(\omega_T | c, g_\gamma)$ that c belongs to the cancer class $\omega_T$ given feature value $g_\gamma(c)$. Using Bayes Theorem [9], the posterior conditional probability that c belongs to $\omega_T$ is given as

$$P(\omega_T | c, g_\gamma) = \frac{P(\omega_T)\, p_\gamma(c, g_\gamma | \omega_T)}{\sum_{v \in \{T, NT\}} P(\omega_v)\, p_\gamma(c, g_\gamma | \omega_v)}, \quad (2)$$

where $\omega_{NT}$ denotes the non-cancer class, $p_\gamma(c, g_\gamma | \omega_T)$ is the a priori conditional density obtained during training via the pdf for feature $\Phi_\gamma$, and $P(\omega_T)$ and $P(\omega_{NT})$ are the prior probabilities of occurrence for the two classes (cancer and non-cancer), assumed to be non-informative priors ($P(\omega_T) = P(\omega_{NT}) = 0.5$).
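A sketch of this pixel-wise rule follows, assuming the trained pdf's are stored as normalized histograms indexed by quantized feature value; the names and the quantization are illustrative.

import numpy as np

def likelihood_scene(feature_scene, pdf_cancer, pdf_noncancer, p_cancer=0.5):
    """Eq. (2) applied at every pixel: posterior P(cancer | feature value)
    with non-informative priors P(cancer) = P(non-cancer) = 0.5."""
    bins = np.clip(feature_scene.astype(int), 0, len(pdf_cancer) - 1)
    num = p_cancer * pdf_cancer[bins]
    den = num + (1.0 - p_cancer) * pdf_noncancer[bins]
    return num / np.maximum(den, 1e-12)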
2.5 Feature Combination and the Boosting Cascade
We employ a hierarchical version of the well-known classification ensemble scheme AdaBoost [7] to create a single, strong classifier from 594 likelihood scenes, or base learners. The method comprises two steps: Training and Testing.

Training. We generate a Boosted classifier $\Pi^j = \sum_{i=1}^{I^j} \alpha_i^j l_i^j$ at each image scale j, where for every pixel $c^j \in C^j$, $\Pi^j(c^j)$ is the combined likelihood that pixel $c^j$ belongs to class $\omega_T$, $\alpha_i^j$ is the feature weight determined during training for base learner $L_i$, and $I^j$ is the number of iterations used to train the AdaBoost
algorithm. We used $I^j < I^{j+1}$, since additional discriminatory information is incorporated into the classifier only at higher scales. Three images randomly chosen from our database were used for training the Boosted classifier.

Testing. At scale j we create a combined likelihood scene $L^j = (C^j, \Pi^j)$. Using $L^j$, a binary scene $C^{j,B} = (C^j, f^{j,B})$ is created, where for $c^j \in C^j$, $f^{j,B}(c^j) = 1$ iff $\Pi^j(c^j) > \delta^j$, where $\delta^j$ is a predetermined threshold. We then resize $C^{j,B}$ to obtain $C^{j+1,B} = (C^{j+1}, f^{j+1,B})$. The feature extraction and Bayesian classification steps are then repeated to obtain Boosted classifier $L^{j+1}$, considering only those pixels $c^j$ in $C^{j,B}$ for which $f^{j,B}(c^j) > 0$. The Boosting Cascade algorithm is shown below.

Algorithm BoostingCascade()
Input: Image pyramid S(C), ground truth for cancer E(C), number of pyramidal levels n, set of predetermined thresholds δ^j
Output: Set L of binary cancer segmentations at each scale
begin
  0. for j = 1 to n do
  1.   Obtain combined likelihood scene L^j for C^j via AdaBoost [7];
  2.   Obtain tumor mask C^{j,B} by thresholding L^j at δ^j;
  3.   Obtain C^{j+1,B} by interpolating C^{j,B} onto the grid of C^{j+1};
  4.   for each c^{j+1} in C^{j+1,B} do
  5.     if f^{j+1,B}(c^{j+1}) < 1 then f^{j+1}(c^{j+1}) = 0;
  6.   endfor
  7.   L[j] = {L^j};
  8. endfor
  9. Output L;
end
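A compact sketch of one pass through steps 1-5 follows; the dyadic nearest-neighbor upsampling is an assumption, since the interpolation used is not stated above.

import numpy as np

def cascade_level(likelihood_scenes, alphas, delta, active_mask):
    """One cascade level: form Pi^j as the alpha-weighted sum of base-learner
    likelihood scenes, zero out pixels rejected at coarser scales, threshold
    at delta^j, and upsample the surviving mask for scale j + 1."""
    combined = sum(a * l for a, l in zip(alphas, likelihood_scenes))
    combined = combined * active_mask        # only pixels still under analysis
    tumor_mask = combined > delta
    next_active = np.repeat(np.repeat(tumor_mask, 2, axis=0), 2, axis=1)
    return combined, tumor_mask, next_active.astype(np.float64)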
3 Results and Discussion
Figure 6(a) shows the ROC curves for our CAD system obtained by evaluating all 22 images in our database at 3 different scales. The increase in ROC area at higher scales corresponds to an increase in specificity, further confirming that information at higher scales is necessary to achieve a more accurate classification. Figure 6(b) shows the ROC curves for a subset of testing images that were trained using 3 training sets comprising 3, 5, and 8 images respectively. As can be observed, the curves have similar areas, indicating that CAD is robust with respect to training. Figure 6(c) is a bar chart showing the comparative computational savings obtained by using the Boosting Cascade scheme at each image scale. As might be expected, the savings are greater at the higher scales: an 8-fold savings at j = 3. The system was evaluated on a total of 22 separate patient studies. CAD performance was evaluated in terms of (i) accuracy, (ii) precision (robustness to training), and (iii) computational complexity. Accuracy was evaluated via the receiver operating characteristic (ROC) curve [2]. Figure 5 shows qualitative results for 3 images in our database.
Fig. 5. (a), (f), (k) Digital histopathological prostate studies. (b), (g), (l) Tumor masks corresponding to the studies shown in (a), (f), and (k). Corresponding combined likelihood scenes at scale j = 1 ((c), (h), (m)), j = 2 ((d), (i), (n)), and j = 3 ((e), (j), (o)). Note the increase in detection specificity at higher scales.
Fig. 6. (a) Average ROC curve for all 22 studies in our database at scales j = 1 (solid line), j = 2 (dotted line), and j = 3 (dot-dashed line). The increase in ROC area at high scales demonstrates the increase in CAD detection specificity at high image resolutions. (b) ROC curves obtained for a subset of testing images trained using 3 (dot-dashed line), 5 (dotted line), and 8 (solid line) images. The similarity of the 3 ROC curves indicates that CAD is robust to training. (c) Computation times (in minutes) for CAD at each image scale with (gray bar) and without (black bar) the Boosting Cascade.
Figures 5(a), (f), and (k) show the original prostate images at scale j = 3. Figures 5(b), (g), and (l) show the corresponding ground truth for cancer (black contour). Figures 5(c)-(e), (h)-(j), and (m)-(o) show the combined likelihood scenes for the images shown in (a), (f), and (k) at scales j = 1 ((c), (h), (m)), j = 2 ((d), (i), (n)), and j = 3 ((e), (j), (o)). These images show that the integration of additional discriminatory information at higher scales (higher resolution) increases the CAD detection specificity.
4 Conclusions and Future Work
In this work, we have presented a novel fully automated CAD system that integrates nearly 600 texture features extracted at multiple scales and orientations into a hierarchical multi-scale framework to automatically detect adenocarcinoma from prostate histology. To the best of our knowledge this work represents the first attempt to automatically analyze histopathology across multiple scales (similar to the approach employed by pathologists) as opposed to selecting an arbitrary image scale [3]-[5]. Further, the use of a multi-scale framework allows for efficient and accurate detection of prostatic adenocarcinoma. At the higher scales, our hierarchical classification scheme resulted in an 8-fold savings in computation time. Also, while CAD detection sensitivity was consistently high across image scales, detection specificity was found to increase at higher scales. While the CAD system was trained using only 3 images, the inclusion of additional training data did not significantly change CAD accuracy, indicating robustness to training. In future work, we intend to incorporate additional morphological and shape-based features at the finer scales and to quantitatively evaluate our CAD technique on a much larger database of prostate histopathological images.
References

1. Matlaga, B., Eskew, L., McCullough, D.: Prostate biopsy: indications and technique. The Journal of Urology 169:1 (2003) 12–19
2. Madabhushi, A., et al.: Automated detection of prostatic adenocarcinoma from high resolution ex-vivo MRI. IEEE Trans. on Med. Imaging 24:12 (2005) 1611–1625
3. Wetzel, A.W., et al.: Evaluation of prostate tumor grades by content based image retrieval. Proc. of SPIE Annual Meeting 3584 (1999) 244–252
4. Esgiar, A.N., et al.: Microscopic image analysis for quantitative measurement and feature identification of normal and cancerous colonic mucosa. IEEE Trans. on Information Tech. in Biomedicine 2:3 (1998) 197–203
5. Tabesh, A., et al.: Automated prostate cancer diagnosis and Gleason grading of tissue microarrays. Proc. of the SPIE 5747 (2005) 58–70
6. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. IEEE Conf. Comp. Vision and Pattern Recog. 1 (2001) 511–518
7. Freund, Y., Schapire, R.: Experiments with a new boosting algorithm. Proc. of the International Conf. on Machine Learning (1996) 148–156
8. Adelson, E.H., Burt, P.J.: Image data compression with the Laplacian pyramid. Proc. of Pattern Recog. and Inf. Proc. Conf. (1981) 218–223
9. Duda, R.O., Hart, P.E.: Pattern Classification and Scene Analysis. Wiley (1973)
10. Haralick, R.M., Shanmugan, K., Dinstein, I.: Textural features for image classification. IEEE Trans. on Systems, Man, and Cybernetics SMC-3 (1973) 610–621
11. Manjunath, B.S., Ma, W.Y.: Texture features for browsing and retrieval of image data. IEEE Trans. Pattern Anal. Machine Intell. 2 (1996) 837–842
Optimal Sensor Placement for Predictive Cardiac Motion Modeling

Qian Wu, Adrian J. Chung, and Guang-Zhong Yang

Department of Computing, Imperial College London
{q.wu, a.chung, g.z.yang}@imperial.ac.uk
http://vip.doc.ic.ac.uk
Abstract. Subject-specific physiological motion modeling combined with low-dimensional real-time sensing can provide effective prediction of acyclic tissue deformation, particularly due to respiration. However, the real-time sensing signals used for predictive motion modeling can be strongly coupled with each other but poorly correlated with respiratory induced cardiac deformation. This paper explores a systematic framework based on sequential feature selection for optimal sensor placement so as to achieve maximal model sensitivity and prediction accuracy in response to the entire range of tissue deformation. The proposed framework effectively resolves the problem encountered by traditional regression methods in that the latent variables from both the input and output of the regression model are used to establish their inner relationships. Detailed numerical analysis and in vivo results are provided, which demonstrate the potential clinical value of the technique.
1 Introduction

The management of acyclic tissue deformation due to voluntary and involuntary motion of the patient plays an important role in high-resolution imaging, intensity modulated radiation therapy, and intra-operative guidance. Despite the steady advances in imaging techniques, in situ motion adaptation based on real-time 3D imaging remains a challenge, largely due to the prohibitive bandwidth and processing requirements. For this reason, current research is focused on using a priori subject-specific physiological motion modeling combined with low-dimensional real-time sensing to provide effective deformation prediction. In cardiovascular MR, for example, the use of diaphragmatic navigators is now a popular choice for coronary imaging and vessel wall imaging [1]. The strength and flexibility of cardiovascular MR in providing in situ respiratory motion measurements has also permitted the development of a number of more advanced techniques for adapting k-space data acquisition to different levels of respiratory induced motion. Despite its success, however, the technique is limited to intra-modality motion management and is difficult to generalize beyond the MR environment. Nevertheless, the method has demonstrated the value of combining in situ sensing (in this case the movement of the dome of the diaphragm) and a priori subject-specific modeling for real-time 3D deformation correction. Thus far, extensive research has also been directed to the use
of complementary MR-compatible sensing methods to perform real-time measurements of surface distortions, particularly due to respiration. The strength of these techniques is that they do not require the interleaved acquisition of navigators, and thus significantly simplify the pulse sequence design. Furthermore, they are easily transportable to other imaging and therapeutic modalities. Fig. 1 illustrates a typical setup of using MR-compatible surface deformation sensing (in this case by using an optical tracker with stereo IR reflectors) for predictive motion modeling. The a priori subject-specific motion modeling involves the imaging of the anatomical structure in different deformation states combined with in situ real-time measured sensor outputs. Registration based on free-form deformation or biomechanical modeling can be used to recover the underlying spatio-temporal deformation of the anatomical structure, followed by the estimation of the causality between tissue deformation and real-time sensor signals. At the prediction stage, real-time sensor signals are used to directly predict tissue deformation, thus avoiding the need for further 2D/3D imaging during motion adapted image acquisition or therapy. Such a setup is particularly useful for motion adaptation and attenuation correction of PET imaging, which can eliminate the need for a separate and often lengthy PET transmission scan, as well as providing noiseless attenuation correction factors that cannot be achieved by using the current PET approaches.
Fig. 1. A systematic illustration of predictive cardiac motion modeling incorporating optimal sensor placement. Not all sensors are sensitive to motion prediction and collinearity is common. The aim of optimal sensor placement is to determine a minimal subset of sensors (shaded) that provides the maximum information content for motion prediction, thus concurrently improving the prediction accuracy and minimizing the sensor usage.
Existing research has shown that surface measured signals for predictive motion modeling can be strongly coupled with each other but poorly correlated with respiratory induced cardiac deformation [2]. This can create significant problems for recovering the inherent model that explains the causality between respiratory motion and surface deformations. Another challenge associated with the technique is in
optimal sensor placement such that given the number of sensor channels used, the placement of the sensors can provide the maximal information content and model sensitivity to the entire range of tissue deformation. Thus far, the issue of inter-sensor coupling and poor motion correlation has been addressed numerically by the effective use of latent vectors with partial least squares regression or kernel based non-linear regression techniques. Limited research, however, has been directed towards the effect of sensor placement on the accuracy of the motion prediction result. The purpose of this paper is to develop a systematic framework for optimal sensor placement for predictive cardiac motion modeling based on sequential feature selection. We demonstrate that with the proposed method, the number of sensors used can be effectively minimized whilst the prediction accuracy is improved.
2 Method

2.1 Optimal Sensor Placement

The purpose of optimal sensor placement is to construct and select subsets of features (sensor traces) that can be used to build a good predictor. The nature of the problem is akin to feature selection in machine learning for reducing the complexity of an induction system by eliminating irrelevant and redundant features. For regression learning systems, which directly predict continuous values, the task of feature selection can be more difficult because evaluation criteria based on the properties of discrete classes are inappropriate. Given a prediction model, the most crucial task is to establish an appropriate feature evaluation criterion to compare feature subsets. For regression, classical criteria are prediction error measures such as RMSEP (root mean square error in prediction) [3] and the RReliefF algorithm proposed by Robnik-Šikonja [4] for estimating the quality of attributes in regression problems. Given candidate features {x_l}, l = 1, 2, ..., M, the feature subset space can be denoted as Ω_X, which is composed of all possible feature combinations X_j. If we denote the predictive function as PR(·), the estimated cardiac deformation is Ŷ_j = PR(X_j) when the input is X_j. The predicted volume Î_j at a certain respiratory position can then be obtained by warping the 3D volume with the predicted Ŷ_j. When the chosen feature subset is X_j, the prediction error measure RMSEP is defined as
\[ \mathrm{RMSEP}_j = \left[ \frac{1}{N} \sum_{i=1}^{N} \big( \hat{I}_j^i - I^i \big)^2 \right]^{1/2} \tag{1} \]
where N is the number of pixels (or voxels) in the region of interest (ROI) used for comparison, I is the actually acquired image, and I^i and Î_j^i represent each pixel of the true and estimated images, respectively. The optimal feature subset X_op can then be chosen as
\[ X_{op} = \arg\min_{X_j \in \Omega_X} \mathrm{RMSEP}_j \tag{2} \]

and we denote the number of features in X_op as [X_op].
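As a concrete illustration, the following is a minimal sketch of sequential forward selection driven by the RMSEP criterion of equations (1)-(2); the rmsep function (returning the prediction error for a candidate subset) and all names are assumptions for illustration, not the authors' implementation.

```python
def sequential_forward_selection(n_features, rmsep):
    """Greedy SFS: grow the subset one trace at a time, keeping the
    candidate that minimizes RMSEP; stop when no addition improves it.

    rmsep(subset) is assumed to train the predictor on the given
    sensor subset and return its (cross-validated) prediction error.
    """
    selected, best_err = [], float("inf")
    remaining = set(range(n_features))
    while remaining:
        cand, cand_err = min(
            ((f, rmsep(selected + [f])) for f in remaining),
            key=lambda t: t[1])
        if cand_err >= best_err:        # no further improvement
            break
        selected.append(cand)
        remaining.discard(cand)
        best_err = cand_err
    return selected, best_err
```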
Once the evaluation criterion of feature selection has been determined, the next step is to choose a search procedure to explore the space Ω_X. Since the evaluation criteria are not monotonic, comparison of feature subsets generally amounts to a combinatorial problem (there are 2^M − 1 possible subsets for M features), which rapidly becomes computationally impractical even for moderate input sizes. The Branch and Bound (BB) [5] feature selection algorithm can be used to reduce the exhaustive search, but it is only feasible with monotonic criteria. Furthermore, the complexity of BB is still prohibitive for problems with very large feature sets. For these reasons, the following sub-optimal search strategies can be used:

• Sequential forward selection (SFS): starts with an empty set of features and adds features to the already selected feature set one at a time according to the evaluation criterion [6].
• Sequential backward selection (SBS): starts with the full set of features and eliminates features from the selected feature set one at a time [7].
• Sequential forward floating selection (SFFS): includes new features by applying the basic SFS procedure starting from the current feature set, followed by a series of successive conditional exclusions of the worst feature in the newly updated set [8].
• Sequential backward floating selection (SBFS): excludes features by applying the basic SBS procedure starting from the current feature set, followed by a series of successive conditional inclusions of the most significant feature from the available features [8].

Floating search (SFFS and SBFS) is designed to prevent the nesting of feature subsets and to flexibly change the fixed values of l and r in the Plus l – Take away r algorithm so as to approximate the optimal solution as closely as possible.

2.2 Predictive Motion Modeling

To recover cardiac deformation and establish its intrinsic correlation with real-time measurable surface signals, 3D image volumes depicting different stages of the cardiac deformation due to respiration are used. In this study, predictive motion modeling and correction is achieved by partial least squares regression (PLSR). Specifically, PLSR searches for a set of components (called latent vectors or score vectors) that perform a simultaneous decomposition of X and Y with the constraint that these components explain as much as possible of the covariance between X and Y. This is followed by a regression step where the decomposition of X is used to predict Y. The simultaneous decomposition of X and Y is
\[ X = T P^{T} + E, \qquad Y = U Q^{T} + F \tag{3} \]
where T and U are (N × k) matrices of the k extracted score vectors, the (p × k) matrix P is the factor loading matrix, the (q × k) matrix Q is the coefficient loading matrix, and the (N × p) matrix E and the (N × q) matrix F are residuals. PLSR
tries to find a score vector t in the column space of X and a score vector u in the column space of Y, with the constraints ||w|| = 1 and ||q|| = 1, such that
\[ t = X w, \qquad u = Y q \tag{4} \]
so as to give the maximal squared covariance (u^T t)^2:

\[ \max[\operatorname{cov}(t, u)]^2 = \max_{\|w\|=\|q\|=1} [\operatorname{cov}(Xw, Yq)]^2 \tag{5} \]
Rather than linking measurements X and Y directly, PLSR tries to establish the inner relationships between T and U . A linear model is assumed to relate the score vectors
\[ U = T D + H \tag{6} \]
where D is a diagonal matrix that has the regression weights as the diagonal elements, and H denotes residuals similar to E and F above. This allows Y to be modeled by T and Q as
\[ Y = T D Q^{T} + F^{*} \tag{7} \]
Numerically, the classical solution to the equations presented above is based on the nonlinear iterative partial least squares (NIPALS) algorithm [9].
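For reference, a compact sketch of one NIPALS iteration for extracting a single pair of PLS score vectors is given below; this is a generic textbook formulation under the assumption of centered data matrices, not the authors' exact implementation.

```python
import numpy as np

def nipals_component(X, Y, tol=1e-10, max_iter=500):
    """Extract one pair of PLS score vectors (t, u) with weights w, q
    from centered data matrices X (N x p) and Y (N x q) via NIPALS."""
    u = Y[:, [0]]                      # initialize with a column of Y
    for _ in range(max_iter):
        w = X.T @ u                    # X weights
        w /= np.linalg.norm(w)
        t = X @ w                      # X scores
        q = Y.T @ t
        q /= np.linalg.norm(q)
        u_new = Y @ q                  # Y scores
        if np.linalg.norm(u_new - u) < tol:
            u = u_new
            break
        u = u_new
    return t, u, w, q

# For subsequent components, deflate with the loading p = X't / t't:
# p = X.T @ t / (t.T @ t); X -= t @ p.T; Y -= t @ (t.T @ Y) / (t.T @ t)
```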
2.3 Validation

To extract subject-specific respiratory induced cardiac deformation patterns, ten normal subjects were recruited and underwent MR imaging on a Siemens Sonata 1.5T MR scanner. After scout scans, a segmented 3D TrueFISP sequence was used to acquire short-axis image volumes in free breathing with navigator echoes and oversampling. The 3D stack comprised 14 short axis slices from the valve plane to the apex throughout the full respiratory range. Data acquisition was repeated 20 times for a total acquisition duration of 560 cardiac cycles. The imaging parameters used include an RF flip angle of 65°, an in-plane matrix size of 256×102, a voxel size of 1.56mm × 2.70mm × 8.78mm, and a FOV of 400mm × 275mm × 123mm. All 3D datasets were generated at different respiratory positions by retrospective respiratory gating. In order to reconstruct a 3D data set for a given respiratory position, each segment contributing to the data set was chosen from the over-sampled data when the accompanying navigator echo yielded a diaphragm position nearest to the given respiratory level. For each subject, a total of six to seven volumes covering different respiratory positions from end-inspiration to end-expiration were created. Sixteen surface traces lying on the chest wall were placed in a 4×4 grid with 25mm spacing. The end-expiration volume was selected as the reference volume to which the other volumes were registered with free-form deformation to recover the underlying spatio-temporal deformation field of the cardiac structure. The deformation of each volume was characterized by the movement of the control vertices of the B-splines. The intrinsic relationship between tissue deformation and real-time measurable signals associated with different phases of respiratory motion was then
extracted through the motion prediction technique described above. Sequential selection based on SFS, SBS, SFFS, and SBFS was used to determine the optimal sensor placement and its associated prediction errors. The prediction errors of the optimal feature subsets obtained from the above four sequential search procedures were compared. To obtain an unbiased assessment of the performance of each feature subset for predicting cardiac motion, leave-one-out cross-validation was performed on all image volumes and the RMSEP for each volume was normalized against the registration results. Finally, the averaged RMSEP was used as the quality measure.
3 Results

Using the first two significant latent vectors for the PLSR analysis, Fig. 2(a) illustrates the relative performance of SFS, SBS, SFFS and SBFS for determining the optimal sensor placement for one of the subjects studied, where a small value represents a good motion prediction result.
Fig. 2. (a) Comparison of the SFS, SBS, SFFS and SBFS search procedures for optimal sensor placement, where the number of optimal sensors derived from each technique is indicated by an arrow. (b) The performance of the SFFS algorithm when different numbers of latent vectors are chosen for the PLSR prediction model.
Table 1. Quantitative assessment of the feature selection algorithms on ten subjects, giving the normalized RMSEP with all 16 traces and, for each algorithm, the normalized RMSEP and the number of optimal traces [X_op].

Case | 16 traces |  SFS            |  SBS            |  SFFS           |  SBFS
     |  RMSEP    |  RMSEP  [X_op]  |  RMSEP  [X_op]  |  RMSEP  [X_op]  |  RMSEP  [X_op]
  1  |  1.0452   |  1.0245    3    |  1.0270    6    |  1.0245    3    |  1.0245    3
  2  |  1.3574   |  1.1668    3    |  1.1571    5    |  1.1571    5    |  1.1571    5
  3  |  1.0439   |  1.0356    4    |  1.0352    7    |  1.0351    6    |  1.0356    5
  4  |  1.0223   |  1.0191    6    |  1.0201    8    |  1.0191    6    |  1.0200    7
  5  |  1.0528   |  1.0346    5    |  1.0383    7    |  1.0346    5    |  1.0346    5
  6  |  1.0368   |  1.0228    9    |  1.0227   10    |  1.0227   10    |  1.0227   10
  7  |  1.0858   |  1.0568    2    |  1.0591    4    |  1.0568    2    |  1.0568    2
  8  |  1.0823   |  1.0659    4    |  1.0661    6    |  1.0659    4    |  1.0661    6
  9  |  1.0552   |  1.0504    8    |  1.0489    7    |  1.0489    7    |  1.0489    7
 10  |  1.0569   |  1.0328    3    |  1.0328    3    |  1.0328    3    |  1.0328    3
It is evident from Fig. 2(a) that the optimal number of sensors is 5, and that SFS, SFFS, and SBFS provide similar results. The figure also reveals an important fact: as the number of sensors increases, the quality of the prediction result in fact deteriorates. This performance drop can be attributed to a number of factors, but intrinsic errors and the non-linearity of individual sensors across the entire respiratory range are the main causes of concern. Fig. 2(b) illustrates the model dependence of the proposed feature selection algorithm when different model parameters are chosen for prediction. It highlights the importance of optimal sensor placement for the practical deployment of predictive motion management techniques, which has so far only been handled in an ad hoc manner. To provide an overview of the prediction errors for the ten subjects studied, Table 1 lists the normalized RMSEP for each of the sequential selection algorithms used and the corresponding optimal number of sensors. For all the subjects studied, the accuracy of predictive cardiac modeling was greatly increased after optimal sensor selection. The average number of sensors in the minimal placement is about 5, which is significantly lower than the original 16 channels used. Compared with the traditional SFS and SBS sequential procedures, the floating search strategies provide better overall results.
4 Discussion and Conclusion

In this paper, we have developed a systematic framework for predictive cardiac motion modeling with optimal sensor placement. Subject-specific respiratory patterns are a significant problem for motion management in in vivo imaging. The proposed framework effectively resolves the problem encountered by traditional regression methods in that we use the latent variables from both the input and output of the regression model to establish their inner relationships. This study demonstrates that optimal sensor placement not only minimizes the number of sensing channels but also improves the overall prediction accuracy. Although in MR the use of navigator echoes
has already improved the quality of the images acquired, particularly for capturing small mobile structures such as the coronaries, its widespread use is still hindered by its lack of general adaptability to different imaging platforms. The results from this study suggest that, being effectively a real-time technique, the proposed method is not only reasonably accurate, but also able to predict respiratory induced cardiac motion from purely surface measurable signals. The development of a general method for 3D motion prediction based on easily measured surface signals will have significant impact on other parallel imaging modalities and on the development of new integrated imaging modalities such as PET/MR and PET/CT.
References

1. J. Keegan, P.D. Gatehouse, G.Z. Yang, D.N. Firmin, "Coronary Artery Motion with the Respiratory Cycle during Breath-holding and Free-breathing: Implications for Slice-followed Coronary Artery Imaging," Magn Reson Med., Vol. 47, pp. 476-481, 2002.
2. N. Ablitt, J. Gao, J. Keegan, L. Stegger, D.N. Firmin, G.Z. Yang, "Predictive Cardiac Motion Modeling and Correction with Partial Least Squares Regression," IEEE Trans. Med. Imag., Vol. 23, pp. 1315-1324, Oct. 2004.
3. R. Leardi, A.L. González, "Genetic Algorithms Applied to Feature Selection in PLS Regression: How and When to Use Them," Chemometrics and Intelligent Laboratory Systems, Vol. 41, pp. 195-207, 1998.
4. M. Robnik-Šikonja, I. Kononenko, "An Adaptation of Relief for Attribute Estimation in Regression," in D. Fisher (ed.): Machine Learning, Proceedings of the 14th International Conference on Machine Learning ICML'97, Nashville, TN, 1997.
5. P.M. Narendra, K. Fukunaga, "A Branch and Bound Algorithm for Feature Subset Selection," IEEE Trans. Comput., Vol. 26, pp. 917-922, 1977.
6. A.W. Whitney, "A Direct Method of Nonparametric Measurement Selection," IEEE Trans. Comput., Vol. 20, pp. 1100-1103, 1971.
7. T. Marill, D.M. Green, "On the Effectiveness of Receptors in Recognition Systems," IEEE Trans. Inform. Theory, Vol. 9, pp. 11-17, 1963.
8. P. Pudil, J. Novovicova, J. Kittler, "Floating Search Methods in Feature Selection," Pattern Recognition Letters, Vol. 15, pp. 1119-1125, Nov. 1994.
9. H. Wold, "Soft Modeling by Latent Variables: the Nonlinear Iterative Partial Least Squares Approach," in J. Gani (ed.), Perspectives in Probability and Statistics, pp. 520-540, Academic Press, London, 1975.
4D Shape Registration for Dynamic Electrophysiological Cardiac Mapping

Kevin Wilson1,5, Gerard Guiraudon4,5, Doug Jones3,4, and Terry M. Peters1,2,5

1 Biomedical Engineering Program
2 Department of Medical Biophysics
3 Department of Medicine, The University of Western Ontario
4 Canadian Surgical Technology and Advanced Robotics (CSTAR)
5 Imaging Research Labs, Robarts Research Institute, London ON, Canada N6A 5K8
{kwilson, tpeters}@imaging.robarts.ca
Abstract. Registration of 3D segmented cardiac images with tracked electrophysiological data has been previously investigated for use in cardiac mapping and navigation systems. However, dynamic cardiac 4D (3D + time) registration methods do not presently exist. This paper introduces two new 4D registration methods based on the popular iterative closest point (ICP) algorithm that may be applied to dynamic 3D shapes. The first method averages the transformations of the 3D ICP on each phase of the dynamic data, while the second finds the closest point pairs for the data in each phase and performs a least squares fit between all the pairs combined. Experimental results show these methods yield more accurate transformations compared to using a traditional 3D approach (4D errors: translation 0.4mm, rotation 0.45°; 3D errors: translation 1.2mm, rotation 1.3°) while also increasing capture range and success rate.
1 Introduction
Cardiac mapping and navigation systems have been evolving over the past several years and numerous systems now exist that can relate cardiac electrophysiological (EP) data to a spatial coordinate system [1]. These systems create a point cloud of spatial locations of EP data samples from a cardiac chamber by tracking an EP probe using various types of tracking systems. From this point cloud a 3D surface model can be formed to represent the chamber and build a map representing electrical activation on the heart surface. More recently, studies have begun to investigate how pre-operative CT/MR images can be segmented and then registered to the cardiac navigation environment [2]. These surfaces can then be laid over the cardiac map to place the EP data in the context of patient-specific anatomy, as in the Biosense-Webster CARTOMERGE™ System. Several fundamental problems exist with these systems. First, the image environment is static, which forces the surgeon to estimate the current position of the tools in the operating room's dynamic environment. In addition, the shape of the EP map has typically been based only on collected data locations. A more
sophisticated approach has been demonstrated by Holmes et al. who ‘paint’ the CT/MR volume rendering with a colour-wash representation of the acquired EP data [3]. Finally, vital information, such as the frequency spectra of the EP data, which is often important in the study of diseases such as atrial fibrillation, is missing from the cardiac map. To resolve these problems we have extended the static 3D cardiac mapping environment into one that takes full advantage of 4D (dynamic 3D) cardiac data. As described in our previous work [4], pre-operative 4D CT/MR scans of the patient are collected to create a dynamic heart model. Once in the operating room the EP data can be collected at a sampling rate sufficiently high to capture the inherent frequency components of the signal, as well as associating a spatial location with each data point. At this stage, we have a patient-specific dynamic heart model as well as a dynamic point cloud of the locations of EP data collected in the operating room or EP laboratory. In this paper, we introduce two novel 4D shape registration methods to replace the standard 3D iterative closest point (ICP) [5], with a more accurate and robust registration of 4D shape environments. An evaluation study is then presented to show how using either of the new methods results in a closer approximation of the transformation between the coordinate systems of the source and the target, as well as increasing the success rate of the algorithm with fewer data samples than are necessary for the standard static ICP approach.
2 Methods

2.1 Iterative Closest Point Algorithm

Besl and McKay [5] introduced the iterative closest point (ICP) method for registration of 3D shapes, which became the standard approach for registering surface objects. The algorithm begins by defining the model shape (or target) as X and the data shape (or source) as P. The data shape is then decomposed into a point set. For each point in P the closest point in X is found:

\[ Y = C(P, X) \tag{1} \]
where C is the closest point operator and Y represents the resulting closest points. From the pairs in Y and P, a least squares fit is performed according to [6] in order to obtain a transformation T, consisting of a rotation R and a translation t:

\[ T(R, t) = \min\, d(Y, T(P))^2 \tag{2} \]

where d is the distance between point pairs, and T(P) is the result of mapping the points in P with the transformation T. After this transformation is applied, the process of finding the closest point pairs and calculating a new transformation is repeated until a convergence criterion is satisfied. While many variations of the standard ICP have been developed [7,8,9], none have yet been extended to a dynamic environment.
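For reference, a minimal sketch of the standard 3D ICP loop is given below; it substitutes an SVD-based (Kabsch) least-squares fit for the quaternion solution of [6], which attains the same optimum, and uses a k-d tree for the closest point operator. All names are illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

def best_fit_transform(P, Y):
    """Least-squares rigid transform mapping points P onto points Y
    via the SVD (Kabsch) solution."""
    cp, cy = P.mean(axis=0), Y.mean(axis=0)
    U, _, Vt = np.linalg.svd((P - cp).T @ (Y - cy))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:            # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cy - R @ cp

def icp(P, X, iters=50, tol=1e-6):
    """Standard 3D ICP: align data shape P (n x 3) to model shape X."""
    tree = cKDTree(X)
    R_tot, t_tot, prev = np.eye(3), np.zeros(3), np.inf
    for _ in range(iters):
        dists, idx = tree.query(P)      # closest point pairs Y = C(P, X)
        R, t = best_fit_transform(P, X[idx])
        P = P @ R.T + t
        R_tot, t_tot = R @ R_tot, R @ t_tot + t
        if abs(prev - dists.mean()) < tol:
            break
        prev = dists.mean()
    return R_tot, t_tot
```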
2.2 Average Iterative Closest Point Algorithm
Consider a 4D anatomical model and a corresponding dynamic EP data set. Typically, these 4D models consist of a series of individual 3D surfaces, and for our cardiac model each individual 3D shape (extracted from a dynamic CT or MR data set) represents a different phase of a single heartbeat (see Fig. 1). For the data model, we grouped the collected locations from the tracking system into N bins (the number of phases for which we have created models) based on the phase of the heart at the time each point was acquired. In an ideal situation, we would now have N 3D models and related EP data sets with identical transformations that must be registered to one another. We refer to the first method as the average iterative closest point algorithm (A-ICP) (see Fig. 2); it treats each phase (model and data) as an individual 3D ICP problem and finds an ideal transformation, T^i, for each of them. Then, to find the unified transformation, T_ave, we take the mean of all T^i's for which the registration was successful (as defined by the success criterion discussed below). To calculate the average transformation we used the quaternion-based mean [10] as a rapid and accurate approximation.
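A minimal sketch of this averaging step, assuming the per-phase rotations are available as unit quaternions (the renormalized arithmetic mean being the rapid approximation of [10]); names are illustrative:

```python
import numpy as np

def average_transform(quats, translations):
    """Average per-phase rigid transforms: renormalized mean of unit
    quaternions (sign-aligned to the first) plus the mean translation."""
    quats = np.asarray(quats, dtype=float)
    signs = np.where(quats @ quats[0] < 0, -1.0, 1.0)  # fix q / -q ambiguity
    q_mean = (quats * signs[:, None]).mean(axis=0)
    q_mean /= np.linalg.norm(q_mean)
    return q_mean, np.asarray(translations).mean(axis=0)
```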
Fig. 1. A dynamic CT or MR scan is acquired, and from each phase image a subject-specific heart model is generated
Fig. 2. Flowchart representing average iterative closest point algorithm
2.3 Multibin Iterative Closest Point Algorithm
The second method, the multibin iterative closest point algorithm (M-ICP) (see Fig. 3), groups the 4D model and data into separate bins based on the phase of the heartbeat, as described for the A-ICP approach. Next, we find the closest point pairs between the target model and the data set within each phase (similar to the first stage of the standard 3D ICP). Once these pairs are found, however, a least squares fit is performed over all phases combined to obtain a single transform, T. This transform is then applied to the data set from each cardiac phase, and the process is repeated until convergence is achieved.
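A compact sketch of the M-ICP iteration follows (compare with the 3D ICP sketch above); the SVD-based fit and all names are illustrative assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def fit(P, Y):
    """SVD least-squares rigid fit of stacked pairs (P -> Y)."""
    cp, cy = P.mean(axis=0), Y.mean(axis=0)
    U, _, Vt = np.linalg.svd((P - cp).T @ (Y - cy))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cy - R @ cp

def multibin_icp(data_bins, model_bins, iters=50):
    """M-ICP: per-phase closest-point pairs, one pooled least-squares
    fit across all phases; a single transform is shared by every bin."""
    trees = [cKDTree(X) for X in model_bins]
    R_tot, t_tot = np.eye(3), np.zeros(3)
    for _ in range(iters):
        pairs_P, pairs_Y = [], []
        for P, X, tree in zip(data_bins, model_bins, trees):
            _, idx = tree.query(P)
            pairs_P.append(P)
            pairs_Y.append(X[idx])
        R, t = fit(np.vstack(pairs_P), np.vstack(pairs_Y))
        data_bins = [P @ R.T + t for P in data_bins]
        R_tot, t_tot = R @ R_tot, R @ t_tot + t
    return R_tot, t_tot
```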
Fig. 3. Flowchart representing multibin iterative closest point algorithm
3 Experiments

3.1 Methodology
To test the accuracy, capture range, and success rate of our algorithms we ran a series of experiments and compared the results provided by the standard ICP, A-ICP, and M-ICP registration algorithms. For our target model we used a 4D heart model consisting of 10 cardiac phases extracted from a high quality 4D MR volume [11]. For our data model, we generated point clouds representing samples of EP activity on the cardiac surface as described below. Regular 3D ICP Registration. The 3D registration experiment was conducted using one of the volumes extracted from our 4D dynamic segmented heart model. From this model, random locations were sampled directly from the cardiac surface to create a data point cloud. Random Gaussian noise was added
to the coordinates of each data point independently. When generating the noise, a standard deviation of 1.5mm and a mean of 0 were used to represent inaccuracies in the location measurement due to the following errors, which were measured in our laboratory using a realistic beating heart phantom (The Chamberlain Group, Great Barrington, MA, www.thecgroup.com):

– Tracking system errors (± 1mm)
– Segmentation errors (± 0.5mm)
– Soft tissue deformation (± 1mm)

4D Average and 4D Multibin ICP Registration. Both 4D registration experiments were conducted using all ten phases of our segmented heart model. From this model random locations were sampled directly from the initial phase of the cardiac surface to create a data point cloud. These same locations were then extracted from the dynamic model for phases 2-10. Random Gaussian noise (reflecting the errors cited above) was added independently to each data point as it was for the 3D ICP registration experiment. The data sets were then transformed with a known rotation and translation according to the experimental parameters (see Table 1). Each experiment was repeated 100 times, and all were conducted with different randomly sampled points with random noise added.

Table 1. 48 separate experiments were each carried out 100 times. Each experiment varied the number of samples in the data set and the initial rotation and translation by which the data set was transformed.
3.2 Results
The A-ICP and M-ICP registration algorithms were compared with the regular ICP registration algorithm. As described above, noise was added to the data model prior to applying the transformation. To obtain the rotation error, the mean of the absolute differences in rotation values about each axis between the original transformation and the algorithm's transformation was calculated, along with the maximum error. To obtain the translation error, the mean distance error (over the 100 trials) between the original
Fig. 4. Evaluation of rotation error for data sets with different number of sampled points and different misalignment. Black bar: ICP, Dark-gray bar: A-ICP, Light-gray bar: M-ICP.
Fig. 5. Evaluation of translation error for data sets with different number of sampled points and different misalignment. Black bar: ICP, Dark-gray bar: A-ICP, Light-gray bar: M-ICP.
transformation and the transformation provided by each algorithm was selected. The results of the experiments are shown in Figs. 4 and 5. From these results we see that the use of either 4D registration method significantly reduces the translation and rotation error in the transformation, with the M-ICP method having a slight advantage over the A-ICP method. Considering a typical cardiac environment where the origin of the coordinate system lies in the center of the heart, these methods can result in a target location error (TLE) at the cardiac surface 2-3mm less than that of the standard 3D ICP method. When guiding tools for cardiac ablation procedures based on preoperative images, this improvement in accuracy is crucial to the success of the intervention. For the success criterion of the registration we set the threshold for the mean RMS distance between the heart model and the data model with added noise to be 1.8mm, to account for the inaccuracies in the localization of the measurements mentioned in Section 3.1. The results in Fig. 6 show that with a significantly
Fig. 6. Evaluation of success rate for data sets with different number of sampled points and different misalignment. Black bar: ICP, Dark-gray bar: A-ICP, Light-gray bar: M-ICP.
large number of data samples the success rate for all methods is 100%. However, when the number of data samples is reduced, the success rate of the 3D ICP drops much more quickly than that of either of the 4D methods. Execution times for the algorithms vary; however, all methods can be parallelized and have very good scalability. This point has practical significance, since the speed with which the registration can be achieved between the acquired EP points and the cardiac model in the operating room is directly related to the overall efficiency of the procedure. For these experiments, the regular ICP registration took approximately 0.2 seconds, while the 4D registrations took approximately 2 seconds, on a 3.0GHz Pentium 4 computer with 1 GB of RAM.
4 Conclusions
We have extended the standard 3D ICP method to permit the registration of 4D datasets. Our experience with two variants of this new approach demonstrated that we can improve both the registration accuracy and the capture range, using the 4D approach rather than employing a series of independent 3D registrations. The anticipated application of these techniques is in our dynamic cardiac mapping environment, where approximately aligned 4D cardiac models and EP data sets are available. We believe that this will permit a more comprehensive cardiac EP mapping and visualization environment than currently existing systems. The advantages of this approach include having a more accurate registration of the tool tracking and image guidance systems, the ability to capture inherent frequency information of the electrocardiogram (an important factor in studying cardiac arrhythmias such as atrial fibrillation), and increasing the intuitiveness of the display by having a dynamic environment where the cardiac map is directly displayed on the patient specific model. Currently, we are conducting phantom and porcine studies to assess the overall accuracy and robustness of our dynamic cardiac mapping environment.
Acknowledgments. We wish to thank Marcin Wierzbicki for valuable discussions, and funding support from an Ontario Graduate Scholarship (K.W.), CIHR, Canadian Foundation for Innovation and the Ontario Innovation Trust.
References

1. Kumar, G.A., Maheshwari, A., Thakur, R., Lokhandwala, Y.Y.: Cardiac mapping: Utility or futility? Indian Pacing Electrophysiology Journal 2 (2002) 20–32
2. Sun, Y., Azar, F.S., Xu, C., Hayam, G., Preiss, A., Rahn, N., Sauer, F.: Registration of high-resolution 3D atrial images with electroanatomical cardiac mapping: Evaluation of registration methodology. In: SPIE Medical Imaging 2005: Visualization, Image-Guided Procedures, and Display. Volume 5744. (2005) 299–307
3. Holmes, D.R., Rettmann, M.E., Cameron, B.M., Camp, J.J., Robb, R.A.: Virtual cardioscopy: Interactive endocardial visualization to guide RF cardiac ablation. In: SPIE Medical Imaging 2006. Volume 6143. (2006)
4. Wierzbicki, M., Drangova, M., Guiraudon, G., Peters, T.: Validation of dynamic heart models obtained using non-linear registration for virtual reality training, planning, and guidance of minimally invasive cardiac surgeries. Medical Image Analysis 8 (2004) 387–401
5. Besl, P.J., McKay, N.D.: A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 14 (1992) 239–256
6. Horn, B.K.P.: Closed-form solution of absolute orientation using unit quaternions. Journal of the Optical Society of America 4 (1987) 629–642
7. Su, Y., Holmes, D.R., Rettmann, M.E., Robb, R.A.: A piecewise function-to-structure registration algorithm for image guided cardiac catheter ablation. In: SPIE Medical Imaging 2006. Volume 6141. (2006)
8. Estepar, R.S.J., Brun, A., Westin, C.F.: Robust generalized total least squares iterative closest point registration. In: MICCAI 2004, LNCS. Volume 3216. (2004) 234–241
9. Chetverikov, D., Svirko, D., Stepanov, D., Krsek, P.: The trimmed iterative closest point algorithm. In: International Conference on Pattern Recognition. Volume 3. (2002) 545–548
10. Gramkow, C.: On averaging rotations. Journal of Mathematical Imaging and Vision 15 (2001) 7–16
11. Moore, J., Drangova, M., Wierzbicki, M., Barron, J., Peters, T.: A high resolution dynamic heart model based on averaged MRI data. In: MICCAI 2003, LNCS. Volume 2878. (2003) 549–555
Estimation of Cardiac Electrical Propagation from Medical Image Sequence

Heye Zhang1, Chun Lok Wong1, and Pengcheng Shi1,2

1 Medical Image Computing Group, Hong Kong University of Science & Technology, Hong Kong
2 School of Biomedical Engineering, Southern Medical University, Guangzhou, China
{eezhy, eewclken, eeship}@ust.hk
Abstract. A novel strategy is presented to recover cardiac electrical excitation patterns from tomographic medical image sequences. The geometrical/physical representation of the heart and the dense motion field of the myocardium are first derived from imaging data through segmentation and motion recovery. The myocardial active forces are then calculated from the motion field through the law of force equilibrium, realized with a stochastic multiframe algorithm. Since tissue active forces are physiologically driven by electrical excitations, we can readily relate the pattern of active forces to the pattern of electrical propagation in the myocardium, where spatial regularization is enforced. Experiments are conducted on three-dimensional synthetic data and a canine magnetic resonance image sequence with favorable results.
1 Introduction
Electrocardiograms (ECGs) have been the standard clinical tools for cardiac function assessment over the past 150 years. Recently, electrocardiographic inverse approaches have become of great interest to the research and clinical communities because of their potential to describe the patient-specific electrical activity of each myocyte of the heart [1]. Based on mathematical simulations of the cardiac electrical propagation process, there are a number of efforts that aim to estimate the macroscopic cardiac electrical activities from multichannel ECG measures [1]. While such inverse approaches are invaluable, they are ill-posed because of the absence of a unique mapping between the intra-cardiac electrical excitation and the body surface potential distribution, and appropriate constraints must be adopted to ensure that solutions are available. Since the mechanical activities of the myocardium are mainly driven by the cardiac electrical excitation, patient-specific myocardial kinematic measures should, indirectly, reflect the electrical propagation process of the heart. Over the past twenty years, there have been many efforts in the medical image computing community devoted to the noninvasive recovery of the cardiac motion field from imaging sequences. In general, medical images can directly provide cardiac motion information at some salient landmark points, such as the tag displacements from MR tagging images and the tissue velocities from MR phase images.
The dense cardiac motion fields, including displacements, velocities and accelerations throughout the myocardium, can then be estimated from those coarse measurements with a priori constraints [2,3,4,5], for which biomechanical models have been the popular choices because of their physical meaningfulness. A recent effort is particularly related to our current work [6]. Based on a probabilistic measure of the onset of regional myocardial activation, derived from the 3D motion field obtained by tracking a tagged MR image sequence with non-rigid registration, it combines different features, such as displacements, velocities and the magnitude of the strain, to locate abnormal electrical conduction pathways. Our framework is based on the physical principle that, if the dense motion field (i.e. estimated from medical image sequences) and the material characteristics (i.e. the mechanical model and its parameters) of the myocardium are available, then the internal force (the forces caused by inertia, damping, and stresses) at each myocardial point can be calculated. Further, the external force (the active forces caused by electrical excitation) can be calculated from the force equilibrium equation inside the myocardium. Even though pressures do exist on the epi- and endocardial surfaces, we do not need to account for them when we check the force balance inside the myocardium, since the motion field of the whole heart already implies the presence of pressures on the cardiac surface. However, cardiac motion fields estimated from images always contain noise and numerical errors. If we directly relate the onsets of electrical excitation to the onsets of active forces calculated from the estimated motion field, the pattern of electrical propagation will be corrupted by these noises and errors. Hence, electrical propagation derived from a priori information should be used to constrain the pattern calculated from the estimated motion field. In our current work, Tikhonov regularization, which is a standard technique in electrocardiographic inverse approaches [1], is applied to perform a weighted sum between the predicted pattern from simulation and the pattern calculated from motion-derived active forces.
2 Methodology

2.1 Anisotropic Representation
Given the definition of the cardiac geometry and the dense motion field, a set of unstructured, adaptively sampled nodes is distributed between the endo- and epicardium, and an anisotropic representation of the cardiac fiber structures (a fiber orientation is attached to each node), geometry (sample nodes on and inside the boundary of the cardiac geometry) and motion field is established by the moving least squares (MLS) approximation. Let u(x), u̇(x) and ü(x) be the displacement, velocity and acceleration of the myocardial tissue at point x. The approximated displacement, velocity and acceleration functions u^h(x), u̇^h(x) and ü^h(x) are then given by

\[ u^h(x) = \sum_{I=1}^{N} \phi_I(x)\, u_I, \qquad \dot{u}^h(x) = \sum_{I=1}^{N} \phi_I(x)\, \dot{u}_I, \qquad \ddot{u}^h(x) = \sum_{I=1}^{N} \phi_I(x)\, \ddot{u}_I \]

where φ_I(x) is the shape function of node I, given by the MLS approximation, N is the total number of sample nodes, and u_I, u̇_I and ü_I are the nodal displacement, velocity and acceleration values. Since there are no mesh constraints between nodes, refinement of the framework input, the motion field,
can be easily achieved by changing the density of sample nodes when new motion information is available.
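As an illustration of the meshfree representation, a minimal one-dimensional MLS sketch is given below; the linear basis and the particular weight function are assumptions for illustration, not the authors' exact choices.

```python
import numpy as np

def mls_shape_functions(x, nodes, radius):
    """1D moving least squares shape functions phi_I(x) with a linear
    basis p = [1, x] and a smooth polynomial weight of support radius."""
    r = np.abs(x - nodes) / radius
    w = np.where(r < 1.0, 1.0 - 3.0 * r**2 + 2.0 * r**3, 0.0)  # weights
    P = np.stack([np.ones_like(nodes), nodes], axis=1)          # basis at nodes
    A = P.T @ (w[:, None] * P)                                  # moment matrix
    p_x = np.array([1.0, x])
    return w * (P @ np.linalg.solve(A, p_x))                    # phi_I(x)

# Field value at x, given nodal values u_I:  u_h = phi @ u_nodal
# (requires at least two nodes inside the support so A is invertible)
```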
2.2 Law of Force Equilibrium
After the anisotropic representation is created, the active forces can be calculated by the principle of force equilibrium from the dense motion field and material models.

Anisotropic Composite Material Model. In our approach the heart is assumed to be elastic. For elastic materials, both isotropic and anisotropic, the stress-strain relationships obey Hooke's Law [7]:

\[ S = C \epsilon \tag{1} \]
where S = [S_11, S_22, S_33, S_12, S_13, S_23]^T and ε = [ε_11, ε_22, ε_33, ε_12, ε_13, ε_23]^T, with S_ij the components of the second Piola-Kirchhoff (PKII) stress tensor, C the stiffness matrix, and ε_ij the components of the Green-Lagrangian strain tensor, which can be calculated from the displacement [7]:

\[ \epsilon_{ij} = \tfrac{1}{2} \left( u_{i,j} + u_{j,i} + u_{k,i} u_{k,j} \right) \tag{2} \]
where u_{i,j} = ∂u_i/∂x_j. We assume that the cardiac tissues of the myocardium are transversely isotropic, i.e., they have the same material property in the cross-section planes of the myocytes, but a different material property in the direction normal to the cross-section planes. Let C_0 be the stiffness matrix of a point in the myocardium with 0° (along the x-axis) fiber orientation:

\[
C_0 = \begin{bmatrix}
\frac{1}{E_f} & \frac{-\upsilon}{E_{cf}} & \frac{-\upsilon}{E_{cf}} & 0 & 0 & 0 \\
\frac{-\upsilon}{E_{cf}} & \frac{1}{E_{cf}} & \frac{-\upsilon}{E_{cf}} & 0 & 0 & 0 \\
\frac{-\upsilon}{E_{cf}} & \frac{-\upsilon}{E_{cf}} & \frac{1}{E_{cf}} & 0 & 0 & 0 \\
0 & 0 & 0 & \frac{1}{G} & 0 & 0 \\
0 & 0 & 0 & 0 & \frac{1}{G} & 0 \\
0 & 0 & 0 & 0 & 0 & \frac{2(1+\upsilon)}{E_{cf}}
\end{bmatrix}^{-1}
\tag{3}
\]

where E_cf and E_f are the cross-fiber and along-fiber Young's moduli respectively. With C_0 defined, the stiffness matrix at any point with fiber orientation of θ degrees polar angle and φ degrees azimuthal angle can be obtained using the tensor transformation [8]: C_tran = T^{-1} C_0 T, where T = T_azi(φ) T_pol(θ) with the coordinate transformation matrices

\[
T_{pol} = \begin{bmatrix}
c^2 & 0 & s^2 & 0 & 2sc & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
s^2 & 0 & c^2 & 0 & -2sc & 0 \\
0 & 0 & 0 & c & 0 & s \\
-sc & 0 & sc & 0 & c^2 - s^2 & 0 \\
0 & 0 & 0 & -s & 0 & c
\end{bmatrix}
\qquad
T_{azi} = \begin{bmatrix}
c^2 & s^2 & 0 & 2sc & 0 & 0 \\
s^2 & c^2 & 0 & -2sc & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
-sc & sc & 0 & c^2 - s^2 & 0 & 0 \\
0 & 0 & 0 & 0 & c & s \\
0 & 0 & 0 & 0 & -s & c
\end{bmatrix}
\tag{4}
\]
in which c = cos(φ), s = sin(φ) in T_azi(φ) and c = cos(θ), s = sin(θ) in T_pol(θ). Besides the Green-Lagrangian strain tensor, the deformation gradient F can also be used to describe the deformation [7]:

\[ F_{ij} = \frac{\partial x_i}{\partial X_j} \tag{5} \]
where F_ij are the components of the deformation gradient, x_i the coordinates in the deformed configuration, and X_i the coordinates in the reference configuration. With the PKII stress S and the deformation gradient F, the nominal stress P can be defined [7]:

\[ P = S \cdot F^{T} \tag{6} \]
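As an illustration of equations (1)-(6), the following is a minimal numpy sketch that computes the Green-Lagrangian strain from a displacement gradient, applies Hooke's law in Voigt notation, and forms the nominal stress; the stiffness matrix C is assumed to have been assembled from equation (3), and all names are illustrative.

```python
import numpy as np

def green_lagrange_strain(grad_u):
    """E = 0.5 (grad u + grad u^T + grad u^T grad u), grad_u is 3x3."""
    return 0.5 * (grad_u + grad_u.T + grad_u.T @ grad_u)

def nominal_stress(grad_u, C):
    """PKII stress via Hooke's law (Voigt order 11,22,33,12,13,23),
    then nominal stress P = S F^T with F = I + grad u."""
    E = green_lagrange_strain(grad_u)
    eps = np.array([E[0, 0], E[1, 1], E[2, 2], E[0, 1], E[0, 2], E[1, 2]])
    s = C @ eps                                  # Voigt PKII stress
    S = np.array([[s[0], s[3], s[4]],
                  [s[3], s[1], s[5]],
                  [s[4], s[5], s[2]]])
    F = np.eye(3) + grad_u
    return S @ F.T
```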
Force Equilibrium Equation. Considering a dynamic analysis under finite deformation without damping in an arbitrary domain Ω with boundary Γ (Fig. 1), the law of force equilibrium should be obeyed everywhere inside the domain Ω at each time instant [7]:

\[ \frac{\partial P_{ji}}{\partial X_j} + f_i^{B} = \rho_0 a_i \tag{7} \]
with natural (force) boundary conditions f_i^S on S_f and essential (displacement) boundary conditions u_i^S on S_u (Fig. 1). S_f and S_u are the locations where the natural and essential boundary conditions are applied respectively, ρ_0 is the mass density in the reference configuration, and a_i are the components of acceleration. X_j are the coordinates in the reference configuration, P_ji are the components of the nominal stress, and f^B is the density of body force. Because it is difficult to define the damping force at a point inside the myocardium, and the movements of the myocardium are mainly driven by active forces, the effect of damping forces is neglected in equation (7). Though we assume nominal stresses in equation (7), it is not difficult to prove that equation (7) also holds when P is replaced with the Cauchy stress τ under the infinitesimal deformation condition [7]:

\[ \tau_{ij,j} + f_i^{B} = \rho a_i \tag{8} \]
where τ_ij are the components of the Cauchy stress, τ_{ij,j} = ∂τ_ij/∂x_j, and ρ is the mass density. A simple one-dimensional uniform bar can be used to demonstrate the law of force equilibrium. Consider a one-dimensional bar subjected to a distributed load f^B(x) = x and a concentrated load R at its right end, with its left end fixed (Fig. 1). We assume the bar has a constant cross-sectional area A and Young's modulus E. The law of force equilibrium gives the governing equations of the bar:

\[ EA \frac{d^2 u}{dx^2} + f^B = 0, \qquad u|_{x=0} = 0 \tag{9} \]
\[ EA \frac{du}{dx}\Big|_{x=L} = R \tag{10} \]
Fig. 1. (a) Domain. (b) Bar.
It is easy to prove that the analytical solution for the displacements in equation (9) is u(x) = (−x³/6 + Rx + L²x/2)/EA, since d²u(x)/dx² + f^B(x) = −x + x = 0. If we consider a small part in the middle of the bar, we only need to consider the internal forces caused by stresses and the body forces f^B(x) to check the force balance, because the natural boundary condition, the concentrated load R, is already implied by the displacement field of the whole bar. Though there could be other components in f^B inside the myocardium besides active forces, the majority of f^B consists of the active forces caused by electrical excitations. In our work we assume that f^B consists only of active forces. We have not considered damping forces because it is difficult to define a damping force at one particular point inside the myocardium, and the contributions of damping forces can be neglected during cardiac contraction. Going back to equation (7), stresses can be calculated from displacements and material models, and inertial forces can be calculated from accelerations and densities, so that active forces can be easily calculated from equation (7). The movements of the heart are driven by many kinds of forces, such as pressures inside and outside the heart, but as we demonstrated in the one-dimensional bar example, the loading R does not affect the recovery of body forces inside the bar, since the displacements of the bar already imply the presence of the loading R at the end of the bar. We assume that only active forces can exist as external forces inside the myocardium, so that the active forces can be recovered from the motion field by the law of force equilibrium and the material models (stress-strain relations and densities); as explained above, the natural and essential boundary conditions are implicitly enforced in the motion field.
2.3 Recovery of Electrical Propagation Patterns
Since a certain delay is always assumed between the onsets of active forces and the instants of depolarization of the myofibers, we can relate the pattern of active forces to the pattern of electrical propagation. Because the onsets of active forces are not easy to detect, we adopt the peaks of active forces as the signs of activation of the myofibers; a constant delay can still be assumed between the peaks of active forces and the instants of depolarization of the myofibers. In this way we can attain the patient-specific pattern of electrical propagation from a dense motion field by the law of force equilibrium. Nevertheless, such an approach is purely based on image data without any a priori constraints. The law of force equilibrium should be
obeyed at all times, but the connection between the onsets of active forces and the onsets of electrical excitations could be spatially interrupted by errors or noise. In this paper Tikhonov regularization [1] is applied to this connection with an a priori constraint, which is generated from a modified FitzHugh-Nagumo model [9]. The classical Tikhonov regularized solution is obtained through minimization of the following equation:

\[ J_T = (Mea - H \cdot Est)^T (Mea - H \cdot Est) + \lambda (Est - Pre)^T Q (Est - Pre) \tag{11} \]
where Mea is the measurement vector, Est the estimation vector, and Pre the prediction vector; λ and Q are the regularization parameter and the regularization matrix, H is the mapping matrix between measurement and estimation, and (·)^T denotes transposition. There are many ways to determine λ and Q, but if Gaussian white noise is assumed, Tikhonov regularization can be interpreted in a stochastic framework: λ represents the measurement noise variance, and the estimate is the realization of a Gaussian vector whose mean equals the prediction and whose covariance matrix equals the inverse of Q. The main advantage of Tikhonov regularization is that the solution of equation (11) can be expressed in closed form as:

$$Est = (H^T H + \lambda Q)^{-1} (H^T \cdot Mea + \lambda Q \cdot Pre) \tag{12}$$
In equation (12), the measurements are the instants of depolarization of the myofibers calculated from the motion field, and the predictions are the instants of depolarization obtained from simulation. Since measurement and prediction represent the same quantity, the onsets of electrical excitation in the myofibers, the matrix H is an identity matrix.
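The following minimal sketch illustrates the closed-form update (12); it is our illustration rather than the authors' implementation, and the vectors, the value of λ, and the identity choices for H and Q are assumptions for demonstration only:

```python
import numpy as np

def tikhonov_estimate(mea, pre, lam=0.5, H=None, Q=None):
    """Closed-form solution of equation (12):
    Est = (H^T H + lam*Q)^-1 (H^T * Mea + lam*Q * Pre)."""
    n = mea.size
    H = np.eye(n) if H is None else H   # identity, as in the text
    Q = np.eye(n) if Q is None else Q   # illustrative regularization matrix
    lhs = H.T @ H + lam * Q
    rhs = H.T @ mea + lam * (Q @ pre)
    return np.linalg.solve(lhs, rhs)

# noisy depolarization instants pulled toward a simulated prediction
mea = np.array([10.0, 12.5, 19.0, 25.0])   # from the estimated motion field
pre = np.array([10.0, 13.0, 16.0, 24.0])   # from the FitzHugh-Nagumo simulation
est = tikhonov_estimate(mea, pre)          # lies between measurement and prediction
```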
3 Results
We verify our algorithm in experiments on a deformable cube with two different fiber directions, whose left face is fixed. The cube is excited by an electrical excitation initiated from its left face and deforms according to the propagation of the excitation under finite deformation conditions; the nodal contractile active forces and deformations over 100 time steps in one cycle are captured as the ground truth. In the experiments, the ground-truth contractile stresses are corrupted with noise at 10dB SNR and the nodal displacements on the right face of the cube with noise at 1dB SNR to provide the noisy measurements. From the results in the first row of Fig. 2, we can see that the displacements are recovered quite well from the noisy measurements by our implementation. The second row of Fig. 2 shows a simulation of the propagation of electrical excitation in the cube, the noisy propagation pattern recovered from the estimated motion field (the noise being caused by numerical errors and measurement noise), and the smooth propagation pattern obtained with Tikhonov regularization. Experiments have also been performed on cardiac MR images. The MR images comprise a sequence of sixteen images of a canine heart in one cardiac cycle,
Fig. 2. Cube geometry with fiber orientations and displacement magnitude maps in the first row, and patterns of propagation in the second row. Left to right in the first row: cube geometry, ground truth, multiframe estimation. Left to right in the second row: simulation, pattern without regularization, pattern with regularization. The color map in the first row is for the displacement magnitude maps; the color map in the second row is for the instants-of-depolarization maps.
Fig. 3. From left to right in the first row: cardiac geometry with fiber orientations, circumferential strain map calculated from the estimated displacement of frame 8, color map for the strain magnitude map, estimated displacement magnitude map of frame 8, and color map for the displacement magnitude map. From left to right in the second row: simulation, pattern without regularization, pattern with regularization, and color map for the depolarization map.
providing three-dimensional data for analysis. The fiber orientations of the heart are obtained by mapping the data from the Auckland University Bioengineering Institute (http://www.bioeng.auckland.ac.nz), using the iterative closest point algorithm [10]. The results indicate that our algorithm is clinically relevant (Fig. 3).
4 Conclusions and Future Work
The inverse approach from MR images to electrical propagation is novel, but difficult due to the complicated processes leading from electrical excitation to heart
contraction. Reasonable yet computationally feasible models with physical meaning should be considered to aid the recovery of electrical propagation. Currently we only apply a spatial constraint, from the simulated pattern to the pattern derived from the estimated motion field. A temporal constraint or new measurements (such as ECG signals) should increase the accuracy of our approach and provide more information about electrical propagation.
Acknowledgment. This work is supported in part by the China National Basic Research Program (973 Program, 2003CB716100), by the National Natural Science Foundation of China (60403040), and by the Hong Kong Research Grants Council (CERG-HKUST6252/04E).
References
1. MacLeod, R.S., Brooks, D.H.: Recent progress in inverse problems in electrocardiology. IEEE EMBS Magazine 17 (1998) 73–83
2. Prince, J.L., McVeigh, E.R.: Motion estimation from tagged MR image sequences. IEEE Trans. Med. Img. (1992) 238–249
3. Haber, E., Metaxas, D., Axel, L.: Motion analysis of the right ventricle from MRI images. In: MICCAI. (1998) 177–188
4. Chandrashekara, R., Mohiaddin, R., Rueckert, D.: Analysis of 3D myocardial motion in tagged MR images using nonrigid image registration. IEEE Trans. Med. Img. 23 (2004) 1245–1250
5. Wong, C.L., Shi, P.: Finite deformation guided nonlinear filtering for multiframe cardiac motion analysis. In: MICCAI. (2004) 895–902
6. Sanchez-Ortiz, G., Sermesant, M., Rhode, K., Chandrashekara, R., Razavi, R., Hill, D., Rueckert, D.: Localization of abnormal conduction pathways for tachyarrhythmia treatment using tagged MRI. In: MICCAI. (2005) 425–433
7. Belytschko, T., Liu, W., Moran, B.: Nonlinear finite elements for continua and structures. John Wiley & Sons Ltd (2000)
8. Borisenko, A.I., Tarapov, I.E.: Vector and tensor analysis with applications. Dover Publications, Inc. (1979)
9. Rogers, J., McCulloch, A.: A collocation-Galerkin finite element model of cardiac action potential propagation. IEEE Trans. BioMed. Eng. 41 (1994) 743–756
10. Besl, P.J., McKay, N.D.: A method for registration of 3-D shapes. IEEE Trans. Pattern Anal. Mach. Intell. 14 (1992) 239–256
Ultrasound-Guided Percutaneous Scaphoid Pinning: Operator Variability and Comparison with Traditional Fluoroscopic Procedure

Maarten Beek1, Purang Abolmaesumi2,3, Suriya Luenam4, Richard W. Sellens1, and David R. Pichora1,4

1 Department of Mechanical and Materials Engineering, Queen's University, Canada
2 School of Computing, Queen's University, Canada
3 Department of Electrical and Computer Engineering, Queen's University, Canada
4 Division of Orthopaedic Surgery, Kingston General Hospital, Canada
[email protected]
Abstract. This paper reports on pilot laboratory experiments with a recently proposed surgical procedure for percutaneous screw insertion into fractured scaphoid bones using ultrasound guidance. The experiments were intended to determine the operator variability of the procedure and its performance in comparison with a traditional pinning procedure using fluoroscopy. In the proposed procedure, a three-dimensional surface model is created from pre-operative computed tomography images and intra-operatively registered to the patient using ultrasound images. A graphical interface that communicates with an optical camera tracking the surgical tools guides the surgeon during the procedure in real time. The results of the experiments revealed non-significant differences between operators for the error in the entry location of the drill hole (p=0.90), but significant differences for the exit location (p<0.05). Comparison with the traditional pinning procedure shows that the outcome of the recently proposed procedure appears to be more consistent.
1 Introduction
The scaphoid bone (Figure 1) is the most frequently fractured bone in the wrist, representing about 90% of all wrist fractures [1]. In general, the bone is fractured through its relatively narrow waist [2]. It is generally accepted that immobilization in a short-arm cast needs to last for ten to twelve weeks [1,3]. When the proximal pole is involved, this immobilization period is often prolonged. Due to the poor blood supply of the proximal part of the bone, complications like mal- or non-unions are common; Mink van der Molen and co-workers reported that 12.8% of all scaphoid fracture cases in their study, of which 98% were initially treated by cast immobilization, eventually underwent operative treatment [1]. The outcomes of internal fixation of the scaphoid show high patient satisfaction and good wrist mobility [2]. In the case of complications, pinning of the scaphoid is often the only remaining treatment option. An open approach for scaphoid pinning facilitates an accurate placement of the screw, because the bone and
Fig. 1. Dorsal view of a plastic model of the bones of a human hand showing the anatomical position of the scaphoid bone (circle). Both inserts show the three-dimensional surface model of a typical scaphoid. A: palmar view; B: dorsal view.
the surrounding soft tissues are directly visible to the surgeon. However, open surgery inevitably involves some degree of soft tissue damage and an increased risk of infections. The open approach also violates several key ligaments, which may result in scaphoid instability [4]. Performing the procedure percutaneously reduces many of the drawbacks associated with open surgery. However, insertion of the screw is challenging, due to the small size of the bone and the lack of direct vision of the surgical tools with respect to the targeted anatomy. The traditional percutaneous pinning procedure generally requires many intra-operative fluoroscopic images to ensure an accurate screw placement. For operating room staff and patients this means additional exposure to harmful ionizing radiation. Another drawback is the two-dimensional nature of fluoroscopic images, which limits the three-dimensional positioning of surgical tools. Furthermore, the close proximity of other carpal bones in the wrist joint interferes with the acquisition of clear lateral fluoroscopic scaphoid images. Recently, studies on arthroscopically assisted internal fixation of the scaphoid have been published [5,6]. The outcomes of this approach are good and the procedure even enables the detection of soft tissue lesions. However, fluoroscopy was nonetheless applied to verify the position and orientation of the inserted K-wires in both studies. In an earlier publication, we proposed a new procedure for percutaneous scaphoid pinning which involves pre-surgical planning using computed tomography (CT) images and intra-operative guidance using three-dimensional ultrasound (US) [7]. A graphical interface displays the surface model with the surgical plan, and the surgical tools in real time to guide the surgeon during screw
insertion. This paper reports on the results of laboratory experiments into the operator variability of the newly proposed procedure and its performance in comparison with the traditional percutaneous procedure using fluoroscopy.
2 Materials and Methods
Two human scaphoid phantoms (No. 1009, Sawbones Inc., Vashon WA) were attached to 5×10cm Plexiglas sheets after flattening their dorsal side. Three alumina beads (diameter: approximately 3mm) were mounted on each phantom bone to verify the accuracy of each procedure. The goal in the experiments was to drill a hole through each phantom bone according to a consistent surgical plan. This plan was chosen through the geometric center of the surface model in the direction of the model's main axis, determined from a principal component analysis. To determine the operator variability of the proposed procedure, two different operators were asked to drill into eight different phantoms. Furthermore, one of the operators was also asked to drill into an additional eight phantoms in the traditional way using fluoroscopy, to enable comparison of the proposed procedure with the traditional fluoroscopic procedure. Statistical analyses to determine significant differences between operators and performances were performed using independent two-tailed t-tests. Equality of variances was tested using Levene's test.

2.1 Generation of Surface Models
Each Plexiglas sheet with the scaphoid phantoms and beads was scanned using a LightSpeed Plus CT scanner (GE Medical Systems, Waukesha WI) at Kingston General Hospital. The in-plane resolution of the images was 0.19×0.19mm and the images were 1.25mm apart (interpolated to 0.625mm). Software developed at Queen's University was used to segment the geometry of the bone phantoms from the CT images and to create three-dimensional surface models of the phantoms using a marching cubes algorithm; a sketch of this surface extraction step is given below. The locations of the alumina beads in CT coordinates were determined from the CT images and recorded.
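The surface extraction step can be sketched as follows. This is a hedged illustration assuming a recent scikit-image rather than the actual Queen's University software, and the input file name is hypothetical:

```python
import numpy as np
from skimage import measure   # assumes a recent scikit-image

mask = np.load("scaphoid_mask.npy")           # hypothetical binary CT segmentation
verts, faces, normals, values = measure.marching_cubes(
    mask.astype(np.float32),
    level=0.5,                                # iso-surface between background and bone
    spacing=(0.625, 0.19, 0.19),              # slice and in-plane voxel sizes in mm
)
# verts holds vertex coordinates in mm; faces indexes triangles into verts
```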
2.2 Traditional Procedure Using Fluoroscopy
Each scaphoid phantom was covered by a piece of opaque foam to simulate the percutaneous character of the procedure. The number of fluoroscopic images needed to achieve an accurate drill hole according to the surgical plan was recorded for each phantom. After the experiments, each Plexiglas sheet was mounted on a stainless steel jig equipped with an optical target serving as frame of reference. Using a Certus optical camera (Northern Digital Inc., Waterloo ON) and a calibrated stylus probe, the positions of the alumina beads and of the entry and exit of each realized drill hole were measured after the experiment. Using the alumina beads, the surface model of each phantom was registered to the frame of reference, enabling the determination of the error between the planned and realized drill holes.
Fig. 2. Semi-automatic segmentation of ultrasound images. A: typical segmentation result; B: close-up of A. The darker points in the bright band represent the segmentation.
2.3 Proposed Procedure Using Ultrasound
In this part of the study, the same Certus optical camera was used to track the optical targets in real time. Each Plexiglas sheet with the phantom bones was rigidly mounted on the stainless steel jig with the reference optical target. A 12MHz, one-dimensional array US transducer (GE Medical Systems, Waukesha WI) and a surgical drill were also equipped with optical targets. Due to the amount of time involved, the US transducer was pre-calibrated using an N-wire phantom; the error of this calibration method proved to be 0.64mm [8]. The US transducer was handled carefully throughout the experiments to avoid jeopardizing its calibration. For each experiment, the drill was re-calibrated: the tip position of the drill bit was determined by pivoting [9], and the direction of the drill bit was determined from this tip position and the center of the drill chuck, which was measured with a calibrated stylus probe before inserting the drill bit [7]. The tip position accuracy proved to be better than 0.3mm after calibration by pivoting [7]. US images were used to register the surface models to the frame of reference. After the jig was placed in a tub of room-temperature water, US images were captured through a frame-grabber, which was synchronized with the optical camera. The US images were semi-automatically segmented (Figure 2): a few (normally fewer than five) seed pixels are selected in each image by the operator using a mouse pointer. Subsequently, after linear interpolation of the seeds, the pixels are moved up and down within a small vertical band and the ones with the highest intensity are selected. These pixels are finally converted into a point cloud (using the US calibration data and the position and orientation data from the optical camera). The surface model was then registered to the points segmented from the US images using an iterative closest point (ICP) algorithm [10]; a sketch of this registration step follows at the end of this subsection. The mean of the surface registration error was 0.53mm and the mean of the fiducial registration error was 2.7mm [7]. Two operators were asked to drill one hole through scaphoid phantoms according to the surgical plan while
Fig. 3. Graphical interface to guide the surgeon during percutaneous scaphoid pinning. The interface displays three orthogonal views of which the larger one can be aligned with the direction of the surgical plan. The surgical plan is represented by the cylinder and the surgical drill by the wireframe cone.
looking at a computer monitor. On the monitor, the graphical interface was displayed, showing the surface model and graphical representations of the surgical plan and surgical drill in real time (Figure 3). The position of the entry and exit of each realized drill hole was measured using a calibrated stylus probe.
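The ICP registration used above can be sketched as follows. This is a generic point-to-point ICP in the spirit of [10], not the system's actual code, and it omits the manual initial estimate discussed in the conclusions:

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(source, target, iters=50):
    """Rigidly align source points (Nx3) to target points (Mx3)."""
    src = source.copy()
    R_total, t_total = np.eye(3), np.zeros(3)
    tree = cKDTree(target)
    for _ in range(iters):
        # 1. closest-point correspondences
        _, idx = tree.query(src)
        matched = target[idx]
        # 2. best rigid transform via SVD (Kabsch), with reflection guard
        mu_s, mu_t = src.mean(0), matched.mean(0)
        U, _, Vt = np.linalg.svd((src - mu_s).T @ (matched - mu_t))
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T
        t = mu_t - R @ mu_s
        # 3. apply and accumulate
        src = src @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total, src
```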
3 Results
The mean error of the entry location between the planned and realized drill holes was 1.89mm and 1.93mm for operator 1 (n=13) and operator 2 (n=8), respectively. Statistical analysis showed that this difference was not significant (p=0.90). However, for the mean error of the exit location these numbers were 3.04mm and 1.99mm, respectively; this difference appeared to be significant (p<0.05). For the traditional pinning procedure, on average 7.6 fluoroscopic images were needed for each phantom (range: 5-11). For a comparison of the proposed with the traditional procedure, the reader is referred to Figure 4. The mean of the distance between the realized and planned entry of the drill hole was 2.63mm
Fig. 4. Comparison of the mean of the errors (error bars represent one standard deviation) in the entry location (diagonal lines) and exit location (horizontal lines) of the realized drill hole for the proposed scaphoid pinning procedure using ultrasound and the traditional approach using fluoroscopy (n=8).
(standard deviation: 1.01mm, n=8). This value was 5.73mm for the exit of the drill hole (standard deviation: 3.95mm, n=8). For the newly proposed procedure, the same operator realized values of 1.89mm (standard deviation: 0.55, n=8) and 3.04mm (standard deviation: 1.05, n=8) for the mean errors in the entry and exit of the realized drill hole, respectively. The differences did not appear to be significant. Levene's test for equality of variances indicated a difference that approached significance for the error of the entry of the drill hole.
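For reproducibility, the statistical comparison can be sketched with SciPy as follows; the error arrays are illustrative stand-ins, not the measured data:

```python
import numpy as np
from scipy import stats

us_entry = np.array([1.2, 2.1, 1.8, 2.5, 1.6, 2.0, 1.9, 2.0])      # mm, illustrative
fluoro_entry = np.array([2.9, 1.4, 3.8, 2.2, 3.1, 1.9, 2.6, 3.1])  # mm, illustrative

t, p_t = stats.ttest_ind(us_entry, fluoro_entry)   # independent two-tailed t-test
w, p_lev = stats.levene(us_entry, fluoro_entry)    # equality of variances
print(f"t-test p={p_t:.2f}, Levene p={p_lev:.2f}")
```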
4 Conclusions
A new percutaneous scaphoid pinning procedure was proposed that uses a three-dimensional surface model of the bone from pre-operative CT images and intra-operative ultrasound guidance. Since the scaphoid bone articulates with five other bones, much of its surface is covered with articular cartilage. In order to prevent cartilage damage that might introduce osteoarthritis, a distance of at most 2mm from the bone's main axis is acceptable for screw insertion. The results of this study show that this tight accuracy requirement can be met. Advantages of the proposed procedure are a reduced risk of infections and minimal
soft tissue damage due to its percutaneous nature. A recent study showed that the radiation exposure to the hands of the surgeon in every fluoroscopic procedure is on average comparable with a chest x-ray (approximately 20 mrem), limiting the recommended number of cases to about 2,500 per year [11]. To the patient, the effective radiation involved with the pre-operative CT scanning will be negligible, since the scanned volume is small and no major organs are within this volume. Due to the complex anatomy of the wrist joint, the accompanying ability to plan the surgery in advance can also be considered a plus for the proposed procedure. The operator variability of the proposed procedure appeared to be minimal. The significant difference between operators in the error of the exit location of the realized drill hole might be due to differences in operator experience with the guidance interface. The small sample size of this pilot study may be the main reason for the lack of statistically significant differences when comparing the newly proposed ultrasound-guided procedure with the traditional fluoroscopic approach. However, the fact that the difference between the variances for the entry of the drill hole approaches significance might be indicative of the improved consistency of the ultrasound-guided procedure. Segmentation and registration were not real time in this study. However, it appeared that between 15 and 20 US images were required to obtain a sufficiently accurate registration. The total amount of time involved between capturing the first US image and finishing the registration was nonetheless acceptable (less than 3 minutes). It is well known that registration algorithms, like the iterative closest point method [10], need a sufficiently accurate initial estimate to avoid getting stuck in a wrong local minimum. Presently, this initial estimate is obtained by manual manipulation of the surface model with respect to the points segmented from the US images. A future version of the graphical interface might incorporate the selection of pre-determined landmarks on the patient using a calibrated stylus probe to facilitate a landmark registration as a first estimate. Another option is to try all orientations that can be achieved by flipping the model about its main axes (just eight) and to assume that the registration with the smallest average distance between the points and the model is the correct registration. The experiments described in this paper are pilot experiments enabling more realistic experiments in the near future. During an actual surgery, the acquisition of clear lateral fluoroscopic images of the scaphoid is virtually impossible due to the close proximity of the other carpal bones. Experiments using fluoroscopic guidance on isolated scaphoids might, therefore, result in incorrect conclusions. Because the reference optical target cannot be mounted directly on the scaphoid in an actual pinning procedure, the bone has to be registered with respect to an optical target close to the surgical target area. Since the ligaments in the wrist joint are tight and the hand is generally positioned in a pose that locks the scaphoid during screw insertion, we expect the mobility of the scaphoid to be negligible, but this has yet to be verified.
Acknowledgment. We thank the staff at Imaging Services at Kingston General Hospital for their assistance in obtaining CT images. We also thank Heather Grant for assisting in the statistical analyses.
References
1. Mink van der Molen, A., Groothoff, J., Visser, G., Robinson, P., Eisma, W.: Time off work due to scaphoid fractures and other carpal injuries in the Netherlands in the period 1990 to 1993. Journal of Hand Surgery 24B (1999) 193–198
2. Daecke, W., Wieloch, P., Vergetis, P., Jung, M., Martini, A.: Occurrence of carpal osteoarthritis after treatment of scaphoid nonunion with bone graft and Herbert screw: a long-term follow-up study. Journal of Hand Surgery 30A (2005) 923–931
3. Saeden, B., Tornkvist, H., Ponzer, S., Hoglund, M.: Fracture of the carpal scaphoid: A prospective, randomised 12-year follow-up comparing operative and conservative treatment. Journal of Bone and Joint Surgery 83B (2001) 230–234
4. Filan, S., Herbert, T.: Herbert screw fixation of scaphoid fractures. Journal of Bone and Joint Surgery 78B (1996) 519–529
5. Shih, J., Lee, H., Hou, Y., Tan, C.: Results of arthroscopic reduction and percutaneous fixation for acute displaced scaphoid fractures. Arthroscopy 21 (2005) 620–626
6. Slade, J., Gutow, A., Geissler, W.: Percutaneous internal fixation of scaphoid fractures via an arthroscopically assisted dorsal approach. Journal of Bone and Joint Surgery 84A (2002) 21–36
7. Beek, M., Abolmaesumi, P., Chen, T., Sellens, R., Pichora, D.: Percutaneous scaphoid pinning using ultrasound guidance. In: Cleary, K.R., Galloway, R.L. Jr. (Eds.) Medical Imaging 2006: Visualization, Image-Guided Procedures, and Display. (2006)
8. Chen, T., Abolmaesumi, P., Pichora, D., Ellis, R.: A system for ultrasound-guided computer-assisted orthopaedic surgery. Computer Aided Surgery 10 (2005) 281–292
9. Schmerber, S., Chassat, F.: Accuracy evaluation of a CAS system: Laboratory protocol and results with 6D localizers, and clinical experiments in otorhinolaryngology. Computer Aided Surgery 6 (2001) 1–13
10. Besl, P., McKay, N.: A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 14 (1992) 239–256
11. Singer, G.: Radiation exposure to the hands from mini C-arm fluoroscopy. Journal of Hand Surgery 30A (2005) 795–797
Cosmology Inspired Design of Biomimetic Tissue Engineering Templates with Gaussian Random Fields

Srinivasan Rajagopalan (currently with GE Global Research, Bangalore, India) and Richard A. Robb

Biomedical Imaging Resource, Mayo Clinic College of Medicine, Rochester, MN, USA

Abstract. Tissue engineering integrates the principles of engineering and life sciences toward the design, construction, modification and growth of biological substitutes that restore, maintain, or improve tissue function. The structural integrity and ultimate functionality of such tissue analogs is defined by scaffolds: porous, three-dimensional "trellis-like" structures that, on implantation, provide a viable environment to regenerate damaged tissues. The orthogonal scaffold fabrication methods currently employed can be broadly classified into two categories: (a) conventional, irreproducible, stochastic techniques producing reasonably biomorphic scaffold architecture, and (b) rapidly emerging, repeatable, computer-controlled techniques producing straight-edged "contra naturam" scaffold architecture. In this paper, we present the results of the first attempt at an image-based scaffold modeling and optimization strategy that synergistically exploits the orthogonal fabrication techniques to create repeatable, biomorphic scaffolds with optimal scaffold morphology. Motivated by the use of Gaussian random fields (GRF) to model cosmological structure formation, we use appropriately ordered and clipped stacks of GRF to model the three-dimensional pore-solid scaffold labyrinths. Image-based metrology, fabrication and mechanical characterization of these scaffolds reveal the possibility of enabling the previously elusive deployment of promising benchside tissue analogs to the clinical bedside.
1 Introduction

Tissue and organ loss and defects occur in a wide variety of clinical situations, and their reconstruction to restore structural and functional integrity is a challenging but necessary step in the patient's rehabilitation. Tissue engineering attempts to address this challenge by regenerating damaged or malfunctioning organs from the recipient's own cells. The cells are harvested and embedded on a natural or synthetic carrier material, template or scaffold, that is ultimately implanted at the desired site. On implantation and integration, blood vessels attach to the new tissue, the scaffold dissolves, and the newly grown tissue clings, crawls, proliferates and eventually blends in with its surroundings. The scaffold architecture plays the essential and crucial role in defining the structural integrity and ultimate functionality of the engineered tissues. Tissue engineering scaffolds are conventionally fabricated using "particulate leaching" [1], a stochastic approach in which a water-soluble porogen, such as salt,
sugar, or gelatin particles, is included in a biocompatible material before polymerization and is subsequently leached out. The footprints left by the leached-out porogen form the pores within the material. Manual intervention, inconsistent and inflexible processing procedures, unpredictable morphology, anti-biomorphic crystalline footprints of the porogens, and irreproducibility are critical limitations of this technique. Alternatively, scaffolds can be fabricated reproducibly using computer-controlled rapid prototyping devices. While such techniques offer unique ways to precisely control scaffold architecture, the CAD-based scaffold models employed in these approaches, with the exception of [2], have straight edges and sharp turns. Such partitions do not provide the biomorphic environment suitable for cell attachment, migration and proliferation [3]. Figure 1 shows representative µCT cross sections of scaffolds fabricated with the stochastic and deterministic techniques.
Fig. 1. Representative µCT cross sections of tissue engineering scaffolds fabricated with stochastic (left) and deterministic (right) processes
Towards synergistically exploiting the advantages of these orthogonal techniques, we propose the use of rapid prototyping devices to deterministically fabricate pore-solid labyrinths designed using stochastic processes. Instead of letting driven-by-nature reaction-diffusion random processes determine the ultimately suboptimal scaffold structure, we use computer-generated Gaussian random fields (GRF) as "ab initio" models for subsequent optimization driven by the structural requirements of functional scaffolds. The inspiration for using GRF as tissue engineering templates is drawn from their successful use as the primordial density fluctuation model explaining the complex hierarchy of nodes, filaments and sheets interlocking large voids that forms the large-scale cosmological structures [4,5]. This success, achieved at the mega-structural level, when translated and perfected with optimized design and controlled fabrication techniques at the macro- and microscopic lamellar, architectural and whole-organ levels, could finally realize the significant potential of the previously elusive biomorphic tissue analogs.
2 Gaussian Random Fields: Theory and Generation

A stochastic process is a collection of random variables {Z(s): s ∈ D} indexed over a set D. A random field is a stochastic process where D ⊆ ℜ². In a GRF, the finite
collections (Z(s1), ..., Z(sn)) are jointly normal. The distribution of such a field is completely determined by its mean and covariance functions, which describe the spatial trend and the spatial association, respectively [6]. While any mean function m could be used, the covariance function r should be symmetric and positive definite. The following are some of the popular covariance functions r(l), where l is the distance between the points at which the covariance is computed, used to model GRF:

Matérn correlation (CM) [7]:
$$r(l) = \frac{1}{2^{\theta_2 - 1}\,\Gamma(\theta_2)} \left(\frac{l}{\theta_1}\right)^{\theta_2} \kappa_{\theta_2}\!\left(\frac{l}{\theta_1}\right),\ l \neq 0; \qquad r(0) = 1,$$
where θ1, θ2 > 0, Γ is the gamma function, and κθ2 is a modified Bessel function of the third kind of order θ2.

Spherical correlation (CS) [8]:
$$r(l) = 1 - \frac{3}{2}\left(\frac{l}{\theta}\right) + \frac{1}{2}\left(\frac{l}{\theta}\right)^{3},\ l \leq \theta; \qquad r(l) = 0,\ l > \theta,$$
where θ > 0.

Exponential correlation (CE):
$$r(l) = \theta_1^{\,l^{\theta_2}},$$
where θ1 ∈ (0,1) and θ2 ∈ (0,2].

Rational quadratic correlation (CR) [9]:
$$r(l) = \left(1 + \frac{l^2}{\theta_1^2}\right)^{-\theta_2},$$
where θ1, θ2 > 0.
The statistical characteristics of these covariance functions are discussed in [10]. In addition to their mathematical elegance and ubiquitous relevance, GRF adequately describe the morphology of bicontinuous microemulsions, polymer blends and foams [11-13]. This provides us the impetus to explore the use of GRF in tissue engineering. To generate a zero-mean GRF Z(s) with a positive-definite covariance function r on a finite subset S = {s1, ..., sn} ⊂ ℜ², an n × n symmetric positive-definite covariance matrix T with Tij = Cov(Z(si), Z(sj)) = r(si, sj) is constructed, from which the multivariate normal vector (Z(s1), ..., Z(sn)) ~ N(0, T) is generated using classical Cholesky decomposition [14]. Other computationally efficient methods to generate GRF over large grids are addressed in [10]. Figure 2 shows a mosaic of GRF generated with the different covariance functions described above. For this study, we generated 50 GRF (grid size 256 × 256) for each of the covariance functions. The in-plane resolution was assumed to be 100 microns.
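A minimal sketch of this generation step (ours, not the authors' implementation) for the exponential-type correlation might look as follows. The grid size and parameters are illustrative, and the binary field of the next section is obtained by thresholding at the α-quantile of the sampled field so that the void fraction equals α:

```python
import numpy as np

n = 32                                        # small grid: dense Cholesky is cubic in n*n
yy, xx = np.mgrid[0:n, 0:n]
pts = np.column_stack([xx.ravel(), yy.ravel()]).astype(float)

theta1, theta2 = 0.9, 1.5                     # exponential correlation, as in Figure 4
l = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
T = theta1 ** (l ** theta2)                   # covariance matrix T_ij = r(|s_i - s_j|)
T += 1e-8 * np.eye(n * n)                     # jitter for numerical positive-definiteness

rng = np.random.default_rng(0)
Z = (np.linalg.cholesky(T) @ rng.standard_normal(n * n)).reshape(n, n)

alpha = 0.6                                   # desired in-plane porosity
pore = Z < np.quantile(Z, alpha)              # True = pore; pore.mean() is ~0.6
```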
Fig. 2. Representative zero-mean GRF generated with different covariance functions C(.). The parameters θ1, θ2 and θ used for each C(.) are specified under the respective images.
3 GRF-Based Tissue Engineering Scaffold Modeling

Volumetric images of stochastic tissue engineering scaffolds can be obtained by stacking multiple GRF that represent the cross-sectional images. The biphasic pore-solid labyrinths can be modeled from these stacks by clipping the GRF at appropriate levels. Figure 3 shows the result of clipping the GRFs in Figure 2 at a level of 0.6, which, due to the normal distribution of the GRF, directly translates to an in-plane porosity (void fraction) of 60%. Figure 3 shows that clipped GRFs naturally possess macro- and microporous networks, an essential requirement in tissue engineering scaffolds for concomitant cell proliferation and vascular infiltration.
Fig. 3. Clipping GRF (Figure 2) at a level of α yields binary random pore-solid sections with an in-plane porosity of α*100%. In this example, α = 0.6
The binary random fields generated with the desired in-plane porosity can be stacked together to form three-dimensional pore-solid labyrinths. However, the randomness of the individual GRF makes brute-force stacking suboptimal. An optimal scaffold architecture should be adequately porous, with a minimally tortuous and globally interconnected pore network [1]. Additionally, within micron ranges, the scaffold should have reasonably smooth out-of-plane spatial gradients so that the cytofilaments of proliferating cells can anchor strongly to the substrate [3]. Global porosity, tortuosity and interconnectivity can be optimized by permuting the clipped GRF such that the binary similarity between adjacent sections is maximized. In [16] an exhaustive list of binary similarity measures has been analyzed; any of the symmetric measures explored therein can be used in the permutation. In this study we have used the Dice similarity coefficient [17] (a sketch of this ordering step is given below). By permuting the GRF sections based on maximization of adjacent-section similarity, the out-of-plane overlap of the pore and solid subspaces is maximized. This optimization decreases pore tortuosity, increases global pore connectivity, and simultaneously improves the scaffold's structural integrity and mechanical strength. To provide the appropriate out-of-plane spatial gradients, the permuted cross sections are assumed to be separated by an interstice (we chose an interstitial distance
of 500 microns). Shape-based interpolation within the interstice facilitates smooth morphing of the pore boundaries. We employed a robust shape-based interpolation technique [18] to generate intermediate cross sections 100 microns apart. Briefly, the technique uses a feature-guided approach to interpolate porous and tortuous binary objects. The feature points derived from the boundaries of the candidate source objects are matched non-linearly, and the intermediate objects are obtained by appropriately blending the warped source objects. A robust outlier-rejecting, non-linear point matching algorithm based on thin-plate splines is used for establishing the feature correspondence. This scheme correctly handles objects with holes, large offsets and drastic invaginations. The mosaic in Figure 4 shows a representative set of consecutive cross sections obtained by interpolating the appropriately ordered clipped GRF. In contrast to the straight-edged crystalline footprints (Figure 1) characteristic of conventional stochastic scaffolds, GRF-based scaffolds have curved partitions, which favor tissue ingrowth [3]. Cell proliferation is further enhanced by the smooth out-of-plane spatial gradients.
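The adjacent-section ordering can be sketched as follows (our illustration; a greedy chaining stand-in for the full permutation search, with random binary sections in place of clipped GRF):

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient of two boolean arrays."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

def order_slices(slices):
    """Greedy chaining: start from slice 0, repeatedly append the remaining
    slice with the highest Dice similarity to the current end of the stack."""
    remaining = list(range(1, len(slices)))
    order = [0]
    while remaining:
        last = slices[order[-1]]
        best = max(remaining, key=lambda i: dice(last, slices[i]))
        order.append(best)
        remaining.remove(best)
    return order

rng = np.random.default_rng(0)
slices = [rng.random((64, 64)) < 0.6 for _ in range(8)]  # stand-in binary sections
print(order_slices(slices))
```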
Fig. 4. Representative intermediate sections of the porous scaffolds generated by clipping GRF with covariance function CE( θ1, θ2) = (0.9, 1.5)
4 Image-Based Metrology of GRF Scaffolds

The regeneration of specific tissues guided by tissue analogs depends on diverse architectural indices such as pore size, porosity, pore shape, pore interconnectivity, tortuosity and scaffold stiffness [19]. These indices can be computed non-destructively from the binary pore-solid subspace. While bulk averaged quantities such as porosity and surface area are derived directly from the pore-solid delineations, the spatially distributed geometric indices are derived from medial axis representations of the pore network. The computational framework for image-based metrology of porous scaffolds and the biological relevance of the computed indices are described in [20]; we provide succinct details here. The internal geometry of a porous medium is microscopically quantitated by partitioning the pore space into a discrete and well-defined collection of individual pores. Pores are defined as regions of the void space confined by solid surfaces (a.k.a. nodal pores, pore bodies, chambers) connected by pore channels (a.k.a. pore necks, throats). The pore-throat partitioning is performed by classifying the pore network skeleton into nodal and link points. The geometry of the individual throats is obtained from the cross-sectional area and perimeter of their intersections with planes normal at each link point. Bulk pore interconnectivity is computed from the Euler number (χ) obtained by 3D connected component analysis. The degree of anisotropy (DA) and the structure model index (SMI) are computed by characterizing the
preferential orientational and topological alignment of the pores. Geometrical tortuosity is computed as the distribution of the shortest path lengths between all possible skeletal voxel pairs incident on the two faces considered. Figure 5 shows the pore skeletal network in the central region of the GRF scaffolds; the individual nodes on the skeleton are color coded based on their nearest distance to a solid voxel. Qualitatively, the skeletal networks appear highly interconnected. Table 1 summarizes the quantitative results of image-based metrology. The effective pore radius was obtained as the radius of a sphere of equivalent volume; the effective throat radius is the radius of a circle of equivalent area. To compute the pore radii associated with the throats, the average over the two adjoining pore bodies was computed for each throat. The global porosity deviates from the initial in-plane porosity (60%) due to dilations introduced by the shape-based interpolation.
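Two of the bulk indices can be sketched directly from the binary pore volume; this is our minimal illustration, not the metrology framework of [20], and the input file is hypothetical:

```python
import numpy as np
from scipy import ndimage

pore = np.load("grf_scaffold_pores.npy")      # hypothetical boolean volume, True = pore
porosity = pore.mean()                        # bulk void fraction

# number of 26-connected pore components (one component = globally connected network)
labels, n_parts = ndimage.label(pore, structure=np.ones((3, 3, 3), dtype=bool))
print(f"porosity = {porosity:.2f}, 26-connected pore components = {n_parts}")
```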
Fig. 5. Qualitative visualization of the skeleton of GRF scaffolds. The medial axes correspond to the central regions of the GRF scaffolds modeled respectively (left to right, top to bottom) with the Matérn, spherical, exponential and rational quadratic covariance functions.

Table 1. Summary of image-based metrology of GRF scaffolds
GRF         φ (%)   χ        τAv    EPR (µm)   ETR (µm)   DA     SMI
CM(1,1)     63      -6.5     1.50   382        421        0.83   0.98
CM(1,2)     61      -38.5    1.47   527        510        0.73   0.55
CS(5)       61      -5.75    1.49   428        472        0.79   0.79
CS(10)      59      -21.75   1.54   562        494        0.63   0.27
CE(9,15)    62      -21.25   1.55   560        576        0.5    0.14
CR(11,11)   63      -16.5    1.50   397        438        0.68   0.66

Key: φ – porosity; χ – Euler number; τAv – average tortuosity; EPR – average effective pore radius; ETR – average effective throat radius; DA – degree of anisotropy; SMI – structure model index.
The Euler numbers of all the scaffolds are negative, indicating high pore interconnectivity, a characteristic hardly achievable with conventional stochastic scaffolds. The tortuosity of all the GRF scaffolds characterized lies within the 1.41-1.78 range considered optimal for mass transport through a globally interconnected pore space [21]. The effective pore and throat radii are within the ranges required for orthopedic tissue engineering scaffolds [20]. The results for the degree of anisotropy show that all the scaffolds are anisotropic, a favored attribute for multidirectional tissue growth. Non-negative values of the SMI indicate the absence of the concave surfaces and enclosed cavities typically found in suboptimal conventional scaffolds.
5 GRF Scaffold Fabrication and Mechanical Characterization

The volumetric binary GRF can be directly converted into surface models suitable for fabrication with rapid prototyping devices. Figure 6 shows the geometric .STL model within a sub-region of a GRF scaffold. Multiple copies of the six scaffolds shown in Table 1 were fabricated with a stereolithography machine (3D Systems, CA). The machine uses a UV laser that is vector scanned over the top of a bath of photopolymerizable liquid polymer material. As polymerization is initiated, the laser beam creates a first solid plastic layer at, and just below, the surface of the bath. This laser polymerization process is repeated to generate subsequent layers by tracing the laser beam along the design boundaries and filling in the 2D cross-section of the model, layer by layer, in steps of 0.02 inches. After building the model, excess resin was removed and the model was cured in a UV oven. For mechanical characterization, the GRF scaffolds were uniaxially compressed on a Dynamic Mechanical Analyzer (DMA; TA Instruments, New Castle, DE). Five specimens of each of the GRF scaffolds were used for the mechanical testing. The specimens were uniaxially compressed with parallel plates by applying a ramp force of 4N/min for 4.5 minutes. Figure 7 compares the linear moduli of the different scaffolds. Preliminary analysis of the results indicates that the mechanical strength adequately meets the requirements of tissue engineering scaffolds [1]. More investigation is needed to link the destructive mechanical testing with the non-destructive image metrology.
Fig. 6. Polygonization of the GRF scaffold architecture. The wireframe shows a sub-region extracted from one of the GRF scaffolds.
Fig. 7. Linear modulus of GRF scaffolds (N=5) based on uniaxial compression on DMA under identical loading conditions (4 N/min. ramp force for 4.5 minutes)
6 Conclusions

Recognizing the "random walk through the design space" as the critical bottleneck in the status quo of tissue engineering scaffolds, we have proposed a cosmology-inspired approach that adapts the randomness of GRF towards modeling optimized and predictable scaffolds and fabricating them with deterministic processes. In conjunction with appropriate biomaterials, the proposed image-guided, domain-knowledge-driven, predicate-rich, computer-assisted intervention might contribute to the generation of "living" prostheses that integrate with the host tissue, thereby reducing the need for further surgery and the risk of implant failure.
References
1. Yang, S., et al.: Design of scaffolds for use in tissue engineering. Part I: Traditional factors. Tissue Engineering 7(6) (2001) 679-689
2. Rajagopalan, S., Robb, R.A.: Schwarz meets Schwann: Design and fabrication of tissue engineering scaffolds. MICCAI 2005, 794-801
3. Spalazzi, J.P., et al.: Osteoblast and chondrocyte interactions during coculture on scaffolds. IEEE Eng. Med. & Biol. 22(5) (2003) 27-34
4. Coles, P., Chiang, L.Y.: Characterizing the nonlinear growth of large-scale structure in the universe. Nature 406 (2000) 376-378
5. Chiang, L.Y., et al.: Return mapping of phases and the analysis of the gravitational clustering hierarchy. Mon. Not. R. Astron. Soc. 337 (2002) 488-494
6. Berger, J.O., et al.: Objective Bayesian analysis of spatially correlated data. J. Am. Stat. Assoc. 96(456) (2001) 1361-1374
7. Matern, B.: Spatial Variation, Second Edition. Springer Verlag, Berlin, 1986
8. Wackernagel, H.: Multivariate Geostatistics. Springer Verlag, Berlin, 1995
9. Yaglom, A.M.: Correlation theory of stationary and related random functions I: Basic results. Springer Verlag, New York, 1987
10. Teubner, M.: Level surfaces of Gaussian random fields and microemulsions. Europhys. Lett. 14 (1991) 403-408
11. Knackstedt, M.A., Roberts, A.P.: Morphology and macroscopic properties of conducting polymer blends. Macromolecules 29 (1996) 1369-1371
12. –ibid–: Mechanical and transport properties of model foamed solids. J. Mater. Sci. Lett. 14 (1995) 1357-1359
13. Press, W.H., et al.: Numerical Recipes in C: The Art of Scientific Computing, 2nd Edn. Cambridge University Press, 1992
14. Kozintsev, B.: Computations with Gaussian Random Fields. PhD Thesis, University of Maryland, 1999
15. Rajagopalan, S., Robb, R.A.: Assessment of similarity indices to quantify segmentation accuracy of scaffold images for tissue engineering. Proc. SPIE 5747:1636-47, Medical Imaging 2005
16. Dice, L.R.: Measures of the amount of ecologic association between species. Ecology 26 (1945) 297-302
17. Rajagopalan, S., Karwoski, R.A., Robb, R.A.: Shape based interpolation of porous and tortuous binary objects. MICCAI 2002, 957-58
18. Cima, L.G., et al.: Tissue engineering by cell transplantation using degradable polymer substrates. J. Biomech. Eng. 113 (1991) 143-151
19. Rajagopalan, S., Robb, R.A.: Image based metrology of porous tissue engineering scaffolds. Proc. SPIE Vol. 6144:540-550, Medical Imaging 2006
20. Ramanujan, S., Pluen, A.: Diffusion and convection in collagen gels: implications for transport in the tumor interstitium. J. Biophys. 83 (2002) 1650-60
21. Karageorgiou, V., Kaplan, D.: Porosity of 3D biomaterial scaffolds and osteogenesis. Biomaterials 26(27) (2005) 5474-91
Registration of Microscopic Iris Image Sequences Using Probabilistic Mesh

Xubo B. Song1, Andriy Myronenko1, Stephen R. Plank2, and James T. Rosenbaum2

1 Department of Computer Science and Electrical Engineering, OGI School of Science and Engineering, Oregon Health and Science University, USA
2 Department of Ophthalmology, Department of Cell and Developmental Biology, and Department of Medicine, Casey Eye Institute, Oregon Health and Science University, USA
{xubosong, myron}@csee.ogi.edu, {rosenbaj, plancks}@ohsu.edu
Abstract. This paper explores the use of a deformable mesh for the registration of microscopic iris image sequences. The registration, as an effort to stabilize and rectify images corrupted by motion artifacts, is a crucial step toward leukocyte tracking and motion characterization for the study of immune systems. The image sequences are characterized by locally nonlinear deformations for which an accurate analytical expression cannot be derived through modeling of image formation. We generalize the existing deformable mesh and formulate it in a probabilistic framework, which allows us to conveniently introduce local image similarity measures, to model image dynamics, and to maintain a well-defined mesh structure and smooth deformation through appropriate regularization. Experimental results demonstrate the effectiveness and accuracy of the algorithm.
1 Introduction
Recent development of videomicroscopy technology for imaging the immune response is revolutionizing the way we study and understand the immune mechanism [1,2]. The motion patterns of leukocytes, specifically T cells, are directly related to the cellular and chemical environment in the lymph node, the thymus and a site of eye inflammation, and can reveal underlying disease mechanisms [3,4,5]. Our specific interest has focused on ocular inflammatory disease, a leading cause of blindness. The eye is especially attractive for imaging studies because cellular migration can be recorded without introducing any surgical trauma. Microscopy videos can reveal patterns of T cell, neutrophil and antigen-presenting cell migration in the ocular uveal tract, indicating a complexity in immune responses that has not been closely examined before [1,3,4]. By studying these microscopy videos, we can characterize the migration of leukocytes within the iris stroma in disease models. However, the characterization of leukocyte motility is made difficult by motion artifacts in videomicroscopy. The videos were taken of sedated murine eyes
affected by uveitis (inflammation of the uveal tract), with a static microscopic camera looking at a portion of the iris. The motion artifacts are caused by wandering of the eye, dilation and contraction of the pupil, head motion, and sometimes refocusing of the camera during imaging. The net result is jitter and distortion (both spatial and in intensity) in the image plane, which subsequently obscures the leukocyte motion. Frame-by-frame image registration is needed to stabilize and rectify the image sequences, which will pave the way for subsequent cell tracking. The deformation across frames is locally non-linear, and it is not feasible to obtain an accurate closed-form deformation model by examining the image formation process. In addition, the motion artifacts can cause the imaged region of the iris to move in and out of the depth of field of the camera, resulting in local blurring and local intensity instability. In this paper, we focus on frame-by-frame registration of the image sequence, using a probabilistic mesh model to account for the nonlinear, nonrigid nature of the image deformation and to accommodate the local intensity variations.
2 Method
Mesh-based deformable models have been successfully used for motion estimation, compensation, video compression and other applications [6,7,8,9]. A mesh consists of a set of control nodes, which define polygon elements (patches) in the image. The mesh nodes move freely; the displacement of an interior point in an image element can be interpolated from the corresponding nodal displacements, so the motion field over the entire frame is described by the displacements of the nodes only. Very complex motion fields can be reproduced by a mesh model, given that a sufficient number of nodes is used. As long as the nodes form a feasible mesh, the mesh-based representation is guaranteed to be continuous and thus free from blocking artifacts. Another key advantage of the mesh model is that it enables continuous tracking of the same set of nodes over consecutive frames, which is important for the registration of image sequences. Triangular and quadrangular are the most common mesh elements; in this paper, we focus on quadrangular elements. Our approach is closely related to that in [6]. We generalize the original mesh by formulating mesh deformation in a Bayesian framework, which allows us to naturally introduce priors to model video dynamics, to constrain the mesh structure, and to account for the local intensity variations. Consider two images $I_1(x) = I(x, t_1)$ and $I_2(x) = I(x, t_2)$, which are the reference image and the target image, respectively. Let $x_1^{C_1},\dots,x_1^{C_K}$ represent the coordinates of the K control nodes placed in image $I_1(x)$, with known positions. These nodes move to locations $x_2^{C_1},\dots,x_2^{C_K}$ in image $I_2(x)$. The motion estimation problem is to find the locations of all nodes $x_2^{C_1},\dots,x_2^{C_K}$, so that all image elements in the reference frame match well with the corresponding deformed elements in the target frame. Given the reference image $I_1$, the target image $I_2$, and the locations of the control nodes $x_1^{C_1},\dots,x_1^{C_K}$ in image $I_1$, the posterior probability of the node locations $x_2^{C_1},\dots,x_2^{C_K}$ in image $I_2$ can be written as
$$P(x_2^{C_1},\dots,x_2^{C_K} \mid I_1, I_2;\, x_1^{C_1},\dots,x_1^{C_K}) \propto P(I_2 \mid x_2^{C_1},\dots,x_2^{C_K};\, x_1^{C_1},\dots,x_1^{C_K};\, I_1)\, P(x_2^{C_1},\dots,x_2^{C_K} \mid x_1^{C_1},\dots,x_1^{C_K};\, I_1) = P(I_2 \mid x_2^{C_1},\dots,x_2^{C_K};\, x_1^{C_1},\dots,x_1^{C_K};\, I_1)\, P(x_2^{C_1},\dots,x_2^{C_K} \mid x_1^{C_1},\dots,x_1^{C_K}) \tag{1}$$

The first term is the likelihood of observing $I_2$ given $I_1$ and the node locations in both images. The second term is the prior on the node locations in $I_2$ given where they are in $I_1$. In the following sections, we introduce approaches for modeling the likelihood function that reflect the local intensity properties, and for modeling the priors that incorporate image dynamics and enforce well-defined mesh structures.

2.1 The Likelihood Term
Let D be the number of nodes in each image element; for quadrangular elements, D = 4. Let m be the element index. Denote by $B^m$ the mth element of the images, and by $x_i^{C_1^m},\dots,x_i^{C_D^m}$ the D nodes that define element $B^m$ in image $I_i$, i = 1, 2. Under the assumption that all elements in an image are conditionally independent given their corresponding defining nodes, and that an element in image $I_2$ depends only on its own nodes and on the same element in $I_1$, the likelihood term in (1) becomes

$$P(I_2 \mid x_2^{C_1},\dots,x_2^{C_K};\, x_1^{C_1},\dots,x_1^{C_K};\, I_1) = \prod_m P(I_2(B^m) \mid x_2^{C_1},\dots,x_2^{C_K};\, x_1^{C_1},\dots,x_1^{C_K};\, I_1) = \prod_m P(I_2(B^m) \mid x_2^{C_1^m},\dots,x_2^{C_D^m};\, x_1^{C_1^m},\dots,x_1^{C_D^m};\, I_1(B^m)). \tag{2}$$
The likelihood term breaks down to measure the element-by-element image similarity between the reference and the target images, given the locations of the element nodes in both images. With noisy images, it is desirable to have the element similarity measures dependent on the “distinctiveness” of the elements. For instance, two similar elements that have “distinct features” such as edges, corners, line crossings and rich textures should be given more confidence than two similar homogenous elements. This can be captured by modeling the element-by-element similarity with a Gaussian distribution given by m
m
m
m
P (I2 (B m )|x2 C1 , ..., x2 CD ; x1 C1 , ..., x1 CD ; I1 (B m )) ∼ N (I2 (B m )−I1 (B m ), ΣB m ) where the choice of ΣB m reflects the distinctiveness of patch B m . For instance, 2 2 we can use isotropic diagonal matrix ΣB m = σm I, where σm is the local intensity variance in element m, and I is the identity matrix. This is equivalent to assigning weighting to different element pairs in an error function according to the distinctiveness of the elements. Element-by-element similarity measure with such choice of ΣB m also has the property of being invariant with respect to local intensity scaling.
556
2.2
X.B. Song et al.
The Prior Term for Image Dynamics
One of the key advantages of mesh model is its ability for continuous tracking of the same set of nodes over consecutive frames. The image sequence often has its dynamics. Taking advantage of the dynamics can lead to improved tracking robustness and reduced search space for optimization. The image dynamics can be captured by properly defining the prior term P (x2 C1 , ..., x2 CK |x1 C1 , ..., x1 CK ) in (1). The prior term depends only on nodes location and not on intensity. In case where the nodes locations across frames are first-order Markovian and go through a random walk, the prior can be modeled as a Gaussian distribution with P (x2 C1 , ..., x2 CK |x1 C1 , ..., x1 CK ) ∼ N (x2 C1 − x1 C1 , ..., x2 CK − x1 CK ; ΣC ), where ΣC = σd2 I and σd2 reflects the random walk step size. 2.3
The Prior Term for Maintaining a Well-Defined Mesh
When adapting the mesh nodes locations, it is necessary to ensure that the mesh structure is well-defined. In other words, the mesh doesn’t change topology and there are no nodes flips overs or obtuse elements. Typically this is done by limiting the search range of the nodes location when they are updated during an iterative procedure [6]. Here we adapt a less ad hoc and more principled approach by introducing, as a prior term in Bayesian formulation, a node order preserving term that explicitly enforces the ordering of the nodes. Since we always start with well-defined mesh x1 C1 , ..., x1 CK in I1 , this order preserving term will only be applied to constrain the nodes locations x2 C1 , ..., x2 CK in image I2 . We can define the prior term as: P (x2 C1 , ..., x2 CK |x1 C1 , ..., x1 CK ) = C Cj Cn n 2 P (x2 C1 , ..., x2 CK ) ∝ exp{− β2 j,n x2 j − xC 2 }, where x2 and x2 are neighboring nodes. This term is similar to the one introduced in [10] for Elastic Nets. Since the spatial difference of neighboring nodes is an approximation of the firstorder derivative of the deformation field, this order preserving term also serves as a Tikhonov regularization term that enforces the smoothness of the deformation field. 2.4
The Complete Posterior
Putting together the likelihood term and the two prior terms into (1), we have the posterior probability P (x2 C1 , ..., x2 CK |I1 , I2 ; x1 C1 , ..., x1 CK ) ∝
m
m
m
m
m
P (I2 (B m )|x2 C1 , ..., x2 CD ; x1 C1 , ..., x1 CD ; I1 (B m ))· P (x2 C1 , ..., x2 CK |x1 C1 , ..., x1 CK )
1 −1 m exp{− (I2 (B m ) − I1 (B m ))T ΣB ) − I1 (B m ))}· m (I2 (B 2 α β C −1 n 2 exp{− (x2 C − x1 C )T ΣC (x2 C − x1 C )}exp{− x2 j − xC 2 } j,n 2 2 ∝
m
(3)
where $x_1^C$ and $x_2^C$ are vectors formed by concatenating the control-node coordinates $x_1^{C_1},\ldots,x_1^{C_K}$ and $x_2^{C_1},\ldots,x_2^{C_K}$ in images $I_1$ and $I_2$, respectively, and $\alpha$ and $\beta$ are hyper-parameters controlling the strength of the two prior terms. Practically useful values for the hyper-parameters can be obtained manually for a given type of images. An energy function can be defined as the negative log-likelihood, given by

$$E(x_2^{C_1},\ldots,x_2^{C_K}) = -\log P(x_2^{C_1},\ldots,x_2^{C_K}\,|\,I_1,I_2;\,x_1^{C_1},\ldots,x_1^{C_K}) = \sum_m (I_2(B^m)-I_1(B^m))^T\Sigma_{B^m}^{-1}(I_2(B^m)-I_1(B^m)) + \tfrac{\alpha}{2}(x_2^C-x_1^C)^T\Sigma_C^{-1}(x_2^C-x_1^C) + \tfrac{\beta}{2}\sum_{j,n}\|x_2^{C_j}-x_2^{C_n}\|^2, \quad (4)$$

which can be minimized by gradient-based optimization.
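For concreteness, the following sketch evaluates the energy in (4) under the isotropic covariance choices made in the text ($\Sigma_{B^m} = \sigma_m^2 I$ with $\sigma_m^2$ the local patch variance, and $\Sigma_C = \sigma_d^2 I$). The element-sampling step (warping each patch according to the current node positions) is elided, and all names are ours rather than the paper's:

```python
import numpy as np

def mesh_energy(x2, x1, patches_ref, patches_tgt, neighbor_pairs,
                alpha, beta, sigma_d2=1.0, eps=1e-6):
    x1 = np.asarray(x1, dtype=float)  # (K, 2) node locations in I1
    x2 = np.asarray(x2, dtype=float)  # (K, 2) node locations in I2
    # Likelihood term: per-element residual scaled by local variance.
    data = 0.0
    for p_ref, p_tgt in zip(patches_ref, patches_tgt):
        d = p_tgt.ravel() - p_ref.ravel()
        data += float(d @ d) / max(float(np.var(p_ref)), eps)
    # Dynamics prior: (alpha/2) (x2 - x1)^T Sigma_C^{-1} (x2 - x1).
    disp = (x2 - x1).ravel()
    dynamics = (alpha / 2.0) * float(disp @ disp) / sigma_d2
    # Order-preserving / smoothness prior over neighboring node pairs.
    smooth = (beta / 2.0) * sum(float(np.sum((x2[j] - x2[n]) ** 2))
                                for j, n in neighbor_pairs)
    return data + dynamics + smooth
```

Gradients of this expression with respect to $x_2$ (including the dependence of the target patches on the node positions) drive the descent described in Section 3.2.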
3 Implementation
The image sequences were acquired from the iris and ciliary/limbal region of anesthetized animals with endotoxin-induced uveitis, observed by intravital epifluorescence videomicroscopy with a modified DM-LFS microscope (Leica) and a CF 84/NIR black-and-white camera (Kappa, Gleichen, Germany) [3]. Time-lapse videos were recorded for 30 to 90 minutes at 3 frames per minute. The images are monochrome, of size 720x480 pixels.
3.1 Preprocessing
The images are first normalized to reduce the effect of global intensity variation across frames, followed by edge-preserving smoothing to reduce noise while retaining structural features in the images (e.g., vessel branches). A global affine registration is used to initialize the mesh deformation algorithm. The affine registration, even if not extremely accurate, provides a good initial guess of the final registration. It is also crucial for speeding up the mesh-based registration and for avoiding poor local minima of the energy function.
3.2 Hierarchical Mesh-Based Registration
We adopt a hierarchical procedure that successively approximates the control-node locations. We start with the image down-sampled to the lowest resolution level L. At this resolution, we begin with a regular mesh in which each element covers a 16x16 image patch. The mesh node locations are updated according to gradient descent on the energy function in (4) until the energy reaches a preset threshold. These node locations are then translated to the image at resolution level L-1, and additional nodes are inserted so that the elements maintain roughly the size of 16x16 in this image. The locations of the newly inserted nodes are determined by linearly interpolating the existing node locations. The node locations in this newly formed mesh are once again updated
according to gradient descent of the energy function. This process is repeated until the images reach the highest resolution level and the energy function is reduced to a predetermined level. Such a hierarchical process is important because it speeds up the optimization and helps avoid getting stuck at a local minimum of the energy function by providing, at each stage, a more reasonable starting point for optimization. By constructing such a hierarchy, we ensure an adequately complex, rather than an overly complex, deformation field. This is consistent with Occam's Razor, which prefers the simplest model among all models consistent with the data.
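A schematic sketch of this coarse-to-fine schedule follows; the per-level energy closures and the insertion of interpolated nodes are elided, the central-difference gradient is a stand-in for the analytic one, and all helper names are ours:

```python
import numpy as np

def descend(nodes, energy, step=0.5, iters=200, tol=1e-4, h=1e-3):
    # Gradient descent on a scalar energy (e.g., Eq. (4));
    # nodes is a (K, 2) array of mesh node coordinates.
    for _ in range(iters):
        g = np.zeros_like(nodes)
        for idx in np.ndindex(*nodes.shape):
            e = np.zeros_like(nodes)
            e[idx] = h
            g[idx] = (energy(nodes + e) - energy(nodes - e)) / (2.0 * h)
        if np.linalg.norm(g) < tol:
            break
        nodes = nodes - step * g
    return nodes

def coarse_to_fine(energies, coarsest_shape, elem=16):
    # energies[l] maps nodes -> Eq. (4) energy at pyramid level l,
    # coarsest first; constructing these closures from the image
    # pyramid, and the node insertion that keeps elements roughly
    # 16x16, are both elided here.
    ys = np.arange(0.0, coarsest_shape[0] + 1, elem)
    xs = np.arange(0.0, coarsest_shape[1] + 1, elem)
    nodes = np.array([(y, x) for y in ys for x in xs])
    for level, energy in enumerate(energies):
        if level > 0:
            nodes = 2.0 * nodes  # translate node locations to finer image
        nodes = descend(nodes, energy)
    return nodes

# Toy check at one level: a quadratic energy whose minimum is a 3-pixel
# shift of the initial regular mesh.
n0 = np.array([(y, x) for y in (0, 16, 32) for x in (0, 16, 32)], dtype=float)
print(descend(n0.copy(), lambda n: float(np.sum((n - (n0 + 3.0)) ** 2))))
```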
4 Results
We tested the algorithm on a Pentium 4 3.5 GHz machine with 4 GB of RAM. The code was implemented in Matlab with some subroutines written in C. It takes approximately 30 seconds to register two image frames. We illustrate the effectiveness of the proposed algorithm on two microscopic iris video sequences by comparing the root-mean-squared pixel-by-pixel intensity difference (RMSE) between two frames before and after registration. The RMSE values are computed on images after intensity normalization. The first sequence has 51 frames. The average RMSE for this sequence is 0.1365 ± 0.0383 before registration, which reduces to 0.0074 ± 0.0011 after registration.
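The evaluation metric is straightforward to reproduce; a short sketch with illustrative names:

```python
import numpy as np

def rmse(frame_a, frame_b):
    # Root of the mean squared pixel-by-pixel intensity difference,
    # computed on intensity-normalized frames as described above.
    a = np.asarray(frame_a, dtype=float)
    b = np.asarray(frame_b, dtype=float)
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Sequence-level summary (mean +/- std over registered frame pairs):
# values = [rmse(reference, f) for f in registered_frames]
# print(np.mean(values), np.std(values))
```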
Fig. 1. Two image frames from the first video sequence: (a) frame 1; (b) frame 30; (c) the absolute intensity difference between the two frames before registration; (d) the estimated deformation field found by the algorithm; (e) the registration result when image (b) is aligned with image (a); (f) the absolute intensity difference between the two frames after image (b) is aligned with image (a)
Fig. 2. Two image frames from the second video sequence: (a) frame 12; (b) frame 25; (c) the absolute intensity difference between the two frames before registration; (d) the estimated deformation field found by the algorithm; (e) the registration result when image (b) is aligned with image (a); (f) the absolute intensity difference between the two frames after image (b) is aligned with image (a)
The second sequence has 25 frames. The average RMSE for this sequence is 0.0912 ± 0.0193 before registration, which reduces to 0.0165 ± 0.0053 after registration. Visual inspection of the actual video sequences shows significantly improved stability in sequences that were originally severely jittery and deformed. Figures 1 and 2 illustrate the registration result on a pair of image frames from these two sequences. In both figures, panels (a) and (b) are the two frames from the sequence to be registered. Panel (c) is the absolute frame difference before registration. Panel (d) is the deformation field between the two frames, and panel (e) shows the second image registered onto the first. Panel (f) is the absolute frame difference after registration. In both examples, we can see significant improvement in the difference images. The complex nonlinear deformations are effectively captured by the mesh-based model.
5 Conclusions and Discussion
Registration of microscopic iris images is an important step toward leukocyte tracking and characterization, but it is difficult due to the high local nonlinearity of the deformation. We generalize the mesh-based image registration method by formulating it in a probabilistic framework. Such a formulation gives us the flexibility to measure local similarity between images and to encode priors. We define similarity measures that reflect local image intensity properties, and introduce priors that capture the dynamics of image deformation in a video sequence as well as a prior that enforces proper mesh structure and smooth
deformation. The algorithm was implemented using gradient-descent optimization in a hierarchical fashion. The algorithm proves effective and accurate for the registration of microscopic image sequences, providing sufficient information for subsequent cell tracking.
Acknowledgement. This work is partially supported by NIH grants EY013093 and EY06484 and by Research to Prevent Blindness awards to SRP, JTR, and the Casey Eye Institute.
References
1. Rosenbaum, J.T., Planck, S., Martin, T., Crane, I., Xu, H., Forrester, J.: Imaging ocular immune responses by intravital microscopy. Int Rev Immunol 21 (2002) 255–272
2. Halin, C., Rodrigo Mora, J., Sumen, C., von Andrian, U.: In vivo imaging of lymphocyte trafficking. Annu Rev Cell Dev Biol 21 (2005) 581–603
3. Becker, M., Nobiling, R., Planck, S., Rosenbaum, J.: Digital video-imaging of leukocyte migration in the iris: intravital microscopy in a physiological model during the onset of endotoxin-induced uveitis. J Immunol Meth 240 (2000) 23–27
4. Becker, M., Adamus, G., Martin, T., Crespo, S., Planck, S., Offner, H., Rosenbaum, J.: Serial imaging of the immune response in vivo in a T cell-mediated autoimmune disease model. The FASEB Journal 14 (2000) A1118
5. Kawakami, N., Nagerl, U., Odoardi, F., Bonhoeffer, T., Wekerle, H., Flugel, A.: Live imaging of effector cell trafficking and autoantigen recognition within the unfolding autoimmune encephalomyelitis lesion. J Exp Med 201 (2005) 1805–1814
6. Wang, Y., Lee, O.: Active mesh: A feature seeking and tracking image sequence representation scheme. IEEE Trans. Image Processing 3 (1994) 610–624
7. Nosratinia, A.: New kernels for fast mesh-based motion estimation. IEEE Trans. Circuits Syst. Video Technol. 11 (2001) 40–51
8. Nakaya, Y., Harashima, H.: Motion compensation based on spatial transformations. IEEE Trans. Circuits Syst. Video Technol. 4 (1994) 339–356, 366–7
9. Toklu, C., Tekalp, A., Erdem, A., Sezan, M.: 2D mesh-based tracking of deformable objects with occlusion. In: International Conference on Image Processing (1996) 17A2
10. Durbin, R., Szeliski, R., Yuille, A.: An analysis of the elastic net approach to the traveling salesman problem. Neural Computation 1 (1989) 348–358
11. Bajcsy, R., Kovacic, S.: Multiresolution elastic matching. Computer Vision, Graphics, and Image Processing 46 (1989) 1–21
12. Ferrant, M., Warfield, S., Guttmann, C., Mulkern, R., Jolesz, F., Kikinis, R.: 3D image matching using a finite element based elastic deformation model. In: MICCAI (1999) 202–209
13. Brown, L.: A survey of image registration techniques. ACM Computing Surveys 24 (1992) 325–376
Tumor Therapeutic Response and Vessel Tortuosity: Preliminary Report in Metastatic Breast Cancer

Elizabeth Bullitt1, Nancy U. Lin2, Matthew G. Ewend1, Donglin Zeng1, Eric P. Winer2, Lisa A. Carey1, and J. Keith Smith1

1 CASILab, CB # 7062, University of North Carolina, Chapel Hill, NC 27599, USA
{bullitt, ewend, jksmith, carey}@med.unc.edu, [email protected]
http://casilab.med.unc.edu/
2 Dana-Farber/Harvard Cancer Center, Boston, MA 02115, USA
{Nancy_Lin, Eric_Winer}@dfci.harvard.edu
Abstract. No current non-invasive method is capable of assessing the efficacy of brain tumor therapy early during treatment. We outline an approach that evaluates tumor activity via statistical analysis of vessel shape using vessels segmented from MRA. This report is the first to describe the changes in vessel shape that occur during treatment of metastatic brain tumors as assessed by sequential MRA. In this preliminary study of 16 patients undergoing treatment for metastatic breast cancer we conclude that vessel shape may predict tumor response several months in advance of traditional methods.
1 Introduction

Effective monitoring of brain tumor therapy poses a major clinical problem. If a tumor previously sensitive to a drug later becomes resistant, the therapeutic regimen should be changed rapidly. Unfortunately, there is presently no reliable, noninvasive means of monitoring therapeutic efficacy. Biopsy often provides an answer but is too invasive to be performed frequently. A change in the neurological examination is helpful but often occurs late in the clinical course and is an insensitive measure of tumor growth. The current standard of practice is to monitor tumor treatment by performing magnetic resonance (MR) scans at regular intervals and to assess tumor growth by measuring the size of the tumor on gadolinium-enhanced images in one, two, or three dimensions [1,2,3]. Such measurements can then be compared to similar measurements made from a later scan. Regardless of the relative merits of each of these approaches (1D estimates are cruder than 2D, and 2D cruder than 3D), all three assessments give only anatomical information about what the tumor is doing "now", cannot predict what the tumor will do "next", and provide basically similar information [4]. A reliable method of monitoring tumor metabolic or physiologic activity would be of greater value, since it could inform the clinician what the tumor is about to do "next". As a result, many techniques are under development, including positron emission tomography, MR spectroscopy, and perfusion imaging. Reviews are provided by Benard [5] and Law [6]. None of these techniques has yet proven clinically reliable.
This report provides an initial evaluation of a new method of assessing tumor treatment response over time. The approach employs a quantitative, statistical measure of vessel tortuosity using vessels segmented from high-resolution MR angiograms (MRA). Subjects were patients with breast cancer metastatic to brain, enrolled in a multi-center trial evaluating the response of brain metastases to an experimental drug. Tumor status was evaluated for each patient by vessel tortuosity measurements, by three-dimensional calculation of tumor volume, and by clinical criteria, which included uni-dimensional assessment of tumor size on gadolinium-enhanced MR images by RECIST criteria [1]. Our hypothesis was that sequential measures of vessel shape would correctly predict tumor response in advance of both the clinical criteria and the volumetric calculation of tumor size made from MR.
2 Methods

All subjects were enrolled in a five-center drug trial aimed at treating breast cancer metastatic to brain. MR scans of the head were obtained prior to initiation of therapy and every two months thereafter, with additional scans sometimes acquired for clinical reasons. Patients were withdrawn from the drug study for progressive intracranial disease if they met RECIST criteria for tumor enlargement [1]—a one-dimensional assessment of tumor size made from gadolinium-enhanced images. Neither vessel shape measures nor 3D tumor volumetric assessments were used as the basis for clinical decisions. The current imaging study describes results in the 16 patients who, as of January 1, 2006, had undergone at least two good-quality MRA examinations, whose images were available for analysis, and who had been withdrawn from the drug study for progressive intracranial disease. Some of these subjects exhibited an initial response to therapy. Subjects still undergoing treatment and subjects withdrawn from the drug study for progressive extracranial disease were excluded from the current report, as such patients had undefined endpoints for intracranial disease. All patients in the current study received sequential MRA and T1-gadolinium-enhanced images obtained on various 1.5T Siemens, GE, and Philips MR scanners. We required standardized, time-of-flight MRA images from which to perform a statistical analysis of vessel shape. We therefore made cross-institutional requirements for MRA acquisitions: repeated scanning of the same patient on the same machine, coverage of the entire head, and a voxel size of 0.5 x 0.5 x 0.8 mm3. Any MRA that failed to meet these criteria was excluded. The protocols used for T1-gadolinium-enhanced imaging were more variable. We strongly encouraged all institutions to acquire T1 images at 3 mm interslice spacing or less, but did not exclude a patient from analysis if these recommendations were not followed. Vessels were segmented from each MRA using a method that, proceeding from a seed point, automatically defines each vessel as a set of regularly spaced skeleton points with an associated radius at each point [7]. A second, semi-automatic program was then used to define vessel trees and to exclude veins and the external circulation [8]. The segmentation process defined each vessel as an ordered, regularly spaced set of x, y, z skeleton points with an associated radius at each point. Vessel segmentation and tree formation required approximately 30 minutes.
All tumors of 1 cm3 or more were defined from T1-gadolinium-enhanced images using a program that delineates tumors via polygon drawing and filling on orthogonal cuts through an image volume. Tumor volume was automatically calculated as the number of labeled voxels multiplied by the voxel size, with results expressed in cubic cm. For the current study, a meaningful increase (decrease) in tumor volume was defined as a volumetric increase (decrease) from baseline values of at least 20% with a concomitant change in volume of at least 0.5 cm3. We had initially planned to analyze vessel shapes on a regional basis, viewing each defined tumor as an independent entity. So many lesions were present, however, that this approach proved impractical. The entire brain was therefore taken as the region of interest, and results are reported here for each patient's entire intracranial circulation regardless of the number of metastatic tumors, their locations, or their volumes. Although many vessel shape measures were calculated for each subject, the current report analyzes only the "malignancy probability" (MP). The MP equation was derived from an earlier, blinded study of benign and malignant tumors. This earlier study concluded via discriminant analysis of multiple vessel shape parameters that only a combination of two tortuosity metrics appeared effective in generically separating benign from malignant disease [9]. One of these metrics, the "Sum of Angles Metric" (SOAM), sums curvature along a space curve and normalizes by vessel length [10]. The second metric, the "Inflection Count Metric" (ICM), multiplies the total path length of a space curve by the number of inflection points and divides by the distance between endpoints [10]. The SOAM is effective in flagging high-frequency, low-amplitude curves. The ICM is effective in flagging large-amplitude curves that frequently change direction. Each metric describes a different form of tortuosity. The combination of the two metrics quantitatively describes the vessel shape abnormality typically associated with cancer, aptly described by Baish as "many smaller bends upon each larger bend" [11], with SOAM quantifying the small bends and ICM the larger ones. More information about tortuosity metrics is given in [10], and the derivation of the combinatorial equation is described in [9]. The minimum clinically meaningful change in the MP is unknown. For the current study, we defined in advance that evidence of tumor response required a drop in MP of at least 20 from the baseline value. MP calculation requires normalization of each tortuosity value by the means and standard deviations of healthy vessel values via z-scoring [9]. Data from 34 healthy subjects, ranging in age from 18 to 72 and including subjects of both sexes, were used for this normalization.
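A sketch of the two tortuosity metrics, implemented directly from the descriptions above, is given below. The inflection test (a flip in the direction of the local bending vector) is our simplification of the Frenet-frame test of [10], and the ICM multiplies by the raw inflection count exactly as stated in the text:

```python
import numpy as np

def soam(points):
    # Sum of Angles Metric: total bending angle between consecutive
    # skeleton segments, normalized by total path length (flags
    # high-frequency, low-amplitude tortuosity).
    p = np.asarray(points, dtype=float)
    seg = np.diff(p, axis=0)
    length = float(np.sum(np.linalg.norm(seg, axis=1)))
    t = seg / np.linalg.norm(seg, axis=1, keepdims=True)
    cosang = np.clip(np.sum(t[:-1] * t[1:], axis=1), -1.0, 1.0)
    return float(np.sum(np.arccos(cosang))) / length

def icm(points):
    # Inflection Count Metric: path length x inflection count /
    # endpoint distance (flags large-amplitude direction changes).
    p = np.asarray(points, dtype=float)
    seg = np.diff(p, axis=0)
    length = float(np.sum(np.linalg.norm(seg, axis=1)))
    chord = float(np.linalg.norm(p[-1] - p[0]))
    bend = np.diff(seg, axis=0)  # discrete curvature vectors
    flips = int(np.sum(np.sum(bend[:-1] * bend[1:], axis=1) < 0.0))
    return length * flips / chord

# A sinusoidal space curve has several inflections; a near-line has none.
s = np.linspace(0.0, 4.0 * np.pi, 200)
wiggly = np.c_[s, np.sin(s), np.zeros_like(s)]
print(soam(wiggly), icm(wiggly))
```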
3 Results

Tumor presentation as visualized by T1-gadolinium-enhanced images differed from subject to subject. The majority of patients exhibited multiple tiny metastases as well as 1–4 lesions of 1 cm3 or more. Vessel abnormalities were widespread throughout the head and consisted of the "many smaller bends upon each larger bend" [11] typical of cancer-associated vasculature. Figure 1 illustrates these typical vessel shape abnormalities as well as their improvement during successful treatment.
Fig. 1. Improvement of vessel tortuosity abnormalities during successful treatment. A: Axial slice of a T1-GAD baseline scan showing a large tumor (arrow). B: Lateral view of the vessels segmented from the baseline MRA. Rectangle outlines the region magnified in C. C: Tortuosity abnormalities involve both smaller branches and the large frontopolar arteries (arrows). D: The tumor has regressed at month 2 of treatment (arrow). E: At month 2 small, abnormal vessel branches have largely disappeared and there is normalization of the larger vessels’ shapes.
Table 1 provides summary results of the time in months for clinical criteria, volumetric measurements, and vessel shape analysis to indicate tumor progression from the time of the baseline scan. The table is divided into five groups, each indicative of a different set of results. Group 1 consists of 4 patients who exhibited clear-cut evidence of tumor growth by both clinical criteria and volumetric measurement at 2 months. Tumor growth was often explosive, with volumetric increases of as much as 200% and 5 cm3. All patients exhibited a high MP by vessel analysis both at baseline and at month 2. Although vessel analysis was consistent with the other measures of tumor progression, it did not provide added information. By any criterion, this group of patients failed treatment. Group 2 consists of 4 patients who were removed from the drug trial at month 4 or 6, but in whom both volumetric measurement and vessel analysis would have correctly indicated therapeutic failure at month 2. All 4 patients exhibited tumor growth by more than 20% in volume and 0.5 cubic centimeters at month 2, with continued growth thereafter, and none exhibited improvement in MP values at any time point. In these 4 cases volumetrics and vessel analysis performed equally well, and both performed better than clinical RECIST criteria [1] in flagging tumor progression. Group 3 is an important group of 5 patients who demonstrated stable disease or tumor remission volumetrically at month 2, but in whom vessel analysis indicated continued high malignancy at month 2. Any volumetric improvement was transient in these subjects. All patients exhibited tumor growth over baseline volumetrically at month 4, with monotonic increase in tumor volume thereafter. In these 5 cases, vessel analysis correctly predicted tumor activity 2 to 4 months in advance of all methods based upon gadolinium-enhanced imaging. Figure 2 (left) provides an example. Group 4 consists of one patient who had multiple tiny lesions, all too small to analyze meaningfully either volumetrically or by RECIST [1] criteria. She was withdrawn from the drug study at month 4 for clinical worsening; vessel analysis correctly indicated high ongoing tumor activity at month 2. Group 5 contains the only two subjects who showed significant improvement in vessel malignancy probability at month 2. It is of note that these two subjects were also the only ones to show sustained improvement or stability in tumor volume. Subject 003 demonstrated progressive, dramatic reduction in lesion volume until month 4,
Table 1. Time in months at which tumor progression was noted by clinical criteria (CLIN), volumetrics (VOL), and vessel malignancy probability (MP). Subjects are classified into 5 response groups (column 1), with the patient identification number given in column 2. The final column provides comments, with "CM" indicating carcinomatous meningitis and "Volume smaller?" indicating a volumetric decrease insufficient to meet formal criteria for volumetric reduction.

Group  Study ID  CLIN  VOL  MP  Comment
1      007       2     2    2   Failed by all criteria
1      009       2     2    2   Failed by all criteria
1      016       2     2    2   Failed by all criteria
1      022       2     2    2   Failed by all criteria
2      006       4     2    2   Monotonic 3D tumor growth
2      010       6     2    2   Monotonic 3D tumor growth
2      026       4     2    2   Monotonic 3D tumor growth
2      029       4     2    2   Monotonic 3D tumor growth
3      008       4     4    2   Slow, monotonic tumor growth
3      012       7     6    2   Volume smaller? month 2 only
3      015       6     4    2   Volume smaller month 2 only
3      023       4     4    2   Volume smaller? month 2 only
3      027       3     3    2   Volume smaller? month 2 only
4      018       4     >4   2   Only tiny lesions
5      003       11    >10  4   Volume smaller; new CM
5      013       6     >6   6   Lesion stable after month 2
Fig. 2. Two examples of tumor volume reduction in which vessel MP values correctly presaged tumor activity. In both graphs, the x axis gives time in months (time 0 is the baseline scan prior to treatment). The y axis gives a percentage (0-100%) representing both the malignancy probability (black bar) and the tumor percent volume (grey bar) in which the numerator is the tumor volume and the denominator the maximum tumor volume recorded for that patient; the tumor volume is 100% at the time point it was largest during the study. Left: Case 015. The tumor regressed significantly at month 2 but subsequently enlarged rapidly; vessel MP correctly flagged the tumor as of high (100%) MP at month 2, presaging worsening during the next cycle. Right: Case 003. The tumor volume progressively regressed between months 0 and 4 with vessel MP indicating a drop in activity from 100% to 64% at month 2, correctly presaging future improvement during the next 2-month cycle. MP returned to 100% at month 4 at which time the tumor stopped regressing; the subject also developed carcinomatous meningitis. No MRA data were available after month 4.
with subsequent stability of lesion volume through the time she was withdrawn from the study at month 11 for carcinomatous meningitis. Her vessel malignancy probability improved significantly at month 2 (Figures 1, 2), correctly presaging ongoing
improvement, but reverted to her baseline 100% level at month 4. It is unknown at which time point she began to develop microscopic leptomeningeal metastases. Neither volumetric nor RECIST measurements were able to delineate her impending treatment failure since these measurements were based upon the size of her solid tumor (Figure 3) rather than upon overall metabolic/physiologic activity.
Fig. 3. Baseline T1 gadolinium-enhanced scan (left) and at 10 months (right) in a patient who exhibited an 88% reduction in lesion volume but developed carcinomatous meningitis
Case 013 is more difficult to interpret because she is missing a critical time point. Her tumor showed a volumetric increase between baseline and month 2 but subsequent volumetric stability through month 6, at which time she was (perhaps incorrectly) withdrawn from the drug study for meeting RECIST criteria. Her vessel MP had dropped to lower levels at months 2 and 4 but reverted to her baseline level at month 6. We believe that her tumor would have exhibited significant growth at month 8 on the basis of her vessel MP values, but image data from that time point are not available.
4 Discussion

Early assessment of tumor treatment response would be of high benefit to clinical therapeutic management. Current methods of evaluating drug therapy are inaccurate, delay diagnosis, and may even reach the wrong conclusion [16; Fig 3]. Commonly employed RECIST criteria involve estimating a change in a three-dimensional tumor volume using a single, noisy, one-dimensional measurement, and even volumetric determination provides only anatomical information based upon a single imaging modality. Fundamentally, any method based solely upon analysis of gadolinium-enhanced images can only compare what the tumor size appears to be "now" to its size in the past, with additional confounding introduced when successful treatment results in tumor necrosis, which can also induce gadolinium enhancement. The use of sequential vessel shape analyses to measure cancer treatment response offers several theoretical advantages. First, abnormal vessel tortuosity provides an early flag of cancer activity. In experimental animal models, abnormal vessel tortuosity appears within 24 hours of subcutaneous injection of cancer cells [12]. Indeed, this abnormality appears when the tumor burden is only 60–80 cells and occurs earlier than neoangiogenic sprouting [12]! Second, the typical vessel shape abnormality of "many smaller bends upon each larger bend" [11] appears across anatomical regions,
across cancer types, and even across species. The same malignancy probability equation can thus be used to flag cancer activity in different locations [13]. Third, vessel shape abnormalities extend far beyond tumor confines and involve initially healthy vessels coursing in the tumor vicinity [9, 12], making it feasible to perceive such vessels using high-resolution MRA even though MRA cannot depict capillaries. Fourth, the vessel shape abnormalities associated with active cancer appear to be independent of tissue vascularity as measured by perfusion imaging [14]. This is a significant strength when assessing hypovascular malignant or hypervascular benign tumors [9]. Finally and importantly, histological analyses in animal models have described rapid normalization of vessel shape abnormalities during successful treatment [12, 15]. For a variety of reasons, the quantitative assessment of vessel shape thus appears to be an attractive approach for monitoring tumor treatment response. The current study is the first to describe vessel shape changes in a set of human cancer patients imaged serially by MR during treatment. In these patients, vessel shape analyses provided earlier information about tumor progression than clinical RECIST criteria in 11/16 cases and earlier than volumetric assessment in 7/16 cases. In these subsets of patients, tumor progression could have been correctly identified 2–6 months earlier by vessel analysis than by the standard methods used to assess tumor growth. In the remaining cases, vessel analysis performed equally well. The ability of vessel analysis to correctly predict tumor regression in response to treatment is more difficult to define from this study. Only two subjects exhibited long-term volumetric stability or a decrease in lesion size. It is important to note that the only two subjects to exhibit volumetric tumor suppression for 4 months or more were also the only two in which the vessel malignancy probability exhibited a reduction at month 2. However, meaningful statistical analysis is impossible with this small number of responders. The current study should therefore be interpreted as providing interesting preliminary results only. Definitive conclusions will require a much larger number of subjects responsive to therapy. An important question is how long it takes vessel shapes to normalize during successful treatment and to revert to malignant patterns during tumor recurrence. Although histological studies in animals indicate that such pattern changes occur rapidly [12, 15], MR cannot discriminate the tiny vessels likely to be affected earliest. The time required for tortuosity changes to develop in the larger vessels delineable from MR is unknown. On the basis of the current study, which examines a particular tumor type, a particular form of therapy, a small number of patients, and MR scans performed at 2-month intervals, it appears that vessel shape changes indicative of response occur within 2 months and that vessel shape changes can predict what a tumor is about to do during at least the next 2-month period. Shorter scanning intervals might allow even more rapid determination of future tumor behavior.
Acknowledgments. Supported by R01EB000219 (NIBIB), P50CA58185-AV-55P2 (Avon), P30CA1608629S1 (NCI), P30CA58223 (NCI), and M01RR00046 (NIH).
References
1. Therasse P, Arbuck SG, Eisenhauer EA, et al: New guidelines to evaluate the response to treatment in solid tumors: European Organization for Research and Treatment of Cancer, National Cancer Institute of the United States, National Cancer Institute of Canada. J Natl Cancer Inst 92 (2000) 205–216
2. Macdonald DR, Cascino TL, Schold SC Jr, Cairncross JG: Response criteria for phase II studies of supratentorial malignant glioma. J Clin Oncol 8 (1990) 1277–1280
3. Dempsey MF, Condon BR, Hadley DM: Measurement of tumor "size" in recurrent malignant glioma: 1D, 2D, or 3D? AJNR 26 (2005) 770–776
4. Shah GD, Kesari S, Xu R, Batchelor TT, et al: Comparison of linear and volumetric criteria in assessing tumor response in adult high-grade gliomas. Neuro-Oncology 2005. Available at http://puck.ingentaconnect.com/vl=6507458/cl=14/nw=1/rpsv/cw/www/dup/15228517/previews/contp1-1.htm
5. Benard F, Romsa J, Hustinx R: Imaging gliomas with positron emission tomography and single-photon emission computed tomography. Seminars in Nuclear Medicine 33 (2003) 148–162
6. Law M, Yang S, Wang H, Babb JS, Johnson G, Cha S, Knopp EA, Zagzag D: Glioma grading: Sensitivity, specificity, and predictive values of perfusion MR imaging and proton MR spectroscopic imaging compared with conventional MR imaging. AJNR 24 (2003) 1989–1998
7. Aylward, S.R., Bullitt, E.: Initialization, noise, singularities and scale in height ridge traversal for tubular object centerline extraction. IEEE-TMI 21 (2002) 61–75
8. Bullitt, E., Aylward, S., Smith, K., Mukherji, S., Jiroutek, M., Muller, K.: Symbolic description of intracerebral vessels segmented from MRA and evaluation by comparison with X-ray angiograms. Medical Image Analysis 5 (2001) 157–169
9. Bullitt E, Zeng D, Gerig G, Aylward S, Joshi S, Smith JK, Lin W, Ewend MG: Vessel tortuosity and brain tumor malignancy: A blinded study. Academic Radiology 12 (2005) 1232–1240
10. Bullitt E, Gerig G, Pizer S, Aylward SR: Measuring tortuosity of the intracerebral vasculature from MRA images. IEEE-TMI 22 (2003) 1163–1171
11. Baish JS, Jain RK: Fractals and cancer. Cancer Research 60 (2000) 3683–3688
12. Li CH, Shan S, Huang Q, Braun R, Lanzen J, Hu K, Lin P, Dewhirst M: Initial stages of tumor cell-induced angiogenesis: evaluation via skin window chambers in rodent models. J Natl Cancer Inst 92 (2000) 143–147
13. Bullitt E, Wolthusen A, Brubaker L, Lin W, Zeng D, Van Dyke T: Malignancy-associated vessel tortuosity: A computer-assisted, MRA study of choroid plexus carcinoma in genetically engineered mice. AJNR 27 (2006) 612–619
14. Parikh A, Smith JK, Ewend MG, Bullitt E: Correlation of MR perfusion imaging and vessel tortuosity parameters in assessment of intracranial neoplasms. Technology in Cancer Research and Treatment 3 (2004) 585–590
15. Jain RK: Normalizing tumor vasculature with anti-angiogenic therapy: a new paradigm for combination therapy. Nature Medicine 7 (2001) 987–989
Harvesting the Thermal Cardiac Pulse Signal

Nanfei Sun1, Ioannis Pavlidis1,*, Marc Garbey2, and Jin Fei1

1 Computational Physiology Lab, 2 Department of Computer Science
University of Houston, Houston, TX, USA
{nsun, jinfei, garbey}@cs.uh.edu, [email protected]
Abstract. In the present paper, we propose a new pulse measurement methodology based on thermal imaging (contact-free). The method capitalizes on both the thermal undulation produced by the traveling pulse and the periodic expansion of the compliant vessel wall. The paper reports experiments on 34 subjects, comparing the performance of the new pulse measurement method to the one we reported previously. The measurements were ground-truthed through a piezo-electric sensor. Statistical analysis reveals that the new imaging methodology is more accurate and robust than the previous one. Its performance becomes nearly perfect when the vessel is not obstructed by a thick fat deposit.
1 Introduction
The research described in this paper aims to robustly recover the pulse frequency in a contact-free manner. This effort is part of a general framework that we established for measuring multiple vital signs. The hallmark of the framework is that all measurements are performed at a distance and under a single sensing regime (thermal imaging). So far we have demonstrated that it is possible to perform measurements of pulse [1], breathing rate [2, 3], superficial vessel blood flow [4], and perfusion [5] at a distance. The technology is very appealing in the context of psycho-physiology, where outfitting the subject with contact sensors is not recommended. In psychological experiments it is very important for the subject to feel as free as possible, or a variable may be introduced into his psychological state. As the technology for measuring vital signs at a distance matures, it may find biomedical applications beyond psycho-physiology, wherever sustained physiological measurements are of interest. Cardiac pulse is an important vital sign that reflects the health status of the subject's cardiovascular system. It is also indicative of the metabolic rate and stress level of the subject. In [1] we introduced a thermal imaging method to measure pulse. That paper established the feasibility of measuring pulse at a distance using passive sensing. In the present manuscript we report substantial improvements that take the initial method from the realm of feasibility to the realm of applicability.
* Corresponding author.
In the rest of the paper we unveil our new imaging methodology for measuring pulse. Specifically, in section 2, we describe the method itself. In section 3, we report and discuss its experimental validation.
2 Pulse Measurement Methodology

2.1 Cross-Section Temperature Function
Within the rectangular region of interest, the operator draws a line that traverses the cross-section of the thermal imprint of the vessel (e.g., the carotid). The line has to bleed over into the surrounding tissue (see Fig. 1). By applying our measurement methodology on this line over time, we can capture the thermal undulation caused by pulsative vessel blood flow. For the typical experimental configuration we use (subject at 6 ft and camera outfitted with a 50 mm lens), the cross-section of a major vessel, such as the carotid, spans between
Fig. 1. Carotid region of interest and cross-sectional measurement line
Fig. 2. Temperature profile across the vessel thermal imprint
5–10 pixels. We increase the low spatial resolution of the measurement line by applying quadratic interpolation [6]. The points of the interpolated measurement line correspond to temperatures and form the cross-section temperature function $g_t(x)$, $x \in \{-N,\ldots,N\}$, at frame $t$. We model the cross-section temperature function using the first five cosine functions of the Fourier series [7]. The modeling yields a smoother curve $h_t(x)$, $x \in \{-N,\ldots,N\}$, at frame $t$ (see Fig. 2).
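A sketch of this modeling step is shown below; the quadratic up-sampling of the measurement line is omitted, and the least-squares fit of the first five cosine basis functions is our reading of the text (names are illustrative):

```python
import numpy as np

def smooth_profile(g, n_terms=5):
    # Fit the cross-section temperature function g(x), x in {-N..N},
    # with the first n_terms cosine functions of a Fourier series;
    # returns the smoother curve h_t(x).
    g = np.asarray(g, dtype=float)
    x = np.linspace(0.0, np.pi, g.size)  # map [-N, N] onto [0, pi]
    basis = np.stack([np.cos(k * x) for k in range(n_terms)], axis=1)
    coef, *_ = np.linalg.lstsq(basis, g, rcond=None)
    return basis @ coef

# Noisy vessel imprint: a thermal bump plus sensor noise.
rng = np.random.default_rng(1)
xs = np.linspace(-1.0, 1.0, 41)
g_t = np.exp(-8.0 * xs ** 2) + rng.normal(0.0, 0.05, xs.size)
h_t = smooth_profile(g_t)
```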
2.2 Ridge and Boundary Temperature Functions
We use the facial tissue tracking algorithm we reported in [8] to compensate for natural head motion during measurement. This is not sufficient, however, since there is fine vessel displacement not related to volitional head movement. This displacement is due to minute motor motion, diaphragm-induced motion, and the elastic response of the vessel to systolic and diastolic pressure. The vessel displacement moves the point of the maximum temperature reading along the measurement line. We call this point the ridge point; it corresponds to the middle of the vessel's cross-section, where the blood flow speed is maximal. At frame $t$, the ridge point is defined as:

$$r_t = \arg\max_x \{h_t(x)\}, \quad x \in \{-N,\ldots,N\}, \quad (1)$$
where $h_t(x)$ is the smoothed cross-section temperature function. The time evolution of the ridge point forms the ridge line, which is indicative of the vessel's displacement. The ridge line maps to the ridge temperature function $l_r(x,t)$, $x \in \{-N,\ldots,N\}$, $t \in \{1,\ldots,T\}$, which is an accurate record of the evolution of the vessel's maximum temperature (strong signal). The vessel's minimum temperature is recorded at the vessel's boundary, where the blood flow speed is minimal. At each frame $t$, we select, on either side of the measurement line, the boundary point $b_t$ to be:

$$b_t = \arg\max_x \{|h'_t(x)| + |h''_t(x)|\}, \quad x \in \{0,\ldots,N\}, \quad (2)$$
where $h_t(x)$ is the smoothed cross-section temperature function. The time evolution of the boundary point forms the boundary line. The boundary line does not exactly mirror the displacement of the ridge line. The reason is that the vessel is compliant and its volume changes with respect to pressure. Therefore, the vessel expands during diastole and contracts during systole, superimposing a boundary deformation on the general vessel displacement. The boundary line maps to the boundary temperature function $l_b(x,t)$, $x \in \{0,\ldots,N\}$, $t \in \{1,\ldots,T\}$, which is an accurate record of the evolution of the vessel's minimum temperature. This function carries valuable pulse information associated with the periodic expansion of the vessel's wall. Fig. 3 depicts the ridge and boundary lines and the corresponding temperature functions for a measurement applied on the carotid of a subject. The measurement lasted for T = 1,000 frames.
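Putting Eqs. (1) and (2) together, a per-frame extraction sketch might look as follows; since Eq. (2) is reconstructed from a garbled original, the derivative-based boundary criterion (and the one-sided search to the right of the ridge) should be read as our assumption, with finite differences standing in for the derivatives:

```python
import numpy as np

def ridge_and_boundary(h):
    # Ridge point (Eq. (1)): maximum of the smoothed profile, i.e.,
    # the middle of the vessel's thermal imprint.
    h = np.asarray(h, dtype=float)
    r = int(np.argmax(h))
    # Boundary point (Eq. (2), assumed form): strongest combination of
    # first- and second-derivative magnitudes on one side of the line.
    d1 = np.gradient(h)
    d2 = np.gradient(d1)
    score = np.abs(d1) + np.abs(d2)
    b = r + 1 + int(np.argmax(score[r + 1:]))
    return r, b

# Ridge/boundary temperature functions over T frames:
# l_r = [h[ridge_and_boundary(h)[0]] for h in smoothed_frames]
# l_b = [h[ridge_and_boundary(h)[1]] for h in smoothed_frames]
```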
Fig. 3. Ridge and boundary lines along with the corresponding temperature functions
2.3 Computation of the Mean Pulse Frequency
Initially, we compute the mean pulse over an extended period of time $T$ ($T \geq 30$ sec). In long observation periods the pulse frequency is expected to dominate in the spectral domain, since it is more consistent than white noise. This reliable mean value estimate can therefore be used in a feedback loop to narrow down the search space in a second-pass instantaneous pulse computation. The computation is based on Fourier analysis and takes into account both the ridge and boundary temperature functions $l_r(x,t)$ and $l_b(x,t)$, respectively. We show the development for the ridge temperature function $l_r(x,t)$ only; exactly the same applies to the boundary function $l_b(x,t)$. Specifically:

1. We use a low-order trigonometric polynomial to prepare the function $l_r(x,t)$ for the Fast Fourier Transform (FFT):

$$L_r(x,t) = l_r(x,t) - (\alpha\cos(t) + \beta), \quad (3)$$

where $\alpha = \frac{1}{2}(L_r(x,0) - L_r(x,T-1))$ and $\beta = \frac{1}{2}(L_r(x,0) + L_r(x,T-1))$. This ensures that the shift will not affect the stability of the scheme by minimizing the Gibbs phenomenon.

2. We extend $L_r(x,t)$ to a $2T$-periodic function as follows: we apply a symmetry function (Eq. (4)) and then a periodic extension (Eq. (5)):

$$\forall t \in (0,T), \quad L_r(x,T-t) = -L_r(x,t) \quad (4)$$

$$\forall t \in (0,2T),\ \forall k \in \mathbb{Z}, \quad L_r(x,t+k2T) = L_r(x,t) \quad (5)$$

3. We apply a classic decimation-in-time (Cooley–Tukey) 1D base-2 FFT method [6] to obtain the power spectrum $P_r(f)$ of the function $L_r(x,t)$:

$$P_r(f) = \mathcal{F}(L_r(x,t)). \quad (6)$$
4. We model the power spectrum $P_r(f)$ as a multi-Normal distribution $\tilde{P}_r(f)$ by applying a Parzen window method [9]:

$$\tilde{P}_r(f) = \frac{1}{F}\sum_{i=1}^{F} W(f - f_i), \quad (7)$$

where $W(f)$ is the Parzen window Normal kernel:

$$W(x) = \frac{1}{\sigma_p\sqrt{2\pi}}\, e^{-\frac{(x-\mu_p)^2}{2\sigma_p^2}}. \quad (8)$$

We take $\mu_p = 0$ and $\sigma_p^2 = 0.1$. The normalized mean frequency variance of the pulse for the subjects in our data set is $\sigma_n^2 = 0.1$, as computed from the ground-truth measurements. Therefore, our choice of $\sigma_p^2 = 0.1$ for the variance of the Parzen window kernel is relevant. Once we compute the model spectra $\tilde{P}_r(f)$ and $\tilde{P}_b(f)$ of the ridge and boundary temperature functions respectively, we multiply them to obtain the combined model spectrum $\tilde{P}_{rb}(f)$ (see Fig. 4). Then, we find the frequency $f_n$ for which the model spectrum $\tilde{P}_{rb}$ assumes its maximum amplitude. We consider $f_n$ the mean pulse frequency of the subject during the extended time period $T$. In fact, we represent this mean pulse frequency as a Normal distribution with mean $\mu_l = f_n$ and variance $\sigma_l^2 = \sigma_n^2 = 0.1$.
Fig. 4. All graphs are normalized: (a) Raw ridge and boundary spectra. (b) Multi-normal models after the application of the Parzen window. (c) Combined multi-normal model.
2.4 Computation of the Instantaneous Pulse Frequency
The time window for the pulse computation may vary from the period required for a single heartbeat (lower limit) to the long observation period we use for the mean pulse computation (upper limit). The typical period required for the completion of a heartbeat is ∼1 sec, although this may vary depending on the physical condition of the subject. Our imaging system operates with an average
speed of 30 fps, so 1 sec equates to ∼30 frames. We select the time window for the instantaneous pulse computation to be 512 frames (∼14 sec). The chosen number is a power of 2 and facilitates the FFT computation. It is also a reasonable compromise between very long ($T \geq 30$ sec) and very short ($T \sim 1$ sec) observation periods. In order to compute the combined model spectrum $\tilde{P}_{rb-i}(f)$ for the short observation period $T_i$, we apply exactly the same procedure that we described in Section 2.3 for long observation periods. Then, we filter $\tilde{P}_{rb-i}(f)$ by multiplying it with the normal distribution $N(\mu_l, \sigma_l^2)$ of the mean pulse:

$$\tilde{P}'_{rb-i}(f) = \tilde{P}_{rb-i}(f) \cdot N(\mu_l, \sigma_l^2). \quad (9)$$
In essence, we use the mean pulse frequency to localize our attention in the instantaneous pulse frequency spectrum. We then compute the frequency $f_i$ for which the amplitude of the spectrum $\tilde{P}'_{rb-i}(f)$ is maximum. This is the tentative instantaneous pulse frequency.
2.5 Post-processing
The instantaneous pulse frequency computation described in Section 2.4 may occasionally be affected by noise despite the defensive mechanisms built into the methodology. To address this problem we use an estimation function that takes into account the current measurement as well as a series of past measurements. This way, abrupt isolated measurements are smoothed over by the natural continuity constraint. The instantaneous pulse frequency computation is performed over the previous $T_i$ frames ($T_i = 512$). We convolve the current power spectrum $P_c = \tilde{P}'_{rb-i}$ with a weighted average of the power spectra computed during the previous $M$ time steps (see Fig. 5). We chose $M = 60$, since at the average speed of 30 fps sustained by our system, there is at least one full pulse cycle contained within 60 frames even in extreme physiological scenarios. Therefore, the historical contribution of our estimation function remains meaningful at all times. Specifically, the historical frequency response at a particular frequency $f$ is given as the summation of the corresponding frequency responses over the $M$ spectra, normalized by the total sum of all frequency responses over all $M$ historical spectra:

$$H(f) = \frac{\sum_{c=1}^{M} P_c(f)}{\sum_{c=1}^{M}\sum_{j=1}^{F} P_c(j)} \quad (10)$$
Finally, we convolve the historical power spectrum $H$ with the current power spectrum to filter out transient features. We then designate as the pulse the frequency $f_{pulse}$ that corresponds to the highest energy value of the filtered spectrum within the operational frequency band (see Fig. 5).
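A sketch of the estimation function is given below; Eq. (10) is implemented as reconstructed, while the fusion of the historical and current spectra is reduced to an element-wise product, our stand-in for the convolution described in the text:

```python
import numpy as np

def historical_spectrum(past_spectra):
    # Eq. (10): summed historical frequency response, normalized by
    # the total response over all M stored spectra.
    P = np.asarray(past_spectra, dtype=float)  # shape (M, F)
    return P.sum(axis=0) / P.sum()

def pulse_from_history(current, past_spectra, freqs, band=(0.5, 3.0)):
    # Fuse the current spectrum with the historical one and pick the
    # peak inside the operational frequency band (band is assumed).
    freqs = np.asarray(freqs, dtype=float)
    fused = np.asarray(current, dtype=float) * historical_spectrum(past_spectra)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return freqs[mask][np.argmax(fused[mask])]
```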
Fig. 5. Estimation Function
3 Experimentation and Discussion
We used a high-quality thermal imaging (TI) system for data collection. The centerpiece of the TI system is a Phoenix InSb 640 × 480 Mid-Wave Infrared (MWIR) camera [10]. We recorded 34 thermal clips of the faces of 34 subjects while they rested in an armchair. Concomitantly, we recorded ground-truth pulse signals with the ML750 PowerLab/4SP [11] data acquisition system, equipped with a piezo-electric sensor. The data set features subjects of both genders (24 males vs. 10 females), different races, and varying physical characteristics. All imaging measurements were performed on a major facial vessel, that is, the carotid, temporal, or supra-orbital. We evaluated the performance of the method for the mean pulse computation by calculating the accuracy against the mean ground-truth measurements. We evaluated the performance of the method for the instantaneous pulse computation by calculating the cumulative sums (CuSum) between the instantaneous imaging measurements and their corresponding ground-truth values. The overall accuracy of the mean pulse measurement using the new method has improved to 92.1%, compared to the previous method's [1] 88.5% performance. The new method dramatically improved the accuracy for the 21 subjects that have clear thermal vessel imprints (from 88.5% to 96.9%). These are typically lean subjects in whom the vessel is not obstructed by a thick fat deposit. The strong performance is due to accurate localization of the boundary signal, which weighs heavily in the current method. Further improvement in the quantification of the boundary signal under difficult conditions is the focus of our ongoing efforts. The overall CuSum error is only 7.8%, which indicates strong performance in instantaneous pulse measurements.
Acknowledgment. This material is based upon work supported by the National Science Foundation under Grant No. 0414754, entitled "Interacting with Human Physiology." Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
References
[1] Sun, N., Garbey, M., Merla, A., Pavlidis, I.: Imaging the cardiovascular pulse. In: Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2, San Diego, California, USA (2005) 416–21
[2] Murthy, R., Pavlidis, I., Tsiamyrtzis, P.: Touchless monitoring of breathing function. In: Proceedings of the 26th Annual International Conference of the IEEE Engineering in Medicine and Biology, Volume 2, San Francisco, California (2004) 1196–9
[3] Fei, J., Zhu, Z., Pavlidis, I.: Imaging breathing rate in the CO2 absorption band. In: Proceedings of the 27th Annual International Conference of the IEEE Engineering in Medicine and Biology, Shanghai, China (2005) 700–5
[4] Garbey, M., Merla, A., Pavlidis, I.: Estimation of blood flow speed and vessel location from thermal video. In: Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 1, Washington D.C. (2004) 356–63
[5] Pavlidis, I., Levine, J.: Monitoring of periorbital blood flow rate through thermal image analysis and its application to polygraph testing. In: Proceedings of the 23rd Annual International Conference of the IEEE Engineering in Medicine and Biology, Volume 3, Istanbul, Turkey (2001) 2826–9
[6] Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P.: Chapter 12 in: Numerical Recipes in C, 2nd edn. Cambridge University Press, New York, New York (1992) 504–21
[7] Arfken, G.: Fourier Series. Chapter 14 in: Mathematical Methods for Physicists, 3rd edn. Academic Press, Orlando, Florida (1985) 760–93
[8] Tsiamyrtzis, P., Dowdall, J., Shastri, D., Pavlidis, I., Frank, M., Ekman, P.: Lie detection – recovery of the periorbital signal through tandem tracking and noise suppression in thermal facial video. In: Carapezza, E.M. (ed.): Proceedings of SPIE Sensors, and Command, Control, Communications, and Intelligence (C3I) Technologies for Homeland Security and Homeland Defense IV, Volume 5778, Orlando, Florida (2005)
[9] Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. Wiley-Interscience, New York, New York (2001)
[10] Indigo Systems Inc., 70 Castilian Dr., Goleta, California 93117-3027. http://www.indigosystems.com
[11] ADInstruments Pty Ltd, Unit 6, 4 Gladstone Rd, Castle Hill, NSW 2154, Australia: PowerLab ADInstruments Owner's Manual (2004)
On Mobility Analysis of Functional Sites from Time Lapse Microscopic Image Sequences of Living Cell Nucleus

Lopamudra Mukherjee1, Vikas Singh1, Jinhui Xu1, Kishore S. Malyavantham2, and Ronald Berezney2

1 Department of Computer Science and Engineering, State University of New York at Buffalo
{lm37, vsingh, jinhui}@cse.buffalo.edu
2 Department of Biological Sciences, State University of New York at Buffalo
{ksm4, berezney}@buffalo.edu
Abstract. Recent research in biology has indicated correlations between the movement patterns of functional sites (such as replication sites in DNA) and zones of genetic activity within a nucleus. A detailed study and analysis of the motion dynamics of these sites can reveal interesting insights into their role in DNA replication and function. In this paper, we propose a suite of novel techniques to determine, analyze, and interpret the mobility patterns of functional sites. Our algorithms are based on ideas from theoretical computer science and database theory and provide, for the first time, tools to interpret the seemingly stochastic motion patterns of the functional sites within the nucleus in terms of a set of tractable "patterns" which can then be analyzed to understand their biological significance.
1 Introduction

DNA replication is the process of copying a DNA strand in a cell prior to division and is among the most important processes inside the mammalian cell nucleus. These replication (or synthesis) processes occur at so-called replication sites (RS, for short). The study of the basic properties of these sites is fascinating not only because it helps interpret the fundamental functions of a cell but also because it may lead to understanding variations between a cell's healthy and diseased states (such as cancer). Recent developments in microscopic imaging techniques [1,2] have therefore focused on imaging RS in action in order to obtain a better understanding of their dynamics. The last few years in particular have seen several notable advances in live nuclear microscopic imaging and the development of associated software modules for image processing and analysis. This has been complemented by simultaneous developments in the usage of fluorescent proteins for staining these sites before imaging for better identification and visualization. These advances have collectively provided biologists for the first time a view of the spatial location, movement, and other behavioral aspects of RS within living cells in real time.
This research was supported by NSF award CCF-0546509 and NIH grant GM 072131-23.
From a biologist's perspective, RS exhibit an extremely interesting characteristic in that they are constantly in motion. This interest stems partly from observations reported in recent literature [3] which show a significantly faster movement of the sites encompassing the euchromatin DNA compared to those around the heterochromatin DNA. Because it is well known that euchromatin is composed mostly of actively expressing genes whereas heterochromatin is largely inactive, the observed motion behavior strongly suggests a deeper connection between the sites' mobility patterns and the process of gene expression. These relationships can only be explored if the motion dynamics of these sites are well understood. The focus of our work is the design of algorithms that enable us to determine, study, and interpret these mobility patterns. A closer analysis of the problem reveals several potential complications. For instance, a visual evaluation of the data suggests that the sites undergo an almost random motion between consecutive image frames. A temporal tracking algorithm adds little extra information towards any discernible patterns of movement except that it yields the sites' individual motion magnitudes from one image frame to the next. Moreover, details about the kinds of motion dynamics these sites undergo during various cell phases are not yet known. This makes the mathematical formulation of the problem challenging because we know neither the behavior of an ideal system (such as human vision/cognition) on this input nor the kind of patterns we would like to determine. Part of the reason that human vision also fails to extract any useful output from such data is that any intrinsic motion patterns are apparently irrecoverable from the overwhelming global to-and-fro motion the sites exhibit. Given these difficulties, our objective is to search for some "method in madness" in an effort to represent the seemingly stochastic motion as a combination of a set of tractable functions. In this paper, we propose novel techniques to address the problems discussed above in an effort to study the mobility properties of the RS. We approach this goal with a two-fold objective. The first question we try to answer is whether, given a set of points1 apparently undergoing haphazard motion, there are small subsets of points that show a similar degree of motion. To answer this question, we propose a technique based on high-dimensional point-set clustering for grouping sites based on their mobility properties. We then illustrate how we employ this idea to identify spatial patterns by splitting the nucleus into various almost-independent mobility zones. Our second, independent goal is to identify sets of points that seem to be moving together, as if linked by a slightly deformable chain (sub-structure identification). Observe that the true motion pattern, say P, that the sites undergo is unknown. Nonetheless, it is expected that if a group of points is indeed moving rigidly as a substructure, their mobility (high or low) should be the same for each point in that group. Therefore, to evaluate our algorithms we compare the results obtained using our two techniques and observe that there is a high degree of territorial overlap between the rigid point subsets generated in the second step and the mobility zones determined from the first; further, the patterns determined are consistent with expert opinion.
1 In the remainder of this paper, we will use the terms sites (RS) and points interchangeably depending on the biological or geometric context of the discussion.
2 Method

2.1 Temporal Tracking of Replication Sites (RS)

The displacement (motion magnitude) of RS between consecutive image frames in the sequence can broadly be classified into two separate types of movement. The first, and arguably the more interesting, type of movement is due to the individual motion of the sites. The second type of movement can be attributed to a nominal displacement of the complete nucleus from one image frame to the next. Before attempting to address (and determine) the patterns of movement of individual sites, we employ a simple preprocessing step to correct for the global displacement (an isometric transformation owing to the second type of movement) that the nucleus undergoes. This procedure also provides temporal tracking information that yields the point-to-point correspondences for a pair of images. While such a process is by no means sufficient for our ultimate goal of motion analysis, because it yields no information about the patterns of movement, it nonetheless serves as a useful first step. For this purpose, we employ a simple technique proposed by Cho and Mount [4] to calculate an alignment between the pair of point sets representing RS in consecutive image frames. While the theoretical performance analysis of the algorithm guarantees an approximation ratio of 3 (the alignment will be no worse than three times the unknown optimal alignment under the Hausdorff distance measure), we observe that such an analysis is quite conservative. In practice, the technique performs favorably by calculating a transformation (rotation matrix $R$ and translation vector $t$) that aligns the two point sets quite well. Once this alignment has been determined, we calculate point-to-point correspondences using a combination of bipartite matching and rank-maximal matching, coupled with certain assumptions about a neighborhood of motion. The size of the neighborhood is chosen based on the temporal resolution of the image sequence (usually about 2 seconds) and the maximal to-and-fro motion a site can be expected to undergo within this time period. The results are then manually verified to confirm that such a procedure returns accurate correspondences. In general, we obtain an accuracy of about 90%, which is comparable to those reported in recent literature [5].

2.2 Determining Mobility Zones

To investigate the mobility properties of the sites, our main idea is to zone (or cluster) the sites based on their motion magnitude and then use this information to spatially partition the image into mobility-coded regions. We proceed as follows. The number of mobility zones ($k$) and the 'window' of time to be considered for determining the non-uniform motion of the sites are assumed given. Here, a fixed time window consists of $d$ discrete intervals or image frames. The parameters $k$ and $d$ are then used to create (and populate) a high-dimensional feature space $S$, where $S \in \mathbb{R}^d$ if the left end of the time window is placed on (but not including) time point $t_0$. $S$ is populated by making use of the tracking information obtained in Section 2.1. Consider a graph, $G$, induced by the correspondence information determined by the temporal tracking algorithm. $G$ has $d$ 'levels' (or parts), where points in the image frame at time point $t_j$ are represented as nodes in $G$ at level $j$, say $G_j$. The information that a point $p_i^j$ in the image at $t_j$
2.2 Determining Mobility Zones

To investigate the mobility properties of the sites, our main idea is to zone (or cluster) the sites based on their motion magnitude and then use this information to spatially partition the image into mobility-coded regions. We proceed as follows. The number of mobility zones (k) and the 'window' of time to be considered for determining the non-uniform motion of the sites are assumed given. Here, a fixed time window consists of d discrete intervals or image frames. The parameters k and d are then used to create (and populate) a high-dimensional feature space S, where S ⊂ ℜ^d if the left end of the time window is placed on (but does not include) time point t_0. S is populated by making use of the tracking information obtained in §2.1. Consider a graph, G, induced by the correspondence information determined by the temporal tracking algorithm. G has d 'levels' (or parts), where points in the image frame at time point t_j are represented as nodes in G at level j, say G_j. The information that a point p_i^j in the image at t_j corresponds to a point p_k^{j+1} can be easily represented in G by introducing a directed edge between the nodes corresponding to p_i^j and p_k^{j+1}. The weight of an edge from a node n_i^j ∈ G_j to n_k^{j+1} ∈ G_{j+1} can be calculated by considering the distance traveled by the point p_i^j from time point t_j to t_{j+1}, given by dist(·,·), the L2 distance between p_i^j and p_k^{j+1}. Our feature space, S, can then be easily populated by considering each unique (not necessarily node-disjoint) correspondence path from a node in G_1 to a node in G_d. The inhomogeneous representation of a point in S can then be calculated by considering the edge weights (in order) of its corresponding correspondence path. Note that those image points that could not be tracked will not have a complete correspondence path from G_1 to G_d and thus will not be represented in S.

Our purpose now is to determine k spatial clusters in S; each cluster will represent a 'mobility zone', and an inverse transformation on S will yield information about the mobility patterns of individual points in the input images. We base our algorithm on a hierarchical clustering technique known as agglomerative clustering [6]. Hierarchical clustering algorithms work in an arbitrary number of dimensions and group items into a hierarchy of clusters, relying on distance measures to determine the similarity between clusters; this cost function is then minimized in an optimization framework. Agglomerative clustering in particular works by considering each entity as an individual cluster and then repeatedly merging pairs of items until the total number of clusters in S is exactly k (similar to Kruskal's minimum spanning tree algorithm). We employ a special type of agglomerative hierarchical clustering called CURE (Clustering Using Representatives) [7] that has a running time complexity of O(n²). It performs reliably, is robust with respect to the value of k, and produces high-quality clusters even in the presence of outliers. Once the clustering is done, we assign to each cluster a unique 'color tag' as an identifier. An inverse transformation is then applied on S; this yields a color tag for each correspondence path in G. Clearly, if the points corresponding to two correspondence paths, P_1 and P_2, were assigned to the same cluster in S, the color tags for P_1 and P_2 will also be the same. The color tags of the correspondence paths are then transferred back to the sets of RS in the input image sequence.

Observe that the clustering process uses only the motion parameters and is independent of the spatial distribution of the points. However, using the clustering information, we can now analyze the sites' spatial properties in the context of the motion they undergo and investigate whether any interdependencies can be determined. We do this as follows. First, we partition the nucleus using Voronoi diagrams [8], dividing it into spatial regions of influence based on the distribution of sites. We then make use of the sites' color tags (obtained via an inverse mapping on S) to merge adjacent polygons having the same color tag into larger cells, obtaining a chromatic Voronoi diagram. This final step provides us with a very useful tool for analyzing the spatial relationships of the RS. Notice that the larger cells in the chromatic Voronoi diagram are in fact contiguous mobility sub-regions and indicate a sub-structure of RS undergoing similar motion over the time window.
By analyzing the chromatic Voronoi diagrams for an entire sequence (all d image frames in a sequence), we can not only assess the reliability of the scheme (the partitioning should be fairly continuous over the given time window and should not change abruptly from one image frame to the next) but also analyze the motion dynamics of a given sequence of RS (see Fig. 1 for an illustration).
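The pipeline of this subsection — edge weights along correspondence paths, clustering in S, and color tagging — can be sketched as follows. SciPy's standard average-linkage agglomerative clustering is used here as a stand-in for CURE [7], and all names are ours. With d frames, each complete correspondence path contributes d − 1 ordered step distances as its feature vector.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def motion_features(frames, paths):
    """Populate the feature space S: one vector of ordered edge weights
    (inter-frame L2 displacements) per complete correspondence path.
    'frames' is a list of (n_j x 2) point arrays; 'paths' is a list of
    index tuples (i_1, ..., i_d), one entry per level of G."""
    S = [[np.linalg.norm(frames[j + 1][p[j + 1]] - frames[j][p[j]])
          for j in range(len(frames) - 1)]
         for p in paths]
    return np.asarray(S)

def color_tags(S, k):
    """Cluster S into k mobility zones and return a color tag per path."""
    Z = linkage(S, method='average')          # agglomerative merging
    return fcluster(Z, t=k, criterion='maxclust')
```

Each tag can then be painted onto the Voronoi cell of the corresponding site; merging same-tag neighbors yields the chromatic Voronoi diagram described above.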
Fig. 1. Illustration of (a) original cell nuclear image, (b) sites colored according to mobility-based clustering, (c) Voronoi partitions of the sites, (d) contiguous mobility zones determined by merging adjacent Voronoi cells with the same color tag. The color scheme used to denote the zones is denoted by the color bar on the right. Regions containing sites which could not be tracked (about 10%) are shown in black.
Remark: Our attempts at obtaining an equivalent clustering solution in 1D using the mean values of the feature vectors yielded significantly different results. We suspect that the motion variability exhibited by an RS over d frames is not sufficiently captured by its mean velocity alone.

2.3 Determining Rigid Substructures

Our strategy here is to determine subsets of points that approximately retain their substructure from one image frame to the next (and over the sequence). This is a key difference from the previous motion-based clustering scheme, because substructure similarity of the form we address here is independent of motion (displacement transformations such as translations and rotations). Similarity and agreement in the results of these independent techniques (the previous technique performs clustering in a feature space based solely on motion, while the second considers 'spatial structures' and ignores motion) on the same data set indicates a fair likelihood of accuracy in the absence of any gold-standard data. In recent literature, there has been a flurry of activity aimed at utilizing structural pattern identification techniques to study common substructures of proteins in bioinformatics research [9]. The common feature in most of these approaches is the representation of proteins as geometric graphs. Each node in the graph represents a 'unit' (or atom) in the protein and has real-world coordinate values; the edges between the nodes represent the inter-atomic bonding. Each technique then tries to calculate a similarity measure between these graphs (or trees). Adapting these techniques to our problem, however, is rather difficult because no apparent 'links' between the RS are known. Of course, an alternative is to look for patterns in every possible grouping of points. While this may work for a small number of points, our datasets are typically large. This would require a substantial amount of computation time, making it practically infeasible. Taking these issues into account, the approach we adopt is as follows. Consider two point sets, P_1 and P_2, corresponding to RS in consecutive image frames. For every triplet of points in the first set, say (p_i, p_j, p_k), we create a point p_ijk in a search space, S, such that
p_ijk = (d_ij, d_ik, d_kj), where d_ij = dist(p_i, p_j), d_ik = dist(p_i, p_k), d_kj = dist(p_k, p_j), and d_ij ≥ d_ik ≥ d_kj (dist(·,·) denotes the L2 distance). Thus, we obtain two sets of points, P_1 and P_2, in S, each comprising no more than C(n, 3) points (if max(|P_1|, |P_2|) = n). The process is useful because it converts our substructure isomorphism determination problem into a point location problem, as we shall discuss now. Consider a triplet of points that did not undergo any structural change from one time point to the next. The reason this transformation to a point location problem works is that the triplet of points at time point t_{P_1}, say (u_i, u_j, u_k), and the triplet of points at time point t_{P_2}, say (v_x, v_y, v_z), will map to the same point in S. In other words, the notion of an ε-neighborhood in S is analogous to a quasi-isomorphism between triplets of RS in consecutive image frames (where ε is a small constant). To exploit this neighborhood information, we make use of range-search trees [8] for point location queries in S. However, since the transformation to S loses all location information of the triplets of RS, we check potentially isomorphic structures against the tracking information calculated in §2.1 and verify whether a given pair of triplets is isomorphic under some rigid transformation T. We then gradually grow the sets of likely matches to include more than three points at a time. This strategy allows us to avoid paying the exorbitant cost associated with checking all C(n, r) combinations of r-point sets in consecutive image frames (where r ≥ 3). Finally, the process yields isomorphic substructures in consecutive image frames.
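The triplet-signature construction lends itself to a compact sketch. The fragment below enumerates all C(n, 3) triplets — acceptable for illustration, though the range-search trees and tracking-based pruning described above are needed at realistic problem sizes — and uses a KD-tree for the ε-neighborhood queries in S; all names are ours.

```python
import numpy as np
from itertools import combinations
from scipy.spatial import cKDTree

def signatures(P):
    """Map each triplet (p_i, p_j, p_k) to its point p_ijk in the search
    space S: the pairwise distances sorted so that d_ij >= d_ik >= d_kj."""
    sig, trip = [], []
    for i, j, k in combinations(range(len(P)), 3):
        d = sorted((np.linalg.norm(P[i] - P[j]),
                    np.linalg.norm(P[i] - P[k]),
                    np.linalg.norm(P[k] - P[j])), reverse=True)
        sig.append(d)
        trip.append((i, j, k))
    return np.asarray(sig), trip

def candidate_substructures(P1, P2, eps):
    """Pairs of triplets from consecutive frames whose signatures fall in
    the same eps-neighborhood of S (quasi-isomorphism candidates, to be
    verified against the tracking information of Sec. 2.1)."""
    s1, t1 = signatures(P1)
    s2, t2 = signatures(P2)
    dist, idx = cKDTree(s2).query(s1)
    return [(t1[a], t2[i]) for a, (d, i) in enumerate(zip(dist, idx))
            if d <= eps]
```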
3 Results

Our algorithms were implemented in C++ using CGAL and LEDA on a machine running GNU/Linux. Evaluations were performed on 15 temporal datasets (2D + time), each consisting of about 10 image frames taken 2 seconds apart. For simplicity of presentation, we will discuss the results for each subsection in the order they appeared in the paper. Due to space limitations, we omit details about the preprocessing phase and highlight our main observations using a few datasets as illustrative examples instead of discussing results for each set individually.

Motion-based high-dimensional clustering: The correspondence information from §2.1 was obtained, and the algorithms described in §2.2 were used to determine clusters from the sites' motion magnitudes using the user-provided parameters k and d. The color tags of each of the k motion clusters were then used to 'tag' their member RS. A visualization of these color-coded sites in real space revealed a rather interesting property. We found that a significant number of spatially proximate sites (in real space) were allocated to the same cluster in the high-dimensional feature space. Further, a partition of the nucleus (based on Voronoi diagrams) showed a number of adjacent cells joining together to form contiguous mobility zones. This observation indicates that close-together sites show a similar degree of motion, as previously illustrated in Fig. 1. The robustness of our algorithm was further evaluated by varying the two input parameters: (1) the value of k (the number of clusters), and (2) the length of the time window (the dimensionality of the feature space, d).

(1) Varying the value of k. We noticed that the spatial correlations of sites that belong to the same mobility cluster in feature space are preserved
under varying values of k. In Figs. 2(a)-(c), we illustrate the results of color partitioning of the nucleus (represented by the first image frame in the sequence) based on mobility clustering for different k values. Observe that between Fig. 2(a) and 2(b), the overall partitioning scheme remains approximately constant. However, based on the number of clusters desired (the chosen value of k), the higher-mobility zone in (a) was upgraded to being the highest-mobility zone in (b), whereas the lower-mobility zone of (a) showed more variation in movement, splitting up into two zones. A similar phenomenon is seen when we increase k from 3 to 4 in Figs. 2(b) and (c).
Fig. 2. Nucleus of a cell showing varying numbers of mobility zones, for k = 2, 3, and 4
This trend conveys a strong sense that the clustering technique is robust to the value of k. Therefore, one can increase the value of k if more vivid detail is desired (more colors in the partition).

(2) Increasing the length of the time window. We observed that a lengthier time window (a higher dimensionality, d, of the feature space) does not have a detrimental effect on the performance of the clustering. In fact, the spatial relationships of the clusters determined for several different choices of time windows were very similar (for a fixed k value). We illustrate these results in Figs. 3(a)-(c). The only deterioration we could notice was that as the length of the time window became progressively larger, the number of sites that could not be tracked also increased (showing more black regions in Figs. 3(a)-(c)). We suspect that this is a consequence of inaccuracies in the temporal tracking (preprocessing) phase.

(3) Overall quality of clusters.
Fig. 3. Nucleus of a cell showing mobility zones determined from time point 0 to 4 (d = 4), 0 to 6 (d = 6), and 0 to 8 (d = 8)
The overall quality of mobility-based clustering was observed to be quite good (calculated using metrics suggested in [7] and other simple measures, such as geometric standard deviations with respect to the centroid of the cluster). While we avoid a detailed discussion of this topic due to lack of space, we would like to point out that the standard deviation of the clusters remained relatively constant as a function of an increase in the length of the time window
(related to (2) above). However, we noticed an almost linear relationship between the compactness and the chosen value of k (related to (1) above).

Rigid substructure determination: The algorithm described in §2.3 was used to determine rigid substructures using the parameter d and a maximum allowable deformation ε. Fig. 4(a) shows some of the rigid substructures obtained for a particular sequence of images where d = 5 (time point 0 to 5). Rigid substructures of points identified by our technique are illustrated as minimum spanning trees. Small deformative changes in the rigid substructures across the time sequence manifest as variations in the spanning tree structure from one time point to the next (see Fig. 4). For example, in Fig. 4(c) (which shows an enlarged view of one of the rigid substructures in Fig. 4(a), with correspondence of points), the point labeled j was connected to point i at time point 0, but its corresponding point j″ at time point 3 was connected to point h″, indicating that it moved closer to point h in the course of the time sequence, even though the overall substructure remained the same.

Agreement: To evaluate the agreement of the two techniques, we superimposed the rigid substructures obtained using the algorithms in §2.3 on the Voronoi regions generated from the mobility-based clustering. The results obtained on one sequence are shown in Fig. 4(a) and (c), and the superimposed image is shown in Fig. 4(b). In all datasets, we observed that the rigid substructures belong almost entirely to the same zone.
Fig. 4. A cell nucleus showing (a) rigid substructures (in a spanning tree form), (b) superimposed with mobility zones (color scheme same as in Fig. 3), (c) enlarged view of a sub-region for time point 0 (top) and time point 3 (bottom)
4 Conclusions

We have proposed several algorithms for the determination and analysis of the motion dynamics of functional sites. While the subproblems we address, such as mobility-based clustering and the determination of quasi-isomorphic substructures, are interesting in their own right, we are particularly excited by the 'motion patterns' in the movement of RS
discovered by the technique. These results dispel (at least in part) the notion that such movements are random. In fact, the strong correlation between spatial proximity and the degree of motion of the sites, coupled with the presence of rigid substructures, suggests that the movements of replication sites over small regions inside the nucleus are either inter-dependent or influenced equally by some unknown factor. Investigations are currently underway to understand and interpret these patterns in terms of nuclear replication, function, and cell phases.
References

1. Rebollo, E., Gonzalez, C.: Time-lapse imaging of male meiosis by phase-contrast and fluorescence microscopy. Cell Biology and Biophysics 247 (2004) 77–87
2. Neumann, B., Held, M., Liebel, U., Erfle, H., Rogers, P., Pepperkok, R., Ellenberg, J.: High-throughput RNAi screening by time-lapse imaging of live human cells. Nature Methods 3 (2006) 385–390
3. Berezney, R., Malyavantham, K.S., Pliss, A., Bhattacharya, S., Acharya, R.: Spatio-temporal dynamics of genomic organization and function in the mammalian cell nucleus. Advances in Enzyme Regulation 45 (2005) 17–26
4. Cho, M., Mount, D.: Improved approximation bounds for planar point pattern matching. In: Proc. Workshop on Algorithms and Data Structures (WADS). (2005) 432–443
5. Chen, X., Zhou, X., Wong, S.: Automated segmentation, classification, and tracking of cancer cell nuclei in time lapse microscopy. IEEE Transactions on Biomedical Engineering 53 (2006) 762–766
6. Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM Comput. Surv. 31 (1999) 264–323
7. Guha, S., Rastogi, R., Shim, K.: CURE: An efficient clustering algorithm for large databases. In: Proc. of ACM SIGMOD Int'l Conf. on Management of Data. (1998) 73–84
8. de Berg, M., Schwarzkopf, O., van Kreveld, M., Overmars, M.: Computational Geometry: Algorithms and Applications. Springer-Verlag (2000)
9. Wang, X., Wang, J., Shasha, D., Shapiro, B., Rigoutsos, I., Zhang, K.: Finding patterns in three dimensional graphs: Algorithms and applications to scientific data mining. IEEE Transactions on Knowledge and Data Engineering 14 (2002) 731–750
Tissue Characterization Using Dimensionality Reduction and Fluorescence Imaging

Karim Lekadir¹, Daniel S. Elson², Jose Requejo-Isidro², Christopher Dunsby², James McGinty², Neil Galletly³, Gordon Stamp³, Paul M. W. French², and Guang-Zhong Yang¹

¹ Visual Information Processing Group, Department of Computing
² Department of Physics
³ Division of Investigative Sciences
Imperial College London, United Kingdom
Abstract. Multidimensional fluorescence imaging is a powerful molecular imaging modality that is emerging as an important tool in the study of biological tissues. Due to the large volume of multi-spectral data associated with the technique, it is often difficult to find the best combination of parameters to maximize the contrast between different tissue types. This paper presents a novel framework for the characterization of tissue compositions based on the use of time resolved fluorescence imaging without the explicit modeling of the decays. The composition is characterized through soft clustering based on manifold embedding for reducing the dimensionality of the datasets and obtaining a consistent differentiation scheme for determining intrinsic constituents of the tissue. The proposed technique has the benefit of being fully automatic, which could have significant advantages for automated histopathology and increasing the speed of intraoperative decisions. Validation of the technique is carried out with both phantom data and tissue samples of the human pancreas.
1 Introduction

Fluorescence is an effective means of achieving optical molecular contrast in a wide range of instruments including cuvette-based systems, microscopes, endoscopes and multi-well plate readers. Fluorescent molecules (fluorophores) can be used as "labels" to tag specific molecules of interest. Alternatively, the fluorescence properties of the target molecules themselves may be exploited to provide label-free contrast. In addition to providing information about the properties of the fluorophores, the fluorescence process can be sensitive to the local environment surrounding the fluorophore, thus providing a sensing function. In principle, different species of fluorophores may be characterized by their excitation and emission spectra, quantum efficiency, polarization response, and fluorescence lifetime. These parameters can change as a function of the local viscosity, temperature, refractive index, pH, calcium, oxygen concentration, and electric field. The practical use of these techniques for tissue characterization is therefore challenging due to the need for an automatic method for extracting
the intrinsic fluorescence properties of the tissue from what is often high-dimensional data (combinations of x, y, t, λ_ex, λ_em).

Fluorescence lifetime imaging (FLIM) is a technique that allows the fluorescence decay profile to be determined for each pixel of an image. After optical excitation, the fluorescence emitted by a fluorophore typically decays with a molecular- and/or environment-dependent average time, called the fluorescence lifetime (typically ~100 ps to a few ns). FLIM is becoming increasingly popular as a method of measuring FRET [1] or autofluorescence contrast in tissues [2,3]. Typically, the computation of fluorescence lifetimes or the analysis of raw FLIM data requires a model of the expected decay function to be chosen based on a priori assumptions. For instance, a double-exponential decay model may be chosen in the case of a molecule with two distinct decay pathways, or a stretched-exponential model may be chosen where a distribution of lifetimes is expected [4]. The results obtained depend on the choice of the model, and there are many situations where either the correct model is unknown or involves more fitting parameters than are meaningful for the limited signal-to-noise ratio. In practice, even if the correct model is used to obtain the fluorescence lifetime at each pixel, the analysis and presentation of the data are complicated for systems designed to resolve additional fluorescence properties such as the excitation or emission profiles. Thus far, some alternative model-free methods have been proposed for time-resolved data only, such as the expansion of fluorescence decays in a discrete Laguerre basis [5]. The challenge, however, is to take advantage of the model-free nature of the algorithms and treat the multi-spectral raw time-gated images as the input dimensions whilst presenting the data in an intuitive image format.

The purpose of this paper is to introduce a new framework that is model-free and makes no assumptions about the distribution of the data for automated tissue characterization. By the use of manifold embedding, the method reduces the entire time-resolved image stack to a consistent representation using a color map that reflects the intrinsic fluorescence properties of the tissue sample. This technique has the additional benefit of being entirely automated, which could have significant advantages for automated histopathology and increasing the speed of intraoperative decisions. Validation of the technique is carried out with both phantom data and tissue samples of the human pancreas.
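For contrast with the model-free approach developed below, the conventional model-based FLIM analysis can be sketched in a few lines: a decay model is chosen a priori and fitted pixel by pixel. The sketch below assumes a single-exponential model; it is not part of the proposed method, the gate delays are caller-supplied, and all names are ours.

```python
import numpy as np
from scipy.optimize import curve_fit

def mono_exp(t, a, tau, c):
    # assumed decay model: I(t) = a * exp(-t / tau) + c
    return a * np.exp(-t / tau) + c

def flim_map(stack, delays):
    """Per-pixel lifetime fitting of a time-gated stack (n_gates, H, W).
    This is the model-dependent step whose choice of decay function the
    FR-IsoMap framework avoids."""
    _, H, W = stack.shape
    tau = np.full((H, W), np.nan)
    for y in range(H):
        for x in range(W):
            decay = stack[:, y, x].astype(float)
            try:
                popt, _ = curve_fit(mono_exp, delays, decay,
                                    p0=(decay.max(), 1.0, decay.min()),
                                    maxfev=2000)
                tau[y, x] = popt[1]            # fluorescence lifetime
            except RuntimeError:               # noisy pixels may not fit
                pass
    return tau
```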
2 Methods

2.1 Fixed Reference IsoMap (FR-IsoMap)

Theoretically, the main contribution of this work is the introduction of FR-IsoMap, which allows consistent dimensionality reduction across samples. IsoMap [6] is a nonlinear dimensionality reduction technique that is able to describe the global structure of nonlinear manifolds by detecting the main meaningful underlying dimensions. The method preserves the interpoint distances in a way that is similar to classical multidimensional scaling (MDS) [7] but caters for complex nonlinear manifolds by the use of geodesic distances for representing the dissimilarities. Given a set of N pixels {P_i} in the original space, a neighborhood graph G is first constructed and used to initialize the geodesic distance matrix D_G as follows:
d_G(i, j) = { d(P_i, P_j)   if P_i and P_j are neighbors;   +∞   otherwise }    (1)
where d is a dissimilarity distance between two pixels. The final geodesic distances are determined by calculating the shortest paths between all pairs of points in the graph G, replacing every entry d_G(i, j), for each value of k from 1 to N, by:
d_G(i, j) = min( d_G(i, j), d_G(i, k) + d_G(k, j) )    (2)
The matrix D_G of geodesic distances is then fed to the classical MDS procedure [7]. The derived N eigenvectors v_p are sorted in decreasing order of the corresponding eigenvalues λ_p. The new coordinates {y_i^p} of a pixel P_i in the embedded space are then calculated as follows:

y_i^p = √λ_p · v_p^i    (3)
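Eqs. (1)–(3) translate almost line for line into code. The sketch below is a brute-force illustration — O(N³) Floyd–Warshall shortest paths and a dense eigendecomposition — adequate for a small training set but not for full images; it assumes the neighborhood graph is connected, and all names are ours.

```python
import numpy as np

def isomap(X, n_neighbors, n_components=3):
    """IsoMap embedding of pixels X (N x D) following Eqs. (1)-(3)."""
    N = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    dG = np.full((N, N), np.inf)                # Eq. (1): graph init
    nn = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    for i in range(N):
        dG[i, nn[i]] = dG[nn[i], i] = D[i, nn[i]]
    np.fill_diagonal(dG, 0.0)
    for k in range(N):                          # Eq. (2): geodesics
        dG = np.minimum(dG, dG[:, k:k + 1] + dG[k:k + 1, :])
    J = np.eye(N) - 1.0 / N                     # classical MDS on dG
    B = -0.5 * J @ (dG ** 2) @ J                # double centering
    lam, v = np.linalg.eigh(B)
    top = np.argsort(lam)[::-1][:n_components]
    return v[:, top] * np.sqrt(np.maximum(lam[top], 0))  # Eq. (3)
```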
The three main coordinates are used as color channels to construct a color map representation for tissue characterization. For every new image, however, reapplying IsoMap can be time consuming and may not guarantee a consistent embedding, as the embedding can change dramatically depending on the data distribution of the manifold. This is due to the lack of a fixed coordinate system, which prohibits the comparison of embedded results across different tissue samples. To circumvent this problem, FR-IsoMap is developed, which involves applying IsoMap only to a training data set that represents well the variability within the tissue. In order to obtain an optimal reference coordinate system whilst maintaining the topology of the manifold, input vectors that are evenly distributed on the manifold are selected. This is achieved by first selecting one or more seed pixels that correspond to the main constituents in the image and then eliminating points on the manifold that are within a predefined distance of the selected pixel. This continues until all pixels are either selected or rejected. To ensure consistent embedding, the training sample generated at the first stage, together with the corresponding original and embedded coordinates, is used as a model to predict the embedded coordinates of a pixel in a new image. Given that the training sample represents the variability within the data, the position of a new pixel in the embedded space can be predicted using the coordinates of the most similar pixels in the training set. The k nearest neighbors among the training samples are located, and the corresponding embedded coordinates are then calculated by minimizing the Sammon nonlinear mapping criterion shown in Eq. (4), such that the remapped distances approximate well the original distances [8], i.e.,
E_s = [ 1 / Σ_{i<j} d*_ij ] · Σ_{i<j} (d*_ij − d_ij)² / d*_ij    (4)
where d_ij and d*_ij are the pairwise distances in the embedded and original spaces, respectively. By defining E_s(m) and d_ij(m) as the mapping error and the embedded
distance after the m-th iteration, respectively, the above problem can be solved iteratively, and the newly estimated coordinates of the sample at iteration m + 1 are given by:

x_p(m + 1) = x_p(m) − α [∂E_s(m)/∂x_p(m)] [∂²E_s(m)/∂x_p(m)²]⁻¹    (5)
where x_p is the p-th coordinate component of the new pixel in the fixed coordinate system, and α is the step size. The initial values of the coordinates are given by the nearest neighbor, which in practice gives a good indication of the final result.
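Both FR-IsoMap steps — the greedy even sampling of the training set and the embedding of a new pixel into the fixed reference frame — are easy to prototype. In the sketch below, a plain gradient step on the Sammon stress of Eq. (4) stands in for the Newton-like update of Eq. (5); the step size, iteration count, and all names are ours.

```python
import numpy as np

def select_training(X, seeds, min_dist):
    """Evenly sample the manifold: accept a pixel only if it is farther
    than 'min_dist' from every pixel already accepted (seed pixels for
    the main constituents come first)."""
    kept = list(seeds)
    for i in range(len(X)):
        if all(np.linalg.norm(X[i] - X[j]) >= min_dist for j in kept):
            kept.append(i)
    return np.asarray(kept)

def embed_pixel(x, X_train, Y_train, k=8, alpha=0.3, n_iter=100):
    """Predict embedded coordinates of a new pixel by minimizing the
    Sammon stress over its k nearest training samples, initialized at
    the nearest neighbor as in the text."""
    d_star = np.linalg.norm(X_train - x, axis=1)   # original-space dists
    nb = np.argsort(d_star)[:k]
    d_star, Yk = d_star[nb] + 1e-12, Y_train[nb]
    y = Yk[0].copy()                               # nearest-neighbor init
    for _ in range(n_iter):
        diff = y - Yk
        d = np.linalg.norm(diff, axis=1) + 1e-12   # embedded-space dists
        grad = (((d - d_star) / (d_star * d))[:, None] * diff).sum(0)
        y -= alpha * grad                          # gradient step on E_s
    return y
```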
2.2 Phantom and Tissue Sample Validation

In order to validate the proposed FR-IsoMap scheme for tissue characterization, both phantom data and tissue samples of the unstained human pancreas were used. Time-resolved fluorescence images were recorded by exciting the sample using a pulsed excitation source (a broadband supercontinuum laser source for the phantom experiment [9] and an amplified and frequency-doubled Ti:Sapphire laser for the pancreas tissue data). The resulting fluorescence was then imaged onto a microchannel-plate gated optical intensifier (GOI, gate width ~400 ps), and temporal profiles of the fluorescence decay were recorded by acquiring gated images of the fluorescence at different delays after excitation (for details of a similar system see [10]).

For the phantom data set, a total of 17 gated images were recorded over a total delay range of 11 ns of two identical rows in a 384 multi-well plate containing five different dye solutions (DASPI in ethanol (DASPI-I), Coumarin, Rhodamine 700, Eosin, and DASPI in 40% ethanol, 60% glycerol (DASPI-II)). The large area imaged (1.5 × 2 cm) allowed a standard camera objective to image the fluorescence onto the MCP. For pancreas tissue imaging, the MCP was coupled to an Olympus IX71 inverted fluorescence microscope, and 14 gated images were recorded over a total delay time of 15 ns. All human tissues used in these experiments, including fresh resection specimens and biopsies, were obtained in accordance with local ethics committee approval (2004/6742). All tissue had been ethically consented for use in medical research. Blocks of formalin-fixed, paraffin-embedded tissue were cut to 10 μm sections and mounted unstained on uncoated glass slides. Glass coverslips (no. 1.5 thickness) were mounted on the slides using either an aqueous-based coverslip mountant (50% PBS, 50% glycerol) or a commercially available mountant (DPX).

To assess the quality of the mapping, the distances between every new pixel in an image and the entire training sample are calculated in both the original space and the common coordinate system. The preservation of the distances is then measured by calculating the corresponding correlation coefficient:

ρ_{D*D} = Σ (d*_ij − d̄*)(d_ij − d̄) / √[ Σ (d*_ij − d̄*)² · Σ (d_ij − d̄)² ]    (6)
3 Results

Fig. 1 shows the conventional lifetime map (FLIM map) and the proposed FR-IsoMap for the multi-well phantom data with five different dye solutions. It can be seen that the FLIM map shows strong lifetime contrast for the DASPI-I and the Coumarin but is not able to distinguish the remaining dyes well. As highlighted in the histogram in Fig. 2(a), five peaks representing the different dyes are generated, but with narrow separation and overlapped with the background, suggesting that ambiguous segmentation may be generated in the FLIM map. Also, the areas immediately surrounding the wells are low-intensity background, but the lifetimes calculated for these regions are similar to those of the well in the immediate vicinity, due to the influence of scattered light or image blurring. Although this background could be thresholded out, this requires user input and is an interruption of the automated process. In addition, signals of low intensity may be lost along with the background. In contrast, the FR-IsoMap shows distinctive sample separation with a clearly delineated background. The mapping accuracy calculated using Eq. (6) was equal to 0.989 and 0.993 for the two samples shown in Fig. 1, respectively.

In Table 1, the average interpixel distances in the embedded space were calculated within each of the five dyes and between all combinations of dyes. The results show a low average interpixel distance within a dye, while this distance increases significantly across the different dyes, suggesting that the constituents are well separated.

Table 1. Average interpixel distances within and between the different dyes, calculated in the embedded space

              DASPI-I  Coumarin  Rhodamine   Eosin  DASPI-II  Background
DASPI-I          11.6     293.2      146.0   293.3     174.3       191.2
Coumarin        293.2      57.7      193.1   260.1     186.8       233.2
Rhodamine       146.0     193.1       13.6   212.5      72.0        70.4
Eosin           292.3     260.1      212.5    41.3     145.7       229.2
DASPI-II        174.3     186.8       72.0   145.7      17.5       113.6
Background      191.2     233.2       70.4   229.2     113.6         4.6
Fig. 1. Results of FR-IsoMap and FLIM as applied to the multi-well phantom data with five different dye solutions. FLIM with a single-exponential decay is not able to distinguish completely between the different dyes or the background, while the FR-IsoMap clearly distinguishes all the dyes in the images.
For a visual illustration, Fig. 2(b) shows the histogram of the interpixel distances within the Rhodamine dye and between the Rhodamine and the DASPI-II dyes. Although the average interpixel distance between these two dyes corresponds to the lowest amongst all dye combinations in Table 1, the graph shows that the two constituents are clearly separated by the proposed FR-IsoMap method.
Fig. 2. (a) FLIM histogram showing the peaks corresponding to the different dyes overlapping especially with the background. In (b), the histogram describing the interpixel distances in the FR-IsoMap embedded space within the Rhodamine and between DASPI-II and the Rhodamine clearly separates the two dyes.
For the characterization of the pancreas tissue compositions, the proposed method was applied to the 30 datasets collected. The training sample was constructed by selecting 567 pixels from dataset 28. It is evident from the accuracy results plotted in Fig. 3 that high mapping accuracy is maintained throughout the entire dataset, with an average correlation coefficient equal to 0.993 ± 0.005.
Fig. 3. Reconstruction accuracy plotted for the 30 human tissue pancreas samples studied, demonstrating a high accuracy mapping for the entire tissue datasets
For visual illustration of the color maps generated for the pancreas images, Figs. 4(a) and (b) show two examples, corresponding to datasets 1 and 25 of this study. It can be seen that the different constituents are well distinguished in the color maps and mapped in a consistent manner across the samples when compared with the FLIM images. In particular, there appears to be more fine structure and improved spatio-contrast visible in the FR-IsoMap. The oval structure in the centre of Fig. 4(a) is a duct, and the area just to the right of it (around the centre of the image) is collagen- and elastin-rich tissue. Fig. 4(b) has two oval islets of Langerhans in the centre, a duct on the right-hand side, and the edge of the pancreas towards the left. It is worth noting that Fig. 4(c) corresponds to the same sample as in (b) but with a particularly high level of noise; in this case the obtained FR-IsoMap still displays good contrast between the different constituents in the image, while the FLIM map is affected by the noise, due to the least-squares approach used to fit the exponential decay.
Fig. 4. Example human pancreas tissue sections mapped with FLIM and FR-IsoMap respectively, showing the improved tissue class separation and improved SNR and spatio-contrast introduced by the proposed tissue characterization framework
4 Conclusions

We have presented in this paper a framework for tissue characterization with time-resolved fluorescence images. The method is based on soft clustering achieved by using dimensionality reduction.
The proposed FR-IsoMap constructs a fixed dimensional coordinate system by selecting a training set that best represents the variability within the tissue. New samples are then embedded into the reference coordinate system such that local similarities are preserved. Consistent color maps are generated from the embedded data, which are capable of enhancing dissimilarities between tissue compositions. In comparison with conventional FLIM maps, the proposed method does not require a priori knowledge about the sample under investigation, and all of the analysis may be presented in just one image in a fully automated manner. Furthermore, this approach may be extended to higher-dimensionality data sets, for instance where combinations of the lifetime, excitation, and emission spectral profiles are recorded; validation work is currently in progress.
References

1. Bastiaens PIH and Squire A, "Fluorescence lifetime imaging microscopy: spatial resolution of biochemical processes in the cell," Trends in Cell Biology, vol. 9, pp. 48-52, 1999.
2. Mizeret J, Wagnieres G, Stepinac T, and Bergh HVD, "Endoscopic tissue characterization by frequency-domain fluorescence lifetime imaging (FD-FLIM)," Lasers in Medical Science, vol. 12, pp. 209-217, 1997.
3. Cubeddu R, Comelli D, D'Andrea C, Taroni P, and Valentini G, "Time-resolved fluorescence imaging in biology and medicine," Journal of Physics D: Applied Physics, vol. 35, pp. R61-R76, 2002.
4. Lee KCB, Siegel J, Webb SED, Leveque-Fort S, Cole MJ, Jones R, Dowling K, Lever MJ, and French PMW, "Application of the stretched exponential function to fluorescence lifetime imaging," Biophysical Journal, vol. 81, pp. 1265-1274, 2001.
5. Jo JA, Fang QY, Papaioannou T, and Marcu L, "Fast model-free deconvolution of fluorescence decay for analysis of biological systems," Journal of Biomedical Optics, vol. 9, pp. 743-752, 2004.
6. Tenenbaum JB, de Silva V, and Langford JC, "A global geometric framework for nonlinear dimensionality reduction," Science, vol. 290, pp. 2319-2323, 2000.
7. Cox TF and Cox MAA, Multidimensional Scaling. London: Chapman & Hall, 1994.
8. Sammon JW, "A nonlinear mapping algorithm for data structure analysis," IEEE Transactions on Computers, vol. 18, pp. 401-409, 1969.
9. McConnell G, "Confocal laser scanning fluorescence microscopy with a visible continuum source," Optics Express, vol. 12, pp. 2844-2850, 2004.
10. Siegel J, Elson DS, Webb SED, Lee KCB, Vlanclas A, Gambaruto GL, Leveque-Fort S, Lever MJ, Tadrous PJ, Stamp GWH, Wallace AL, Sandison A, Watson TF, Alvarez F, and French PMW, "Studying biological tissue with fluorescence lifetime imaging: microscopy, endoscopy, and complex decay profiles," Applied Optics, vol. 42, pp. 2995-3004, 2003.
A Method for Registering Diffusion Weighted Magnetic Resonance Images

Xiaodong Tao and James V. Miller

GE Research, Niskayuna, New York, USA
Abstract. Diffusion weighted magnetic resonance (DWMR or DW) imaging is a fast-evolving technique to investigate the connectivity of brain white matter by measuring the self-diffusion of the water molecules in the tissue. Registration is a key step in group analysis of DW images that may lead to an understanding of the functional and structural variability of the normal brain, an understanding of disease processes, and improved neurosurgical planning. In this paper, we present a new method for registering DW images. The method works directly on the diffusion weighted images without using tensor reconstruction, fiber tracking, or fiber clustering. Therefore, the performance of the method does not rely on the accuracy and robustness of these steps. Moreover, since all the information in the original diffusion weighted images is used for registration, the results of the method are robust to imaging noise. We demonstrate the method on intra-subject registration with an affine transform using DW images acquired on the same scanner with the same imaging protocol. Extension to deformable registration for images acquired on different scanners and/or with different imaging protocols is also discussed. This work is part of the National Alliance for Medical Image Computing, funded by the National Institutes of Health through the NIH Roadmap for Medical Research, Grant U54 EB005149. Information on the National Centers for Biomedical Computing can be obtained from http://nihroadmap.nih.gov/bioinformatics.
1 Introduction

Diffusion is a process caused by the Brownian motion of molecules. From the microscopic viewpoint, and within a small enough time period, the motion is completely random, governed by thermal dynamics. When observed at a relatively large scale, the diffusion is often limited by the structure of the substance in which the molecules reside. By investigating the diffusion properties of molecules in a sample, one can infer the structure of the sample. Diffusion weighted magnetic resonance imaging is an emerging modality that is able to measure the diffusivity in a sample [1,2]. Its use in the area of medical imaging has drawn increasing interest over the past decade. DW imaging is a powerful tool for neurologists in mapping the brain white matter fibers and has a great potential to elucidate the functional organization, development, aging, and disease processes of the brain [3,4,5,6]. DW imaging also plays an important role in neurosurgical planning [7]. Group analysis is essential for understanding normal variability and detecting abnormalities. It involves transforming images to a canonical coordinate frame, normalizing gross inter-subject morphological differences [8]. In the general framework of image
Fig. 1. T2-weighted images [(a) and (b)] and DW images [(c) and (d)] of the same subject (registered and interpolated to the same axial slice) from two different sessions. Although (a) and (b) show good alignment, (c) and (d) differ due to DW intensity dependency on patient orientation.
registration, a transformation is sought to maximize the similarity between the transformed moving image and the fixed image. One premise is that both images contain the same information, independent of the transformation. This premise, however, is not true for DW images, whose intensity values also depend on the patient orientation, and therefore on the transformation (Fig. 1). The dependence of the DW measurements on patient orientation makes DW image registration fundamentally different from conventional registration of T1- and T2-weighted images. Previous approaches for registering DW images usually decompose the DW images into an orientation-independent part (e.g., a fractional anisotropy (FA) map) and an orientation-dependent part (e.g., a diffusion tensor map or a principal direction map). The orientation-independent parts are registered, and the orientation-dependent parts are reoriented by the transformation from the orientation-independent part [9,10,11,12,13]. Note that the orientation-independent parts are usually derived from a tensor estimation or are based on an extra T1 acquisition; therefore, the accuracy of DW image registration depends on these steps. In this paper, we develop a unified framework for DW image registration, which works directly on the DW images without requiring tensor reconstruction or co-registration of DW images with corresponding T1-weighted images. We demonstrate the new algorithm on intra-subject registration using affine transforms. We also compare the proposed algorithm with an algorithm based on reorientation of principal diffusion directions. Finally, we discuss how this new registration framework can be extended into more general deformable registration.
2 Background

In organized tissues, such as brain white matter, water molecules undergo anisotropic self-diffusion, which is often modeled by the self-correlation function P(r | r + R, t) (using the nomenclature from [1]). The self-correlation function can be thought of as the probability that a molecule initially at r will move to location r + R after a time t. The average propagator is defined as:

P̄(R, t) = ∫ P(r | r + R, t) ρ(r) dr    (1)
where ρ(r) is the proton density. Under certain assumptions (e.g., narrow gradient pulses), the average propagator is related to the measured MR diffusion signal by the Fourier relationship:

E(q) = ∫ P̄(R, t) e^{i2πq·R} dR    (2)

where q = (2π)⁻¹ γδg, γ is the gyromagnetic ratio for protons, δ is the diffusion gradient duration, and g is a diffusion gradient vector. The above equation has been normalized such that E(q) = 1 for q = 0. In practice, one or more T2-weighted baseline images are acquired to properly normalize the diffusion weighted images. The MR signal, E(q), is used to reconstruct the average propagator, P̄(R, t), based on a model for diffusion (e.g., anisotropic Gaussian diffusion). When studying the orientational structure of tissue, it is often sufficient to limit the diffusion gradient vectors to be unit vectors. Under these conditions, E(q) in Eq. (2) can be viewed as samples drawn from a function defined on the unit sphere. In the following, we will denote a DW image as I(r, g), with r ∈ ℜ³ and ‖g‖ = 1.
Fig. 2. A voxel is transformed by an affine transform T. Under such a transform, the projection of the displacement vector R on the diffusion gradient direction changes from g · R to g · T(R).
From Eq. (2), we can see that the diffusion MR signal depends on the relative position of the tissue sample and the diffusion gradient vector g. Usually g is defined in the world coordinate system and is fixed for a given imaging protocol. Therefore, I(r, g) depends on the patient orientation with respect to the scanner. For a constant g and a given anatomic location, the MR signal, I(r, g), varies with the orientation of r with respect to g. This is illustrated in Fig. 2. The orientation dependency of the DW images differs from conventional T1-weighted and T2-weighted images and makes it difficult to compare DW images directly. In Section 3.1, we describe an angular interpolation algorithm that enables a direct comparison of DW images. The angular interpolation algorithm is the basis of our registration method for DW images.
3 Registration Framework

Many image registration algorithms exist in the literature. The majority can be formulated as a general optimization problem. Given two images in 3-dimensional space, a
fixed image I_1(r) and a moving image I_2(r), r ∈ ℜ³, a transformation T is sought to minimize a cost function:

E[T] = D[ I_1(r), T ∘ I_2(r) ]    (3)
where D is a measure of the difference between two images. D can be the mean squared error between image intensities, distances between corresponding pairs of landmarks, or information-theoretic measures such as mutual information. Generally speaking, there are four major components of a registration task [14]: a metric [D in Eq. (3)] that defines the difference measure between images; a transformation T that defines the class of transformations applied to the moving image, which may be linear (e.g., an affine transform) or deformable (e.g., a B-spline deformable transform); an optimizer that searches the parameter space of the transformation; and an interpolation method that interpolates image intensities in the transformed moving image.

3.1 Angular Interpolation of Diffusion Weighted Images

As shown in Fig. 1, registered and resampled T2-weighted images can accurately align the anatomy, yet the resampled DW images can be quite different due to the DW MR signal's dependency on patient orientation. To address the orientation dependency, we need to transform the diffusion weighting gradient directions in accordance with the transform that aligns the patient anatomy. We describe an angular interpolation algorithm in this section. The MR diffusion signal E(g) is a function defined on the unit sphere. The measurements are samples of this function in the given diffusion gradient directions. Given N diffusion gradient directions g_i, with ‖g_i‖ = 1, the intensity of a DW image with diffusion gradient direction k at a spatial location r is

I(r, k) = Σ_{i=1..N} w_i I(r, g_i) / Σ_{i=1..N} w_i    (4)
where w_i = e^{−d²/2σ²}, with d = cos⁻¹ |k · g_i| being the geodesic distance on the sphere between the gradient direction and the interpolated direction, and σ being a parameter that controls the smoothness of the interpolation operator. Other weighting schemes based on angular difference can also be used. By taking the absolute value of the dot product of k and g_i, we assume the image I(r, g) is symmetric in g, i.e., I(r, g) = I(r, −g). Using the interpolation of Eq. (4), we can estimate a diffusion MR measurement in any direction, given that the underlying orientational structure of the tissue is fairly smooth. We note that when k = g_i for some i, Eq. (4) becomes a smoothing operation on the DW images, which can serve as a pre-processing step to reduce noise. This angular smoothing operation uses information at a given voxel from all diffusion gradient directions. Moreover, it does not spatially average across voxels.
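Eq. (4) amounts to a normalized Gaussian kernel on geodesic angle. A minimal sketch (names ours) for evaluating a DW volume in an arbitrary unit direction k:

```python
import numpy as np

def angular_interpolate(dwi, gradients, k, sigma):
    """Eq. (4): estimate the DW signal in unit direction k from the N
    measured directions. 'dwi' is (N, ...), one volume per gradient;
    'gradients' is (N, 3) with unit rows."""
    k = np.asarray(k, float)
    k /= np.linalg.norm(k)
    cos = np.clip(np.abs(gradients @ k), 0.0, 1.0)  # |k . g_i| (symmetry)
    d = np.arccos(cos)                              # geodesic distance
    w = np.exp(-d ** 2 / (2.0 * sigma ** 2))        # weights w_i
    return np.tensordot(w / w.sum(), dwi, axes=(0, 0))
```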
3.2 Resampling DW Images Using Angular Interpolation

The angular interpolation algorithm can be used to transform and resample DW images to have consistent diffusion gradient directions. Given an affine transform T, we can
represent it by a 3 × 3 transformation matrix, M, and a translation vector, t. Using this representation, a point r and a diffusion gradient vector g are transformed into (see Fig. 2)

r′ = M · r + t,  and  g′ = M · g / ‖M · g‖    (5)

The vector g′ is normalized to have unit length because we only care about its direction. Therefore, an affine transform T will transform a DW image I(r, g) into

I′(r, g) = I( M⁻¹(r − t), M⁻¹g / ‖M⁻¹g‖ )    (6)

using Eq. (4). Note that if we can register the T2-weighted images, we could use the resulting transformation to reorient the DW images using the above equation.

3.3 Registration of Diffusion Weighted Images

Although it is possible to use T2-weighted images to obtain a transform that aligns two DW image sets, much more information is contained in the diffusion weighted images. By using the complete information, we will get more accurate and robust results. We now describe a method to directly register DW images using the angular interpolation. For the purpose of discussion, we will use a mean squared error (MSE) metric and an affine transform in the following. Extension to other image metrics is straightforward. Extension to deformable registration can be achieved by approximating a deformable transform with locally affine transforms. Consider two sets of DW images, I_1(r) and I_2(r), r ∈ ℜ³, that consist of volumetric images in M and N diffusion gradient directions, respectively. The volumetric images are denoted as I_1(r, g_i^(1)) and I_2(r, g_j^(2)), i = 1, ..., M, j = 1, ..., N. Note that I_1(r) and I_2(r) can have different gradient directions; therefore, the proposed method is able to register DW images acquired with different protocols. The intensities of the DW images are properly normalized and bias corrected, so the mean squared error can be used as an image similarity measure. The objective of registration is to find the affine transformation T (composed of a transformation matrix M and a translation vector t) that minimizes the MSE between I_1(r) and the transformed I_2(r), which is given by [see Eq. (6)]

E(M, t) = Σ_{i=1..M} ∫_r [ I_1(r, g_i^(1)) − I_2( M⁻¹(r − t), M⁻¹g_i^(1) / ‖M⁻¹g_i^(1)‖ ) ]² dr    (7)
The transform, T, not only maps a point r in the fixed image domain to an anatomically corresponding point r′ in the moving image domain, but also interpolates the DW image intensity at r′ in the properly transformed diffusion gradient direction using measurements at that location from all directions. Effectively, the spatial transformation "moves" the patient so that the same anatomic feature is at the same location with
respect to the scanner, and the angular interpolation "rotates" g_i so that a consistent relationship between g_i and the tissue orientation is achieved. We use a gradient descent method to solve Eq. (7); other optimization algorithms can also be used. The registration algorithm is implemented using the Insight Segmentation and Registration Toolkit (ITK) [15], which provides rich sets of transformations, image metrics, optimizers, and interpolation algorithms that can be customized for different applications. In our implementation, we use an affine transform, a mean squared error metric, and a gradient descent optimizer. The transformed images are interpolated using a linear interpolation algorithm. The new angular interpolation algorithm is used to compute the MSE between DW images.
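Putting Eqs. (5)–(7) together, the cost evaluation for one candidate (M, t) can be sketched as below. The spatial and angular resampling of the moving image is abstracted into a caller-supplied function — 'resample_moving' is a hypothetical name, as are the others — since in practice that step is the ITK-based interpolation described above.

```python
import numpy as np

def reorient(gradients, M_inv):
    """Eqs. (5)-(6): pull the fixed-image gradient directions back to the
    moving image frame and renormalize to unit length."""
    g = gradients @ M_inv.T
    return g / np.linalg.norm(g, axis=1, keepdims=True)

def dwi_mse(I1, g1, resample_moving, M, t):
    """Eq. (7): sum of squared differences over all gradient directions.
    'I1' is (n_dirs, ...) with unit directions 'g1' (n_dirs, 3);
    'resample_moving(M, t, g)' must return the moving image sampled at
    M^{-1}(r - t) and angularly interpolated in direction g (Sec. 3.1)."""
    g1p = reorient(g1, np.linalg.inv(M))
    return sum(np.sum((I1[i] - resample_moving(M, t, g1p[i])) ** 2)
               for i in range(len(g1)))
```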
4 Experimental Results

Two sets of DW images were acquired for one subject with the same imaging protocol. Each set of DW images has 5 baseline T2-weighted images and 30 diffusion weighted images. The image resolution is 0.9375 × 0.9375 × 2.5 mm³. The two sets of images are denoted as I_1 and I_2 for session one and session two. I_1 is used as the fixed image; I_2 is registered to I_1, resulting in a transform T. Based on our experiments, we fixed σ to be the average of the geodesic distances between neighboring directions in all the experiments. Fig. 3 shows axial slices of a DW image with the same diffusion weighting gradient from I_1, the transformed I_2, and the transformed I_2 with angular interpolation. From the images, we can see that angular interpolation introduces blurring, but the image with angular interpolation does reveal the same tissue orientation at the same place. The regions highlighted by the boxes are zoomed in for better visualization in each image.
Fig. 3. An axial slice from I1 (a); same slice from transformed I2 (b); and transformed I2 with angular interpolation (c). The images have same diffusion weighting gradient w.r.t. the scanner.
Fig. 4(a) shows the fractional anisotropy map of an axial slice of I_1; Fig. 4(b) shows the same slice of the transformed FA map of I_2; and Fig. 4(c) shows the FA map computed from the registered I_2. The same structures can be seen in all the images. Because of the extra step of angular interpolation, the FA map shown in Fig. 4(c) has less contrast. In Fig. 5, tractography results are overlaid on a coronal slice of the FA maps. We used a streamline-based tractography algorithm similar to [16]. For all tractography, the same
Fig. 4. An axial slice of the FA map of I1 (a), the transformed FA map of I2 (b), and the FA map of the registered I2 (c)
Fig. 5. Fiber tracts of the fixed DW images (a), registration using only T2 weighted images (b), and registration using entire DW image set (c)
set of parameters is used; the seed points are located near the mid-sagittal plane and have the same coordinates. Fig. 5(a) shows fiber tracts obtained from the fixed image I_1; Fig. 5(b) shows fiber tracts obtained by reorienting principal directions using a method similar to [9]; and Fig. 5(c) shows fiber tracts obtained from the registered I_2 using the proposed method. The figures clearly show that the fiber tracts in (a) and (c) reveal the same white matter structures. In (b), the fiber tracts deviate from the underlying WM structures, although the FA map looks similar to the ones in (a) and (c). The deviation is primarily due to the inconsistency between the registration and the tensor reorientation.
5 Summary and Discussion

In this paper, we presented a novel method for registering diffusion weighted MR images. The method works directly on the DW images without using tensor reconstruction or referencing co-registered T1-weighted images. We developed an angular interpolation algorithm that allows us to estimate the DW image intensity at a given location in any direction from the acquired DW images. This interpolation algorithm effectively rotates the patient into a consistent orientation so that a direct comparison of DW images with the same diffusion gradient direction is meaningful. We showed preliminary results of registering DW images of the same subject acquired in two imaging sessions. Because of the smoothing effect of the angular interpolation, the DW images after interpolation are smoother and the FA of the registered images is smaller. The principal diffusion directions are well preserved, as indicated by the fiber tracking results. There are several advantages to the proposed method. Firstly, by using all the information in the original DW images, the registration is robust. The cost function [Eq. (7)]
depends on the DW images in all diffusion directions. It is minimized only when all images are consistently aligned. Secondly, the method does not depend on tensor reconstruction or reference to co-registered T1-weighted images; therefore, its performance does not rely on the accuracy and robustness of these steps. And thirdly, the method fits well into the unified registration framework without requiring an extra step to reorient tensors or principal diffusion directions. Validation is a difficult but important topic for the future. More studies will be done to compare the proposed method with existing ones. In this paper, we used an affine transform to register DW images of the same subject acquired in two sessions. Extending the algorithm to non-rigid registration of DW images of different subjects will be very interesting and have broader applications. Future work also includes applying the method to more datasets and studying the variability of the diffusion properties of the brain white matter.
Acknowledgment

The authors would like to thank Dr. Susumu Mori at the Department of Radiology, Johns Hopkins University School of Medicine, for providing the datasets used in this paper.
References

1. P. T. Callaghan, "Principles of Nuclear Magnetic Resonance Microscopy", Oxford: Oxford University Press, 1993.
2. P. Basser, J. Mattiello and D. LeBihan, "Estimation of the Effective Self-Diffusion Tensor from the NMR Spin Echo", Journal of Magnetic Resonance, B, 103:247-254, 1994.
3. C. Pierpaoli, P. Jezzard, P. J. Basser, A. Barnett and G. Di Chiro, "Diffusion Tensor MR Imaging of the Human Brain", Radiology, 201:637-648, 1996.
4. D. S. Tuch, T. G. Reese, M. R. Wiegell and V. J. Wedeen, "Diffusion MRI of Complex Neural Architecture", Neuron, 40:885-895, 2003.
5. P.S. Huppi, S.E. Maier, S. Peled, G.P. Zientara, P.D. Barnes, F.A. Jolesz and J.J. Volpe, "Microstructural development of human newborn cerebral white matter assessed in vivo by diffusion tensor magnetic resonance imaging", Pediatric Research, 44:584-590, 1998.
6. J. Foong, M. Maier, C. A. Clark, G. J. Barker, D. H. Miller and M. A. Ron, "Neuropathological abnormalities of the corpus callosum in schizophrenia: a diffusion tensor imaging study", J Neurol Neurosurg Psychiatry, 68:242-244, 2000.
7. C. A. Clark, T. R. Barrick, M. M. Murphy and B. A. Bell, "White matter fiber tracking in patients with space-occupying lesions of the brain: a new technique for neurosurgical planning?", Neuroimage, 20:1601-1608, 2003.
8. J. B. A. Maintz and M. A. Viergever, "A survey of medical image registration", Medical Image Analysis, 2:1-36, 1998.
9. D. C. Alexander, C. Pierpaoli, P. J. Basser and J. C. Gee, "Spatial Transformations of Diffusion Tensor Magnetic Resonance Images", IEEE Trans. Med. Imag., 20:1131-1139, 2001.
10. D. K. Jones, L. D. Griffin, D. C. Alexander, M. Catani, M. A. Horsfield, R. Howard and S. C. R. Williams, "Spatial Normalization and Averaging of Diffusion Tensor MRI Data Sets", NeuroImage, 17:592-617, 2002.
11. D. Xu, S. Mori, D. Shen, P.C.M. van Zijl and C. Davatzikos, "Spatial Normalization of Diffusion Tensor Fields", Magnetic Resonance in Medicine, 50:175-182, 2003.
602
X. Tao and J.V. Miller
12. H.-J. Park, M. Kubicki, M.E. Shenton, A. Guimond, R.W. McCarley, S.E. Maier, R. Kikinis, F.A. Jolesz and C.-F. Westinb, “Spatial normalization of diffusion tensor MRI using multiple channels”, Neuroimage, 20:1995-2009, 2003. 13. O. Ciccarelli, A.T. Toosy, G.J.M. Parker , C.A.M. Wheeler-Kingshott, G.J. Barker, D.H. Miller and A.J. Thompsona, “Diffusion tractography based group mapping of major whitematter pathways in the human brain”, Neuroimage, 19:1545-1555, 2003. 14. T. Yoo (editor), “Insight into Images: Principles and Practice for Segmentation, Registration, and Image Analysis”, A. K. Peters, Ltd., 2004. 15. L. Ib´an˜ez, W. Schroeder, L. Ng and J. Cates, “The ITK Software Guide: The Insight Segmentation and Registration Toolkit”, Kitware Inc., 2003. 16. D. Weinstein, G. Kindlmann and E. C. Lundberg, “Tensorlines: advection diffusion based propagation through diffusion tensor fields”, In Proceedings, IEEE Visualization, San Francisco, CA, 1999; p. 249–253.
A High-Order Solution for the Distribution of Target Registration Error in Rigid-Body Point-Based Registration

Mehdi Hedjazi Moghari¹ and Purang Abolmaesumi¹,²
¹ Department of Electrical and Computer Engineering, Queen's University, Canada
² School of Computing, Queen's University, Canada
[email protected]
Abstract. Rigid registration of pre-operative surgical plans to the intra-operative coordinates of a patient is an important step in computer-assisted orthopaedic surgery. A good measure of registration accuracy is the target registration error (TRE), the distance after registration between a pair of corresponding points not used in the registration process. However, TRE is not a deterministic value, since there is always error in the localized features (points) used in the registration. In this situation, the distribution of TRE carries more information than TRE by itself. Previously, the distribution of TRE has been estimated only to first-order accuracy. In this paper, we analytically approximate the TRE distribution to at least second-order accuracy, based on the Unscented Kalman Filter algorithm.
1 Introduction
The problem of finding the best translation vector and rotation matrix that accurately registers matching point pairs in two data sets corrupted by isotropic noise is also called the absolute orientation problem. This problem has been shown to have closed-form solutions [1]. However, the reported solutions end up with an imperfect registration when there is error in extracting the feature points employed for registering the two data sets (these extracted points might also be called fiducial points [2], and the error in extracting them is therefore named the fiducial localization error, FLE). Hence, some measure is required to estimate the level of registration accuracy. To analyze the accuracy of point-based registration algorithms, Maurer et al. proposed a measure called the target registration error (TRE) [3], which is the distance between corresponding point pairs not used in the registration process. The properties of TRE have been under investigation for a long time. Fitzpatrick et al., in 1998, analytically derived an approximation of the root mean squared value of TRE [2,4]. Later, in 2001, they mathematically derived an approximation of the TRE distribution [5]. However, that work only presents the first-order approximation of TRE, and it does not give an accurate result when the structures of the data sets are not equivalent (the moments of the data sets about each axis are substantially
different) [5]. In what follows, we derive an estimate of the distribution of TRE, along with its mean squared value, with the accuracy of at least a second-order Taylor series approximation. A closed-form solution for computing the distribution of TRE is obtained and verified by means of numerical simulations. It is also shown that the proposed algorithm outperforms the algorithm of [5], most notably when the data sets have different moments about each axis.
2 Method
Let us assume that the points in one space (the moving data set) are represented by U_{3×N}, a matrix whose columns are the position vectors of the points in that space. Y_{3×N} represents the corresponding points in the second space (the fixed data set). The ith columns of U and Y represent a pair of corresponding points in the two spaces. As in [5], we make the simplifying assumption that FLE in the moving space U is negligible (identically zero) in comparison to FLE in the fixed data set Y, by adding the variance of the point localization error in the moving data set, σ_U², to that in the fixed data set, σ_Y². The goal is to determine a rotation matrix R and a translation vector t from the following observation model:

y_{1:i} = R_{(θ_x,θ_y,θ_z)} u_{1:i} + t_{(t_x,t_y,t_z)} + n_{1:i},   (1)

where y_{1:i} = [y_1^T, ..., y_i^T]^T, u_{1:i} = [u_1^T, ..., u_i^T]^T, n_{1:i} = [n_1^T, ..., n_i^T]^T and i = 1, ..., N. Here t = [t_x, t_y, t_z]^T is the translation vector along the orthogonal coordinate frame's axes, and θ_x, θ_y and θ_z are the rotation angles about those axes, respectively. n_i is the fiducial localization error (FLE) for point i, which is assumed to be a zero-mean Gaussian random vector with covariance matrix Σ_{n_i} = (σ_U² + σ_Y²) I, where I is the identity matrix. In other words, n_i is the error made in collecting features (points) from a rigid object for the registration; it is assumed to be isotropic and to have the same distribution at each point. In this work, we have used Euler angles to represent the rotation matrix R. We account for this representation's singularities in our simulations by limiting the range over which each rotation angle can change. However, the same analysis presented here is valid for other rotation matrix representations, such as quaternions or rotation about a helical axis. Let us define the state vector x as x = [t_x, t_y, t_z, θ_x, θ_y, θ_z]^T = [x_t^T, x_θ^T]^T, where x_θ = [θ_x, θ_y, θ_z]^T and x_t = [t_x, t_y, t_z]^T. Since Equation (1) is a nonlinear function of x, we utilize the Unscented Kalman Filter (UKF) algorithm [6] to estimate the optimal state vector x in the minimum mean square error sense. The UKF is an algorithm for sequentially updating the state vector x (the pose parameters), which is governed by the nonlinear system of Equation (1). Sequentially estimating the pose parameters while pairs of corresponding points are being collected makes a real-time registration process more feasible. An alternative to this approach is the Extended Kalman Filter (EKF) algorithm for estimating the state parameters and their variances, as proposed
by Pennec et al. [7]. Although the EKF works for smoothly nonlinear functions, we propose the UKF algorithm since: 1) the UKF can be applied to non-differentiable functions, 2) the UKF avoids the derivation of Jacobian matrices, and 3) the UKF is accurate to higher order (at least second order) in the Taylor series expansion than the standard EKF algorithm, with the same amount of computational complexity. Although [8] reports that the UKF and EKF algorithms have equivalent performance for estimating pose parameters, our investigations [9] show that the UKF algorithm outperforms the EKF in two ways: first, the UKF requires fewer input points to converge to the correct solution; second, it estimates the variance of the pose parameters more accurately than the EKF.

2.1 The UKF-Based Registration Algorithm
Let us assume that the state model is defined as x_i = x_{i−1} + N(0, Σ_Q), with initial value x_0 and initial covariance matrix P_{x_0}. N(0, Σ_Q) is a zero-mean Gaussian random vector with covariance matrix Σ_Q, and the state vector x is assumed to have a Gaussian distribution. Under these assumptions, the state vector x can be estimated from the observation model as follows:

1) Predict the state vector and its covariance matrix from the state model as

x̂_i^− = x̂_{i−1},   P_{x̂_i^−} = P_{x̂_{i−1}} + Σ_Q.

2) Append the ith point from the moving data set U to the already collected points from that data set, and estimate the positions of their corresponding points in the fixed data set Y by using the predicted state vector from step one:

ŷ_{1:i} = R_{(θ̂_x,θ̂_y,θ̂_z)} u_{1:i} + t_{(t̂_x,t̂_y,t̂_z)}.

3) Compute the error in the estimated corresponding points in the data set Y, (y_{1:i} − ŷ_{1:i}), to update the state vector and its covariance matrix as follows:

x̂_i = x̂_i^− + K_i (y_{1:i} − ŷ_{1:i}),   P_{x̂_i} = P_{x̂_i^−} − K_i E[y_{1:i} y_{1:i}^T] K_i^T,   (2)

where K_i = E[x_i y_{1:i}^T] (E[y_{1:i} y_{1:i}^T])^{−1}. Since the observation model, Equation (1), is nonlinear, the Unscented Transform (UT) [10] is used to compute E[x_i y_{1:i}^T] and E[y_{1:i} y_{1:i}^T], respectively. This algorithm is iterated through all the points in the moving data set U to estimate the optimal state vector.
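For concreteness, the following is a minimal sketch of the unscented transform used in the update step above, written in NumPy. The function name and the sigma-point parameters (alpha, beta, kappa) are illustrative assumptions, not the authors' implementation; standard scaled sigma points are assumed.

```python
import numpy as np

def unscented_transform(mean, cov, f, alpha=1.0, beta=2.0, kappa=0.0):
    """Propagate a Gaussian N(mean, cov) through a nonlinearity f using
    scaled sigma points; returns the transformed mean, covariance, and
    the input-output cross-covariance needed to form the gain K_i."""
    n = mean.size
    lam = alpha ** 2 * (n + kappa) - n
    # Sigma points: the mean plus/minus the columns of a scaled matrix root
    S = np.linalg.cholesky((n + lam) * cov)
    sigma = np.vstack([mean, mean + S.T, mean - S.T])    # (2n+1, n)
    wm = np.full(2 * n + 1, 0.5 / (n + lam))             # mean weights
    wc = wm.copy()                                       # covariance weights
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1.0 - alpha ** 2 + beta)
    ys = np.array([f(s) for s in sigma])                 # propagate the points
    y_mean = wm @ ys
    dy = ys - y_mean
    dx = sigma - mean
    y_cov = (wc[:, None] * dy).T @ dy                    # ~ E[(y-ȳ)(y-ȳ)^T]
    xy_cov = (wc[:, None] * dx).T @ dy                   # ~ E[(x-x̄)(y-ȳ)^T]
    return y_mean, y_cov, xy_cov
```

The gain of Equation (2) would then be formed from the returned cross- and output-covariances, e.g. `K = xy_cov @ np.linalg.inv(y_cov + Sigma_n)` once the FLE covariance is added to the predicted observation covariance.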
2.2 Verification of the UKF-Based Registration Algorithm
We examine the UKF-based registration algorithm by registering two random data sets, each containing 30 points, over 6,000 trials. The moving data set U is drawn uniformly from a cube within the range of ±250 mm. The fixed data set Y is generated by applying a random transformation to the moving data set. The random transformation is generated by uniformly drawing the rotational and translational parameters along all axes within the ranges of ±80° and ±80 mm, respectively. Also, zero-mean Gaussian random noise is added to the fixed data set to simulate FLE, which is assumed to be isotropic. The distribution of FLE is taken to be N(0, 1 mm²) and to be the same at each point in the fixed data
set Y. The UKF-based registration algorithm is then employed to register these two data sets. It is assumed that if ||xt − x ˆt ||2 and ||xR − x ˆR ||2 are less than 2 2 2mm and 2degree , respectively, the algorithm is converged. In this simulation the UKF-based registration algorithm converges for %97 of the time. Figure 1 shows the error distribution of estimated translational and rotational parameters. Mean and variance of error of the estimated transformation parameters are listed in Table 1. This simulation is repeated with different distributions of FLE, as N (0, 5mm2 ) and N (0, 10mm2 ), respectively. In these experiments, the algorithm converges %92 and %77 of the time, respectively. By increasing the variance of FLE in the fixed data set, the number of registration points should be raised to attain the same accuracy and convergence rate.
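The simulation setup just described can be summarized in a short sketch; the sampling ranges follow the text, while the closed-form SVD estimator below is only a stand-in for the UKF algorithm of Sect. 2.1, and the convergence check is a simplified version of the 2 mm²/2 degree² test.

```python
import numpy as np

rng = np.random.default_rng(0)

def euler_to_matrix(ax, ay, az):
    """Rotation matrix from Euler angles (radians) about x, y, z."""
    cx, sx = np.cos(ax), np.sin(ax)
    cy, sy = np.cos(ay), np.sin(ay)
    cz, sz = np.cos(az), np.sin(az)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def register_svd(U, Y):
    """Closed-form least-squares rigid estimator (stand-in for the UKF)."""
    mu, my = U.mean(axis=0), Y.mean(axis=0)
    H = (U - mu).T @ (Y - my)
    Uh, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ Uh.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ Uh.T
    return R, my - R @ mu

def one_trial(n_points=30, fle_var=1.0):
    """One perturbation/registration trial as described above."""
    U = rng.uniform(-250.0, 250.0, size=(n_points, 3))   # moving set (mm)
    ang = np.deg2rad(rng.uniform(-80.0, 80.0, size=3))   # true rotation
    t = rng.uniform(-80.0, 80.0, size=3)                 # true translation
    R = euler_to_matrix(*ang)
    Y = U @ R.T + t + rng.normal(0.0, np.sqrt(fle_var), (n_points, 3))
    R_hat, t_hat = register_svd(U, Y)
    rot_err = np.degrees(np.arccos(np.clip((np.trace(R_hat @ R.T) - 1) / 2, -1, 1)))
    return np.sum((t_hat - t) ** 2) < 2.0 and rot_err ** 2 < 2.0

convergence_rate = np.mean([one_trial() for _ in range(1000)])
```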
[Fig. 1: histograms of the error distributions of the estimated parameters t_x, t_y, t_z (error in mm) and θ_x, θ_y, θ_z (error in degrees).]

Fig. 1. Distribution of the estimated transformation parameters using the UKF-based registration algorithm over 5824 trials

Table 1. Mean and variance of the error of the estimated translational (mm) and rotational (degree) parameters (Experiment I)

Estimation Error    Mean     Var.
t̂_x − t_x          −0.002   0.033
t̂_y − t_y           0.002   0.033
t̂_z − t_z           0.001   0.033
θ̂_x − θ_x          −0.004   0.031
θ̂_y − θ_y           0.007   0.003
θ̂_z − θ_z          −0.003   0.031
3 Derivation of the TRE Distribution
Let us assume that the registration has been performed between the fixed and the moving data sets and the transformation parameters have converged to the true solution. Let r be a point, other than the ones used in the registration algorithm, where we would like to compute the target registration error, TRE_r. The registration error at the target point r can be written as:

e(r) = R̂r + t̂ − Rr − t = (R̂ − R)r + (t̂ − t) = ΔR r + Δt,   (3)

where ΔR = R̂ − R and Δt = t̂ − t. Assuming that the registration algorithm has converged to the true transformation parameters (R̂ and t̂ are unbiased, and E[ΔR] and E[Δt] are zero), based on the defined state model, Δt is a zero-mean Gaussian random vector with covariance matrix Σ_Δt = E[Δt Δt^T], where Σ_Δt can be read off from Equation (2) as follows:

P_x̂ = E[(x − x̂)(x − x̂)^T] = [ Σ_Δt     Σ_Δθt^T
                               Σ_Δθt    Σ_Δθ   ]_{6×6} ,   (4)
Assuming that the registration algorithm has converged to the true solution, (ΔR r) in Equation (3) is a zero-mean random vector with an unknown distribution. Its mean (E[ΔR r] = 0) and covariance matrix (Σ_ΔRr = E[(ΔR r)(ΔR r)^T]) can also be estimated using the Unscented Transform [10], since (ΔR r) is a nonlinear function of the Gaussian random vector x_θ, which has mean x̂_θ and covariance matrix Σ_θ determined from Equations (2) and (4), respectively. Therefore, the distribution of the error vector e can be approximated by a zero-mean Gaussian distribution with covariance matrix E[ee^T], N(0, E[ee^T]), where E[ee^T] is determined as follows:

E[ee^T] = E[ΔR r (ΔR r)^T] + 2E[(ΔR r) Δt^T] + E[Δt Δt^T].   (5)

Here Σ_(ΔRrΔt) = E[(ΔR r) Δt^T] is the cross-correlation matrix between (ΔR r) and Δt, which can be computed by using the Unscented Transform and propagating the estimated state vector x̂ and its covariance matrix P_x̂ through the nonlinearity ([ΔR r] Δt^T). Equation (5) thus simplifies to E[ee^T] = Σ_ΔRr + 2Σ_(ΔRrΔt) + Σ_Δt. Writing the error vector e as [e_x, e_y, e_z]^T, we have TRE_r = sqrt(e_x² + e_y² + e_z²), where e_x, e_y and e_z are zero-mean Gaussian random variables. Since TRE_r is a distance error and distance is invariant to rotations, the error vector e can be arbitrarily rotated in space without changing the distribution of TRE_r. Hence, one may rotate e such that its covariance matrix E[ee^T] becomes diagonal, so that its components become independent zero-mean Gaussian random variables. Let us assume that Σ_e = diag[λ_1, λ_2, λ_3] is the diagonalized covariance matrix of E[ee^T], where λ_1, λ_2 and λ_3 are the variances of the first, second and third elements of the rotated error vector e, respectively. We first determine the distribution of TRE_r², from which the distribution of TRE_r can be easily derived. TRE_r² is the sum of three squared independent zero-mean Gaussian random variables, e_1, e_2 and e_3, with variances λ_1, λ_2 and λ_3, respectively. The distribution of TRE_r² is therefore the convolution of the distributions of the squared Gaussian random variables, pdf_{TRE_r²}(x) = pdf_{e_1²}(x) ∗ pdf_{e_2²}(x) ∗ pdf_{e_3²}(x), where pdf_{e_i²}(x) = (1/√(2πxλ_i)) exp(−x/(2λ_i)), x ≥ 0. By performing the convolutions, the distribution of TRE_r² is obtained as:

pdf_{TRE_r²}(x) = \frac{\exp(-\frac{x}{2\lambda_3})}{2\sqrt{2\pi\lambda_1\lambda_2\lambda_3}} \int_0^x \frac{1}{\sqrt{x-\lambda}} \exp\left(-\frac{\lambda}{4}\left[\frac{1}{\lambda_1}+\frac{1}{\lambda_2}-\frac{2}{\lambda_3}\right]\right) I_0\left(\frac{\lambda}{4}\left[\frac{1}{\lambda_1}-\frac{1}{\lambda_2}\right]\right) d\lambda,   (6)

where I_0(x) = \frac{1}{\pi}\int_0^\pi \exp(x\cos\phi)\,d\phi. From Equation (6), the distribution of TRE_r follows by the change of variable x → x²:

pdf_{TRE_r}(x) = \frac{x\exp(-\frac{x^2}{2\lambda_3})}{\sqrt{2\pi\lambda_1\lambda_2\lambda_3}} \int_0^{x^2} \frac{1}{\sqrt{x^2-\lambda}} \exp\left(-\frac{\lambda}{4}\left[\frac{1}{\lambda_1}+\frac{1}{\lambda_2}-\frac{2}{\lambda_3}\right]\right) I_0\left(\frac{\lambda}{4}\left[\frac{1}{\lambda_1}-\frac{1}{\lambda_2}\right]\right) d\lambda.   (7)
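As a sanity check, the density of Equation (7) can be evaluated numerically; the sketch below assumes SciPy and handles the 1/sqrt(x² − λ) endpoint singularity with quad's algebraic weight. For very elongated error covariances the Bessel term can grow large, so moderate λ values are assumed here.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import i0

def tre_pdf(x, lam1, lam2, lam3):
    """Evaluate the TRE density of Eq. (7) at a scalar x > 0, given the
    eigenvalues lam1, lam2, lam3 of the diagonalized error covariance."""
    a = 0.25 * (1.0 / lam1 + 1.0 / lam2 - 2.0 / lam3)
    b = 0.25 * (1.0 / lam1 - 1.0 / lam2)
    # Integrand without the 1/sqrt(x^2 - lam) factor; that factor is supplied
    # by quad's algebraic weight (x^2 - lam)^(-1/2) at the upper endpoint.
    g = lambda lam: np.exp(-a * lam) * i0(b * lam)
    integral, _ = quad(g, 0.0, x * x, weight='alg', wvar=(0.0, -0.5))
    pref = x * np.exp(-x * x / (2.0 * lam3)) / np.sqrt(2.0 * np.pi * lam1 * lam2 * lam3)
    return pref * integral

# Quick normalization check: the density should integrate to ~1
xs = np.linspace(1e-3, 10.0, 400)
vals = np.array([tre_pdf(x, 0.4, 0.6, 0.9) for x in xs])
print(np.trapz(vals, xs))
```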
4 Locus of the Points with the Same Mean Square TRE
In this section, we explain how to estimate the mean squared value of TRE_r, E[TRE_r²], using the variances estimated by the UKF-based registration algorithm. Using Equation (3), the mean square TRE_r can be formulated as:

E[TRE_r²] = E[e^T e] = E[(ΔR r + Δt)^T (ΔR r + Δt)]
          = r^T E[ΔR^T ΔR] r + 2E[Δt^T ΔR] r + E[Δt^T Δt],   (8)

where Σ_ΔR = E[ΔR^T ΔR] is the covariance matrix of the random matrix ΔR and is a symmetric positive definite matrix, and v = E[Δt^T ΔR]^T. Writing Δt = [Δt_x, Δt_y, Δt_z]^T, Equation (8) can be rewritten as:

E[TRE_r²] = r^T Σ_ΔR r + 2v^T r + E[Δt_x²] + E[Δt_y²] + E[Δt_z²],   (9)

where E[Δt_x²] = Σ_Δt^{11}, E[Δt_y²] = Σ_Δt^{22} and E[Δt_z²] = Σ_Δt^{33} can be obtained from Equation (4), with Σ_Δt^{ij} denoting the ijth element of the covariance matrix Σ_Δt. Also, since ΔR is a nonlinear function of the Gaussian random vector x_θ, with mean x̂_θ and covariance matrix Σ_θ, the covariance matrix Σ_ΔR can be estimated using the UT. The vector v is computed by propagating x̂ and P_x̂ through the nonlinearity (Δt^T ΔR) using the UT. Therefore, by estimating the variance of each translational component, the covariance matrix Σ_ΔR and v, E[TRE_r²] can be easily obtained from Equation (9). To find the loci of the points with the same mean squared value of TRE, let us represent the target point r as a vector [r_x, r_y, r_z]^T. Then, Equation (9) expands to:

E[TRE_r²] = Σ_ΔR^{11} r_x² + Σ_ΔR^{22} r_y² + Σ_ΔR^{33} r_z² + 2Σ_ΔR^{12} r_x r_y + 2Σ_ΔR^{13} r_x r_z + 2Σ_ΔR^{23} r_y r_z
          + 2v_1 r_x + 2v_2 r_y + 2v_3 r_z + E[Δt_x²] + E[Δt_y²] + E[Δt_z²],   (10)

where Σ_ΔR^{ij} is the ijth element of Σ_ΔR and v_i is the ith element of the vector v. The locus of the points that have the same mean square TRE can be derived from Equation (10). For example, the locus of the points with the same E[TRE²] on a plane parallel to the x and y axes is either an ellipse or a circle, depending on the covariance matrix Σ_ΔR.
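Equation (9) reduces to a few matrix products; a minimal sketch follows, noting that E[Δt_x²] + E[Δt_y²] + E[Δt_z²] is simply the trace of Σ_Δt.

```python
import numpy as np

def expected_tre_sq(r, Sigma_dR, v, Sigma_dt):
    """Mean squared TRE at target r via Eq. (9):
    E[TRE^2] = r^T Sigma_dR r + 2 v^T r + trace(Sigma_dt)."""
    r = np.asarray(r, dtype=float)
    return r @ Sigma_dR @ r + 2.0 * v @ r + np.trace(Sigma_dt)
```

Evaluating `expected_tre_sq` on a grid of target positions in a plane and contouring the result reproduces the elliptic or circular iso-E[TRE²] loci discussed above.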
5 Results
In order to verify our derivations, several simulations were performed. In the first simulation, we choose N, the number of points in the data sets, to be 30. We generate the moving data set U by drawing N points uniformly within a cube in the range of ±100 mm. Also, one target position is selected randomly from a cube in the range of ±200 mm. In order to generate the fixed data set Y, we independently perturb the x, y and z components of each point in U by zero-mean Gaussian random noise with a variance of 6 mm². In this way we reproduce the model used by Fitzpatrick [5]. The moving data set U is registered to
the fixed one, Y, and the TRE is measured at the target point. The distribution of TRE is generated by repeating the perturbation and registration steps 100,000 times; the generated distribution is then used as the gold standard. We employ the UKF-based registration algorithm to register the two data sets and to compute the variance of the estimated transformation parameters, and Equation (7) is then used to estimate the distribution of TRE at the target position. We have repeated this simulation with different target positions 500 times and have compared the estimated distribution of TRE with the gold standard (generated from simulations) by computing the distance between the two distributions as [11]:

d(f_1, f_2) = \int_{-\infty}^{\infty} f_1(\lambda)\log\frac{2f_1(\lambda)}{f_1(\lambda)+f_2(\lambda)}\,d\lambda + \int_{-\infty}^{\infty} f_2(\lambda)\log\frac{2f_2(\lambda)}{f_1(\lambda)+f_2(\lambda)}\,d\lambda,

where f_1 and f_2 are two different distributions. The means and variances of the distances between the computed distributions over 500 trials are listed in Table 2. Additionally, in each trial, E[TRE²] is estimated; the normalized mean and normalized variance of the error in the estimation of E[TRE²] are listed in Table 3. Tables 2 and 3 show that the distribution of TRE and the value of E[TRE²] estimated using the proposed and Fitzpatrick's methods are almost the same as the ones generated by numerical simulation. This was predictable, since for uniformly distributed data sets (which have the same moments along all axes) the first-order approximation is sufficient to estimate the distribution of TRE. The second simulation shows that when data sets have significantly different moments along each axis, the first-order approximation may not be sufficient to accurately estimate the TRE distribution. In this simulation, we generate data sets with different variances along the x, y and z directions: the x, y and z coordinates of the points in the moving data set are drawn randomly from uniform distributions in the ranges of ±10 mm, ±30 mm and ±60 mm, respectively. We repeat the same simulation procedure as in the first simulation. Figure 2 shows the estimated distributions of TRE and the one obtained by numerical simulation at a randomly selected target point r = [−6.35, −33.2, −167.13]^T. As before, the distance between the estimated distribution of TRE and the simulated one is computed and shown in Table 2; the normalized mean and normalized variance of the error in the estimation of E[TRE²] are listed in Table 3. It can be concluded from Tables 2 and 3 and Figure 2 that when data sets are not uniformly distributed along all axes, the first-order approximation is not accurate enough to estimate the distribution of TRE, whereas our proposed higher-order approximation closely follows the ground-truth simulation results in this case. In the final simulation, the locus of the target points in the same plane parallel to the x and y axes that have the same E[TRE²] is obtained. To do so, we randomly generated 30 points within a cube in the range of ±100 mm as the moving data set U. Also, a target position p_1 = [59.9569, −45.2688, −29.6124]^T was chosen randomly within a cube in the range of ±200 mm. Zero-mean Gaussian random noise with a variance of 6 mm² in each direction was added to each point in the moving data set to generate the fixed data set Y. The UKF-based registration algorithm is used to register these data sets.
Using Equation (10), the locus of the points that have the same E[TRE²] as p_1 on a plane parallel to the xy plane is drawn in Figure 3.
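The distance measure d(f_1, f_2) of [11] used in these comparisons is straightforward to compute on sampled densities; a sketch, assuming both pdfs are given on a common grid with spacing dx:

```python
import numpy as np

def distribution_distance(f1, f2, dx):
    """Symmetric divergence between two sampled densities f1, f2 on a
    common grid, following the measure of [11]; 0*log(0) terms are 0."""
    f1 = np.asarray(f1, dtype=float)
    f2 = np.asarray(f2, dtype=float)
    m = f1 + f2
    eps = np.finfo(float).tiny
    t1 = np.where(f1 > 0, f1 * np.log((2.0 * f1 + eps) / (m + eps)), 0.0)
    t2 = np.where(f2 > 0, f2 * np.log((2.0 * f2 + eps) / (m + eps)), 0.0)
    return (t1.sum() + t2.sum()) * dx
```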
Table 2. Mean and variance of distances between the distributions estimated by Fitzpatrick's and the UKF algorithms and the simulated one (gold standard)

Simulation   UKF                     Fitzpatrick's
Results      Mean     Var.           Mean     Var.
Sim.-I       0.0006   1 × 10⁻⁷       0.0008   3 × 10⁻⁷
Sim.-II      0.073    2 × 10⁻⁴       0.235    9 × 10⁻⁴

Table 3. Normalized mean and normalized variance of the error in the estimation of the mean square TRE using Fitzpatrick's and the UKF algorithms

Simulation   UKF                     Fitzpatrick's
Results      Mean     Var.           Mean     Var.
Sim.-I       1.9%     0.008%         1.5%     0.19%
Sim.-II      10.68%   0.023%         42.79%   0.19%
[Fig. 2: plot of the pdf of TRE versus Target Registration Error (mm), comparing the simulated distribution of TRE, the estimated distribution using UKF, and the estimated distribution using Fitzpatrick's algorithm.]

[Fig. 3: iso-E[TRE²] contours (0.20 mm², 0.22 mm², 0.28 mm², 0.31 mm²) in the x-y plane (axes in mm), with target points P1 and P2 marked.]

Fig. 2. Estimated TRE distributions using Fitzpatrick's and UKF-based registration algorithms. Fig. 3. Loci of the points with the same E[TRE²] values
6 Discussion
We have proposed a new method to estimate the distribution of the target registration error (TRE) at an arbitrary target position, with at least second-order accuracy in the Taylor series expansion. It is shown that when data sets have different moments along each axis, the first-order approximation may not be accurate enough to estimate the TRE distribution; the proposed algorithm, however, still accurately calculates the distribution of TRE. To derive the TRE distribution, the UKF-based registration algorithm was presented, which estimates the registration transformation parameters along with their variances. Using the estimated variances, new closed-form formulas for computing the distribution of TRE and E[TRE²] (Equations (7) and (9), respectively) were derived.
References
1. Lorusso, A., Eggert, D.W., Fisher, R.B.: A comparison of four algorithms for estimating 3-D rigid transformations. In: British Mach. Vis. Conf. (1995)
2. Fitzpatrick, J.M., West, J.B., Maurer, C.R.: Predicting error in rigid-body point-based registration. IEEE Trans. Med. Imag. 17(5) (1998) 694–702
3. Maurer, C.R., Fitzpatrick, J.M., Wang, M.Y., Galloway, R.L., Maciunas, R.J., Allen, G.S.: Registration of head volume images using implantable fiducial markers. Tech. Rep. CS-96-03, Dept. Comp. Sci., Vanderbilt Univ., Nashville, TN (1996)
4. Fitzpatrick, J.M., West, J.B., Maurer, C.R.: Derivation of expected registration error for point-based rigid-body registration. In: SPIE Medical Imaging. Volume 3338. (1998) 16–27
5. Fitzpatrick, J.M., West, J.B.: The distribution of target registration error in rigid-body point-based registration. IEEE Trans. Med. Imag. 20(9) (2001) 917–927
6. Julier, S.J., Uhlmann, J.K.: Unscented filtering and nonlinear estimation. Proc. of the IEEE (Invited paper). Volume 92. (2004) 401–422
7. Pennec, X., Thirion, J.P.: A framework for uncertainty and validation of 3D registration methods based on points and frames. Int. J. Comp. Vis. 25(3) (1997) 203–229
8. LaViola, Jr., J.J.: A comparison of unscented and extended Kalman filtering for estimating quaternion motion. In: Proc. American Cont. Conf. (2003) 2435–2440
9. Moghari, M.H., Abolmaesumi, P.: Comparing the unscented and extended Kalman filter algorithms in the rigid-body point-based registration. In: EMBC 2006. (2006)
10. Julier, S.J., Uhlmann, J.K.: The scaled unscented transformation. In: Proc. IEEE American Cont. Conf., Anchorage, AK, USA (2002) 4555–4559
11. Lin, J.: Divergence measures based on the Shannon entropy. IEEE Trans. Inf. Theory 37(1) (1991) 145–151
Fast Elastic Registration for Adaptive Radiotherapy

Urban Malsch¹, Christian Thieke²,³, and Rolf Bendl¹
¹ Department of Medical Physics
² Clinical Cooperation Unit Radiooncology, DKFZ Heidelberg
³ Department of Radiation Oncology, University of Heidelberg, Germany
[email protected]
Abstract. A new method for elastic mono-modal image registration for adaptive fractionated radiotherapy is presented. Elastic registration is a prerequisite for many medical applications in diagnosis, therapy planning, and therapy; for adaptive radiotherapy in particular, efficient and accurate registration is required. We therefore developed a fast block matching algorithm for robust image registration. Anatomical landmarks are automatically selected at tissue borders and relocated in the frequency domain. A smooth interpolation is calculated by modified thin-plate splines with local impact. The concept of the algorithm allows different handling of different image structures; thus, additional features were included, such as the handling of discontinuities (e. g. air cavities in the intestinal tract or rectum, observable in only one image), which cannot be registered in a conventional way. The planning CT as well as the delineated structures of the target volume and organs at risk are transformed according to the deviations observed in daily acquired verification CTs prior to each dose fraction. In this way, the time-consuming repeated delineation, a prerequisite for adaptive radiotherapy, is avoided. The total calculation time is below 5 minutes and the accuracy is better than voxel precision, which allows this tool to be used in the clinical workflow. We present results for prostate, head-and-neck, and paraspinal tumors, with verification by manually selected landmarks. We believe this registration technique is suitable not only for adaptive radiotherapy, but also for other applications which require fast registration and the ability to process special structures (e. g. discontinuities) in a different way.
1 Introduction
Recent developments in radiotherapy allow high precision in shaping dose distributions over three-dimensional target volumes (e. g. the tumor), while high gradients reduce the applied dose in other regions (e. g. organs at risk: OAR). Currently, correct positioning of the patient is assured by fixation techniques (head masks, vacuum pillows, . . . ). With recently available in-room 3D imaging techniques, e. g. cone-beam CT (a kilovoltage source attached to the linac gantry) or a CT scanner on rails (e. g. Siemens Primatom), global setup errors can be determined by rigid registration between planning and verification CT images, which are acquired
directly before therapy [1]. Rigid registration cannot compensate for more complex deviations, e. g. organ movements, different fillings of hollow organs (bladder, rectum, lung, . . . ), tissue-dependent reaction to the irradiation, or other motion artifacts, e. g. those caused by respiration during therapy. So far, security margins around the tumor and the OARs are used to account for these uncertainties. At the same time, however, healthy tissue next to the tumor is also irradiated at a therapeutic dose level and possibly damaged. A novel strategy of dynamically adapting an initial plan to daily changes in size, shape, and position of the target volume and critical structures allows a reduction of the security margins, and thus a reduction of the normal tissue complication probability, while preserving or increasing the tumor control probability [2]. This adaptation has to be done within less than 15 minutes, in order to keep up with physiological changes (e. g. different filling of the bladder) and the clinical workflow. So far, no automatic segmentation tool can produce a complete delineation within this time. Therefore, we propose a different strategy: elastic registration of the spatial deviations between the planning and verification CT images. We apply the measured distortions to the already segmented patient model to produce adapted delineations. Several registration techniques have been developed in the past decades [3], such as optical flow [4], the demons algorithm [5], viscous fluid models [6,7], free-form deformations [8], and block matching [9]. Most of them are rather expensive in terms of computation time, and for the adaptation of therapy plans calculation time is an especially important constraint. We use a block matching algorithm similar to the one proposed by Rösch et al. [9], with some modifications which make the algorithm fast and allow the integration of additional features, such as the handling of discontinuities in the deformation field, e. g. due to moving air cavities in the rectum.
2 Method
Our elastic registration algorithm is divided into three main steps:

1. Automatic identification of a few landmarks at positions p_i = (p_{i,x}, p_{i,y}, p_{i,z}), i = 1...N, in the first image.
2. Calculation of appropriate translation vectors t_i to corresponding positions in the other image by optimizing a similarity measure over surrounding subregions Ω_{p_i}, called the templates of p_i.
3. Interpolation of a translation vector for any position q, which can be a polygon vertex of a delineated structure or any other image position, in order to transform it.

These general registration steps are now discussed in detail.

2.1 First Step, Selection of Promising Positions
Promising positions pi are spots in an image A, which can presumably be recognized in the corresponding image B. Positions at tissue borders turned out to be especially suitable. In CT images, tissue classes (e. g. fat, muscle, bones) can be identified quickly by fixed thresholds. Border positions of these classes are
identified as voxels with fewer than 6 neighbors of the same class. Promising template positions p_i are evenly distributed along the borders, subject to a minimal distance between each other to avoid clustering (see Fig. 1).

Fig. 1. (left): transversal slice of a prostate case. (right): raw tissue separation by fixed thresholds, with selected template candidates.

2.2 Second Step, Relocation
Each promising position p_i of image A, representing a characteristic structure, must be relocated in image B to identify the deviation t_i between the images at these positions. By defining template areas Ω_i around each position p_i, these sub-images can be re-identified in the corresponding image by a local rigid registration R_i, yielding a corresponding position q_i = p_i R_i. Several similarity measures have been proposed in the literature to identify the best correspondences [10]. We use the maximization of the cross correlation coefficient Θ_C, calculated in the frequency domain, which computes the transformation parameters within a bounded local region. A subsequently normalized Θ_C is used as an absolute measure of conformity, which allows the identification of doubtful correspondences. The cross correlation Θ_C between a small domain Ω_{A(p)} at position p of image A and a rigidly displaced domain Ω_{B(pR)} of image B is defined by

Θ_C(Ω_{A(p)}, Ω_{B(pR)}) = Σ_{x∈Ω_{(A,B)}} (Ω_{A(p)}(x) − Ω̄_{A(p)})(Ω_{B(pR)}(x) − Ω̄_{B(pR)}),   (1)

and the normalized correlation Θ_NC is defined by

Θ_NC(Ω_{A(p)}, Ω_{B(pR)}) = Θ_C(Ω_{A(p)}, Ω_{B(pR)}) / sqrt(Θ_C(Ω_{A(p)}, Ω_{A(p)}) Θ_C(Ω_{B(pR)}, Ω_{B(pR)})).   (2)

Eq. (1) can be calculated efficiently in the frequency domain by fast Fourier transformation (FFT) of the areas Ω_{A(p)} and Ω_{B(pR)}:

Θ_C(Ω_A, Ω_B) = FFT⁻¹(FFT(Ω_A) (FFT(Ω_B))*),   (3)

with X* the complex conjugate of X. The displacement of a feature is identified at the maximum of Θ_C (see Fig. 2). In our application it is sufficient to consider translations only, since a rotation of larger structures can be described by different displacements of their surface points.
Fig. 2. (a) A feature in image A at position p_i (here located at the border of a vertebral body), (b) the search area at the same position in the corresponding image B, and (c) the surrounding correlation values (calculated in the frequency domain). The bright spot indicates the best translation. Size of the feature: 11 × 11 × 7 voxels (padded with zero values); size of the search region: 27 × 27 × 17. The maximal detectable displacements are t_{x,y,z} = ±(8, 8, 5).
2.3 Third Step, Interpolation
In this step, the elastic transformation of the delineated planning structures (VOIs) is calculated. We define VOIs by closed polygons located in transversal CT slices; thus, vectors for all polygon vertices must be calculated to transform them to the verification image. If the whole planning CT is to be transformed as well (e. g. to verify the registration), translation vectors for all voxels must also be interpolated. Regularly arranged translation vectors can be interpolated quickly, e. g. by B-splines or trilinear interpolation. But since the templates are arbitrarily arranged, we use modified thin-plate splines (TPS) [11] to interpolate a translation vector field T(A, B). Each translation t = (t_x, t_y, t_z) ∈ ℝ³, located at any position q = (q_x, q_y, q_z) ∈ ℝ³, is defined by:

t(q) = a_0 + a_x q_x + a_y q_y + a_z q_z + Σ_{i=1}^{n} w_{i,xyz} · u(q − p_i),   (4)

with the radial basis function u(t) = |t| = sqrt(t_x² + t_y² + t_z²), which corresponds to the minimization of the bending energy of a thin metal plate in the 3D case. The TPS interpolation is global in nature, and one cannot avoid the global effect of each template. Among other things, this means that the transformation at a position q is affected by every anchor point p_i. This is clear, because the number of coefficients, calculated by matrix inversion and multiplications with the anchor positions, equals the number of anchors. Since the calculation time for inverting the matrices to obtain the affine and elastic coefficients (a_{0,x,y,z} and w_i) increases quadratically with the number of templates, we use a series of locally defined TPS to speed up the calculation and to reduce the influence of far-off image positions. As a result, one translation vector does not affect all image positions, but only its nearby image fraction. These image fractions are determined in a recursive process: we start with the whole image as the first current fraction.
Fig. 3. Block TPS. The initial image is divided in a recursive process, as long as a fraction contains more than 10% of the translation vectors (black dots). An overlap of 10% of the fractions guarantees a smooth transition.
If the current fraction contains more than a predefined number of template positions p_i, this fraction is divided into two halves (first along the x-, then the y-, and last along the z-direction), and the strategy is applied recursively to both fractions. Each fraction is handled by an individual TPS interpolation. To guarantee a smooth transition between all fractions, an overlap of 10% is defined (Fig. 3). Inside the overlapping regions, vectors are calculated separately by all related fractions, and the linear interpolation of all involved fractions is used as the final value. The number of calculated translation vectors is slightly increased due to the overlapping regions, but the overall calculation time is reduced markedly, since the number of TPS coefficients for each calculation is reduced. A sketch of the underlying TPS fit of Eq. (4) is given below.
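Fitting and evaluating one component of the TPS of Eq. (4) amounts to solving a small linear system, one system per translation component t_x, t_y, t_z; a sketch, assuming NumPy:

```python
import numpy as np

def fit_tps(p, t):
    """Fit one component of the 3-D TPS of Eq. (4): p is (n, 3) anchor
    positions, t is (n,) translation components at the anchors.
    Returns elastic weights w and affine coefficients a."""
    n = p.shape[0]
    U = np.linalg.norm(p[:, None, :] - p[None, :, :], axis=-1)  # u(|p_i - p_j|)
    P = np.hstack([np.ones((n, 1)), p])                         # affine part
    # Standard TPS system with the side conditions P^T w = 0
    A = np.zeros((n + 4, n + 4))
    A[:n, :n] = U
    A[:n, n:] = P
    A[n:, :n] = P.T
    rhs = np.concatenate([t, np.zeros(4)])
    sol = np.linalg.solve(A, rhs)
    return sol[:n], sol[n:]                                     # w, a

def eval_tps(q, p, w, a):
    """Evaluate Eq. (4) at query points q of shape (m, 3)."""
    U = np.linalg.norm(q[:, None, :] - p[None, :, :], axis=-1)
    return a[0] + q @ a[1:] + U @ w
```

In the block TPS of Fig. 3, `fit_tps` would be called once per fraction on the templates inside that fraction, which keeps each linear system small.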
2.4 Discontinuities
Discontinuities are the result of features observable in only one image, e. g. gas-filled cavities inside the rectum or colon. No adequate maximum of Θ_C can be found in these regions. However, especially for use in adaptive radiotherapy for prostate cancer, air cavities strongly deform nearby image structures such as the prostate and bladder. The method presented so far has its limitations under these conditions, but separate processing of these regions allows us to overcome this drawback. After Gaussian filtering, air cavities are easily detected by an appropriate threshold. Corresponding pairs of landmarks at the border and center of the cavities then allow a constriction by TPS interpolation. Border positions p_i are detected in the same manner as described in Sect. 2.1. To determine the corresponding positions q_i at the exact center in 3D, a 3D distance map D is calculated, which indicates for each voxel inside the cavity its distance to the rectum wall. The maximal value of the distance map along the path perpendicular to the wall therefore indicates the corresponding center position:

q_i = p_i + r_0 ∇A(p_i),  with  r_0 = argmax_r D(p_i + r ∇A(p_i)),   (5)

where ∇A(p_i) is the gradient at the border position p_i in image A. To ensure that the constriction has only local influence, the TPS-interpolated vectors are multiplied by a weighting factor that is reciprocal to their distance from the air cavity. In addition, to avoid misleading distortions of bony structures in the direct neighborhood of air cavities, additional fixation points are inserted to prevent the deformation of these structures.

Fig. 4. All images show the same frontal slice of an air cavity in the rectum. White contour: original delineated rectum; gray contour: automatically segmented air cavity. (a) distance map indicating distances from the border to the center of the air cavity; (b) distances outside of the air cavity; (c) TPS interpolation, locally limited to a near neighborhood by weighting the resulting vectors with the distance map of (b); (d) constricted air cavity and transformed delineations.
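The border-to-center landmark pairing of Eq. (5) can be sketched with standard distance-transform tools. In the sketch below the gradient of the smoothed binary cavity mask stands in for ∇A, which is an assumption about the implementation; with this choice the gradient at the wall points into the cavity.

```python
import numpy as np
from scipy import ndimage

def cavity_center_landmarks(mask, spacing=(1.0, 1.0, 1.0), n_steps=20):
    """Pair border voxels of a binary air-cavity mask with interior points
    of maximal wall distance, following Eq. (5)."""
    mask = mask.astype(bool)
    D = ndimage.distance_transform_edt(mask, sampling=spacing)  # distance to wall
    smooth = ndimage.gaussian_filter(mask.astype(float), 1.0)
    grad = np.stack(np.gradient(smooth))                        # stand-in for grad A
    # Border voxels: inside the mask but with an outside neighbor
    border = np.argwhere(mask & ~ndimage.binary_erosion(mask))
    pairs = []
    for p in border:
        g = grad[(slice(None),) + tuple(p)]
        norm = np.linalg.norm(g)
        if norm < 1e-6:
            continue
        # March along the inward direction and keep the point of maximal D
        steps = p + np.outer(np.arange(n_steps), g / norm)
        idx = np.clip(np.round(steps).astype(int), 0, np.array(mask.shape) - 1)
        d_vals = D[idx[:, 0], idx[:, 1], idx[:, 2]]
        pairs.append((p, idx[np.argmax(d_vals)]))               # (p_i, q_i)
    return pairs
```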
3 Results

3.1 Quantitative Measurements
Quantitative verification of elastic registration results is as complex as their calculation. Since every image position is calculated individually, every point of the image carries an individual error. If it were possible to measure the deviation between the calculated vectors and the real movements exactly, one would certainly use this deviation. For patient data no objective gold standard exists; even if fiducials like gold seeds had been implanted, they would show the real motion only at a limited number of positions. However, it is usually possible to identify a series of anatomical landmarks in both data sets quite well and to describe deviations by the distances between corresponding positions. We manually defined such landmarks in a series of 5 test cases.

Table 1. Maximal absolute deviations (tx, ty, tz) between both original images, and maximal absolute error in the automatically calculated deviations (dx, dy, dz) (in mm)

                  LMm                        LMa − LMm
#  Tumor     max|tx| max|ty| max|tz|    max|dx| max|dy| max|dz|    ni
1  Prostate    4.4     5.4     8.5        1.9     2.2     3.1      67
2  Parasp.     3.4     5.4     9.0        2.0     2.0     2.9      59
3  Parasp.    14.2     5.4     8.3        3.2     2.0     2.9      63
4  H&N         6.1     9.3     6.4        2.1     2.1     3.2      78
5  H&N         4.4    19.2     6.3        2.0     2.9     3.0      79
Fig. 5. These bar charts show the mean errors in the x-, y- and z-directions for the five selected patients, together with the maximal errors (whisker caps). The errors are the differences between manually selected landmarks and the vectors of the automatically calculated translation vector field at the same positions.
These cases were carefully selected to assure that sufficiently large displacements are present. The manually selected landmarks allow a semi-quantitative quality control of our strategy. This procedure was applied to one prostate, two paraspinal and two head-and-neck cases to show the robustness of the algorithm. The results of the five sets with n_1...n_5 pairs of landmarks are presented in Table 1. The first column group (LM_m) describes the deviations between the two original images by maximal distances. We observed deviations of about 1 cm, at maximum 14.2 mm and 19.2 mm (patients 3 and 5). The maximal absolute differences (d_x, d_y, d_z) between the automatically calculated deviations LM_a and the manually selected landmarks LM_m are given in the second column group, describing the maximal error of each registration result. The highest errors in the x- and y-directions (3.2 mm and 2.9 mm) were observed in the registration of patients 3 and 5, which showed the strongest deviations overall. Deviations in the z-direction were about 3.0 mm. The results of the automatic registration are acceptable, since the mean error is below the voxel size (2 × 2 × 3 mm³). However, some outliers were observed (up to 3.2 mm), visualized by their maximum values with whisker caps in the bar charts of Figure 5. Most of these outliers result from landmarks located at positions of low clinical importance, e. g. at the patient's surface.

The presented approach calculates an elastic transformation between different CT datasets. The algorithm was designed to reuse pre-segmented structures by adapting them to each therapy situation. Overall, the main objective was a low calculation time, since the algorithm must be used within a small time frame (below 10 mins). This approach allows image structures which cannot be processed by conventional registration to be handled in a different manner. We included a procedure to constrict air cavities, which represent discontinuities between two images and cannot be handled in the conventional way. With the current
implementation, a registration of a normal CT (256 × 256 × 50) is done in less than 5 minutes, with an accuracy below the voxel size in clinically relevant structures (measured by comparison with manually selected landmarks). This allows us to use our tool for selected cases in clinical routine. Even breathing motion can be detected in 4D CT images to individualize the initial security margins. However, the most time-consuming part is still the search for correspondences (half of the calculation time). As a next step, we want to parallelize this part, which should speed up the total calculation time.
References
1. Thieke, C., Malsch, U., Bendl, R., Thilmann, C.: Kilovoltage CT using a linac-CT scanner combination (accepted). Br. J. Radiol. (2006)
2. Yan, D., Vicini, F., Wong, J., Martinez, A.: Adaptive radiation therapy. Phys. Med. Biol. 42 (1997) 123–132
3. Zitová, B., Flusser, J.: Image registration methods: a survey. Image and Vision Computing 21 (2003) 977–1000
4. Schunck, B.G., Horn, B.K.P.: Determining optical flow. Artificial Intelligence 17 (1981) 185–204
5. Thirion, J.P.: Image matching as a diffusion process: an analogy with Maxwell's demons. Med. Image Anal. 2 (1998) 243–260
6. Christensen, G.E., Joshi, S.C., Miller, M.I.: Volumetric transformation of brain anatomy. IEEE Trans. Med. Imaging 16 (1997) 864–877
7. Foskey, M., Davis, B., Goyal, L., Chang, S., Chaney, E., Strehl, N., Tomei, S., Rosenman, J., Joshi, S.: Large deformation three-dimensional image registration in image-guided radiation therapy. Phys. Med. Biol. 50 (2005) 5869–5892
8. Rueckert, D., Sonoda, L.I., Hayes, C., Hill, D.L., Leach, M.O., Hawkes, D.J.: Nonrigid registration using free-form deformations: application to breast MR images. IEEE Trans. Med. Imaging 18 (1999) 712–721
9. Rösch, P., Mohs, T., Netsch, T., Quist, M., Penney, G.P., Hawkes, D.J., Weese, J.: Template selection and rejection for robust non-rigid 3D registration in the presence of large deformations. Proceedings of the SPIE (2001) 545–556
10. Hill, D.L., Batchelor, P.G., Holden, M., Hawkes, D.J.: Medical image registration. Phys. Med. Biol. 46 (2001) R1–45
11. Bookstein, F.: Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Trans. Pattern Anal. Mach. Intell. 11 (1989) 567–585
Registering Histological and MR Images of Prostate for Image-Based Cancer Detection

Yiqiang Zhan¹,³, Michael Feldman², John Tomaszeweski², Christos Davatzikos¹, and Dinggang Shen¹
¹ Sect. of Biomedical Image Analysis, University of Pennsylvania, Philadelphia, PA
² Dept. of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA
³ Dept. of Computer Science, Johns Hopkins University, Baltimore, MD
[email protected]

Abstract. This paper presents a deformable registration method to co-register histological images with MR images of the same prostate. In consideration of the various distortion and cutting artifacts in histological images, as well as the fundamentally different natures of histological and MR images, our registration method is guided by two types of landmark points that can be reliably detected in both histological and MR images: prostate boundary points, and internal salient points identified by a scale-space analysis method. The similarity between these automatically detected landmarks in histological and MR images is defined by geometric features and normalized mutual information, respectively. By optimizing a function which integrates the similarities between landmarks with spatial constraints, the correspondences between the landmarks as well as the deformable transformation between the histological and MR images can be obtained simultaneously. The performance of our proposed registration algorithm has been evaluated in various designed experiments. This work is part of a larger effort to develop statistical atlases of prostate cancer using both imaging and histological information, and to use these atlases for optimal biopsy and therapy planning.
1 Introduction

With an aging population, prostate cancer (PCa) has become a major medical and socioeconomic problem. Since PCa is relatively less aggressive and is curable if detected at an early stage, early diagnosis of PCa is significant for the treatment of this disease. Currently, the common diagnostic method is transrectal ultrasound (TRUS) guided needle biopsy, following an abnormal digital rectal exam (DRE) or an increasing prostate-specific antigen (PSA) level. However, the most widely used biopsy strategy, which randomly picks a tissue sample from each of several roughly partitioned prostate regions, amounts to blind targeting and results in a high false-negative detection rate. Considering that PCa is not uniformly distributed within the prostate capsule, we have developed an optimal needle biopsy strategy for prostate cancer detection, using population-based prostate cancer distribution information obtained from a large cohort of structure-labeled and cancer-identified histological samples of prostates normalized in a standard space [1]. We also developed a texture-based deformable model to segment the prostate from TRUS images [2], for warping the optimal needle biopsy locations defined in the standard space to the patient's space. In
order to achieve a patient-specific optimal needle biopsy, we need to include patient-specific cancer information, which can be obtained from the MR images of the patient by an image-based tissue characterization method [3,4] trained on a sufficient number of prostate MR images with identified cancerous regions. To identify the cancerous tissues in MR images using a machine learning method, the ground-truth cancer information, which is available only in the histological images, must be mapped to the space of the MR images. Currently, the ground-truth cancer information in histological images is mapped into MR images manually by a human expert, i.e., by labeling the cancerous regions in the MR images according to the cancer locations in the corresponding histological images [4]. However, this process is generally very tedious and time-consuming, particularly for the large number of prostate samples required to train a tissue characterization method. Therefore, in this study, we propose an automatic method to co-register a histological image with an MR image of the same prostate, by mimicking the way human experts manually align histological and MR images. It is worth noting that, although multi-modality image registration methods have been extensively investigated, studies dealing with the registration of histological images with MR images are very limited, probably due to the more complicated and diverse nature of histological images. The recently proposed histological image registration methods can be categorized into two classes. The first class of methods focuses on registering the extracted contours of anatomical structures [5,6]. Although organ contours can be perfectly aligned by these methods, it is not guaranteed that internal structures are also accurately registered. The second class of methods focuses on registering images by maximizing the overall similarity of two images, for example using mutual information [7,8]. However, these methods might be misled by the various distortions and cutting artifacts in the histological images, since patches with low structural content often lead to morphologically inconsistent local registrations with a low MI response [15]. Considering these limitations, our registration method is based on common features that can be reliably identified in both histological and MR images, i.e., two types of automatically identified landmarks located on prostate boundaries and in salient internal anatomical regions, which we simply call boundary landmarks and internal landmarks, respectively. In particular, by detecting internal landmarks commonly available in both histological and MR images, the registration of anatomical structures inside the prostate can be successfully completed, and mis-registration due to distortion and cutting artifacts can potentially be avoided, since the local patches around these landmarks carry salient structural information for correspondence matching. The correspondences between landmarks as well as the transformation between histological and MR images can be obtained simultaneously by maximizing an overall similarity function which integrates the similarities between estimated corresponding landmarks, along with spatial constraints.
2 Methods

The identification of meaningful landmarks commonly available in both histological and MR images is critical for the success of our registration method. Equally important is the definition of the similarity between two same-type landmarks in the histological and MR images, respectively. These methods are explored in Sections 2.1 and
2.2 for boundary landmarks and internal landmarks, respectively. With the respective landmarks automatically identified in the histological and MR images, their correspondences as well as the transformation between the histological and MR images are then established by maximizing an overall similarity function, described in Section 2.3.

2.1 Boundary Landmarks

Landmarks detection. Since organ boundaries are usually important for guiding the registration of medical images, points located on the prostate boundaries are selected as the first type of landmarks used to help register histological and MR images. In our study, the prostate capsules are first segmented from the histological and MR images, respectively, by manual delineation or an automated segmentation algorithm [2]. Then, a triangular mesh surface is generated for each prostate boundary using a marching cubes algorithm, with the vertices of the surface selected as the boundary landmarks.

Similarity definition. As each boundary landmark is actually a vertex of the surface, its spatial relations with the vertices in its neighborhood can be used to describe the geometric features of the boundary landmark. In particular, an affine-invariant attribute vector [10] is used to characterize the geometric anatomy around each boundary landmark. Assuming x_i is a boundary landmark under study, its geometric attribute is defined as the volume of the tetrahedron formed by x_i and its neighboring vertices. The volumes calculated from vertices in different neighborhood layers are stacked into an attribute vector F(x_i), which characterizes the geometric features of x_i in a local-to-global fashion. F(x_i) can further be made affine-invariant, yielding F̄(x_i), by normalizing it across the whole surface [10]. Using this attribute vector, the similarity between two boundary landmarks x_i and y_j, respectively in the histological and MR images, can be defined through the Euclidean distance between their normalized attribute vectors [10], i.e.,

S(x_i, y_j) = 1 − ‖F̄(x_i) − F̄(y_j)‖.   (1)
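Given precomputed normalized attribute vectors, Eq. (1) is essentially a one-liner; a sketch (the attribute vectors themselves are assumed to have been computed from the mesh as described above):

```python
import numpy as np

def boundary_similarity(F_x, F_y):
    """Similarity of Eq. (1) between two boundary landmarks, given their
    normalized, affine-invariant attribute vectors F̄(x_i) and F̄(y_j)."""
    return 1.0 - float(np.linalg.norm(np.asarray(F_x) - np.asarray(F_y)))
```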
2.2 Internal Landmarks

Compared to the boundary landmarks, it is relatively difficult to define landmarks within the prostate capsule. Inspired by the fact that radiologists usually register histological and MR images by matching internal blob-like structures, i.e., gland tissues containing fluid, we use those blob-like structures as the second type of landmarks, i.e., internal landmarks, to guide the image registration. However, due to the enlargement or shrinkage of gland tissues during the cutting procedure, the size of the same blob can differ between the histological and MR images. Since local structures exist as meaningful entities only over a certain range of scales [11], the various blobs can only be detected as salient local structures at their appropriate scales. Also, as described next, the similarity between two internal landmarks in the histological and MR images, respectively, is measured by the mutual information between the local regions around the two landmarks; the detected blob size is therefore important information for determining the size of the local region to be extracted around each internal landmark when measuring the local image similarity.
Landmarks detection. Following the scale-space analysis method [11], the location and size of a blob are detected in three steps. First, a scale-space representation is constructed for a 3-dimensional image f(x, y, z), i.e.,

L(x, y, z; s) = g(x, y, z; s) ∗ f(x, y, z),   (2)

where g(x, y, z; s) is a Gaussian kernel, i.e., g(x, y, z; s) = \frac{1}{(2\pi s^2)^{3/2}} e^{-(x^2+y^2+z^2)/2s^2}. Second, the Laplacian of this scale-space function L(x, y, z; s) is calculated. For scale-invariant comparison of Laplacian values, we use a normalized Laplacian [11], defined as ∇²_norm L = s² ∇² L = s² (L_xx + L_yy + L_zz). Finally, the local peak responses over scale space can be detected scale-invariantly as candidates for interesting local structures, with the corresponding scales indicating the sizes of those local structures. Since an ideal blob can be modeled as a Gaussian function, its actual location and size can be estimated directly from the scale-space analysis results. For example, if we detect a peak at a location (x₀, y₀, z₀) with scale s₀, it indicates that there might be a blob centered at (x₀, y₀, z₀) with size √(3/2)·s₀, which can be derived mathematically as follows. Assume an ideal blob located at (x₀, y₀, z₀) with size √(3/2)·s₀, i.e.,

f(x, y, z) = \frac{A}{(\frac{3}{2}s_0^2)^{3/2}} e^{-\frac{(x-x_0)^2+(y-y_0)^2+(z-z_0)^2}{2(\frac{3}{2}s_0^2)}}.   (3)

The scale-space representation of this blob f can be obtained as

L(x, y, z; s) = \frac{A}{(\frac{3}{2}s_0^2 + s^2)^{3/2}} e^{-\frac{(x-x_0)^2+(y-y_0)^2+(z-z_0)^2}{2(\frac{3}{2}s_0^2 + s^2)}}.   (4)

At any scale s, the normalized Laplacian of L(x, y, z; s) reaches its extremum at the location (x₀, y₀, z₀), i.e.,

Q(s) = \max_{(x,y,z)} \left( \nabla^2_{norm} L(x, y, z; s) \right) = \frac{3As^2}{(\frac{3}{2}s_0^2 + s^2)^{5/2}}.   (5)

The maximum of Q(s) across scales is found by differentiating Q(s) with respect to s:

\frac{dQ}{ds} = \frac{9As(s_0^2 - s^2)}{(\frac{3}{2}s_0^2 + s^2)^{7/2}}.   (6)

Since Equation (6) equals zero at s = s₀, the normalized Laplacian achieves its maximum at (x₀, y₀, z₀; s₀) in scale-space, which indicates a detected blob with center (x₀, y₀, z₀) and size √(3/2)·s₀. In summary, by finding the local maxima in a normalized Laplacian map, the centers of blob-like structures are detected and used as candidates for internal landmarks. The expected internal landmarks are then determined as follows. First, the local maxima in the normalized Laplacian map are thresholded, to ensure the detection of salient blob-like structures. Second, the average intensity within each detected blob-like structure is computed, to select the bright blobs that we are interested in. Third, extremely flat blobs are discarded, to avoid selecting blobs on the boundaries of the prostate capsule.
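A sketch of this normalized-Laplacian blob detector, assuming SciPy; the sign is flipped so that bright blobs produce positive peaks, and the scale list and threshold are illustrative parameters:

```python
import numpy as np
from scipy import ndimage

def detect_blobs(image, scales, threshold):
    """Scale-space blob detection: build s^2 * Laplacian-of-Gaussian
    responses over a set of scales s, then keep local maxima above a
    threshold; returns (x, y, z, size) tuples with size = sqrt(3/2)*s0,
    as derived above."""
    stack = []
    for s in scales:
        # normalized Laplacian, negated so bright blobs give positive peaks
        stack.append(-(s ** 2) * ndimage.gaussian_laplace(image.astype(float), sigma=s))
    stack = np.stack(stack)                    # (n_scales, X, Y, Z)
    # local maxima jointly over space and scale
    peaks = (stack == ndimage.maximum_filter(stack, size=3)) & (stack > threshold)
    blobs = []
    for k, i, j, l in np.argwhere(peaks):
        blobs.append((i, j, l, np.sqrt(1.5) * scales[k]))
    return blobs
```

The candidate list returned here would still be filtered by the intensity and flatness criteria described above.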
Similarity definition. Once the internal landmarks are determined in both histological and MR images, the normalized mutual information (NMI) [12] is used to evaluate the similarity of two internal landmarks, one in the histological image and one in the MR image. The evaluation of NMI differs from traditional MI-based registration in the following four aspects. First, NMI is calculated only on the subvolumes around the two internal landmarks under comparison. Second, since the sizes of corresponding blobs can differ between histological and MR images due to distortion and cutting artifacts, the subvolume around each internal landmark is extracted according to its detected scale, which makes our method scale invariant. The two subvolumes under comparison might have different sizes, but they are normalized to the same size before comparison. Third, in order to capture rich image information around each internal landmark for determining its corresponding landmark in the other modality, the NMI values calculated from multiple subvolumes of different sizes around the landmarks are integrated. Finally, the subvolumes under comparison are allowed to rotate within a small range of angles, to search for the maximal NMI. Accordingly, assuming two internal landmarks u and v have scales s_u and s_v, respectively, their similarity can be defined as follows:

M(u, v) = max_(−α ≤ Δθ ≤ α) Σ_(i=1..N) NMI{ V(u, i·s_u), T( V(v, i·s_v); s_u/s_v, Δθ ) }    (7)

where V(u, R) denotes a spherical subvolume around the landmark u with radius R, and T(V; s, Δθ) is a transformation operator with scaling factor s and rotation factor Δθ. The variable i indexes the different size factors of the subvolumes to be extracted, and N is the total number of subvolumes used, i.e., N = 3 in our study. NMI{·,·} denotes the normalized mutual information between two same-sized spherical volume images.
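The following is a hedged sketch of this landmark similarity; the helpers extract_sphere and rotate are hypothetical placeholders for subvolume extraction and scaling/rotation resampling, and the NMI definition shown is one common formulation, not necessarily the authors' exact code.

```python
# NMI between same-sized subvolumes plus the scale/rotation search of Eq. (7).
import numpy as np

def nmi(a, b, bins=32):
    """Normalized mutual information (H(A) + H(B)) / H(A, B)."""
    hist, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pab = hist / hist.sum()
    pa, pb = pab.sum(axis=1), pab.sum(axis=0)
    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log(p))
    return (entropy(pa) + entropy(pb)) / entropy(pab.ravel())

def landmark_similarity(A, u, s_u, B, v, s_v, n_scales=3, dthetas=(-10, 0, 10)):
    """M(u, v) of Eq. (7); extract_sphere/rotate are assumed helpers that
    return same-sized, resampled spherical subvolumes."""
    best = -np.inf
    for dtheta in dthetas:                      # small rotation search
        total = sum(
            nmi(extract_sphere(A, u, i * s_u),
                rotate(extract_sphere(B, v, i * s_v), s_u / s_v, dtheta))
            for i in range(1, n_scales + 1))    # multiple subvolume sizes
        best = max(best, total)
    return best
```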
2.3 Overall Similarity Function

After defining the similarities between same-type landmarks, the correspondences between landmarks and the transformation between histological and MR images can be simultaneously estimated by maximizing an overall similarity function, defined next. Assume the automatically detected boundary landmarks and internal landmarks are {x_i | i = 1…I} and {u_j | j = 1…J} in the MR image, and {y_k | k = 1…K} and {v_l | l = 1…L} in the histological image. The overall similarity function can be defined as:

max_(A,B,h) E(A, B, h) = max_(A,B,h) { Σ_(i=1..I) Σ_(k=1..K) a_ik S(x_i, y_k) + Σ_(j=1..J) Σ_(l=1..L) b_jl M(u_j, v_l)
    − λ ( Σ_(i=1..I) Σ_(k=1..K) a_ik D(x_i, h(y_k))² + Σ_(j=1..J) Σ_(l=1..L) b_jl D(u_j, h(v_l))² + W(h) ) }    (8)
Here, h is a transformation function that maps the histological image to the MR image. A = {a_ik} and B = {b_jl} are binary correspondence matrices, which are subject to

{ Σ_(i=1..I+1) a_ik = 1,  Σ_(k=1..K+1) a_ik = 1,  a_ik ∈ {0, 1} }  and  { Σ_(j=1..J+1) b_jl = 1,  Σ_(l=1..L+1) b_jl = 1,  b_jl ∈ {0, 1} }.
The binary correspondence matrices describe the correspondences between landmarks. Note that an extra row and an extra column are added to each binary correspondence matrix for handling outliers: if a landmark cannot find its correspondence, it is regarded as an outlier and the extra entry for this landmark is set to 1. S(·,·) and M(·,·) are, respectively, the similarity between boundary landmarks and the similarity between internal landmarks, as defined in equations (1) and (7). D(·,·) denotes the Euclidean distance between two points. W(h) is a smoothing term on h; depending on the smoothing criterion, h can be represented by kernels [14] or by thin-plate splines [9]. By maximizing the overall similarity function, the correspondences between landmarks and the transformation between histological and MR images can be simultaneously estimated. The correspondences established between internal landmarks after maximizing the overall similarity function are shown in Fig. 1(b).
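As a clarifying sketch, the energy of Eq. (8) can be evaluated for given correspondence matrices as below; S and M are assumed to be precomputed similarity tables, and the optimizer that alternates between correspondences and the transformation h is not shown.

```python
# Evaluating the overall similarity of Eq. (8); the extra row/column of
# the correspondence matrices (outliers) contribute nothing to the sums.
import numpy as np

def overall_similarity(a, b, S, M, X, Y, U, V, h, W, lam):
    # a: (I+1, K+1) boundary correspondences; b: (J+1, L+1) internal ones.
    E = 0.0
    for i, x in enumerate(X):                 # boundary landmarks
        for k, y in enumerate(Y):
            if a[i, k]:
                E += S[i, k] - lam * np.sum((x - h(y)) ** 2)
    for j, u in enumerate(U):                 # internal landmarks
        for l, v in enumerate(V):
            if b[j, l]:
                E += M[j, l] - lam * np.sum((u - h(v)) ** 2)
    return E - lam * W(h)                     # smoothness penalty on h
```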
Fig. 1. Detection of internal landmarks and their correspondences. The internal landmarks are detected from histological (a1) and MR images (a2), respectively, with the sizes of circles indicating salient scales. The corresponding landmarks are detected in histological (b1) and MR images (b2). Crosses with the same color denote the corresponding landmarks detected.
It is worth noting that, in our overall similarity function, the similarity of two landmarks is measured not only by their spatial relations, but also by their geometric and image features. The importance of incorporating image similarity for correspondence detection has also been demonstrated in [14], when registering diffusion tensor images of individual brains.
3 Experiments

Histological images and T2-weighted MR images of five prostates are used to validate the performance of our registration algorithm. For these images, corresponding anatomical landmarks and cancerous regions have been manually labeled by an expert. By registering the histological and MR images with a registration algorithm, correspondences can be automatically established for any points in the histological and MR images, including the manually denoted landmarks. Also, the cancerous regions in the histological images can be automatically warped to the MR image space, thereby automatically labeling the cancerous regions in the MR images. In this way, we can compare the differences between the manually labeled and the algorithm-labeled corresponding landmarks. We can also compute the overlap percentage of the manually labeled cancerous regions with the automatically labeled cancerous regions in the MR images. The performances of three different registration methods are compared: (1) Method 1, an affine registration algorithm, FLIRT [13], using mutual information; (2) Method 2, which uses only boundary landmarks to guide the registration; and (3) Method 3, our proposed method. Fig. 2 visually demonstrates the performance of the three registration algorithms in warping histological images to match the MR images, which indicates that our method produced the best results in establishing correspondences for landmarks and labeling cancerous regions.
Fig. 2. Comparison of warping histological images to match with MR images by three different registration methods. Two red points and a red region in (a) denote the manually labeled landmarks and cancerous region in an MR image, respectively. For comparison, those red points and the boundary of cancerous region are repeatedly displayed in three warped histological images (b-d) by three registration methods, i.e., Methods 1, 2, and 3, respectively. The blue points in each of three warped histological images (b-d) are the warped landmarks manually labeled in original histological image, as correspondences to those red landmarks in MR image. The dark region in each warped histological image denotes the warped version of the manually labeled cancerous region in the histological image.
Table 1. Average distances between manually- and automatically-labeled landmarks

            Method 1 (mm)   Method 2 (mm)   Method 3 (mm)
Subject 1       1.31            1.03            0.77
Subject 2       1.81            1.05            0.97
Subject 3       1.25            0.97            0.76
Subject 4       1.43            1.09            0.81
Subject 5       1.53            1.03            0.87
Mean            1.47            1.03            0.82
Table 2. Overlap percentage between manually- and automatically-labeled cancerous regions

            Method 1   Method 2   Method 3
Max           82.9%      87.5%      88.3%
Min           55.9%      60.4%      64.1%
Average       71.6%      75.5%      79.1%
The quantitative comparisons on the five prostate subjects are summarized in Tables 1 and 2, indicating again that our method performs best among the three registration methods. We are now planning a large-scale validation study of our method.
4 Conclusion

A novel method for the co-registration of histological and MR images of the prostate has been proposed in this paper. Instead of matching only the prostate boundaries or evaluating the similarity of intensities over the entire images, our method uses automatically detected boundary landmarks and internal landmarks to guide the deformable registration of histological and MR images, thereby offering robustness against the various distortions and cutting artifacts in histological images. In particular, the boundary landmarks are determined by analyzing the geometry of the surface of the prostate capsule, and the similarity between boundary landmarks in the histological and MR images is calculated from the corresponding geometric features. The internal landmarks are determined by a scale-space analysis method, which provides the saliency, location, and size of local blob-like structures. Moreover, the similarity between two internal landmarks, one in the histological image and one in the MR image, is determined by the NMI between the local images around the landmarks under comparison. Finally, the correspondences among the automatically detected landmarks and the dense transformation between histological and MR images are simultaneously determined by maximizing an overall similarity function that integrates the similarities between landmarks with spatial constraints. Experimental results have shown that our proposed method can register anatomical landmarks within the prostate capsule with relatively high accuracy. Also, it can automatically label cancerous regions in MR images by using the cancerous regions reliably detected in histological images, thus enabling us, in the future, to learn the profile of cancerous tissues in MR images from a sufficient number of samples. This is
important for achieving image-based optimal biopsy using patient-specific information, as mentioned at the beginning of this paper.
References

1. D. Shen, Z. Lao, J. Zeng, W. Zhang, I. A. Sesterhenn, L. Sun, J. W. Moul, E. H. Herskovits, G. Fichtinger, and C. Davatzikos, "Optimization of Biopsy Strategy by A Statistical Atlas of Prostate Cancer Distribution," Medical Image Analysis, vol. 8, pp. 139-150, 2004.
2. Y. Zhan and D. Shen, "Deformable Segmentation of 3-D Ultrasound Prostate Images Using Statistical Texture Matching Method," IEEE Trans. on Medical Imaging, vol. 25, pp. 245-255, March 2006.
3. A. Madabhushi, M. Feldman, D. N. Metaxas, D. Chute, and J. Tomaszewski, "A Novel Stochastic Combination of 3D Texture Features for Automated Segmentation of Prostatic Adenocarcinoma from High Resolution MRI," presented at MICCAI, 2003.
4. I. Chan, W. Wells, 3rd, R. V. Mulkern, S. Haker, J. Zhang, K. H. Zou, S. E. Maier, and C. M. Tempany, "Detection of prostate cancer by integration of line-scan diffusion, T2-mapping and T2-weighted magnetic resonance imaging; a multichannel statistical classifier," Med Phys, vol. 30, pp. 2390-2398, 2003.
5. L. Taylor, B. Porter, G. Nadasdy, P. di Sant'Agnese, D. Pasternack, Z. Wu, R. Baggs, D. Rubens, and K. Parker, "Three-dimensional registration of prostate images from histology and ultrasound," Ultrasound Med Biol, vol. 30, pp. 161-168, 2004.
6. M. Jacobs, J. Windham, H. Soltanian-Zadeh, D. Peck, and R. Knight, "Registration and warping of magnetic resonance images to histological sections," Medical Physics, vol. 26, pp. 1568-1578, 1999.
7. T. Schormann and K. Zilles, "Three-dimensional linear and nonlinear transformations: An integration of light microscopical and MRI data," Human Brain Mapping, vol. 6, pp. 339-347, 1998.
8. A. d. B. d'Aische, M. D. Craene, X. Geets, V. Gregoire, B. Macq, and S. K. Warfield, "Efficient multi-modal dense field non-rigid registration: alignment of histological and section images," Medical Image Analysis, vol. 9, pp. 538-546, 2004.
9. H. Chui and A. Rangarajan, "A new point matching algorithm for non-rigid registration," Computer Vision and Image Understanding, vol. 89, pp. 114-141, 2003.
10. D. Shen, E. Herskovits, and C. Davatzikos, "An adaptive focus statistical shape model for segmentation and shape modeling of 3D brain structures," IEEE Trans. on Medical Imaging, vol. 20, pp. 257-271, 2001.
11. T. Lindeberg, "Feature detection with automatic scale selection," International Journal of Computer Vision, vol. 30, pp. 77-116, 1998.
12. F. Maes, A. Collignon, D. Vandermeulen, G. Marchal, and P. Suetens, "Multimodality Image Registration by Maximization of Mutual Information," IEEE Trans. on Medical Imaging, vol. 16, pp. 187-198, 1997.
13. M. Jenkinson and S. M. Smith, "A global optimization method for robust affine registration of brain images," Medical Image Analysis, vol. 5, pp. 143-156, June 2001.
14. C. Davatzikos, F. Abraham, G. Biros, and R. Verma, "Correspondence Detection in Diffusion Tensor Images," ISBI'06, April 2006.
15. A. Andronache, P. Cattin, and G. Székely, "Adaptive Subdivision for Hierarchical Non-rigid Registration of Multi-modal Images Using Mutual Information," presented at MICCAI 2005, 2005.
Affine Registration of Diffusion Tensor MR Images

Mika Pollari1, Tuomas Neuvonen2,4, and Jyrki Lötjönen3

1 Laboratory of Biomedical Engineering, Helsinki University of Technology, P.O.B. 2200, FIN-02015 HUT, Finland. [email protected]
2 Department of Clinical Neurophysiology, Helsinki University Central Hospital, P.O.B. 340, FIN-00029 HUS, Finland. [email protected]
3 VTT Information Technology, P.O.B. 1206, FIN-33101 Tampere, Finland. [email protected]
4 Neuroscience Unit, Dept. of Physiology, Inst. of Biomedicine, University of Helsinki
Abstract. We present a new algorithm for affine registration of diffusion tensor magnetic resonance (DT-MR) images. The method is based on a new formulation of a point-wise tensor similarity measure, which weights directional and magnitude information differently depending on the type of diffusion. The method is compared to a reference method that uses normalized mutual information (NMI), calculated either from a fractional anisotropy (FA) map or from a T2-weighted MR image. The registration methods are applied to real and simulated DT-MR images. Visual assessment is performed for the real data, and registration accuracy is quantified for the simulated data. The results show that the proposed method outperforms the reference method.
1 Introduction
Diffusion tensor magnetic resonance imaging (DT-MRI) [1] is a relatively novel method, which increases the diagnostic value of magnetic resonance imaging (MRI). DT-MRI is based on the attenuation of the MR signal due to the random motion of water molecules during a particular diffusion-weighted imaging sequence. DT-MRI assumes a Gaussian diffusion profile and models the diffusion of water molecules in tissue with diffusion tensors, which can be estimated from a set of diffusion-weighted MR images. In the brain, the diffusion process varies greatly between tissues. In the cerebro-spinal fluid (CSF) and in most parts of the gray matter (GM), diffusion is isotropic, i.e., without a preferred direction. The white matter (WM) consists of axonal fiber bundles, which lead to anisotropic diffusion, where the principal diffusion direction is parallel to the local orientation of the fibers. The same behavior is also seen in the thalamus and in some other internal GM structures. The ability of DT-MRI to detect white matter abnormalities has made it a popular method in neuroimaging applications. In 2002, Kubicki et al. reported that DT-MRI had been used in 14 different clinical applications such
as multiple sclerosis, Alzheimer's disease, and schizophrenia [2]. In this paper we study the affine registration of DT-MR data. The main motivation behind this study is the construction of a DT-MR atlas, which requires inter-subject registration. DT-MR atlases provide a tool to compare variations in white matter between groups, and between a group and a subject. The simplest approach to DT-MR registration is to use scalar data instead of the tensor data; the registration is then based either on scalar indices or on T2-weighted MR images. Several indices have been proposed, see for instance [3]. The most frequently used indices are related to the size of diffusion (e.g., the tensor trace) or the magnitude of anisotropy (e.g., fractional anisotropy). Even though the scalar approach is simple, a lot of valuable information is discarded. Multi-channel registration methods present a second approach to DT-MR registration. Guimond et al. and Park et al. have introduced a multi-channel demons algorithm using the sum of squared differences calculated from the independent tensor components [4,5]. Leemans et al. proposed an affine registration method using k-channel mutual information calculated from 60 diffusion-weighted and one T2-weighted MR image [6]. Rohde et al. introduced a nonlinear registration method using multivariate mutual information calculated from independent tensor components and T2-weighted data [7]. Yet another type of approach is based on a direct similarity between tensors. Alexander et al. [8] have extended an elastic multi-resolution matching algorithm to DT data. Zhang et al. proposed a registration algorithm based on diffusion profiles [9]. Ruiz-Alzola et al. [10] proposed a method where registration is based on the identification of highly structured points and local matching of DT data. In this work, we introduce a novel affine DT-MR registration method. The transformation is based on the global affine model. Our main contribution is the unique formulation of a point-wise tensor similarity measure. It measures the overlap between diffusion modes (i.e., linear, planar, or isotropic), and this overlap is weighted with the information that is the most reliable in each mode. The rest of the paper is organized as follows. In Sec. 2.1 the material is described. The general framework of the registration method is given in Sec. 2.2. Next, a detailed description of the point-wise similarity measure is given in Sec. 2.3. The evaluation protocol is described in Sec. 2.4. The results are presented in Sec. 3. Finally, the paper closes with a discussion and conclusions in Sec. 4.
2 Material and Methods

2.1 Material
We acquired diffusion tensor images from 4 volunteer subjects, 2 males and 2 females, without any diagnosed neurological disorders (ages from 27 to 30 years). The data were acquired with a Siemens Sonata 1.5 T MRI scanner (Siemens, Erlangen, Germany) at the Helsinki University Central Hospital using a 128×128 matrix, 1.75×1.75 mm² in-plane resolution, 4 mm slice thickness, and 12 non-collinear diffusion-weighting gradients with a b-value of 1000 s/mm², echo time (TE) of 119 ms, repetition time (TR) of 6800 ms, collecting 36 contiguous slices,
repeating the measurement 6 times. Diffusion tensors were estimated for each voxel using a least-squares approach [11]. All measurements were approved by the local ethical committee.

2.2 Registration Method
The goal in registration was to find the optimal transformation T*: x → x′ that transformed the spatial coordinates of the source volume A to the spatial coordinates of the reference volume B, so that the similarity function I was maximized:

T* = arg max_T I(A(T(x)), B(x)).    (1)
The voxels of the source volume were transformed to the coordinate system of the destination volume using an affine transformation

T(x) = Mx + t,    (2)
where M was a 3×3 matrix and t was the translation vector. During the transformation, the tensors were reoriented to ensure that their orientation remained consistent with the anatomical surroundings. In this study we used the preservation of principal direction (PPD) reorientation strategy; see details in [12]. In PPD reorientation, two rotation matrices R₁ and R₂ were constructed and each transformed tensor D was reoriented using D′ = (R₂R₁)D(R₂R₁)ᵀ. The rotation matrices were constructed from the known rotation axis and angle. First, the matrix M was applied to the unit eigenvectors v₁, v₂, v₃ and new unit vectors nᵢ = Mvᵢ/|Mvᵢ| were calculated. The first rotation matrix R₁(r₁, θ₁) mapped v₁ onto n₁. The rotation axis r₁ and the rotation angle θ₁ for R₁ were

r₁ = v₁ × n₁,  θ₁ = arccos(v₁ · n₁).    (3)
In (3), '×' and '·' denoted the cross and dot products. The projection of n₂ onto the plane perpendicular to R₁v₁ (= n₁) was

P(n₂) = n₂ − (n₂ · n₁) n₁.    (4)
The second rotation matrix R₂(r₂, θ₂) mapped v₂ from its position after the first rotation, R₁v₂, onto the plane spanned by n₁ and n₂. The rotation axis r₂ and the rotation angle θ₂ for R₂ were

r₂ = R₁v₂ × P(n₂)/|P(n₂)|,  θ₂ = arccos(R₁v₂ · P(n₂)/|P(n₂)|).    (5)
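A minimal sketch of this PPD reorientation, assuming NumPy and Rodrigues' rotation formula, is given below; degenerate cases (e.g., v₁ already parallel to n₁, giving a zero rotation axis) are not handled.

```python
# PPD reorientation of a tensor under an affine map M (Eqs. (3)-(5), [12]).
import numpy as np

def rotation(axis, angle):
    """Rodrigues' formula for the rotation matrix R(axis, angle)."""
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0, -axis[2], axis[1]],
                  [axis[2], 0, -axis[0]],
                  [-axis[1], axis[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

def ppd_reorient(D, M):
    w, V = np.linalg.eigh(D)
    v1, v2 = V[:, 2], V[:, 1]              # eigenvectors, largest first
    n1 = M @ v1 / np.linalg.norm(M @ v1)   # transformed unit directions
    n2 = M @ v2 / np.linalg.norm(M @ v2)
    R1 = rotation(np.cross(v1, n1), np.arccos(np.clip(v1 @ n1, -1, 1)))
    Pn2 = n2 - (n2 @ n1) * n1              # project n2 off the n1 direction
    Pn2 /= np.linalg.norm(Pn2)
    r2 = np.cross(R1 @ v2, Pn2)
    theta2 = np.arccos(np.clip((R1 @ v2) @ Pn2, -1, 1))
    R = rotation(r2, theta2) @ R1          # combined reorientation
    return R @ D @ R.T
```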
At the location where the source voxel was mapped, a new value for the destination data was interpolated. For scalar data we used tri-linear interpolation, and for tensors we used Log–Euclidean tri-linear interpolation [13]:

D̂ = exp( Σ_(i=1..8) wᵢ log(Dᵢ) ),    (6)

where wᵢ were the weights for tri-linear interpolation. Log–Euclidean interpolation was fast to compute and it produced positive-definite tensors.
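A minimal sketch of the Log–Euclidean interpolation of Eq. (6), assuming SciPy's matrix logarithm and exponential:

```python
# Log-Euclidean tri-linear interpolation of the 8 neighboring tensors [13];
# interpolating in the log domain keeps the result positive definite.
import numpy as np
from scipy.linalg import expm, logm

def log_euclidean_interp(tensors, weights):
    """tensors: 8 SPD 3x3 arrays at the cell corners; weights: their
    tri-linear weights (non-negative, summing to 1)."""
    acc = sum(w * logm(D) for w, D in zip(weights, tensors))
    return expm(acc)
```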
The similarity measure was calculated voxel-by-voxel between the transformed and reoriented source voxels and the interpolated destination voxels; see details in Sec. 2.3. The similarity measure was calculated only from the brain voxels. The brain tissue was extracted using the following simple routine: 1) Thresholding was done for the T2 data; thresholding removed all air voxels, leaving the actual brain voxels and some unwanted voxels (skin, fat, etc.) untouched. 2) A morphological opening was done in order to separate the unwanted voxels from the brain. 3) Connected components were labeled and the largest component (the brain) was kept. 4) A morphological closing was performed to cancel out the effects of the morphological opening. As the optimizer, the downhill simplex method from Numerical Recipes in C [14] was used.
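This four-step masking routine can be sketched with scipy.ndimage as follows; the threshold value and the number of morphological iterations are assumed tuning parameters, not values taken from the paper.

```python
# Brain-mask extraction: threshold, opening, largest component, closing.
import numpy as np
from scipy import ndimage

def extract_brain(t2, threshold):
    mask = t2 > threshold                                # 1) threshold air away
    mask = ndimage.binary_opening(mask, iterations=2)    # 2) detach skin/fat
    labels, n = ndimage.label(mask)                      # 3) keep largest component
    if n > 1:
        sizes = ndimage.sum(mask, labels, range(1, n + 1))
        mask = labels == (np.argmax(sizes) + 1)
    return ndimage.binary_closing(mask, iterations=2)    # 4) undo the opening
```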
2.3 Similarity Measure
Our approach was based on deriving a point-wise similarity measure. We measured the overlap between diffusion modes (linear, planar, isotropic), and weighted this overlap with the information that was the most reliable in each mode. For the diffusion modes, we used the coefficients defined by Westin et al. in [15]:

c_l = (λ₁ − λ₂)/λ₁,  c_p = (λ₂ − λ₃)/λ₁,  c_s = λ₃/λ₁,    (7)
where the subscripts l, p, s denoted linear, planar, and spherical (i.e., isotropic) diffusion, and λ₁, λ₂, λ₃ were the tensor eigenvalues. With these notations, the similarity between the reference volume A and the source volume B had the form

I(A, B) = Σ_(a∈Ω_A, b∈Ω_B) [ c_l^a c_l^b S_l(a, b) + c_p^a c_p^b S_p(a, b) + γ · c_s^a c_s^b S_s(a, b) ],    (8)
where S_{l,p,s} were the mode-dependent similarity functions, Ω_{A,B} were the brain masks of the volumes, γ was a weighting coefficient, superscript a denoted a transformed and reoriented source voxel, and superscript b denoted the corresponding interpolated destination voxel. Isotropic diffusion (CSF, GM, WM) was more common in the brain than anisotropic diffusion (found only in WM), and as we wanted to align the fibers in the WM accurately, we had to down-weight the isotropic voxels; thus the weighting γ was introduced (in all the experiments, we used the value γ = 0.5). Next, we derived the mode-dependent similarity functions S_{l,p,s}. In the case of isotropic diffusion, the tensor trace tr = (λ₁ + λ₂ + λ₃)/3 and the gray-level g from the T2 data were well defined, whereas the directional information was ambiguous. Thus, we proposed the following similarity measure for isotropic diffusion:

S_s(a, b) = ½ [ (1 − |tr^a − tr^b| / max(tr^a, tr^b, 1)) + (1 − |g^a − g^b| / max(g^a, g^b, 1)) ].    (9)
In the case of linear and planar diffusion, the tensor trace and the gray-level information were useless as we already knew the tissue class (anisotropic diffusion took place in WM and in some small internal GM structures). For linear diffusion, the first eigenvector v₁ was well-defined whereas the other eigenvectors were ambiguous. Similarly, for planar diffusion, the third eigenvector v₃ was well-defined but the other directions were ambiguous. Thus, we proposed the following directional similarities for linear and planar diffusion:

S_l(a, b) = |v₁^a · v₁^b|,  S_p(a, b) = |v₃^a · v₃^b|    (10)
(where '·' denoted the dot product). Equations (7)–(10) were combined and the full similarity function was derived:

I(A, B) = Σ_(a∈Ω_A, b∈Ω_B) [ c_l^a c_l^b |v₁^a · v₁^b| + c_p^a c_p^b |v₃^a · v₃^b|
    + γ · ½ c_s^a c_s^b ( (1 − |tr^a − tr^b| / max(tr^a, tr^b, 1)) + (1 − |g^a − g^b| / max(g^a, g^b, 1)) ) ].    (11)
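A minimal sketch of the point-wise term of Eq. (11) for a single voxel pair, assuming NumPy; γ = 0.5 as stated above.

```python
# Mode-weighted tensor similarity of one voxel pair (Eqs. (7)-(11)).
import numpy as np

def westin(evals):
    l1, l2, l3 = sorted(evals, reverse=True)
    return (l1 - l2) / l1, (l2 - l3) / l1, l3 / l1      # c_l, c_p, c_s

def voxel_similarity(Da, Db, ga, gb, gamma=0.5):
    wa, Va = np.linalg.eigh(Da)                          # ascending eigenvalues
    wb, Vb = np.linalg.eigh(Db)
    cla, cpa, csa = westin(wa)
    clb, cpb, csb = westin(wb)
    tra, trb = wa.mean(), wb.mean()                      # tr = (l1+l2+l3)/3
    s_lin = abs(Va[:, 2] @ Vb[:, 2])                     # |v1^a . v1^b|
    s_pla = abs(Va[:, 0] @ Vb[:, 0])                     # |v3^a . v3^b|
    s_iso = 0.5 * ((1 - abs(tra - trb) / max(tra, trb, 1.0)) +
                   (1 - abs(ga - gb) / max(ga, gb, 1.0)))
    return cla * clb * s_lin + cpa * cpb * s_pla + gamma * csa * csb * s_iso
```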
For a reference to our similarity measure we used normalized mutual information [16]. NMI was calculated either from the T2 data or from the FA maps, where

FA = sqrt( 3((λ₁ − λ̄)² + (λ₂ − λ̄)² + (λ₃ − λ̄)²) / (2(λ₁² + λ₂² + λ₃²)) ),  λ̄ = (1/3) Σ_(i=1..3) λᵢ.    (12)

2.4 Evaluation
We used a two-stage evaluation protocol. 1) For real data we performed visual assessment. Each subject was registered to the other subjects (a total of 12 registrations). The registered source volume was compared with the reference volume using the chessboard visualization, in which a new volume is constructed where every second cube (we used a cube size of 16×16×16 voxels) is taken from the reference volume and every second cube from the transformed source volume. Misregistrations were seen as discontinuities at the cube borders. We divided the results into successful registrations and misregistrations using visual (subjective) assessment, and from these classifications we calculated a success rate (i.e., the percentage of successful registrations) for each method. 2) For simulated data we generated random affine transformations T^known. The matrix M was generated as

M = P(p_x, p_y, p_z) Q(ϕ_x, ϕ_y, ϕ_z) R_x(θ) R_y(ω) R_z(φ),    (13)
where P was a scaling matrix of the form P = diag(p_x, p_y, p_z), R_{x,y,z} were rotation matrices around the x, y, and z axes, and Q was a shearing matrix of the form

Q(ϕ_x, ϕ_y, ϕ_z) = [ 1  tan(ϕ_z)  tan(ϕ_y);  0  1  tan(ϕ_x);  0  0  1 ].    (14)
Fig. 1. The registration results for the proposed (left) and the reference method using MR images (middle) and FA maps (right) are shown using the chessboard visualization
The scaling parameters p_{x,y,z} were randomly drawn from the uniform distribution U(0.7, 1.3), the shearing parameters ϕ_{x,y,z} from the uniform distribution U(−π/8, π/8), and the rotation angles θ, ω, φ from the uniform distribution U(−π/20, π/20). The translation parameters in each direction were drawn from the uniform distribution U(−7, 7) (measured in mm). Each subject was transformed according to these known transformations. First, the inverse of T^known was calculated. Using the inverse transformation we calculated the mapping from each voxel location of the new volume (the simulated data) to the original data, and a new value was interpolated at this location. We used tri-linear interpolation for scalar data and Log–Euclidean interpolation for tensors. The interpolated voxels were transformed to the new volume, and during the transformation the tensors were reoriented (PPD). As a result we obtained simulated data for which the ground-truth transformations were known. The simulated data were registered to the original data, yielding T^reg, and the registration error was calculated over all brain voxels (a total of N voxels in Ω):
ε = (1/N) Σ_(x_i ∈ Ω) ‖(T^reg ∘ T^known) x_i − x_i‖₂.    (15)
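A hedged sketch of this protocol follows: a random affine per Eqs. (13)-(14) and the residual error of Eq. (15). The rotation-sign conventions are an assumption, which is harmless here since the angles are drawn symmetrically around zero.

```python
# Random affine generation and the registration error of Eq. (15).
import numpy as np

def random_affine(rng):
    px, py, pz = rng.uniform(0.7, 1.3, 3)               # scaling
    sx, sy, sz = np.tan(rng.uniform(-np.pi/8, np.pi/8, 3))
    th, om, ph = rng.uniform(-np.pi/20, np.pi/20, 3)    # rotations
    P = np.diag([px, py, pz])
    Q = np.array([[1, sz, sy], [0, 1, sx], [0, 0, 1]])  # shearing, Eq. (14)
    def rot(a, i, j):                                   # plane rotation
        R = np.eye(3)
        R[i, i] = R[j, j] = np.cos(a)
        R[i, j], R[j, i] = -np.sin(a), np.sin(a)
        return R
    M = P @ Q @ rot(th, 1, 2) @ rot(om, 0, 2) @ rot(ph, 0, 1)
    t = rng.uniform(-7, 7, 3)                           # translation, mm
    return M, t

def registration_error(T_reg, T_known, voxels):
    """Mean Euclidean residual of the composed transform over brain voxels."""
    composed = np.array([T_reg(T_known(x)) for x in voxels])
    return np.mean(np.linalg.norm(composed - voxels, axis=1))
```

For example, `M, t = random_affine(np.random.default_rng(0))` draws one simulated ground-truth transformation.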
3 Results
In Figure 1, the results of one registration were shown for the proposed method (left) and for the reference method using either T2 images (middle) or FA maps (right). On the left and in the middle there were no notable errors; on the right there was a clear misalignment. The results on the left and in the middle were
classified as successful registrations, but the case on the right was classified as a misregistration. The success rates for the real data are given in Table 1.

Table 1. The success rates for the proposed and the reference method using either MR images or FA maps. The total number of registrations was 12.

Proposed   NMI (MR)   NMI (FA)
  92%        75%        25%

Ten random affine transformations were applied to each subject and 40 simulated data sets were created. Using these data sets we calculated the registration accuracy for each method. The results are given in Table 2.

Table 2. The mean and the standard deviation of the registration accuracy (in mm) for the simulated data

Proposed    NMI (MR)   NMI (FA)
0.1±0.1     8.9±8.9    16.9±4.1
4 Discussion and Conclusions
The proposed and the reference methods were first applied to real DT-MR data. The proposed method gave a good alignment in 11 registrations out of 12 (a success rate of 92%). For the NMI-based reference method the success rate was considerably lower: with T2 images the success rate was 75%, and with FA maps only 25%. Next, the methods were applied to the simulated data. The proposed method had the best robustness and accuracy (residual error 0.1±0.1 mm). When the NMI-based reference method was used with T2 data, we noticed large variations in the registration results (standard deviation 8.9 mm), and thus the overall accuracy was poor (mean accuracy 8.9 mm). However, in 20 registrations out of 40 the average misregistration was smaller than the largest voxel dimension (4.0 mm); if only these registrations were considered, the method had a reasonable registration accuracy (0.4±0.8 mm). The results for the FA-map-based NMI registration were poor (residual error 16.9±4.1 mm); even the smallest individual registration error was as high as 12.5 mm. These tests demonstrated that the proposed method was more robust than the NMI-based reference method. Furthermore, the results showed that the FA-map-based NMI registration could not produce accurate registrations. It should be noted that we could not conclude from these experiments what the true inter-subject registration accuracy was. The problem lay in the evaluation protocol: in the real-data registrations, inter-subject variations were present, but with visual assessment we could only determine the success rates, not the registration accuracies. With the simulated data the registration accuracies were calculated; however, the simulated data did not represent inter-subject variations. Thus, the simulated-data experiment gave information about the robustness, but did not give any information about the true inter-subject registration accuracy. In summary, we proposed a new method for DT-MR registration. Our main contribution was a new formulation of a point-wise tensor similarity measure. Although not demonstrated in this study, the developed similarity measure is not limited to affine transformations. In the future, we plan to extend the method to non-rigid and nonlinear transformations, to make a quantitative evaluation of the method, and to compare the method with other DT-MR registration methods.
References

1. Basser, P., Mattiello, J., LeBihan, D.: MR diffusion tensor spectroscopy and imaging. Biophys. J. 66 (1994) 259–267
2. Kubicki, M., Westin, C.F., Maier, S., Mamata, H., Frumin, M., Ersner-Hershfield, H., Kikinis, R., Jolesz, F., McCarley, R., Shenton, M.E.: Diffusion Tensor Imaging and Its Application to Neuropsychiatric Disorders. Harvard Rev Psychiatry 10 (2002) 324–336
3. Basser, P., Pierpaoli, C.: Microstructural and Physiological Features of Tissue Elucidated by Quantitative-Diffusion-Tensor MRI. Journal of Magnetic Resonance, Series B 111 (1996) 209–219
4. Guimond, A., Guttmann, C.R.G., Warfield, S.K., Westin, C.F.: Deformable registration of DT-MRI data based on transformation invariant tensor characteristics. In: Proceedings of the IEEE International Symposium on Biomedical Imaging (ISBI'02), Washington (DC), USA (2002)
5. Park, H., et al.: Spatial normalization of diffusion tensor MRI using multiple channels. NeuroImage 20 (2003) 1995–2009
6. Leemans, A., Sijbers, J., De Backer, S., Vandervliet, E., Parizel, P.: Affine Coregistration of Diffusion Tensor Magnetic Resonance Images Using Mutual Information. Lecture Notes in Computer Science 3708 (2005) 523–530
7. Rohde, G.K., Pajevic, S., Pierpaoli, C.: Multi-channel registration of diffusion tensor images using directional information. In: ISBI, IEEE (2004) 712–715
8. Alexander, D.C., Gee, J.C.: Elastic Matching of Diffusion Tensor Images. Computer Vision and Image Understanding 77 (2000) 233–250
9. Zhang, H., Yushkevich, P.A., Gee, J.C.: Deformable Registration of Diffusion Tensor MR Images with Explicit Orientation Optimization. In Duncan, J.S., Gerig, G., eds.: MICCAI. Volume 3749 of Lecture Notes in Computer Science, Springer (2005) 172–179
10. Ruiz-Alzola, J., Westin, C.F., Warfield, S., Alberola, C., Maier, S., Kikinis, R.: Nonrigid registration of 3D tensor data. Medical Image Analysis 6 (2002) 143–161
11. Papadakis, N.G., Xing, D., Huang, C.L.H., Hall, L.D., Carpenter, T.A.: A Comparative Study of Acquisition Schemes for Diffusion Tensor Imaging Using MRI. Journal of Magnetic Resonance 137 (1999) 67–82
12. Alexander, D., Pierpaoli, C., Basser, P., Gee, J.: Spatial transformations of diffusion tensor magnetic resonance images. IEEE Transactions on Medical Imaging 20 (2001) 1131–1139
13. Arsigny, V., Fillard, P., Pennec, X., Ayache, N.: Fast and simple computations on tensors with Log-Euclidean metrics. Research Report 5584, INRIA (2005)
14. Press, W., Teukolsky, S., Vetterling, W., Flannery, B.: Numerical Recipes in C: The Art of Scientific Computing. Second edn. Cambridge University Press (1992)
15. Westin, C.F., Maier, S., Mamata, H., Nabavi, A., Jolesz, F., Kikinis, R.: Processing and visualization of diffusion tensor MRI. Medical Image Analysis 6 (2002) 93–108
16. Studholme, C., Hill, D.L., Hawkes, D.J.: An overlap invariant entropy measure of 3D medical image alignment. Pattern Recognition 32 (1999) 71–86
Analytic Expressions for Fiducial and Surface Target Registration Error

Burton Ma1 and Randy E. Ellis2

1 Human Mobility Research Centre, Queen's University, Canada. [email protected]
2 Dept. of Radiology, Brigham & Women's Hospital, USA. [email protected]
Abstract. We propose and test analytic equations for approximating expected fiducial and surface target registration error (TRE). The equations are derived from a spatial stiffness model of registration. The fiducial TRE equation is equivalent to one presented by [1]. We believe that the surface TRE equation is novel, and we provide evidence from computer simulations to support the accuracy of the approximation.
1 Introduction

Many forms of computer-assisted surgery use registration to align patient anatomy to medical imagery. Quantifying registration error and its possible effects on the surgical outcome is of great interest to practitioners of computer-assisted surgery. One task-specific method of measuring registration error is to estimate the target registration error (TRE), which is defined as the error between a measured anatomical target under the registration transformation and its corresponding location in the medical image. Simulation studies of fiducial TRE have been described in [2,3,4,5] and elsewhere. A statistical derivation of fiducial TRE was described in [1]. Equations predicting surface TRE, where points are measured from a surface and aligned to a model of the surface, have not yet been described, to the best of our knowledge. In this article we show that our spatial stiffness models of fiducial and surface registration yield analytic expressions for TRE that accurately match simulation results.
2 Spatial Stiffness of a Passive Mechanical System

The background material, from the robotics literature, is taken from our previous work [6]. A general model of the elastic behavior of a passive unloaded mechanism is a rigid body that is suspended by linear and torsional springs, which leads to analysis of the spatial stiffness or compliance of the mechanism. For a passive mechanism in local equilibrium, a twist displacement t of a rigid body is related to a counteracting wrench force w by a 6×6 spatial stiffness matrix K:

w = Kt = [ A  B;  Bᵀ  D ] t    (1)

where A, B, and D are 3×3 matrices. The twist is a vector t = [υᵀ ωᵀ]ᵀ, where υᵀ = [v_x v_y v_z] is the linear displacement and ωᵀ = [ω_x ω_y ω_z] is the rotational displacement. The
wrench is a vector w = [fᵀ τᵀ]ᵀ, where fᵀ = [f_x f_y f_z] is force and τᵀ = [τ_x τ_y τ_z] is torque. Equation 1 is simply a general, vectorial expression of Hooke's law. K is a symmetric positive-definite matrix for stable springs and small displacements from equilibrium. The eigenvalues of K are not immediately useful because their magnitudes change with the coordinate frame used to define K; however, it can be shown [7] that the eigenvalues of

K_V = D − BᵀA⁻¹B  and  C_W = A⁻¹    (2)

are frame invariant. The eigenvalues μ₁, μ₂, μ₃ of K_V are the principal rotational stiffnesses, and the eigenvalues σ₁, σ₂, σ₃ of C_W⁻¹ are the principal translational stiffnesses. The screw representation of a twist is a rotation about an axis followed by a translation parallel to the axis. The screw is usually described by the rotation axis, the net rotation magnitude M, and the independent translation specified as a pitch h, which is the ratio of translational motion to rotational motion. For a twist, h = ω·υ/‖ω‖², M = ‖ω‖, and the axis of the screw is parallel to ω, passing through the point q = ω × υ/‖ω‖². A pure translation (where ω = 0) has h = ∞ and M = ‖υ‖, with the screw axis parallel to υ passing through the origin. A unit twist has magnitude M = 1, in which case, for ω ≠ 0, h = ω·υ and q = ω × υ. For a small screw motion with M = α and ω ≠ 0, a point located at a distance ρ from the screw axis will be displaced by a length

l ≈ |α| √(ρ² + (ω·υ)²).    (3)
i = 1, 2, 3
(4)
where, μi is an eigenvalue of KV with an associated eigenvector ωi , and ρi is the distance between the point of interest and the screw axis of the twist [υ Ti ω Ti ]T . 2.1 Fiducial Registration Stiffness Matrix We previously described a spatial stiffness model of fiducial registration where each fiducial marker was attached to its noise-free location with a linear spring [8]. The resulting stiffness matrix for fiducial registration with N fiducials centered at the origin was shown to be ⎡ ⎤ N I3×3 ⎡ 2 −[Π×] ⎤
⎢ yi + zi2 −xi yi −xi zi ⎥ A B ⎢ ⎥ K=⎣ = (5) N ⎣ −xi yi x2i + zi2 −yi zi ⎦⎦ BT D [Π×] i=1 2 2 −xi zi −yi zi xi + yi th T where pi = [xi yi zi ]T is the location of the i fiducial and the matrix B = [Π×] 0 zi −yi is the cross-product matrix −zi 0 xi such that [Π×]u = Π × u. If the fiducials yi −xi
0
are centered at the origin, then B = Bᵀ = 0, which is very unusual for stiffness matrices [8]. Because the principal stiffnesses are invariant under rigid coordinate-frame transformations, we can assume that the fiducials are centered at the origin without loss of generality. Under this assumption, the principal rotational stiffnesses are the eigenvalues of K_V = D − BᵀA⁻¹B = D. The matrix D is the inertia tensor of a system of N point particles of unit mass [9]; thus, the rotational stiffnesses are the principal moments of inertia, and the eigenvectors are the principal axes.
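A minimal NumPy sketch of this fiducial stiffness matrix, assuming the fiducials are first centered so that B = 0:

```python
# Fiducial stiffness of Eq. (5) for centered fiducials: the translational
# stiffnesses all equal N, and the rotational stiffnesses are the
# eigenvalues of D, the inertia tensor of unit point masses.
import numpy as np

def fiducial_stiffness(points):
    p = np.asarray(points, dtype=float)
    p = p - p.mean(axis=0)                 # center at the origin
    # D = sum_i (|p_i|^2 I - p_i p_i^T), the inertia tensor.
    D = np.eye(3) * np.sum(p ** 2) - p.T @ p
    mu, axes = np.linalg.eigh(D)           # principal moments and axes
    return len(p), mu, axes                # sigma_i = N, mu_i, eigenvectors
```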
2.2 Surface Registration Stiffness Matrix

Our stiffness model for surface registration also used linear springs attached to the registration points, but the other end of each spring was allowed to slide to the nearest corresponding surface point when the registration points were displaced by a small amount [6]. The simplest expression for the stiffness matrix for surface registration with N surface registration points was shown to be

K = Σ_(i=1..N) [ nᵢ;  pᵢ × nᵢ ] [ nᵢᵀ  (pᵢ × nᵢ)ᵀ ] = [ A  B;  Bᵀ  D ]    (6)
where pi is the ith surface registration point and ni is its associated unit normal vector; expanding Equation 6 provides some intuition for optimizing registration point selection [6].
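A short NumPy sketch of Eq. (6), accumulating the rank-one contribution of each registration point and its unit normal:

```python
# Surface registration stiffness of Eq. (6).
import numpy as np

def surface_stiffness(points, normals):
    K = np.zeros((6, 6))
    for p, n in zip(points, normals):
        g = np.concatenate([n, np.cross(p, n)])   # [n; p x n]
        K += np.outer(g, g)                       # rank-one update
    A, B, D = K[:3, :3], K[:3, 3:], K[3:, 3:]
    return K, A, B, D
```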
3 Target Registration Error

We hypothesize that TRE can be estimated by considering a constant amount of work done (energy), c, and calculating the displacement of the system. The work done is taken to be the sum c = c_δ + c_r of two components c_δ = c_r representing the energies required, respectively, to translate and rotate the system. The magnitudes of the translation and rotation are related to the error in localizing a fiducial marker or surface registration point. We refer to these errors as fiducial localization error (FLE) and point localization error (PLE), and we characterize them by their variances s²_FLE and s²_PLE. We explore our hypothesis first for the case of fiducial registration.

3.1 Fiducial Registration TRE

Suppose that the fiducial markers are translated by an amount α_δ1 in a direction parallel to υ₁, where υ₁ is the eigenvector associated with the principal translational stiffness σ₁. Such a translation will induce a TRE of magnitude α_δ1. The work done by this translation is c_δ1 = ½σ₁α²_δ1; simple rearrangement yields α²_δ1 = 2c_δ1/σ₁. A similar argument can be used for translations in the directions parallel to υ₂ and υ₃. The squared translational TRE is α²_δ = α²_δ1 + α²_δ2 + α²_δ3 = 2c_δ1/σ₁ + 2c_δ2/σ₂ + 2c_δ3/σ₃. It seems reasonable to assume that the work done in each direction is equal if the FLE is isotropic; substituting c_δ1 = c_δ2 = c_δ3 = c_δ/3 and σ₁ = σ₂ = σ₃ = N yields

α²_δ = 2c_δ / N.    (7)
Suppose the system is rotated about the axis ω₁, where ω₁ is the eigenvector associated with the principal rotational stiffness μ₁. Such a rotation will induce a TRE; let the magnitude of this TRE be α_r1. The work done by this rotation is c_r1 = ½μ_eq,1 α²_r1; simple rearrangement yields α²_r1 = 2c_r1/μ_eq,1. Using a similar argument for rotations about ω₂ and ω₃ leads to a total squared displacement of

α²_r = α²_r1 + α²_r2 + α²_r3 = 2c_r1/μ_eq,1 + 2c_r2/μ_eq,2 + 2c_r3/μ_eq,3.
Recall that Equation 4 states μ_eq,i = μᵢ/(ρᵢ² + (ωᵢ·υᵢ)²), where μᵢ is the principal rotational stiffness, ωᵢ is the eigenvector associated with μᵢ, υᵢ is the linear displacement component of the twist vector tᵢ = [υᵢᵀ ωᵢᵀ]ᵀ, and ρᵢ is the distance between the target and the screw axis of tᵢ. It can be shown that υᵢ and ωᵢ must be mutually perpendicular if B = 0 [10]; therefore ωᵢ·υᵢ = 0 and μ_eq,i = μᵢ/ρᵢ². Assuming the work done is evenly divided among the three rotations and substituting for μ_eq,i gives the squared rotational TRE as

α²_r = 2c_r ρ₁²/(3μ₁) + 2c_r ρ₂²/(3μ₂) + 2c_r ρ₃²/(3μ₃).    (8)
Adding the two components in quadrature and taking the square root yields the root-mean-square TRE

TRE = √(α²_δ + α²_r) = √( 2c_δ/N + 2c_r ρ₁²/(3μ₁) + 2c_r ρ₂²/(3μ₂) + 2c_r ρ₃²/(3μ₃) ).    (9)

Amount of Work Done. The amount of work done c_δ = c_r must be defined to compute an estimate of TRE. It is easiest to compute the work done by translation. Consider the squared displacement of the center of mass, α²_com, of a system of N points where the location of each point is subject to zero-mean, isotropic noise with variance s². If the centroid of the noise-free points is at the origin, the expected value of α²_com is

E[α²_com] = E[‖P̄‖²] = E[P̄_x² + P̄_y² + P̄_z²] = E[P̄_x²] + E[P̄_y²] + E[P̄_z²]    (10)
2
2
2
E[α2δ ] = E[P x ] + E[P y ] + E[P z ] = E[(P x − E[P x ])2 ] + E[(P y − E[P y ])2 ] + E[(P z − E[P z ])2 ]
(13)
For any random variable X the variance of X is defined as Var(X) = E[(X − E[X])2 ] Thus Equation 13 becomes E[α2com ] = Var(P x ) + Var(P y ) + Var(P z ) =
s2 N
(14)
To compute the energy associated with the displacement of the center of mass, we consider the work done by a single spring with spring constant k_com = N attached to the center of mass. The expected work done is

E[c_com] = ½ k_com E[α²_com] = ½ s².    (15)
Substituting c_com = c_δ and s² = s²_FLE into Equation 15 gives the expected work done by the translation induced by fiducial localization noise as

E[c_δ] = ½ s²_FLE.    (16)

Substituting c_δ = c_r and Equation 16 into Equation 9 gives the TRE for isotropic noise as

TRE = s_FLE √( 1/N + ρ₁²/(3μ₁) + ρ₂²/(3μ₂) + ρ₃²/(3μ₃) ).    (17)
Recall that the μᵢ are the principal moments of inertia of the fiducial configuration; squaring Equation 17 yields Equation 46 from [1] for the expected value of TRE².

3.2 Surface Registration TRE

The development of the equations for estimating TRE for surface-based registration is similar to that presented in Section 3.1 for fiducial registration. Rather than repeating large sections of text, we refer the reader to Section 3.1 for details, and present only the relevant equations in this section. By considering the work done by translation along the three principal axes of translation, the squared translational TRE can be written as

α²_δ = (2c_δ/3)(1/σ₁ + 1/σ₂ + 1/σ₃).    (18)

By considering the work done by rotation about the three principal axes of rotation, the squared rotational TRE can be written as

α²_r = (2c_r/3)( (ρ₁² + (ω₁·υ₁)²)/μ₁ + (ρ₂² + (ω₂·υ₂)²)/μ₂ + (ρ₃² + (ω₃·υ₃)²)/μ₃ ).    (19)

Addition of Equations 18 and 19 in quadrature yields our estimate of surface registration TRE.

Amount of Work Done. The same argument used for defining the work done in the fiducial registration case (Section 3.1) applies here. The amount of work done by translation is c_δ = ½ s²_PLE. Equating the energies of translation and rotation, and substituting into Equations 18 and 19, gives the TRE for isotropic noise as

TRE = (s_PLE/√3) √( 1/σ₁ + 1/σ₂ + 1/σ₃ + (ρ₁² + (ω₁·υ₁)²)/μ₁ + (ρ₂² + (ω₂·υ₂)²)/μ₂ + (ρ₃² + (ω₃·υ₃)²)/μ₃ ).    (20)
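The two TRE predictions can be sketched as below. For the surface case, the pairing υᵢ = −A⁻¹Bωᵢ used to build the rotational twists is the standard Schur-complement relation assumed here (it is what makes K_V = D − BᵀA⁻¹B the effective rotational stiffness); it is not spelled out in the text above.

```python
# Predicted fiducial TRE (Eq. (17)) and surface TRE (Eq. (20)).
import numpy as np

def fiducial_tre(points, target, s_fle):
    p = np.asarray(points, float)
    c = p.mean(axis=0)
    p = p - c                                      # center the fiducials
    N = len(p)
    D = np.eye(3) * np.sum(p ** 2) - p.T @ p       # inertia tensor (Eq. (5))
    mu, axes = np.linalg.eigh(D)
    q = np.asarray(target, float) - c
    rho2 = [q @ q - (q @ axes[:, i]) ** 2 for i in range(3)]
    return s_fle * np.sqrt(1.0 / N + sum(r / (3 * m) for r, m in zip(rho2, mu)))

def surface_tre(points, normals, target, s_ple):
    K = np.zeros((6, 6))
    for p, n in zip(points, normals):              # Eq. (6)
        g = np.concatenate([n, np.cross(p, n)])
        K += np.outer(g, g)
    A, B, D = K[:3, :3], K[:3, 3:], K[3:, 3:]
    sigma = np.linalg.eigvalsh(A)                  # translational stiffnesses
    mu, W = np.linalg.eigh(D - B.T @ np.linalg.solve(A, B))
    total = np.sum(1.0 / sigma)
    x = np.asarray(target, float)
    for i in range(3):
        w = W[:, i]                                # unit rotational direction
        v = -np.linalg.solve(A, B @ w)             # twist translation (assumed)
        q = np.cross(w, v)                         # point on the screw axis
        rho2 = np.sum(np.cross(w, x - q) ** 2)     # squared distance to axis
        total += (rho2 + (w @ v) ** 2) / mu[i]
    return s_ple / np.sqrt(3.0) * np.sqrt(total)
```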
Table 1. Ellipsoid registration point parameters (Δ = dπ/2)
Point:    1        2          3           4          5          6         7          8          9
u:      −π/2    π/2 − Δ   π/2 + Δ/2      0          0       π − Δ/2    −π/2       π/2        π/2
v:        0        0          0       −π/2 + Δ   π/2 − Δ      π/2    −π/2 + Δ   −π/2 + Δ   π/2 − Δ
4 Methods and Results

We tested our TRE equations using computer simulations in which registration transformations were computed from registration features (fiducials and model surface points) contaminated with isotropic localization noise. Our fiducial registration experiments were similar to those described in [2]. Because the square of Equation 17 is equivalent to Equation 46 from [1], we were not surprised that there was excellent agreement between the simulated RMS TRE and the predicted TRE. To conserve space for the more interesting surface registration simulations, we omit the results of the fiducial simulations and refer the reader to [10] for results and discussion.

4.1 Surface Registration

We ran simulations of surface registration TRE using a variety of surfaces: an ellipsoid, models of two cadaver femora, and the radius of a patient who underwent computer-assisted surgery. The simulations used the following steps (a sketch of this protocol is given after the list):
1. Normally distributed noise was drawn from N(0, s²_PLE/3) and added to the x, y, and z components of each surface registration point.
2. The noisy point locations were registered to the surface using ICP [12]. ICP was initialized with the true registration transformation (the identity) and was run to convergence in RMS error (tolerance of 10⁻⁹ mm, maximum 100,000 iterations).
3. The registration was applied to the set of target locations. The displacement of a target under the registration transformation was the target registration error.
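A hedged sketch of the three-step protocol; icp_register is a placeholder for an ICP implementation [12] and is not the authors' code.

```python
# Monte-Carlo surface TRE: perturb points, register with ICP, map targets.
import numpy as np

def simulate_tre(points, targets, surface, s_ple, trials, rng):
    errs = []
    for _ in range(trials):
        # Per-component variance s_PLE^2 / 3 gives isotropic PLE variance s_PLE^2.
        noisy = points + rng.normal(0.0, np.sqrt(s_ple ** 2 / 3.0), points.shape)
        R, t = icp_register(noisy, surface)      # init at identity, run to convergence
        mapped = targets @ R.T + t               # apply registration to targets
        errs.append(np.linalg.norm(mapped - targets, axis=1))
    return np.sqrt(np.mean(np.square(errs)))     # RMS TRE over trials and targets
```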
The ellipsoid had the parametric form

p(u, v) = [150 cos v cos u,  50 cos v sin u,  25 sin v]ᵀ,  0 ≤ u ≤ π,  −π/2 ≤ v ≤ π/2,
and its registration points were parameterized by a scalar quantity d where 0 ≤ d ≤ 1; the 9 registration points used can be computed using the values shown in Table 1. At d = 0 all of the points lie in the plane x = 0, which is the most circular cross section of the ellipsoid. The results for the ellipsoid simulations are shown in Figure 1. The predicted value of TRE agreed with the simulation results except for small values of d. The predicted TRE was large because there were small principal stiffnesses for these point configurations. The points were still close to the surface of the ellipsoid, however, so the ICP algorithm did not diverge far enough from the true registration transformation to produce the very large values of TRE predicted by the stiffness model. The predicted values of TRE as a function of target location and noise magnitude agree with the simulation results with an error of less than 2% of simulated RMS TRE.
Fig. 1. Estimated (solid curve) and simulated TRE (symbols) for the ellipsoid simulations; error bars shown at ±1 standard deviation of RMS TRE. (Left) TRE as a function of registration point location (sPLE = 0.5 mm, target [50 25 15]T ). (Middle) TRE as a function of target location along the line [−150 − 50 − 25]T + t[300 100 50]T (sPLE = 0.5 mm, d = 0.5). (Right) TRE as a function of noise magnitude (d = 0.5, target location [50 25 15]T ).
Fig. 2. Models, registration points (dark spheres), and targets (dark cubes) used for the bone surface TRE experiments. (Right) Targets are distributed along the mechanical axis of the femur for the distal femur experiment; the mechanical axis is clinically relevant in knee arthroplasty.
Target points and registration points used for the bone model experiments are shown in Figure 2. We varied the number of registration points used for the radius and proximal femur, and we varied the noise magnitude for the distal femur; results are shown in Figure 3. The predicted TRE always fell within one standard deviation of TRE magnitude for the radius and proximal femur experiments; the agreement between predicted and simulated TRE improved as the number of registration points increased. For the distal femur experiment using 14 registration points carefully selected from regions of low curvature the predicted TRE agreed with the simulated TRE with an error of less than 4% over a wide range of target locations and noise magnitude.
5 Discussion and Conclusion

We have presented analytic expressions for the expected root-mean-square fiducial and surface target registration error. The derivation for fiducial TRE is novel, and the resulting equation is equivalent to one published in [1]. Equation 20 for approximating surface TRE is unique as far as we know. The estimated surface TREs were in good agreement with our simulated values. The reliability of Equation 20 can be compromised in realistic situations because of the strong dependence on the normal vectors nᵢ.
Fig. 3. Estimated (solid curve) and simulated TRE (symbols) for the bone surface simulations; error bars shown at ±1 standard deviation of RMS TRE. (Left) TRE as a function of the number of registration points for the distal radius. (Middle) TRE as a function of the number of registration points for the proximal femur. (Right) TRE as a function of target location along the line of the mechanical axis (parameterized by t) and noise magnitude for the distal femur experiment (sPLE = 0.35, 0.75, 1.0 mm).
We would caution against relying on Equation 20 if points are selected from surfaces with high-curvature features. Nevertheless, we believe that Equation 20 is a useful tool for studying surface-based registration, and we have already applied it to the optimization of registration point selection [10].
References

1. Fitzpatrick, J.M., et al.: Predicting error in rigid-body point-based registration. IEEE Trans Med Imaging 17(5) (1998) 694–702
2. Maurer Jr., C.R., et al.: Registration of head volume images using implantable fiducial markers. IEEE Trans Med Imaging 16(4) (1997) 447–462
3. Evans, A.C., et al.: Image registration based on discrete anatomic structures. In Maciunas, R.J., ed.: Interactive Image-Guided Neurosurgery. American Association of Neurological Surgeons, Park Ridge, IL (1993) 63–80
4. Hill, D.L.G., et al.: Accurate frameless registration of MR and CT images of the head: Applications in surgery and radiotherapy planning. Radiology 191 (1994) 447–454
5. Ellis, R.E., et al.: A method for evaluating CT-based surgical registration. In Troccaz, J., Grimson, E., Mösges, R., eds.: CVRMed-MRCAS'97. LNCS 1205, Grenoble, France, Springer (1997) 141–150
6. Ma, B., Ellis, R.E.: Spatial-stiffness analysis of surface-based registration. In Barillot, C., Haynor, D., Hellier, P., eds.: MICCAI 2004. LNCS 3216, Springer (2004) 623–630
7. Lin, Q., et al.: A stiffness-based quality measure for compliant grasps and fixtures. IEEE Trans Robot Automat 16(6) (2000) 675–688
8. Ma, B., Ellis, R.E.: A spatial-stiffness analysis of fiducial registration accuracy. In Ellis, R.E., Peters, T.M., eds.: MICCAI 2003. Volume 1 of LNCS 2879, Springer (2003) 359–366
9. Meriam, J.L., Kraige, L.G.: Engineering Mechanics: Dynamics. John Wiley and Sons (1986)
10. Ma, B.: A Unified Approach to Surface-Based Registration for Image-Guided Surgery. PhD thesis, Queen's University, Kingston, Ontario, Canada (2005)
11. Ross, S.M.: Introduction to Probability Models. 7th edn. Academic Press (2000)
12. Besl, P., McKay, N.: A method for registration of 3-D shapes. IEEE Trans Pattern Anal 14(2) (1992) 239–256
Bronchoscope Tracking Based on Image Registration Using Multiple Initial Starting Points Estimated by Motion Prediction

Kensaku Mori1, Daisuke Deguchi1, Takayuki Kitasaka1, Yasuhito Suenaga1, Hirotsugu Takabatake2, Masaki Mori3, Hiroshi Natori4, and Calvin R. Maurer Jr.5

1 Graduate School of Information Science, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Aichi, 464-8603, Japan. [email protected]
2 Sapporo Minami-Sanjo Hospital
3 Sapporo Kosei-General Hospital
4 Keiwakai Nishioka Hospital
5 Dept. of Neurosurgery, Stanford University
Abstract. This paper presents a method for tracking a bronchoscope, based on motion prediction and image registration from multiple initial starting points, as a core function of a bronchoscope navigation system. We try to improve the performance of image-registration-based bronchoscope tracking by using multiple initial guesses estimated by motion prediction. The method tracks the bronchoscopic camera by image registration between real bronchoscopic images and virtual ones derived from CT images taken prior to the bronchoscopic examination. As initial guesses for the image registration, we use multiple starting points to avoid falling into local minima; these initial guesses are computed from the motion predictions obtained from a Kalman filter. We applied the proposed method to nine pairs of X-ray CT images and real bronchoscopic video sequences. The experimental results showed good continuous-tracking performance without the use of any positional sensor.
1 Introduction
A bronchoscope is a tool for observing the inside of the bronchus and for performing biopsies around suspicious regions. A typical video bronchoscope consists of a manipulator that a physician holds and a flexible tube equipped with a tiny video camera, a light source, and a working channel. The physician watches a TV monitor that displays the video images taken by the camera installed at the tip, and proceeds to the desired point where the biopsy is performed. Due to the progress of CT imaging devices, it is possible to detect lung nodules at a very early stage, especially nodules located in the peripheral areas of the lung. Very- or ultra-thin bronchoscopes are now available and enable biopsies of such lung cancers. However, since the bronchus has a complex tree structure with
many branches, it is very easy for a physician to become disoriented. Also, physicians have to pay careful attention to important organs, such as the aorta, to avoid severely damaging them. The development of a bronchoscope guidance system that navigates a physician to a target point during a bronchoscopic examination is therefore strongly desired. Although several miniature magnetic sensors that can be installed at the tip of a bronchoscope are available, the outputs of these sensors are not very precise due to the effect of ferromagnetic materials in examination rooms. There are several reports on overcoming these problems using bronchoscope tracking based on image registration. Even if a very precise magnetic sensor were available, its output would be affected by breathing or heartbeat motion; this means that the sensor outputs will shift even when the bronchoscopic images do not change. Reference [1] describes an attempt to guide a bronchoscope using a miniature magnetic sensor; however, the navigation accuracy is not very good (15 mm at the right upper lobe). On the other hand, there are some reports on the estimation of bronchoscope motion based on image registration between real bronchoscopic (RB) images and virtual bronchoscopic (VB) images derived from CT images. Bronchoscope tracking methods based on image registration are quite robust against breathing or heartbeat motion. Reference [5] estimates bronchoscope motion by finding the camera parameters of the VB system that give the VB image most similar to the current RB image; however, this does not work well if the bronchoscope shows large motion between two consecutive frames. Reference [2] proposed an image registration technique that uses bifurcation information of the bronchus, but it is hard to register RB and VB images around positions where no bifurcation structure is observed. Higgins et al. proposed a method for registering RB and VB images using normalized mutual information [3]. Reference [4] tries to create VB images that are very similar to RB images by recovering the BRDF from registered RB and VB images. Nagao et al. presented a method that predicts bronchoscope motion using the Kalman filter [8]; however, the image registration can still be trapped in local maxima. To solve these problems, this paper proposes a method for bronchoscope tracking based on image registration using multiple initial guesses (starting points). These initial guesses are obtained from the bronchoscope motion prediction results calculated with the Kalman filter. We also show a method for overlaying anatomical names on RB images using the estimated bronchoscopic camera motion.
2 Method

2.1 Overview
The inputs of the proposed method are a set of CT images and a bronchoscopic video of the same patient. The output is a sequence of estimated bronchoscope camera positions and orientations represented in the CT coordinate system (= the VB coordinate system). Bronchoscope tracking is formulated as a search process that finds the camera position and orientation of the VB system that generates the VB image most similar to the current RB frame. In
this search process, multiple starting points are used to avoid falling into local maxima. Figure 1 shows the processing flow.
2.2 Camera Parameter
There are two types of camera parameters: (a) intrinsic and (b) extrinsic. The former include the viewing angle, the offset of the image center from the optical axis, and the distortion parameters. Since these parameters are fixed for each bronchoscopic camera, we obtain them using Tsai's calibration method prior to the bronchoscopic examination. The latter define the camera position and orientation. Here, we denote the extrinsic camera parameter that generates a VB image $V$ corresponding to the RB image $B_k$ at frame $k$ as

$$Q_k = \begin{pmatrix} R_k & t_k \\ \mathbf{0}^T & 1 \end{pmatrix}, \tag{1}$$

where $R_k$ and $t_k$ represent a rotation matrix and a translation vector, respectively. The transformation $Q_k$ maps the camera coordinate system to the CT coordinate system: at the $k$-th frame, the origin of the camera coordinate system is located at $t_k$ and its orientation is represented by $R_k$ in the CT coordinate system. The extrinsic camera parameter at the $(k+1)$-th frame is represented as

$$Q_{k+1} = Q_k \, \Delta Q_{k+1} = Q_k \begin{pmatrix} \Delta R_{k+1} & \Delta t_{k+1} \\ \mathbf{0}^T & 1 \end{pmatrix}, \tag{2}$$

where $\Delta R_{k+1}$ and $\Delta t_{k+1} = (\Delta t_x, \Delta t_y, \Delta t_z)^T$ are a rotation matrix and a translation vector defined in the camera coordinate system of the $k$-th frame; they describe the change of the camera position and orientation from the $k$-th to the $(k+1)$-th frame. For simplicity, we write a matrix $Q$ composed of a rotation matrix $R$ and a translation vector $t$ as $Q = (R, t)$. Estimation of the RB camera motion is equivalent to finding the sequence of VB camera parameters that generates VB images corresponding to the RB images. Therefore, RB camera tracking is done by obtaining a sequence of bronchoscope motions $\Delta Q_{k+1} = (\Delta R_{k+1}, \Delta t_{k+1})$.
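The composition in Eq. (2) is ordinary homogeneous-matrix multiplication. The following minimal NumPy sketch (our own illustration, not the authors' code; the function names are ours) shows how $Q_{k+1}$ is obtained by right-multiplying $Q_k$ with a motion expressed in the camera frame of frame $k$:

```python
# Minimal sketch of Eqs. (1)-(2): composing extrinsic camera parameters
# as 4x4 homogeneous matrices. Illustrative only.
import numpy as np

def make_Q(R, t):
    """Build the 4x4 homogeneous matrix Q = (R, t) of Eq. (1)."""
    Q = np.eye(4)
    Q[:3, :3] = R
    Q[:3, 3] = t
    return Q

def advance(Q_k, dR, dt):
    """Q_{k+1} = Q_k * dQ_{k+1}, with the motion dQ expressed in the
    camera coordinate system of frame k (Eq. (2))."""
    return Q_k @ make_Q(dR, dt)

# Example: a camera at the CT origin moving 2 mm along its own viewing axis.
Q0 = make_Q(np.eye(3), np.zeros(3))
Q1 = advance(Q0, np.eye(3), np.array([0.0, 0.0, 2.0]))
print(Q1[:3, 3])  # camera position in CT coordinates: [0, 0, 2]
```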
2.3 Camera Motion Tracking
Initial Setup. For the first frame of the RB video, the initial starting point of the search is given manually by comparing the first frame of the RB video with VB images while changing $R$ and $t$.

Motion Prediction. We predict the RB camera position and orientation using the Kalman filter [8]. RB motion is modeled by its velocity and acceleration; we assume that the bronchoscope undergoes constant-acceleration motion over a very short period. We input the positions of the bronchoscope at the 1st, ..., $k$-th frames and obtain the predicted position at the $(k+1)$-th frame.
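For readers unfamiliar with this predictor, the sketch below shows one possible constant-acceleration Kalman filter of the kind described above. The state layout, frame interval, and noise covariances are illustrative assumptions on our part, not values taken from [8]:

```python
# Hedged sketch of a constant-acceleration Kalman prediction step.
# dt, Q and R below are assumed values, not from the paper.
import numpy as np

dt = 1.0 / 30.0           # assumed video frame interval
I3 = np.eye(3)
Z3 = np.zeros((3, 3))
# State x = (position, velocity, acceleration), each a 3-vector.
F = np.block([[I3, dt * I3, 0.5 * dt**2 * I3],
              [Z3, I3, dt * I3],
              [Z3, Z3, I3]])
H = np.hstack([I3, np.zeros((3, 6))])   # only the position is observed
Q = 1e-3 * np.eye(9)                    # process noise (assumed)
R = 1e-1 * np.eye(3)                    # measurement noise (assumed)

def kalman_step(x, P, z):
    """One predict/update cycle; z is the camera position estimated by
    image registration at frame k. Returns the new state, covariance,
    and the predicted position t^_{k+1}."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(9) - K @ H) @ P_pred
    return x_new, P_new, (F @ x_new)[:3]  # last term: predicted next position
```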
Fig. 1. Processing procedure: rough estimation (prediction of $\Delta\hat{t}_{k+1}$ by the Kalman filter and computation of the initial search positions) followed by precise estimation (similarity calculation between $B_{k+1}$ and $V$, with the observation parameter $\Delta Q_{k+1}$ changed until the similarity is maximal)

Fig. 2. Allocation of initial points for image registration
Calculation of Multiple Initial Guesses for Image Registration. We calculate multiple starting points for image registration using the predicted camera position. First, we obtain the predicted camera position $\hat{t}_{k+1}$ for the current frame. Then, the initial positions for the search in the image registration process are computed as the predicted position $\hat{t}_{k+1}$ and a set of points $N_k$ around it. Here, $N_k$ is defined as

$$N_k = \left\{ \hat{t}_{k+1} + S\,v \;\middle|\; v = a\,e_k^x + b\,e_k^y + c\,e_k^z,\; |a|+|b|+|c| = 3,\; a, b, c = \pm 1 \right\}, \tag{3}$$

where the vectors $e_k^x$, $e_k^y$, and $e_k^z$ are the basis vectors that represent the orientation of the VB camera at the $k$-th frame. The matrix $S$ is calculated as

$$S = \begin{pmatrix} |\Delta\hat{t}_{k+1} \cdot e_k^x| & 0 & 0 \\ 0 & |\Delta\hat{t}_{k+1} \cdot e_k^y| & 0 \\ 0 & 0 & |\Delta\hat{t}_{k+1} \cdot e_k^z| \end{pmatrix}, \tag{4}$$

where $\Delta\hat{t}_{k+1} = \hat{t}_{k+1} - t_k$ is the vector from the camera position $t_k$ at the $k$-th frame to the predicted camera position $\hat{t}_{k+1}$ at the $(k+1)$-th frame. The initial camera positions defined by Eq. (3) are the vertices of the rectangular box illustrated in Fig. 2, together with $\hat{t}_{k+1}$.
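A direct transcription of Eqs. (3) and (4) is straightforward; the sketch below (our own illustration, with hypothetical function and variable names) returns the predicted position plus the eight scaled vertices:

```python
# Sketch of Eqs. (3)-(4): the eight vertices of the box around the
# predicted position, scaled by the predicted displacement.
import numpy as np
from itertools import product

def initial_guesses(t_hat_next, t_k, e_x, e_y, e_z):
    """Return t^_{k+1} plus the eight scaled vertices of N_k.
    e_x, e_y, e_z: camera basis vectors at frame k."""
    dt_hat = t_hat_next - t_k                    # predicted movement
    E = np.stack([e_x, e_y, e_z], axis=1)        # basis vectors as columns
    S = np.diag(np.abs(E.T @ dt_hat))            # Eq. (4)
    guesses = [t_hat_next]
    for a, b, c in product((1, -1), repeat=3):   # all a, b, c = +/-1
        v = a * e_x + b * e_y + c * e_z
        guesses.append(t_hat_next + S @ v)       # Eq. (3)
    return guesses
```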
Multiple Image Registration. We perform image registration between the RB and VB images using the initial guesses computed in the previous step. As the initial guess for the camera orientation, we use the camera orientation $R_k$ of the previous frame, so the initial starting point of camera parameter estimation in the image registration process is described as $\hat{Q}_{k+1} = (R_k, t)$, $t \in N_k$. We perform the image registration process for each initial starting point by

$$\Delta Q^*_{k+1} = \arg\max_{\Delta Q} E\big(B_{k+1}, V(\hat{Q}_{k+1} \Delta Q)\big), \tag{5}$$

where $E(B, V)$ is the image similarity measure between $B$ and $V$; here, we use the selective image similarity measure presented in [6]. The Powell method is used to solve this maximization. After performing the image registration process for each initial starting point, we select the camera parameter change $\Delta Q^*_{k+1}$ that gives the highest image similarity as the final camera motion estimate $\Delta Q_{k+1}$. The camera parameter at the $(k+1)$-th frame is then calculated as $Q_{k+1} = \hat{Q}_{k+1} \Delta Q_{k+1}$. Since more than three frames are required to obtain a motion prediction from the Kalman filter, we use $\hat{Q}_{k+1} = Q_k$ as the initial guess and perform image registration once per frame if $k < 3$. After image registration, we proceed to the next frame.
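The following sketch illustrates the multi-start search of Eq. (5) using SciPy's Powell optimizer. The `render_vb` and `similarity` callables stand in for the VB renderer and the selective similarity measure of [6]; these placeholders and the 6-parameter pose encoding are our assumptions, not the authors' implementation:

```python
# Sketch of the multi-start maximization of Eq. (5). Illustrative only.
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation

def delta_Q(p):
    """6 parameters (3 Euler angles [rad], 3 translations [mm]) -> 4x4 dQ."""
    M = np.eye(4)
    M[:3, :3] = Rotation.from_euler("xyz", p[:3]).as_matrix()
    M[:3, 3] = p[3:]
    return M

def track_frame(B_next, R_k, start_points, render_vb, similarity):
    """One tracking step: run Eq. (5) from every start point, keep the best.
    render_vb(Q) -> VB image and similarity(B, V) -> float are placeholders."""
    best_val, best_Q = -np.inf, None
    for t0 in start_points:
        Q_hat = np.eye(4)                       # initial guess (R_k, t0)
        Q_hat[:3, :3], Q_hat[:3, 3] = R_k, t0
        cost = lambda p, Q0=Q_hat: -similarity(B_next,
                                               render_vb(Q0 @ delta_Q(p)))
        res = minimize(cost, np.zeros(6), method="Powell")
        if -res.fun > best_val:
            best_val = -res.fun
            best_Q = Q_hat @ delta_Q(res.x)     # Q_{k+1} = Q^_{k+1} dQ_{k+1}
    return best_Q
```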
2.4 Anatomical Name Display During Bronchoscopy
It is possible to overlay navigation information on RB images once the RB camera motion is tracked. Here, we overlay the anatomical names of the bronchial branches currently observed in the RB images. To achieve this, we first extract the bronchus region from the CT images. Next, a tree structure is extracted from the segmented bronchus region, and anatomical names are automatically assigned to the bronchial branches using the method presented in [7]. The branches visible in an RB image are determined using the estimated position and orientation of the RB camera and the tree structure. If visible branches are identified, we overlay their anatomical names on the RB image.
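As an illustration of the visibility test, the hypothetical sketch below projects labelled branch points through a pinhole model of the calibrated RB camera and keeps the names that fall inside the image. The depth cutoff, data layout, and function names are our assumptions, not the paper's method:

```python
# Illustrative sketch only: deciding which labelled branches are visible.
import numpy as np

def visible_branch_names(branches, Q, K, image_size, max_depth=50.0):
    """branches: list of (name, center_in_CT). Q: 4x4 camera-to-CT matrix.
    K: 3x3 intrinsic matrix (e.g., from Tsai calibration).
    Returns (name, (u, v)) pairs for branches that project into the image."""
    ct_to_cam = np.linalg.inv(Q)
    labels = []
    for name, p in branches:
        pc = ct_to_cam @ np.append(p, 1.0)      # point in camera coordinates
        if not (0.0 < pc[2] < max_depth):       # behind the camera or too far
            continue
        uvw = K @ pc[:3]
        u, v = uvw[:2] / uvw[2]                 # perspective division
        if 0 <= u < image_size[0] and 0 <= v < image_size[1]:
            labels.append((name, (u, v)))
    return labels
```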
3 Experiments and Discussion
We applied the proposed method to nine pairs of CT images and bronchoscopic videos (eight real patients and one phantom). The acquisition parameters of the CT images were: 512 × 512 pixels, 72–341 slices, 1.25–5.0 mm slice thickness, and 0.5–2.0 mm reconstruction pitch. The RB videos were taken with a BF-240 bronchoscope (Olympus, Tokyo, Japan) and recorded onto digital video tape. All registration tasks were performed off-line due to computation-time limitations. We used a personal computer with dual 3.6 GHz Intel Xeon processors running the Linux operating system. We also performed bronchoscope tracking using the previous method (a single initial starting point, without motion prediction by Kalman filtering) and using multiple initial guesses without motion prediction; in the latter case, the camera position of the previous frame is used as $\hat{t}_{k+1}$. Processing one RB frame took 8.70 seconds, while the previous method (a single initial guess) took 1.01 seconds. We counted the number of successfully tracked frames by visual inspection. The tracking results are summarized in Table 1 and illustrated in Fig. 3.
Table 1. Results of bronchoscope motion tracking. The last three columns give the number of successive frames tracked continuously.

| Case | Path | Max movement in a sec. [mm] | Number of frames | Previous method | Proposed w/o KF | Proposed w/ KF |
|------|------|------|------|------|------|------|
| 1 | A | 35.9 | 993 | 993 | 918 | 993 |
| 1 | B | 19.5 | 690 | 690 | 690 | 690 |
| 1 | C | 34.4 | 379 | 82 | 379 | 379 |
| 1 | D | 28.5 | 279 | 279 | 279 | 279 |
| 2 | A | 35.0 | 430 | 430 | 430 | 430 |
| 3 | A | 68.4 | 1365 | 278 | 278 | 1365 |
| 3 | B | 43.6 | 965 | 965 | 965 | 965 |
| 3 | C | 40.6 | 865 | 679 | 859 | 865 |
| 4 | A | 26.2 | 300 | 82 | 82 | 82 |
| 5 | A | 19.1 | 400 | 400 | 400 | 400 |
| 5 | B | 50.6 | 800 | 487 | 477 | 410 |
| 5 | C | 32.9 | 524 | 353 | 515 | 524 |
| 6 | A | 25.5 | 225 | 140 | 154 | 190 |
| 6 | B | 31.9 | 530 | 457 | 457 | 219 |
| 7 | A | 14.2 | 2000 | 339 | 412 | 2000 |
| 8 | A | 36.9 | 500 | 225 | 354 | 493 |
| 8 | B | 20.5 | 205 | 205 | 205 | 205 |
| 9 | A | 32.8 | 1600 | 467 | 1600 | 1600 |
| Total | | | 13050 | 7551 | 9454 | 12089 |
We also tried to display anatomical names on RB images using the image registration results; Figure 4 shows examples of anatomical name display on RB images. From the results in Table 1, we can see that the proposed method greatly improved tracking performance compared with the previous method. The total number of successfully tracked frames increased by 4538 frames (roughly equivalent to 150 seconds of video). The improvement was especially large in Case 3A, where the bronchoscope showed large movement between successive frames (see "Max movement in a sec." in Table 1). The previous method fell into local maxima during the search; introducing the search from multiple initial starting points avoided these local maxima and yielded good tracking results. Motion prediction for setting up the initial starting points also contributed to this improvement. As shown in Table 1, tracking still fails in some cases (e.g., Case 1A or Case 3B). In these cases, bubbles or large bronchial deformations caused by the heartbeat were observed in the RB images, and neither the previous nor the proposed method could track the bronchoscope motion appropriately. One solution would be to pause tracking when such organ deformations or bubbles are detected in the RB images and to resume when they disappear. Since we used multiple initial starting points in the search process, the computation time increased from 1.01 to 8.70 seconds per frame. However, each search is executed independently, so it would be straightforward to distribute the searches over a PC cluster to reduce computation time.
Fig. 3. Tracking results of the proposed method: (a) Case 1C, (b) Case 2A, (c) Case 3A, (d) Case 7A

Fig. 4. Anatomical name overlay on RB images using image registration results
4 Conclusion
This paper presented a method for bronchoscope tracking that utilizes image registration from multiple starting points calculated from motion prediction results. Image registration is performed from each of the calculated starting points to avoid being trapped in local maxima.
The experimental results showed a great improvement in tracking performance. Future work includes: (a) the use of PC clusters to achieve real-time RB camera motion tracking, (b) detection of frames in which bubbles or large bronchial deformations are observed, and (c) further study of the allocation of initial starting points for image registration.

Acknowledgements. The authors wish to thank our colleagues for their suggestions and advice. Parts of this research were supported by a Grant-In-Aid for Scientific Research from the Ministry of Education under the 21st-Century COE program, a Grant-In-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science, and Technology and the Japan Society for the Promotion of Science, a Grant-In-Aid for Cancer Research from the Ministry of Health and Welfare, and the Kayamori Foundation of Informational Science Advancement.
References

1. A. Schneider, H. Hautmann, H. Barfuss, T. Pinkau, F. Peltz, H. Feussner, A. Wichert, "Real-time image tracking of a flexible bronchoscope," Proc. of CARS 2004, pp. 753–757, 2004.
2. I. Bricault, G. Ferretti, and P. Cinquin, "Registration of real and CT-derived virtual bronchoscopic images to assist transbronchial biopsy," IEEE Trans. Med. Imaging, vol. 17, pp. 703–714, 1998.
3. J. P. Helferty and W. E. Higgins, "Technique for registering 3D virtual CT images to endoscopic video," Proc. of ICIP (International Conference on Image Processing), pp. 893–896, 2001.
4. A. J. Chung, F. Deligianni, A. Shah, A. Wells, and G. Z. Yang, "VIS-a-VE: Visual augmentation for virtual environments in surgical training," Data Visualization 2005, Eurographics / IEEE VGTC Symposium on Visualization, pp. 101–108, 2005.
5. K. Mori, D. Deguchi, J. Sugiyama, Y. Suenaga, J. Toriwaki, C. R. Maurer Jr., H. Takabatake, and H. Natori, "Tracking of a bronchoscope using epipolar geometry analysis and intensity-based image registration of real and virtual endoscopic images," Medical Image Analysis, vol. 6, pp. 321–336, 2002.
6. D. Deguchi, K. Mori, Y. Suenaga, J. Hasegawa, J. Toriwaki, H. Takabatake, H. Natori, "New image similarity measure for bronchoscope tracking based on image registration," Proc. of MICCAI 2003, Lecture Notes in Computer Science, vol. 2878, pp. 309–406, 2003.
7. K. Mori, S. Ema, T. Kitasaka, Y. Mekada, I. Ide, H. Murase, Y. Suenaga, H. Takabatake, M. Mori, H. Natori, "Automated nomenclature of bronchial branches extracted from CT images and its application to biopsy path planning in virtual bronchoscopy," Proc. of MICCAI 2005, Lecture Notes in Computer Science, vol. 3750, pp. 854–861, 2005.
8. J. Nagao, K. Mori, T. Enjouji, D. Deguchi, T. Kitasaka, Y. Suenaga, J. Hasegawa, J. Toriwaki, H. Takabatake, H. Natori, "Fast and accurate bronchoscope tracking using image registration and motion prediction," Proc. of MICCAI 2004, Lecture Notes in Computer Science, vol. 3217, pp. 551–558, 2004.
2D/3D Registration for Measurement of Implant Alignment After Total Hip Replacement

Branislav Jaramaz 1,2 and Kort Eckman 2

1 The Western Pennsylvania Hospital, Pittsburgh, Pennsylvania, USA
2 Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
[email protected],
[email protected]
Abstract. Measurements of cup alignment after total hip replacement (THR) surgery are typically performed on postoperative pelvic radiographs. Radiographic measurement of cup orientation depends on the position and orientation of the pelvis on the X-ray table, and its variability could introduce significant measurement errors. We have developed a tool to accurately measure 3D implant orientation from postoperative antero-posterior radiographs by registering to preoperative CT scans. The purpose of this study is to experimentally and clinically validate the automatic CT/X-ray matching algorithm by comparing the X-ray based measurements of cup orientation with direct 3D measurements from postoperative CT scans. The mean measurement errors (± stdev) found in this study were 0.4°±0.8° for abduction and 0.6°±0.8° for version. In addition, radiographic pelvic orientation measurements demonstrated a wide range of inter-subject variability, with pelvic flexion ranging from –5.9° to 11.2°. Keywords: radiograph, registration, mutual information, hip replacement, THR, THA.
1 Introduction

The standard postoperative follow-up procedure for total hip arthroplasty involves the assessment of implant orientation from antero-posterior (AP) radiographs to assure the surgeon that the implant is correctly aligned. Improper placement can lead to impingement and dislocation problems, increased polyethylene wear, and implant loosening. The long-term follow-up procedure typically involves obtaining a series of radiographs over a period of years to detect any changes in implant alignment or in the surrounding bony tissue. Accurate implant orientation measurements provide useful feedback to the surgeon who, based on this information, may be able to detect a systematic bias in operative technique and adjust it accordingly. Information about implant orientation is also used to determine the need for revision surgery and other interventions. The standard postoperative follow-up procedure involves the acquisition of 2D AP radiographs; the orientation of the implant is then measured on the resultant radiograph.
In the case of computer-assisted surgery, an accurate tool to measure postoperative implant orientation could serve to test overall system accuracy and consistency of performance. Computer-assisted surgery typically provides the surgeon with measurements of cup orientation (abduction and version) relative to the anterior pelvic plane (APP). This plane, defined by the anterior superior iliac spines and the pubic tubercles, provides a reliable and repeatable anatomic reference system for the pelvis. However, the same reference system cannot be established in AP radiographs, and some definition of the transverse axis, typically a line connecting the teardrops, is used instead. Consequently, the radiographic measurements of cup orientation may differ from those recorded intraoperatively. Furthermore, the accuracy of radiographic measurement depends on the position of the X-ray source relative to the film as well as on the orientation of the pelvis at the time the X-ray was taken. It is commonly assumed that for the AP radiograph the anterior pelvic plane is parallel to the film plane. Unfortunately, this is not guaranteed by the standard radiographic protocol: pelvic flexion can vary considerably even for the same patient [3]. These pelvic pose variations negatively impact implant measurement accuracy. By co-registering a radiograph with both a synthetic X-ray, which is generated from a CT scan of the patient, and a projection of the implant CAD model, 2D/3D registration recovers the true pose of the implant with respect to the patient's anatomy. A single CT scan, preoperative or postoperative, can be used to analyze an entire series of X-rays. The purpose of the present study is to validate, both experimentally and clinically, a new CT/X-ray matching algorithm based on mutual information. During the past decade, mutual information has received considerable attention as an accurate registration tool. Nonetheless, mutual information usually contains many local maxima that affect the robustness of registration. Moreover, since mutual information can also produce erroneous global maxima, its use is generally restricted to local registration tasks. Considerable research has been directed at improving the measure. Spatial information is simply ignored by direct application of mutual information, as it is invariant to pixel/voxel reshuffling (except for values that result from interpolation). Methods that modify mutual information to incorporate spatial information into the measure have demonstrated improved robustness. One of these methods is regional mutual information (RMI), which extends the dimensionality of the histograms [7]. Another method incorporates spatial information by multiplying conventional mutual information by a function that compares the gradient vectors of the images [6]. In this study, we used conventional mutual information and gradient comparisons, along with pixel masking to exclude additional independent rigid objects (the femurs) and prevent them from affecting the similarity measure. Erroneous maxima are avoided by coarse initial alignment of the implants and CT scans by the user. The validation is performed by comparing the measurements of cup orientation obtained with the 2D/3D registration software against direct CT measurements. The same clinical data were previously examined using interactive manual alignments of the synthetic and real radiographs [1]. In this study we report the results obtained with a fully automatic registration procedure.
Fig. 1. Reference systems in the synthetic x-ray projection environment. The CT scan (CT) and the cup model (Cup) are placed relative to the x-ray film (Radiograph) in order to create a synthetic projection. The anatomic reference plane of the pelvis (APP) is specified in the CT scan.
2 Materials and Method

2.1 2D/3D Matching Protocol

An original CT/X-ray image-matching algorithm was developed to measure the pelvis orientation and acetabular cup position from AP radiographs using the following protocol:

1. An anatomic reference system for the pelvis, $^{CTpelvis}T_{APP}$, is defined in the CT scan using the CT-based THR planning software. ($T$ denotes a standard 4×4 rigid transformation matrix.)
2. A real pelvic AP radiograph is digitized (image A). All radiographs were taken using the standard 40" (1016 mm) source-to-film distance. The X-ray source locations were marked on the film using small metallic washers.
3. A synthetic pelvic AP radiograph is generated (image B) from the CT scan, with known (X-ray source position, film-to-source distance) and assumed (pose of the CT scan within the X-ray system: $^{radiograph}T_{CTpelvis}$) projection parameters (Fig. 1).
4. Image similarity is evaluated using a combination of mutual information [8], [9] and gradient comparisons [6]; a sketch of this similarity computation is given after the protocol. For two images A and B, with a joint probability mass function p(a,b) and marginal probability mass functions p(a) and p(b), the mutual information I(A;B) is defined as:
$$I(A;B) = \sum_{a \in A}\sum_{b \in B} p(a,b)\,\log\frac{p(a,b)}{p(a)\,p(b)}. \tag{1}$$
Mutual information can be rewritten using marginal and joint entropies in the following manner:

$$I(A;B) = H(A) + H(B) - H(A,B). \tag{2}$$

5. The two images (real and synthetic radiograph) are registered by maximizing the similarity function. In order to minimize the effects of additional rigid bodies, the femurs and implant components are masked within the procedure; the masked pixels are excluded from the gradient comparison and mutual information objective function. Gross initial alignments of the implants and CT scans are provided by the user.
6. The pose of the CT scan in the projection space is automatically manipulated until the best match between the real and the synthetic X-ray is found (Fig. 2).
7. A virtual projection of the appropriate CAD implant model is generated for a given position and spatial orientation of the cup. The initial coarse alignment is specified by the user. The cup position and orientation relative to the X-ray film, $^{radiograph}T_{cup}$, is iteratively modified using another MI-based automatic method until the virtual projection best matches the projection of the cup in the real radiograph (Fig. 3).
8. Cup orientation relative to the anatomic reference system of the pelvis, $^{APP}T_{cup}$, is calculated:

$$^{APP}T_{cup} = {}^{APP}T_{CTpelvis} \cdot {}^{CTpelvis}T_{radiograph} \cdot {}^{radiograph}T_{cup}. \tag{3}$$

9. In addition, pelvic flexion $^{radiograph}T_{APP}$ (the inclination of the APP plane relative to the radiographic plane) is calculated from:

$$^{radiograph}T_{APP} = {}^{radiograph}T_{CTpelvis} \cdot {}^{CTpelvis}T_{APP}. \tag{4}$$
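A minimal sketch of the masked mutual-information term used in step 4 follows. Computing I(A;B) from a joint histogram via np.histogram2d, the bin count, and the mask convention are our choices, not details from the paper:

```python
# Sketch of Eqs. (1)-(2): masked mutual information from a joint histogram.
import numpy as np

def mutual_information(A, B, mask, bins=64):
    """A, B: images as float arrays of the same shape; mask: True where
    pixels are excluded (femurs, implant). Returns I(A;B) over the
    unmasked pixels."""
    a = A[~mask].ravel()
    b = B[~mask].ravel()
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint / joint.sum()                  # joint probability p(a,b)
    p_a = p_ab.sum(axis=1)                      # marginal p(a)
    p_b = p_ab.sum(axis=0)                      # marginal p(b)
    nz = p_ab > 0                               # avoid log(0)
    return np.sum(p_ab[nz] * np.log(p_ab[nz] / np.outer(p_a, p_b)[nz]))
```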
Rendering synthetic radiographs from CT data sets can be a computationally intensive process, and registering real and synthetic radiographs requires the generation of numerous synthetic radiographs. In order to achieve registration in a timely manner, the radiographs were synthesized using custom shader programs running on graphics hardware; specifically, an nVidia GeForce 7800GT was used for X-ray synthesis. This allowed fast rendering of high-dynamic-range, floating-point-precision images and resulted in registration times on the order of 60 seconds.

2.2 Direct Measurements of Cup Alignment from Postoperative CT

To define the anatomic reference system of the pelvis, the anterior pelvic plane (APP) was defined in the CT-scan space. The two anterior superior iliac spines and the pubic tubercles were identified manually by inspecting the three orthogonal cross sections of the CT scan. Because of the presence of the THR implants in the scan and the resulting imaging artifact, it was not possible to segment these landmarks automatically. Images were then segmented with an intensity threshold set to a level at which only the metallic cup appears in the image. This selection was performed in
order to eliminate the image reconstruction artifacts and allow the generation of an artifact-free, segmented surface model of the acetabular cup. Renderings of a 3D CAD model of the cup were then generated, displayed on the computer screen, and overlaid onto the CT cross sections and the segmented surface model of the cup. This model was manipulated until the best possible match was found in all views. The final orientation was recorded as cup abduction and version in the anatomic reference system of the pelvis. The differences between the 2D/3D registration and direct CT-based measurements of cup abduction and version were defined as the abduction and version errors in the 3D anatomic reference frame and used to validate the method. The pelvic orientation was measured with the 2D/3D registration method and reported in terms of pelvic flexion relative to the plane of the X-ray film.
Fig. 2. Registration of the pelvis: (a) real radiograph (b) simulated radiograph corresponding to the solved pose of the CT in the x-ray space (c) real and simulated radiographs overlaid
2.3 Laboratory Experiment

The procedure was first tested in the laboratory with a hard plastic phantom pelvis in which an acetabular cup was implanted. The model was mounted on an acrylic sheet in order to increase rigidity and to improve the definition of the anterior pelvic plane. Three small metallic fiducial markers were fitted to the acrylic plate in order to create a well-defined pelvic coordinate system. Two of these fiducials were placed near the anterior iliac spine points; the third was mounted between the left and right pubic tubercle points. A CT scan, with an in-plane resolution of 0.781 mm/pixel and an inter-slice distance of 1.25 mm, was then obtained. A CT-based Preoperative Planner for THR [4] was used to determine the actual acetabular implant orientation from the CT scan. After measuring the acetabular implant orientation, the implant was digitally removed from the CT scan in order to simulate a preoperative CT and to make the alignment process more realistic. The acrylic sheet and the fiducials were digitally removed as well. The OptoTrak system (Northern Digital, Inc., Ontario, Canada) was used to verify the Planner's cup orientation measurement. The locations of the fiducials and the location and orientation of the acetabular cup were digitized relative to a pelvic tracker using a series of pointer probe collections. The abduction/version of the cup differed from the Planner measurements by less than 0.2° in abduction and by less than 0.3° in version.
Fig. 3. Registration of the cup: (a) real radiograph (b) simulated projection for the solved pose of the cup (c) the real and simulated images overlaid
Four AP X-rays, one for training and three for testing, were taken of the pelvic assembly using the standard 40" source-to-film distance. A small metallic washer was used to mark the X-ray source position on the film. Three of these X-rays were taken with the pelvis in a near-neutral pose and one with a pronounced pelvic flexion of −11°. The source position ranged from above the bottom of the sacrum to above the top of the sacrum. A group of ten volunteers used the 2D/3D registration software to measure implant pose. The measurements were taken in the following manner. First, the user roughly aligned the CT-based synthetic X-ray with the real X-ray. The user then instructed the system to perform an automatic CT/radiograph alignment using the procedure outlined in the previous subsection. Finally, the users were able to apply additional manual manipulations and the automatic method until they were satisfied with the match.

2.4 Clinical Experiment

This analysis was performed using the postoperative pelvic X-rays and CT scans of 19 patients who underwent bilateral (but not simultaneous) primary total hip replacement (THR) guided by surgical navigation [2]. The included patients had two preoperative pelvic CT scans, one before each intervention. The second of the two scans imaged the first total hip prosthesis, and this second scan was included in this study as a postoperative CT. There were 11 women and 8 men with a mean age of 61 years (range, 49–82 years). The same uncemented, metal-backed type of acetabular implant (Trilogy, Zimmer, Warsaw, IN) was used in all cases. Antero-posterior (AP) supine pelvic radiographs obtained closest to the date of the second CT scan were used for the radiographic measurements.
3 Results

3.1 Experimental Measurements

For the phantom study, the average error between the 2D/3D registration measurements and the CT Planner measurements was −0.05° in abduction and −0.37° in version, with standard deviations of 0.16° and 0.94° respectively.
The range of values was −0.49° to 0.21° for abduction and −3.10° to 1.68° for version. A paired t-test analysis (Microsoft Excel 2000) was performed to determine whether there was any significant difference (α = 0.017 using the Bonferroni multiple comparisons correction) in the results based on the X-ray image used. There was no significant difference between any pair of images.

3.2 Clinical Measurements

3.2.1 Direct CT Measurements of Cup Alignment and Pelvis Orientation
Measured directly from the postoperative CT scans of the 19 patients, the mean (± standard deviation) CT cup abduction was 52° ± 5° and ranged from 43° to 59°. The mean acetabular cup version was 18° ± 7° (range −0.5° to 29°). Pelvic flexion angles in the CT scans ranged from −14° to 15°, with a mean of 1.6° and a standard deviation of 6.5°. This range represents the anatomic inter-subject variability of pelvic flexion angles.

3.2.2 Cup Position and Pelvic Orientation from AP X-Rays Using CT Matching
The mean (± standard deviation) acetabular cup abduction was 52.9° ± 4.8°, ranging from 46.0° to 60.1°. The cup version ranged from 3.1° to 29.7°, with a mean (± standard deviation) of 18.7° ± 6.6°. The measurement differences, compared with the CT measurements, were 0.4° ± 0.8° (mean ± standard deviation) for cup abduction and 0.6° ± 0.79° for cup version. The individual measurements are shown in Figure 4. The maximum absolute errors were 2.15° for abduction and 2.03° for version. The mean pelvic flexion was 3.9° ± 4.9° and its values ranged from −5.9° to 11.2°.
Fig. 4. Differences in measured cup orientation between direct CT measurements and the 2D/3D registration method
4 Discussion

Using 2D/3D registration with the automatic mutual information method, the three-dimensional anatomic orientation of the acetabular implant can be determined accurately from a single AP X-ray. Implant version is measured with slightly lower accuracy than abduction, but still within the practical range. Version accuracy might be improved by adapting the 2D/3D registration algorithm to work with multiple X-ray views simultaneously. Tissue noise, scatter noise, and a lower-resolution scanning protocol may reduce the accuracy of clinical measurements. The scatter produced by the metallic femoral head may also reduce the accuracy of the postoperative CT reference measurements. It is important to stress that the 2D/3D registration results for cup and pelvic orientation are all referenced anatomically. The differences between the 2D/3D values and the CT values for cup orientation were considered 2D/3D registration measurement errors. The mean errors (± stdev), 0.4° ± 0.8° for abduction and 0.6° ± 0.8° for version, are considered to be very good; this is an improvement over previously reported errors for manual matching. Although bilateral scans are used as the best practical ground truth, the error between the CT and 2D/3D registration is of the same order of magnitude as the variability in direct CT measurements. Therefore, in the clinical experiment, beyond stating the difference between the two measurement techniques, it is impossible to attribute the error to either of the modalities/measurement techniques. It takes between 1 and 2 minutes to complete both automatic registrations. It can therefore be concluded that the presented method satisfies both the accuracy and the performance requirements for use as a routine clinical tool. These measurements require CT data or some other form of 3D imaging. However, the algorithm requires just one pelvic CT scan, obtained preoperatively or postoperatively at any moment in time, which can then be used for multiple postoperative X-ray measurements throughout the patient's life. Nishihara and coworkers [5] recently reported a similar methodology. They used, however, a different approach to the CT/X-ray matching process, based on calculating and matching ratios between pelvic diameters measured on AP X-rays and the 3D pelvic CT surface model. The authors found that the supine pelvic flexion variation of the studied patient group (101 patients) was even higher than our findings, ranging from −37° to 30°. The difference is probably due to their larger study population. The sample size in our clinical study was too small to derive similar conclusions with sufficient significance. We intend to perform a larger clinical study in the future to examine the variability in patients' pelvic position during radiographic examination and its effect on the perceived cup orientation.
References

1. Blendea S, Eckman K, Jaramaz B, Levison TJ, DiGioia AM III: Measurements of acetabular cup position and pelvic spatial orientation after THA using CT/X-ray matching. Computer Aided Surgery, Volume 10, January 2005, 37–43.
2. DiGioia A, Jaramaz B, Blackwell M, et al.: The Otto Aufranc Award. Image guided navigation system to measure intraoperatively acetabular implant alignment. Clin Orthop 1998, Oct (355), 8–22.
3. Jaramaz B, DiGioia A, Blackwell M, Nikou C: Computer assisted measurement of cup placement in total hip replacement. Clin Orthop 354 (1998), 70–81.
4. Nikou C, Jaramaz B, DiGioia AM III, et al.: POP: Preoperative planning and simulation software for total hip replacement surgery. MICCAI 1999, (1999), 868–875.
5. Nishihara S, Sugano N, Nishii T, Ohzono K, Yoshikawa H: Measurements of pelvic flexion angle using three-dimensional computed tomography. Clin Orthop 411 (2003), 140–151.
6. Pluim JPW, Maintz JBA, Viergever MA: Image registration by maximization of combined mutual information and gradient information. IEEE Trans. Med. Imag., August 2000, Vol. 19, 809–814.
7. Russakoff DB, Tomasi C, Rohlfing T, Maurer CR Jr.: Image similarity using mutual information of regions. ECCV 2004, LNCS 3023, 2004, 596–607.
8. Shannon C: A mathematical theory of communication. Bell System Technical Journal, Vol. 27, July 1948, pp. 379–423, 623–656.
9. Viola P, Wells WM: Alignment by maximization of mutual information. In Proc. 5th Intl. Conf. Computer Vision, Cambridge, MA, June 1995, 16–23.
3D/2D Model-to-Image Registration Applied to TIPS Surgery

Julien Jomier 1, Elizabeth Bullitt 2, Mark Van Horn 2, Chetna Pathak 2, and Stephen R. Aylward 1

1 Kitware Inc., 28 Corporate Drive, Clifton Park, New York 12065, USA
2 Computer-Assisted Surgery and Imaging Laboratory, Department of Surgery, The University of North Carolina at Chapel Hill, Chapel Hill, 27599, USA
{julien.jomier, stephen.aylward}@kitware.com, {bullitt, mark vanhorn, chetna pathak}@med.unc.edu

Abstract. We have developed a novel model-to-image registration technique which aligns a 3-dimensional model of vasculature with two semi-orthogonal fluoroscopic projections. Our vascular registration method is used to intra-operatively initialize the alignment of a catheter and a preoperative vascular model in the context of image-guided TIPS (Transjugular, Intrahepatic, Portosystemic Shunt formation) surgery. Registration optimization is driven by the intensity information from the projection pairs at sample points along the centerlines of the model. Our algorithm shows speed, accuracy and consistency given clinical data.
1 Introduction
Endovascular surgery consists of inserting a catheter into a major artery or vein and guiding that catheter through a vascular network to a target region. The main difficulty with endovascular treatments is that they are guided by two 2-dimensional projection fluoroscopic images, which makes it difficult for the physician to visualize the needle and the target in 3D. This is especially true for Transjugular, Intrahepatic, Portosystemic Shunt (TIPS) surgery. During TIPS, the portal vein is physically disconnected from the hepatic vein, separated by the liver parenchyma prior to needle insertion, so when a contrast agent is injected it does not flow through the portal venous system. The target, the portal vein, is therefore not visible while the needle is pushed through the liver from the hepatic vein. The overall goal of the computer-augmented TIPS project [10] is to provide 3-dimensional visualizations to the surgeon, and thereby improve procedure accuracy and decrease procedure time. Before a TIPS procedure, a 3-dimensional model of the target vasculature is created from a previous CT/MR image using a ridge traversal technique [8]. During the procedure, the needle is tracked in real time [4] and the three-dimensional visualizations are provided by a stereo polarized projection display. The challenge is that, because of breathing, needle pressure and heartbeat, the liver moves as the needle is guided toward the target.
Therefore, a real-time registration is performed by tracking an ovoid balloon catheter lodged in the hepatic vein, and its movement is used to estimate the movement of the liver's vessels [11]. In the context of this project, it is critical to initialize the position and orientation of the 3-dimensional vascular model with respect to the balloon prior to any tracking. 3D/2D registration is one of the grand challenges in medical image analysis. From the literature, the approaches to this problem can be divided into three main categories: (a) model-to-model registration, in which external information about correspondences is added; (b) image-to-image optimization, where the 3D image is projected and the registration is solved by optimizing pixel match metrics; (c) model-to-image algorithms, where a 3D model is created from the image volume and the quality of the fit between the model and the target image is optimized. Liu et al. [9] have shown that by designating similar curves, i.e., blood vessels, in both the 3D image and digitally subtracted angiogram (DSA) images, an accurate alignment can be made. This technique, belonging to the model-to-model category, requires user interaction and is sensitive to manual picking. On the other hand, several image-to-image registration techniques have been presented and have shown promising results. Kerrien et al. [5], for instance, perform maximum intensity projection to simulate a 2D projected image from the volume. Others [6] have derived more complex registration metrics. Complete surveys of image-to-image similarity measures have also been published [1], [2]. Model-to-image techniques have also been used in sophisticated systems [3] where 3D fiducials are matched with their intensity representation in 2D projected images. The advantage of this technique is that it can be automated, unlike (a), and shows faster convergence than (b). Our method belongs to the last category (c) and registers a preoperative model of the vasculature with two semi-orthogonal fluoroscopic images. Next we present our methods; then we quantify speed and accuracy given clinical data.
2 Methods

In this section we describe the system in which our algorithm performs. First we detail the steps that must be performed to prepare for registration. Second we describe our registration algorithm.

2.1 Overview of the System
Model-to-image vascular registration requires a geometric representation of the patient’s vascular system. Prior to any TIPS operation, an MR or CT scan of the patient’s liver is acquired. Then, a vessel extraction technique [8] is used to segment the main vascular tree. This segmented vasculature constitutes our model and its geometric representation is described as a set of centerlines (collection of 3D points) with associated radius value at each point. Since this stage is performed before surgery, time can be spent to obtain an accurate model.
Fig. 1. System setup. The 3-dimensional model of the vasculature is to be registered with two orthogonal fluoroscopic projections: anteroposterior (AP) on the right and lateral (LAT) on the left.
Next, the precise pose of the two image planes is assessed prior to any registration. The operating room for TIPS surgery is equipped with a biplane digital angiographic unit which produces X-ray angiogram video streams as biplane views, the anteroposterior (AP) and the lateral (LAT) projections, separated by approximately 90°. The two image planes of the fluoroscopic unit can be rotated as needed to obtain the best images; therefore, calibration of the system is needed. Such calibration is performed using a phantom made of two orthogonal plexiglass panels supporting regularly spaced iron ball bearings. The phantom is roughly placed where the patient's liver is going to be imaged. From the 3-dimensional model of the phantom and the two projected images, intrinsic and extrinsic parameters of the fluoroscopic unit are determined using epipolar geometry. After calibration, the pose of the two image planes and the projection matrices onto these planes are defined in the coordinate frame of the phantom. The calibration does not currently correct for radial distortion. Results from the calibration have shown that, without correction, the maximum reconstruction error is around 2 mm and is located on the periphery of the image. Figure 1 shows the 3-dimensional model of the vasculature before alignment and the two projection images.
2.2 Registration
Our model-to-image metric is based on the work of Aylward et al. [7]. We seek the optimal pose of the 3-dimensional vascular model, extracted from the preoperative MR or CT, given the two projection images, AP and LAT .
Given a centerline point X in the 3-dimensional model, we project that point onto the 2-dimensional image using the projection matrix previously computed in the calibration stage. The associated radius R at that point is also projected onto the plane by projecting a vector parallel to the image plane at position X with magnitude R. After projection, a 2-dimensional centerline point x with associated radius r is defined. The metric is defined as the sum of the Gaussian-blurred intensity values in the 2D image at the projected model points. In the DSA images the vasculature has lower intensity values than the background, so the optimal alignment is obtained when the metric is minimal. For each point x, the amount of blurring is proportional to the radius r at that point. Also, since larger vessels have greater contrast in the X-ray angiograms, we weight the metric value at each point proportionally to the radius, so that larger vessels drive the optimization process more than smaller ones. Equation (1) describes our metric:

$$M = \frac{1}{\sum_{i=1}^{N} r_i} \sum_{i=1}^{N} r_i \, I_{\sigma = r_i}(x_i). \tag{1}$$
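A sketch of Eq. (1) follows. Pre-blurring the DSA image at a few fixed scales and picking the scale nearest each projected radius is our simplification; a faithful implementation would blur at exactly σ = r_i per point:

```python
# Sketch of Eq. (1): radius-weighted sum of Gaussian-blurred DSA
# intensities at the projected centerline points. Lower is better,
# since vessels appear dark on DSA.
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def vessel_metric(dsa, points_2d, radii_2d, scales=(1.0, 2.0, 4.0, 8.0)):
    """points_2d: (N, 2) projected centerline points (row, col);
    radii_2d: (N,) projected radii in pixels."""
    blurred = {s: gaussian_filter(dsa.astype(float), s) for s in scales}
    total, wsum = 0.0, 0.0
    for (r, c), rad in zip(points_2d, radii_2d):
        s = min(scales, key=lambda x: abs(x - rad))   # nearest blur scale
        val = map_coordinates(blurred[s], [[r], [c]], order=1)[0]
        total += rad * val                            # weight by radius
        wsum += rad
    return total / wsum
```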
Next we align the registration frame such that the Z-axis is orthogonal to the AP plane and the X-axis is orthogonal to the LAT plane, as shown in Fig. 2. We assume from the clinical setup that the two fluoroscopic digitizers are orthogonal; however, since they may not be perfectly orthogonal, we optimize the orientation of this registration frame to maximize these alignments. This axis configuration defines a relationship between each parameter of the rigid body transform and the two views, as described in Table 1.

Fig. 2. Reference frame used for registration
Table 1. Relationship between the parameters of the transformation (described in Fig. 2) and the two projected views

| Parameter | α | β | γ | x | y | z |
|-----------|---|---|---|---|---|---|
| View plane | LAT | AP, LAT | AP | AP | AP, LAT | LAT |
Table 1 shows that parameter optimization could be separated by view. However, the genetic algorithm optimizer [12] we are using does not support separable parameters and multiple metric values. Therefore we have chosen to tune each parameter independently. Independent parameter optimization can be performed using two different strategies. The first option consists of using each view separately and optimizing the corresponding parameters from that plane: for instance, from Table 1, the AP view would optimize β, γ, x and y, and the LAT view would optimize α, β, y and z. However, unreported experiments show that the presence of local minima in the metric precludes the registration from converging. The second strategy consists of optimizing each parameter independently and using the appropriate view (or combination of views) to drive the registration; this strategy is used in our experiments. Figure 3 shows that our metric is smooth and free of local minima near its optimum. These graphs were generated using a vascular model of a human liver from a preoperative MR and the corresponding intraoperative DSA images.
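The second strategy can be sketched as a coordinate-wise loop in which each parameter is scored only on the view(s) listed in Table 1. We substitute a bounded 1-D search for the 1+1 evolution strategy of [12]; the `metric(view, params)` callable, the search span, and the iteration count are assumptions for illustration:

```python
# Sketch of per-parameter optimization honoring the view assignments of
# Table 1. Illustrative only; metric(view, params) stands in for Eq. (1)
# evaluated on one projection.
import numpy as np
from scipy.optimize import minimize_scalar

VIEWS = {0: ("LAT",), 1: ("AP", "LAT"), 2: ("AP",),     # alpha, beta, gamma
         3: ("AP",), 4: ("AP", "LAT"), 5: ("LAT",)}     # x, y, z

def tune(params, metric, iterations=50, span=5.0):
    params = np.asarray(params, dtype=float)
    for _ in range(iterations):
        for i, views in VIEWS.items():
            def cost(v):
                trial = params.copy()
                trial[i] = v
                return sum(metric(view, trial) for view in views)
            res = minimize_scalar(cost,
                                  bounds=(params[i] - span, params[i] + span),
                                  method="bounded")
            params[i] = res.x                   # keep the best 1-D value
    return params
```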
Fig. 3. Plots of the model-to-image metric. Left: translation in X and Y. Right: rotation in alpha and beta.
Next we report results using clinical data.
3 Results
For the following experiments a Siemens Neurostar biplane digital angiographic unit was used to obtain X-ray angiograms. The fields of view ranged from 8.7° to 16.5°. Images were captured and stored as 8-bit 884×884-pixel images. Preoperative images of the 3D liver vasculature were acquired by MR on a Siemens MagicVision 1.5T system with collimation 0.86×0.86×3 mm. Voxel size was variable, but around 1.0×1.0×3 mm. We conducted multiple experiments to evaluate the speed, robustness and accuracy of our algorithm on five sets of images from five patients.
The first experiment verified the method's ability to align the preoperative model with the DSA images. In the operating room, calibration was performed before surgery, as was the extraction of the vascular trees from the MR volumes. The pose of the 3D vascular model was initialized by aligning the center of mass of the vasculature with the origin of the reference frame (from the calibration phantom). This is a valid assumption since the vascular tree has to be centered in the DSA image to obtain good quality images. Second, the orientation of the model is initialized using prior knowledge about the position of the two image planes: by design, the AP view corresponds to the coronal plane and the LAT view to the sagittal plane of the MR. Given this initial pose, registration was conducted using a uniform subsampling of the vascular models to 150 points for the metric. Our method converged for the five datasets in less than 20 seconds on a Pentium 4 2.2 GHz PC using 50 iterations, i.e., each parameter was successively optimized 50 times. Figure 4 shows the qualitative validation of the registration. Next we conducted a Monte-Carlo validation experiment. For the five datasets, an initial registration was performed using Liu's method [9]; then 100 Monte-Carlo iterations were computed by adding random offsets (±20 mm) and rotations (±10°). Table 2 summarizes the results of our experiment.
Fig. 4. Qualitative results of the 3D/2D registration. Vasculature before (left) and after registration (right) for both anteroposterior (top) and lateral (bottom) views.
Table 2. Final displacement from the Monte-Carlo experiment on five clinical datasets

| Parameter | Mean (SD) |
|-----------|-----------|
| α | 0.60 (1.91)° |
| β | 0.12 (1.27)° |
| γ | 0.89 (1.82)° |
| x | −0.27 (1.19) mm |
| y | 0.61 (0.91) mm |
| z | 0.18 (0.90) mm |
Our registration converged 95% of the time. Registration failures occurred when the registration was initialized far from the solution. This issue is discussed next.
4 Discussion and Conclusions
Despite these promising results, we are conscious that our method has some limitations. The main limitation is the use of a rigid body transformation. We assume that non-rigid deformations are negligible for three reasons. First, TIPS patients possess livers that are hard, dense, fibrotic and generally rigid. This means that although both needle pressure and respiration may displace the liver, neither is likely to deform it. Second, the portions of the hepatic and portal venous systems relevant to TIPS are located within this rigid liver. Third, the liver is bounded on three sides by the bony ribcage and superiorly by the diaphragm; it is additionally tethered by the left and right triangular ligaments and by the falciform ligament, all of which limit or preclude rotation. The second limitation is the starting pose for the registration. From the clinical datasets we have found that 20 mm and 10° were the maximal displacements. However, in some cases the registration fails, and we are currently implementing a random-sample-consensus type algorithm which should improve our success rate. In conclusion, we have developed a new model-to-image registration algorithm, driven by the image intensities of two fluoroscopic projection images. Our technique does not require the specification of correspondences between blood vessels in the 3-dimensional model and their 2-dimensional projections. Our method shows excellent results on real datasets. We are conscious that the validation part of this study does not address the clinical accuracy of the registration, and we are currently extending the validation to include this aspect. Our software was implemented using the Insight Toolkit [13].
References

1. G.P. Penney, J. Weese, et al.: A comparison of similarity measures for use in 2D-3D medical image registration. IEEE Trans. Med. Imaging 17(4): 586–595 (1998)
2. R.A. McLaughlin, J.H. Hipwell, et al.: A comparison of a similarity-based and a feature-based 2-D-3-D registration method for neurointerventional use. IEEE Trans. Med. Imaging 12(8): 1058–1066 (2005)
3. Nicolau S., Pennec X., Soler L., and Ayache N.: An accuracy certified augmented reality system for therapy guidance. In Proc. of the 8th European Conference on Computer Vision (ECCV 04), Part III, volume LNCS 3023, pages 79–91, 2004
4. Jolly B., Van Horn M., Aylward S., Bullitt E.: Needle tracking and detection in the TIPS endovascular procedure. MICCAI 2003, LNCS 2878: 953–954
5. Kerrien E., Berger M-O., et al.: Fully automatic 3D/2D subtracted angiography registration. MICCAI 1999, LNCS 1679: 664–671
6. Albert C. S. Chung, Ho-Ming Chan, et al.: 2D-3D vascular registration between digital subtraction angiographic (DSA) and magnetic resonance angiographic (MRA) images. International Symposium on Biomedical Imaging 2004, 708–711
7. Aylward S., Jomier J., et al.: Registration of vascular images. International Journal of Computer Vision, November 2003, pages 15
8. Aylward S., Bullitt E.: Initialization, noise, singularities, and scale in height-ridge traversal for tubular object centerline extraction. IEEE Transactions on Medical Imaging, Feb. 2002, pages 61–75
9. Liu A., Bullitt E., Pizer SM.: 3D/2D registration using tubular anatomical structures as a basis. MICCAI '98, Lecture Notes in Computer Science 1496: 952–963
10. Rosenthal M., Weeks S., et al.: Intraoperative tracking of anatomical structures using fluoroscopy and a vascular balloon catheter. MICCAI 2001, Lecture Notes in Computer Science 2208: 1253–1254
11. Venkatraman V., Van Horn M., Weeks S., Bullitt E.: Liver motion due to needle pressure, cardiac and respiratory motion during the TIPS procedure. MICCAI 2004, Lecture Notes in Computer Science 3217: 66–72
12. Styner M., Gerig G.: Evaluation of 2D/3D bias correction with 1+1ES optimization. Technical Report, BIWI-TR-179
13. The Insight Toolkit: National Library of Medicine Insight Segmentation and Registration Toolkit. http://www.itk.org
This work is funded by NIH-HLB R01 HL69808.
Ray-Tracing Based Registration for HRCT Images of the Lungs

Sata Busayarat 1 and Tatjana Zrimec 1,2

1 School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 2052, Australia
{satab, tatjana}@cse.unsw.edu.au
2 Centre for Health Informatics, University of New South Wales, Sydney, NSW 2052, Australia
Abstract. Image registration is a fundamental problem in medical imaging. It is especially challenging in lung images compared, for example, with the brain. The challenges include large anatomical variations of human lung and a lack of fixed landmarks inside the lung. This paper presents a new method for lung HRCT image registration. It employs a landmark-based global transformation and a novel ray-tracing-based lung surface registration. The proposed surface registration method has two desirable properties: 1) it is fully reversible, and 2) it ensures that the registered lung will be inside the target lung. We evaluated the registration performance by applying it to lung regions mapping. Tested on 46 scans, the registered regions were 89% accurate compared with the ground-truth.
1 Introduction
The amount of imaging information produced by today's high-resolution CT (HRCT) scanners is beyond the ability of a radiologist to process in normal clinical practice. A computer-aided diagnosis (CAD) system is then required to scan the large number of images and draw the radiologist's attention to the fewer, diagnostically useful images. Such a system must solve the classical problems in medical image analysis (segmentation, feature extraction and registration) in a specific domain. This paper focuses on the registration problem in HRCT images of the lungs. A large body of literature has been published on registration techniques for medical imaging applications; Sonka [1] provides a recent survey of the field. Most of the work is in the brain imaging domain [2,3], where computer-aided systems are relatively well established. Significantly fewer works have been reported on automatic image registration for HRCT images of the lung [4,5,6]. The lung HRCT image registration problem is challenging because the lung varies, in terms of size, shape and position, from person to person more significantly than, for example, the brain. Detecting landmarks for registration is also challenging because there are relatively few landmarks in lung images and lung disease processes may alter the appearance of those landmarks.
In this paper, an automatic registration method for HRCT images of the lung is presented. It combines a conventional point-based registration with a novel surface registration that uses three-dimensional ray tracing to find the correspondence between points on the surfaces of two lungs. The method is developed to register a sparse scan (one with gaps between slices) to a lung model constructed from a volumetric scan. The method is then applied to determining lung regions, clinically meaningful areas inside the lung.
2 3D Lung Model
The three-dimensional lung model used in this work is created from a volumetric HRCT scan of relatively normal lungs. It contains information about the anatomical landmarks required for the registration. Lung surfaces are automatically detected using adaptive thresholding and active contour snakes [7]; we modified the method so that it takes third-dimension continuity into account. Single-point landmarks, namely the carina, sternum, spinal cord and lung roots, are identified manually to ensure correctness. The lung surface model and the landmarks are shown in Fig. 1. To determine the lung regions, we used a technique proposed in [8], which requires additional landmarks. The trachea is detected using knowledge-directed thresholding. The hilum, a wedge-shaped depression of the mediastinal surface of each lung, is detected using curvature analysis of the mediastinal lung surface. The detected landmarks are then used to characterize eight different lung regions, as shown in Fig. 2.
Fig. 1. A 3D lung model with landmarks
3 Global Transformation Using Fixed Landmarks
Apart from the significant anatomical variations in the human lung, different scanning protocols and different patient positions and orientations also contribute to the difficulty of comparing lungs from different HRCT scans. A global transformation, or rigid image registration, is used to align lungs from two different scans and helps reduce the misalignment caused by the above factors.
Fig. 2. Anatomical region models. We use three models to represent eight different regions: the Peripheral-Intermediate-Central model (left), the Apical-Middle-Basal model (middle) and the Anterior-Posterior model (right).
We use fixed reference points and three-dimensional transformations for this registration. The reference points are stable anatomical landmarks, which are automatically detected. The transformations are performed in the following order: rotate, translate and scale. In our problem domain, registering a sparse scan to a 3D model, we choose to apply the transformation only to the sparse scan and keep the 3D model unchanged to save computation time. The same operation takes approximately 20 times longer on volumetric data than on sparse data. However, after the registration, any results may have to be transformed back to the original data. Therefore, all the transformations we use are reversible.

3.1 Rotation
Rotation is used to eliminate the difference in patient orientation between two HRCT scans. We only rotate a scan in two dimensions, in the transverse plane, as we assume that there is no orientation difference about the longitudinal axis (patients always lie on horizontally flat beds). We use the image that contains the carina, the first bronchial bifurcation, as the representative image for determining the suitable rotation. On this image, the sternum and spinal cord are detected using knowledge-driven template matching [8]. Using the sternum and the spinal cord as landmarks (see Fig. 1), the rotation angle is calculated so that, after the rotation, the lines between the sternum and spinal-cord center points of the two registered scans have the same orientation.
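As a concrete illustration of this step, the following sketch (ours, not the authors' implementation; all names are hypothetical) computes the in-plane rotation angle from the two landmark pairs:

```python
import numpy as np

def rotation_angle(sternum, cord, sternum_ref, cord_ref):
    """In-plane angle that aligns the sternum-to-spinal-cord line of a
    scan with that of the reference model. Landmarks are (x, y) center
    points detected on the carina slice."""
    def line_angle(a, b):
        d = np.asarray(b, dtype=float) - np.asarray(a, dtype=float)
        return np.arctan2(d[1], d[0])
    # rotating the scan by this angle makes the two lines parallel
    return line_angle(sternum_ref, cord_ref) - line_angle(sternum, cord)
```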
3.2 Translation
Translation is used to eliminate differences in the patient's lung position and scanning position. The lung root is used as the landmark for translation. Generally, the lung root is referred to as the hilum. Since we require a single-point landmark, not a region, we define the lung root as the point where the main bronchus enters the lung. Each lung of a sparse scan is translated so that its lung root is at the same position as the lung root of the 3D model.

3.3 Scaling
There are two main reasons why two lungs from two different scans have different sizes. One is obviously anatomical variation: factors like patient size, age, sex and race contribute to the difference in lung size. The other reason is the field-of-view setting of the scanner, which determines the actual size (in millimeters) of a pixel. The field-of-view information is usually available in the image header. In this work, however, we only use imaging data as input. The scaling is therefore done using a landmark-based transformation similar to the rotation and translation. The lung itself is used as the landmark. The dimensional difference between the 3D bounding boxes of the lungs in the sparse data and in the model is used to determine the required scaling. The results of the three transformations of a lung are shown in Fig. 3(a-d).
Fig. 3. Step-by-step results of lung registration in axial (upper row) and lateral (lower row) views. From left to right, (a) original lung from a sparse scan and the lung model, (b) result after rotation, (c) result after translation, (d) result after scaling and (e) final result after surface registration.
4 Lung Surface Registration Using 3D Ray Tracing
Ray tracing is a technique usually used in the field of computer graphics for real-time and offline rendering. In computer graphics, ray tracing works by projecting a ray of vision through each pixel on the screen from the observer [9]. Any object in the scene that the ray hits, either directly or indirectly, is used to determine the color of the on-screen pixel. We apply the same idea to 3D lung image registration, more specifically, registering a sparse scan to a 3D model. Instead of working out the colors of on-screen pixels, we use ray tracing to find correspondences between points on the surfaces of two lungs. In our case, the observer is the registration origin. The on-screen pixel corresponds to the point we want to register. The objects in the scene correspond to the surface of the lung. Fig. 4(a) illustrates how the technique works. S is a lung surface in a sparse scan and V is another lung surface in a volumetric scan used to build the lung model. P is a point in S that we want to register to V. Firstly, we use the global transformation discussed in Section 3 with S to reduce the difference between S and V.
P also has to be transformed since it is a point in S. The transformed S and P are called S′ and P′, respectively. We also have one registered point as the result of the global transformation, the lung root. The lung root is used as the origin, O, for the ray tracing. We then draw a 3D ray from the origin through P′. We trace the ray until it hits the lung surfaces of S and V. The surface hitting points are called Ps and Pv, respectively. The registered point, R, can then be calculated using the distances from the origin to the surface hitting points, as shown in Eq. 1:

R = O + |OP| × |OPv| / |OPs|    (1)
The equation shows that the surface registration is reversible (we can work out P given R). Combined with the reversible global transformation, a deformable model can be achieved at lower cost, as follows. We deform the sparse scan to fit the model and apply the reverse transformation to the model, but only to the part where a match to the sparse scan exists. This generally saves about 90% of the computation required to deform a model because a volumetric scan is approximately 20 times larger than a sparse scan. Fig. 4(c) shows a 2D cross section of the model deformed to fit the lung in Fig. 4(b). The ray-tracing-based registration has another desirable property. Unlike other lung surface registration methods [4,6,5], it ensures that all registered points are inside the lung model. A proof is given in Eq. 2. As a result, the deformed model is guaranteed to be a superset of the target sparse scan, which is suitable for internal lung structure mapping. Without this property, an in-lung voxel of the sparse scan may be registered to a voxel in the volumetric scan that is outside the lung. Such a voxel mapping is incorrect and further processing would be required.

max(R) = O + max(|OP| × |OPv| / |OPs|)
       = O + max(|OP|) × |OPv| / |OPs|
       = O + |OPs| × |OPv| / |OPs|    (since P is in S)
       = O + |OPv| = Pv    (2)
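To make Eqs. 1 and 2 concrete, here is a minimal sketch (our illustration, not the authors' code) of the forward and inverse point mapping along a ray; the surface hitting points Ps and Pv are assumed to be computed elsewhere:

```python
import numpy as np

def register_point(O, P, Ps, Pv):
    """Eq. 1: map P to R along the ray from origin O, scaled by the ratio
    of the distances at which the ray hits the two lung surfaces."""
    O, P = np.asarray(O, dtype=float), np.asarray(P, dtype=float)
    d_p = np.linalg.norm(P - O)
    d_s = np.linalg.norm(np.asarray(Ps, dtype=float) - O)
    d_v = np.linalg.norm(np.asarray(Pv, dtype=float) - O)
    direction = (P - O) / d_p
    return O + direction * (d_p * d_v / d_s)

def unregister_point(O, R, Ps, Pv):
    """Reversibility: recover P from R by inverting the scaling."""
    O, R = np.asarray(O, dtype=float), np.asarray(R, dtype=float)
    d_r = np.linalg.norm(R - O)
    d_s = np.linalg.norm(np.asarray(Ps, dtype=float) - O)
    d_v = np.linalg.norm(np.asarray(Pv, dtype=float) - O)
    direction = (R - O) / d_r
    return O + direction * (d_r * d_s / d_v)
```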
For the sparse scan, a complete lung surface needs to be approximated because a ray may hit the surface in between the slice gaps. The surface approximation algorithm works as follows. A number of triangle strips are generated by connecting lung boundary points between two consecutive slices. A scanline triangle-filling algorithm [10] is then used to fill the triangle strips. The scanline algorithm is commonly used in computer graphics and is implemented in most graphics cards, which our surface approximation can benefit from. The method is optimized for registering a large number of points from the same scan to the same model because, in most applications, every pixel inside the lung needs to be registered.
Since several points are registered using the same 3D ray and finding the surface hitting points (Ps, Pv) is expensive, the surface hitting points are cached. An experiment has shown that caching reduces the computation time by about a factor of three.
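A caching scheme of this kind could look as follows (a sketch under our own assumptions; the quantization of ray directions and the `find_hits` interface are illustrative, not from the paper):

```python
def make_cached_hits(find_hits, decimals=3):
    """Wrap an expensive surface-hit computation so that voxels lying on
    (nearly) the same ray from the origin reuse the cached result.
    find_hits maps a unit ray direction to the pair (Ps, Pv)."""
    cache = {}
    def hits(direction):
        key = tuple(round(float(d), decimals) for d in direction)
        if key not in cache:
            cache[key] = find_hits(direction)
        return cache[key]
    return hits
```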
5 Lung Region Mapping Using Lung Registration
Lung registration has many applications. Betke et al. [5] developed a lung surface registration technique and applied it to the nodule registration problem. Zhang [4] used a deformable lung atlas to guide a pulmonary fissure detection algorithm. In this work, we apply our registration technique to lung regions. We have built a model of eight lung regions. The model is deformed to fit the input scan using the reverse transformation. The region information from the model can then be directly transferred to the input scan. Fig. 4(d) shows an example of the registered peripheral, intermediate and central regions.
Fig. 4. Ray-tracing based lung registration method and results. From left to right, (a) illustration of the ray-tracing technique, (b) input lung image, (c) deformed model that fits the input, (d) registered peripheral, intermediate and central regions, and (e) ground truth of the regions.
6 Evaluation
HRCT scans from 46 patient studies were used for evaluation. The images were acquired as part of normal clinical practice using standard imaging protocols. The images were taken using a SIEMENS scanner, with 512x512 spatial resolution, 1.0mm slice thickness and 15mm slice distance (~20 images per study). One volumetric HRCT scan, with 512x512 spatial resolution, 1.25mm slice thickness and 0.8mm slice distance (396 images), was used to construct the lung model. We evaluated the performance of the proposed registration method by calculating the average surface-to-surface distance between a patient lung and the lung model, before and after the registration. The results in Table 1 indicate a slight distance reduction (20%) after the global transformation and a significant reduction (78%) after the surface registration. The differences for the left lungs are slightly greater than those for the right lungs. This may be due to the heart artifact.
Since another main goal of this work is to apply the registration method to map lung regions, we also evaluated the correctness of the registered regions. This reflects how well the method works with structures inside the lung, not only the lung surface. We used the result of an automatic region characterization algorithm proposed in [8], with manual corrections, as the ground truth to calculate the accuracy. An example of the ground truth is shown in Fig. 4(e). The results in Table 2 show that the region information in the lung model was successfully registered to the patient lungs, with 89% accuracy.

Table 1. Mean and standard deviation of lung surface distance (in pixels) between the test lungs and the lung model, before and after the registration

             Before registration   After global transformation   After surface registration
Left lung         26.2±5.7                  20.1±3.8                      6.1±0.9
Right lung        24.0±6.1                  19.3±4.4                      4.7±0.7
Overall           25.1±5.9                  19.7±4.1                      5.4±0.8
Table 2. Accuracy of the lung model's regions registered to the test lungs. The accuracy was calculated by comparing the registered regions with the ground truth.

Regions                            Accuracy (%)
Peripheral-Intermediate-Central         84
Apical-Middle-Basal                     87
Anterior-Posterior                      95
Overall                                 89
7 Conclusions
We have presented a new image registration method for HRCT images of the lung. We have adapted a ray tracing technique used in 3D graphics to lung image registration. It has two desirable properties: 1) the registration is reversible and 2) the registered lung is completely wrapped inside the lung model. Together, these make it possible to create a deformable lung model using significantly less computation. The results showed that the proposed registration method reduces the lung surface difference between patient scans and the lung model by 78%. We applied the method to the problem of registering lung regions, where predefined region information was transferred from the model to a target lung. The registered regions were compared against the ground truth, and 89% accuracy was achieved. This indicates that, even though we only use the lung surface as the landmark for the registration, the internal lung volume was also registered accurately. However, a better result may be obtained by using internal lung landmarks, such as bronchial tree branching points, if one can overcome the problem that such landmarks are highly altered by disease.
Acknowledgement
We would like to thank Dr. Peter Wilson for providing medical knowledge and guidance. This research was partially supported by the Australian Research Council through a Linkage grant (2002-2004), with I-Med Network Ltd. and Philips Medical Systems as partners.
References
1. Sonka, M., Fitzpatrick, J.: Handbook of Medical Imaging. Volume 2. SPIE Press (2000)
2. Ferrant, M., Nabavi, A., Macq, B., Jolesz, F., Kikinis, R., Warfield, S.: Registration of 3-D intraoperative MR images of the brain using a finite-element biomechanical model. IEEE Trans. Med. Imag. 20 (2001) 129–140
3. Kapouleas, I., Alavi, A., Alves, W., Gur, R., Weiss, D.: Registration of three-dimensional MR and PET images of the human brain without markers. Radiology 181 (1991) 731–739
4. Zhang, L., Reinhardt, J.: 3D pulmonary CT image registration with a standard lung atlas. SPIE 3978 (2000) 67–77
5. Betke, M., Hong, H., Thomas, D., Prince, C., Ko, J.: Landmark detection in the chest and registration of lung surfaces with an application to nodule registration. Medical Image Analysis 7 (2003) 265–281
6. Betke, M., Hong, H., Ko, J.: Automatic 3D registration of lung surfaces in computed tomography scans. In: MICCAI. (2001) 725–733
7. Papasoulis, J.: LMIK Anatomy and lung measurements using active contour snakes. Master's thesis, UNSW, Sydney, Australia (2003)
8. Zrimec, T., Busayarat, S.: A 3D model of the human lung with lung regions characterization. In: ICIP. (2004) 1149–1152
9. Glassner, A.: An Introduction to Ray Tracing. Academic Press Limited (1989)
10. Foley, J.: Introduction to Computer Graphics. Addison-Wesley (1994)
Physics-Based Elastic Image Registration Using Splines and Including Landmark Localization Uncertainties

Stefan Wörz and Karl Rohr

University of Heidelberg, IPMB, and DKFZ Heidelberg, Dept. Bioinformatics and Functional Genomics, Biomedical Computer Vision Group, Im Neuenheimer Feld 364, 69120 Heidelberg, Germany
[email protected]
Abstract. We introduce an elastic registration approach which is based on a physical deformation model and uses Gaussian elastic body splines (GEBS). We formulate an extended energy functional related to the Navier equation under Gaussian forces which also includes landmark localization uncertainties. These uncertainties are characterized by weight matrices representing anisotropic errors. Since the approach is based on a physical deformation model, cross-effects in elastic deformations can be taken into account. Moreover, we have a free parameter to control the locality of the transformation for improved registration of local geometric image differences. We demonstrate the applicability of our scheme based on 3D CT images from the Truth Cube experiment, 2D MR images of the brain, as well as 2D gel electrophoresis images. It turns out that the new scheme achieves more accurate results compared to previous approaches.
1 Introduction
The registration of 2D and 3D biomedical images is an important task; however, it is difficult and challenging. One reason is that in many applications it is still not quite clear which type of image information is optimal for matching. Another reason is that the spectrum of possible geometric differences is relatively large. Generally, nonrigid schemes are required for image registration (for a recent survey see [1]). A special class of nonrigid transformations are elastic transformations, which allow one to cope with local shape differences, and which are typically based on an energy functional or the related partial differential equation (PDE). To compute an elastic transformation, often spline-based approaches are used, which can be subdivided into schemes based on a uniform grid of control points, where typically B-splines are used, and schemes based on a nonuniform grid of control points (e.g., [2,3,4,5,6,7,8,9,10]). The latter type of spline-based scheme generally requires a smaller number of control points (landmarks). Examples of such schemes are based on thin-plate splines (TPS, e.g., [4,6,8]), elastic body splines (EBS, e.g., [2]), and Gaussian EBS (GEBS, e.g., [5,9,10]).
Fig. 1. Deformations obtained for the displacement of a single landmark using TPS (left) as well as GEBS: ν = 0 (middle) and ν = 0.49 (right)
TPS are based on the bending energy of a thin plate, which represents a relatively coarse deformation model. In comparison, EBS and GEBS are derived from the Navier equation, which describes the deformation of homogeneous elastic tissues (bodies). In spline-based registration approaches, generally an interpolation scheme is used that forces corresponding landmarks to exactly match each other. The underlying assumption is that the landmark positions are known exactly. In real applications, however, landmark localization is always prone to error. Therefore, to take into account landmark errors, approximation schemes have been proposed, for example, for TPS (e.g., [4]). However, approximation schemes for EBS and GEBS approaches have not yet been introduced. We have developed a physics-based approach for elastic image registration based on GEBS. To incorporate localization uncertainties of landmarks we introduce an extended energy functional related to the Navier equation. By inclusion of weight matrices we can cope with anisotropic uncertainties, i.e. uncertainties that are different in different directions. Our scheme results from an analytic solution of the corresponding extended Navier equation using Gaussian forces. Note that the derivation of our scheme significantly differs from the derivation of approximating TPS. The reason is that the underlying PDEs of GEBS and TPS as well as the assumed image forces (i.e. Gaussian forces and Dirac forces, respectively) are different. Central to our scheme is that it includes a material parameter (Poisson ratio ν), which defines the ratio between transverse contraction and longitudinal dilation of an elastic material. Therefore, cross-effects can be taken into account, which is not the case for TPS. Fig. 1 illustrates this effect for the case of a single landmark which is displaced horizontally (particularly note the vertical deformations in Fig. 1, right). Moreover, we have a free parameter to control the locality of the transformation.
2 Gaussian Elastic Body Splines (GEBS)

2.1 Interpolating GEBS
In this section, we briefly describe the interpolating GEBS approach [5]. This approach is based on the Navier equation of linear elasticity

μ Δu + (λ + μ) ∇(div u) + f = 0    (1)
with the displacement vector field u, body forces f, as well as Lamé constants μ, λ > 0 describing material properties. Given Gaussian forces f(x) = c f(r) = c (√(2π) σf)⁻³ exp(−r²/(2σf²)), with x = (x, y, z)ᵀ, r = √(x² + y² + z²), and the standard deviation σf, an analytic solution of (1) can be derived [5]. The resulting matrix-valued basis function G (a 3 × 3 matrix) reads (up to a constant factor)

G(x) = ( erf(r̂) (αr² + σf²)/r³ − β e^(−r̂²)/r² ) I + ( erf(r̂) (r² − 3σf²)/r⁵ + 3β e^(−r̂²)/r⁴ ) xxᵀ    (2)

where r̂ = r/(√2 σf), α = 3 − 4ν, β = σf √(2/π), and erf(x) = 2π^(−1/2) ∫₀ˣ e^(−ξ²) dξ is the error function. I denotes the 3 × 3 identity matrix and ν is the Poisson ratio, ν = λ/(2(λ + μ)), 0 ≤ ν < 0.5. In contrast, in the original EBS approach [2] polynomial or rational forces are used. Using the interpolation condition qi = u(pi), the scheme for spline-based elastic image registration is given by u(x) = x + Σᵢ₌₁ⁿ G(x − pi) ci, where pi and qi (i = 1, . . . , n) denote the positions of the n landmarks of the source and target image, respectively. The coefficients ci represent the strength and direction of the Gaussian forces.
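As an illustration of the interpolation scheme (our own sketch, not the authors' implementation), the coefficients ci can be obtained by solving the 3n × 3n block system induced by the condition qi = u(pi):

```python
import numpy as np

def fit_gebs_coefficients(p, q, G):
    """Solve sum_j G(p_i - p_j) c_j = q_i - p_i for the coefficient
    vectors c_j, given source landmarks p, target landmarks q, and the
    matrix-valued basis function G of Eq. (2)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    n = len(p)
    A = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(n):
            A[3*i:3*i+3, 3*j:3*j+3] = G(p[i] - p[j])
    b = (q - p).reshape(-1)
    return np.linalg.solve(A, b).reshape(n, 3)
```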
2.2 Approximating GEBS
With the interpolation approach described above, the landmarks are matched exactly. This implicitly assumes that the landmark positions are known exactly. To take into account landmark localization errors, we use the approximation condition qi ≈ u(pi) and include 3 × 3 covariance matrices Σi defining the anisotropic localization uncertainties of the landmarks i = 1, . . . , n. To derive approximating GEBS, we introduce an energy-minimizing functional consisting of two terms. The first term JElastic represents the elastic energy according to the Navier equation (1) without the force f and acts as a smoothness term. The corresponding Lagrange function is given by
LElastic = μ (ux² + vy² + wz²) + μ ((uy + vx)² + (uz + wx)² + (vz + wy)²)/2 + λ (ux² + vy² + wz² + 2 ux vy + 2 ux wz + 2 vy wz)/2    (3)
where u = (u, v, w)ᵀ = u(x) and the subscripts x, y, and z denote first-order partial derivatives. The second term JForce represents Gaussian forces specified by corresponding landmarks pi and qi and, in addition, incorporates the localization uncertainties Σi of the landmarks. Here, we propose a quadratic approximation JForce, which is given by the following Lagrange function

LForce = (1/(λn)) Σᵢ₌₁ⁿ f(x − pi) (qi − u(x))ᵀ Σi⁻¹ (qi − u(x))    (4)
where λ > 0 denotes the regularization parameter. Note that LForce includes Gaussian forces f(x − pi), in contrast to Dirac forces in the case of TPS. The combined functional Jλ then reads

Jλ = JElastic + JForce = ∫∫∫_(−∞)^(∞) ( LElastic(x, u) + LForce(x, u) ) dx.    (5)
The corresponding PDE can be derived as

μ Δu + (λ + μ) ∇(div u) + ∇u LForce = 0    (6)
and represents an extension of the Navier equation (1). The solution to (6) can be determined in analytic form, and it turns out to be the same as in the case of interpolation, i.e. it consists of the same matrix-valued basis function G. However, the resulting linear system of equations to compute the coefficients ci differs. More precisely, the structure of the linear system of equations is the same for interpolating and approximating GEBS, except that approximating GEBS includes additional sums of weighted forces Σᵢ₌₁ⁿ f(pj − pi) Σi⁻¹.

2.3 Means to Improve Stability and Efficiency
To compute the matrix-valued basis function G in (2), we have to consider two different aspects. First, for small values of r < 0.001 the computed values for G are numerically unstable, which is caused by fractions that involve very small numerators and denominators. However, this problem can easily be solved by employing the limit of G for r → 0, which is given by

lim_(r→0) G(x) = (1/(16π σf μ)) (1/(1 − ν)) √(2/π) (10/3 − 4ν) I.    (7)

Second, G involves the computation of the Gaussian error function, which is a costly operation in comparison to, e.g., a multiplication. To reduce the computation time, we exploit the fact that for r ≫ σf the basis function G reduces to

G(x) ≈ (1/(16π μ)) (1/(1 − ν)) ( (3 − 4ν)/r I + xxᵀ/r³ ).    (8)

In our implementation, we use this approximation for r > 18σf, where the error is below 1%, while the computation time is reduced by a factor of about 2.
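The following sketch (ours) combines the three cases into a single numerically robust evaluation; since Eq. (2) is stated only up to a constant factor, we assume the normalization 1/(16πμ(1−ν)), which makes it consistent with the reconstructed Eqs. (7) and (8):

```python
import numpy as np
from scipy.special import erf

def gebs_basis(x, sigma_f, nu, mu=1.0):
    """Evaluate the 3x3 GEBS basis matrix G(x), switching between the
    r -> 0 limit (7), the far-field approximation (8), and the full
    formula (2). The overall constant 1/(16*pi*mu*(1-nu)) is our
    assumption; thresholds follow the text."""
    x = np.asarray(x, dtype=float)
    r = np.linalg.norm(x)
    I = np.eye(3)
    const = 1.0 / (16.0 * np.pi * mu * (1.0 - nu))
    if r < 1e-3:  # Eq. (7): numerically stable limit for small r
        return const / sigma_f * np.sqrt(2.0 / np.pi) * (10.0 / 3.0 - 4.0 * nu) * I
    if r > 18.0 * sigma_f:  # Eq. (8): cheap form avoiding erf, error < 1%
        return const * ((3.0 - 4.0 * nu) / r * I + np.outer(x, x) / r**3)
    # Eq. (2): full formula
    alpha = 3.0 - 4.0 * nu
    beta = sigma_f * np.sqrt(2.0 / np.pi)
    r_hat = r / (np.sqrt(2.0) * sigma_f)
    e = np.exp(-r_hat**2)
    a = erf(r_hat) * (alpha * r**2 + sigma_f**2) / r**3 - beta * e / r**2
    b = erf(r_hat) * (r**2 - 3.0 * sigma_f**2) / r**5 + 3.0 * beta * e / r**4
    return const * (a * I + b * np.outer(x, x))
```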
3 Experimental Results

3.1 3D CT Truth Cube Images
To validate our scheme we have used 3D CT images from the Truth Cube experiment [11]. The Truth Cube is a silicone rubber cube, which comprises 343 regularly distributed small Teflon spheres (see Fig. 2, left). Using a compression plate, a force is applied to the top face, yielding a compressed cube. Images have been acquired for the uncompressed cube as well as for three different compressions (nominal strains of 5%, 12.5%, and 18.25%). In each of the four 3D CT images, the positions of all 343 spheres have been determined. We have registered the 3D image of the uncompressed cube to each of the three compressed cube images.
Fig. 2. Truth Cube with 343 small spheres (left, from [11]) and sketch of the Truth Cube with nine landmarks at the bottom face and nine landmarks at the top face (right)

Table 1. Mean geometric error egeom and mean intensity error eint for the registration of 3D Truth Cube images, as well as the improvement w.r.t. the unregistered case
                                 5% strain           12.5% strain         18.25% strain
                              egeom      eint      egeom      eint      egeom      eint
Unregistered    absolute     2.6 mm   16.9 gr    6.1 mm   30.7 gr    9.2 mm   41.2 gr
Interpol. TPS   absolute     0.6 mm    9.9 gr    1.6 mm   22.0 gr    2.5 mm   41.6 gr
                improvement  75.5 %   41.7 %     74.5 %   28.3 %     72.9 %   -1.0 %
Approx. TPS     absolute     0.6 mm    9.9 gr    1.6 mm   22.0 gr    2.5 mm   41.6 gr
                improvement  75.5 %   41.7 %     74.5 %   28.3 %     72.9 %   -1.0 %
Interpol. GEBS  absolute     0.8 mm    7.5 gr    1.7 mm   12.3 gr    1.9 mm   33.2 gr
                improvement  68.2 %   55.6 %     72.0 %   59.9 %     78.8 %   19.6 %
Approx. GEBS    absolute     0.5 mm    6.8 gr    1.1 mm   13.8 gr    1.3 mm   28.3 gr
                improvement  81.2 %   60.0 %     82.7 %   55.2 %     85.9 %   31.3 %
We placed nine landmarks at the (fixed) bottom face of the cube as well as nine landmarks at the top face (see Fig. 2, right). To quantitatively determine the registration accuracy, we computed the mean geometric error egeom between the 343 spheres of the (registered) source image (uncompressed cube) and the corresponding 343 spheres of the target image (compressed cube). In addition, we computed the mean intensity error eint. For the anisotropic errors we chose a value of σy = 1 vox in the direction of the landmark displacements as well as σxz = 8 vox (top face) and σxz = 3 vox (bottom face) for the orthogonal directions. Table 1 summarizes the registration results. Applying the new approach, the mean geometric error egeom improved by 81.2%, 82.7%, and 85.9% w.r.t. the unregistered images for strains of 5%, 12.5%, and 18.25%, respectively. In comparison, for interpolating GEBS the results are worse, with improvements of 68.2%, 72.0%, and 78.8%, respectively. For interpolating TPS the results are also worse, in particular for a strain of 18.25%. Note that for approximating TPS the results did not improve in comparison to interpolating TPS, probably because TPS do not cope with cross-effects in deformations (see also below). The results based on the mean intensity error eint are comparable with the results for egeom. As an example, Fig. 3 shows 2D sections of the registered source image overlayed with the computed edges of the target image (18.25% strain). It can clearly be seen that the compression of the cube leads to a bending at the sides (see the computed edges).
Fig. 3. Truth Cube experiment: Source image overlayed with computed edges of the target image (18.25% strain) without registration as well as with registration using interpolating TPS, interpolating GEBS, and approximating GEBS (from left to right)
Comparing the registration results of TPS and GEBS, it can be seen that the registered image using TPS (middle left) does not reveal a bending, whereas using GEBS (middle right and right) the bending is modeled much better. The reason is that with GEBS cross-effects are taken into account, i.e. a contraction in one direction leads to a dilation in orthogonal directions.

3.2 2D MR Images
We have also applied our approach to register pre- and postsurgical MR images of the human brain. Fig. 4 (left) shows 2D MR images of a patient before (source image, top) and after (target image, bottom) the resection of a tumor. 27 landmarks have been used along the contours of the tumor and the head (indicated by crosses). To simulate landmark localization in clinical routine, which is generally prone to error, we misplaced five landmarks at the outer skull on the left side. For these misplaced landmarks, we defined the localization uncertainties in accordance with their displacements, whereas for the other landmarks we chose small isotropic uncertainties. Prior to elastic registration, the images were initially aligned by an affine transformation. Note that after affine transformation the ventricular system is well aligned; thus the resection of the tumor has only a minor effect on the position and size of the ventricular system. Fig. 4 shows the registered source image overlayed with the computed edges of the target image for interpolating GEBS (bottom middle) and approximating GEBS (bottom right). It can be seen that in both cases the vicinity of the tumor and the resection area are well registered. However, it turns out that the misplaced landmarks significantly affect the registration result in the case of interpolating GEBS; see the unrealistic oscillations at the outer skull (Fig. 4, bottom middle, indicated by the arrows). In contrast, using approximating GEBS the influence of the misplaced landmarks is relatively low, while the tumor and resection area are still well registered. For comparison, we have also applied interpolating and approximating TPS (Fig. 4, top middle and right, respectively). In both cases the vicinity of the tumor and the resection area are well registered. However, there are significant errors at the ventricular system. In contrast, GEBS only deforms the image in the vicinity of the landmarks and does not significantly affect other parts of the image.
Fig. 4. Registration of 2D MR brain images: Pre- (top left) and postsurgical image (bottom left) as well as registered source images overlayed with computed edges of the target image using interpolating TPS (top middle), approximating TPS (top right), interpolating GEBS (bottom middle), and approximating GEBS (bottom right)
3.3 2D Gel Electrophoresis Images
We have also applied our scheme to 2D gel electrophoresis images. As an example, Fig. 5 shows a section of a source image overlayed with computed edges of the target image (top left) and the target image overlayed with marked landmarks and computed error ellipses (bottom left). Using interpolating GEBS (bottom middle), it turns out that most corresponding spots except for the largest spot match quite well. The mean intensity error eint improves by 17.1% w.r.t. the unregistered case. In contrast, using approximating GEBS (bottom right) yields a significantly more accurate result since also the largest spot is well matched (eint improves by 40.1%). Interpolating and approximating TPS (top middle and top right) yield worse results with improvements of 0.8% and 35.1%, respectively.
4 Conclusion
We have presented a new elastic registration approach, which is based on a physical deformation model, uses Gaussian elastic body splines (GEBS), and incorporates localization uncertainties of the landmarks. Since the approach is based on a physical deformation model, cross-effects in elastic deformations can be taken into account. We have demonstrated the applicability of our scheme based on different types of images. It turned out that the new approach achieves more accurate registration results in comparison to previously proposed interpolating GEBS as well as TPS.
Fig. 5. Registration of 2D electrophoresis images: Source image overlayed with computed edges of the target image without registration (top left) as well as with registration using interpolating TPS (top middle), approximating TPS (top right), interpolating GEBS (bottom middle), and approximating GEBS (bottom right). Also, the target image overlayed with marked landmarks and error ellipses is shown (bottom left).
Acknowledgment
This work has been funded by the Deutsche Forschungsgemeinschaft (DFG) within the project ELASTIR (RO 2471/2-1). The original MR images and the tumor outlines have been provided by OA Dr. med. U. Spetzger and Prof. Dr. J.M. Gilsbach, Neurosurgical Clinic, University Hospital Aachen of the RWTH.
References
1. Zitova, B., Flusser, J.: Image Registration Methods: a Survey. Image and Vision Computing 24 (2003) 977–1000
2. Davis, M., Khotanzad, A., Flamig, D., Harms, S.: Physics-based coordinate transformation for 3D image matching. IEEE Trans. Med. Imag. 16 (1997) 317–328
3. Joshi, S., Miller, M.: Landmark Matching Via Large Deformation Diffeomorphisms. IEEE Trans. on Image Processing 9 (2000) 1357–1370
4. Rohr, K., Stiehl, H., Sprengel, R., Buzug, T., Weese, J., Kuhn, M.: Landmark-Based Elastic Registration Using Approximating Thin-Plate Splines. IEEE Trans. on Medical Imaging 20 (2001) 526–534
5. Kohlrausch, J., Rohr, K., Stiehl, H.: A New Class of Elastic Body Splines for Nonrigid Registration of Medical Images. In: Proc. BVM, Springer (2001) 164–168
6. Johnson, H., Christensen, G.: Consistent Landmark and Intensity-based Image Registration. IEEE Trans. on Medical Imaging 21 (2002) 450–461
7. Rohde, G., Aldroubi, A., Dawant, B.: The adaptive bases algorithm for intensity based nonrigid image registration. IEEE Trans. Med. Imag. 22 (2003) 1470–79
8. Park, H., Bland, P., Brock, K., Meyer, C.: Adaptive registration using local information measures. Medical Image Analysis 8 (2004) 465–473
9. Kohlrausch, J., Rohr, K., Stiehl, H.: A New Class of Elastic Body Splines for Non-rigid Registration of Medical Images. J. Math. Imag. Vis. 23 (2005) 253–280
10. Pekar, V., Gladilin, E., Rohr, K.: An adaptive irregular grid approach for 3-D deformable image registration. Physics in Med. and Biol. 51 (2006) 361–377
11. Kerdok, A., Cotin, S., Ottensmeyer, M., Galea, A., Howe, R., Dawson, S.: Truth cube: Establishing physical standards for soft tissue simulation. Medical Image Analysis 7 (2003) 283–291
Piecewise-Quadrilateral Registration by Optical Flow – Applications in Contrast-Enhanced MR Imaging of the Breast

Michael S. Froh1, David C. Barber2, Kristy K. Brock3, Donald B. Plewes1, and Anne L. Martel1

1 Department of Medical Biophysics, University of Toronto, Canada
[email protected]
2 Department of Medical Physics, Central Sheffield Teaching Hospitals, Sheffield, UK
3 Physics Department, Princess Margaret Hospital, Canada
Abstract. In this paper we propose a method for the nonrigid registration of contrast-enhanced dynamic sequences of magnetic resonance (MR) images. The algorithm has been developed with accuracy in mind, but also has a clinically viable execution time (i.e. a few minutes) as a goal. The algorithm is driven by multiresolution optical flow with the brightness consistency assumption relaxed, subject to a regularized best-fit within a family of transforms. The particular family of transforms we have employed uses a grid of control points and trilinear interpolation. We present validation results from a study simulating non-rigid deformation by a biomechanical model of the breast, with simulated uptake of a contrast agent. We further present results from applying the algorithm as part of a routine breast cancer screening protocol.
1 Introduction
Dynamic contrast-enhanced MR imaging of the breast has been shown to be a highly sensitive screening technique for the early detection of breast carcinoma in patients at risk for hereditary breast cancer [1]. In this context, a baseline volumetric image of the patient is acquired, followed by an injection of a gadolinium diethylenetriamine penta-acetic acid (Gd-DTPA) contrast agent. A series of subsequent volumetric images are acquired and compared to the original baseline volume. In a tumour, Gd-DTPA accumulates in the extravascular, extracellular matrix around a lesion as a result of tumour angiogenesis, and increases the MRI signal due to T1 changes. The enhancing tumour is assessed on the dynamics of contrast medium uptake and morphological features of lesions, which aid in differentiating malignant from benign lesions [2,3]. Subtraction of pre-contrast scans from post-contrast scans significantly increases lesion conspicuity; however, interscan motion will create false positive regions of enhancement and suppress true lesions. While every reasonable effort is made to restrict gross patient motion, movement on the order of the pixel spacing (∼1mm) is common. Thus, an extensive field of research has arisen to find methods to compute and compensate for patient motion in post-processing.
One of the earliest methods proposed for the registration of contrast-enhanced MR images of the breast was that of Zuo et al. [4] in 1996, in which they attempted to minimize the variance ratio between a pair of images to compute a rigid transformation. Breast motion, however, due to the soft tissues involved, is generally considered to be non-rigid. Kumar et al., also in 1996, devised an optical flow-based method to compute an affine transformation, followed by a single iteration of local flow [5]. Standard optical flow techniques rely on the assumption that the intensity of an object remains constant over time while its position changes, though this assumption fails in the context of contrast-enhanced imaging. In 1997, Davis et al. proposed an "elastic body spline" technique [6], which produced a deformation based on matched control points selected a priori. Rueckert et al. developed a non-rigid registration method using a grid of control points, with displacement of non-grid points determined by cubic B-spline interpolation [7]. This algorithm has been validated against a biomechanical model employing finite-element methods [8]. Unfortunately, this method may require several hours to complete. The algorithm of Rueckert et al. was modified by Rohlfing et al. by the addition of an incompressibility constraint [9], to better handle cases in which the enhancing region may shrink (in order to yield further improvements in mutual information). More recently, Crum et al. have proposed a multi-resolution fluid registration algorithm applied to breast MR images, with promising results in terms of execution time and accuracy [10]. Herein, we propose a non-rigid registration algorithm which employs a relaxed formulation from [11] of the classical optical flow equation [12]. This formulation introduces a linear "brightness shift" term to account for legitimate changes in signal intensity. Furthermore, to reduce the number of degrees of freedom, and to constrain calculated deformations, the optical flow equations guide the displacement of a grid of control points, with non-grid displacements computed by trilinear interpolation.
2 Derivation
We first explain the optical flow equations we use, based on the relaxed formulation of Gennert and Negahdaripour [11]. Using a relaxed form is necessary, as the brightness constancy assumption of Horn and Schunck [12] does not hold in the context of contrast-enhanced imaging. In Section 2.2, we reduce the number of degrees of freedom by applying a piecewise transformation governed by a grid of control points. While we apply linear interpolation between control points, the theory is equally applicable to higher-order transforms (e.g. using cubic B-splines).

2.1 Optical Flow with Linear Brightness Shift
Recall the optical flow equation of Horn and Schunck [12]:

f(x) − m(x) = Δx(x)ᵀ ∇m(x)    (1)
where f(x) represents the intensity value at voxel x in the fixed image, m(x) represents the intensity value of voxel x in the moved image, and Δx(x) is the displacement of voxel x between the fixed and moved images. This formulation was generalized by Gennert and Negahdaripour [11] to permit a linear transformation between intensity values of the fixed and moved images. We apply a symmetric form of this relaxed equation at each voxel:

f(x) − m(x) = (1/2) ( Δx(x)ᵀ (∇f(x) + ∇m(x)) − Δa(x)(f(x) + m(x)) )    (2)
In this equation, Δa(x) represents a linear shift in voxel intensity with respect to the mean intensity at that position in the fixed and moved images.
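For illustration (a sketch of ours, not the paper's code), the per-voxel coefficients and right-hand side of Eq. (2) can be assembled with central differences as follows:

```python
import numpy as np

def relaxed_flow_system(f, m):
    """Per voxel, return the right-hand side f - m and the coefficients
    of the four unknowns in Eq. (2): 0.5*(grad f + grad m) for the three
    displacement components and -0.5*(f + m) for the brightness shift."""
    rhs = f - m
    grads = [0.5 * (gf + gm) for gf, gm in zip(np.gradient(f), np.gradient(m))]
    shift = -0.5 * (f + m)
    coeffs = np.stack(grads + [shift])  # shape (4, *f.shape)
    return coeffs, rhs
```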
2.2 Piecewise Deformation
If we apply Equation 2 at every voxel, we may encapsulate the resulting equations in the following linear system:

f − m = R (Δx; Δa)    (3)

Given a matrix T which maps a deformation from a set of parameters, p, to the complete space of voxels (including intensity shift), we may write this system as f − m = RT p. While the theory allows any transformation to be used (affecting only the choice of T), we select as our class of deformations the piecewise-quadrilateral transform: a grid of control points is placed over the image, and individual voxel parameters are computed by linear interpolation between adjacent control points (see e.g. [13]). We typically use a grid spacing of 12mm, corresponding to approximately 16 pixels in-plane, and 4-6 slices. The resulting system of equations is

f − m = P (Δx̂; Δâ)    (4)

where P = RT and (Δx̂, Δâ) is the vector of control point displacements.
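The interpolation matrix T is sparse: each voxel depends only on the eight control points of its cell, with standard trilinear weights. A small sketch of that weighting (our illustration; the names are hypothetical):

```python
import numpy as np

def trilinear_weights(x, spacing):
    """Return the trilinear weights of the 8 control points surrounding
    voxel position x on a regular grid with the given spacing; these are
    the nonzero entries of one row of the interpolation matrix T."""
    g = np.asarray(x, dtype=float) / spacing
    base = np.floor(g).astype(int)  # lower corner of the enclosing cell
    t = g - base                    # fractional position within the cell
    weights = {}
    for corner in np.ndindex(2, 2, 2):
        w = np.prod([ti if ci else 1.0 - ti for ti, ci in zip(t, corner)])
        weights[tuple(base + np.array(corner))] = w
    return weights
```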
2.3 Regularization, Normalization, and Preconditioning
We ensure smoothness of solutions by regularizing the normal equation derived from Equation 4. Thus,

Pᵀ(f − m) = (PᵀP + λ² QᵀQ) (Δx̂; Δâ)    (5)

where Q is defined to ensure some correlation between displacements of adjacent control points. Specifically, our choice of Q applies a discretized Laplacian operator.
We wish to ensure that the intensity values of the fixed and moved images are within a similar range. Thus, we normalize the intensity values in each image (by scaling) to yield mean intensities of 1. We precondition PᵀP to account for the incomparability of scale between displacement and intensity shift. We compute a scaling factor for the intensity component of P by first generating an estimate for PᵀP using only intensity values from the fixed image (i.e. setting m = 0). We compute the trace of the subcomponents of PᵀP corresponding to the spatial displacements (tx, ty, tz) and intensity shift (ta), and scale future computed intensity shift components by (tx + ty + tz)/(3 ta). Further, to ensure consistency of the regularization parameter, we scale λ by tr(PᵀP)/tr(QᵀQ), using the initial estimate for PᵀP after rescaling the intensity component. Finally, we solve Equation (5) using a preconditioned biconjugate gradient method (using the diagonal part of PᵀP + λ²QᵀQ as a preconditioner).
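A minimal sketch of the solve step (ours; it assumes P and Q are already assembled as SciPy sparse matrices and omits the scaling details described above):

```python
from scipy.sparse import diags
from scipy.sparse.linalg import bicg

def solve_normal_equations(P, Q, f_minus_m, lam):
    """Solve Eq. (5) with a biconjugate gradient method, preconditioned
    by the diagonal of the system matrix."""
    A = (P.T @ P + lam**2 * (Q.T @ Q)).tocsr()
    b = P.T @ f_minus_m
    M = diags(1.0 / A.diagonal())  # diagonal (Jacobi) preconditioner
    x, info = bicg(A, b, M=M)
    if info != 0:
        raise RuntimeError("BiCG did not converge")
    return x
```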
2.4 Update
We solve Equation (5) at each iteration to compute a new transformation relative to the current estimate. Given the current transformation, Tn, we compute a new displacement T′n+1 between Tn(m) and f. The simplest way to update our transformation would be to set Tn+1 = Tn + T′n+1. However, this would be inaccurate, as T′n+1 describes a displacement of control points in the space deformed by Tn. Thus, the correct update formula is Tn+1 = Tn + Tn⁻¹(T′n+1). Note that the regularization term in Equation (5) must also be adjusted to account for the existing displacement (as the regularization should guarantee the smoothness of the composed transformation). In the interest of efficient computation, we sacrifice accuracy here and simply regularize the additive composition (e.g. replacing Δx̂ with Δx̂ + Δx̂old). Following the update, we compute the normalized mutual information (NMI) between our current estimate and the target volume. If the mutual information increases, we accept the new transform. Otherwise, we assume that the smoothness constraint is restricting our ability to find a suitable solution, and so we discard the current estimate, reduce λ, and solve (5) again. If the mean control point displacement in T′n+1 falls below a defined threshold, we decrease λ (though we keep the current estimate), to avoid slow, asymptotic convergence. If λ falls below a defined threshold, the algorithm terminates, returning the last acceptable transformation.
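The outer loop just described can be sketched as follows (our illustration; `solve_eq5`, `compose`, `warp`, `nmi` and `mean_disp` are assumed callables, and the numeric defaults are ours, not the paper's):

```python
def registration_loop(f, m, solve_eq5, compose, warp, nmi, mean_disp,
                      lam=1.0, lam_min=1e-4, disp_tol=0.05, shrink=0.5):
    """Iterate: solve Eq. (5) for an increment, apply the corrected
    (non-additive) update, keep it only if NMI improves, and otherwise
    relax the smoothness weight lam; also relax lam when the mean
    control-point displacement stalls."""
    T = None  # None stands for the identity transform
    best = nmi(f, warp(m, T))
    while lam > lam_min:
        dT = solve_eq5(f, warp(m, T), lam)  # increment in the deformed space
        candidate = compose(T, dT)          # T_{n+1} = T_n + T_n^{-1}(T'_{n+1})
        score = nmi(f, warp(m, candidate))
        if score > best:
            T, best = candidate, score
            if mean_disp(dT) < disp_tol:    # avoid slow asymptotic convergence
                lam *= shrink
        else:                               # smoothness constraint too strict
            lam *= shrink
    return T
```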
3 Validation
We wished to verify objectively that our registration algorithm correctly computes non-rigid deformations between images in a contrast-enhanced sequence. We first simulated a contrast-enhanced sequence with no movement between successive images, shown in Figure 1. Next, we generated a known deformation, which was applied to each image in the sequence. Finally, we registered the non-deformed pre-contrast image in each sequence to each deformed image in the sequence, to verify the accuracy of the algorithm in the presence of intensity changes.
Fig. 1. Simulation studies. For each series, we show a slice from the pre-contrast baseline image, and the two post-contrast images. Contrast change is also shown within an outline of the breast. Volumes of contrast change for sequences S1, S2, and S3 were approximately 20 cm³, 1 cm³, and 280 cm³, respectively.
If the algorithm computes the correct transformation, it should displace each voxel in the same manner as the known deformation. We simulated physically plausible movement using finite-element methods, employing a biomechanical model of tissue properties. Finite-element methods have been used to produce realistic deformations in breast MR images in [8], for use in the validation of a nonrigid registration technique. Following segmentation of our input images, we labeled the fatty tissue, tumourous carcinoma, fibroglandular tissue, and skin. The Young's moduli for these tissues and the Poisson's ratio were set to the same values as in [8]. For each time series, we simulated a point displacement and compression between a pair of mammographic plates. After registering the data, we were able to compare computed voxel displacements to the true values produced by the biomechanical model. We compared our results to those obtained with a pure affine registration and an existing validated registration algorithm [14]. We used three sets of time series data, each with three time points (one pre-contrast and two post-contrast), and two deformations (point displacement and plate compression), for a total of eighteen volumes to be registered to the pre-contrast, undeformed volumes. Results are summarized in Figure 2. Our algorithm performed well over the eighteen data sets, reducing the mean per-voxel error from approximately 4.2mm before registration to approximately
[Bar chart: Mean Error (mm), grouped by method (Unreg, Affine, Optflow, VTK CISG), for the whole breast and the fat, fibroglandular, and tumour regions.]
Fig. 2. Mean error per voxel, showing results over the whole breast and for each of the segmented regions (fat, fibroglandular tissue, and tumour tissue). Results shown are average values over the unregistered volumes, and over the volumes following affine registration, registration by our piecewise-quadrilateral optical flow method, and registration by the non-rigid method of Rueckert et al. [7].
0.77mm, significantly outperforming the affine registration method (with 2.1mm mean error), and obtaining results nearly as accurate as those of [14] (with 0.48mm mean error). Furthermore, our algorithm increased the mean correlation coefficient between the source and target volumes from 0.8383 to 0.9783, and reduced the sum of squared differences (SSD) by an average of 63% (while the method of [14] achieved a mean correlation coefficient of 0.9780, and an SSD reduction of 68%). The execution times per volume on a 3GHz Intel Pentium 4 workstation were approximately 1 minute, 2 minutes, and 6 hours, for the affine algorithm, the piecewise-quadrilateral algorithm, and [14], respectively.
4 Clinical Results
The algorithm presented herein has been applied to over four hundred dynamic contrast-enhanced screening examinations between September 2004 and February 2006. Examining 49 examinations between December 2005 and February 2006, with a total of 372 post-contrast volumes, we found that 178 volumes were untouched by the algorithm (as no significant movement was found), while the remainder saw an average 11% reduction in SSD and an increase in mean correlation coefficient from 0.980 to 0.985. Furthermore, several cases have been found in which features were undetectable on subtraction images prior to registration, but were clearly visible following registration. One such case is shown in Figure 3. In use, the registration method is straightforward, and may be launched with a single command from the scanner console. Registration is performed on a cluster of eight 1GHz Intel Pentium III processors, with eight post-contrast volumes registered in approximately three minutes.
Fig. 3. Subtraction images of a slice acquired from a screening subject before (left) and after (right) registration. Note the small circular region of enhancement near the centre of the breast in the registered image.
5 Conclusions
We have presented a fast non-rigid registration algorithm which attempts to overcome some of the traditional limitations of optical flow techniques, while computing accurate transformations in little time. In the future, we may experiment with other families of deformations, such as cubic B-splines over a grid of control points. Simulations based on a biomechanical model of breast tissues support our claim that in the presence of contrast enhancement, the algorithm is able to accurately compute a correct deformation to compensate for non-rigid motion of the breast. The algorithm significantly outperformed an affine registration method, yielding accuracy nearly equal to that of an existing validated nonrigid method, in a fraction of the time. The algorithm has been applied to several hundred screening exams, compensating for actual patient motion. Results have been well-received by a collaborating radiologist and our method has successfully exposed areas of enhancement undetectable on unregistered subtraction images. Our experience to date indicates that this is an effective, practical, and time-efficient means to conduct co-registration of dynamic contrast-enhanced MR images for breast screening in the clinic.
Acknowledgments
We wish to thank Elizabeth Ramsay for assistance in gathering the breast MRI data. We would also like to thank the Canadian Breast Cancer Research Alliance and the Terry Fox Foundation for support in breast MRI research.
References
1. Warner, E., Plewes, D.B., Hill, K.A., Causer, P.A., Zubovits, J.T., Jong, R.A., Cutrara, M.R., DeBoer, G., Yaffe, M.J., Messner, S.J., Meschino, W.S., Pirod, C.A., Narod, S.A.: Surveillance of BRCA1 and BRCA2 mutation carriers with magnetic resonance imaging, ultrasound, mammography, and clinical breast examination. Journal of the American Medical Association 292(11) (2004) 1317–1325
2. American College of Radiology (ACR): Breast Imaging Reporting and Data System (BI-RADS) (2nd edn.). American College of Radiology, Reston, Virginia (1995)
3. Kuhl, C.K., Mielcareck, P., Klaschik, S., Leutner, C., Wardelmann, E., Gieseke, J., Schild, H.H.: Dynamic breast MR imaging: Are signal intensity time course data useful for differential diagnosis of enhancing lesions? Radiology 211 (1999) 101–110
4. Zuo, C., Jiang, A., Buff, B.L., Mahon, T.G., Wong, T.Z.: Automatic motion correction for breast MR imaging. Radiology 198(3) (1996) 903–906
5. Kumar, R., Asmuth, J.C., Hanna, K., Bergen, J., Hulka, C., Kopans, D.B., Weisskoff, R., Moore, R.: Application of 3D registration for detecting lesions in magnetic resonance breast scans. Proc. SPIE Medical Imaging 1996: Image Processing 2710 (1996) 646–656
6. Davis, M., Khotanzad, A., Flamig, D., Harms, S.: A physics based coordinate transformation for 3D image matching. IEEE Trans. Med. Imag. 16(3) (1997) 317–328
7. Rueckert, D., Sonoda, L.I., Hayes, C., Hill, D.L.G., Leach, M.O., Hawkes, D.J.: Nonrigid registration using free-form deformations: application to breast MR images. IEEE Trans. Med. Imag. 18(8) (1999) 712–721
8. Schnabel, J.A., Tanner, C., Castellano-Smith, A.D., Degenhard, A., Leach, M.O., Hose, D.R., Hill, D.L.G., Hawkes, D.J.: Validation of nonrigid image registration using finite-element methods: Application to breast MR images. IEEE Trans. Med. Imag. 22(2) (2003) 238–247
9. Rohlfing, T., Maurer, C.R., Jr., Bluemke, D.A., Jacobs, M.A.: Volume-preserving nonrigid registration of MR breast images using free-form deformation with an incompressibility constraint. IEEE Trans. Med. Imag. 22(6) (2003) 730–741
10. Crum, W.R., Tanner, C., Hawkes, D.J.: Anisotropic multi-scale fluid registration: evaluation in magnetic resonance breast imaging. Physics in Medicine and Biology 50(21) (2005) 5153–5174
11. Gennert, M.A., Negahdaripour, S.: Relaxing the brightness constancy assumption in computing optical flow. Massachusetts Institute of Technology Artificial Intelligence Laboratory AIM-975 (1987)
12. Horn, B.K., Schunck, B.G.: Determining optical flow. Artificial Intelligence 17 (1981) 185–204
13. Barber, D.C.: Registration of low resolution medical images. Physics in Medicine and Biology 37(7) (1992) 1485–1498
14. Hartkens, T., Rueckert, D., Schnabel, J., Hawkes, D., Hill, D.: VTK CISG registration toolkit: An open source software package for affine and non-rigid registration of single- and multimodal 3D images. In: Proc. Bildverarbeitung in der Medizin 2002. LNCS, Leipzig, Springer Verlag (2002) 409–412
Iconic Feature Registration with Sparse Wavelet Coefficients

Pascal Cathier

CEA, DSV, DRM, SHFJ, Orsay, F-91400 France

Abstract. With the growing acceptance of nonrigid registration as a useful tool to perform clinical research, and in particular group studies, the storage space needed to hold the resulting transforms is deemed to become a concern for vector field based approaches, on top of the traditional computation time issue. In a recent study we led, which involved the registration of more than 22,000 pairs of T1 MR volumes, this constraint indeed appeared critical. In this paper, we propose to decompose the vector field on a wavelet basis, and let the registration algorithm minimize the number of non-zero coefficients by introducing an L1 penalty. This enables a sparse representation of the vector field which, unlike parametric representations, does not confine the estimated transform to a small parametric space with a fixed uniform smoothness: nonzero wavelet coefficients are optimally distributed depending on the data. Furthermore, we show that the iconic feature registration framework allows the non-differentiable L1 penalty to be embedded into a C¹ energy that can be efficiently minimized by standard optimization techniques.
1 Introduction
While nonrigid registration has been around for decades, its use has long been restricted to field experts. In recent years, however, we have observed a multiplication of its applications and a democratization of its use, probably led by the growing popularity of voxel-based morphometry (VBM) studies [1] and computerized atlases [2,3,4]. We believe that with the growing confidence in nonrigid registration fueled by results obtained in such group studies, nonrigid registration is set to become a popular tool. Its massive use nevertheless raises new problems. One of them, on which we focus in this article, is the problem of storage requirements. In a recent study we led [5], which involved more than 22,000 pairwise registrations of T1 MRI data, the disk space required to hold all the resulting vector fields indeed appeared critical. To reduce storage requirements, one can use parametric transform models, the most popular being the tensor product of B-splines [6,7]. By placing control points every 4 voxels, the number of vector parameters drops to 1/4³ ≈ 1.6% of the total number of voxels. However, this data-independent decimation has a cost: transform smoothness is increased uniformly. By contrast, elastic materials typically have sharp deformations with rapidly varying Jacobian values at their boundaries, and smooth variations inside uniform areas.
Another solution is to compress vector fields after registration. The compression has to be lossy: lossless compression of floating point sequences is notably inefficient. Compared to the parametric approach, this solution better distributes spatially the information that needs to be kept. However, the procedure is still done without knowledge of the data. One does not know what is lost during compression, and thus compression is necessarily done at conservative levels. Furthermore, vector fields resulting from registration may expand badly on the basis used for compression, depending on the regularization. In this paper, we propose an algorithm that uses a wavelet decomposition of the vector field to register two images. The number of non-zero wavelet coefficients is minimized by including an L1 penalty in the global energy. This is a well-known trick used e.g. for signal filtering or feature selection [8], and it has been used by [9] in the closely related field of optical flow estimation. Furthermore, we show that this L1 penalty can be embedded into a smooth energy that can be minimized by standard optimization procedures. We finally present preliminary results showing that registration can be achieved using only a very small number of non-zero wavelet coefficients.
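For intuition, minimizing an L1 penalty on wavelet coefficients is closely related to soft-thresholding, which sets small coefficients exactly to zero and so yields the desired sparsity. A minimal sketch (ours, not the paper's algorithm):

```python
import numpy as np

def soft_threshold(coeffs, thresh):
    """Proximal operator of the L1 norm: shrink wavelet coefficients
    toward zero and zero out the small ones, producing a sparse
    representation of the displacement field."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - thresh, 0.0)
```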
2 Presentation of the Registration Algorithm

2.1 Wavelet-Based Iconic Feature Registration
In [10], we introduced the concept of iconic feature registration. The main characteristic of iconic feature registration (IFR) is its similarity with geometric landmark registration in its use of corresponding points to guide the alignment. As opposed to other intensity-based techniques, IFR algorithms compute intensity similarity between corresponding points, and estimate the transform from these pairings by optimizing a geometric criterion. This formulation of registration is potentially more powerful because it better takes localization errors into account. See Fig. 1 for a schematic illustration of the differences between the IFR and the standard formulations. Although usually not recognized as such, IFR is a widespread technique. Block matching [2] falls into that category, as do many other two-step techniques [11,12]. These are nevertheless presented as ad hoc techniques lacking a solid variational framework. In [10], we proposed an energy minimization framework for IFR. The algorithm presented here relies on a generalization of this energy. Let I : Ω_I ⊂ R³ → R and J : Ω_J ⊂ R³ → R be two images to be registered. The energy we minimize depends on two maps T and M going from Ω_I to Ω_J that represent, respectively, the transform to be estimated and a set of point matches:

    E(M, T) = S(I, J ∘ M) + d(M, T) + R(T),    (1)
where S(I, J ∘ M) is an intensity (dis)similarity energy between images I and J related by matches M, R(T) is a regularization energy accounting for priors on the transform, and d is a geometric distance between the matches M and the transform T, accounting for geometrical noise originating from errors in our
model of the transform, or to genuine differences between objects that we do not wish to recover.

Fig. 1. Schematic illustration of the difference between intensity feature based and standard intensity-based approaches. First two images: original images showing a large and a small circle. Third image: minimizing intensity differences results in a maximum overlap of the circles. Last image: an iconic-landmark based approach minimizes distances between iconic pairings, putting the smaller circle at a constant distance to the larger one, thus better distributing the alignment errors.

A Statistical Inference Viewpoint. Energy (1) can be related to the conditional log-probability log p(M, T | I, J), since

    log p(T, M | I, J) = log p(I, J | M, T) + log p(M | T) + log p(T) − log p(I, J).    (2)

Assuming a uniform prior on images I and J, we identify S(I, J ∘ M) with − log p(I, J | M, T) = − log p(I, J | M), R(T) with − log p(T), and d(M, T) with − log p(M | T). The minimization of (1) can thus be interpreted as a maximum a posteriori (MAP) estimation of the unknown transform T and matches M. An alternative would be to maximize log p(T | I, J) directly and treat M as a hidden variable. This yields a robust registration algorithm taking into account the uncertainty over M for the estimation of T. However, the set of matches M makes a bad hidden variable, because its conditional distribution cannot be deduced analytically from T but has to be sampled. Standard algorithms such as expectation-maximization (EM) become very computationally intensive in that context. It can nevertheless be tractable, esp. in two dimensions [13].

Cubic Spline Wavelets. Cubic spline wavelets are among the most useful wavelets in image processing and computer graphics, especially for reconstruction tasks. This popularity is due to the fact that they are smooth, symmetric, and can be efficiently implemented in the spatial domain by a so-called lifting scheme [14]. Wavelets are 1D creatures. In higher dimensions, the wavelet transform is done through multiresolution analysis (MRA), which alternates unidimensional transforms along the canonical axes. MRA yields basis functions Ψ_{ℓ,i} that are tensor products of unidimensional wavelets and cubic B-splines. Details on this standard procedure can be found e.g. in [15]. In this work, 3D displacement fields representing M and T are decomposed on a cubic spline MRA wavelet basis Ψ_{ℓ,i}. The transform T is written as
    T(x) = Σ_{ℓ=1}^{L} Σ_{i=1}^{N_ℓ} Θ_{ℓ,i} Ψ_{ℓ,i}(x),    x ∈ R³, Θ_{ℓ,i} ∈ R³,    (3)
where L is the total number of multiresolution levels, which depends on the image size, N_ℓ is the number of basis functions at level ℓ, and Θ_{ℓ,i} is the coefficient of the basis function Ψ_{ℓ,i}. M is decomposed similarly, with coefficients θ_{ℓ,i}. The very same basis has already been used in the past to register images in [16].

Regularization. The registration is regularized using the following L1 penalty on the wavelet coefficients:

    R(T) = λ_R Σ_{ℓ=1}^{L} Σ_{i=1}^{N_ℓ} 2^ℓ ‖S Θ_{ℓ,i}‖₁,    (4)
where λ_R is a regularization strength parameter, S = (s_{i,j}) ∈ R^{3×3} is a diagonal matrix where s_{i,i} is the voxel size along the i-th axis, and ‖x‖₁ = Σ_{i=1}^{3} |x_i|. The 2^ℓ factor under the sum introduces a linear increase of the penalty with the frequency of the wavelets, which was chosen here to imitate a first-order penalty like linear elasticity. This is of course only an approximation, but it is quite satisfactory for registration purposes. The use of an L1 penalty is a well-known trick to optimize the number of non-zero parameters [8]. This property comes from the fact that this norm has a gradient discontinuity at its minimum, zero. When added to a smooth similarity criterion, the minimum escapes from zero only when the similarity strongly suggests so. A downside of the L1 norm is its anisotropy. The (non-squared) L2 norm would be a better choice in that respect, but at the price of worse compression ratios, since coefficients in all directions have to be simultaneously small to be zeroed.

Coupling M and T. The relationship between matches M and the transform T (or more precisely, the conditional probability p(M|T)) is the key feature of iconic feature registration. It is also the hardest to model, since it encompasses many sources of geometrical uncertainty and model inadequacy. In the following, this relationship is described by a Euclidean distance between the wavelet coefficients of M and T:

    d(M, T) = λ_d Σ_{ℓ=1}^{L} Σ_{i=1}^{N_ℓ} 2^{−ℓ} ‖S(θ_{ℓ,i} − Θ_{ℓ,i})‖₂²,    (5)
with ‖x‖₂² = Σ_{i=1}^{3} x_i². The factor 2^{−ℓ} relates to the size of the spatial support of the wavelet: if cubic splines were orthogonal, this formula would be simply proportional to ‖M − T‖₂².

2.2 Iconic Feature Registration Without Alternating Minimization
IFR algorithms, following their geometrical counterparts like the iterative closest point (ICP) algorithm, usually proceed by alternating two steps: a matching
step, where optimal matching pairs are sought, and a fitting step, where the transform is updated. These steps correspond, respectively, to the minimization of (1) w.r.t. M and T. Sometimes, however, we may derive a closed-form formula for the optimal transform T*(M) that depends on M, when both the relationship p(M|T) between matches and transform and the prior p(T) on the transform are simple enough. Replacing T by T*(M) in (1) then yields an energy that depends on M only, and we are left optimizing the matches M. When the optimal M* is found, the optimal transform is simply given by T*(M*).

Avoiding the alternating minimization constraint can be computationally attractive. Alternating minimization forces each minimization step to lie within given subspaces of the parameter space, which forbids looking in directions given by linear combinations of gradients typically used by fast optimization procedures. Convergence to the minimum is thus linear, i.e. slow.

Optimizing on M nevertheless has some drawbacks. The matches are typically not constrained much, meaning that we may encounter many local minima. Also, a lot of time may be wasted finding the finer details of M at the highest frequencies, while these details, being heavily penalized by (4), do not impact the transform. This can be avoided by including a stopping criterion based on T*(M).

Application to Our Algorithm. From equations (4) and (5), we deduce a closed-form formula for T*(M). Denoting Θ_j^{ℓ,i} and θ_j^{ℓ,i} the j-th component, j ∈ {1, 2, 3}, of Θ_{ℓ,i} and θ_{ℓ,i} respectively, we have

    Θ_j^{ℓ,i} = θ_j^{ℓ,i} − K/s_{j,j}   if s_{j,j} θ_j^{ℓ,i} > K
    Θ_j^{ℓ,i} = θ_j^{ℓ,i} + K/s_{j,j}   if s_{j,j} θ_j^{ℓ,i} < −K        with K = 2^{2ℓ−1} λ_R / λ_d.    (6)
    Θ_j^{ℓ,i} = 0                        otherwise

Replacing T with T*(M) in (1), our energy becomes

    E(M, T*(M)) = S(I, J ∘ M) + λ_d Σ_{ℓ=1}^{L} Σ_{i=1}^{N_ℓ} Σ_{j=1}^{3} 2^{−ℓ} f_K(s_{j,j} θ_j^{ℓ,i}),    (7)

with

    f_K(x) = x²                      if |x| < K
    f_K(x) = 2K(|x| − K) + K²        if |x| ≥ K.    (8)
Therefore, in spite of the L1 constraint on the wavelet coefficients of T, it turns out that the objective function on M is smooth, as f_K has a continuous derivative everywhere; it happens to be Huber's M-estimator penalty function. Standard optimization algorithms can be used for its optimization.

Minimization. In the following, we minimize E(M, T*(M)) using a conjugate gradient descent. The computation of the partial derivatives ∂_{θ_j^{ℓ,i}} S(I, J ∘ M) typically requires the convolution of the partial derivatives w.r.t. the displacement of a single point, ∂_{M(x)} S(I, J ∘ M), with the basis functions Ψ_{ℓ,i}. These partial derivatives are not given by the analysis transform because cubic spline wavelets are not orthogonal: the partial derivatives have to correspond to convolutions
with cubic spline wavelets, not their duals. While not standard, it is not difficult to build a direct wavelet lifting transform corresponding to these convolutions. The energy is minimized in a multiresolution fashion. A Gaussian pyramid of images I and J is constructed. At each level of the pyramid, the transform is estimated, starting from the top. When convergence is reached, the estimated transform is propagated to the next level, and registration is resumed.

Fig. 2. Synthetic registration experiment. From left to right: the original volume, the deformed volume, the original volume registered onto the deformed volume, the synthetic deformation, and the recovered deformation. The transform is recovered where the images bear sufficient information.
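To illustrate the closed form (6) and the Huber-type penalty (8), the following Python sketch implements soft-thresholding of the match coefficients and f_K. It is an illustrative reconstruction only: the function and variable names are ours, not the author's implementation.

    import numpy as np

    def soft_threshold(theta, s, K):
        # Optimal transform coefficients Theta* for match coefficients theta
        # and per-axis voxel sizes s (the diagonal of S), cf. eq. (6).
        st = s * theta
        return np.where(st > K, theta - K / s,
               np.where(st < -K, theta + K / s, 0.0))

    def f_K(x, K):
        # Huber-type penalty of eq. (8): quadratic below K, linear above.
        ax = np.abs(x)
        return np.where(ax < K, x ** 2, 2.0 * K * (ax - K) + K ** 2)

    # Example: threshold level-l coefficients, K = 2**(2l-1) * lam_R / lam_d.
    l, lam_R, lam_d = 3, 1.0, 10.0
    K = 2.0 ** (2 * l - 1) * lam_R / lam_d
    theta = np.random.randn(100, 3)        # match coefficients at level l
    s = np.array([1.0, 1.0, 1.5])          # anisotropic voxel sizes
    Theta = soft_threshold(theta, s, K)    # many entries become exactly zero

The entries zeroed by soft_threshold are exactly the coefficients that the L1 penalty removes from storage.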
3 Experiments
In all the following experiments, volumes are 256 × 256 × 256 T1-weighted MRI volumes. We used the Gaussian-weighted local correlation coefficient as the similarity measure S [17]. At each level of the Gaussian pyramid, the basis of the last multiresolution level was systematically ignored, in part to address the problem mentioned in section 2.2 that the fine details of M take long to compute yet have the least impact on T.

3.1 A Synthetic Experiment
We first present a synthetic experiment where the ground truth is known. We deformed an MR volume of a brain with the sinusoidal deformation presented in Fig. 2. We manually tuned the λ_R and λ_d parameters until the results were visually satisfying. The registration took about 5 hours to complete. Results in Fig. 2 show that the algorithm is able to recover the transform pretty well, except of course for areas lying in the background. Yet to achieve this result, only 262,144 non-zero wavelet coefficients were used; grouped into 3D displacement vectors, this represents a vector for about 0.53% of the 256³ voxels.

3.2 Registration of Different Brains
The next experiment deals with brain volumes of two different individuals. The images have been corrected for non-uniform bias, and the brains have been segmented and aligned affinely. Parameters λ_R and λ_d have been chosen manually to correspond to a conservative regularization, which is necessary for multisubject brain registration. The registration took about five hours to complete on a 1.8 GHz Pentium. Figure 3 shows slices of the original volume
pair and of the registered volume, as well as intensity differences between volumes before and after registration. The algorithm did a decent job of aligning the brain volumes, while preserving unmatchable structures such as the details of the cortical sulci. Moreover, the registration used only 242,296 non-zero wavelet coefficients. Relative to the entire volume, this is a vector for about 0.48% of the voxels.

Fig. 3. Registration of two different brains. From left to right: the source volume; the target volume; the registered volume; intensity difference between the source and the target volumes; and between the registered and the target volumes. Regularization strength was set high enough to prevent over-registration of brain sulci.
4 Conclusion
This article presented registration from a new point of view: the storage space of the generated deformations, which we believe will become more and more problematic with the growing popularity of nonrigid registration in group studies. We proposed a solution based on describing the transform in terms of wavelets while minimizing the number of non-zero coefficients. Much work still has to be done to confirm the behavior of the approach. The impact of the anisotropy of the L1 constraint should be evaluated. Extensive studies should also give an idea of the compression ratios we can expect depending on the imaging modalities and the desired regularity and accuracy of the transform. Finally, a computation speed-up may be obtained by tuning the optimization precision depending on the multiresolution level, avoiding spending too much time finely estimating the highest frequencies of the matches, which do not much impact the transform.
References

1. Ashburner, J., Friston, K.J.: Voxel-based morphometry — the methods. NeuroImage 12 (2000) 1427–1442
2. Collins, D.L., Evans, A.C.: ANIMAL: validation and applications of nonlinear registration-based segmentation. Int. J. of Pattern Recognition and Artificial Intelligence 11 (1997) 1271–1294
3. Thompson, P., Toga, A.W.: Detection, visualization and animation of abnormal anatomic structures with a deformable probabilistic atlas based on random vector field transformations. Medical Image Analysis 1 (1997) 271–294
4. Rueckert, D., Frangi, A.F., Schnabel, J.A.: Automatic construction of 3D statistical deformation models of the brain using nonrigid registration. IEEE Trans. Med. Im. 22 (2003)
5. Cathier, P., Fung, G., Mangin, J.-F.: Population classification based on global brain features inferred from massive T1 MRI registration. In: Proc. of HBM. (2006)
6. Szeliski, R., Coughlan, J.: Spline-based image registration. Int. J. of Comp. Vision 22 (1997) 199–218
7. Kybic, J., Unser, M.: Fast parametric elastic image registration. IEEE Trans. Im. Proc. 12 (2003) 1427–1442
8. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Royal Stat. Soc. B 58 (1996) 267–288
9. Ng, L., Solo, V.: Optical flow estimation using wavelet zeroing. In: Proc. of ICIP. (1999) III:722–726
10. Cachier, P., Bardinet, E., Dormont, D., Pennec, X., Ayache, N.: Iconic feature based nonrigid registration: The PASHA algorithm. Comp. Vision Im. Underst. 89 (2003) 272–298
11. Thirion, J.P.: Image matching as a diffusion process: an analogy with Maxwell's demons. Medical Image Analysis 2 (1998) 243–260
12. Hayton, P.M., Brady, M., Smith, S.M., Moore, N.: A non-rigid registration algorithm for dynamic breast MR images. Artificial Intelligence 114 (1999) 124–156
13. Singh, A., Allen, P.: Image-flow computation: An estimation-theoretic framework and a unified perspective. CVGIP: Im. Underst. 56 (1992) 152–177
14. Daubechies, I., Sweldens, W.: Factoring wavelet transforms into lifting steps. J. Fourier Anal. Appl. 4(3) (1998) 247–269
15. Mallat, S.G.: A Wavelet Tour of Signal Processing. Academic Press (1998)
16. Wu, Y.T., Kanade, T., Li, C.C., Cohn, J.: Image registration using wavelet-based motion model. Int. J. of Comp. Vision 38 (2000) 129–152
17. Cachier, P., Pennec, X.: 3D non-rigid registration by gradient descent on a Gaussian-windowed similarity measure using convolutions. In: Proc. of MMBIA. (2000) 182–189
Diffeomorphic Registration Using B-Splines

Daniel Rueckert¹, Paul Aljabar¹, Rolf A. Heckemann², Joseph V. Hajnal², and Alexander Hammers³

¹ Department of Computing, Imperial College London, UK
² Imaging Sciences Department, MRC Clinical Sciences Centre, Imperial College London, UK
³ Division of Neuroscience and Mental Health, MRC Clinical Sciences Centre, Imperial College London, UK
Abstract. In this paper we propose a diffeomorphic non-rigid registration algorithm based on free-form deformations (FFDs) which are modelled by B-splines. In contrast to existing non-rigid registration methods based on FFDs, the proposed algorithm guarantees that the resulting transformation is diffeomorphic. To construct a diffeomorphic transformation we compose a sequence of free-form deformations while ensuring that individual FFDs are one-to-one transformations. We have evaluated the algorithm on 20 normal brain MR images which have been manually segmented into 67 anatomical structures. Using the agreement between manual segmentation and segmentation propagation as a measure of registration quality, we have compared the algorithm to an existing FFD registration algorithm and a modified FFD registration algorithm which penalises non-diffeomorphic transformations. The results show that the proposed algorithm generates diffeomorphic transformations while providing similar levels of performance to the existing FFD registration algorithm in terms of registration accuracy.
1 Introduction

The area of computational anatomy is a rapidly developing discipline [1]. With the increasing resolution of anatomical scans of the human brain, a number of computational approaches for characterising differences in the shape and neuro-anatomical configuration of different brains have emerged. Most morphometric techniques exploit the fact that mapping data sets into a standard space not only accounts for anatomic and functional variations and idiosyncrasies of each individual subject, but also offers a powerful framework which facilitates comparison of anatomy and function over time, between subjects, between groups of subjects and across sites. Non-rigid registration is a key element in many morphometric applications since it enables the warping of images into a standard reference space. Most non-rigid registration techniques use either elastic [2,3], fluid [4,5,6,7] or other deformation models [8,9,10,11,12]. In general, most registration algorithms make the assumption that similar structures are present in both images. Therefore it is desirable that the deformation field be smooth and invertible (so that every point in one image has a corresponding point in the other). Such smooth, invertible transformations are called diffeomorphisms. This is particularly important in cases where the output of the registration, i.e. the deformation field, is analyzed further, for example in deformation-based morphometry [13,14] or for the
construction of statistical shape models [15]. However, in other applications diffeomorphisms are less important. For example, in registration-based segmentation the quality of the segmentation propagation is more important than the properties of the underlying deformation fields. Recent comparisons of non-rigid registration algorithms for segmentation propagation have shown that non-rigid registration techniques which do not guarantee diffeomorphisms can outperform those which guarantee diffeomorphic deformations [16]. There are a number of different ways of constructing diffeomorphic deformations. For example, in many fluid registration algorithms [4,5] the deformation process is stopped when the Jacobian of the transformation falls below a pre-defined threshold. The images are then regridded and the deformation process is allowed to continue. Other strategies include the composition of local warps which are diffeomorphic [15]. In this paper we propose a diffeomorphic non-rigid registration technique which uses a free-form deformation (FFD) model based on B-splines. To construct a diffeomorphic deformation we compose the deformation as a sequence of FFDs. By imposing hard constraints on the maximum deformation within each FFD we ensure that each individual FFD is a diffeomorphism. The composition of these FFDs is therefore also a diffeomorphism. We compare the proposed diffeomorphic registration algorithm to an existing free-form registration algorithm [8]. We also compare this to a soft-constraint approach in which topology preservation is enforced by introducing a penalty term which penalizes transformations for which the determinant of the Jacobian falls below a pre-defined threshold.
2 Methods

In practice, the anatomical variability between subjects cannot be sufficiently explained by an affine transformation which only accounts for differences due to position, orientation and size of the anatomy. To capture the anatomical variability, it is necessary to employ non-rigid transformations. A popular approach is to use spline-based transformations [8,17]. Using this approach, the deformation can be represented as a free-form deformation (FFD) based on B-splines, which is a powerful tool for modelling 3D deformations and can be written as the 3D tensor product of the familiar 1D cubic B-splines,

    T(x) = Σ_{l=0}^{3} Σ_{m=0}^{3} Σ_{n=0}^{3} B_l(u) B_m(v) B_n(w) c_{i+l,j+m,k+n}    (1)
where c denotes the control points which parameterise the transformation. The optimal transformation is found by minimising a cost function associated with the global transformation parameters as well as the local transformation parameters. The cost function comprises two competing goals: the first term represents the cost associated with the voxel-based similarity measure, in this case normalised mutual information [18], while the second term corresponds to a regularization term which constrains the transformation to be smooth [8]. The performance of this registration method is limited by the resolution of the control point mesh, which is linearly related to the computational complexity: more global and intrinsically smooth deformations can only be modelled using
a coarse control point spacing, whereas more localized and intrinsically less smooth deformations require a finer spacing. To address this problem it is possible to use multi-level free-form deformations [19,8]. In this approach the free-form deformations at different control point spacings are added, i.e.

    T(x, y, z) = Σ_{i=1}^{n} T_i(x, y, z)    (2)
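To make the tensor product (1) and the multi-level sum (2) concrete, the following Python sketch evaluates a single-level FFD displacement and adds levels. The uniform-spacing grid handling and names are our illustrative choices, not code from the paper; boundary handling is omitted for brevity.

    import numpy as np

    def bspline_weights(u):
        # The four cubic B-spline basis values B_0(u)..B_3(u) of eq. (1).
        return np.array([(1 - u) ** 3,
                         3 * u ** 3 - 6 * u ** 2 + 4,
                         -3 * u ** 3 + 3 * u ** 2 + 3 * u + 1,
                         u ** 3]) / 6.0

    def ffd_displacement(x, c, spacing):
        # Evaluate eq. (1) at interior point x given a control grid c of
        # shape (nx, ny, nz, 3) with uniform control point 'spacing'.
        g = np.asarray(x, dtype=float) / spacing
        i, j, k = np.floor(g).astype(int) - 1   # first of the 4x4x4 support
        u, v, w = g - np.floor(g)
        Bu, Bv, Bw = bspline_weights(u), bspline_weights(v), bspline_weights(w)
        d = np.zeros(3)
        for l in range(4):
            for m in range(4):
                for n in range(4):
                    d += Bu[l] * Bv[m] * Bw[n] * c[i + l, j + m, k + n]
        return d

    def multilevel_displacement(x, grids, spacings):
        # Multi-level FFD of eq. (2): displacements of all levels are added.
        return sum(ffd_displacement(x, c, s) for c, s in zip(grids, spacings))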
When using multi-level FFDs, the intrinsic regularization of B-splines is often sufficient to achieve smooth deformations.

2.1 Diffeomorphic Non-rigid Registration Using Soft Constraints

Recent studies have shown that this type of algorithm performs well in neuroimaging studies [16]. However, while this type of algorithm usually generates smooth deformations, it does not guarantee that the resulting deformation field is diffeomorphic. One strategy to ensure that the FFD calculated by the non-rigid registration algorithm is diffeomorphic is to add a second term to the cost function that penalises any non-diffeomorphic transformations using the following penalty function P:

    P = γ² / |J(x, y, z)|² − 2    if |J(x, y, z)| ≤ γ
    P = 0                          otherwise    (3)

Here |J(x, y, z)| represents the determinant of the Jacobian of the FFD. A similar penalty function was first proposed by Edwards et al. [20] and effectively penalises any transformations for which the determinant of the Jacobian falls below a threshold γ. By penalising Jacobians that approach zero, we prevent the transformation from collapsing and keep it diffeomorphic. Note that simply using a smoothness penalty function would not be sufficient to guarantee a diffeomorphic transformation, since it is possible for a transformation to be smooth but non-diffeomorphic. The cost function C which the registration algorithm minimizes is a combination of the image similarity I (normalised mutual information as before) and the penalty function P: C = −I + λP. The weighting parameter λ defines the trade-off between the image similarity and the penalty function. For the experiments in this paper γ was set to 0.3 and λ to 0.01. This means that the penalty term is only activated if the determinant of the Jacobian drops below 0.3.

2.2 Diffeomorphic Non-rigid Registration Using Hard Constraints

In the following we will use a strategy for ensuring diffeomorphic FFDs which is based on the work of Choi and Lee [21]. In their work the authors derived sufficient conditions for injectivity which are expressed in terms of control point displacements. These sufficient conditions can be easily tested and can be used to guarantee a diffeomorphic FFD. Without loss of generality we will assume that the control points are
arranged on a lattice with unit spacing. Let Δc_{i,j,k} = (Δx_{i,j,k}, Δy_{i,j,k}, Δz_{i,j,k}) be the displacement of control point c_{i,j,k}. Let δ_x = max |Δx_{i,j,k}|, δ_y = max |Δy_{i,j,k}|, δ_z = max |Δz_{i,j,k}|.

Theorem 1. An FFD based on cubic B-splines is locally injective over the whole domain if δ_x < 1/K, δ_y < 1/K and δ_z < 1/K.

Choi and Lee [21] have determined a value of K ≈ 2.48, so that the maximum displacement of control points given by the bound 1/K is approximately 0.40. This means that the maximum admissible displacement of control points is determined by the spacing of control points in the lattice. For example, for a lattice with 20mm control point spacing the maximum control point displacement is 8mm, while for a lattice with 2.5mm control point spacing the maximum control point displacement is 1mm. In practice these bounds on the displacements are too small to model any realistic deformations. To model large deformations we use a composition of FFDs as proposed in [22]. For each FFD in this composition, the maximum control point displacement is limited by Theorem 1. This is fundamentally different from the multi-level FFDs mentioned earlier. In our approach the FFDs are concatenated, i.e.
(4)
so that the final deformation is a composition of FFDs. Since the composition of two diffeomorphisms produces a diffeomorphism we can construct a diffeomorphic deformation by ensuring that each individual FFD is diffeomorphic. Again, normalised mutual information is used as a similarity measure.
3 Results

3.1 MRI Data and Manual Segmentations

To evaluate the proposed registration algorithm we have tested the algorithms on 20 brain MR images from normal subjects. The data consisted of T1-weighted 3D volumes, acquired using an inversion recovery prepared fast spoiled gradient recall sequence (GE), TE/TR 4.2 msec (fat and water in phase)/15.5 msec, time of inversion (TI) 450 msec, flip angle 20°, to obtain 124 slices of 1.5 mm thickness with a field of view of 18 x 24 cm with a 192 x 256 matrix, 1 NEX. Scanning took place at the National Society for Epilepsy MRI Unit (Chalfont St Peter, Buckinghamshire, UK). Scan data were resliced to create isotropic voxels of 0.9375 x 0.9375 x 0.9375 mm, using windowed sinc interpolation. Each data set was accompanied by a set of labels in the form of an annotated image volume, where every voxel was coded as one of 67 anatomical structures (or background, code 0). These labels had been prepared using a protocol for manually outlining anatomical structures on two-dimensional sections from the image volume. The protocol used was an augmented version of a published prototype [23].

3.2 Evaluation of Non-rigid Registration Via Segmentation Comparison

We have evaluated three registration techniques: the standard non-rigid registration as described in [8] (method A), a non-rigid registration based on the composition of FFDs
described in section 2.2 (method B) and a non-rigid registration which penalizes non-diffeomorphic transformations described in section 2.1 (method C). For methods A and C we used a coarse-to-fine approach with a control point spacing of 20mm, 10mm, 5mm and 2.5mm. For method B we also used a coarse-to-fine approach with similar control point spacings. At each level of control point spacing up to 10 FFDs are concatenated, so that the final transformation is a composition of up to 40 FFDs. We then chose randomly one of the 20 MR brain images as a reference image, and all other 19 images were registered to the reference image. A visual example of the registrations is shown in Figure 1. In this example both method A and method B lead to visually better alignment than method C. A close-up of the corresponding deformation fields can be seen in Figure 2. The folding in the deformation field produced by method A is clearly visible, while the deformation fields produced by the other two methods have no folding. However, it is also apparent that method B is better capable of modelling large deformations than method C.

The manual expert segmentations allow us to make an assessment of the relative performance of the registration algorithms in aligning specific structures or tissue classes. As a measure of agreement between the manual expert segmentation and those obtained via registration, we used the similarity index (SI). For each anatomical structure m, it is defined as the ratio between the volume of the intersection (overlap) and the mean volume of the segmentations in the same coordinate space:

    SI_m = 2 n(S_a^m ∩ S_b^m) / (n(S_a^m) + n(S_b^m))    (5)

This measure ranges from 0 (for label sets that have no overlapping regions) to 1 (for label sets that are identical). Since the manual segmentation contains 67 anatomical structures, we have grouped the results into temporal, parietal, occipital and frontal cortex, grey matter nuclei, ventricular system, brain stem, cerebellum and corpus callosum. The SIs reported for these groups of structures are the average SI values for the anatomical structures within the groups. The results for all three algorithms are illustrated in Figure 3. For comparison we also include the SI values after affine registration only. Method A produces similar results to method B, and both are better than those obtained with method C.

3.3 Evaluation of the Deformation Fields Produced by Non-rigid Registration

For all registration algorithms we have additionally computed the percentage of voxels within the brain for which the determinant of the Jacobian is negative. In the case of the standard non-rigid registration (method A) this percentage was 1.47% on average across all registrations. In contrast, both other methods produced no voxels for which the determinant of the Jacobian was negative. For the non-rigid registration which penalizes non-diffeomorphic transformations (method C) the trade-off between image similarity and penalty function was set to λ = 0.01. In this case all voxels have a positive Jacobian determinant. To investigate the choice of λ further we have repeated our experiments with values of λ between 10⁻² and 10⁻⁸. However, in these cases the percentage of voxels with negative Jacobian determinant is non-zero (increasing to 0.62%). This indicates that λ = 0.01 is close to the optimal value ensuring no folding of the transformation.
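The folding measure used above can be computed directly from a dense displacement field. The following sketch uses a finite-difference Jacobian on a unit grid; this discretization is our own simplification, not necessarily the one used in the paper.

    import numpy as np

    def negative_jacobian_fraction(disp, mask=None):
        # disp: displacement field of shape (nx, ny, nz, 3); the transform is
        # T(x) = x + disp(x), so its Jacobian is I + grad(disp).
        rows = [np.stack(np.gradient(disp[..., a], axis=(0, 1, 2)), axis=-1)
                for a in range(3)]
        J = np.stack(rows, axis=-2) + np.eye(3)
        det = np.linalg.det(J)
        if mask is not None:
            det = det[mask]            # e.g. restrict to brain voxels
        return float(np.mean(det < 0))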
4 Discussion and Conclusions

In this paper we have proposed two registration methods for generating diffeomorphic FFDs. The first method is based on a penalty term which prevents the determinant of the Jacobian from falling below a pre-defined threshold. The second method is based on a composition of FFDs. Each FFD within the composition is guaranteed to be diffeomorphic by limiting the maximum amount of control point displacement within an individual FFD. Our results have shown that the latter technique is able to achieve high registration accuracy (compared to existing FFD registration techniques) while generating diffeomorphic transformations. Our results have also shown that the first method generates diffeomorphic transformations, however at the expense of registration accuracy.
Fig. 1. This figure shows an example slice of a volumetric inter-subject registration: (a) after non-rigid registration using method A, (b) after non-rigid registration using method B and (c) after non-rigid registration using method C. The image shown is the reference image and the isolines shown are obtained from the transformed image.
Fig. 2. This figure shows an example slice of the volumetric deformation produced by inter-subject registration: (a) after non-rigid registration using method A, (b) after non-rigid registration using method B and (c) after non-rigid registration using method C. The folding is clearly visible in (a). There is no folding in (b) and (c); however, (b) is better capable of modelling large deformations.
Fig. 3. This figure shows the SIs for different groups of anatomical structures
References

1. Grenander, U., Miller, M.I.: Computational anatomy: An emerging discipline. Quarterly of Applied Mathematics 56(4) (1998) 617–694
2. Bajcsy, R., Kovačič, S.: Multiresolution elastic matching. Computer Vision, Graphics and Image Processing 46 (1989) 1–21
3. Gee, J.C.: On matching brain volumes. Pattern Recognition 32(1) (1999) 99–111
4. Christensen, G.E., Rabbitt, R.D., Miller, M.I., Joshi, S.C., Grenander, U., Coogan, T.A., van Essen, D.C.: Topological properties of smooth anatomic maps. In: Information Processing in Medical Imaging: Proc. 14th International Conference (IPMI'95). (1995) 101–112
5. Christensen, G.E., Rabbitt, R.D., Miller, M.I.: Deformable templates using large deformation kinematics. IEEE Transactions on Image Processing 5(10) (1996) 1435–1447
6. Bro-Nielsen, M., Gramkow, C.: Fast fluid registration of medical images. In: Proc. 4th International Conference Visualization in Biomedical Computing (VBC'96). (1996) 267–276
7. Beg, M.F., Miller, M.I., Trouvé, A., Younes, L.: Computing large deformation metric mappings via geodesic flows of diffeomorphisms. International Journal of Computer Vision 61(2) (2005) 139–157
8. Rueckert, D., Sonoda, L.I., Hayes, C., Hill, D.L.G., Leach, M.O., Hawkes, D.J.: Non-rigid registration using free-form deformations: Application to breast MR images. IEEE Transactions on Medical Imaging 18(8) (1999) 712–721
9. Davatzikos, C.: Spatial transformation and registration of brain images using elastically deformable models. Computer Vision and Image Understanding 66(2) (1997) 207–222
10. Hellier, P., Barillot, C., Mémin, É., Pérez, P.: Hierarchical estimation of a dense deformation field for 3D robust registration. IEEE Transactions on Medical Imaging 20(5) (2001) 388–402
11. Shen, D., Davatzikos, C.: HAMMER: Hierarchical attribute matching mechanism for elastic registration. IEEE Transactions on Medical Imaging 21(11) (2002) 1421–1439
12. Thirion, J.P.: Image matching as a diffusion process: An analogy with Maxwell's demons. Medical Image Analysis 2(3) (1998) 243–260
13. Ashburner, J., Hutton, C., Frackowiak, R., Johnsrude, I., Price, C., Friston, K.: Identifying global anatomical differences: Deformation-based morphometry. Human Brain Mapping 6 (1998) 638–657
14. Chung, M.K., Worsley, K.J., Paus, T., Cherif, C., Collins, D.L., Giedd, J.N., Rapoport, J.L., Evans, A.C.: A unified statistical approach to deformation-based morphometry. NeuroImage 14(3) (2001) 595–606
15. Cootes, T.F., Marsland, S., Twining, C.J., Smith, K., Taylor, C.J.: Groupwise diffeomorphic non-rigid registration for automatic model building. In: ECCV. (2004) 316–327
16. Crum, W.R., Rueckert, D., Jenkinson, M., Kennedy, D., Smith, S.M.: A framework for detailed objective comparison of non-rigid registration algorithms in neuroimaging. In: Seventh Int. Conf. on Medical Image Computing and Computer-Assisted Intervention (MICCAI '04). (2004) 679–686
17. Kybic, J., Unser, M.: Fast parametric elastic image registration. IEEE Transactions on Image Processing 12(11) (2003) 1427–1442
18. Studholme, C., Hill, D.L.G., Hawkes, D.J.: An overlap invariant entropy measure of 3D medical image alignment. Pattern Recognition 32(1) (1998) 71–86
19. Lee, S., Wolberg, G., Chwa, K.Y., Shin, S.Y.: Image metamorphosis with scattered feature constraints. IEEE Transactions on Visualization and Computer Graphics 2(4) (1996) 337–354
20. Edwards, P.J., Hill, D.L.G., Little, J.A., Hawkes, D.J.: A three-component deformation model for image-guided surgery. Medical Image Analysis 2(4) (1998) 355–367
21. Choi, Y., Lee, S.: Injectivity conditions of 2D and 3D uniform cubic B-spline functions. Graphical Models 62(6) (2000) 411–427
22. Hagenlocker, M., Fujimura, K.: CFFD: a tool for designing flexible shapes. The Visual Computer 14(5/6) (1998) 271–287
23. Hammers, A., Allom, R., Koepp, M.J., Free, S.L., Myers, R., Lemieux, L., Mitchell, T.N., Brooks, D.J., Duncan, J.S.: Three-dimensional maximum probability atlas of the human brain, with particular reference to the temporal lobe. Human Brain Mapping 19(4) (2003) 224–247
Automatic Point Landmark Matching for Regularizing Nonlinear Intensity Registration: Application to Thoracic CT Images

Martin Urschler¹, Christopher Zach², Hendrik Ditt³, and Horst Bischof¹

¹ Institute for Computer Graphics & Vision, Graz University of Technology, Austria
{urschler, bischof}@icg.tu-graz.ac.at
² VRVis Research Centre, Graz, Austria
[email protected]
³ Siemens Medical Solutions, CTE PA, Forchheim, Germany
[email protected]

This work was funded by Siemens MED CT, Forchheim and Siemens PSE AS, Graz.
Abstract. Nonlinear image registration is a prerequisite for a variety of medical image analysis tasks. A frequently used registration method is based on manually or automatically derived point landmarks, leading to a sparse displacement field which is densified in a thin-plate spline (TPS) framework. A large problem of TPS interpolation/approximation is the requirement for evenly distributed landmark correspondences over the data set, which can rarely be guaranteed by landmark matching algorithms. We propose to overcome this problem by combining the sparse correspondences with intensity-based registration in a generic nonlinear registration scheme based on the calculus of variations. Missing landmark information is compensated by a stronger intensity term, thus combining the strengths of both approaches. An explicit formulation of the generic framework is derived that constrains an intra-modality intensity data term with a regularization term from the corresponding landmarks and an anisotropic image-driven displacement regularization term. An evaluation of this algorithm is performed comparing it to an intensity- and a landmark-based method. Results on four synthetically deformed and four clinical thorax CT data sets at different breathing states are shown.
1 Introduction
A large number of medical image analysis applications require nonlinear (deformable) registration of data sets acquired at different points in time. Especially when dealing with soft tissue organs, like the lung or the liver during breathing, motion differences have to be compensated for further analysis steps. Surveys on nonlinear registration methods in medical imaging can be found in Maintz and Viergever [1] or Crum et al. [2]. The literature distinguishes between intensityand feature-based nonlinear registration methods. Often feature-based methods are more accurate as long as the feature extraction or segmentation steps are reliable. Due to the reduction of the problem space, feature-based methods are
also significantly faster to compute. However, inaccuracies from the often difficult feature extraction step have severe effects on the registration performance, making the intensity-based methods perform better in many practical applications. Feature-based registration often uses manually or automatically derived point landmarks [3]. After matching, the resulting sparse displacement field from the point correspondences is interpolated/approximated in the TPS framework [4,3], which crucially depends on an even distribution of point correspondences over the data set. However, this even distribution can rarely be guaranteed by automatic matching algorithms. Building upon our previous work on automatic point landmark extraction, matching and registration [5], the goal of this work is to replace the TPS displacement field estimation by intensity-based registration, thus establishing a hybrid nonlinear registration scheme. The assumption that the coupling of landmark- and intensity-based registration is more powerful than the individual approaches has already been investigated by other authors. Gee et al. [6] proposed a Bayesian framework to unify these approaches. Johnson and Christensen [7] combined landmark thin-plate spline registration with intensity registration using a viscous fluid regularization in a consistent manner. In Li et al. [8] consistent intensity registration was combined with airway tree branch point matching for deformable lung registration. Fan and Chen [9] initialized an optical flow based intensity registration method with a semi-automatic airway tree and lung segmentation. Hellier and Barillot [10] proposed a unified brain registration framework combining a global intensity approach with sparse landmark constraints. Liu et al. [11] showed a combined volumetric and surface matching approach for inter-subject brain registration. Papademetris et al. [12] recently proposed a hybrid registration combining B-spline free-form deformation intensity registration with the Robust Point Matching algorithm. Many of these hybrid schemes need a large computational effort during registration or during automated segmentation. Others require manual landmark selection or tedious semi-automatic segmentation methods for pre-processing. In this work we present a novel fully-automatic hybrid nonlinear registration framework that is both efficient and does not require pre-processing. This framework uses a flexible variational formulation [13] where different data and regularization terms can be used, and the corresponding landmarks from the automatic landmark matching [5] act as an additional regularization constraint.
2 Landmark-Based Registration
In [5] an automatic landmark extraction, matching and registration approach consisting of a four-stage pipeline was constructed, inspired by state-of-the-art computer vision matching techniques [14]. First, a robust and reproducible point landmark extraction step based on 3D Förstner corner detection [3] is used, which produces a large number of landmark candidates. Second, for each landmark candidate a local and a global descriptor are calculated. The local descriptor is a 3D extension of the widely-used SIFT descriptor representation [15], which can be interpreted as a local gradient location and orientation histogram. To capture
global correspondence information, an approximated 3D shape context descriptor [16] is computed. For each point landmark, shape context can be interpreted as a quantized distribution of relative distances to all other points in the set of landmark candidates. Given these two descriptors, a cost function is formed in the third stage and a robust, conservative forward-backward matching approach is applied to find a set of sparse correspondences. Each correspondence is assigned a matching uncertainty value derived from the matching cost function. The final stage of the registration pipeline is a dense displacement field estimation in a TPS framework [4], where ideas from Rohr [3] are used to approximate the interpolation condition and to incorporate the matching cost uncertainties. A k-d tree based approximation of the TPS model is proposed to save computation time. However, the algorithm, while being very efficient in the first three stages, still needs large computational effort for the final dense displacement field estimation and shows a crucial dependence on the even distribution of matched landmarks over the input volume.
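The conservative forward-backward matching of the third stage can be summarized in a few lines. The Python sketch below assumes a precomputed cost matrix combining the two descriptor distances; the exact cost combination follows [5] and is not reproduced here.

    import numpy as np

    def forward_backward_matches(cost):
        # cost[i, j]: matching cost between landmark i in the reference and
        # candidate j in the floating image (lower is better).
        fwd = np.argmin(cost, axis=1)      # best candidate for each landmark
        bwd = np.argmin(cost, axis=0)      # best landmark for each candidate
        # conservative rule: keep a pair only if the choice is mutual
        pairs = [(i, j) for i, j in enumerate(fwd) if bwd[j] == i]
        uncertainties = [cost[i, j] for i, j in pairs]
        return pairs, uncertainties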
3 Intensity-Based Registration
The general nonlinear registration problem can be formulated as a minimization process of a cost functional that depends on a similarity function S and a regularization constraint P [13]. Reference image R and floating image F are considered as intensity functions R : Ω_R → R and F : Ω_F → R over the domains Ω_R, Ω_F ⊂ R³. The formalized registration problem is now stated as the problem of finding a displacement field (u, v, w)ᵀ : Ω_R → Ω_F that minimizes (for a weight λ > 0) the continuous cost functional

    J[u, v, w] = ∫ (S[u, v, w] + λ P[u, v, w]) dx dy dz.

A widely used choice for the similarity measure S in intra-modality applications is the L2-norm of the intensity differences. With a regularization term P that minimizes the norm of the displacement field gradients in an image-driven anisotropic fashion [17], an optical flow registration scheme can be derived [9,10]. Minimization can be achieved by solving the Euler-Lagrange equations, e.g. using a Gauss-Seidel fixed-point iteration scheme.
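The text does not spell out the diffusion tensor D(∇F) used for the image-driven regularization; one common choice in the spirit of [17] is a Nagel-Enkelmann-type tensor, sketched here purely as an illustration of the idea:

    import numpy as np

    def image_driven_tensor(gradF, lam):
        # Nagel-Enkelmann-type tensor: smoothing along image edges is kept,
        # smoothing across strong gradients (object boundaries) is damped.
        g = np.asarray(gradF, dtype=float)
        n2 = g @ g
        return (np.eye(3) * (n2 + lam ** 2) - np.outer(g, g)) / (n2 + 2 * lam ** 2)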
4 Hybrid Landmark and Intensity Registration
The landmark-based registration approach [5] works well as long as an evenly distributed set of correspondences can be found. To avoid this shortcoming and to properly combine the advantages of landmark- and intensity-based registration, such that both approaches concurrently contribute to the optimal solution, we propose to use the sparse set of matched landmarks as an additional regularization constraint Q in the general nonlinear registration functional. This leads to the continuous cost functional (for weights λ > 0, μ > 0)

    J[u, v, w] = ∫ (S[u, v, w] + λ P[u, v, w] + μ Q[u, v, w]) dx dy dz.
The L2-norm data term is defined as

    S[u, v, w] = (1 − W(x, y, z)) (R(x, y, z) − F(x + u, y + v, z + w))²,

where the factor (1 − W(x, y, z)) penalises the data term near landmark correspondences. Anisotropic image-driven regularization is given as

    P[u, v, w] = ∇uᵀ D(∇F) ∇u + ∇vᵀ D(∇F) ∇v + ∇wᵀ D(∇F) ∇w,

where D(∇F) is the diffusion tensor that creates a dependency between the estimated displacement field and the image boundaries of the floating image, such that regularization across image boundaries is penalised. Finally, the landmark matching constraint is

    Q[u, v, w] = W(x, y, z) ‖(u, v, w)ᵀ − (u_f, v_f, w_f)ᵀ‖².

W(x, y, z) is a weighting function incorporating landmark correspondences and their matching uncertainties, and (u_f(x, y, z), v_f(x, y, z), w_f(x, y, z))ᵀ is the sparse displacement field from the automatic landmark matching step. Note that the additional landmark matching regularization term is independent of the explicit choice of data or regularization terms. Minimization of the cost functional J is achieved by setting the functional derivatives of J equal to 0, resulting in a coupled system of Euler-Lagrange partial differential equations. The first equation reads
∂F + λdiv(D(∇F )u) − μW (x, y, z)(u − uf ) = 0 ∂x
with the other two showing the same basic structure. A semi-implicit scheme for solving the Euler-Lagrange equation makes use of a first-order Taylor approximation of F (x + u, y + v, z + w). After discretization using standard finite difference stencils a huge but sparse system of linear equations has to be solved. For this purpose the iterative Gauss-Seidel algorithm is utilized. The whole registration algorithm is performed in a multi-resolution manner to speed-up computation and to avoid local minima.
5 Experiments and Results
To assess the validity of the hybrid registration approach, qualitative and quantitative evaluations were performed on synthetically transformed and clinical thorax CT data sets showing breathing motion. All experiments were done on a standard Windows notebook computer with a 2 GHz processor and 2 GB RAM.

5.1 Synthetic Deformations
Synthetic experiments were performed on four test data sets (A,B,C,D) taken at inspiration, each of them having a volume size of 256x256x256 voxels. The
applied transformation intends to model a simple displacement field similar to exhalation. A synthetic nonlinear transformation d(x, y, z) : R³ → R³ that simulates diaphragm and rib cage movement has been presented by the authors in [5]; due to space limitations the detailed derivation is omitted here. d is defined using two parameters t_vertical and t_inplane. The four test data sets were synthetically transformed with a small (t_vertical = 25mm, t_inplane = 10mm) and a large (t_vertical = 55mm, t_inplane = 25mm) deformation. Fig. 1 a)-c) shows the effects of these deformations on data set A. Table 1 gives the results of these experiments by comparing the hybrid approach ("hyb") with the landmark-based approach ("lm") from Section 2 and the optical flow intensity registration ("of") from Section 3. All comparisons are always performed only on those regions which are present in both registered data sets. The RMS of the intensity differences INT_rms before ("init") and after registration is calculated, as well as the difference between the resulting and the synthetic displacement fields in terms of the RMS of the displacement difference field DF_rms and the maximum of the displacement difference vector components DF_max.

Fig. 1. Simulated breathing deformation. a) original data set A, b) the small and c) the large deformation.

Table 1. Simulated breathing transformation. Registration results in terms of the RMS of the intensity differences INT_rms, the RMS of the displacement field DF_rms and the maximum of the displacement difference vector components DF_max.
                          Simulated Breathing 25-10              Simulated Breathing 55-25
    Measure                 A        B        C        D            A        B        C        D
    INT_rms init [HU]    373.85   313.70   291.08   348.11       539.98   468.57   438.07   513.42
    INT_rms of   [HU]    77.045   75.084   64.972   75.529       123.43   106.01   102.27   116.022
    INT_rms lm   [HU]    61.888   61.891   55.351   69.595       149.69   137.94   105.22   178.82
    INT_rms hyb  [HU]    57.719   45.187   48.769   44.441       111.47   103.32   101.64   108.74
    DF_rms  of   [mm]    4.4201   6.0393   4.1328   4.6852       9.4332   12.428   8.3835   10.039
    DF_rms  lm   [mm]    1.2700   1.6068   1.2759   1.3829       4.3041   4.7178   3.9078   4.6794
    DF_rms  hyb  [mm]    1.6826   2.4241   1.7129   2.1525       4.7948   5.7478   5.0956   5.7385
    DF_max  of   [mm]    19.926   22.976   16.925   18.366       40.826   48.667   40.868   43.429
    DF_max  lm   [mm]    22.712   21.981   21.586   22.219       49.008   50.426   47.969   49.375
    DF_max  hyb  [mm]    17.488   19.770   16.489   17.909       35.090   36.737   32.985   41.322
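The three measures of Table 1 can be computed as follows; we read DF_rms as the RMS of the displacement difference vector norms and DF_max as the largest absolute difference component, conventions that match the reported magnitudes but are our reading of the text.

    import numpy as np

    def rms(x):
        return float(np.sqrt(np.mean(np.square(x))))

    def table1_measures(R, F_warped, disp, disp_gt, overlap):
        # overlap: boolean mask of voxels present in both registered data sets
        int_rms = rms((R - F_warped)[overlap])               # [HU]
        diff = (disp - disp_gt)[overlap]                     # (n, 3) vectors
        df_rms = rms(np.linalg.norm(diff, axis=-1))          # [mm]
        df_max = float(np.max(np.abs(diff)))                 # [mm]
        return int_rms, df_rms, df_max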
Qualitative results are shown in terms of difference images of data set D in Fig. 3 a)-c).

5.2 Clinical Data
The algorithm was also evaluated on four clinical thoracic data sets (A,B,C,D), each of them consisting of two scans at different breathing states. The data sets show different problem characteristics. Data sets A and D differ by small breathing deformations. Data sets B and C differ by large breathing deformations and also show intensity variations due to disease, making them very hard to register. For the clinical data no gold standard displacement was available for comparison; therefore, solely the decrease in the RMS of the intensity differences before and after registration is calculated and presented in Fig. 2. For qualitative results, difference images are shown in Fig. 3 d)-e) for data sets C and D.
Fig. 2. Result chart showing the RMS of the intensity differences on the clinical data
6 Discussion and Outlook
The quantitative evaluation on the simulated data shows that the presented hybrid approach is an effective way to combine the advantages of landmark- and intensity-based registration. The RMS of the intensity differences on the small deformation data is slightly better than for the individual methods; however, all three methods are able to register these deformations very well. The displacement field outliers (DF_max) are also improved, while the RMS of the displacement field (DF_rms) is slightly worse due to the influence of the intensity registration. However, with an absolute value of DF_rms lying around 2mm this is already a very accurate result. The large deformation data is more demanding to register. Here the landmark-based approach has problems generating a dense set of correspondences. On the other hand, due to the large imposed deformation, the intensity-based registration has problems aligning small vascular structures. With the hybrid approach the best of both approaches is achieved, which can be seen in the decrease of INT_rms and in the qualitative results in Fig. 3 a)-c). The RMS of the displacement fields again decreases to a similar level as in the landmark registration, and the displacement field outliers are reduced. The large absolute
values of the displacement field outliers are due to problems at the border of the data sets. The clinical data also shows good correspondence after registration. However, registration of the vascular structures can still be improved. An important thing to consider is the proper choice of the algorithm parameters in the hybrid approach. Careful choice of the parameters λ and μ is crucial. In our experiments, after appropriate pre-normalization of the data and regularization terms, suitable values were determined empirically in a set of initial experiments and remained fixed during the evaluations. In our case we chose λ = 0.05 and μ = 2. For solving the Euler-Lagrange equations, the linearizing Taylor approximation leads to the choice of a number of outer iterations. A trade-off between computation time and accuracy leads to more iterations on coarser resolutions compared to the finer ones. For the Gauss-Seidel stage three internal iterations were chosen.

Fig. 3. Selected Results. a)-c) Simulated Breathing results for the large deformation on data set D. a) shows difference images after optical flow, b) after landmark-based and c) after hybrid registration. d) depicts difference images of clinical data set C before and after hybrid registration, e) shows clinical data set D before and after hybrid registration. (Image contrast was enhanced to improve visibility.)
Concerning algorithm runtime, as expected, the landmark-based approach performed fastest (around 1000 s), while the intensity-based and the hybrid approaches take around 1900 s. The overhead of incorporating the landmark correspondences only slightly increased the runtime. Future work will investigate methods to speed up the time-consuming stages, e.g. by a multi-grid approach. Further, different data and displacement regularization terms will be investigated in combination with the landmark-matching constraint.
References

1. Maintz, J., Viergever, M.: A Survey of Medical Image Registration. Medical Image Analysis 2(1) (1998) 1–36
2. Crum, W.R., Hartkens, T., Hill, D.L.G.: Non-rigid image registration: theory and practice. The British Journal of Radiology 77 (2004) 140–153
3. Rohr, K.: Landmark-Based Image Analysis Using Geometric and Intensity Models. Computational Imaging and Vision. Kluwer Academic Publishers (2001)
4. Bookstein, F.: Principal Warps: Thin-Plate Splines and the Decomposition of Deformations. IEEE PAMI 11(6) (1989) 567–585
5. Urschler, M., Bauer, J., Ditt, H., Bischof, H.: SIFT and Shape Context for Feature-Based Nonlinear Registration of Thoracic CT Images. In: Proc. Computer Vision Applications to Medical Image Analysis. (2006), to appear.
6. Gee, J.C.: Probabilistic Matching of Deformed Images. PhD thesis, Univ. Pennsylvania, Philadelphia (1996)
7. Johnson, H.J., Christensen, G.E.: Consistent Landmark and Intensity-Based Image Registration. IEEE TMI 21(5) (2002) 450–461
8. Li, B., Christensen, G.E., Fill, J., Hoffman, E.A., Reinhardt, J.M.: 3D Inter-Subject Warping and Registration of Pulmonary CT Images for a Human Lung Model. In: Proc SPIE Conf on Medical Imaging. Volume 4683. (2002) 324–335
9. Fan, L., Chen, C.W.: 3D Warping and Registration from Lung Images. In: Proc SPIE Conf on Medical Imaging. Volume 3660. (1999) 459–470
10. Hellier, P., Barillot, C.: Coupling Dense and Landmark Based Approaches for Nonrigid Registration. IEEE TMI 22(2) (2003) 217–227
11. Liu, T., Shen, D., Davatzikos, C.: Deformable registration of cortical structures via hybrid volumetric and surface warping. NeuroImage 22 (2004) 1790–1801
12. Papademetris, X., Jackowski, A.P., Schultz, R.T., Staib, L.H., Duncan, J.S.: Integrated Intensity and Point-Feature Nonrigid Registration. In: Proc MICCAI. Volume LNCS 3216, Springer (2004) 763–770
13. Modersitzki, J.: Numerical Methods for Image Registration. Oxford University Press (2004)
14. Mikolajczyk, K., Schmid, C.: Performance Evaluation of Local Descriptors. IEEE PAMI 27(10) (2005) 1615–1630
15. Lowe, D.G.: Distinctive Image Features from Scale-Invariant Keypoints. Int J Computer Vision 60(2) (2004) 91–110
16. Belongie, S., Malik, J., Puzicha, J.: Shape matching and object recognition using shape contexts. IEEE PAMI 24(4) (2002) 509–522
17. Weickert, J., Schnörr, C.: A Theoretical Framework for Convex Regularizers in PDE-Based Computation of Image Motion. Int J Computer Vision 45(3) (2001) 245–264
Biomechanically Based Elastic Breast Registration Using Mass Tensor Simulation Liesbet Roose, Wouter Mollemans, Dirk Loeckx, Frederik Maes, and Paul Suetens Medical Image Computing (Radiology - ESAT/PSI), Faculties of Medicine and Engineering, University Hospital, Gasthuisberg, Herestraat 49, B-3000 Leuven, Belgium
[email protected]
Abstract. We present a new approach for the registration of breast MR images acquired at different time points for the observation of lesion evolution. In this registration problem, it is of utmost importance to correct only for differences in patient positioning and to preserve other diagnostically important differences between both images, resulting from anatomical and pathological changes between both acquisitions. Classical free form deformation algorithms are therefore less suited, since they allow overly large local volume changes and their deformation is not biomechanically based. Instead of adding constraints or penalties to these methods in order to restrict unwanted deformations, we developed a truly biomechanically based registration method in which the positions of the skin and muscle surfaces are used as the only boundary conditions. Results of our registration method show an important improvement in correspondence between the reference and the deformed floating image, without introducing physically implausible deformations and within a short computational time.
1 Introduction
Breast cancer is the most common cancer in women worldwide, accounting for 1 million new cases annually [1]. Currently, the detection and diagnosis of breast cancer primarily relies on projective X-ray mammography. In addition, contrast-enhanced MRI images are acquired for women with dense breast tissue, for breasts with implants or scar tissue from prior surgery, or in cases of lobular cancer. Some patients undergo successive MRI scans a few months apart for follow-up of suspicious lesions. Analysis and comparison of such follow-up images by the radiologist during diagnosis is greatly facilitated by proper geometric registration of the images. Image registration for breast imaging is a field of particular clinical interest, and in the last decade many valuable approaches have been published. Zuo et al. [2] were the first to present a registration algorithm for the alignment of a pair of images acquired before and after contrast administration, but their approach is restricted to rigid transformations, whereas the soft tissue structure of the breast makes a non-rigid deformation model more appropriate.
Several non-rigid registration techniques have been proposed and implemented for the registration of pre- and post-contrast breast MRI images, including algorithms based on optical flow [3], on elastic alignment of detected edges in the image [4] and in particular on a B-spline deformation as introduced by Rueckert et al. [5], using mutual information [6] as a voxel-based similarity measure that is robust against local contrast changes in the pre- and post-contrast images. However, when assessing lesion evolution over time from a series of co-registered follow-up images, the degrees of freedom of the non-rigid registration method have to be constrained such that the method is able to compensate for global differences in patient positioning, but at the same time preserves local image changes induced by the pathology itself (e.g. tumor growth). While some authors imposed ad-hoc regularization penalties on the deformation field, for instance aimed at restricting local volume changes [7,8], we instead propose a truly biomechanical model where (near) volume preservation and elastic behavior are automatically fulfilled. Biomechanical breast models, mostly based on Finite Element Models (FEM), have been developed and used in different applications, for instance to predict the deformation of the breast during biopsy [9], for the registration of projective X-ray mammography and volumetric MRI images [10], or for the prediction of the effect of breast augmentation [11]. In this paper, we present a method for elastic intra-subject MR breast image registration based on a biomechanical deformation model implemented as a Mass Tensor Model. We illustrate our approach with results from 3 sets of patient data, showing an important improvement in correspondence between the reference and the deformed floating image, without introducing physically implausible deformations and in a short computational time. We use an implementation of the FEM-based Mass Tensor Model introduced by Cotin et al. [12], which results in fast solution computations. Since we only want to recover the tissue changes due to different positioning of the breast in the imaging coil and different tension in the pectoralis major muscle, we choose the positions of the skin and the muscle surface in the reference image as boundary constraints for the floating image. The deformation is thus only driven by the position of the segmented skin and muscle surface and not by the grey values in the rest of the image. A new equilibrium is then computed for the breast model subject to these boundary constraints, and the floating image is deformed according to the deformation of the breast model.
2 Method

2.1 Image Preprocessing
To initialize the position of the floating image F relative to the reference image R, a rigid registration algorithm is applied first to the two images [6]. Based on the transformed floating image, a volumetric tetrahedral mesh representation of the breast tissue is constructed, which is needed to perform the FEM computations. To this end, the breast tissue (skin, fat and fibroglandular tissue) is first segmented out of the MR image by thresholding, followed by a smoothing of
Fig. 1. Left: Slice of an original MRI image, with breast tissue segmented. Right: Three dimensional surface rendering of the resulting mesh of the breast tissue.
the segmented volume and removal of islands classified as background inside the breast region (Fig. 1). However, in some slices this simple segmentation scheme still needs manual correction afterwards. The segmented volume is then converted to a tetrahedral volume mesh (Fig. 1). To assign boundary conditions to the floating image in the registration phase, the skin and muscle surfaces need to be defined in the reference images. Therefore, after segmentation of the reference image, the skin and muscle surfaces in this image are indicated and stored as triangular surface meshes.

2.2 Biomechanical Transformation Model
Floating image F needs to be registered to reference image R by applying a transformation φ : ℝ³ → ℝ³, such that F′(x) = F(φ(x)) corresponds to image R. For the transformation model φ, we use a biomechanically based finite element model. We follow the general approach proposed by Cotin et al. [12] to define internal forces at each vertex of the mesh, based on linear elasticity theory. This force F_i for vertex i at position P_i (having as original position P_i^0), with the set of neighbouring points N(i), can be computed as

F_i = [K_ii](P_i^0 − P_i) + Σ_{j ∈ N(i)} [K_ij](P_j^0 − P_j)    (1)
where the [K_ij] are tensors which are completely determined by the rest position and the material characteristics of the tetrahedra adjacent to vertex i. They can therefore be computed once and reused during the simulation. According to the conclusions of Tanner [13], we use a homogeneous linear elastic material model for the breast with a Poisson coefficient of 0.45 (corresponding to near incompressibility). The general Newtonian equation of movement (with m the mass)

m d²P/dt² = γ dP/dt + F    (2)
is solved in an iterative quasi-static way: the position of a point at time t+1 is the solution of the static problem with boundary conditions given at time t. This
makes the estimation of the damping coefficient γ unnecessary. The iterations are stopped at the iteration t where the maximal displacement ||P_i^{t−1} − P_i^t|| over all points i drops below a threshold D_stop:

max_i ||P_i^{t−1} − P_i^t|| < D_stop    (3)
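For illustration, the quasi-static scheme can be sketched as follows. This is a minimal sketch, not the authors' implementation: the explicit damped update stands in for the static solve performed at each time step, and the stiffness tensors K, neighbor lists and fixed-vertex mask are hypothetical stand-ins for the mesh data structures.

import numpy as np

def relax_mesh(P, P0, K, neighbors, fixed, d_stop=0.005, step=0.05):
    # P, P0: (n, 3) current and rest vertex positions;
    # K: dict mapping (i, j) to a 3x3 stiffness tensor [Kij];
    # neighbors: list of neighbor index lists N(i);
    # fixed: boolean mask of vertices pinned by the boundary conditions.
    while True:
        F = np.zeros_like(P)
        for i, nbrs in enumerate(neighbors):
            F[i] = K[(i, i)] @ (P0[i] - P[i])            # Eq. (1), diagonal term
            for j in nbrs:
                F[i] += K[(i, j)] @ (P0[j] - P[j])       # Eq. (1), neighbor terms
        F[fixed] = 0.0                                   # displaced vertices stay put
        P_new = P + step * F                             # damped quasi-static update
        if np.max(np.linalg.norm(P_new - P, axis=1)) < d_stop:   # Eq. (3)
            return P_new
        P = P_new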
2.3 Boundary Conditions
The deformation of the finite element model is driven by boundary conditions applied to the vertices on the skin surface and on the muscle-fat interface, in order to align both surfaces in the floating and the reference images. These boundary conditions are defined before the first iteration. First, the skin surface is segmented in the reference image; then the skin surface in the floating image is registered rigidly with this surface using Iterative Closest Points [14]; finally, all vertices on the skin surface in the floating image are projected to the nearest point on the skin surface of the reference image. The same procedure is repeated for the muscle surface. These displaced vertices remain fixed during the iterations, while the other vertices in the mesh converge to their equilibrium position.
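The final projection step can be sketched as follows, assuming the reference surface is available as a dense point sampling so that the nearest sample approximates the nearest surface point; the function and argument names are illustrative.

import numpy as np
from scipy.spatial import cKDTree

def project_to_surface(float_pts, ref_surface_pts):
    # Snap floating-surface vertices to the nearest reference-surface
    # sample; the returned positions are then held fixed during relaxation.
    tree = cKDTree(ref_surface_pts)
    _, idx = tree.query(float_pts)
    return ref_surface_pts[idx]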
2.4 Deformation of the Floating Image
The transformation function φ(x) can be written as x + u(x), with u(x) denoting the displacement vector, parameterized by the displacements P_i^0 − P_i at each node of the mesh, as computed with the Mass Tensor simulation. If we write T_i for the tetrahedron defined by the four vertices T_i(j), j = 0, ..., 3, then the vector displacement inside this tetrahedron T_i is defined as [12]:

u_{T_i}(x) = Σ_{j=0}^{3} b_j(x) (P^0_{T_i(j)} − P_{T_i(j)})    (4)

where b_j(x) are the barycentric coordinates of the point x inside the tetrahedron T_i. This transformation is applied to the floating image in order to obtain the final registered image.
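A minimal sketch of this interpolation for a single tetrahedron is given below; the barycentric coordinates are obtained by solving a small linear system, and the names are illustrative.

import numpy as np

def displace_point(x, verts, disp):
    # verts: (4, 3) vertex positions T_i(j); disp: (4, 3) nodal
    # displacements P0 - P at those vertices (Eq. (4)).
    A = np.vstack([verts.T, np.ones(4)])      # enforce sum of b_j(x) = 1
    b = np.linalg.solve(A, np.append(x, 1.0))
    return b @ disp                           # sum_j b_j(x) * (P0_j - P_j)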
3 Results

3.1 Image Acquisition
Our test dataset contains three pairs of MRI images, acquired with a 3-D FLASH sequence with TR = 9 ms, TE = 4.7 ms, flip angle = 25°, FOV = 384 mm and axial slice orientation. The images have dimensions 384 x 384 x 64 with a voxel size of 1.04 mm x 1.04 mm x 2 mm. Each pair of images contains two dynamic MR series of the same patient (1 pre-contrast and 7 post-contrast acquisitions), taken at different times (with intervals of 6, 5 and 17 months between both acquisitions for sets 1, 2 and 3, respectively). For each pre-contrast pair, we registered the original image with the follow-up image. The segmented breast tissue is meshed with an average edge length of approximately 10 mm (approximately 200 vertices and 7500 tetrahedra).
3.2 Convergence Behavior
To determine the convergence behavior of our iterative solution strategy, we repeatedly registered the images in set 1 for different values of D_stop and compared the results with a solution computed up to machine accuracy. The results are shown in Fig. 2. From the analysis of the mean distance between corresponding nodes in the meshes, it can be seen that the error is monotonically decreasing, approximately linearly with log(D_stop). The effect of this deviation of the node points on the transformed images is shown in the right graph of Fig. 2, showing the decrease in the total sum of squared differences (SSD) over the image and the number of voxels with a different grey value. All simulations have been carried out on a Pentium M, 1.7 GHz, 1 GB RAM; the computation time is shown in the left graph of Fig. 2, with computation times between 20 s for D_stop = 1 mm and 25 min for D_stop = 10^-4 mm. To reach a solution where at most 1% of the voxels differ from the final solution, D_stop is chosen as 0.005 mm, corresponding to a computation time of approximately 90 seconds.
Fig. 2. Left: Computation time and mean error between the solution computed up to machine accuracy and solutions with different values of D_stop. Right: Difference in terms of sum of squared differences (SSD) and number of voxels with different grey values between the solution computed up to machine accuracy and solutions with different values of D_stop.
3.3 Experimental Data
To indicate the feasibility and usefulness of our method, we applied it to the three sets of patient data and compared the result of our non-rigid registration with the non-registered image, with a purely rigidly registered image and with an intensity-based image registration using a spline-based deformation model [15]. Fig. 3 shows the registered images, the difference images between the (registered) floating and the reference image and the deformations for one case. It can be seen that our non-rigid registration gives a better alignment than the purely rigid registration alone, for example in locating the black cyst underneath the nipple in the breast on the left of the image. The deformed raster in the last column shows that the deformation is smooth and without important volumetric changes. The
Fig. 3. Comparison of different registration techniques on a pair of breast images of the same patient, acquired 6 months apart. Registration is done on the segmented images, where only the breast tissue is preserved. The first row shows the reference image (with the lesion of interest indicated by the arrow); rows 2, 3, 4 and 5 show, respectively, the floating image before registration, after rigid registration, after our non-rigid registration and after classical spline-based registration. The first column of rows 2-5 shows the deformed floating image, the second column shows the difference image between floating and reference image (limited to the breast region) and the third column shows a regular raster pattern after application of the same deformation as the floating image, to illustrate the extent of deformation. Local areas of volume compression and stretching can be seen in the deformation field of the spline-based registration, whereas the deformation of our non-rigid registration is almost volume preserving.
classical spline-based algorithm gives a better alignment than our approach, but from the image of the deformation field, it can be seen that the deformation is physically not plausible for the nearly incompressible breast tissue and that the image is therefore not useful for diagnosis. Fig. 4 shows the reduction in SSD after rigid and non-rigid registration.
Fig. 4. The SSD (relative to the SSD of the non-registered images) for the three patient cases, without registration, with rigid registration and with our non-rigid registration
4 Discussion and Conclusion
This paper has introduced a biomechanically based method for the registration of MRI images of the breast taken at different time points. Contrary to the registration of images taken in one image sequence, we expect the two images to be aligned to contain major anatomical differences. The use of classical grey-value based free form deformation algorithms is therefore not appropriate for this type of images, because of the risk of removing diagnostically important features from the image. Instead of adding extra constraints to free form methods, we implemented a FEM-based model in which the elastic properties and compressibility behavior of the breast tissue are automatically incorporated. Since we only want to correct for different patient positioning in the imaging coil and different tension of the pectoralis major muscle, we only use the skin and muscle surfaces as boundary constraints. The algorithm has been demonstrated on three sets of patient data, showing its ability to reduce the SSD by 70% on average, without introducing biomechanically unrealistic deformations. Future work on this algorithm includes the design of a more robust segmentation scheme and the coupling of segmentation, meshing and simulation to make our method fully automatic. Furthermore, we want to investigate the effect of changing the currently used boundary conditions (connecting points to the skin or muscle surface) to sliding boundary conditions (where the skin nodes of the floating image have a sliding contact with the skin surface of the reference image). Further experiments will be carried out to compare our algorithm with other non-rigid registration algorithms, with and without additional constraints.
References
1. Stewart, B., Kleihues, P.: World Cancer Report. Oxford University Press (2003)
2. Zuo, C.S., Jiang, A., Buff, B.L., Mahon, T.G., Wong, T.: Automatic motion correction for breast MR imaging. Radiology 198 (1996) 903–906
3. Kumar, R., Asmuth, J.C., Hanna, K., Bergen, J.: Application of 3D registration for detecting lesions in magnetic resonance scans. In: Medical Imaging: Image Processing, SPIE Press (1996)
4. Lucht, R., Knopp, M.V., Brix, G.: Elastic matching of dynamic MR mammographic images. Magnetic Resonance in Medicine 43 (2000) 9–16
5. Rueckert, D., Sonoda, L., Hayes, C., Hill, D., Leach, M., Hawkes, D.: Nonrigid registration using free-form deformations: application to breast MR images. IEEE Trans. Medical Imaging (1999)
6. Maes, F., Collignon, A., Vandermeulen, D., Marchal, G., Suetens, P.: Multimodality image registration by maximization of mutual information. IEEE Trans. Medical Imaging 16 (1997) 187–198
7. Rohlfing, T., Maurer, C.R., Jr., Bluemke, D., Jacobs, M.: Volume-preserving nonrigid registration of MR breast images using free-form deformation with an incompressibility constraint. IEEE Trans. Medical Imaging 22 (2003) 730–741
8. Loeckx, D., Maes, F., Vandermeulen, D., Suetens, P.: Nonrigid image registration using free-form deformations with a local rigidity constraint. In: Medical Image Computing and Computer-Assisted Intervention. (2004) 639–646
9. Azar, F.S., Metaxas, D.N., Schall, M.D.: Methods for modeling and predicting mechanical deformations of the breast under external perturbations. Med. Image Anal. 6 (2002) 127
10. Ruiter, N.: Registration of X-ray mammograms and MR-volumes of the female breast based on simulated mammographic deformation. PhD thesis, Universitaet Mannheim (2003)
11. Roose, L., De Maerteleire, W., Mollemans, W., Suetens, P.: Validation of different soft tissue simulation methods for breast augmentation. In Lemke, H.U., ed.: Computer Assisted Radiology and Surgery. Volume 1281 of ICS. (2005)
12. Cotin, S., Delingette, H., Ayache, N.: A hybrid elastic model allowing real-time cutting, deformations and force-feedback for surgery training and simulation. The Visual Computer 16 (2000) 437–452
13. Tanner, C., Degenhard, A., Schnabel, J., Smith, A., Hayes, C., Sonoda, L., Leach, M., Hose, D., Hill, D., Hawkes, D.: A method for the comparison of biomechanical breast models. Proceedings of the IEEE Workshop on Mathematical Methods in Biomedical Image Analysis (2001) 11–18
14. Besl, P.J., McKay, N.D.: A method for registration of 3-D shapes. IEEE Trans. Pat. Anal. and Mach. Intel. 14 (1992) 239–256
15. Loeckx, D.: Automated nonrigid intra-patient image registration using B-splines. PhD thesis, Katholieke Universiteit Leuven (2006)
Intensity Gradient Based Registration and Fusion of Multi-modal Images

Eldad Haber1 and Jan Modersitzki2

1 Mathematics and Computer Science, Emory University, Atlanta, GA, USA
[email protected]
2 Institute of Mathematics, Lübeck, Germany
[email protected]
Abstract. A particular problem in image registration arises for multi-modal images taken from different imaging devices and/or modalities. Starting in 1995, mutual information has been shown to be a very successful distance measure for multi-modal image registration. However, mutual information also has a number of well-known drawbacks. Its main disadvantage is that it is known to be highly non-convex and to typically have many local maxima. This observation motivates us to seek a different image similarity measure which is better suited for optimization but also capable of handling multi-modal images. In this work we investigate an alternative distance measure which is based on normalized gradients and compare its performance to mutual information. We call the new distance measure Normalized Gradient Fields (NGF).
1 Introduction
Image registration is one of today's challenging medical image processing problems. The objective is to find a geometrical transformation that aligns points in one view of an object with corresponding points in another view of the same object or a similar one. An important area is the need for combining information from multiple images acquired using different modalities, sometimes called fusion. Typical examples include the fusion of computer tomography (CT) and magnetic resonance (MRI) images or of CT and positron emission tomography (PET). In the past two decades computerized image registration has played an increasingly important role in medical imaging (see, e.g., [2,10] and references therein). One of the challenges in image registration arises for multi-modal images taken from different imaging devices and/or modalities. In many applications, the relation between the gray values of multi-modal images is complex and a functional dependency is generally missing. However, for the images under consideration, the gray value patterns are typically not completely arbitrary or random. This observation motivated the usage of mutual information (MI) as a distance measure between two images; cf. [4,18]. Starting in 1995, mutual information has been shown to be a successful distance measure for multi-modal image registration.
Therefore, it is considered to be the state-of-the-art approach to multi-modal image registration. However, mutual information has a number of well-known drawbacks; cf., e.g., [14,13,12,17,20]. Firstly, mutual information is known to be highly non-convex and typically has many local maxima; see for example the discussion in [10], [20], and Section 3. Therefore, the non-convexity and hence non-linearity of the registration problem is enhanced by the usage of mutual information. Secondly, mutual information is defined via the joint density of the gray value distribution, and therefore approximations of the density are required. These approximations are nontrivial to compute and typically involve some very sensitive smoothing parameters (e.g. a binning size or a Parzen window width, see [16]). Finally, mutual information decouples the gray value from the location information, which makes judging the output of the registration process difficult. These difficulties have stimulated a vast amount of research into mutual information registration, introducing many nuisance parameters to help bypass at least some of the difficulties; see, e.g., [14]. These observations motivate us to seek a different image similarity measure which is capable of handling multi-modal images but better suited for optimization and interpretation. In this paper we investigate an alternative distance measure which is based on normalized gradients. As we show, the alternative approach is deterministic, simpler, easier to interpret, straightforward to implement, faster to compute, and also much more amenable to optimization. The idea of using derivatives to characterize similarity between images is based on the observation that image structure can be defined by intensity changes. The idea is not new. In inverse problems arising in geophysics, previous work on joint inversion [9,21,7] discussed the use of gradients in order to solve/fuse inverse problems of different modalities. In image registration, a more general framework similar to the one previously suggested for joint inversion was given in [6] and [14]. Our approach is similar to [14]; however, there are some important differences. Unlike the work in [14] we use only gradient information and avoid using mutual information. Furthermore, our computation of the gradient is less sensitive and allows us to deal with noisy images. This allows us to use simple optimization routines (e.g. Gauss-Newton). The rest of the paper is organized as follows: In Section 2 we shortly lay the mathematical foundation of image registration. Section 3 presents an illustrative example showing some of the drawbacks of mutual information. In Section 4 we discuss the proposed alternative image distance measure. Finally, in Section 5 we demonstrate the effectiveness of our method.
2 The Mathematical Setting
Given a reference image R and a template image T, the goal of image registration is to find a "reasonable" transformation, ϕ, such that the "distance" between the reference image and a deformed template image is small. In this paper we use both affine linear and nonparametric registration. In the linear registration, we set the transformation ϕ to
ϕ(γ, x) = [γ1 γ2; γ3 γ4] [x1; x2] + [γ5; γ6].    (1)

Given a distance measure D, the registration problem is to find a minimizer γ of

f(γ) := D(R(·), T(ϕ(γ, ·))).    (2)

For nonparametric registration we set ϕ = x + u(x) and use regularization, minimizing

f(u) := D(R(x), T(x + u)) + αS(u),    (3)

where αS(u) is a regularization term such as the elastic one (see [10]).
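As a small illustration, the affine model (1) can be applied to a point as follows; the function name is illustrative.

import numpy as np

def phi(gamma, x):
    # Affine transformation of Eq. (1); gamma = (g1, ..., g6), x = (x1, x2).
    A = np.array([[gamma[0], gamma[1]],
                  [gamma[2], gamma[3]]])
    b = np.array([gamma[4], gamma[5]])
    return A @ x + b

# Identity plus a 2-pixel shift in the second coordinate (the one varied
# via gamma_6 in Section 3): phi([1, 0, 0, 1, 0, 2], np.array([10., 20.]))
# gives [10., 22.].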
3 An Illustrative Example
To emphasize the difficulty explained above, we present an illustrative example. Figure 1 shows a T1- and a T2-weighted magnetic resonance image (MRI) of a brain. Since the image modalities are different, a direct comparison of gray values is not advisable, and we hence study a mutual information based approach. Figure 1c) displays our approximation to the joint density, which is based on a kernel estimator, where the kernel is a compactly supported smooth function. Note that the joint density is completely unrelated to the spatial image
Fig. 1. Distance measures versus shifts; (a) original BrainWeb [3] T1, (b) original BrainWeb [3] T2, (c) the joint density approximation for the images, (d) the normalized gradient field d_d(T, R) for the images, (e) negative mutual information versus shift, (f) normalized gradient field versus shift
content (though there is some interpretation [11]). We now slide the template along the horizontal axis. In the language of equation (1), we fix γ1,...,5 and obtain the transformed image by changing γ6. Figure 1e) shows the negative mutual information versus the shift ranging from -2 to 2 pixels. This figure clearly demonstrates that mutual information is a highly non-convex function with respect to the shift parameter. We have experimented with different interpolation methods and all have resulted in similar behavior. In particular, the curve suggests that there are many pronounced local minima which are close in value to the global minimum. Our particular example is not by any means pathological. Similar, and even less convex, curves of mutual information also appear in [20], p. 293, where a different interpolation was used. Figure 1d) displays a typical visualization of our alternative NGF distance between R and T (discussed in the next section). Note that for the alternative distance measure, image differences are related to spatial positions. Figure 1f) shows the alternative distance measure versus the shift parameter. For this particular example, it is obvious that the alternative measure is capable of multi-modal registration and is much better suited for optimization.
4 A Simple and Robust Alternative
The alternative multi-modal distance measure is based on the following simple though general interpretation of similarity: two images are considered similar if intensity changes occur at the same locations. An image intensity change can be detected via the image gradient. Since the magnitude of the gradient depends on the modality of the image, it would be unwise to base an image similarity measure on gradient magnitude. We therefore consider the normalized gradient field, i.e., the local orientation, which is purely geometric information:

n(I, x) := ∇I(x)/||∇I(x)|| if ∇I(x) ≠ 0, and n(I, x) := 0 otherwise.    (4)

For two related points x in R and ϕ(x) in T or, equivalently, x in T ◦ ϕ, we look at the vectors n(R, x) and n(T ◦ ϕ, x). These two vectors form an angle θ(x). Since the gradient fields are normalized, the inner product (dot product) of the vectors is related to the cosine of this angle, while the norm of the outer product (cross product) is related to the sine. In order to align the two images, we can either minimize the square of the sine or, equivalently, maximize the square of the cosine. Thus we define the following distance measures:

d_c(T, R) = (1/2) ||n(R, x) × n(T, x)||²;  D_c(T, R) = ∫_Ω d_c(T, R) dx,    (5)

d_d(T, R) = (1/2) ⟨n(R, x), n(T, x)⟩²;  D_d(T, R) = −∫_Ω d_d(T, R) dx.    (6)

Note that, from an optimization point of view, the distances D_c and D_d are equivalent.
The definition of the normalized gradient field (4) is not differentiable in areas where the image is constant and is highly sensitive to small values of the gradient field. To avoid this problem we define the following regularized normalized gradient fields:

n_ε(I, x) := ∇I(x)/||∇I(x)||_ε,  ||∇I(x)||_ε := √(∇I(x)ᵀ∇I(x) + ε²).    (7)
In regions where ε is much larger than the gradients, the maps n_ε(I, x) are almost zero and therefore do not have a significant effect on the measures D_c or D_d, respectively. Indeed, the choice of the parameter ε in (7) answers the question, "what can be interpreted as a jump?". As suggested in [1], we propose the following automatic choice:

ε = (η/V) ∫_Ω |∇I(x)| dx,    (8)

where η is the estimated noise level in the image and V is the volume of the domain Ω. The measures (5) and (6) are based on local quantities and are easy to compute. Another advantage of these measures is that they are directly related to the resolution of the images. This property enables a straightforward multiresolution approach. In addition, we can also provide plots of the distance fields d_c and d_d, which enables a further analysis of image similarity; see, e.g., Figure 1d). Note that in particular d_c = 0 everywhere if the images match perfectly. Therefore, if in some areas the function d_c takes large values, we know that these areas did not register well.
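A compact sketch of the resulting distance for 2D images follows; the finite-difference gradient and the noise level eta are illustrative choices, not the authors' implementation.

import numpy as np

def ngf_distance(R, T, eta=0.01):
    # Normalized-gradient-field distance D_d of Eqs. (6)-(8);
    # eta is an assumed relative noise level.
    def grad(I):
        gy, gx = np.gradient(I.astype(float))
        return gx, gy

    def eps(I):
        gx, gy = grad(I)
        return eta * np.hypot(gx, gy).mean()          # Eq. (8), V = number of pixels

    def nfield(I):
        gx, gy = grad(I)
        norm_e = np.sqrt(gx**2 + gy**2 + eps(I)**2)   # Eq. (7)
        return gx / norm_e, gy / norm_e

    rx, ry = nfield(R)
    tx, ty = nfield(T)
    dd = 0.5 * (rx * tx + ry * ty) ** 2               # Eq. (6), pointwise d_d
    return -dd.sum()                                  # lower means better aligned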
5 Numerical Experiments
In our numerical experience, we have used various examples, with results along the same lines. Since it is impossible to show every result, we restrict ourselves to three illustrative and representative examples. In the first example we use the images in Figure 1. We take the T1 image and generate transformed versions of the image by using an affine linear transformation (1). We then use the transformed images to recover γ and the original image. The advantage of this synthetic experiment is that it is controlled, i.e., we know the exact answer to the registration problem and are therefore able to test our algorithm in a perfectly controlled environment. We randomly choose γ and then run our code. Repeating the experiment 100 times, we find that we are able to converge to the correct γ with an accuracy of less than 0.25 pixels. In the second example we use the images from Viola's Ph.D. thesis [18]. We compare the results of both the MI and our NGF registration. The difference between the MI registration and the NGF registration was less than 0.25 of a pixel; thus we conclude that the methods give virtually identical minima. However,
Fig. 2. Experiments with Viola's example; (a) reference R, (b) template T, (c) registered T, (d) overlay of T and R (20² pixels checkerboard presentation), (e) cross product n_T × n_R, (f) joint density at the minimum
to obtain the minimum using MI we needed to use a random search technique to probe the space, which is rather slow. The global MI minimum has a value of about -9.250 × 10^-2, while the guess γ = 0 has a value of about -9.115 × 10^-2. Thus the "landscape" of the MI function for this example is similar to the one plotted in Figure 1. In comparison, our NGF algorithm used only 5 iterations on the finest grid. The registration was achieved in a matter of seconds and no special space probing was needed to obtain the minimum. The value of the NGF function at γ = 0 was -4.63 × 10^1 while at the minimum its value was -2.16 × 10^2; thus our minimum is much deeper compared with the MI minimum. The results of our experiments are also presented in Figure 2. Another advantage of our method is the ability to quickly evaluate the registration result by looking at the absolute value of the cross product |n_T × n_R|. This product has the same dimension as the images and can therefore be viewed at the same scale. If the match is perfect then the cross product should vanish, so any deviation from zero implies an imperfect fit. Such deviations are expected to be present due to the two different imaging processes, the noise in the images and possible additional nonlinear features. Figure 2e) shows the cross product for the images matched above. It is evident that the matching is very good except at a small number of locations where we believe the image to be noisy. The previous two examples demonstrate the ability of our new algorithm to successfully perform parametric registration. Nevertheless, when the number of parameters is very large then stochastic optimization techniques stall and
Fig. 3. Experiments with PET/CT images: (a) reference R, (b) template T, (c) MI registered T and deformed grid, (d) NGF registered T and deformed grid
differentiable, smooth distance measure functions are much more important. We therefore test our algorithm on a PET/CT registration; see Figure 3. We perform an elastic registration of a CT and a PET (transmission) image of a human thorax. For this example, the differences between the MI and NGF approaches are very small; cf. Figure 3. For the NGF no special effort was needed in order to register the images. For MI we needed a good starting guess to converge to a local minimum.
References
1. U. Ascher, E. Haber and H. Huang, On effective methods for implicit piecewise smooth surface recovery, SISC 28 (2006), no. 1, 339–358.
2. Lisa Gottesfeld Brown, A survey of image registration techniques, ACM Computing Surveys 24 (1992), no. 4, 325–376.
3. C.A. Cocosco, V. Kollokian, R.K.-S. Kwan, and A.C. Evans, BrainWeb MR simulator, available at http://www.bic.mni.mcgill.ca/brainweb/.
4. A. Collignon, A. Vandermeulen, P. Suetens, and G. Marchal, Multi-modality medical image registration based on information theory, Kluwer Academic Publishers: Computational Imaging and Vision 3 (1995), 263–274.
5. J. E. Dennis and R. B. Schnabel, Numerical methods for unconstrained optimization and nonlinear equations, SIAM, Philadelphia, 1996.
6. M. Droske and M. Rumpf, A variational approach to non-rigid morphological registration, SIAM Appl. Math. 64 (2004), no. 2, 668–687.
7. L.A. Gallardo and M.A. Meju, Characterization of heterogeneous near-surface materials by joint 2d inversion of dc resistivity and seismic data, Geophys. Res. Lett. 30 (2003), no. 13, 1658–1664.
8. G. Golub, M. Heath, and G. Wahba, Generalized cross-validation as a method for choosing a good ridge parameter, Technometrics 21 (1979), 215–223.
9. E. Haber and D. Oldenburg, Joint inversion: a structural approach, Inverse Problems 13 (1997), 63–67.
10. J. Modersitzki, Numerical methods for image registration, Oxford, 2004.
11. H. Park, P. H. Bland, K. K. Brock, and C. R. Meyer, Adaptive registration using local information measures, Medical Image Analysis 8 (2004), 465–473.
12. Josien P.W. Pluim, J. B. Antoine Maintz, and Max A. Viergever, Image registration by maximization of combined mutual information and gradient information, IEEE TMI 19 (2000), no. 8, 809–814.
13. J.P.W. Pluim, J.B.A. Maintz, and M.A. Viergever, Interpolation artefacts in mutual information based image registration, Proceedings of SPIE Medical Imaging 1999 (K.M. Hanson, ed.), vol. 3661, SPIE, 1999, pp. 56–65.
14. J.P.W. Pluim, J.B.A. Maintz, and M.A. Viergever, Mutual-information-based registration of medical images: a survey, IEEE Transactions on Medical Imaging 22 (2003), 986–1004.
15. L. Rudin, S. Osher and E. Fatemi, Nonlinear total variation based noise removal algorithms, Proceedings of the eleventh annual international conference of the Center for Nonlinear Studies on Experimental mathematics: computational issues in nonlinear science, 1992, 259–268.
16. R. Silverman, Density estimation for statistics and data analysis, Chapman & Hall, 1992.
17. M. Unser and P. Thévenaz, Stochastic sampling for computing the mutual information of two images, Proceedings of the Fifth International Workshop on Sampling Theory and Applications (SampTA'03) (Strobl, Austria), May 26-30, 2003, pp. 102–109.
18. Paul A. Viola, Alignment by maximization of mutual information, Ph.D. thesis, Massachusetts Institute of Technology, 1995.
19. G. Wahba, Spline models for observational data, SIAM, Philadelphia, 1990.
20. Terry S. Yoo, Insight into images: Principles and practice for segmentation, registration, and image analysis, AK Peters Ltd, 2004.
21. J. Zhang and F.D. Morgan, Joint seismic and electrical tomography, Proceedings of the EEGS Symposium on Applications of Geophysics to Engineering and Environmental Problems, 1996, pp. 391–396.
A Novel Approach for Image Alignment Using a Markov–Gibbs Appearance Model

Ayman El-Baz1, Asem Ali1, Aly A. Farag1, and Georgy Gimel'farb2

1 Computer Vision and Image Processing Laboratory, University of Louisville, Louisville, KY 40292
{elbaz, asem, farag}@cvip.Louisville.edu, http://www.cvip.louisville.edu
2 Department of Computer Science, Tamaki Campus, University of Auckland, Auckland, New Zealand
[email protected]
Abstract. A new approach to align an image of a medical object with a given prototype is proposed. The visual appearance of the images, after equalizing their signals, is modelled with a new Markov–Gibbs random field model with pairwise interactions. Similarity to the prototype is measured by a Gibbs energy of signal co-occurrences in a characteristic subset of pixel pairs derived automatically from the prototype. An object is aligned by an affine transformation maximizing the similarity, using an automatic initialization followed by a gradient search. Experiments confirm that our approach aligns complex objects better than popular conventional algorithms.
1 Introduction
Image registration aligns two or more images of similar objects taken at different times, from different viewpoints, and/or by different sensors. The images are geometrically transformed to ensure their close similarity. Registration is a crucial step in many applied image analysis tasks, e.g. to fuse various data sources (such as computer tomography (CT) and MRI data in medical imaging) for image fusion, change detection, or multichannel image restoration; to form and classify multi-band images in remote sensing; to update maps in cartography; to perform automatic quality control in industrial vision; and so forth. Co-registered medical images provide more complete information about the patient, help to monitor tumor growth and verify treatment, and allow for comparing the patient's data to anatomical atlases (for more details about medical image registration see [1]). Most of the known registration methods fall into two main categories: feature-based and area-based techniques [2]. Feature-based techniques use sparse geometric features such as points, curves, and/or surface patches, and their correspondences to compute an optimal transformation. Area-based methods such as the classical least square correlation match image signals directly to avoid feature extraction [3].
More powerful mutual information (MI) based image registration [4] exploits a probabilistic similarity measure that allows for more general types of signal deviations than correlation. The statistical dependency between two data sets is measured by comparing a joint empirical distribution of the corresponding signals in the two images to the joint distribution of the independent signals (e.g., see [5] for more details about MI based image registration). We consider a more general case of registering a medical object to a prototype with similar but not necessarily identical visual appearance under their relative 2D affine transformations and monotone variations of signal correspondences. The variations are suppressed by equalizing signals in the images. The co-registered equalized images are described with a characteristic subset of signal co-occurrence statistics. The description implicitly “homogenizes” the images, i.e. considers them as spatially homogeneous patterns with the same statistics. In contrast to the feature-based registration, the statistics characterize the whole object. In contrast to the conventional area-based techniques, similarities between the statistics rather than pixel-to-pixel correspondences are measured.
2 MGRF Based Image Registration

2.1 Basic Notation
Let Q = {0, ..., Q−1} be a finite set of scalar image signals (e.g. gray levels), R = [(x, y) : x = 0, ..., X−1; y = 0, ..., Y−1] a rectangular arithmetic lattice supporting digital images g : R → Q, and R_p ⊂ R its arbitrary-shaped part occupied by the prototype. A finite set N = {(ξ1, η1), ..., (ξn, ηn)} of (x, y)-coordinate offsets defines neighbors {((x + ξ, y + η), (x − ξ, y − η)) : (ξ, η) ∈ N} ∧ R_p interacting with each pixel (x, y) ∈ R_p. The set N yields a neighborhood graph on R_p specifying translation-invariant pairwise interactions with n families C_{ξ,η} of cliques c_{ξ,η}(x, y) = ((x, y), (x + ξ, y + η)). Interaction strengths are given by a vector V^T = [V^T_{ξ,η} : (ξ, η) ∈ N] of potentials V_{ξ,η} = [V_{ξ,η}(q, q′) : (q, q′) ∈ Q²] depending on signal co-occurrences; here T indicates transposition.
2.2 Image Normalization

To account for monotone (order-preserving) changes of signals (e.g. due to different illumination or sensor characteristics), the prototype and object images are equalized using the cumulative empirical probability distributions of their signals on R_p.
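As an illustration, such an order-preserving equalization can be sketched as follows for an integer-valued image; this is a generic histogram-based mapping, not the authors' code.

import numpy as np

def equalize(g, Q=256):
    # Map each signal through the cumulative empirical distribution of
    # the image signals, preserving their order.
    hist = np.bincount(g.ravel(), minlength=Q).astype(float)
    cdf = np.cumsum(hist) / hist.sum()
    return np.floor((Q - 1) * cdf[g]).astype(g.dtype)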
2.3 MGRF Based Appearance Model

In line with a generic MGRF with multiple pairwise interactions [6], the Gibbs probability P(g) ∝ exp(E(g)) of an object g aligned with the prototype g° on R_p is specified with the Gibbs energy E(g) = |R_p| V^T F(g),
where F^T(g) is the vector of scaled empirical probability distributions of signal co-occurrences over each clique family: F^T(g) = [ρ_{ξ,η} F^T_{ξ,η}(g) : (ξ, η) ∈ N], where ρ_{ξ,η} = |C_{ξ,η}|/|R_p| is the relative size of the family and F_{ξ,η}(g) = [f_{ξ,η}(q, q′|g) : (q, q′) ∈ Q²]^T; here, f_{ξ,η}(q, q′|g) = |C_{ξ,η;q,q′}(g)|/|C_{ξ,η}| are the empirical probabilities of signal co-occurrences, and C_{ξ,η;q,q′}(g) ⊆ C_{ξ,η} is the subfamily of the cliques c_{ξ,η}(x, y) supporting the co-occurrence (g_{x,y} = q, g_{x+ξ,y+η} = q′) in g. The co-occurrence distributions and the Gibbs energy for the object are determined over R_p, i.e. within the prototype boundary after the object is affinely aligned with the prototype. To account for the affine transformation, the initial image is resampled to the back-projected R_p by interpolation. The appearance model consists of the neighborhood N and the potential V to be learned from the prototype.
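The empirical co-occurrence distributions can be sketched as follows for one clique family; for simplicity the sketch takes R_p to be the full rectangular lattice, whereas the paper restricts it to the prototype region.

import numpy as np

def cooccurrence(g, xi, eta, Q=256):
    # Empirical distribution F_{xi,eta}(g) of signal pairs over the
    # clique family with offset (xi, eta); g is indexed as g[y, x].
    H, W = g.shape
    a = g[max(0, -eta):H - max(0, eta), max(0, -xi):W - max(0, xi)]
    b = g[max(0, eta):H - max(0, -eta), max(0, xi):W - max(0, -xi)]
    f = np.zeros((Q, Q))
    np.add.at(f, (a.ravel(), b.ravel()), 1.0)
    return f / f.sum()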
2.4 Learning the Potentials
The MLE of V is proportional in the first approximation to the scaled and centered empirical co-occurrence distributions for the prototype [6]:

V_{ξ,η} = λ ρ_{ξ,η} (F_{ξ,η}(g°) − (1/Q²) U);  (ξ, η) ∈ N,

where U is the vector with unit components. The common scaling factor λ is also computed analytically; it is approximately equal to Q² if Q ≫ 1 and ρ_{ξ,η} ≈ 1 for all (ξ, η) ∈ N. In our case it can be set to λ = 1 because the registration uses only relative potential values and energies.
2.5 Learning the Characteristic Neighbors
To find the characteristic neighborhood set N, the relative energies E_{ξ,η}(g°) = ρ_{ξ,η} V^T_{ξ,η} F_{ξ,η}(g°) of the clique families, i.e. the scaled variances of the corresponding empirical co-occurrence distributions, are compared for a large number of possible candidates. Figure 1 shows a kidney prototype and its Gibbs energies E_{ξ,η}(g°) for 5000 clique families with the inter-pixel offsets |ξ| ≤ 50; 0 ≤ η ≤ 50.
Fig. 1. Kidney image (a) and relative interaction energies (b) for the clique families as a function of the offsets (η, ξ)
To automatically select the characteristic neighbors, let us consider an empirical probability distribution of the energies as a mixture of a large "non-characteristic" low-energy component and a considerably smaller characteristic high-energy component: P(E) = π P_lo(E) + (1 − π) P_hi(E). Because both components P_lo(E) and P_hi(E) can be of arbitrary shape, we closely approximate them with linear combinations of positive and negative Gaussians. For both the approximation and the estimation of π, we use the efficient EM-based algorithms introduced in [6].
Fig. 2. (a) The 76 most characteristic neighbors among the 5000 candidates (in white), (b) the pixel-wise Gibbs energies for the prototype under the estimated neighborhood, and (c) Gibbs energies for translations of the object with respect to the prototype
The intersection of the approximate mixture components gives an energy threshold θ for selecting the characteristic neighbors: N = {(ξ, η) : E_{ξ,η}(g°) ≥ θ}, where P_hi(θ) ≥ P_lo(θ) π/(1 − π). The above example results in the threshold θ = 28 producing the 76 characteristic neighbors shown in Fig. 2(a),(b), together with the corresponding relative pixel-wise energies e_{x,y}(g°) over the prototype:
e_{x,y}(g°) = Σ_{(ξ,η) ∈ N} V_{ξ,η}(g°_{x,y}, g°_{x+ξ,y+η})
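The pixel-wise energy can be sketched directly from this sum; V is assumed to hold one Q × Q potential table per characteristic offset, and the loop bounds simply skip cliques that leave the lattice.

import numpy as np

def pixel_energy(g, V, offsets):
    # e_{x,y}(g) summed over the characteristic clique families.
    H, W = g.shape
    e = np.zeros((H, W))
    for (xi, eta) in offsets:
        for y in range(max(0, -eta), H - max(0, eta)):
            for x in range(max(0, -xi), W - max(0, xi)):
                e[y, x] += V[(xi, eta)][g[y, x], g[y + eta, x + xi]]
    return e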
2.6 Appearance-Based Registration
The object g is affinely transformed to (locally) maximize its relative energy E(g_a) = V^T F(g_a) under the learned appearance model [N, V]. Here, g_a is the part of the object image reduced to R_p by the affine transformation
Fig. 3. Initialization (a), our (b), MI-based (c), and NMI-based (d) registration
a = [a11, ..., a23]: x′ = a11 x + a12 y + a13; y′ = a21 x + a22 y + a23. The initial transformation is a pure translation with a11 = a22 = 1; a12 = a21 = 0, ensuring the most "energetic" overlap between the object and the prototype. The energy for different translations (a13, a23) of the object relative to the prototype is shown in Fig. 2(c); the chosen initial position (a*13, a*23) in Fig. 3(a) maximizes this energy. Then the gradient search for the local energy maximum closest to the initialization selects the six parameters a; Fig. 3(b) shows the final transformation aligning the prototype contour to the object.
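A sketch of this two-stage search is given below; the callable energy(a) evaluating the Gibbs energy for parameters a is assumed, and the numerical-gradient BFGS step stands in for the authors' gradient search.

import numpy as np
from scipy.optimize import minimize

def register(energy, shape, stride=4):
    # Stage 1: exhaustive scan over pure translations (a13, a23).
    best = max((energy([1, 0, tx, 0, 1, ty]), tx, ty)
               for tx in range(-shape[1] // 4, shape[1] // 4, stride)
               for ty in range(-shape[0] // 4, shape[0] // 4, stride))
    _, tx, ty = best
    # Stage 2: local search over all six parameters from the best start.
    a0 = np.array([1, 0, tx, 0, 1, ty], float)
    res = minimize(lambda a: -energy(a), a0, method="BFGS")
    return res.x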
3 Experimental Results
Due to space limitations, we focus on dynamic contrast enhanced magnetic resonance imaging (DCE-MRI) of the human kidney and low dose computed tomography (LDCT) of the human lung, commonly perceived as difficult for both area- and feature-based registration. Similar results were obtained for several other types of complex objects (e.g., zebra photos, starfish photos, or brain images). We compare our approach to two popular conventional techniques, namely, area-based registration using MI [4] and normalized MI [7]. Note that for both MI and NMI we used the implementations in ITK (Ver. 2.0). Results are shown in Fig. 3(b)-(d). To clarify why the MI- or NMI-based alignment is less accurate, Fig. 4 compares the MI/NMI and Gibbs energy values for the affine parameters that appear at successive steps of the gradient search for the maximum energy. Both the MI and NMI have many local maxima that potentially hinder the search, whereas the energy is practically unimodal in these experiments.
Fig. 4. Gibbs energy, MI, and NMI values at the successive steps of the gradient search
In the above example the object aligned with the prototype differs mainly in orientation and scale. Figure 5 shows more diverse kidneys and their Markov–Gibbs appearance-based and MI-based alignment with the prototype in Fig. 1(a). Visually, the back-projection of the prototype contour onto the objects suggests the better performance of our approach. To quantitatively evaluate the
Fig. 5. Original kidneys (a) aligned with our (b) and the MI-based (c) approaches
Fig. 6. Overlap between the object masks aligned with our (a; 90.2%) and the MI-based approaches (b; 62.6%)
Fig. 7. Lung image (a) and relative interaction energies (b) for the clique families as a function of the offsets (η, ξ)
Fig. 8. (a) The 173 most characteristic neighbors among the 5000 candidates (in white) and (b) the pixel-wise Gibbs energies for the prototype under the estimated neighborhood
Fig. 9. Initialization (a), our (b), and MI-based (c) registration
Fig. 10. Original lungs (a) aligned with our (b) and the MI-based (c) approaches
Fig. 11. Overlap between the object masks aligned with our (a; 96.8%) and the MI-based approaches (b; 54.2%)
accuracy, masks of the co-aligned objects obtained by manual segmentation are averaged in Fig. 6. The common matching area is notably larger for our approach (90.2%) than for the MI-based registration (62.6%). Similar results obtained for the LDCT lung images are shown in Figs. 7–11: the common matching area is 96.8% for our approach vs. 54.2% for the MI-based one.
4 Conclusions
In this paper we introduced a new approach to align an image of a medical object with a given prototype whose appearance is modeled with a new Markov–Gibbs random field model with pairwise interactions. Experimental results confirm that image registration based on our Markov–Gibbs appearance model is more robust and accurate than popular conventional algorithms. Due to the reduced variations between the co-aligned objects, our approach results in more accurate average shape models that are useful, e.g., in image segmentation based on shape priors.
References
1. J. Maintz and M. Viergever, "A survey of medical image registration," Medical Image Analysis, vol. 2, issue 1, pp. 1–36, 1998.
2. B. Zitova and J. Flusser, "Image registration methods: a survey," Image and Vision Computing, vol. 21, pp. 977–1000, 2003.
3. Pope and J. Theiler, "Automated image registration (AIR) of MTI imagery," Proc. SPIE 5093, vol. 27, pp. 294–300, 2003.
4. P. Viola, "Alignment by maximization of mutual information," Ph.D. dissertation, MIT, Cambridge, MA, 1995.
5. J. Pluim, J. Maintz, and M. Viergever, "Mutual-information based registration of medical images: a survey," IEEE Trans. Medical Imaging, vol. 22, no. 8, August 2003.
6. A. A. Farag, A. El-Baz, and G. Gimel'farb, "Precise Segmentation of Multi-modal Images," IEEE Transactions on Image Processing, vol. 15, no. 4, pp. 952–968, April 2006.
7. C. Studholme et al., "An overlap invariant entropy measure of 3D medical image alignment," Pattern Recognition, vol. 32, pp. 71–86, 1999.
Evaluation on Similarity Measures of a Surface-to-Image Registration Technique for Ultrasound Images

Wei Shao1, Ruoyun Wu2, Keck Voon Ling1, Choon Hua Thng3, Henry Sun Sien Ho4, Christopher Wai Sam Cheng4, and Wan Sing Ng5

1 School of Electrical and Electronic Engineering, Nanyang Technological University
2 Clinical Research Unit, Tan Tock Seng Hospital, Singapore
3 Department of Oncologic Imaging, National Cancer Centre, Singapore
4 Department of Urology, Singapore General Hospital, Singapore
5 School of Mechanical and Aerospace Engineering, Nanyang Technological University
Abstract. Ultrasound is a universal guidance tool for many medical procedures, but it suffers from poor image quality and resolution. Merging high-contrast image information from other image modalities enhances the guidance capability of ultrasound. However, few registration methods work well for it. In this paper we present a surface-to-image registration technique for mono- or multi-modal medical data involving ultrasound. This approach is able to automatically register an object surface to its counterpart in an image volume. Three similarity measures are investigated in rigid registration experiments on the pubic arch in transrectal ultrasound images. It is shown that the selection of the similarity function is related to the ultrasound characteristics of the object to be registered.

Keywords: Image registration, similarity measure, ultrasound, pubic arch.
1 Introduction

Ultrasound (US) is widely used as an intra-operative guidance tool in many medical procedures, due to its safety, low cost and simplicity of use. However, the major shortcoming of US imaging is its poor image quality, compared to Magnetic Resonance Imaging (MRI) and Computed Tomography (CT). Therefore, mapping the pre-operative information obtained from MRI or CT would be a great enhancement to the guidance ability of US when a high-contrast image modality is infeasible for intra-operative use. A number of techniques have been reported for the registration of ultrasound images. These include the fiducial-based method [1], the surface-based method [2-3] and the intensity-based method [4-7]. The fiducial-based approach usually needs to implant markers onto the human body; it is accurate but invasive. The surface-to-surface approach is simple and universal for all modalities, but it requires the extraction of corresponding surfaces or landmark points. The image-to-image method is superior in flexibility and automation because no tissue segmentation is needed. However, registration of ultrasound against other modalities has remained a challenging task.
For example, the intensity-based approach is not a good choice for MRI-US registration, although a few similarity measures, such as the correlation ratio [8] and mutual information [9], have been tried on some occasions. This is because the key to the intensity-based techniques is that a comparable percentage of mutual information must exist in both of the images; otherwise, the information present in the MRI image but missing in ultrasound would deteriorate the similarity measure as an outlier. Different from the traditional surface-to-surface and image-to-image registration patterns, the surface-to-image registration technique is a good compromise between simplicity and flexibility. It extracts the organ surface from pre-operative images and directly utilizes the raw image information obtained intra-operatively. This approach is especially helpful for intra-operative ultrasound-guided medical procedures where time is critical. In this paper, we discuss the surface-to-image registration technique used for pubic arch detection in transrectal ultrasound (TRUS) using a priori information from MRI. Based on the acoustic properties of bone structures in ultrasound, we formulated three candidate similarity measures and made a preliminary evaluation using three sets of patient data.
2 Methods

Pubic arch registration between the MRI and TRUS image spaces determines a rigid transformation, so six parameters are involved: the translation vector (tx, ty, tz) and the three rotation angles (αx, αy, αz) about the x, y, and z axes, respectively. The pair of data to be matched is the physical coordinates (x, y, z) of the surface extracted from the MRI images and the volume content (x', y', z', I(x', y', z')) from TRUS, where I(x', y', z') denotes the ultrasound image intensity value at voxel (x', y', z'). Therefore the registration problem is formulated as an optimisation procedure that searches for a transformation T that maximizes the similarity measurement f, a function of T:
T = arg max_T f(T(x, y, z), (x', y', z', I(x', y', z')))    (1)
The similarity between the geometric surface and the volumetric image is evaluated as a measure of the intensity values at the image voxels on which the surface falls or with which it correlates. Three forms of similarity measure for the surface-to-image registration were formulated, based on different interpretations of the intensity features of the pubic arch in ultrasound. The genetic algorithm (GA) [10] was employed as the search engine to maximize the formulated similarity measure, which is calculated as the fitness function in the optimization procedure, since genetic optimization does not require a proper initial guess for the transformation and the search is not easily trapped by local maxima.

2.1 Similarity Between Surface Point and Voxel of High Intensity
Similarity between the geometric surface and the image content is calculated based on the fact that the image voxels on the surface should have high intensity. It is formulated as the average intensity on the surface. We refer to it as the "Aligned Intensity" (AI) measure in this paper.
$$f_{AI}(T) = \frac{1}{N} \sum_{i=1}^{N} I\big(\,[\,T(x_i, y_i, z_i)\,]\,\big) \qquad (2)$$
where N is the number of points uniformly sampled on the surface, (x_i, y_i, z_i) is the i-th point, and [(x_i, y_i, z_i)] is the corresponding voxel coordinate in the image, at which the intensity value is I([(x_i, y_i, z_i)]). The nearest-neighbor operator [·] is used to map point coordinates onto the image voxel grid. Any point falling outside the 3D image region contributes zero to the fitness function. A sketch of this measure is given below.
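As an illustration, the following is a minimal sketch of the AI fitness in Python with NumPy. It is not the authors' implementation; the volume[z, y, x] array layout, the homogeneous 4×4 form of T, and the function names are our assumptions.

```python
import numpy as np

def f_ai(T, points, volume):
    """Aligned Intensity (Eq. 2): mean TRUS intensity at the transformed
    surface points, with nearest-neighbor rounding as the [.] operator."""
    homog = np.hstack([points, np.ones((len(points), 1))])
    moved = (homog @ T.T)[:, :3]          # apply the rigid transform T
    idx = np.rint(moved).astype(int)      # nearest voxel on the grid
    x, y, z = idx[:, 0], idx[:, 1], idx[:, 2]
    inside = ((x >= 0) & (x < volume.shape[2]) &
              (y >= 0) & (y < volume.shape[1]) &
              (z >= 0) & (z < volume.shape[0]))
    vals = np.zeros(len(points))          # outside points contribute zero
    vals[inside] = volume[z[inside], y[inside], x[inside]]
    return vals.mean()
```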
2.2 Similarity Between Surface Normal and Image Gradient
Similarity between the surface normal and the image gradient is derived from the fact that there is a high image gradient at the boundary of the pubic arch, as its intensity is much higher than that of its neighborhood. The metric is therefore formulated as the averaged image gradient along the normals of the surface [11]. We refer to it as the "Projective Gradient" (PG) measure:

$$f_{PG}(T) = \frac{1}{N} \sum_{i=1}^{N} \vec{G}\big([T(x_i, y_i, z_i)]\big) \cdot T\big(\vec{n}(x_i, y_i, z_i)\big) \qquad (3)$$
where $\vec{G}$ is the intensity gradient calculated from the original ultrasound image I, $\vec{n}(x_i, y_i, z_i)$ is the normal vector at the i-th surface point $(x_i, y_i, z_i)$, and $\langle\cdot\rangle$ is the inner product operator. As before, the nearest-neighbor operator [·] is used to look up values at surface points in the ultrasound image. A sketch follows.
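A corresponding sketch of the PG fitness, under the same assumptions as above; the image gradient is precomputed once with np.gradient, and bounds checking is omitted for brevity.

```python
import numpy as np

def f_pg(T, points, normals, grad):
    """Projective Gradient (Eq. 3): mean inner product of the image
    gradient at each transformed point with the transformed normal.
    `grad` is the (gz, gy, gx) tuple returned by np.gradient(volume)."""
    homog = np.hstack([points, np.ones((len(points), 1))])
    idx = np.rint((homog @ T.T)[:, :3]).astype(int)
    rot_n = normals @ T[:3, :3].T         # normals rotate, not translate
    gz, gy, gx = grad
    x, y, z = idx[:, 0], idx[:, 1], idx[:, 2]
    g = np.stack([gx[z, y, x], gy[z, y, x], gz[z, y, x]], axis=1)
    return np.mean(np.sum(g * rot_n, axis=1))
```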
2.3 Similarity Between Surface/Shadow and Image Intensity
This similarity measure was formulated from the fact that ultrasound can hardly penetrate bone, so the pubic arch appears as a high-intensity echogenic surface with a homogeneous dark shadow behind it. It was therefore evaluated as the averaged radial "gradient" between the voxels falling on the surface and those behind it, referred to as the "Intensity Shadow" (IS) measure in this paper.
$$f_{IS}(T) = \frac{1}{N} \sum_{i=1}^{N} \left( I\big([T(x_i, y_i, z_i)]\big) - \frac{1}{M} \sum_{k=1}^{M} I\Big(\big[\,T\big((x_i, y_i, z_i) + k \cdot \vec{c}(x_i, y_i, z_i)\big)\,\big]\Big) \right) \qquad (4)$$
where M is the user-defined depth of the posterior (shadow) region, counted in voxels, and $\vec{c}(x_i, y_i, z_i)$ denotes the unit vector of the radial echo emitted from the ultrasound probe to the surface point $(x_i, y_i, z_i)$ at depth $z_i$. Since the 3D TRUS image was constructed from a series of transrectal images (in the x-y plane) scanned at even intervals (in the z direction), the gray value at any voxel is determined only by the reflected echo energy emitted and received at the same z depth as that voxel. Accordingly, the intensity "gradient" is calculated radially, along the ultrasound propagation direction from the transducer center to the surface point, within the 2D image slice at that depth. The radial vector $\vec{c}(x_i, y_i, z_i)$ at point $(x_i, y_i, z_i)$ is given by
$$\vec{c}(x_i, y_i, z_i) = \frac{(x_i, y_i, z_i) - (x_p, y_p, z_i)}{\big\|\, (x_i, y_i, z_i) - (x_p, y_p, z_i) \,\big\|_2} \qquad (5)$$
where $(x_p, y_p, z_i)$ is the coordinate of the transrectal ultrasound probe center at depth $z_i$; the z component of $\vec{c}(x_i, y_i, z_i)$ is therefore 0 by coplanarity. Note that the IS measure is not strictly an image "gradient" in the conventional sense, since it is computed between the voxel at the surface and the center of mass of its posterior neighbors along the ultrasound propagation direction. A sketch of this measure is given below.
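A minimal sketch of the IS fitness under the same assumptions; the probe center coordinates (x_p, y_p) and the helper name are ours.

```python
import numpy as np

def nearest_intensity(volume, p):
    """Nearest-neighbor lookup [.]; zero outside the 3D image region."""
    x, y, z = np.rint(p).astype(int)
    if (0 <= x < volume.shape[2] and 0 <= y < volume.shape[1]
            and 0 <= z < volume.shape[0]):
        return float(volume[z, y, x])
    return 0.0

def f_is(T, points, volume, probe_xy, M=40):
    """Intensity Shadow (Eqs. 4-5): surface intensity minus the mean
    intensity of the M voxels behind it along the in-plane radial
    direction from the probe center (x_p, y_p) at the same z depth."""
    total = 0.0
    for xi, yi, zi in points:
        c = np.array([xi - probe_xy[0], yi - probe_xy[1], 0.0])
        c /= np.linalg.norm(c)            # Eq. (5): unit vector, z = 0
        surf = nearest_intensity(volume, (T @ np.array([xi, yi, zi, 1.0]))[:3])
        rear = np.mean([nearest_intensity(
            volume, (T @ np.append(np.array([xi, yi, zi]) + k * c, 1.0))[:3])
            for k in range(1, M + 1)])
        total += surf - rear
    return total / len(points)
```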
3 Experiments and Results
3.1 Data
The experiments were conducted using three sets of patient data, each with a pair of 3D MRI and TRUS images of the prostate and pubic arch. The MRI scans were performed on a 1.5 Tesla whole-body GE MR system (GE Inc., USA). Patients were kept in a supine position during the scan. T2-weighted fast spin-echo scanning was performed with TR/TE of ~6400.0/85.6 ms, a slice thickness of 3 mm, and a slice spacing of 0.5 mm. The imaging resolution was 256 × 256 pixels in a field of view (FOV) of 12.0×12.0 mm2. During the intra-operative ultrasound imaging, patients were kept in a lithotomy position for the convenience of delivering biopsy or brachytherapy. The Aloka SSD-1700 system (Medison Inc., USA) was used with a stepping unit to pull the endorectal probe. The transrectal images were acquired at 1.0 mm steps. The image resolution was 0.18×0.18 mm2.
3.2 Experimental Design
To evaluate the algorithm's performance, we first conducted a self-registration test, i.e., registering the pubic arch surface segmented from the TRUS image back to the image itself. Intuitively, this surface serves not only as the input from another modality but also as the "ground truth" of the output. This establishes a so-called "bronze truth" [12] for the transformation, namely an identity transformation, for validation purposes. Accuracy and consistency were assessed using the averaged values and standard deviations of the three translations and three rotation angles solved by optimization over multiple rounds. On the other hand, since no "ground truth" is readily available in the experiments between the MRI and TRUS modalities, we computed the statistical deviation to evaluate the algorithm's consistency, but sought expert opinion to judge the acceptability (i.e., accuracy) of the results. For both experiments, 450 points (N = 450) were sampled uniformly on the pubic surface according to their parametric coordinates. The GA parameters in the registration algorithm were set to use a crowding algorithm, linear scaling, tournament selection, a population of 300 over 100 generations, a 95% crossover rate and a 1% mutation rate. The six transformation parameters, calculated in the local coordinate system of the pubic surface, were coded in the chromosomes. The
search range of the translation was confined to the central half of the entire TRUS image volume, and the variation of the angles was confined to within -10 to 10 degrees, given that the scanning orientation and manner were similar between the two modalities (both are transverse scans from superior to inferior). The rear width M was set to 40, which corresponds to a thickness of around 7.2~10.2 mm. Fifteen repeated experiments were performed on each set of patient data. A sketch of the GA search is given below.
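The following is a minimal generational GA in the spirit of the description above, not the authors' code: the crowding and linear-scaling details are omitted, the bounds lo/hi encode the stated search ranges, and the fitness argument is any of f_AI, f_PG or f_IS composed with a parameters-to-matrix conversion.

```python
import numpy as np

rng = np.random.default_rng(0)
POP, GEN, P_CROSS, P_MUT = 300, 100, 0.95, 0.01

def ga_register(fitness, lo, hi):
    """Search the six pose parameters (tx, ty, tz, ax, ay, az) with
    binary tournament selection, arithmetic crossover and uniform
    mutation inside the bounds [lo, hi] (length-6 arrays)."""
    pop = rng.uniform(lo, hi, size=(POP, 6))
    for _ in range(GEN):
        fit = np.array([fitness(ind) for ind in pop])
        a, b = rng.integers(POP, size=(2, POP))
        parents = pop[np.where(fit[a] > fit[b], a, b)]   # tournaments
        children = parents.copy()
        for i in range(0, POP - 1, 2):
            if rng.random() < P_CROSS:                   # crossover
                w = rng.random()
                children[i] = w * parents[i] + (1 - w) * parents[i + 1]
                children[i + 1] = (1 - w) * parents[i] + w * parents[i + 1]
        mask = rng.random(children.shape) < P_MUT        # mutation
        fresh = rng.uniform(np.broadcast_to(lo, children.shape),
                            np.broadcast_to(hi, children.shape))
        children[mask] = fresh[mask]
        pop = children
    fit = np.array([fitness(ind) for ind in pop])
    return pop[np.argmax(fit)]
```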
3.3 Results
The delineation of the pubic arch surface in the 3D MRI images and its registration to the corresponding 3D TRUS image are shown in Fig. 1.
Fig. 1. (a) Construction of pubic arch surface in MRI image stack by parallel contouring. (b) The relationship of the pubic arch surface with respect to the 3D TRUS image before the registration. (c) The pubic arch surface after the registration.
Table 1 presents the quantitative evaluation of the self-registration test. Statistics in the form of mean and standard deviation were calculated over the 15 repeated experiments on the three sets of patient data. The accuracy of the registration is indicated by the mean values of the translation and rotation parameters, while the algorithm's consistency is revealed by their averaged standard deviations. The data in Table 1 show that the "Intensity Shadow" (IS) measure had the best performance among the three similarity measures. To better understand the absolute error magnitude, we summarized the errors of the separate components in a normalized form: the absolute displacement ||t|| and the rotation angle ||α|| about an arbitrary rotation axis. The averaged ||t|| was 2.04±0.68 mm, 3.31±1.15 mm and 1.87±0.54 mm for the AI, PG and IS measures, respectively. Correspondingly, the averaged ||α|| was 3.22±1.63°, 4.85±1.92° and 2.55±1.13°. The IS measure thus obtained the best result among the three measures, having the highest accuracy and good consistency on average. Comparatively, the "Projective Gradient" (PG) measure yielded the worst accuracy and consistency. The "Aligned Intensity" (AI) measure delivered intermediate accuracy, but possessed a consistency comparable to that of the IS measure. For the cross-modal image registration, due to the limitation of our image collection protocols, we were unable to implant fiducial markers in the patients to establish a "gold standard". Neither do we have a convenient image navigation system that
allows us to perform an accurate manual matching at the current stage, nor is the "bronze standard" used for the self-registration test feasible, because surfaces segmented from different image sources vary dramatically. Therefore an expert was invited to assess the failure rate, regarded as a qualitative evaluation of the registration accuracy, over the 45 experiments conducted for each similarity measure. Table 2 shows the failure rate of each measure, as well as a quantitative evaluation of the corresponding consistency (the standard deviation). The same conclusion can be drawn: the proposed IS measure outperformed the other two candidates.
Table 1. Comparison of the averaged self-registration error with "aligned intensity" (AI), "projective gradient" (PG) and "intensity shadow" (IS) using the three sets of patient TRUS data
        Translation (mm)                           Rotation (°)
f       tx ± δtx     ty ± δty     tz ± δtz        αx ± δαx     αy ± δαy     αz ± δαz
AI      -0.47±0.74   0.64±0.42    -1.41±0.70      -0.24±1.07   -1.30±1.93    0.04±1.46
PG       0.15±1.26   1.22±1.00    -1.26±1.55       2.18±2.05   -0.86±2.58    0.73±2.90
IS       0.02±0.59   0.36±0.54    -0.37±0.88      -0.11±0.97   -0.58±1.41   -0.75±1.17
Table 2. Comparison of the averaged cross-registration error with "aligned intensity" (AI), "projective gradient" (PG) and "intensity shadow" (IS) using the three sets of TRUS data
        Failure    Consistency in translation (mm)    Consistency in rotation (°)
f       rate       ±δtx     ±δty     ±δtz             ±δαx     ±δαy     ±δαz
AI      57.8%      ±0.84    ±0.95    ±0.97            ±0.72    ±1.19    ±1.93
PG      84.4%      ±2.19    ±3.33    ±3.63            ±3.53    ±3.02    ±4.43
IS      0%         ±0.48    ±0.65    ±1.12            ±0.70    ±1.66    ±1.24
4 Discussion
In the cross-modal registration test, we observed that registration using the PG measure may lead to a complete mismatch, as sometimes does the AI measure, while the IS measure can nearly always locate the pubic arch around the true location (judged by eye). However, in [11] we verified that the PG measure was quite successful for surface-to-image registration of the prostate. This suggests that a similarity measure behaves differently when applied to different organs. To investigate the behavior of the three similarity measures for pubic arch registration, we examined the solution space of each measure as a fitness map over the transformation parameters, using the self-registration as the basis. Accordingly, a range of variations around the identity transformation was applied. Fig. 2 (a)-(c) shows the fitness maps over the parameter pairs tx-ty, tx-tz, ty-tz, αx-αy, αx-αz, and αy-αz, with the other parameters fixed to 0. The fitness map of fIS shows a sharp unimodal distribution: higher fitness values concentrate around the ideal solution (the image center), indicating that even if the genetic optimization converges to a local optimum rather than the global one, the
result won’t be far from the truth. fAI is fairly smooth but the unimodality is not sharp enough to guarantee the accuracy. fPG is far from unimodality because a lot of local maxima present all over the space. Obviously the function with sharp unimodal tendency is the best choice for the global optimum searching. Hence we can conclude that the GA-optimized registration using the IS measure as the fitness function, will be more reliable to find the transformation solution close to the truth.
Fig. 2. Comparison of the solution spaces of the AI, PG and IS measures. The fitness maps were calculated over tx-ty, tx-tz, ty-tz, αx-αy, αx-αz, and αy-αz with the other parameters set to 0. From left to right: f(tx, ty, 0, 0, 0, 0), f(tx, 0, tz, 0, 0, 0), f(0, ty, tz, 0, 0, 0), f(0, 0, 0, αx, αy, 0), f(0, 0, 0, αx, 0, αz), f(0, 0, 0, 0, αy, αz). (a) Solution space of fAI. (b) Solution space of fPG. (c) Solution space of fIS.
5 Conclusion
In this paper, we have presented a surface-to-image registration framework for ultrasound image registration. The use of a pre-operative surface together with the intra-operative image is a good balance between human intervention and automation, offering not only accuracy but also speed. Three similarity measures were investigated for pubic arch registration in transrectal ultrasound images. Our experimental results show that the "Intensity Shadow" (IS) measure is the best candidate for pubic arch registration in TRUS images, as this measure best captures the appearance of bone in echo images. We conclude that choosing a similarity measure appropriate to the ultrasound characteristics of the object to be registered is important for the success of the registration. This technique would be helpful for determining pubic arch interference (PAI) [13] in prostate biopsy and brachytherapy applications, and we believe it also extends to the registration of other bony structures that manifest the same acoustic properties.
References
1. J. Trobaugh, W. Richard, K. Smith, and R. Bucholz. Frameless stereotactic ultrasonography: methods and applications. Computerized Medical Imaging and Graphics, 18(4): 235-246, 1994.
2. C.A. Pelizzari, G.T.Y. Chen, D.R. Spelbring, R.R. Weichselbaum, and C.T. Chen. Accurate three-dimensional registration of CT, PET, and/or MR images of the brain. J Comput Assist Tomogr, 13(1): 20-26, 1989.
3. P.J. Besl and N.D. McKay. A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2): 239-256, 1992.
4. S. Periaswamy and H. Farid. Elastic registration in the presence of intensity variations. IEEE Transactions on Medical Imaging, 22(7): 865-874, 2003.
5. R.N. Rohling, A.H. Gee, and L. Berman. Automatic registration of 3-D ultrasound images. Ultrasound Med. Biol., 24: 841-854, 1998.
6. C.R. Meyer, J.L. Boes, B. Kim, P.H. Bland, G.L. Lecarpentier, J.B. Fowlkes, M.A. Roubidoux, and P.L. Carson. Semiautomatic registration of volumetric ultrasound scans. Ultrasound Med. Biol., 25: 339-347, 1999.
7. R. Shekhar and V. Zagrodsky. Mutual information-based rigid and nonrigid registration of ultrasound volumes. IEEE Transactions on Medical Imaging, 21: 9-22, 2002.
8. A. Roche, X. Pennec, M. Rudolph, D.P. Auer, G. Malandain, S. Ourselin, L.M. Auer, and N. Ayache. Generalized correlation ratio for rigid registration of 3-D ultrasound with MR images. MICCAI 2000, LNCS 1935: 567-577, 2000.
9. X. Huang, N.A. Hill, J. Ren, G. Guiraudon, D.R. Boughner, and T.M. Peters. Dynamic 3D ultrasound and MR image registration of the beating heart. MICCAI 2005, LNCS 3750: 171-178, 2005.
10. M. Mitchell. An Introduction to Genetic Algorithms. Cambridge, Mass.: MIT Press, 1996.
11. R. Wu, K.V. Ling, W. Shao, and W.S. Ng. Registration of organ surface with intra-operative 3D ultrasound image using genetic algorithm. MICCAI 2003, LNCS 2878: 383-390, 2003.
12. P. Jannin, J.M. Fitzpatrick, D.J. Hawkes, X. Pennec, R. Shahidi, and M.W. Vannier. Validation of medical image processing in image-guided therapy. IEEE Transactions on Medical Imaging, 21(12): 1445-1449, 2002.
13. S.D. Pathak, P.D. Grimm, V. Chalana, and Y. Kim. Pubic arch detection in transrectal ultrasound guided prostate cancer therapy. IEEE Transactions on Medical Imaging, 17(5): 762-771, 1998.
Backward-Warping Ultrasound Reconstruction for Improving Diagnostic Value and Registration
Wolfgang Wein1, Fabian Pache1, Barbara Röper2, and Nassir Navab1
1 Chair for Computer Aided Medical Procedures (CAMP), TU Munich, Boltzmannstr. 3, 85748 Garching, Germany {wein, pache, navab}@cs.tum.edu
2 Clinic and Policlinic of Radiation Oncology, Klinikum Rechts der Isar, TU Munich, Germany
[email protected]
Abstract. Freehand 3D ultrasound systems acquire sets of B-mode ultrasound images tagged with position information obtained by a tracking device. For both further processing and clinical use of these ultrasound slice images scattered in space, it is often required to reconstruct them into 3D rectilinear grid arrays. We propose new efficient methods for this so-called ultrasound spatial compounding using a backward-warping paradigm. They make it possible to establish 3D volumes from any scattered freehand ultrasound data with superior quality/speed properties with respect to existing methods. In addition, arbitrary MPR slices can be reconstructed directly from the freehand ultrasound slice set, without the need for an extra volumetric reconstruction step. We qualitatively assess the reconstruction quality and quantitatively compare our compounding method to other algorithms using ultrasound data of the neck and liver. The usefulness of direct MPR reconstruction for multimodal image registration is demonstrated as well.
1 Introduction
As a cost-effective and non-invasive method, two-dimensional ultrasound is the most widely used medical imaging modality. As it depicts only two-dimensional planar images of the anatomy, a lot of research has been carried out to transform it into a three-dimensional modality [1]. Nowadays there are a number of 3D ultrasound systems available, using either a mechanical wobbling setup or native 2D matrix arrays. They establish a fan of ultrasound planes, which in turn can be scan-line converted to yield a Cartesian 3D volume. These volumes are, however, very limited in their size. Freehand ultrasound imaging mainly uses position sensing to simultaneously record the 2D ultrasound images and their position and orientation in space. This makes it possible to establish the spatial relation from arbitrary movement of the ultrasound transducer, potentially covering a much larger area of the human anatomy. In order to precisely know the transformation in space of each image plane based on the measurement of e.g. a magnetic tracking sensor or optical tracking target attached to the ultrasound probe, a
careful calibration is necessary [2]. Using the three-dimensional characteristics, in particular any-plane views, has many medical advantages, including:
– independence from the examiner and, to some extent, from the probe positions used
– freedom to display a pathological process at any angle, e.g. along and perpendicular to its main axis to visualize its full extent, or in planes that focus on relations to relevant neighbouring normal tissue structures
– the possibility to visualize planes parallel to the skin, which cannot directly be derived from diagnostic sweeps in B-mode
– elevating ultrasonography to a level comparable with other sectional imaging modalities that allow a free choice of plane at acquisition (e.g. MRT) or reconstruction (e.g. multi-slice CT)
To exploit them, a reformatting either into a Cartesian volume or into a plane arbitrarily located in space is necessary. This process is denoted Spatial Compounding, and there are two distinct approaches to it, as already pointed out in [3]. A footprint of each of the B-mode images scattered in space can be created in the initially empty 3D volume. If an additional volume channel is used, averaging can be achieved where several ultrasound planes intersect the same voxels; otherwise the maximum is often used as the final intensity [4]. This so-called forward warping is computationally efficient, but it has the potential to cause gaps in the reconstruction. It can also be used to directly create Multi-Planar Reconstructions (MPRs) by assuming a constant elevational thickness of each ultrasound image [5]. For this purpose, a polygon depicting the intersection of each ultrasound image with the desired plane is drawn onto the screen with hardware-accelerated texture mapping. A backward-warping strategy instead traverses the target plane or volume and, for each grid point, identifies the relevant original ultrasound information and accumulates the voxel intensity using e.g. distance-weighted interpolation or other merging schemes. We will present an algorithm implementing this approach efficiently, despite the obviously high computational effort. A simplified approach is taken in [6], where a continuous probe motion without any ultrasound plane intersections is assumed; hence each voxel intensity is weighted from the two neighboring ultrasound slices, using the probe trajectory rather than perpendicular projections.
2 Methods
For every discrete position $x_i \in V$ in the reconstruction volume $V$, we need to compute a set of tuples $A_i = \{(d, y)\}$, where $d$ is the distance of an original ultrasound data point to $x_i$ and $y$ its intensity value:

$$\forall x_i \in V: \quad A_i = \{(d, y) \mid d < D;\ d = \| p - x_i \| \} \qquad (1)$$

$$p = H_j\, (u, v, 0, 1)^T; \qquad y = Y_j(u, v) \qquad (2)$$
Here, D is the maximum distance at which points should be taken into account for the accumulation of the voxel intensity. The homogeneous 4×4 transformation
matrix $H_j$ for a particular ultrasound slice $j$ maps a 2D point $(u, v)^T$ of the slice plane into a point $p$ in 3D. The corresponding ultrasound image intensity is denoted $Y_j(u, v)$. The main effort is now to determine the set $A_i$ for each voxel from the whole ultrasound information available. However, we can restrict ourselves to points $p$ originating from slices whose perpendicular distance to $x_i$ is smaller than $D$:

$$S_i = \{ j \mid d_{i,j} < D;\ d_{i,j} = |(0, 0, 1, 0)\, H_j^{-1}\, x_i| \} \qquad (3)$$
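For illustration, Eqs. (2) and (3) translate to a few lines of NumPy; this is our sketch, and the matrix convention and function names are assumptions, not code from the paper.

```python
import numpy as np

def pixel_to_world(H, u, v):
    """Eq. (2): map slice pixel (u, v) to 3D via the tracked 4x4 matrix H."""
    return (H @ np.array([u, v, 0.0, 1.0]))[:3]

def slice_distance(H, x):
    """Eq. (3): perpendicular distance of voxel position x to the slice
    plane, read off the z row of the inverted slice transform."""
    return abs((np.linalg.inv(H) @ np.append(x, 1.0))[2])
```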
2.1 Fast Slice Selection
In the following we devise an efficient means to successively compute Si for all voxels, using the following facts:
– We can traverse the volume in a way such that the distance of two successive voxels is always ‖xi+1 − xi‖ ≤ k, where k is the maximum voxel spacing. Figure 1 illustrates a simple possible scheme.
– If the distance of ultrasound slice j to xi is di,j, then the distance of the same slice to voxel xi+1 cannot be smaller than di,j − k.
– Hence this slice j can only be required again at voxel index i + di,j/k or later.
We use a rotation queue with DV/k elements (DV being the volume diagonal), i.e., the maximum distance, in voxel steps, at which a slice can still contribute to the reconstruction from a particular voxel. Each element contains a set of slice indices; at the beginning, all slices are in the head element of the queue. For every voxel xi, all slices in the queue head are removed, their distance d is computed, and they are re-inserted into the rotation queue according to that distance. If d < D, the slice is added to Si and considered for the accumulation of the voxel intensity. For the next voxel xi+1, the rotation queue head is advanced. A sketch of this scheme follows Fig. 1.
Fig. 1. Volume traversal scheme for advancing only one voxel at a time
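Below is a minimal sketch of the rotation-queue selection under stated assumptions: voxels arrive in a traversal where successive positions are at most k apart, dist(j, x) evaluates Eq. (3), accumulate(x, S) applies one of the f(A) schemes of Sect. 2.2, and the bucket count and re-insertion rule paraphrase the description above.

```python
def backward_compound(voxels, slice_ids, dist, D, k, n_buckets, accumulate):
    """Rotation-queue slice selection: a slice at distance d cannot become
    relevant again for roughly d/k voxel steps, so it is parked that many
    buckets ahead and only re-examined when the queue head reaches it."""
    queue = [set() for _ in range(n_buckets)]
    queue[0] = set(slice_ids)             # initially all slices at the head
    head, result = 0, []
    for x in voxels:
        selected = []
        for j in list(queue[head]):
            d = dist(j, x)
            if d < D:
                selected.append(j)        # contributes to this voxel
            step = max(1, min(int(d // k), n_buckets - 1))
            queue[(head + step) % n_buckets].add(j)
        queue[head].clear()
        result.append(accumulate(x, selected))
        head = (head + 1) % n_buckets     # advance the rotation queue head
    return result
```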
This will allow us to implement efficient backward-warping compounding methods to reconstruct three-dimensional volumes of ultrasound information. In order to create online Multi-Planar Reconstructions (MPRs) directly from
the original data, we will just consider a reconstruction volume with a single slice, arbitrarily located in space.
2.2 Intensity Accumulation
For a voxel $x_i$, all pixels on each slice in $S_i$ that satisfy $d < D$ are added to the set of distance-intensity tuples $A_i$ defined in Eq. (1). For a given set $A = \{(d_i, y_i)\}$, the reconstructed voxel intensity can be any weighting or selection function $f(A)$. We considered the following functions in our work:
Inverse Distance Weighting. Originally defined in [7], it assures that the reconstructed intensity approximates the original data values for $d \to 0$ ($\mu > 1$ being a smoothness parameter):

$$f(A) = \sum_{i=1}^{n} \frac{d_i^{-\mu}\, y_i}{\sum_{j=1}^{n} d_j^{-\mu}} \qquad (4)$$
Gaussian Weighting. A 3D Gaussian kernel with size $\sigma$ can be used to weight the data values as well:

$$f(A) = \frac{\sum_{i=1}^{n} y_i\, e^{-d_i^2/\sigma^2}}{\sum_{i=1}^{n} e^{-d_i^2/\sigma^2}} \qquad (5)$$

Nearest Sample. Here, the data value closest to the considered voxel is directly accepted as the reconstructed intensity:

$$f(A) = y_i \quad \text{with} \quad d_i = \min_j \{d_j\} \qquad (6)$$
Weighted Median. Using the median has the advantage that one of the original intensities is chosen, namely the centermost one from the sorted intensities $[y_i]$. In order to incorporate the distances, we "stretch" the sorted list with their respective inverse linear weightings (a linear mapping $[0, D] \to [1, 0]$; pixels beyond distance D are disregarded and therefore have weight 0):

$$\forall k \in [1 \ldots n]: \quad y_{k+1} \ge y_k, \qquad w_k = 1 - \frac{d_k}{D}$$

$$f(A) = y_i \quad \text{such that} \quad \sum_{k=1}^{i-1} w_k \;\le\; \frac{1}{2}\sum_{k=1}^{n} w_k \;\le\; \sum_{k=1}^{i} w_k \qquad (7)$$
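The four accumulation functions reduce to a few lines each; the following sketch (ours, in NumPy) takes d as the distances and y as the intensities of one tuple set A.

```python
import numpy as np

def inverse_distance(d, y, mu=2.0):
    w = d ** -mu                          # Eq. (4)
    return np.sum(w * y) / np.sum(w)

def gaussian(d, y, sigma=1.0):
    w = np.exp(-d ** 2 / sigma ** 2)      # Eq. (5)
    return np.sum(w * y) / np.sum(w)

def nearest_sample(d, y):
    return y[np.argmin(d)]                # Eq. (6)

def weighted_median(d, y, D):
    order = np.argsort(y)                 # Eq. (7): sort intensities,
    w = 1.0 - d[order] / D                # weight by inverse distance,
    cum = np.cumsum(w)                    # pick the weighted midpoint
    return y[order][np.searchsorted(cum, 0.5 * cum[-1])]
```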
3 Results
3.1 Computation Time
We implemented a forward-warping compounding algorithm for comparison purposes, which averages all ultrasound pixel hits onto the reconstruction volume voxels, as described in [4]. The following table compares the impact of volume and slice resolution on both forward and backward compounding.
[Fig. 2 panels: (a) MPR slice; (b) backward weighted median (MSE 54); (c) forward with smoothing (MSE 59); (d) online MPR, backward w. median; (e) backward nearest sample (MSE 55); (f) backward gauss (MSE 57)]
Fig. 2. Longitudinal MPR (a) from an axial freehand sweep of a right neck created using backward compounding with weighted median. Anatomical details, from superficial to deep structures: skin, subcutaneous tissue, platysma, sternocleidomastoid muscle, lymph nodes (left: normal, center+right: malignant), internal jugular vein with physiological pulsation, marginal section of carotid artery with arteriosclerotic plaque. Enlarged details of images interpolated from the compounded volume (b,c,e,f) and online MPR (d).
         forward              backward multiple     backward single
DV       DS=256   DS=454     DS=256   DS=454        DS=256   DS=454
16       517s     1687s      226s     401s          119s     295s
134      538s     1915s      712s     942s          442s     510s

DV denotes the number of millions of voxels in the reconstruction volume; DS is the width and height of the input ultrasound slices. The times were taken using a set of 1024 slices on an AMD64 3200+ with 1 GB RAM. While forward compounding is largely unaffected by the number of voxels, the number of pixels to be processed affects its computation time linearly. Backward compounding reacts to changes in both input and target data; however, the increase with slice size can be largely eliminated if only a single pixel of each slice is required per voxel, as in nearest-neighbor accumulation.
3.2 Qualitative and Quantitative Comparison
Figure 2(a) shows one reslice approximately orthogonal to the original slices. 200 slices were processed using different accumulation schemes. Mean-Squared-Error
[Fig. 3 panels: (a) left sequence (SMD 667); (b) right sequence (SMD 1007); (c) combination (SMD 476); (d) reference]
Fig. 3. The MPR viewport was frozen in place at the spatial location of the original slice (d). The individual contributions of the right and left sequences are (a) and (b). The SMD (Squared Mean Differences) of the reconstruction drops significantly when both sequences are combined in (c).
(MSE) values were calculated according to the leave-one-out strategy over the whole reconstruction volumes, as described in [6]. Each method has its own merits for specific applications. The more homogeneous appearance of the Gaussian and smoothing kernels is favourable for volume rendering and segmentation. On the other hand, methods that retain original intensities, median and nearest sample, produce more diagnostically relevant images than methods that recompute the values. Nearest sample is the fastest but also the most unforgiving on noisy data sources and jittery tracking information. The online MPR reconstruction (fig. 2(d)) provides the sharpest and most detailed images, as one data resampling step is skipped.
3.3 Interleaved 3D-Ultrasound Data
Using the online backward-warping MPR reconstruction it is possible to fuse multiple freehand ultrasound sweeps regardless of their relative spatial positions. The samples presented here were created using three sequences. A selected slice of the center sequence was recreated using data from the sequences to the right and left; see Figure 3. Reconstruction using backward compounding with weighted median accumulation of one 256 × 256 slice from sweeps with a total of over 1000 slices takes about 1 second on an AMD64 3200+ using high-quality settings.
3.4 Improving Registration
For multimodal registration of freehand ultrasound images with a CT scan, we had developed automatic image-based registration techniques in previous work [8]. There, a set of axial images from a continuous caudo-cranial sweep along the neck is selected and used for registration. Using our compounding methods, arbitrary planes of ultrasound information can be considered in addition. Figure 4 shows two original transversal images from a freehand ultrasound sequence along the jugular vessels, as well as two additional MPR-reconstructed slice images.
756
W. Wein et al.
Fig. 4. Examples for spatially related ultrasound images from the neck: Original axial slices (a+b) and longitudinal MPRs (c+d) using our reconstruction technique
One could argue that the examiner should rotate the ultrasound probe after the continuous caudo-cranial motion to acquire longitudinal images supporting the registration. This, however, would introduce significant deformation errors due to tissue compression distributed differently on the patient's skin, as well as motion induced by the blood pulsating through the vessels. For a rigid registration, it is hence preferable to use additional planes derived directly from the data of the original continuous probe motion. This is done using our MPR reconstruction method, which additionally makes it possible to create planes of information parallel to the skin surface within the body, which cannot be acquired by ultrasound itself. The robustness of automatic rigid CT-ultrasound registration on such a sequence improved significantly when oblique reconstruction planes were considered for registration. The standard deviation of the Target Registration Error on a lymph node decreased from 3.2 mm to 1.5 mm in a random displacement study on the sequence depicted in Figure 4, when two MPRs were considered in addition to 5 slices from the original sweep (only two of which are shown in the figure for better spatial impression).
4 Conclusion
We have developed new methods for the spatial compounding of freehand ultrasound data, using a backward-warping approach that collects the scattered image information for each voxel in an efficient manner. They make it possible to perform reconstructions with superior quality and smaller computation time compared to forward-projection techniques known from the literature, while a choice of smoothness and continuity versus retaining original image characteristics can be made using different accumulation functions. These algorithms can also be used to compute Multi-Planar Reconstructions (MPRs) in real time from the original data. This further increases the image quality, as an extra interpolation step is saved that would be necessary when rendering MPRs from previously compounded 3D volumes. As a result, more detailed diagnostic information can be
gathered and visualized, not only for the person performing the ultrasound examination, but also for demonstration to colleagues at a later point in time without requiring the presence of the patient. Pathological changes in tissue texture and their relation to anatomical landmarks can be demonstrated in an optimized plane without haste, making it possible to combine the advantages of ultrasonography (high spatial resolution and tissue contrast, depending on the frequency of the ultrasound device) with the advantages of sectional imaging in any plane. Furthermore, multimodal image registration can be improved by adding oblique ultrasound information to the original image data. Finally, the online MPR algorithm can yield high-quality real-time visualization of oblique slices for freehand ultrasound systems.
References
1. Fenster, A., Downey, D.B., Cardinal, H.N.: Three-dimensional ultrasound imaging. Physics in Medicine and Biology 46(5) (2001) R67–R99
2. Mercier, L., Langø, T., Lindseth, F., Collins, D.: A review of calibration techniques for freehand 3-D ultrasound systems. Ultrasound in Med. & Biol. 31 (2005) 449–471
3. Barry, C., Allott, C., John, N., Mellor, P., Arundel, P., Thomson, D., Waterton, J.: Three-dimensional freehand ultrasound: Image reconstruction and volume analysis. Ultrasound in Med. & Biol. 23 (1997) 1209–1224
4. Rohling, R., Gee, A., Berman, L.: Automatic registration of 3-D ultrasound images. Technical report, Cambridge University Engineering Department (1997)
5. Prager, R., Gee, A., Berman, L.: Stradx: Real-time acquisition and visualization of freehand three-dimensional ultrasound. Medical Image Analysis 3 (1999) 129–140
6. Coupé, P., Hellier, P., Azzabou, N., Barillot, C.: 3D freehand ultrasound reconstruction based on probe trajectory. In: MICCAI Proceedings (2005) 597–604
7. Shepard, D.: A two-dimensional interpolation function for irregularly spaced data. In: Proc. 23rd Nat. Conf. ACM (1968) 517–524
8. Wein, W., Röper, B., Navab, N.: Automatic registration and fusion of ultrasound with CT for radiotherapy. In: MICCAI 2005 Proceedings. Lecture Notes in Computer Science, Springer (2005)
Integrated Four Dimensional Registration and Segmentation of Dynamic Renal MR Images
Ting Song1, Vivian S. Lee2, Henry Rusinek2, Samson Wong2, and Andrew F. Laine1
1 Department of Biomedical Engineering, Columbia University, New York, NY, U.S.A. {ts2060, laine}@columbia.edu
2 Department of Radiology, New York University Medical Center, New York, NY, U.S.A.
[email protected], [email protected], [email protected]
Abstract. In this paper a novel approach for the registration and segmentation of dynamic contrast enhanced renal MR images is presented. This integrated method is motivated by the observation of the reciprocity between registration and segmentation in 4D time-series images. Fully automated Fourier-based registration with sub-voxel accuracy and semi-automated time-series segmentation were intertwined to improve the accuracy in a multi-step fashion. We have tested our algorithm on several real patient data sets. Clinical validation showed remarkable and consistent agreement between the proposed method and manual segmentation by experts.
1 Introduction
Four-dimensional dynamic contrast enhanced renal magnetic resonance (MR) imaging is increasingly used to assess renal function. This noninvasive and safe procedure is based on renal T1-weighted imaging acquired during the 5-10 minutes after intravenous injection of a low dose of gadopentetate dimeglumine. For each kidney, the signal intensity can be measured in the intrarenal compartments (cortex, medulla, and collecting system) and, combined with the arterial input function, used to compute the renal blood flow, blood volume, single kidney glomerular filtration rate (GFR) and other functional parameters in vivo [1]. Dynamic MR renography has broad clinical applications, but it requires an extensive image analysis that is complicated by respiratory motion. Since each examination yields at least 10-20 serial 3D images of the abdomen, manual registration and segmentation are prohibitively labor-intensive. Therefore, automated and semi-automated image registration and segmentation techniques for analyzing dynamic MR renography are of great clinical interest.
1.1 Kidney Segmentation
The challenging part of dynamic contrast enhanced image segmentation is that, as contrast agent wash-in and wash-out occurs, image intensity values change rapidly as the time series evolves. Furthermore, poor kidney function or stenotic vasculature may prevent the uptake of contrast agent, resulting in disjoint bright regions, so accurate and continuous boundary delineation is not always feasible. Moreover, the contrast agent also washes into neighboring tissues, such as the spleen and liver.
Basically, kidney segmentation techniques can be divided into two categories: spatial and temporal. In the spatial approach, segmentation is performed separately at each time point [2, 3]. Temporal or vector segmentation considers each voxel's intensity time course as a vector and classifies the tissues according to their different features and behaviors in the temporal domain [4-7]. Because the passage of contrast needs to be observed for several minutes over multiple breath-holds, dynamic imaging is affected by respiratory motion. Clearly, the accuracy of kidney segmentation strongly depends on robust registration over time.
1.2 Kidney Registration
There has been limited work on the registration of dynamic MRI data applicable to the abdominal region. Existing methods include mutual information (MI) [5], contour and template matching based methods [7-10], and phase correlation [11]. Prior work was restricted to either in-plane 2D motion or 3D translation only. The proposed automated method is more general: it corrects for both rotation and translation in 3D. Without kidney contour segmentation, time-series image registration is challenging. Therefore, for dynamic perfusion images, segmentation and registration present the well-known "chicken and egg" dilemma. Next we show that when the two methods are integrated and applied alternately, the overall performance is improved.
2 Methodology
With this synergy in mind, a computer-aided integrated method was developed to cross-thread the image registration and segmentation processing of dynamic 3D MR renography. The flowchart of our method is shown in Fig. 1; the diagram shows the specific modules of the interlaced registration and segmentation procedures used to improve the overall performance of the integrated system. Similar ideas can be found in [7], where rough registration, segmentation, and fine registration were performed in sequence.
2.1 Preprocessing and Rough Registration (Steps I-IV)
Given a 4D MR renography data set, after cropping the kidney out of the whole-body images, 3D anisotropic diffusion with a time-varying gradient threshold was applied to suppress noise while preserving image features. Then a 3D over-complete dyadic wavelet expansion was applied, and the modulus of the wavelet coefficients from the detail channel at the 2nd level served as robust edge information. These representations were fed into the next step, rough 4D registration. The rough 4D registration utilizes a Fourier-based method to provide a robust, voxel-accurate estimate of the translation and rotation between frames. Rotation and translation processing are separated by the Fourier transform. The rotation between frames was recovered by minimizing the following energy function:
$$E = \iiint \big( F[g](w_x, w_y, w_z) - F[Rf](w_x, w_y, w_z) \big)^2 \, dw_x\, dw_y\, dw_z \qquad (1)$$
[Fig. 1 flowchart: input cropped 4D data → 3D anisotropic diffusion (Step I) → 3D wavelet edge detection (Step II) → 4D Fourier-based registration (Step III) → 4D subvoxel registration (Step IV) → 4D segmentation (Step V) → refined segmentation for whole kidney (Step VI) → 4D refined registration and segmentation for intrarenal structures (Step VII)]
Fig. 1. Flowchart of the proposed integrated algorithm showing the intertwining of processing modules
and the optimum rotation axis and rotation angles were recovered by

$$(\theta, \phi, \psi) = \arg\min_{\theta, \phi, \psi} E \qquad (2)$$
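A sketch of this rotation search using SciPy follows. Note that we compare Fourier magnitudes, which are insensitive to translation; this is an assumption on our part, since Eq. (1) as printed shows no magnitude bars. BFGS stands in for the quasi-Newton optimizer cited in the text.

```python
import numpy as np
from scipy.ndimage import rotate
from scipy.optimize import minimize

def rotation_energy(angles, f, g_mag):
    """Eq. (1)-style mismatch between the Fourier magnitude of g and
    that of f rotated by Euler angles (in degrees) about x, y, z."""
    ax, ay, az = angles
    rf = rotate(f, ax, axes=(1, 2), reshape=False, order=1)
    rf = rotate(rf, ay, axes=(0, 2), reshape=False, order=1)
    rf = rotate(rf, az, axes=(0, 1), reshape=False, order=1)
    return np.sum((g_mag - np.abs(np.fft.fftn(rf))) ** 2)

def estimate_rotation(f, g):
    g_mag = np.abs(np.fft.fftn(g))
    res = minimize(rotation_energy, x0=np.zeros(3), args=(f, g_mag),
                   method="BFGS")         # quasi-Newton, cf. [12] and Eq. (2)
    return res.x
```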
The minimization problem in Eq. (1) can be efficiently solved by the quasi-Newton method [12]. This was the first time that the 3D rotation issue was addressed in 4D MR renography image processing. Next, the translation vector was estimated by a phase-correlation technique:

$$corr = \frac{F[g](w_x, w_y, w_z)\, F^*[Rf](w_x, w_y, w_z)}{\big| F[g](w_x, w_y, w_z) \big|\, \big| F[Rf](w_x, w_y, w_z) \big|} = e^{\,j [w_x\ w_y\ w_z] \cdot t} \qquad (3)$$
where R stands for the rotation matrix and F for the 3D Fourier transform. To improve the registration accuracy, a 3D subvoxel registration method, extended from the work of Stone et al. [13] to three-dimensional gray-scale images in [14], was applied. Using the phase properties of the Fourier transform, the sub-voxel registration estimate can be modeled as a classical linear fitting problem for the following system:
$$\frac{2\pi}{N} \left[ w_x\ w_y\ w_z \right] \left[ x_0\ y_0\ z_0 \right]^T = \mathrm{phase}(F/G) \qquad (4)$$
which can be solved by pseudo-inverse methods or singular value decomposition for the subvoxel translation vector $[x_0, y_0, z_0]^T$. For details of the preprocessing and rough registration steps, please refer to our previous work [14].
2.2 4D Time-Series Segmentation (Step V)
The temporal pattern of contrast uptake is a strong feature for distinguishing kidney tissues from surrounding tissues as well as for differentiating the intrarenal tissues. Due to differences in vasculature, filtration, and reabsorption, different tissues of interest enhance at different phases of the acquisition. For example, in a normal kidney the peak uptake in the cortex occurs about 30 seconds after injection, in the medulla at 2-3 minutes, and in the collecting system at 4-6 minutes. In order to utilize this temporal information, a normative template pattern was constructed for the background and the three renal tissues (cortex, medulla, collecting system), and the mean temporal dynamics for each class was derived. The time-series segmentation algorithm is thus based on minimization of the following energy functional:

$$\sum_{n=1}^{N} \sum_{v \in C_n} \mathrm{dis}\big(\, TD(v) - TD_{C_n} \big) \qquad (5)$$
where $TD_{C_n}$ denotes the normative temporal dynamic pattern of tissue class $C_n$, $v$ denotes a voxel belonging to class $C_n$, and dis is the Euclidean distance in T-dimensional vector space, where T is the total number of frames in each 4D series; $TD(v)$ is the temporal dynamics of voxel $v$. After this step, a rough segmentation of each tissue of interest is generated. A minimal sketch of this classification is given below.
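This is our illustrative sketch, not the authors' code; the broadcast distance matrix is memory-hungry for full volumes and would be blocked in practice.

```python
import numpy as np

def temporal_segmentation(series, templates):
    """Assign each voxel to the class whose normative temporal dynamics
    TD_Cn is closest in Euclidean distance, minimizing Eq. (5).
    `series` is (T, Z, Y, X); `templates` is (n_classes, T)."""
    T = series.shape[0]
    curves = series.reshape(T, -1).T               # one T-vector per voxel
    d = np.linalg.norm(curves[:, None, :] - templates[None, :, :], axis=2)
    return d.argmin(axis=1).reshape(series.shape[1:])
```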
2.3 Refined Segmentation for Whole Kidney Volume (Step VI)
In order to reduce errors resulting from the initial registration and segmentation, additional registration and segmentation steps are applied. The initial mask for the whole kidney was generated by combining the segmentation results for the cortex, medulla, and collecting system from the previous step. A narrowband mask containing the voxels within a ±3 voxel distance of the initial whole-kidney boundary was generated for each frame. Then, within this mask, a refined kidney boundary was segmented for each frame using the methods described in [15].
2.4 4D Refined Registration and Segmentation for Intrarenal Structures (Step VII)
After the kidney boundary segmentations were refined for each frame in the previous step, the binary image series of the kidney shell was used to calculate the refined registration parameters, i.e., translation and rotation, using the binary version of the Fourier-based sub-voxel registration method proposed in [14]. After the image series was realigned using the refined registration, an additional temporal segmentation, similar to the method used in Step V, was applied to segment the intrarenal tissues of the kidney.
2.5 Clinical Evaluation
The above procedure was first tested on 16 pairs (object image and target image) of kidney contours (four subjects, four pairs of volumes) representing small, medium and large (<1 mm, 1-5 mm, and >5 mm) degrees of kidney motion. The images were selected from MR examinations of four subjects with renal insufficiency. An expert observer manually traced the kidney contours to provide a registration ground truth against which to evaluate the automated registration. Coordinates were defined as x = head to feet, y = left to right, z = anterior to posterior. Conventional rotation angle parameters were expressed in degrees (θ, φ, ψ). The proposed method was then evaluated on 4D MR renography data sets from four patients, with 41 time phases per kidney, with manual segmentation and registration performed by two M.D. experts in body radiology. Manual segmentations of the cortex, medulla and collecting system were used as the reference standard to assess segmentation overlap and volume. All translation results are expressed in voxels; an absolute voxel size of (1.66, 1.66, 2.5) mm was used throughout the study.
3 Experiments and Results
Fig. 2 shows the coregistration errors derived from the 16 pairs of images. The mean rough translation errors were [0.5344, 0.6390, 0.1508] voxels; the mean refined translation errors were [0.2416, 0.5540, 0.1292] voxels. Rotation errors came mainly from the third parameter ψ, which represents rotation in the sagittal plane; averaged rotation errors were [0.0000, 0.0003, -0.6630] degrees. The results also show that the refinement step improved the accuracy of the final alignment.
[Fig. 2 box plots: panels "Translation Errors", "Refined Translation Errors" (y-axes in voxels; X, Y, Z) and "Rotation Errors" (y-axis in degrees; Theta, Phi, Psi), each axis ranging from -3 to 3]
Fig. 2. Distribution of errors in 3D translation and rotation
On the four patient data sets, the segmented volumes of the intrarenal regions were calculated and compared with the experts' results, shown in Fig. 3. Since the computer-based method excluded the dark boundaries around the kidney from the cortex, whereas the experts tended to include them, our method yielded consistent under-segmentation of the cortex relative to manual segmentation; for the medulla and collecting system, our method was comparable to manual segmentation. Computer-aided and expert segmentations are illustrated at different time frames in Fig. 4. A 3D visualization of the registration results based on manually
segmented ROIs and computer-calculated results is also shown in Fig. 5, illustrating the misalignment before registration and the alignment after registration using either manual segmentation or the computer-based method.
[Fig. 3 plot: "Volume Stack Plot"; y-axis Volume (cc), 0-250; series Cortex, Medulla, CollSys; x-axis cases 1a-4c]
Fig. 3. Volumes of the cortex, medulla and collecting system for the four patient data sets 1 to 4. (a) and (b) are volumes from our two experts, and (c) are volumes from our automated method
Fig. 4. Segmentation results for three different time points (1-3) on one patient. (a) and (b) are segmented ROIs from the two experts; (c) is the final segmentation from the proposed algorithm, which shows strong agreement with the manual results. Green is cortex; red is medulla; blue is collecting system. Color labels are transparently displayed over the original gray images.
Fig. 5. Illustration of typical registration results for two different time points: (a) original position (b) registered image using manual binary ROIs, and (c) registered images using our algorithm on gray-level images
4 Conclusions
In 4D (3D plus time) MR renography, manual alignment and delineation of each 4D dataset usually requires approximately 3-4 hours of radiologist time at a workstation per case. This remains prohibitively costly and labor-intensive for practical clinical use. In this paper, we proposed a novel four-dimensional MR renography registration-segmentation framework. The strength of the algorithm lies in the integration of image segmentation and registration, which improved the overall performance of the analysis. The proposed method was quantitatively evaluated on several real patient data sets, yielding accurate and robust results: the refined average translation errors were below half a voxel in nearly all components, and the averaged rotation error was within one degree. The proposed image registration and segmentation algorithm therefore appears suitable for automated analysis of clinical MR renography data.
References
1. V.S. Lee, H. Rusinek, M.E. Noz, et al., "Dynamic Three-dimensional MR Renography for the Measurement of Single Kidney Function: Initial Experience," Radiology, vol. 227, pp. 289-294, 2003.
2. S.E. Yuksel, A. El-Baz, A.A. Farag, et al., "Automatic detection of renal rejection after kidney transplantation," International Congress Series, vol. 1281, pp. 773-778, 2005.
3. J.D. Grattan-Smith, M.R. Perez-Bayfield, R.A. Jones, et al., "MR imaging of kidneys: functional evaluation using F-15 perfusion imaging," Pediatric Radiology, vol. 33, pp. 293-304, 2003.
4. Y. Sun, J.M.F. Moura, D. Yang, et al., "Kidney segmentation in MRI sequences using temporal dynamics," IEEE International Symposium on Biomedical Imaging, 2002.
5. Y. Boykov, V.S. Lee, H. Rusinek, et al., "Segmentation of dynamic N-D data sets via graph cuts using markov models," Proceedings of the 4th International Conference on Medical Image Computing and Computer-Assisted Intervention, 2001.
6. J.A. de Priester, A.G. Kessels, E.L. Giele, et al., "MR renography by semiautomated image analysis: performance in renal transplant recipients," J Magn Reson Imag, vol. 14, pp. 134-140, 2001.
7. Y. Sun, M.-P. Jolly, and J.M.F. Moura, "Integrated Registration of Dynamic Renal Perfusion MR Images," IEEE International Conference on Image Processing (ICIP'04), Singapore, 2004.
8. Y. Sun, J.M.F. Moura, and C. Ho, "Subpixel Registration in Renal Perfusion MR Image Sequence," IEEE International Symposium on Biomedical Imaging (ISBI'04), Crystal City, VA, 2004.
9. G. Gerig, R. Kikinis, W. Kuoni, et al., "Semiautomated ROI Analysis in Dynamic MRI Studies: Part I: Image Analysis Tools for Automatic Correction of Organ Displacement," IEEE Transactions on Image Processing, vol. 11, pp. 221-232, 1992.
10. P.J. Yim, H.B. Marcos, M. McAuliffe, et al., "Registration of Time-Series Contrast Enhanced Magnetic Resonance Images for Renography," 14th IEEE Symposium on Computer-Based Medical Systems, 2001.
11. E.L.W. Giele, J.A. de Priester, J.A. Blom, et al., "Movement Correction of the Kidney in Dynamic MRI Scans Using FFT Phase Difference Movement Detection," Journal of Magnetic Resonance Imaging, vol. 14, pp. 741-749, 2001.
12. D.F. Shanno, "Conditioning of Quasi-Newton Methods for Function Minimization," Mathematics of Computation, vol. 24, pp. 647-656, 1970.
13. H.S. Stone, M.T. Orchard, E.-C. Chang, et al., "A Fast Direct Fourier-Based Algorithm for Subpixel Registration of Images," IEEE Transactions on Geoscience and Remote Sensing, vol. 39, pp. 2235-2243, 2001.
14. T. Song, V.S. Lee, H. Rusinek, et al., "Automatic 4-D Registration in Dynamic MR Renography Based on Over-complete Dyadic Wavelet and Fourier Transforms," Lecture Notes in Computer Science, vol. 3749, pp. 205-213, 2005.
15. C. Xu and J. Prince, "Snakes, Shapes, and Gradient Vector Flow," IEEE Transactions on Image Processing, vol. 7, pp. 359-369, 1998.
Fast and Robust Clinical Triple-Region Image Segmentation Using One Level Set Function
Shuo Li1,2, Thomas Fevens2, Adam Krzyżak2, Chao Jin2, and Song Li3
1 GE Healthcare, London, Ontario, Canada [email protected]
2 Department of Computer Science and Software Engineering, Concordia University, Montréal, Québec, Canada {fevens, krzyzak, chao jin}@cs.concordia.ca
3 School of Stomatology, Anhui Medical University, Hefei, Anhui, P.R. China [email protected]
Abstract. This paper proposes a novel method for clinical triple-region image segmentation using a single level set function. Triple-region image segmentation finds wide application in computer-aided X-ray, CT, MRI and ultrasound image analysis and diagnosis. Usually, multiple level set functions are used consecutively or simultaneously to segment triple-region medical images; these approaches are either time consuming or suffer from convergence problems. With the proposed triple-region level set energy modelling, the triple-region segmentation is handled within the two-region level set framework, where only a single level set function is needed. Since only a single level set function is used, the segmentation is much faster and more robust than with multiple level set functions. Adapted to the clinical setting, a clinical acceleration scheme based on individual principal component analysis and a support vector machine (SVM) classifier is used to accelerate the segmentation. The clinical acceleration scheme takes the strengths of both machine learning and the level set method while limiting their weaknesses, to achieve automatic and fast clinical segmentation. Both synthesized and practical images were used to test the proposed method. The results show that the proposed method successfully segments the three regions using a single level set function and is very robust to the placement of the initial contour, while still converging quickly. With the clinical acceleration scheme, the proposed method can be used in pre-processing for automatic computer-aided diagnosis and surgery.
1 Introduction
Due to the complexity of medical image segmentation and the requirements of the clinical environment in terms of speed and accuracy, clinical image segmentation is one of the most challenging topics in computer-aided medical image analysis and diagnosis. The triple-region segmentation problem deals with the segmentation of images that naturally consist of three distinct regions of interest. Triple-region
segmentation finds wide application in medical imaging, since most X-ray, CT, MRI and ultrasound images can be modeled as a triple-region segmentation problem. This paper reports on our work towards a robust and efficient clinical triple-region image segmentation method using one of the latest image processing techniques: the level set method. Level set methods have been shown to be among the most versatile techniques for medical image segmentation due to their ability to capture the topology of shapes in medical imagery. Although some work has been reported on medical image segmentation using level sets [1,2], many challenges remain in the application of level sets to medical image segmentation, especially clinical image segmentation, even for triple-region images.
2 Challenges
2.1 Coupled Versus Hierarchical
The main problem of the level set representation lies in the fact that a level set function is restricted to the separation of two regions. As soon as more than two regions are considered, the level set idea loses much of its attractiveness. Only a few papers focus on level set based segmentation of more than two regions, and they can be divided into two categories: coupled level sets and hierarchical level sets.
Coupled Level Set. Coupled level set methods employ multiple level set functions simultaneously for multi-region segmentation. In [3], Zhao et al. assign one level set function to each region, so the number of level set functions equals the number of regions in the image. A very similar framework for classification was proposed by Samson in [4]. Both techniques, however, assume an initially fixed number of regions. This assumption is dropped in [5], where the number of regions is estimated in a preliminary stage by means of a Gaussian mixture estimate of the image histogram; the number of mixture coefficients then determines the number of regions. A considerably different approach is proposed by Vese and Chan in [6]. In their method, the level set functions are used in such a way that N regions are represented by only log2 N level set functions. Unfortunately, this results in empty regions if log2 N is not an integer. These empty regions have undefined statistics, although the statistics still appear in the evolution equations. To segment a triple-region image using coupled level sets, three coupled level set functions are usually used, although in [7] Li et al. proposed a competitive level set scheme that uses two coupled level set functions to segment three regions. In general, coupled level sets suffer from slow convergence and local minima problems.
Hierarchical Level Set. Hierarchical level set segmentation [8,9] employs a hierarchical approach to extend two-region segmentation methods to multiple regions. The image is first segmented into two regions by one level set function. Then, based on a variance analysis of each region, the program
decides whether to further segment one or both regions. The procedure is repeated recursively until the whole image is properly segmented. Compared to coupled level sets, hierarchical level sets have the advantages of easy implementation and fast segmentation. For triple-region segmentation, however, the hierarchical approach requires running two level set segmentations consecutively, so the processing time is at least doubled. As described above, the prominence of current level set based segmentation is lost as soon as more than two regions come into play. The purpose of this paper is to address this problem for a particular image segmentation problem: triple-region medical image segmentation within the level set framework, while preserving its advantages.

2.2 Clinical Segmentation Using Level Set
Moreover, as indicated in [10], although efficient, level sets are not well suited for clinical segmentation. One reason is speed: the method is usually time consuming, and for complicated medical structures even three coupled level set functions can be very expensive. Beyond that, level sets are sensitive to the placement of the initial contour, so the running time depends heavily on the position and size of the initial curves and on the complexity of the objects; coupled level set functions do not converge for some initial curves; and in some cases different initial contours give different segmentation results. A good level set based clinical image segmentation method should overcome all of these problems.
3 Proposed Framework
To overcome the above problems, we propose the novel idea of using a single level set function to segment the three regions. The proposed level set functional formulates triple-region segmentation as two dual-region segmentations. Since only one level set function is used, the method is much faster and more robust than existing level set methods. To adapt the processing to the clinical setting, we combine the strengths of a pattern classifier and the level set to achieve robust and fast clinical segmentation of triple-region medical images. A pattern classifier and individual principal component analysis (PCA) are used to accelerate the pathological level set segmentation. The evolution of the level set function achieves a fast and robust triple-region segmentation since the initial contour provided by the SVM and PCA is very close to the final contour.

3.1 Level Set Modelling
For a clear description, the three regions are named the bright region (BR), grey region (GR), and dark region (DR), as shown in Fig. 1. According to the Mumford-Shah model, the correct segmentation can be obtained by minimizing the energy function given as:
E(\Phi) = \lambda_1 \int_{\Omega_{BR}} \frac{(u-c_{BR})^2}{\sigma_{BR}^2}\,dx\,dy + \lambda_2 \int_{\Omega_{GR}} \frac{(u-c_{GR})^2}{\sigma_{GR}^2}\,dx\,dy + \lambda_3 \int_{\Omega_{DR}} \frac{(u-c_{DR})^2}{\sigma_{DR}^2}\,dx\,dy,   (1)
where c_i and \sigma_i are the mean intensity and variance of the region \Omega_i, and \lambda_i is a constant. To cast the three-region segmentation problem into a two-region segmentation framework, a new energy function is proposed in Eq. 2. With this functional, the triple-region segmentation is naturally formulated as two dual-region segmentation problems solved simultaneously: grey region versus dark region, and grey region versus bright region:

E(\Phi) = \lambda_1 \int_{\Omega^-} \mathrm{Min}\!\left(\frac{(u-c_{BR})^2}{\sigma_{BR}^2}, \frac{(u-c_{DR})^2}{\sigma_{DR}^2}\right) dx\,dy + \lambda_2 \int_{\Omega^+} \frac{(u-c_{GR})^2}{\sigma_{GR}^2}\,dx\,dy,   (2)

where the function Min(x, y) returns the smaller of x and y.
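As an illustration of this formulation (our sketch, not the authors' code), the energy of Eq. 2 can be evaluated directly for a given level set function Φ; the region statistics c_i and σ_i are assumed to be estimated separately.

```python
import numpy as np

def triple_region_energy(u, phi, c_br, s_br, c_gr, s_gr, c_dr, s_dr,
                         lambda1=1.0, lambda2=1.0):
    """Evaluate the energy of Eq. 2 for image u and level set phi.

    Omega+ (phi > 0) is the grey region; Omega- (phi <= 0) holds both
    the bright and dark regions, which compete through the Min term.
    """
    inside = phi > 0           # Omega+
    outside = ~inside          # Omega-
    term_br = (u - c_br) ** 2 / s_br ** 2
    term_dr = (u - c_dr) ** 2 / s_dr ** 2
    term_gr = (u - c_gr) ** 2 / s_gr ** 2
    # Min(x, y): each Omega- pixel is charged against the closer of the
    # bright and dark region models.
    e_out = np.minimum(term_br, term_dr)[outside].sum()
    e_in = term_gr[inside].sum()
    return lambda1 * e_out + lambda2 * e_in
```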
Fig. 1. Triple-region modelling
Fig. 2. Feature extraction diagram
To achieve fast and robust segmentation, a hybrid level set functional that combines the minimal variance term (Eq. 2), the optimal edge integrator [10], and the geodesic active contour model [10] is used:

E = E(\Phi) - \gamma_1 E_{LAP} + \gamma_2 E_{GAC},   (3)
where the \gamma_i are constants, and the geodesic active contour functional (E_{GAC}) and edge functional (E_{LAP}) are defined as

E_{GAC}(C) = \int g(C)\,dx\,dy, \qquad E_{LAP}(C) = \oint_C \langle \nabla u, \mathbf{n} \rangle\,ds + \int_{\Omega_C} K_u |\nabla u|\,dx\,dy.   (4)

Here K_u is the mean curvature of the level set function, \mathbf{n} is the unit vector normal to the curve, and ds is the arc length of curve C.
Function g(x, y) is an inverse edge indicator function, defined as g(x, y) = \alpha^2/(\alpha^2 + |\nabla u|^2), where \alpha is a constant and \nabla is the gradient operator. The evolution of the level set function \Phi is derived from the functional in Eq. 3 as:

\frac{\partial \Phi}{\partial t} = \delta_\varepsilon(\Phi)\left[\gamma_2\,\mathrm{div}\!\left(g\,\frac{\nabla\Phi}{|\nabla\Phi|}\right) - \lambda_2\,\frac{(u-c_{GR})^2}{\sigma_{GR}^2}\,H(\Phi) - \gamma_1 u_{\xi\xi} + \lambda_1\,\mathrm{Min}\!\left(\frac{(u-c_{BR})^2}{\sigma_{BR}^2}, \frac{(u-c_{DR})^2}{\sigma_{DR}^2}\right)(1-H(\Phi))\right],   (5)
where H(\cdot) is the Heaviside function, \mathrm{div}(\cdot) is the divergence operator, and u_{\xi\xi} = \Delta u - K_u |\nabla u|.
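For concreteness, the following sketch performs one explicit gradient-descent step of Eq. 5 on a 2D image. It is our illustration under simplifying assumptions: a smoothed Heaviside/delta pair is used, u_{\xi\xi} is approximated by the image Laplacian alone, and all parameter values are illustrative rather than the authors'.

```python
import numpy as np

def evolve_step(u, phi, g, stats, lam1=1.0, lam2=1.0,
                gamma1=0.1, gamma2=1.0, eps=1.5, dt=0.1):
    """One explicit update of Eq. 5. `u` is the image, `phi` the level
    set, `g` the inverse edge indicator, and `stats` holds
    (c_br, s_br, c_gr, s_gr, c_dr, s_dr). A sketch, not the authors'
    implementation."""
    c_br, s_br, c_gr, s_gr, c_dr, s_dr = stats
    H = 0.5 * (1 + (2 / np.pi) * np.arctan(phi / eps))      # smoothed Heaviside
    delta = (eps / np.pi) / (eps ** 2 + phi ** 2)           # smoothed delta

    # div(g * grad(phi)/|grad(phi)|) via central differences
    gy, gx = np.gradient(phi)
    mag = np.sqrt(gx ** 2 + gy ** 2) + 1e-8
    div = np.gradient(g * gx / mag, axis=1) + np.gradient(g * gy / mag, axis=0)

    # u_xi_xi = Laplacian(u) - K_u |grad u|; we keep only the Laplacian
    # here, a simplification of the edge term of [10].
    lap_u = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
             np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4 * u)

    region_in = (u - c_gr) ** 2 / s_gr ** 2
    region_out = np.minimum((u - c_br) ** 2 / s_br ** 2,
                            (u - c_dr) ** 2 / s_dr ** 2)
    dphi = delta * (gamma2 * div - lam2 * region_in * H - gamma1 * lap_u
                    + lam1 * region_out * (1 - H))
    return phi + dt * dphi
```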
3.2 Clinical Acceleration
Designed for the dental clinical setting, the segmentation uses a clinical acceleration scheme that divides the segmentation into two stages: a training stage and a clinical segmentation stage. During the training stage, manually chosen representative images consisting of three regions are first segmented by the proposed level set segmentation method. The results are then used to train a support vector machine (SVM) after PCA. During the clinical segmentation stage, images are first classified by the trained SVM after PCA, which provides initial contours; the evolution of the level set function gives the final segmentation result. Instead of global PCA, we use the individual PCA approach described in [11], since it reduces the dimensionality further than global PCA. The feature extraction process is illustrated in Fig. 2.
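A minimal sketch of the two-stage acceleration scheme is given below. It is an assumption-laden illustration: scikit-learn's standard PCA and SVC stand in for the individual PCA of [11] and the authors' SVM, and the patch extraction and label convention (grey region = label 1) are ours.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

def train_accelerator(X_train, y_train, dim_frac=0.1):
    """Training stage: patches from level-set-segmented reference images,
    labelled by region, are projected by PCA and used to train an SVM.
    X_train: (n_patches, n_pixels); y_train: region labels."""
    pca = PCA(n_components=int(dim_frac * X_train.shape[1]))
    svm = SVC(kernel='rbf')
    svm.fit(pca.fit_transform(X_train), y_train)
    return pca, svm

def initial_contour(patches, positions, shape, pca, svm):
    """Clinical stage: classify patches of a new image to get a coarse
    label map whose grey/non-grey boundary seeds the level set."""
    labels = svm.predict(pca.transform(patches))
    label_map = np.zeros(shape, dtype=int)
    for (r, c), lab in zip(positions, labels):
        label_map[r, c] = lab
    # phi > 0 inside the predicted grey region, < 0 elsewhere
    return np.where(label_map == 1, 1.0, -1.0)
```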
4 Experimental Results
Both a synthesized image and real medical images (CT and X-ray) are used to test the proposed method. The results are shown and analyzed below.

4.1 Segmentation
First, the proposed level set segmentation method is tested on a synthesized triple-region image, as shown in Figs. 3 and 4. The original image consists of three Gaussian-distributed regions. As shown in Fig. 3, the proposed level set method successfully separates the grey region from the dark and bright regions with one level set function only. Compared to using three level set functions,
Fig. 3. Segmentation results on synthesized triple-region image. (a) Iteration 0. (b) Iteration 10. (c) Iteration 20. (d) Iteration 40. (e) Iteration 60. (f) Iteration 80.
Fig. 4. Segmentation results on synthesized triple-region image with different initial contours. (a) Iteration 0. (b) Iteration 60. (c) Iteration 100. (d) Iteration 0. (e) Iteration 60. (f) Iteration 100.
one level set function is much faster and more robust. Moreover, the segmentation method is robust to the position of the initial contour: regardless of where the initial contour is placed, the level set is still able to converge to the correct final segmentation, as shown in Figs. 3 and 4.
Fig. 5. Segmentation results on CT image with contour of middle region. (a) Iteration 0. (b) Iteration 10. (c) Iteration 20. (d) Iteration 40. (e) Iteration 60. (f) Iteration 80.
Fig. 6. Segmentation results on CT image. (a) Iteration 0. (b) Iteration 10. (c) Iteration 20. (d) Iteration 40. (e) Iteration 60. (f) Iteration 80.
Figs. 5 to 7 show segmentation results on a CT head scan. Fig. 5 shows the evolution of the zero contour, and Fig. 6 shows the region segmentation corresponding to the zero-curve evolution. Fig. 7 shows the contour and region evolution on another slice of the scan. Figs. 8(a-c) show the segmentation results on an astrophysics
Fig. 7. (a) Iteration 0. (b) Iteration 30. (c) Iteration 50. (d) Iteration 0. (e) Iteration 30. (f) Iteration 50.
Fig. 8. (a) Iteration 0. (b) Iteration 50. (c) Iteration 90. (d) Iteration 0. (e) Iteration 50. (f) Iteration 90.
X-ray. As can be seen, even though the image is very noisy and the boundaries are very unclear, the segmentation can still be achieved. Figs. 8(d-f) show the segmentation results on a bone X-ray.

4.2 Clinical Acceleration
Fig. 9 shows the clinical acceleration results. Since the PCA and SVM provide initial boundaries close to the final boundaries, only a few iterations are needed to reach the final segmentation, as shown by comparing Figs. 3, 4, 6 and 9. Because individual PCA reduces the dimensionality to 10% of the original, SVM training and classification are greatly sped up. This makes the proposed segmentation method readily usable in real-time applications.
Fig. 9. Segmentation results using clinical acceleration. (a) Initial contour provided by SVM. (b) Iteration 5. (c) Iteration 15. (d) Initial contour provided by SVM. (e) Iteration 5. (f) Iteration 15.
5 Conclusion
To facilitate the application of the level set method in clinical triple-region image segmentation, a segmentation method based on a single level set function is proposed. The method builds on the boundary characteristics of the triple regions: with the newly proposed energy model, the triple-region segmentation is cast into a dual-region framework and naturally formulated as two dual-region segmentation problems. The clinical acceleration scheme draws on the strengths of both machine learning and the variational level set method, while limiting their weaknesses, to achieve automatic and fast clinical segmentation. Both a synthesized image and real images were used to test the proposed method. Experimental results show that the proposed method provides fast and robust clinical triple-region image segmentation using only a single level set function. The proposed method can be used as a pre-processing step for automatic computer-aided diagnosis, and it can easily be extended to multi-region segmentation using a hierarchical scheme.
References

1. J. An, Y. Chen, F. Huang, D. C. Wilson, and E. A. Geiser, "A variational PDE based level set method for a simultaneous segmentation and non-rigid registration," in MICCAI, pp. 286-293, 2005.
2. P. Yan and A. A. Kassim, "MRA image segmentation with capillary active contour," in MICCAI, pp. 51-58, 2005.
3. H.-K. Zhao, T. F. Chan, B. Merriman, and S. Osher, "A variational level set approach to multiphase motion," Journal of Computational Physics, vol. 127, no. 1, pp. 179-195, 1996.
4. C. Samson, L. Blanc-Féraud, G. Aubert, and J. Zerubia, "A level set model for image classification," International Journal of Computer Vision, vol. 40, no. 3, pp. 187-197, 2000.
5. N. Paragios and R. Deriche, "Coupled geodesic active regions for image segmentation: A level set approach," in ECCV '00: Proceedings of the 6th European Conference on Computer Vision, Part II, London, UK, pp. 224-240, Springer-Verlag, 2000.
6. L. Vese and T. Chan, "A multiphase level set framework for image segmentation using the Mumford and Shah model," International Journal of Computer Vision, vol. 50, no. 3, pp. 271-293, 2002.
7. S. Li, T. Fevens, A. Krzyżak, C. Jin, and S. Li, "Semi-automatic decay detection on dental X-rays using variational level set," in MICCAI, vol. 3749 of Lecture Notes in Computer Science, Palm Springs, California, USA, pp. 670-678, Springer, 2005.
8. M. Jeon, M. Alexander, W. Pedrycz, and N. Pizzi, "Unsupervised hierarchical image segmentation with level set and additive operator splitting," Pattern Recognition Letters, vol. 26, pp. 1461-1469, July 2005.
9. A. Tsai, A. Yezzi, and A. Willsky, "Curve evolution implementation of the Mumford-Shah functional for image segmentation, denoising, interpolation, and magnification," IEEE Trans. on Image Processing, vol. 10, pp. 1169-1186, 2001.
10. S. Li, T. Fevens, and A. Krzyżak, "Image segmentation adapted for clinical settings by combining pattern classification and level sets," in MICCAI, vol. 3216 of Lecture Notes in Computer Science, St-Malo, France, pp. 160-167, Springer, 2004.
11. X. Liu, T. Chen, and B. V. K. V. Kumar, "Face authentication for multiple subjects using eigenflow," Pattern Recognition, vol. 36, no. 2, pp. 313-328, 2003.
Fast and Robust Semi-automatic Liver Segmentation with Haptic Interaction

Erik Vidholm¹, Sven Nilsson², and Ingela Nyström¹

¹ Centre for Image Analysis, Uppsala University, Sweden
[email protected]
² Dept. of Radiology and Clinical Immunology, Uppsala University Hospital, Sweden
Abstract. We present a method for semi-automatic segmentation of the liver from CT scans. True 3D interaction with haptic feedback is used to facilitate initialization, i.e., seeding of a fast marching algorithm. Four users initialized 52 datasets and the mean interaction time was 40 seconds. The segmentation accuracy was verified by a radiologist. Volume measurements and segmentation precision show that the method has a high reproducibility.
1 Introduction
One of the most important steps in medical image analysis is segmentation, i.e., the process of classifying data elements as object or background. Segmentation is needed in diagnostics, therapy monitoring, surgery planning, and several other medical applications. Manually segmenting the structures of interest in medical datasets is a very tedious and error-prone procedure, while fully automatic segmentation is, despite decades of research, still an unsolved problem. Therefore, many methods are semi-automatic, i.e., the segmentation algorithm is provided with high-level knowledge from the user [1]. A successful semi-automatic method takes advantage of the user's ability to recognize objects and the computer's ability to delineate them. A common recognition task in semi-automatic segmentation is initialization by placement of seed points inside the object of interest. The interactive part is highly dependent on the user interface. Interfaces that rely on two-dimensional (2D) interaction have many drawbacks when the data is three-dimensional (3D), since it is not straightforward to map 2D interaction into 3D space. It has been shown that with true 3D interaction and haptic feedback, more efficient semi-automatic methods can be obtained [2]. Liver segmentation is important in hepatic surgery planning, where it is a first step in the process of finding vessels and tumours and classifying liver segments [3,4]. Liver segmentation may also be useful for monitoring patients with liver metastases, where disease progress is correlated with enlargement of the liver [5, p. 580]. Low contrast between organs and the high shape variability of the liver make automatic segmentation a hard task. In [4], a reference 3D liver model is deformed to fit the liver contour in the image. The method performs well, except for atypical livers where manual interaction is needed. Common to many semi-automatic methods is their slice-based 2½D nature [6].
In this work, we demonstrate how a haptic user interface can be used to facilitate initialization of a fast marching algorithm [7] in 3D in order to segment the liver from CT images.
2 Fast Marching Segmentation
Fast marching methods [7] are numerical schemes for solving the Eikonal equation:

|\nabla u| = C,   (1)

where u is the time of arrival and C is a cost ("slowness") function. The common use of fast marching in image segmentation is to design a proper cost image C, provide a set of seed points P with arrival time equal to zero, and then propagate a front from these points until a certain arrival time is reached, i.e., solving Equation (1). The classical gradient approximations, e.g., centered differences, are not well suited for discretization of Equation (1). In [7], it is shown how the use of upwind schemes makes the solution stable. We use the Godunov scheme

|\nabla u| \approx \left[\max(D_{ijk}^{-x}u, -D_{ijk}^{+x}u, 0)^2 + \max(D_{ijk}^{-y}u, -D_{ijk}^{+y}u, 0)^2 + \max(D_{ijk}^{-z}u, -D_{ijk}^{+z}u, 0)^2\right]^{1/2} = C_{ijk},   (2)

where i, j, k denote the indices in the discretization and D_{ijk}^{\pm x} are standard forward and backward difference operators. To solve Equation (2) for u, we use the algorithm in [8]. The central idea in fast marching is the observation that information propagates from smaller values of u to larger values. Using this property, the algorithm builds the solution outwards from the boundary condition. The boundary condition is expressed at the set of seed points: u(p) = 0, p \in P. The algorithm is accelerated by limiting the computational domain to a narrow band in the proximity of the front. This narrow band is stored in a fast priority queue based on a minimum heap data structure.
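The single-voxel update that solves Equation (2) for u can be written compactly. The sketch below follows the standard sorted-quadratic solver (in the spirit of [8]); unit grid spacing and interior voxels are assumed, and unknown neighbour values are represented by np.inf.

```python
import numpy as np

def solve_upwind(u, cost, i, j, k):
    """Solve the Godunov scheme (Eq. 2) for the trial value at voxel
    (i, j, k), given frozen neighbour arrival times in u. Neighbours
    with arrival times above the trial value drop out of the upwind
    max() terms, which yields the usual sorted quadratic."""
    a = sorted([min(u[i - 1, j, k], u[i + 1, j, k]),
                min(u[i, j - 1, k], u[i, j + 1, k]),
                min(u[i, j, k - 1], u[i, j, k + 1])])
    c = cost[i, j, k]
    t = a[0] + c                          # one active neighbour
    if t > a[1]:                          # second neighbour becomes active
        disc = 2.0 * c ** 2 - (a[0] - a[1]) ** 2
        if disc > 0:
            t = 0.5 * (a[0] + a[1] + np.sqrt(disc))
        if t > a[2]:                      # third neighbour becomes active
            s = a[0] + a[1] + a[2]
            q = a[0] ** 2 + a[1] ** 2 + a[2] ** 2
            disc = s ** 2 - 3.0 * (q - c ** 2)
            if disc > 0:
                t = (s + np.sqrt(disc)) / 3.0
    return t
```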
The cost image C should be designed to achieve low costs in homogeneous parts and high costs at edges, and is typically based on the image gradient magnitude. We compute our cost image in four steps. First, we suppress noise in the original image I(x) by using bilateral filtering [9]. This is an edge-preserving smoothing filter that combines domain and range filtering, i.e., it takes into account both spatial closeness and intensity similarity of voxels. We use a filter kernel composed of two Gaussians, one for the domain (\sigma_d) and one for the range (\sigma_r). The filtered image is computed as

I_{BF}(x) = h^{-1}(x) \int_\Omega I(\xi)\, e^{-\frac{1}{2}\|\xi-x\|^2/\sigma_d^2}\, e^{-\frac{1}{2}(I(\xi)-I(x))^2/\sigma_r^2}\, d\xi,

where h^{-1}(x) normalizes the kernel. As the next step, we apply a voxel-wise Gaussian filter with mean intensity \mu_G and standard deviation \sigma_G:

I_G(x) = 1 - e^{-\frac{1}{2}(I_{BF}(x)-\mu_G)^2/\sigma_G^2}.
Fig. 1. The different steps in generating the cost image. Here, a 2D slice is shown, whereas all computations are in 3D. From upper left: original CT image, result of bilateral filtering, result of voxel-wise Gaussian filter, gradient magnitude, the resulting cost image (α = 0.3), and the intensity profile along the red line in the cost image.
This gives low values for voxels with intensity close to \mu_G. By weighting I_G with its gradient magnitude, we obtain our final cost function:

C(x) = \alpha I_G(x) + (1 - \alpha)|\nabla I_G(x)|, \qquad \alpha \in [0, 1].
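The last three steps of the cost construction are easily expressed in code. In the sketch below (our illustration), bilateral filtering (step 1) is assumed to have been applied already, e.g., by a separate routine, and isotropic unit voxel spacing is the default.

```python
import numpy as np

def cost_image(i_bf, mu_g, sigma_g, alpha=0.3, spacing=(1.0, 1.0, 1.0)):
    """Steps 2-4 of the cost construction: voxel-wise Gaussian mapping of
    the bilateral-filtered volume i_bf, then gradient-magnitude
    weighting with factor alpha."""
    i_g = 1.0 - np.exp(-0.5 * (i_bf - mu_g) ** 2 / sigma_g ** 2)
    gz, gy, gx = np.gradient(i_g, *spacing)
    grad_mag = np.sqrt(gx ** 2 + gy ** 2 + gz ** 2)
    return alpha * i_g + (1.0 - alpha) * grad_mag
```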
Figure 1 illustrates the steps in the cost image generation and a typical intensity profile of the liver border in the cost image. We have a low cost inside the object, high cost at the edges, and a middle-valued cost outside the object. By this design of C, we aim to prevent the common problem of leakage. Once we have our cost image, we can specify seed points and run the fast marching algorithm. The problem that remains is to find at which arrival time we put a threshold to obtain our final segmentation. We use a technique similar to the energy measure suggested in [10]. The average cost value of the voxels belonging to the front is used to determine a suitable arrival time threshold. The average cost at time u is

\bar{C}(u) = \frac{1}{N_{Front}} \sum_{x \in Front} C(x),

where N_{Front} is the current number of voxels in the narrow band. In the beginning of propagation, the average cost is typically close to zero (assuming that the initialization is inside the object). During propagation, the average cost will increase as more and more voxels approach the object boundary. Eventually,
Fig. 2. The average cost of the front voxels at each time instance. The time of arrival at the maximum peak is used as a threshold to obtain the segmentation.
when the front penetrates the object boundary, the average cost will decrease again. If we plot the average cost as a function of time, we typically obtain the graph in Figure 2. Note the similarity between the shape of this graph and the cost image edge profile in Figure 1. We locate the maximum peak in the graph and use the corresponding time of arrival as the threshold.
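A post-hoc way to recover this threshold from a completed fast marching run is sketched below: voxels are binned by arrival time, each bin is treated as a snapshot of the front, and the segmentation is thresholded at the arrival time where the mean front cost peaks. The binning is our simplification of the on-line bookkeeping described above.

```python
import numpy as np

def pick_arrival_threshold(arrival, cost, n_bins=200):
    """Approximate the threshold of Fig. 2: bin voxels by arrival time
    and take the arrival time at which the mean binned cost peaks.
    Unreached voxels are assumed to hold np.inf in `arrival`."""
    finite = np.isfinite(arrival)
    t, c = arrival[finite], cost[finite]
    edges = np.linspace(t.min(), t.max(), n_bins + 1)
    which = np.clip(np.digitize(t, edges) - 1, 0, n_bins - 1)
    mean_cost = np.array([c[which == b].mean() if np.any(which == b) else 0.0
                          for b in range(n_bins)])
    t_star = edges[np.argmax(mean_cost)]
    return arrival <= t_star            # binary segmentation
```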
3 Initialization with Haptic Interaction
The initialization is important for the fast marching algorithm. Most important is that the seeds are placed inside the object. It is also desirable to place them centrally in order to reduce the risk of leakage. To represent the shape extent, the seeds should also be placed in protrusions, see Figure 3 (left). To facilitate the initialization, we use a 3D interface with haptic feedback to draw regions of seed points. Our interface consists of a Reachin display [11] that combines a PHANToM desktop haptic device with supporting systems, see Figure 3 (right). The application is developed in the Reachin API 3.2, a scene-graph API for haptic and graphic visualization written in C++. The basic view in the program is an orthogonal tri-planar construction of the dataset. The haptic interaction consists of a spring force that guides the user to hold the cursor in the closest plane. The interface consists of only the most basic functions needed for this specific application:

Draw. Draw spherical seed regions of radius 5 voxels at the cursor position.
Clear. Clear the current seed region and start over.
Browse. Translate the three planes according to the cursor position.
Contrast/Brightness. Adjust the graylevel display.
Save. Save the current seed points and exit.

The idea is to use the haptic guidance to put the cursor in one of the planes and then translate this plane through the volume while drawing true 3D seed regions inside the liver. The co-located haptics and graphics guide the user in positioning numerous representative seed points in a very fast manner.
Fig. 3. Left: A screenshot from the application where seed regions are being drawn centrally and representative for the shape extent. Right: A user working with the haptic display. The PHANToM device is positioned beneath a semi-transparent mirror in order to obtain co-location of graphics and haptics.
4 Experiment and Results
We used abdominal contrast-enhanced venous-phase CT images from 26 patients with either carcinoid or endocrine pancreas tumour¹. Each patient underwent two CT examinations with a time interval of 3-17 months. The 52 images were acquired with a Siemens Sensation 16 CT scanner. The image dimensions are 512 × 512 × N_z, where N_z ranges from 116 to 194. The resolution in xy is between 0.54-0.77 mm and the slice thickness is between 2.5-3.0 mm. Four users performed seeding of the 52 datasets using the haptic interface. One user was an experienced radiologist and the other three were medical image analysts. The users performed the task independently of each other. Before the real task started, there was a training session on a randomly selected dataset. The ordering of the anonymized datasets was random and the same for all users. All users completed the 52 seeding tasks in less than one hour. The interaction times and the number of seeds are summarized in Table 1. Given the regions of seed points, we computed the mean and variance of the seed voxel values, μ_P and σ_P. The cost images were then generated with the parameters σ_d = 1.0 mm, σ_r = σ_G = 2σ_P, μ_G = μ_P, and α = 0.3. On a standard PC (2.4 GHz, 1 GB RAM), the computation of the cost images took about one minute per image, and the fast marching segmentation about 20 seconds per image and user. The segmentation results were visually inspected by the radiologist and the majority were considered successful. In a few cases there were problems with leakage due to low contrast, especially at the heart and the stomach. We found that these problems were easily repaired with manual interaction when necessary. Figure 4 shows slices with seed regions and the resulting liver contour, and surface renderings of three segmentation results. Note that the leakage problem is emphasized. To obtain a measure of the between-user reproducibility of

¹ Dr Hans Frimmel is acknowledged for providing the datasets.
Table 1. Interaction times and number of seeds for all users

(a) Interaction times (s)
User    Min   Max   Mean  Stdd.
U1       22   173    42     24
U2       20   146    41     26
U3       24   110    53     19
U4       13    51    26      9
Total    13   173    40     23

(b) Number of seed voxels
User     Min    Max   Mean  Stdd.
U1      2346   8425   4453   1647
U2       815   2752   1550    427
U3       635   1833   1001    250
U4      1434   5975   4047   1049
Total    635   8425   2763   1811
Fig. 4. Top row: slices with the contour of the segmentation result (blue) and the seed regions (red). Bottom row: surface renderings of the segmentation results in the top row. To the bottom right the problem of leakage at the heart is encircled.
the method, we computed the correlation coefficients for the liver volume estimates between the users, see Table 2. The mean liver volume for our datasets, taken as the average of the four segmentations, was 1731 cm³ with standard deviation 669 cm³. Comparing volume values alone is not sufficient to determine the precision of the method. As suggested in [12], a measure of the inter-operator precision η_ij for two segmentation results S_i and S_j is the fraction of intersection and union:

\eta_{ij} = \frac{|S_i \cap S_j|}{|S_i \cup S_j|}.

We computed η_ij and the corresponding coefficient of variation (CV) for all possible pairs in our experiment, see Table 3. We also computed the precision for all four users, |\bigcap_{k=1}^{4} S_k| / |\bigcup_{k=1}^{4} S_k|, and obtained the mean value 0.944 with corresponding CV 5.48%.
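Both precision measures reduce to a few lines on boolean label volumes, as in the following sketch (ours, for illustration):

```python
import numpy as np

def precision(seg_i, seg_j):
    """Inter-operator precision of [12]: |intersection| / |union|."""
    inter = np.logical_and(seg_i, seg_j).sum()
    union = np.logical_or(seg_i, seg_j).sum()
    return inter / union

def all_user_precision(segs):
    """Precision over all users: voxels every user labelled liver,
    divided by voxels any user labelled liver."""
    stack = np.stack(segs)
    return (np.logical_and.reduce(stack).sum() /
            np.logical_or.reduce(stack).sum())
```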
Table 2. Correlation coefficients for estimated liver volumes between the four users and a 95% confidence interval
      U1   U2                   U3                   U4
U1    ♦    0.998 (0.996-0.999)  0.979 (0.963-0.988)  0.987 (0.977-0.992)
U2    ♦    ♦                    0.983 (0.970-0.990)  0.987 (0.978-0.993)
U3    ♦    ♦                    ♦                    0.994 (0.989-0.996)
U4    ♦    ♦                    ♦                    ♦
Table 3. Inter-user precision and coefficient of variation (CV) in % for the four users. The precision is averaged between the users for all possible pairs of segmentations and the CV is the corresponding ratio of standard deviation and mean.
      U1   U2              U3              U4
U1    ♦    0.972 (2.01%)   0.960 (5.36%)   0.969 (3.67%)
U2    ♦    ♦               0.967 (4.91%)   0.973 (3.59%)
U3    ♦    ♦               ♦               0.973 (3.80%)
U4    ♦    ♦               ♦               ♦
Fig. 5. Left: A deformable surface model fitted to one of our segmentation results. Right: A slice of a spleen segmented with our method showing contour and seed region.
5 Conclusions and Future Work
We have demonstrated the use of a haptic 3D interface for fast interactive segmentation using the fast marching algorithm. The mean interaction time required for the four users was 40 seconds, which enables practical use of the method. Comparisons of interaction times with and without haptic feedback will be undertaken in order to quantify the relevance of haptics. Inter-user precision measures show that the method is robust despite the high shape variability of the liver and the users' very different approaches to placing seed regions. According to the radiologist, the segmentation results are sufficiently accurate for liver volume quantification. In future work, the segmentation results will be verified by comparison with manually segmented ground-truth data. We believe that the presented method can be used as a first step on the way to accurate delineation of the liver with methods that involve shape constraints, e.g., deformable surfaces or level-set methods. In Figure 5 (left), we show a
surface model that has been deformed under a distance field generated from one of our segmentations. We have also tried the method on other organs with promising results. Figure 5 (right) shows a segmentation of the spleen.
References

1. Olabarriaga, S.D., Smeulders, A.W.M.: Interaction in the segmentation of medical images: A survey. Medical Image Analysis 5(2) (2001) 127-142
2. Harders, M., Székely, G.: Enhancing human-computer interaction in medical segmentation. Proceedings of the IEEE 91(9) (2003) 1430-1442
3. Meinzer, H.P., Thorn, M., Cárdenas, C.E.: Computerized planning of liver surgery - an overview. Computers and Graphics 26 (2002) 569-576
4. Soler, L., Delingette, H., Malandain, G., et al.: Fully automatic anatomical, pathological, and functional segmentation from CT scans for hepatic surgery. Computer Aided Surgery 6(3) (2001) 131-142
5. Fauci, A.S., Braunwald, E., Isselbacher, K.J., Hauser, D.L., Kasper, D.L., Wilson, J.D., Martin, J.B., Longo, D.L., eds.: Harrison's Principles of Internal Medicine. 12th edn. McGraw-Hill, New York (1998)
6. Schenk, A., Prause, G., Peitgen, H.O.: Efficient semi-automatic segmentation of 3D objects in medical images. In: Proc. MICCAI'00. LNCS 1935 (2000) 186-195
7. Sethian, J.A.: Level set methods and fast marching methods. Cambridge University Press (1999)
8. Kimmel, R., Sethian, J.A.: Optimal algorithm for shape from shading and path planning. Journal of Mathematical Imaging and Vision 14(3) (2001) 237-244
9. Tomasi, C., Manduchi, R.: Bilateral filtering for gray and color images. In: Proc. 6th Int. Conf. on Computer Vision (1998) 839-846
10. Yan, J., Zhuang, T.: Applying improved fast marching method to endocardial boundary detection in echocardiographic images. Pattern Recognition Letters 24(15) (2003) 2777-2784
11. Thurfjell, L., McLaughlin, J., et al.: Haptic interaction with virtual objects: The technology and some applications. Industrial Robot 29(3) (2002) 210-215
12. Udupa, J.K., Leblanc, V.R., Schmidt, H., et al.: A methodology for evaluating image segmentation algorithms. In Sonka, M., Fitzpatrick, J.M., eds.: Proceedings of SPIE Medical Imaging 2002, SPIE (2002) 266-277
Objective PET Lesion Segmentation Using a Spherical Mean Shift Algorithm

Thomas B. Sebastian¹, Ravindra M. Manjeshwar¹, Timothy J. Akhurst², and James V. Miller¹

¹ GE Research, Niskayuna, NY
² Department of Nuclear Medicine, MSK Cancer Center, New York, NY
Abstract. PET imagery is a valuable oncology tool for characterizing lesions and assessing lesion response to therapy. These assessments require accurate delineation of the lesion. This is a challenging task for clinicians due to small tumor sizes, blurred boundaries from the large point-spread-function and respiratory motion, inhomogeneous uptake, and nearby high uptake regions. These aspects have led to great variability in lesion assessment amongst clinicians. In this paper, we describe a segmentation algorithm for PET lesions which yields objective segmentations without operator variability. The technique is based on the mean shift algorithm, applied in a spherical coordinate frame to yield a directional assessment of foreground and background and a varying background model. We analyze the algorithm using clinically relevant hybrid digital phantoms and illustrate its effectiveness relative to other techniques.
1 Introduction
FDG PET and FDG PET+CT allow clinicians to make inferences on the nature of lesions from local measures of metabolic rate. Clinicians can infer lesion characterization (inflammation versus cancer), establish lesion boundaries for radiation therapy, and estimate response to therapy from the FDG PET local measures of metabolic rate. These tasks require clinicians to estimate the boundaries of lesions in three dimensions and to summarize the metabolic activity over these three-dimensional regions. These are challenging tasks due to the inhomogeneity of the metabolic rate across the lesion, the inhomogeneity of the metabolic rate across the background, and the potential for high metabolic rates in nearby anatomy. Furthermore, lesions can be small relative to the point-spread function of the imaging device, and their boundaries may be blurred by respiratory motion during long image acquisitions. Figure 1 illustrates some of the challenges in estimating boundaries and activity. These challenges have led clinicians to use simple criteria to evaluate lesions. For instance, simple window-level and thresholding techniques are often used for segmenting lesions in FDG PET, and the maximum uptake over a region is used to summarize the activity over the lesion. These simple approaches are suboptimal for quantifying activity and lead to a large degree of variability
Fig. 1. Left: Example lesion imaged with PET. Right: Cropped axial slices of the same lesion with segmentation using proposed approach overlaid.
amongst clinicians. Hence, the goal of this paper is to develop an objective FDG PET lesion segmentation algorithm that requires minimal user interaction. Several algorithms have been proposed for medical image segmentation. However, we found that many traditional segmentation algorithms failed to properly segment PET lesions. Gradient based techniques failed because of the difficulty in estimating reliable gradients on small or diffuse lesions. Region based techniques failed because the interior of a lesion is considerably inhomogeneous. Classification based techniques [1,2] applied at the pixel level failed because of the inhomogeneity of the interior of the lesion and the inhomogeneity of the background. Classification methods that included neighborhood constraints [3,4] had some success but failed when the background varied around the lesion boundary. In this paper, we present a lesion segmentation technique that provides clinicians with an objective estimation of lesion boundaries from which they can make inferences on lesion characterization, plan radiation therapy, and estimate response to therapy. The main challenge in segmenting PET lesions is the inhomogeneous nature of the lesion and the background. We apply the mean shift algorithm in a spherical coordinate frame, which facilitates varying background models around the lesion boundary. The piecewise constant approximation produced by the spherical mean shift is used to estimate a directionally varying background model, which is then used to classify the background and foreground of the lesion. The spherical mean shift (SMS) algorithm is particularly effective in segmenting lesions that are adjacent to normal tissue with high uptake, such as the liver or mediastinum, and inhomogeneous lesions. Validation of segmentation algorithms is challenging because of the difficulty in obtaining reproducible ground truth. To address this problem for PET lesions, we use hybrid digital phantoms [5]. Models of lobulated lesions are projected into sinogram space, combined with sinograms from real subjects, and reconstructed using a PET simulator that accurately models the point-spread function and noise properties of a real PET system. The hybrid digital phantoms provide images which are indistinguishable from real lesions [5] while allowing precise control over the ground truth. The SMS algorithm is shown to outperform two other algorithms: neighborhood expectation maximization (NEM) and spatially-weighted fuzzy clustering (SFCM).
2 Mean Shift
Fukunaga and Hostetler introduced the mean shift algorithm [6,7] to estimate the gradient of a probability density function given a set of samples from the distribution. Using a hill-climbing algorithm, the gradient estimate can be used to identify the modes of the underlying distribution. The mean shift algorithm has been used for clustering [6], segmentation [8], and tracking [9]. Let \{x_i\} denote the set of n d-dimensional samples from the distribution and let f_i^j denote each sample at the j-th mean shift iteration, where f_i^0 = x_i. The mean shift algorithm is

f_i^{j+1} = f_i^j + M(f_i^j),   (1)

where M(f_i^j) is the mean shift vector

M(f_i^j) = \frac{\sum_{k=1}^{n} f_k^j\, w\!\left(\frac{\|f_k^j - f_i^0\|^2}{h}\right)}{\sum_{k=1}^{n} w\!\left(\frac{\|f_k^j - f_i^0\|^2}{h}\right)} - f_i^0,   (2)
and w is a non-negative, non-increasing weight function and h is a bandwidth parameter. To apply mean shift to image segmentation [8], each pixel is treated as a d-dimensional sample of its position and intensity, f_i^0 = [x, y, z, I(x, y, z)]. The mean shift algorithm iteratively moves the samples, converging on the modes of the joint spatial+intensity domain. Pixels that converge to the same mode are labeled as being from the same segment.
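A compact illustration of the iteration, using a flat (top-hat) kernel for w, is given below; it is a standard O(n²) sketch, not the authors' implementation, and is practical only for modest sample counts.

```python
import numpy as np

def mean_shift(features, h, n_iter=50, tol=1e-3):
    """Flat-kernel mean shift over joint position+intensity samples
    f_i = [x, y, z, I]; `features` is (n, d). Samples that converge to
    the same mode get the same segment label."""
    f = features.astype(float).copy()
    for _ in range(n_iter):
        new_f = np.empty_like(f)
        for i in range(len(f)):
            w = np.sum((f - f[i]) ** 2, axis=1) <= h ** 2  # flat kernel
            new_f[i] = f[w].mean(axis=0)                   # shifted sample
        if np.max(np.abs(new_f - f)) < tol:
            return new_f
        f = new_f
    return f
```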
3 Spherical Mean Shift
The straightforward application of the mean shift to PET lesions fails to properly segment small lesions and lesions with diffuse boundaries. The algorithm merges these lesions into the background. For larger lesions, the mean shift algorithm produces a segmentation that underestimates the volume of convex lesions. To address these issues, we apply the mean shift algorithm to a lesion that has been resampled into spherical coordinates. First, we define a small region of interest (ROI) around the lesion in Cartesian space, centered on the lesion and comfortably sized at approximately 3× the size of the lesion. The imagery within the ROI is resampled into spherical coordinates: radial distance to the center, r, azimuthal angle in the xy-plane, θ, and polar angle from the z-axis, φ:

x = r \cos\theta \sin\phi, \qquad y = r \sin\theta \sin\phi, \qquad z = r \cos\phi,   (3)

where r \in [0, \infty], \theta \in [0, 2\pi] and \phi \in [-\pi, \pi]. The resampling results in an image indexed by r, θ, and φ. We refer to this resampling process as lesion unwrapping. We could apply the mean shift algorithm along each radial line independently, obtaining a piecewise constant approximation of the image intensity along each
radial direction, as in the study of vessel cross-sections in [10]. However, processing each radial line independently results in piecewise constant approximations that are inconsistent across radial lines, as azimuthal and polar spatial information is not utilized. Therefore, we apply the mean shift algorithm over a neighborhood of radial lines in the unwrapped lesion. The samples in the mean shift are the voxel's spatial coordinates and intensity, f_i^0 = [r, θ, φ, I(r, θ, φ)]. The bandwidths for the mean shift procedure are chosen such that h_θ ≈ h_φ ≪ h_r. The intensity (range) bandwidth is estimated as a function of the intensity value at the center of the ROI. To compute the mean shift vector, we use a kernel of size 21 × 3 × 3 (r, θ, φ), which corresponds to a cone in the spherical coordinate frame. The spherical mean shift algorithm iteratively moves the samples, converging on the modes of the joint (spherical) spatial+intensity domain. Voxels that converge to the same mode are labeled as being from the same segment.
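Lesion unwrapping itself is a resampling operation. The sketch below is our illustration using trilinear interpolation via SciPy; the grid sizes are arbitrary and, for the polar angle, the conventional range [0, π] is used.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def unwrap_lesion(roi, center, n_r=64, n_theta=64, n_phi=32):
    """Resample a Cartesian ROI onto an (r, theta, phi) grid centred on
    the lesion (Eq. 3); `center` is in (z, y, x) voxel coordinates."""
    cz, cy, cx = center
    r = np.linspace(0, min(roi.shape) / 2.0, n_r)
    theta = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    phi = np.linspace(0, np.pi, n_phi)
    R, T, P = np.meshgrid(r, theta, phi, indexing='ij')
    x = cx + R * np.cos(T) * np.sin(P)
    y = cy + R * np.sin(T) * np.sin(P)
    z = cz + R * np.cos(P)
    coords = np.vstack([z.ravel(), y.ravel(), x.ravel()])
    return map_coordinates(roi, coords, order=1).reshape(n_r, n_theta, n_phi)
```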
4 Directional Statistical Background Model
The spherical mean shift algorithm of Section 3 results in a piecewise constant approximation of the signal intensity. This initial segmentation is used to estimate a directionally varying statistical model of the background. In most cases, a global model of the background will not suffice because of the significant variation in the background on different sides of the lesion. For example, a lesion that is adjacent to high-uptake regions such as heart or mediastinum may have higher background levels to one side. To address this variation, we estimate a different statistical model for the background in each radial direction. The background model for each radial line is once again based on a neighborhood of radial lines. We identify all the voxels in the bundle of the neighborhood that are associated with the segment with the lowest intensity in the initial spherical mean shift segmentation. A robust estimate of the minimum can be obtained by not considering small regions in this step. The statistical model of the background in the given radial direction is estimated from the original PET image intensity values of these candidate voxels. Currently, we use a simple statistical model composed of the sample mean μb and the standard deviation σb . All voxels within μb ± σb are assigned to the background and the remaining voxels to the lesion. By repeating this process on all radial lines, we segment the lesion in spherical coordinates. Finally, we resample the segmentation back into Cartesian coordinates using nearest neighbor interpolation.
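The per-direction classification can be sketched as follows on the unwrapped (r, θ, φ) volume. The bundle size, the wrap-around handling in θ, and the clamping in φ are our assumptions; `labels` is the segment index produced by the spherical mean shift.

```python
import numpy as np

def classify_radial(unwrapped, labels, lowest_label, nbhd=1):
    """Directional background model: each radial line gets its own
    (mu_b, sigma_b), estimated from the lowest-intensity-segment voxels
    in its bundle of neighbouring lines; voxels within mu_b +/- sigma_b
    are background, the rest lesion."""
    n_r, n_t, n_p = unwrapped.shape
    lesion = np.zeros_like(unwrapped, dtype=bool)
    for t in range(n_t):
        for p in range(n_p):
            ts = np.arange(t - nbhd, t + nbhd + 1) % n_t              # wrap theta
            ps = np.clip(np.arange(p - nbhd, p + nbhd + 1), 0, n_p - 1)
            bundle = unwrapped[:, ts][:, :, ps]
            bg = bundle[labels[:, ts][:, :, ps] == lowest_label]
            if bg.size == 0:
                continue
            mu, sd = bg.mean(), bg.std()
            lesion[:, t, p] = np.abs(unwrapped[:, t, p] - mu) > sd
    return lesion
```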
5 Lesion Database
Phantom acquisitions are routinely used for performance evaluation of segmentation algorithms because they provide reproducible, ground-truth information. However, these phantoms typically represent geometrically-simple shapes and uniform activity distributions superimposed on homogeneous backgrounds. This scenario differs dramatically from the complex shapes of lesions and the inhomogeneous lesion and background activity distributions encountered in clinical
Fig. 2. Schematic diagram illustrating how hybrid phantoms are generated
PET images. Therefore, phantom images may not provide a clinically relevant and/or adequate challenge in evaluating segmentation algorithms. To address this concern, we use hybrid phantoms, consisting of digital lesions embedded seamlessly into actual FDG PET patient datasets [5], Figure 2. Digital lesions of random shapes, sizes, and non-uniform voxel activity distributions were constructed from combinations of ellipsoidal primitives. An analytical simulation of the GE Advance NXi PET scanner operating in 2D acquisition mode was used to generate emission sinograms of the digital lesions. The PET acquisition simulation modeled the attenuation of activity from the lesions due to patient tissue as well as the physical properties of the PET scanner (spatial resolution, energy resolution, and Poisson statistics). The lesion emission sinograms were attenuated using patient transmission data and added to patient FDG emission sinograms that were corrected for decay, randoms, and scatter, but not attenuation. The combined emission sinograms were reconstructed to seamlessly embed the digital lesion into clinical patient data. It must be emphasized that the digital lesions were embedded into patient data in the sinogram domain and not in the image domain. The hybrid images generated by this technique have been determined to be indistinguishable from clinical images by an expert nuclear medicine physician (with experience in reading more than 5000 scans) [5]. The reconstructed images contain ground-truth lesions with differing patterns of FDG uptake, a variety of shapes and sizes, and clinically relevant background textures and uptake from adjacent anatomies. These hybrid phantoms provide a realistic challenge to a segmentation algorithm, yet contain ground-truth information, which cannot be obtained from actual patient tumors. Thirty-five clinical reference patient datasets, acquired on the GE Advance NXi scanner, were selected for their low tumor burden in the lungs. Two hundred and eighty digital lesions with volumes of 5-40 ml and maximum source-to-background ratios ranging from 3:1 to 6:1 were generated. The digital lesions were embedded inside the lung region of the reference patient datasets, with some lesions positioned in isolation, others close to the mediastinum, and the remaining
close to the lung wall. All images were reconstructed with the iterative OSEM (28/2) algorithm on a 128 × 128 grid with a pixel size of 4.29 mm and a slice thickness of 4.25 mm.
6 Results
We illustrate the performance of the spherical mean shift algorithm in Figures 3 and 4. Figure 3 shows three cases with the challenges mentioned in Section 1. Figure 4 shows a lesion segmented with the spherical mean shift (SMS) algorithm and compares it against the results from a spatially-weighted fuzzy clustering method (SFCM) and a neighborhood expectation maximization method (NEM).
Fig. 3. SMS segmentation results. Columns show the original image, ground truth, and segmentation results. SMS produces a viable segmentation when (a) the lesion is adjacent to another region of high uptake, (b) the lesion has a complex topology, and (c) the lesion is inhomogeneous and has poor contrast.
Fig. 4. Segmentation comparison. Clockwise from top left: True lesion, SMS, SFCM, NEM.

Table 1. Algorithm comparison on 280 hybrid digital phantoms of various sizes
        SFCM                 NEM                  SMS
Size   #Fail  DSC           #Fail  DSC           #Fail  DSC
5cc     25    0.336±0.357    11    0.573±0.312    4     0.731±0.171
10cc    20    0.408±0.357     9    0.702±0.242    3     0.805±0.096
25cc    15    0.379±0.336    11    0.642±0.309    0     0.812±0.094
40cc    13    0.596±0.322     8    0.760±0.222    0     0.858±0.065
To further quantify the performance of the spherical mean shift, we apply the algorithm to the aforementioned hybrid phantom lesions. Table 1 summarizes the results of SMS and provides a comparison to SFCM and NEM. If an algorithm generated a gross oversegmentation (mislabeling more than half of the foreground as background) or a gross undersegmentation (mislabeling more background as foreground than there is foreground in the ground truth), then the algorithm was said to have failed on that lesion. Dice Similarity Coefficients (DSC) [11],

DSC(A, B) = \frac{2|A \cap B|}{|A| + |B|},   (4)

were calculated, and the means and standard deviations of the DSC scores are reported for each algorithm and each nodule size. From the table, it is apparent that the spherical mean shift is more robust, since it succeeds in producing a segmentation 97.5% of the time (compared to SFCM = 74% and NEM = 86%). The
spherical mean shift is more accurate as it achieves higher DSC values for every lesion size tested. Finally, the spherical mean shift is more precise as it achieves lower standard deviations in DSC values.
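The evaluation criteria are straightforward to reproduce; the sketch below (ours) implements Eq. (4) and the two failure rules stated above, on boolean masks.

```python
import numpy as np

def dsc(a, b):
    """Dice Similarity Coefficient, Eq. (4)."""
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def failed(seg, truth):
    """Failure rules from the text: more than half of the true lesion
    missed, or more false-positive voxels than true lesion voxels."""
    missed = np.logical_and(truth, ~seg).sum()
    false_pos = np.logical_and(seg, ~truth).sum()
    return missed > 0.5 * truth.sum() or false_pos > truth.sum()
```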
7 Conclusions
We present an objective segmentation algorithm that can robustly, accurately, and precisely segment small lesions in PET imagery. We use the mean shift algorithm, applied in a spherical coordinate frame, to produce a directionally varying background model. This background model is used to classify the foreground and background along each radial direction. We analyze the algorithm using 280 clinically relevant hybrid digital phantoms and qualitatively and quantitatively compare the spherical mean shift algorithm to two other algorithms.
References

1. N. Otsu. A threshold selection method from gray level histograms. T-SMC, 1:62-66, 1979.
2. A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39:1-38, 1977.
3. C. Ambroise, M. Dang, and G. Govaert. Clustering of spatial data by the EM algorithm. In Geostatistics for Environmental Applications, volume 9 of Quantitative Geology and Geostatistics, pages 493-504. Kluwer Academic Publishers, 1997.
4. Y. Tolias and S. Panas. Image segmentation by a fuzzy clustering algorithm using adaptive spatially constrained membership functions. T-SMC, 28(3):359-369, 1998.
5. R.M. Manjeshwar, T.J. Akhurst, B.B. Beattie, H.E. Cline, D.E. Gonzalez Trotter, and J.L. Humm. Realistic hybrid digital phantoms for evaluation of image analysis tools in positron emission tomography. In SNM, 2002.
6. K. Fukunaga and L.D. Hostetler. The estimation of the gradient of a density function, with applications in pattern recognition. T-IT, 21(1):32-40, 1975.
7. Y. Cheng. Mean shift, mode seeking and clustering. T-PAMI, 17(8):790-799, 1995.
8. D. Comaniciu and P. Meer. Mean shift: A robust approach toward feature space analysis. T-PAMI, 24(5):603-619, 2002.
9. D. Comaniciu, V. Ramesh, and P. Meer. Real-time tracking of non-rigid objects using mean shift. In CVPR 2000, volume 2, pages 142-149, 2000.
10. H. Tek, D. Comaniciu, and J. Williams. Vessel detection by mean-shift based ray propagation. In MMBIA, pages 228-235, 2001.
11. L.R. Dice. Measures of the amount of ecologic association between species. Ecology, 26:297-302, 1945.
Multilevel Segmentation and Integrated Bayesian Model Classification with an Application to Brain Tumor Segmentation

Jason J. Corso¹, Eitan Sharon², and Alan Yuille²

¹ Medical Imaging Informatics, University of California, Los Angeles, CA, USA
[email protected]
² Department of Statistics, University of California, Los Angeles, CA, USA
Abstract. We present a new method for automatic segmentation of heterogeneous image data, which is very common in medical image analysis. The main contribution of the paper is a mathematical formulation for incorporating soft model assignments into the calculation of affinities, which are traditionally model free. We integrate the resulting modelaware affinities into the multilevel segmentation by weighted aggregation algorithm. We apply the technique to the task of detecting and segmenting brain tumor and edema in multimodal MR volumes. Our results indicate the benefit of incorporating model-aware affinities into the segmentation process for the difficult case of brain tumor.
1 Introduction
Medical image analysis typically involves image data that has been generated from heterogeneous underlying physical processes. Segmenting this data into coherent regions corresponding to different anatomical structures is a core problem with many practical applications. For example, it is important to automatically detect and classify brain tumors and measure properties such as tumor volume, which is a key indicator of tumor progression [1]. But automatic segmentation of brain tumors is very difficult. Brain tumors are highly varying in size, have a variety of shape and appearance properties, and often deform other nearby structures in the brain [2]. In general, it is impossible to segment tumor by simple thresholding techniques [3]. In this paper, we present a new method for automatic segmentation of heterogeneous image data and apply it to the task of detecting and describing brain tumors. The input is multi-modal data since different modes give different cues for the presence of tumors, edema (swelling), and other relevant structure. For example, the T2 weighted MR modality contains more pertinent information for segmenting the edema (swelling) than the T1 weighted modality. Our method combines two of the most effective approaches to segmentation. The first approach, exemplified by the work of Tu and Zhu [4], uses class models to explicitly represent the different heterogeneous processes. The tasks of segmentation and class membership estimation are solved jointly. This type of
approach is very powerful but the algorithms for obtaining these estimates are comparatively slow. The second approach is based on the concept of normalized cuts [5] and has lead to the segmentation by weighted aggregation (SWA) algorithm due to Sharon et al. [6,7]. SWA was first extended to the 3D image domain by Akselrod-Ballin et al. [8]. In their work, multiple modalities are used during the segmentation to extract multiscale segments that are then classified in a decision tree algorithm; it is applied to multiple sclerosis analysis. SWA is extremely rapid and effective, but does not explicitly take advantage of the class models used in [4]. In this paper, we first describe (§2) how we unify these two disparate approaches to segmentation by incorporating Bayesian model classification into the calculation of affinities in a principled manner. This leads (§3) to a modification of the SWA algorithm using class-based probabilities. In §4 we discuss the specific class models and probability functions used in the experimental results.
2 Integrating Models and Affinities
In this section we discuss the main contribution of the paper: a natural way of integrating model classes into affinity measurements without making premature class assignments to the nodes. Let G = (V, E) be a graph with nodes u, v ∈ V. Each node in the graph is augmented with properties, or statistics, denoted s_v ∈ S, where S is the space of properties (e.g., R³ for standard red-green-blue data). Each node also has an (unknown) class label c_u ∈ C, where C is the space of class labels. The edges are sparse, with each node being connected to its nearest neighbors. Each edge is annotated with a weight that represents the affinity of the two nodes. The affinity is denoted by w_uv for connected nodes u and v ∈ V. Conventionally, the affinity function is of the form w_uv = exp(−D(s_u, s_v; θ)), where D is a non-negative distance measure and θ are predetermined parameters. The goal of affinity methods is to detect regions R ⊂ V with low saliency, defined by

\Gamma(R) = \frac{\sum_{u \in R,\, v \notin R} w_{uv}}{\sum_{u, v \in R} w_{uv}}.   (1)

Such regions have low affinity across their boundaries and high affinity within their interior. An efficient multiscale algorithm for doing this is described in §3. By contrast, the model based methods define a likelihood function P({s_u}|{c_u}) for the probability of the observed statistics/measurements {s_u} conditioned on the class labels {c_u} of the pixels {u}, and put prior probabilities P({c_u}) on the class labels of pixels. Their goal is to seek estimates of the class labels by maximizing the posterior probability P({c_u}|{s_u}) ∝ P({s_u}|{c_u})P({c_u}). In this paper, we restrict ourselves to a simple model where P(s_u|c_u) is the conditional probability of the statistics s_u at a node u with class c_u, and P(c_u, c_v) is the prior probability of class labels c_u and c_v at nodes u and v.
We combine the two approaches by defining class dependent affinities in terms of probabilities. The affinity between nodes u, v ∈ V is defined to be the probability P̂(X_uv|s_u, s_v) = w_uv of the binary event X_uv that the two nodes lie in the same region. This probability is calculated by treating the class labels as hidden variables which are summed out:

P(X_{uv}|s_u, s_v) = \sum_{c_u} \sum_{c_v} P(X_{uv}|s_u, s_v, c_u, c_v)\, P(c_u, c_v|s_u, s_v)
\propto \sum_{c_u} \sum_{c_v} P(X_{uv}|s_u, s_v, c_u, c_v)\, P(s_u, s_v|c_u, c_v)\, P(c_u, c_v)
= \sum_{c_u} \sum_{c_v} P(X_{uv}|s_u, s_v, c_u, c_v)\, P(s_u|c_u)\, P(s_v|c_v)\, P(c_u, c_v),   (2)
where the third line follows from the assumption that the nodes are conditionally independent given class assignments. The first term of (2) is the model-aware affinity calculation:

P(X_{uv}|s_u, s_v, c_u, c_v) = \exp\!\left(-D(s_u, s_v; \theta[c_u, c_v])\right).   (3)

Note that the property of belonging to the same region X_uv is not uniquely determined by the class variables c_u, c_v. Pixels with the same class may lie in different regions and pixels with different class labels might lie in the same region. This definition of affinity is suitable for heterogeneous data since the affinities are explicitly modified by the evidence P(s_u|c_u) for class membership at each pixel u, and so can adapt to different classes. This differs from the conventional affinity function w_uv = exp(−D(s_u, s_v; θ)), which does not model class membership explicitly. The difference becomes most apparent when the nodes are aggregated to form clusters as we move up the pyramid; see the multilevel algorithmic description in §3. Individual nodes, at the bottom of the pyramid, will typically only have weak evidence for class membership (i.e., P(c_u|s_u) is roughly constant). But as we proceed up the pyramid, clusters of nodes will usually have far stronger evidence for class membership, and their affinities will be modified accordingly. The formulation presented here is general; in this paper, we integrate these ideas into the SWA multilevel segmentation framework (§3). In §4, we discuss the specific forms of these probabilities used in our experiments.
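For a single node pair and a small discrete class set, Eq. (2) with the model-aware term of Eq. (3) reduces to a double sum, as in this sketch (ours; the L1 distance form is taken from §3):

```python
import numpy as np

def class_aware_affinity(s_u, s_v, lik_u, lik_v, prior, theta):
    """Evaluate Eq. (2) for one node pair. lik_u[c] = P(s_u | c),
    prior[c1, c2] = P(c_u, c_v), and theta[c1, c2] holds the
    class-dependent distance weights of Eq. (3). Returns an
    unnormalized affinity."""
    n_classes = len(lik_u)
    total = 0.0
    for cu in range(n_classes):
        for cv in range(n_classes):
            d = theta[cu, cv] * np.abs(s_u - s_v).sum()   # L1 distance
            total += np.exp(-d) * lik_u[cu] * lik_v[cv] * prior[cu, cv]
    return total
```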
3 Segmentation by Weighted Aggregation
We now review the segmentation by weighted aggregation (SWA) algorithm of Sharon et al. [6,7], and describe our extension to integrate class-aware affinities. As earlier, define a graph Gt = (V, E) with the additional superscript indicating the level in a pyramid of graphs G = {Gt : t = 0, . . . , T }. Denote the multimodal intensity vector at voxel i as I(i) ∈ RM , with M being the number of modalities. The finest layer in the graph G0 = (V 0 , E 0 ) is induced by the voxel lattice: each voxel i becomes a node v ∈ V with 6-neighbor connectivity, and node properties
set according to the image, s_v = I(i). The affinities w_uv are initialized as in §2 using D(s_u, s_v; θ) = θ |s_u − s_v|₁. SWA proceeds by iteratively coarsening the graph according to the following algorithm:

1. t ← 0, and initialize G⁰ as described above.
2. Choose a set of representative nodes R^t ⊂ V^t such that ∀u ∈ V^t: \sum_{v \in R^t} w_{uv} \ge \beta \sum_{v \in V^t} w_{uv}.
3. Define the graph G^{t+1} = (V^{t+1}, E^{t+1}):
   (a) V^{t+1} ← R^t; edges are defined in step 3f.
   (b) Compute interpolation weights p_{uU} = \frac{w_{uU}}{\sum_{V \in V^{t+1}} w_{uV}}, with u ∈ V^t and U ∈ V^{t+1}.
   (c) Accumulate statistics to the coarse level: s_U = \frac{\sum_{u \in V^t} p_{uU}\, s_u}{\sum_{v \in V^t} p_{vU}}.
   (d) Interpolate the affinity from the finer level: \hat{w}_{UV} = \sum_{(u \neq v) \in V^t} p_{uU}\, w_{uv}\, p_{vV}.
   (e) Use the coarse affinity to modulate the interpolated affinity:
       W_{UV} = \hat{w}_{UV} \exp(-D(s_U, s_V; \theta)).   (4)
   (f) Create an edge in E^{t+1} between U ≠ V ∈ V^{t+1} when W_{UV} ≠ 0.
4. t ← t + 1.
5. Repeat steps 2-4 until |V^t| = 1 or |E^t| = 0.

The parameter β in step 2 governs the amount of coarsening that occurs at each layer in the graph (we set β = 0.2 in this work). [6] shows that this algorithm preserves the saliency function (1).

Incorporating Class-Aware Affinities. The two terms in (4) convey different affinity cues: the first affinity ŵ_UV is comprised of finer-level (scale) affinities interpolated to the coarse level, while the second is computed from the coarse-level statistics. For all types of regions, the same function is used. However, at coarser levels in the graph, evidence for regions of known types (e.g., tumor) starts appearing, resulting in more accurate, class-aware affinities:

W_{UV} = \hat{w}_{UV}\, P(X_{UV}|s_U, s_V),   (5)
where P (XUV |sU , sV ) is evaluated as in (2). We give an example in §4 (Figure 4) showing the added power of integrating class knowledge into the affinity calculation in the case of heterogeneous data like tumor and edema. Furthermore, the class-aware affinities compute model likelihood terms, P (sU |cU ), as a by-product. Thus, we can also associate a most likely class with each node in the graph: c∗U = arg maxc∈C P (sU |c).
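To make one coarsening pass concrete, here is a dense-matrix sketch of steps 2–3 (ours, not the authors' implementation): the greedy selection is merely one simple way to satisfy the covering condition of step 2, and the Eq. (4) modulation is used — substituting the class-aware probability of Eq. (5) for the exponential factor gives the extended algorithm.

```python
import numpy as np

def swa_coarsen_step(W, s, theta, beta=0.2):
    """One SWA coarsening step (a simplified sketch of steps 2-3 in §3).
    W     : (n, n) symmetric affinity matrix at level t (zero diagonal)
    s     : (n, M) node statistics at level t
    theta : (M,) coefficients of the weighted L1 distance D
    """
    n = W.shape[0]
    row_sum = W.sum(axis=1)
    # Step 2: greedily grow R until sum_{v in R} w_uv >= beta * sum_v w_uv
    # holds for every u (a representative trivially covers itself).
    reps, rep_mass = [], np.zeros(n)
    for u in np.argsort(-row_sum):
        if np.all(rep_mass >= beta * row_sum):
            break
        reps.append(u)
        rep_mass += W[:, u]
        rep_mass[u] = np.inf
    # Step 3b: interpolation weights p_uU = w_uU / sum_V w_uV.
    P = W[:, reps]
    P /= np.maximum(P.sum(axis=1, keepdims=True), 1e-12)
    # Step 3c: coarse statistics s_U.
    s_coarse = (P.T @ s) / np.maximum(P.sum(axis=0)[:, None], 1e-12)
    # Step 3d: interpolated affinity w_hat = P^T W P (diagonal discarded).
    W_hat = P.T @ W @ P
    np.fill_diagonal(W_hat, 0.0)
    # Step 3e, Eq. (4): modulate by the coarse-level statistics.
    D = np.abs(s_coarse[:, None, :] - s_coarse[None, :, :]) @ theta
    return W_hat * np.exp(-D), s_coarse
```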
4
Application to Brain Tumor Segmentation
Related Work in Brain Tumor Segmentation. Segmentation of brain tumors has been widely studied in medical imaging. Here we review some major approaches to the problem; [1,3] contain more complete literature reviews.
Fletcher-Heath et al. [9] take a fuzzy clustering approach to the segmentation followed by 3D connected components to build the tumor shape. Liu et al. [1] take a similar fuzzy clustering approach in an interactive segmentation system that is shown to give an accurate estimate of tumor volume. Kaus et al. [10] use the adaptive template-moderated classification algorithm to segment the MR image into five different tissue classes: background, skin, brain, ventricles, and tumor. Prastawa et al. [3] present a detection/segmentation algorithm based on learning voxel-intensity distributions for normal brain matter and detecting outlier voxels, which are considered tumor. Lee et al. [11] use the recent discriminative random fields coupled with support vector machines to perform the segmentation.

Data Processing. We work with a dataset of 30 manually annotated glioblastoma multiforme (GBM) [2] brain tumor studies. Using FSL tools [12], we preprocessed the data through the following pipeline: (1) spatial registration, (2) noise removal, (3) skull removal, and (4) intensity standardization. The 3D data is 256 × 256 with 24 slices. We use the T1-weighted pre- and post-contrast modalities and the FLAIR modality in our experimentation.

Class Models and Feature Statistics. We model four classes: non-data (outside of head), brain matter, tumor, and edema. Each class is modeled by a Gaussian distribution with full covariance, giving 9 parameters per class, which are learned from the annotated data. The node-class likelihoods P(s_u | c_u) are computed directly against this Gaussian model. Manual inspection of the class histograms indicates there are some non-Gaussian characteristics in the data, which will be modeled in future work. The class prior term, P(c_u, c_v), encodes the obvious hard constraints (e.g., tumor cannot be adjacent to non-data) and sets the remaining unconstrained terms to be uniform according to the maximum entropy principle. For the model-aware affinity term (3), we use a class dependent weighted distance:

P(X_{uv} | s_u, s_v, c_u, c_v) = \exp\left( - \sum_{m=1}^{M} \theta^m_{c_u c_v} |s^m_u - s^m_v| \right) ,   (6)

where the superscript m indicates the vector element at index m. The class dependent coefficients are set by hand using expert domain knowledge. They are presented in Table 1 (we abbreviate non-data, ND).

Table 1. Coefficients used in the model-aware affinity calculation. Rows are modality and columns are (symmetric) class pairs.

Modality m         | ND, ND | ND, Brain | Brain, Tumor | Brain, Edema | Tumor, Edema
T1                 |  1/3   |    0      |      0       |      0       |      0
T1 with Contrast   |  1/3   |   1/2     |      1       |      0       |     1/2
FLAIR              |  1/3   |   1/2     |      0       |      1       |     1/2
For a given class-pair, the modalities that do not present any relevant phenomena are set to zero, and the remaining coefficients are set uniformly according to the principle of maximum entropy. For example, edema is only detectable in the FLAIR channel, and thus at the Brain–Edema boundary only the FLAIR channel is used. We will experiment with techniques to automatically compute the coefficients in future work. Note that the coefficients are symmetric (i.e., equal for Brain, Tumor and Tumor, Brain), and we have included only those pairs not excluded by the hard constraints discussed above. The sole feature statistic that we accumulate for each node in the graph (SWA step 3c) is the average intensity. The feature statistics, model choice, and model-aware affinity form are specific to our problem; many other choices could be made for these functions in this and other domains.

Extracting Tumor and Edema from the Pyramid. To extract tumor and edema from the pyramid, we use the class likelihood function that is computed during the Bayesian model-aware affinity calculation (2). For each voxel, we compute its most likely class (i.e., c*_u from §3) at each level in the pyramid. Node memberships for coarser levels up the pyramid are calculated by the SWA interpolation weights (p_{uU}). Finally, the voxel is labeled as the class for which it is a member at the greatest number of levels in the pyramid. We also experimented with using the saliency function (1) as recommended in [6], but we found the class likelihood method proposed here to give better results.

Results. We computed the Jaccard score (the ratio of true positives to true positives plus false positives and false negatives) for the segmentation and classification (S+C) on a subset of the data. The subset was chosen based on the absence of necrosis; the Gaussian class models cannot account for the bimodality of the statistics when necrosis is present. The Jaccard score for tumor is 0.85 ± 0.04 and for edema is 0.75 ± 0.08. From manual inspection of the graph pyramid, we find that both the tumor and edema are segmented and classified correctly in most cases, with some false positives resulting from the simple class models used. In Figures 2–4, we show three examples of the S+C to exemplify different aspects of the algorithm. For space reasons, we show a single, indicative slice from each volume. Each example has a pair of images to compare the ground truth S+C against the automatic S+C; green represents tumor and red represents edema. Figure 2 shows the pyramid overlayed on the T1 with contrast image. Figure 3 demonstrates the importance of using multiple modalities in the problem of brain tumor segmentation.

Fig. 1. Class labels overlayed on a slice layer.

On the top row, the pyramid is overlayed on the T1 with contrast modality, and on the bottom row it is overlayed on the FLAIR modality. We easily see (from the left-most column) that the tumor and edema phenomena present quite different signals in these two modalities. Yet the two phenomena are accurately segmented; in Figure 1, we show the class labels for the layer in
Fig. 2. Example of tumor detection and segmentation in post-contrast T1 weighted
Fig. 3. Example segmentation demonstrating importance of multiple modalities
the third column. E indicates edema, T indicates tumor, and any non-labeled segment is classified as brain. Figure 4 shows a comparison between using (rows 1 and 2) and not using (row 3) class-aware affinities. We see that with our extension, the segmentation algorithm is able to delineate the complex edema structure more accurately.
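For reference, the Jaccard score quoted above can be computed per class as in the following sketch of ours (label codes are hypothetical):

```python
import numpy as np

def jaccard(pred, truth, label):
    """Jaccard score for one class, as defined in §4: true positives over
    (true positives + false positives + false negatives)."""
    p = (pred == label)
    t = (truth == label)
    tp = np.logical_and(p, t).sum()
    fp = np.logical_and(p, ~t).sum()
    fn = np.logical_and(~p, t).sum()
    return tp / float(tp + fp + fn)

# e.g. jaccard(auto_labels, manual_labels, label=TUMOR)
```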
5
Conclusion
We have made two contributions in this paper. The main contribution is the mathematical formulation presented in §2 for bridging graph-based affinities and model-based techniques. In addition, we extended the SWA algorithm to integrate model-based terms into the affinities during the coarsening. The model-aware affinities integrate classification without making premature hard class assignments. We applied these techniques to the difficult problem of segmenting and classifying brain tumor in multimodal MR volumes. The results showed
Fig. 4. Example segmentation comparing the integration of class information into the affinity calculation (rows 1 and 2, T1 post-contrast and FLAIR, resp.) versus no class information (row 3). See §4 for a full explanation of the figures on this page.
good segmentation and classification and demonstrated the benefit of including the model information during segmentation. The model-aware affinities are a principled approach to incorporate model information into rapid bottom-up algorithms. In future work, we will extend this work by including more complex statistical models (as in [4]) involving additional feature information (e.g. shape) and models for the appearance of GBM tumor. We will further extend the work to automatically learn the model-aware affinity coefficients instead of manually choosing them from domain knowledge.
Acknowledgements. Corso is funded by NLM Grant # LM07356. The authors would like to thank Dr. Cloughesy (Henry E. Singleton Brain Cancer Research Program, University of California, Los Angeles, CA, USA) for the image studies, and Shishir Dube and Usha Sinha.
References

1. Liu, J., Udupa, J., Odhner, D., Hackney, D., Moonis, G.: A system for brain tumor volume estimation via MR imaging and fuzzy connectedness. Computerized Medical Imaging and Graphics 29(1) (2005) 21–34
2. Patel, M.R., Tse, V.: Diagnosis and staging of brain tumors. Seminars in Roentgenology 39(3) (2004) 347–360
3. Prastawa, M., Bullitt, E., Ho, S., Gerig, G.: A brain tumor segmentation framework based on outlier detection. Medical Image Analysis Journal, Special issue on MICCAI 8(3) (2004) 275–283
4. Tu, Z., Zhu, S.C.: Image segmentation by data-driven Markov chain Monte Carlo. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(5) (2002) 657–673
5. Shi, J., Malik, J.: Normalized Cuts and Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(8) (2000) 888–905
6. Sharon, E., Brandt, A., Basri, R.: Fast multiscale image segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Volume I. (2000) 70–77
7. Sharon, E., Brandt, A., Basri, R.: Segmentation and boundary detection using multiscale intensity measurements. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Volume I. (2001) 469–476
8. Akselrod-Ballin, A., Galun, M., Gomori, M.J., Filippi, M., Valsasina, P., Basri, R., Brandt, A.: Integrated segmentation and classification approach applied to multiple sclerosis analysis. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. (2006)
9. Fletcher-Heath, L.M., Hall, L.O., Goldgof, D.B., Reed Murtagh, F.: Automatic segmentation of non-enhancing brain tumors in magnetic resonance images. Artificial Intelligence in Medicine 21 (2001) 43–63
10. Kaus, M., Warfield, S., Nabavi, A., Black, P.M., Jolesz, F.A., Kikinis, R.: Automated segmentation of MR images of brain tumors. Radiology 218 (2001) 586–591
11. Lee, C.H., Schmidt, M., Murtha, A., Bistritz, A., Sander, J., Greiner, R.: Segmenting brain tumor with conditional random fields and support vector machines. In: Proceedings of Workshop on Computer Vision for Biomedical Image Applications at International Conference on Computer Vision. (2005)
12. Smith, S.M., Jenkinson, M., Woolrich, M.W., Beckmann, C.F., Behrens, T.E.J., Johansen-Berg, H., Bannister, P.R., Luca, M.D., Drobnjak, I., Flitney, D.E., Niazy, R., Saunders, J., Vickers, J., Zhang, Y., Stefano, N.D., Brady, J.M., Matthews, P.M.: Advances in functional and structural MR image analysis and implementation as FSL. NeuroImage 23(S1) (2004) 208–219
A New Adaptive Probabilistic Model of Blood Vessels for Segmenting MRA Images

Ayman El-Baz¹, Aly A. Farag¹, Georgy Gimel'farb², Mohamed A. El-Ghar³, and Tarek Eldiasty³

¹ Computer Vision and Image Processing Laboratory, University of Louisville, Louisville, KY 40292, {elbaz, farag}@cvip.Louisville.edu, http://www.cvip.louisville.edu
² Department of Computer Science, Tamaki Campus, University of Auckland, Auckland, New Zealand
³ Urology and Nephrology Department, University of Mansoura, Mansoura, Egypt
Abstract. A new physically justified adaptive probabilistic model of blood vessels on magnetic resonance angiography (MRA) images is proposed. The model accounts for both laminar (for normal subjects) and turbulent blood flow (in abnormal cases like anemia or stenosis) and results in a fast algorithm for extracting a 3D cerebrovascular system from the MRA data. Experiments with synthetic and 50 real data sets confirm the high accuracy of the proposed approach.
1
Introduction
Accurate cerebrovascular segmentation using non-invasive MRA is a valuable tool for early diagnostics and timely treatment of intracranial vascular diseases. Three MRA techniques are in common use: time-of-flight MRA (TOF-MRA), phase contrast angiography (PCA), and contrast enhanced MRA (CE-MRA). Today's most popular techniques for segmenting blood vessels from MRA data can be roughly classified in two categories: deformable models and statistical methods. The former iteratively adjust an initial boundary surface to blood vessels by optimizing an energy function that depends on image gradient and surface smoothness. Topologically adaptable surfaces make classical deformable models more efficient in segmenting intracranial vasculature [1]. Geodesic active contours implemented with level set techniques offer flexible topological adaptability to segment MRA images [2], including more efficient adaptation to local geometric structures represented e.g. by tensor eigenvalues [3]. Fast segmentation of blood vessel surfaces is obtained by inflating a 3D balloon with fast marching methods [4]. In [5], the capillary action of the blood inside the arteries and veins was modelled and used as an external force for the deformable model to segment the blood vessels. Several methods have been developed to segment the blood vessels based on extraction of the skeleton of blood vessels, using multiscale schemes to allow for the diversity of vessel sizes [6,7]. In these approaches, the
centerline models can be generated explicitly, implicitly, or via postprocessing by vessel modelling methods. The MRA images are multi-modal in the sense that particular modes of the marginal probability distribution of signals are associated with regions-of-interest. To the best of our knowledge, the only adaptive statistical approaches for extracting blood vessels from MRA data were proposed by Noble and her group [8,9]. The marginal distribution is modeled with a mixture of two Gaussian and one uniform or Rician components for the stationary CSF (cerebrospinal fluid) and bones, the brain tissues (white and gray matter), and the arteries, respectively. The uniform component presumes the blood flow is strictly laminar. The mixture is identified (estimated) with a conventional EM algorithm. Our paper derives a more general probability model of blood vessels on MRA images to account for normal and abnormal states of the vascular system, i.e., for both laminar and turbulent blood flow, without and with stenosis. To accurately separate blood vessels from other regions-of-interest, the marginal distribution is precisely approximated with an adaptive linear combination of the derived model and a number of dominant and subordinate discrete Gaussians, rather than with a mixture of only three pre-selected Gaussian and uniform or Rician components. Experiments show that our adaptive model results in significantly better segmentation of MRA images.
2
Probability Model of Vascular Signals
Let q ∈ Q = {0, 1, ..., Q - 1} be the Q-ary signals (image intensity, or gray level). Conventional models of intensities for vessel voxels in [8,9] assume laminar blood flow with a parabolic velocity profile through a circular cross-section of the vessel [10]. Then the intensity profile for a vessel is q_r = q_{max}(1 - r²/R²), where q_r is the intensity at the distance r from the center of a vessel of radius R and the constant q_{max} ≤ Q - 1 depends on the scanner. In this case the intensities over the circular cross-section are distributed uniformly with the probability density φ_lam(q) = 1/q_{max} in the range [0, q_{max}]. Nonetheless, the laminar flow holds only for subjects with normal vascular systems [11]. Various diseases change either blood velocity or viscosity or both and cause turbulent flow. Turbulence depends on the diameter of the vessel and on blood velocity and viscosity. Due to lower blood viscosity, anemia frequently leads to turbulence. Artery constrictions increasing blood velocity (see Fig. 1) and vascular diseases such as thrombosis, embolism, thyrotoxicosis, atherosclerosis, and valvular heart diseases also result in turbulence [11]. Typically, the turbulence adds a uniform random factor ξ in the range [-1, 1] to the parabolic intensity profile [11]:

q_r = q_{max}\left(1 - \xi^2 \frac{r^2}{R^2}\right)   (1)

To derive the probability density of the intensities in Eq. (1) over the vessel, let a_r = πr² and a_{max} = πR² be the circular area of radius r and the maximum
Fig. 1. Influence of constriction (C) on the blood velocities in a vessel (arrows indicate flow directions) and ranges of velocities at each cross-section along the vessel [11]
Fig. 2. Probability densities for Eq. (4) with β = 0.0, 0.2, 0.4, . . . , 1.0 and synthetic cross-section images of a blood vessel with laminar (β = 0, b) and turbulent (β = 1, c) flow
area of the circular vessel cross-section of radius R, respectively. Let f(q | a_r), φ_tur(q), and Φ_tur(x) = Pr(q ≤ x) = \int_0^x φ_tur(q) dq denote the conditional density of intensities on the border of a_r, the unconditional density, and the cumulative distribution of the intensities over the whole vessel, respectively. The density of y = ξ² ∈ [0, 1] is p(y) = (2\sqrt{y})^{-1}, because Pr(y ≤ x) ≡ Pr(-\sqrt{x} ≤ ξ ≤ \sqrt{x}) = \sqrt{x}. In accord with Eq. (1), y = \frac{a_{max}}{a_r}\left(1 - \frac{q_r}{q_{max}}\right), and the intensities q_r ∈ [q_{max}(1 - a_r/a_{max}), q_{max}] have the conditional density

f(q | a_r) = \frac{1}{2}\sqrt{\frac{a_{max}}{a_r}}\,\frac{1}{\sqrt{q_{max}(q_{max} - q)}} .

The probability distribution of the intensities over the vessel area is then:

Φ_tur(x) = \frac{1}{a_{max}} \int_0^{a_{max}}\!\! \int_0^x f(q | a_r)\, dq\, da_r = 2 - \frac{x}{q_{max}} - 2\sqrt{1 - \frac{x}{q_{max}}}   (2)

Therefore, the unconditional probability density is:

φ_tur(q) = \frac{1}{\sqrt{q_{max}(q_{max} - q)}} - \frac{1}{q_{max}}   (3)
Since the MRA may represent both normal and abnormal subjects, the model of vascular signals can be built as a mixture of the laminar and turbulent components:

φ(q) = (1 - β) φ_lam(q) + β φ_tur(q) ≡ \frac{1 - 2β}{q_{max}} + \frac{β}{\sqrt{q_{max}(q_{max} - q)}}   (4)
Probability densities for different mixing weights β ∈ [0, 1] in this model are presented in Fig. 2.
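The density of Eq. (4) is straightforward to evaluate and to verify by simulation. The snippet below is our sketch: it samples intensities from the turbulent profile of Eq. (1) over a circular cross-section and compares their histogram with φ(q) at β = 1, excluding bins near the integrable singularity at q = q_max.

```python
import numpy as np

def phi(q, beta, qmax):
    """Mixture density of Eq. (4) on [0, qmax)."""
    return (1.0 - 2.0 * beta) / qmax + beta / np.sqrt(qmax * (qmax - q))

rng = np.random.default_rng(0)
qmax, n = 255.0, 200_000
r2 = rng.uniform(0.0, 1.0, n)   # for uniform points on a disc, r^2/R^2 ~ U[0,1]
xi = rng.uniform(-1.0, 1.0, n)  # turbulence factor of Eq. (1)
q = qmax * (1.0 - xi**2 * r2)   # turbulent intensity profile (beta = 1)
hist, edges = np.histogram(q, bins=50, range=(0.0, qmax), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
# Compare away from the singularity at q = qmax; the discrepancy is small:
print(np.abs(hist[:-2] - phi(centers[:-2], beta=1.0, qmax=qmax)).max())
```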
3
Adaptive Model of Multi-modal MRA
MRA images contain three regions-of-interest (signal classes): (i) darker CSF, bones, and fat, (ii) brain tissues (gray matter and white matter), and (iii) brighter blood vessels. Marginal signal distributions for the first two classes are typically of intricate shape that differs much from the conventional individual Gaussians in [8,9]. The model in Eq. (4) describes only circular vessels and should have additional terms changing its shape to account for variations of the blood flow due to stenosis. Generally, no predefined probability model can accurately describe all the signal variations due to changes in blood velocity and viscosity, vessel diameter, and scanner sensitivity. Therefore, we propose an adaptive probability model to handle both normal and abnormal MRA images. It mixes three submodels representing the abovementioned major image areas (abbreviated by "csf", "bt", and "bv", respectively):
p_{MRA}(q) = \sum_{i ∈ \{bv, csf, bt\}} α_i φ_i(q)   (5)
where α_i are the mixing weights (α_1 + α_2 + α_3 = 1). Each of the three submodels φ_i(q) is a mixture of one dominant component with a linear combination of several sign-alternate subordinate components chosen to closely approximate the corresponding part of an empirical marginal signal distribution F_emp = (f_emp(q) : q ∈ Q). The dominant component for the blood vessels submodel is the discrete parametric distribution Ψ_{θ,bv} = (ψ_bv(q|θ) : q = 0, ..., q_{max}), with the shorthand notation θ = (β, q_{max}) for its parameters. It is obtained by integrating the density in Eq. (4) over unit intervals corresponding to the integer values q ∈ Q:

ψ_bv(q|θ) = \int_q^{q+1} φ_bv(t)\, dt ≡ (1 - β)\frac{1}{q_{max} + 1} + β\left[ 2\left( \sqrt{1 - \frac{q}{q_{max} + 1}} - \sqrt{1 - \frac{q + 1}{q_{max} + 1}} \right) - \frac{1}{q_{max} + 1} \right]
Because typically qmax = Q − 1, this distribution has only a single parameter β. Two other dominant components are discrete Gaussians (DGs) defined in [12] as a discrete probability distribution Ψθ = (ψ(q|θ) : q ∈ Q) integrating a normal parametric density over unit intervals: ψ(q|θ) = Φθ (q + 0.5) − Φθ (q − 0.5) for q = 1, . . . , Q − 2, ψ(0|θ) = Φθ (0.5), ψ(Q − 1|θ) = 1 − Φθ (Q − 1.5) where Φθ (q) is
the cumulative Gaussian probability function with parameters θ = (μ, σ²), i.e., the mean μ and variance σ². The subordinate part of each submodel φ_i(q) is a linear combination of discrete Gaussians (LCDGs) with C_{i,p} positive and C_{i,n} negative components, under obvious restrictions on their weights. To identify the three submodels (estimate the parameters of their dominant components and the numbers and parameters of the positive and negative subordinate components), we use the EM-based techniques introduced in [12]. The only difference here is the non-analytical estimation of the parameter β on the M-steps, using a gradient-based search for the global maximum of the goal likelihood function

G(β) = \sum_{q=0}^{q_{max}} π(i = bv | q)\, f_emp(q)\, \ln ψ_bv(q|β)
where π(i|q) is the responsibility of the submodel i for q. For simplicity, we do not restrict our model to only a proper subset of the LCDGs that ensures non-negative signal probabilities. This restriction may be ignored due to the very close approximation provided by the model.
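The two kinds of dominant components can be sketched as follows (our code; function names assumed). The first follows the DG definition above; the second is the closed form for ψ_bv derived from Eq. (4), written with q_max = Q − 1 so that q_max + 1 = Q.

```python
import numpy as np
from scipy.stats import norm

def discrete_gaussian(Q, mu, sigma):
    """Discrete Gaussian (DG) of [12]: the normal CDF differenced over unit
    intervals, with the tails folded into q = 0 and q = Q - 1."""
    edges = norm.cdf(np.arange(Q + 1) - 0.5, loc=mu, scale=sigma)
    psi = np.diff(edges)
    psi[0] = edges[1]          # psi(0 | theta)   = Phi(0.5)
    psi[-1] = 1.0 - edges[-2]  # psi(Q-1 | theta) = 1 - Phi(Q - 1.5)
    return psi

def psi_bv(Q, beta):
    """Blood-vessel dominant component: Eq. (4) integrated over unit
    intervals, with q_max = Q - 1."""
    q = np.arange(Q)
    tur = 2.0 * (np.sqrt(1.0 - q / Q) - np.sqrt(1.0 - (q + 1.0) / Q)) - 1.0 / Q
    return (1.0 - beta) / Q + beta * tur

# Both distributions sum to one, e.g.:
# assert np.isclose(psi_bv(256, 0.4).sum(), 1.0)
```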
4
Segmentation of Blood Vessels
To justify the adaptive model of Eq. (5), Fig. 3 shows how different scanners affect the measurements. These three TOF-MRA slices were acquired for a subject with anemia using a Picker 1.5T Edge MRI scanner with a resolution of 512 × 512 × 93, a subject with parietal lobe hemorrhage using a Signa Horizon GE 1.5T scanner with resolution 512 × 512 × 150, and a normal subject using a state-of-the-art Siemens 3T scanner with resolution 512 × 512 × 125, respectively. The slice thickness is 1 mm in all the cases.
Fig. 3. Three TOF-MRA slices with their empirical distributions femp (q) overlaid with the dominant mixtures p3 (q)
Fig. 4. Estimated marginal densities for the three classes of the MRA slices shown in Fig. 3
Fig. 5. Segmentation results of the proposed approach. Panel annotations: t2 = 187, β = 0.98; t2 = 140, β = 0.18; t2 = 68, β = 0.038; t2 = 125, β = 0.12; t2 = 117, β = 0.08; t2 = 119, β = 0.05.
The models of Eq. (5) were built with the EM-based approach (see [12] for details). Figure 3 presents both the marginal empirical distributions F_emp and the initial 3-component dominant mixtures for them, containing the two Gaussian components and our model of blood vessels in Eq. (4). The estimated parameters β of the latter are 0.92, 0.18, and 0.038 for the slices A, B, and C in Fig. 3, respectively, which reflects the levels of blood turbulence expected from physics-based considerations. The second step of the proposed approach is to model the deviation between the empirical density and the three dominant modes; here we used our previous modified EM algorithm [12] to perform this step. The final results of this approach are shown in Fig. 4. Figure 5 shows the segmentation results using the estimated densities shown in Fig. 4.
5
Validation
It is very difficult to obtain accurate manual segmentations of complete vascular trees to validate our algorithm. To quantitatively evaluate its performance, we created the three 3D phantoms in Fig. 6, with geometrical shapes similar to blood vessels and known ground truth. These three phantoms mimic the bifurcations and the zero and high curvature present in any vascular system, and their changing radii simulate both large and small blood vessels. To make the distributions of these three phantoms similar to MRA images, we first compute the empirical class distributions p(q|bv), p(q|csf), and p(q|bt) from the signals that represent blood vessels, CSF, and brain tissues in MRA images segmented by a radiologist (we have selected 200 images from a data set of over 5000 images of 50 subjects). The phantom signals are then generated using inverse mapping. The resulting phantom histograms are similar to those in Fig. 3.

Fig. 6. Segmentation results of the 3D phantoms "Cylinder" (error 0.18%), "Spiral" (error 1.34%), and "Tree" (error 0.29%); errors are shown in white.
The total segmentation error is evaluated as the percentage of erroneous voxels with respect to the overall number of voxels in the ground truth 3D phantom. Figure 6 shows that the maximum error obtained by the proposed approach is 1.34% and the minimum error is 0.18%. These results confirm the high accuracy of the proposed approach. Our adaptive model thus notably improves the accuracy of segmenting MRA images acquired with different scanners. The conventional approaches either assume a purely laminar blood flow or pre-select a simple parametric distribution in an attempt to account for actual signal features. By contrast, our model is derived from the physical description of the blood flow and thus can accurately handle both normal and abnormal cases. Moreover, the estimated weight β ∈ [0, 1] in Eq. (4) provides a natural measure of the degree of abnormality of the blood flow for a particular subject.
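The phantom signal generation described above amounts to inverse-transform sampling from an empirical class histogram; a minimal sketch of ours (function name assumed):

```python
import numpy as np

def sample_from_empirical(p, size, rng=None):
    """Draw `size` intensities from an empirical class distribution p(q),
    q = 0..Q-1, by inverting its cumulative distribution (inverse mapping)."""
    if rng is None:
        rng = np.random.default_rng(0)
    cdf = np.cumsum(p) / np.sum(p)
    return np.searchsorted(cdf, rng.random(size))  # values in 0..Q-1
```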
6
Conclusions
We presented a new physically justified adaptive probability model of blood vessels on magnetic resonance angiography (MRA) images. It accounts for laminar (normal subjects) and turbulent (abnormal cases like anemia or stenosis) blood flow. The high accuracy of segmenting MRA images with our approach is confirmed by expert radiologists and also validated using special 3D geometrical phantoms. Our present C++ implementation of the algorithm on a single 2.4 GHz Pentium 4 CPU with 512 MB RAM takes about 49 sec to segment 93 TOF-MRA slices of size 512 × 512 pixels each. The proposed model is suitable for segmenting both TOF-MRA and PC-MRA images. Experiments with the latter type were not included in this paper due to space limitations.
References

1. T. McInerney and D. Terzopoulos, "Medical image segmentation using topologically adaptable surfaces," Proc. CVRMED-MRCAS'97, pp. 23–32, 1997.
2. L. M. Lorigo, O. D. Faugeras, W. E. L. Grimson, and R. Keriven, "Curves: Curve evolution for vessel segmentation," Medical Image Analysis, vol. 5, pp. 195–206, 2001.
3. O. Wink, W. J. Niessen, and M. A. Viergever, "Fast delineation and visualization of vessels in 3-D angiographic images," IEEE Trans. Med. Imaging, vol. 19, pp. 337–346, 2000.
4. T. Deschamps and L. D. Cohen, "Fast extraction of tubular and tree 3D surfaces with front propagation methods," Proc. 16th ICPR, pp. 731–734, 2002.
5. P. Yan, A. A. Kassim, "MRA image segmentation with capillary active contour," Proc. MICCAI'05, pp. 51–58, 2005.
6. A. F. Frangi, W. J. Niessen, R. M. Hoogeveen, T. van Walsum, M. V. Viergever, "Model-based quantitation of 3-D magnetic resonance angiographic images," IEEE Transactions on Medical Imaging, vol. 18, pp. 946–956, 1999.
7. S. R. Aylward, E. Bullitt, "Initialization, noise, singularities, and scale in height ridge traversal for tubular object centerline extraction," IEEE Transactions on Medical Imaging, vol. 21, pp. 61–75, 2002.
8. D. L. Wilson and J. A. Noble, "An adaptive segmentation algorithm for time-of-flight MRA data," IEEE Trans. Med. Imaging, vol. 18, pp. 938–945, 1999.
9. A. Chung, J. A. Noble, P. Summers, "Fusing speed and phase information for vascular segmentation of phase contrast MR angiograms," Medical Image Analysis, vol. 6, pp. 109–128, 2002.
10. C. G. Caro et al., The Mechanics of the Circulation, Oxford University Press, 1978.
11. W. F. Ganong, Review of Medical Physiology, McGraw-Hill, 15th edition, 1991.
12. A. A. Farag, A. El-Baz, and G. Gimel'farb, "Precise segmentation of multi-modal images," IEEE Transactions on Image Processing, vol. 15, no. 4, pp. 952–968, April 2006.
Segmentation of Thalamic Nuclei from DTI Using Spectral Clustering

Ulas Ziyan¹, David Tuch², and Carl-Fredrik Westin¹,³

¹ MIT Computer Science and Artificial Intelligence Lab, Cambridge MA, USA, [email protected]
² Novartis Pharma AG, Basel, Switzerland, [email protected]
³ Laboratory of Mathematics in Imaging, Brigham and Women's Hospital, Harvard Medical School, Boston MA, USA, [email protected]
Abstract. Recent work shows that diffusion tensor imaging (DTI) can help resolve thalamic nuclei based on the characteristic fiber orientation of the corticothalamic/thalamocortical striations within each nucleus. In this paper we describe a novel segmentation method based on spectral clustering. We use Markovian relaxation to handle spatial information in a natural way, and we explicitly minimize the normalized cut criterion of the spectral clustering for a better optimization. Using this modified spectral clustering algorithm, we can resolve the organization of the thalamic nuclei into groups and subgroups solely based on the voxel affinity matrix, avoiding the need for explicitly defined cluster centers. The identification of nuclear subdivisions can facilitate localization of functional activation and pathology to individual nuclear subgroups.
1
Introduction
All of the sensory pathways of the human brain (with the exception of the olfactory pathway) project to the cortex via relay neurons in the thalamus. These relay neurons are clustered into discrete clusters or nuclei that can be delineated based on histological or functional criteria. Delineation of the nuclei is important for surgical treatment of the thalamus and for localization of functional brain activation to specific nuclei. Recently, it has been shown that the thalamic nuclei can be resolved using a type of magnetic resonance imaging (MRI) called diffusion tensor imaging (DTI) [1, 2]. DTI measures the molecular diffusion, i.e., Brownian motion, of the endogenous water in tissue. In fibrous biological tissues such as cerebral white matter, water diffusion is anisotropic. In nerve tissue, the diffusion anisotropy is thought to be due to the diffusion barrier presented by the cell membrane, although the physical mechanisms for diffusion anisotropy in nerve tissue are not entirely understood [3]. On DTI, the thalamus shows distinct clusters of diffusion orientation. These clusters correspond in anatomic location and fiber orientation to the thalamic
nuclei [2]. Diffusion orientation thus provides a new anatomic criterion for distinguishing thalamic nuclei. Although DTI can resolve thalamic nuclei, a segmentation algorithm is required to explicitly delineate the nuclei from the DTI data. Several methods have been proposed to resolve the thalamic nuclei, including the use of k-means [2], level-sets [4], and tract reconstructions from manually defined regions on the cortical surface [1]. A weakness of the k-means algorithm is its geometric bias towards ellipsoidal clusters. Further, both k-means and level sets show sensitivity to initialization and susceptibility to local minima. The present report describes a new approach for segmentation of thalamic nuclei using the spectral segmentation algorithm described by Shi and Malik [5], with some modifications. The spectral segmentation algorithm has attracted considerable interest in the pattern recognition community due to its computational simplicity, strong empirical performance, and rich underlying theory [6, 7, 8]. The algorithm is based on a classical result from spectral graph partitioning which relates the second smallest eigenvector of the Laplacian matrix of a graph to optimal partitions. The spectral clustering algorithm has a number of desirable features. For example, the algorithm consists entirely of direct matrix operations and is therefore computationally efficient. Furthermore, the algorithm does not require a geometric representation of the clusters and therefore contains no explicit geometric biases. DTI data from 10 healthy participants were collected and segmented using the modified spectral clustering to demonstrate the feasibility of the proposed method.
2
Theory
Spectral clustering reduces segmentation to a graph partitioning problem. The nodes of the graph are chosen to be the data points, and the links connecting the nodes, commonly referred to as edges, are assigned weights based on the similarities of the data points being linked together. In the case of DTI data, the nodes can be chosen as the diffusion tensors, and the edge weights can be related to the similarities of neighboring tensors. In Section 2.1, we describe the details of the graph construction for DTI data, and in Section 2.2, we describe a procedure to solve the graph partitioning problem via explicit minimization of an objective function defined on the graph. Once the graph partitioning is completed, the algorithm produces a hierarchical tree, from which different levels of segmentation can be extracted.
2.1 Graph Construction
Our approach to nuclei segmentation is to partition the DTI data into compact regions with homogeneous diffusion properties, since similar diffusion properties are likely to be indicative of belonging to the same nucleus. For that purpose we construct a graph in which the nodes within a homogeneous region are well connected and the nodes in different homogeneous regions are not. This
graph is constructed based on the spatial distances between the voxels as well as the tensor dissimilarities. The first step in the construction is to choose an appropriate metric to quantify the tensor dissimilarities. There are a number of possible metrics for this purpose, such as the angular difference between the principal eigenvector directions,

f_1(T_i, T_j) = \arccos(|v_i · v_j|) ,   (1)
where v_i is the principal eigenvector of tensor T_i. The absolute value of the dot product resolves the sign ambiguity of the eigenvectors (if v_1 is an eigenvector, so is -v_1). Another option is to use the full tensor information,

f_2(T_i, T_j) = \sqrt{trace((T_i - T_j)^2)} .   (2)
f_2(T_i, T_j) is explored in several DTI studies under names such as the generalized tensor dot product [9], the Frobenius norm [2], and the Euclidean distance metric [10]. We also experimented with a recently proposed information theoretic measure, the symmetric K-L divergence [11],

f_3(T_i, T_j) = \sqrt{trace(T_i^{-1} T_j + T_j^{-1} T_i) - 2n} ,   (3)
where n = 3 for diffusion tensors. After choosing a tensor dissimilarity metric, we need to incorporate the spatial relations of the voxels into the graph. One way to do so is to linearly combine the tensor dissimilarity metric, f(T_i, T_j), with a spatial distance, s(x_i, x_j), as in [2]:

d((x_i, T_i), (x_j, T_j)) = f(T_i, T_j) + γ s(x_i, x_j) ,   (4)

where γ is a weighting factor to control the trade-off between the tensor dissimilarity and the spatial distance. However, selecting an appropriate weighting factor is complicated by the fact that spatial distances and tensor dissimilarities are inherently different types of features. To avoid this problem, we use Markovian relaxation, which provides a natural way to incorporate the spatial information. This is done by constructing a sparse affinity matrix, W_s, with non-zero similarities only between face neighbors:

W_s(i, j) = \begin{cases} \exp\left(-f(T_i, T_j)^2 / σ^2\right), & \text{if } s(x_i, x_j) ≤ 1 \\ 0, & \text{otherwise.} \end{cases}   (5)

The parameter σ is chosen to be the sample variance of f(T_i, T_j) for neighboring voxels. We then calculate the full affinity matrix, W, using Markovian relaxation [12]. Specifically, we first convert the sparse weight matrix into a one-step transition probability matrix whose rows and columns add up to one:

P_1(i, j) = \frac{1}{\max_l d(l)} × \begin{cases} W_s(i, j), & \text{if } i ≠ j \\ \max_l d(l) - d(i), & \text{if } i = j , \end{cases}   (6)
where d is a vector containing the row sums of W_s, d(i) = \sum_j W_s(i, j). The diagonal adjustment ensures that the underlying random walk has a uniform steady state distribution, and therefore every part of the corresponding Markov field is explored at the same speed during the relaxation. Once the one-step transition probability matrix is computed, it is possible to calculate the n-step transition probabilities through Markovian relaxation, which is equivalent to raising the one-step transition probability matrix to the power n: P_n = P_1^n [12]. Here n is chosen to be the smallest integer that produces a non-zero transition probability between every pair of nodes. Finally, the diagonal elements of P_n are set to zero to produce the full affinity matrix W.
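Putting Eqs. (5) and (6) and the relaxation P_n = P_1^n together, a dense toy implementation might look like this. It is our sketch, not the authors' code: the O(n²) neighbour search is for illustration only, and a connected neighbourhood graph is assumed so the relaxation loop terminates.

```python
import numpy as np

def markov_affinity(tensors, coords, f):
    """Sketch of Eqs. (5)-(6) and the relaxation P_n = P_1^n.
    tensors : sequence of 3x3 diffusion tensors
    coords  : (n, 3) integer voxel coordinates
    f       : tensor dissimilarity, e.g. f_1 of Eq. (1)
    """
    n = len(tensors)
    dist = np.zeros((n, n))
    neigh = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1, n):
            if np.abs(coords[i] - coords[j]).sum() == 1:   # face neighbours
                neigh[i, j] = neigh[j, i] = True
                dist[i, j] = dist[j, i] = f(tensors[i], tensors[j])
    sigma = dist[neigh].var()          # per §2.1: sample variance of f
    Ws = np.where(neigh, np.exp(-dist**2 / sigma**2), 0.0)       # Eq. (5)
    # Eq. (6): one-step transitions with a uniform steady state.
    d = Ws.sum(axis=1)
    P1 = (Ws + np.diag(d.max() - d)) / d.max()
    # Relax until every transition probability is non-zero
    # (terminates only for a connected graph).
    Pn = P1.copy()
    while np.any(Pn == 0):
        Pn = Pn @ P1
    np.fill_diagonal(Pn, 0.0)
    return Pn                          # the full affinity matrix W
```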
2.2 Graph Cuts
Once the graph is constructed, one can calculate the k-way graph partition using the Normalized Cuts criterion. Formally, the problem is now reduced to the following: given a graph G = (V, E) and its edge weight matrix W = {ω_ij}, find a set of k disjoint sets V_1, V_2, ..., V_k, with \bigcup_{i=1}^{k} V_i = V, which minimizes the normalized cut (NCut):

NCut(V_1, V_2, ..., V_k) = \sum_{i=1}^{k} \frac{asso(V_i, V) - asso(V_i, V_i)}{asso(V_i, V)} ,   (7)
where asso(A, B) = \sum_{i ∈ A, j ∈ B} ω_ij [5]. Finding the normalized cuts, even for the case k = 2, is NP-complete due to the combinatorial nature of the discrete permutations; therefore, we need a polynomial time algorithm that provides a reasonable solution. Shi and Malik suggest a number of possibilities for such an algorithm. In this paper, we use a combination of the methods suggested. Specifically, we first normalize the weight matrix: M = D^{-1} W, where D is a diagonal matrix with D(i, i) = \sum_j ω_ij. We then calculate the two largest eigenvectors of the normalized weight matrix, M. The largest eigenvector is a vector of constants due to the normalization and is therefore discarded. We then consider cutting W into two clusters by thresholding the second eigenvector of M at every possible threshold level, looking for the 2-way cut with the minimum NCut value. Once the cut is made, we repeat this process for each of the two newly created clusters, until a very high threshold of a 2-way NCut value is reached. This threshold is typically close to 1, which is the level at which every point is clustered by itself regardless of the weights. This splitting is followed by a greedy merging algorithm, minimizing NCut at every merging step, to reduce the number of clusters to k. During this merging, the algorithm produces a hierarchical tree, from which different levels of segmentation can be extracted by varying k. At the final stage of our algorithm we allow single data node swaps between clusters to further reduce the NCut value. At every iteration of this stage, the algorithm considers the possibility of reassigning a single node to another cluster. If the resulting cut has a smaller NCut value than before, that node is reassigned to its new cluster. The algorithm stops once no more nodes can be reassigned.
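A single 2-way split of this recursive procedure can be sketched as follows (our illustration; a dense eigendecomposition stands in for whatever solver was actually used):

```python
import numpy as np

def ncut_value(W, a):
    """NCut of Eq. (7) for a 2-way partition given by boolean mask a."""
    total = 0.0
    for part in (a, ~a):
        asso_all = W[part].sum()                 # asso(V_i, V)
        asso_in = W[np.ix_(part, part)].sum()    # asso(V_i, V_i)
        total += (asso_all - asso_in) / asso_all
    return total

def ncut_bipartition(W):
    """Threshold the second largest eigenvector of M = D^{-1} W at the
    level minimizing the NCut of Eq. (7), as described in §2.2."""
    d = W.sum(axis=1)
    M = W / d[:, None]                           # row-normalized weights
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(-vals.real)
    v2 = vecs[:, order[1]].real                  # second largest eigenvector
    best_cut, best_ncut = None, np.inf
    for t in np.unique(v2)[:-1]:                 # every possible threshold
        a = v2 <= t
        nc = ncut_value(W, a)
        if nc < best_ncut:
            best_cut, best_ncut = a, nc
    return best_cut, best_ncut
```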
3 Methods

3.1 Image Acquisition and Pre-processing
DTI data were acquired using a twice-refocused spin-echo EPI sequence [13] on a 3 Tesla Siemens Trio MRI scanner using an 8-channel head coil. The sequence parameters were TR/TE = 8400/82 ms, b = 700 s/mm², gmax = 26 mT/m, 10 T2 images, 60 diffusion gradient directions, 1 average, total acquisition time 9 min 59 s. The field-of-view was 256 × 256 mm and the matrix size was 128 × 128 to give 2 × 2 mm in-plane resolution. The slice thickness was 2 mm with 0 mm gap. Correction for motion and residual eddy current distortion was applied by registering all of the scans to the first acquired non-diffusion-weighted scan for each participant. The registration used a 12 degree-of-freedom global affine transformation and a mutual information cost function [14]. Trilinear interpolation was used for the resampling. The diffusion tensor, the tensor eigensystem, and the FA metric were calculated for each voxel using the formulas of [15] and [16]. The diffusion tensor and FA volumes were normalized to MNI space (Montreal Neurological Institute) by registering each participant's T2 volume to a skull-stripped version of the MNI 152-subject T2 template [17] and then applying the transformation to the diffusion tensor and FA volumes with a 12 degree-of-freedom global affine transformation and a mutual information cost function [14]. The registration transformation was then applied separately to the FA and tensor volumes. The FA volume was resampled using trilinear interpolation and the tensor volume was resampled using nearest neighbor interpolation. The tensors were reoriented using the rotational portion of the atlas transformation.
3.2 Segmentation
Thalamus masks were drawn manually for each individual by a trained neuroanatomist. The masks were drawn for each hemisphere on each individual's MNI-normalized FA map. Each hemisphere was further segmented into its seven nuclei on the corresponding tensor map by the neuroanatomist. The masked thalamus DTI data were later segmented separately for each hemisphere using the spectral algorithm described in Section 2.
4
Results
For all thalamic data sets, spectral reordering of the voxel affinity matrix revealed significant clustering structure (Figure 1D). The network created by the algorithm provides the information necessary for the clustering, and the spectral ordering followed by the modified spectral clustering algorithm was able to identify the clusters. These clusters are presented in Figure 2 (right) for one of the individuals, along with the expert labels (left). The automatically segmented clusters are similar in the two hemispheres, which were processed independently, and they match well with the expert segmentation. All 10 subjects were segmented using the modified spectral clustering algorithm at two different levels (of the hierarchical tree), with k = 7 and k = 12. The
Fig. 1. A schematic outline of spectral segmentation algorithm. (A) A single slice tensor data (B) Initial graph corresponding to W s (C) Unordered W (D) Ordered and clustered W (E) Clusters on the slice (F) Clusters in 3D.
Fig. 2. Left: 3D rendering of expert segmentation of both hemispheres from one subject. Right: the same subject segmented by spectral clustering with k = 12. Labeled nuclei: Ant, MD, VL/VP, LGN, MGN, IL/CM, Pu.
experiment is repeated with W s and W to quantify the improvement achieved by the Markovian relaxation. The resulting clusters are then named in accordance with the expert labels. At this stage several clusters are allowed to inherit the same labels for a quantitative comparison. These final segmentations are then compared with the expert labels using the Dice volume overlap measure [18]. Mean and standard deviation of the volume overlaps from the 10 subjects are presented in Table 1.
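The Dice overlap [18] used for Table 1 can be computed per label as in the following sketch of ours (scaled to 0–100 as in the table):

```python
import numpy as np

def dice(a, b, label):
    """Dice volume overlap between two label volumes for one label."""
    a_mask, b_mask = (a == label), (b == label)
    inter = np.logical_and(a_mask, b_mask).sum()
    return 200.0 * inter / (a_mask.sum() + b_mask.sum())
```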
Table 1. Dice volume overlap comparison of spectral clustering results with expert labels for three tensor distance metrics f_1, f_2, f_3. The experiment is repeated with the sparse affinity matrix, W_s, and with the product of Markovian relaxation, W, at clustering levels of k = 7 and k = 12. A score of 100 would indicate a perfect match.

                W_s, k = 7        W, k = 7         W_s, k = 12       W, k = 12
Tensor Dist.   f1   f2   f3     f1   f2   f3     f1   f2   f3     f1   f2   f3
Mean          75.3 60.6 62.9   79.6 67.2 65.9   80.5 68.3 68.6   83.4 72.7 74.2
St. Dev.       5.7  4.9  5.0    5.6  4.8  5.6    4.6  5.4  3.9    4.1  4.9  5.1

5
Conclusions
In this paper, we presented a modified spectral clustering method for segmenting the thalamic nuclei using DTI data. Our method can automatically identify nuclear groups and subgroups solely based on the voxel affinity, without any prior knowledge of cluster centers. Our experiments indicate that angular distances between principal eigenvector directions of tensors produce more consistent segmentations than full tensor based measures.
Acknowledgements. The authors would like to thank Jonathan J Wisco for drawing the thalamus masks and assigning anatomical labels to the clusters. This work was supported by NINDS NS46532, NCRR RR14075, NCI CA09502, P41-RR13218 (NAC), GlaxoSmithKline, the Athinoula A. Martinos Foundation, the Mental Illness and Neuroscience Discovery (MIND) Institute, and the National Alliance for Medical Image Computing (NAMIC) (NIBIB U54 EB005149), which is funded through the NIH Roadmap for Medical Research.
References [1] Behrens, T.E., Johansen-Berg, H., Woolrich, M.W., Smith, S.M., WheelerKingshott, C.A., Boulby, P.A., Barker, G.J., Sillery, E.L., Sheehan, K., Ciccarelli, O., Thompson, A.J., Brady, J.M., Matthews, P.M.: Non-invasive mapping of connections between human thalamus and cortex using diffusion imaging. Nat Neurosci 6(7) (2003) 750–757 [2] Wiegell, M.R., Tuch, D.S., Larsson, H.B., Wedeen, V.J.: Automatic segmentation of thalamic nuclei from diffusion tensor magnetic resonance imaging. Neuroimage 19 (2003) 391–401 [3] Beaulieu, C.: The basis of anisotropic water diffusion in the nervous system - a technical review. NMR Biomed 15(7-8) (2002) 435–455 [4] Jonasson, L., Hagmann, P., Richero Wilson, C., Bresson, X., Pollo, C., Meuli, R., Thiran, J.P.: Coupled, region based level sets for segmentation of the thalamus and its subnuclei in DT-MRI. In: ISMRM. (2005) 731
[5] Shi, J., Malik, J.: Normalized cuts and image segmentation. PAMI 22(8) (2000) 888–905
[6] Higham, D., Kibble, M.: A unified view of spectral clustering. Mathematics Research Report (2004) 1–17
[7] Kannan, R., Vempala, S., Vetta, A.: On clusterings: Good, bad and spectral. J ACM 51(3) (2004) 497–515
[8] Weiss, Y.: Segmentation using eigenvectors: A unifying view. In: ICCV. (1999) 975
[9] Jones, D.K., Horsfield, M.A., Simmons, A.: Optimal strategies for measuring diffusion in anisotropic systems by magnetic resonance imaging. MRM 42(3) (1999) 515–525
[10] Alexander, D.C., Gee, J.C., Bajcsy, R.: Similarity measures for matching diffusion tensor images. In: BMVC. (1999)
[11] Wang, Z., Vemuri, B.C.: An affine invariant tensor dissimilarity measure and its applications to tensor-valued image segmentation. In: CVPR (1). (2004) 228–233
[12] Tishby, N., Slonim, N.: Data clustering by Markovian relaxation and the information bottleneck method. In: NIPS. (2000) 640–646
[13] Reese, T.G., Heid, O., Weisskoff, R.M., Wedeen, V.J.: Reduction of eddy-current-induced distortion in diffusion MRI using a twice-refocused spin echo. MRM 49(1) (2003) 177–182
[14] Jenkinson, M., Bannister, P., Brady, M., Smith, S.: Improved optimization for the robust and accurate linear registration and motion correction of brain images. Neuroimage 17(2) (2002) 825–841
[15] Basser, P.J., Mattiello, J., LeBihan, D.: Estimation of the effective self-diffusion tensor from the NMR spin echo. J Magn Reson B 103(3) (1994) 247–254
[16] Pierpaoli, C., Basser, P.J.: Toward a quantitative assessment of diffusion anisotropy. MRM 36(6) (1996) 893–906
[17] Mazziotta, J.C., Toga, A.W., Evans, A., Fox, P., Lancaster, J.: A probabilistic atlas of the human brain: theory and rationale for its development. The International Consortium for Brain Mapping (ICBM). Neuroimage 2(2) (1995) 89–101
[18] Dice, L.R.: Measures of the amount of ecologic association between species. Ecology 26 (1945) 297–302
Multiclassifier Fusion in Human Brain MR Segmentation: Modelling Convergence

Rolf A. Heckemann¹, Joseph V. Hajnal¹, Paul Aljabar², Daniel Rueckert², and Alexander Hammers³

¹ Imaging Sciences Department, MRC Clinical Sciences Centre, Imperial College at Hammersmith Hospital Campus, London, UK, [email protected]
² Department of Computing, Imperial College London, UK
³ Division of Neuroscience and Mental Health, MRC Clinical Sciences Centre, Imperial College at Hammersmith Hospital Campus, London, UK
Abstract. Segmentations of MR images of the human brain can be generated by propagating an existing atlas label volume to the target image. By fusing multiple propagated label volumes, the segmentation can be improved. We developed a model that predicts the improvement of labelling accuracy and precision based on the number of segmentations used as input. Using a cross-validation study on brain image data as well as numerical simulations, we verified the model. Fit parameters of this model are potential indicators of the quality of a given label propagation method or the consistency of the input segmentations used.
1
Introduction
Established methods for segmenting magnetic resonance (MR) images of the human brain rely either on automatic tissue classification or on manual delineation of anatomical regions. Automating anatomical segmentation would enable exciting new image analysis applications in diagnostic radiology and imaging research, such as brain volumetry on cohorts. Given an unsegmented target image and an atlas (a reference MR image with a corresponding set of manually generated labels), estimating spatial anatomical correspondence between the image pair makes it possible to adapt the segmentation of the atlas to the target image [1,2,3]. This process is sometimes referred to as label propagation. Errors in the reference (atlas) segmentation and the anatomical correspondence estimate affect the accuracy of the propagated label set. The precision of the method (i.e., agreement between propagated labels from multiple atlases onto a target) depends on the consistency with which the different atlases have been segmented.
Rolf Heckemann was supported by a research fellowship from the Dunhill Medical Trust. The authors would like to thank the National Society for Epilepsy, Chalfont St Peter, Buckinghamshire, UK, for making available the original 30 MRI datasets used in the creation of the atlases. This work was funded by the UK EPSRC as part of the IXI Project.
By combining multiple segmentations of a single target using vote rule decision fusion [4], random errors in the atlas segmentation and registration tend to cancel, resulting in an improved target segmentation. This has been demonstrated for confocal microscopy images of bee brains [5] as well as for MR images of human brains (albeit with large atlas regions) [1]. Decision fusion has also been applied successfully to a method of parcellation of the human cortex [6]. The method can also be refined by differential weighting of the input classifiers based on expectation-maximization algorithms [7,8]. The aim of this work was to investigate the relationship between the number of equally weighted input classifiers and the segmentation improvement achieved. We hypothesized that the rate of convergence with increasing atlas numbers can be modelled using a characteristic equation. By fitting model parameters based on limited input data, the maximum achievable segmentation accuracy as well as the quality of the registration can be determined. We carried out a leave-one-out cross-validation study on 30 expertly segmented human brain MR image volumes. The effect of the input data upon the fitted model was explored using simulations.
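Equally weighted vote rule decision fusion reduces, for label maps, to a per-voxel majority vote. A minimal sketch of ours (breaking ties by the lowest label code is an arbitrary choice, not the paper's specification):

```python
import numpy as np

def vote_fusion(label_volumes):
    """Per-voxel vote rule fusion of propagated label volumes: the modal
    label at each voxel wins; ties go to the lowest label code."""
    stack = np.stack(label_volumes)          # shape (n, X, Y, Z)
    codes = np.arange(int(stack.max()) + 1)
    votes = np.stack([(stack == c).sum(axis=0) for c in codes])
    return codes[votes.argmax(axis=0)]       # consensus label volume
```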
2
Method
Data. Three-dimensional T1-weighted MR volumes were available from 30 normal volunteers, age range 20–54 years, median age 30.5 years, 15 male, 15 female, 25 strongly right-handed, 5 non-right-handed. Each data set was accompanied by a set of labels in the form of an annotated image volume, where every voxel was coded as one of 67 anatomical structures (or background, code 0). These labels had been prepared using a protocol for manually outlining anatomical structures on two-dimensional sections from the image volume. The protocol used was an augmented version of a published prototype [9].

Image Registration and Label Propagation. Every subject was paired with every other subject for image registration. Intracranial structures were extracted from the target MR images using "BET" [10]. All image pairs were aligned using 3D voxel-based registration, maximizing normalized mutual information [11] in three steps. Rigid and affine registration corrected for global differences. In the third, nonrigid step, alignment of details in the image pair was achieved by manipulating a free-form deformation represented by displacements on a grid of control points blended using cubic B-splines [12]. The spacing of control points defines the local flexibility of the nonrigid registration. It was carried out in a multi-resolution fashion using successive control point spacings of 20 mm, 10 mm, 5 mm and 2.5 mm. The final transformation was applied to the atlas labels using nearest-neighbor interpolation, generating 29 propagated label volumes for each target individual. In addition, we generated propagated label volumes based on less detailed registration, using the intermediate (rigid, affine and nonrigid with a control point spacing of 20 mm) transformation output to propagate the label volumes.

Segmentation Comparison. As a measure of agreement between two segmentations of a target, we used the similarity index (SI). For a pair of labels, it
is defined as the ratio between the volume of the intersection (overlap) and the mean volume of the pair of labels in the same coordinate space [13]. To compare full label sets, we calculated the mean SI over all structures:

SI_m = \frac{1}{67} \sum_{k=1}^{67} \frac{2\, n(L_a^k ∩ L_b^k)}{n(L_a^k) + n(L_b^k)}   (1)
L_{a,b}^k: labels compared; k: structure code; n: number of voxels. The measure ranges from 0 (for label sets that have no overlapping regions) to 1 (for label sets that are identical).

Decision Fusion Model. Label volumes represent classifiers that assign a structure label to every voxel in the corresponding MR image volume. To combine the information from multiple individual propagated label volumes into a consensus segmentation, the classifiers were fused on a per-voxel basis using vote rule decision fusion as described by Kittler et al. [4]. For each of the 30 target brain images, we created consensus (fused) segmentations from subsets of propagated label volumes of varied sizes (n = {3, 5, 7, ..., 29}). Where possible (for n ≤ 13), we used multiple non-overlapping subsets. Individually propagated segmentations and fused label volumes were then compared with the manual label volume to determine the behaviour of SI_m as a function of n. Similarly, fused label volumes from independent sets of classifiers were compared with each other. As the number of input label volumes increases, the SI values are expected to increase from an initially low level resulting from the combined effects of both systematic and random errors, towards an asymptotic value that reflects only the systematic errors. Assuming a Gaussian distribution for SI, we expect it to evolve with the number (n) of classifiers fused according to the secular equation:
(2)
where a and b are parameters to be determined. Parameter a determines the asymptotic upper limit and reflects systematic differences between the labels being compared, whereas b is related to the random variability of the propagated labels. A small value of a implies consistent labelling. Numerical Simulations. Numerical simulations were performed using a twodimensional model as follows: A filled circle of radius 10 mm in an image matrix with 1 mm 1 mm pixels was defined to represent a ground truth label Iorig . A random free-form deformation was applied by overlaying a grid of control points with 8 mm spacing and displacing these control points. The displacements were drawn from a Gaussian distribution with a mean of zero and a standard deviation of σsys . The resulting label Isys represents an approximation to the ideal circular label containing systematic error. The model label, Isys , was then deformed multiple times with grid-point displacements drawn from a different Gaussian distribution with mean zero and standard deviation σrand , producing
an ensemble of shapes that contained random errors, representing propagated labels. In a third step, independent ensembles were fused to create labels (I_fused^n) that represented estimates of the original, circular label, subject to systematic and random error. The agreement (as measured by SI) of individual and fused labels with the original gold standard label and with other fused labels was then investigated to explore the behaviour of SI as a function of the number of classifiers fused.
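Fitting the two parameters of Equation 2 to observed (n, SI_m) pairs is a routine nonlinear least-squares problem; a sketch using scipy.optimize.curve_fit follows, with illustrative numbers in place of the study's measurements.

```python
import numpy as np
from scipy.optimize import curve_fit

def si_model(n, a, b):
    # Equation 2: SI_m(n) = 1 - a - b / sqrt(n)
    return 1.0 - a - b / np.sqrt(n)

# Illustrative data only, not the values measured in this study.
n_obs = np.array([3, 5, 7, 9, 11, 13, 21, 29])
si_obs = np.array([0.790, 0.800, 0.810, 0.815, 0.820, 0.823, 0.830, 0.836])

(a_hat, b_hat), _ = curve_fit(si_model, n_obs, si_obs, p0=(0.15, 0.10))
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}")  # 1 - a is the asymptote
```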
3
Results
MR Data. The accuracy of individual propagated label sets, as measured by their mean SI_m with the manually generated target label set, was 0.754 (range 0.692–0.799, SD = 0.016, n = 870). Compared to this baseline result, decision fusion consistently improved the level of agreement with the manual sets. The maximum accuracy was achieved using 29 classifiers: 0.836 (range 0.820–0.853, SD = 0.009, n = 30). The relationship between the number of input classifiers and the agreement level was described by Equation 2, with 95 % confidence intervals not exceeding the uncertainty of the individual data points. Fit parameters were a = 0.144 and b = 0.10 (see Fig. 1, fused-manual).

The precision of segmentation fusion, as measured by mean SI_m between independently generated fused label sets, was also dependent on the number of input classifiers for each fused set. For three classifiers, SI_m was 0.812 (SD = 0.005, n = 270). Adding classifiers resulted in progressive improvement of this precision measure, up to 0.908 (SD = 0.003, n = 30) for subsets containing the maximum possible odd number of 13 independent classifiers (see Fig. 1, fused-fused). Again, the model described the curve appropriately, albeit with a wider confidence margin. Parameter estimates were a = 0.016 and b = 0.29.

Model parameters were dependent on the level of detail considered by the image registration algorithm underlying the label propagation process. The model fit for coarse registrations resulted in higher values of the a parameter, indicating that the maximum achievable agreement level for infinite classifier numbers is lower. The b parameter was also higher, indicating that the approach to convergence as classifier numbers increase is slower when coarser registration is used (Fig. 2).

Simulated Label Data. The SI between the fused and the gold standard label was well approximated by the model for nearly all values of σ_sys and σ_rand. The model fit was worse for SI calculated between two fused labels, reproducing the finding in the experimental data. Fig. 3 shows a plot of SI versus n, where the random parameters were σ_sys = 4 and σ_rand = 5. The variation of the a and b parameters with σ_rand is shown in Fig. 4. Parameter b was found to be linear over the range of values studied. Parameter a was negative for low values of σ_rand, indicating a failure of the Gaussian assumption as the SI values approached the limiting value of 1. For multiple simulated labels, the distribution of SI values passed a Kolmogorov-Smirnov normality test (α = 0.01, n = 100) once the level of systematic error (σ_sys) exceeded 1 mm.
Fig. 1. SI vs. number of subjects in subset. Parameters a and b are shown as determined by model fitting. Error bars indicate standard deviation. Dashed lines indicate 95 % confidence intervals of the model fit. Agreement between fused labels is high with a poorer model fit.
Fig. 2. Fused-to-fused comparisons: SI vs. number of subjects per subset (n) for different registration strategies. The a and b parameters are shown as determined by nonlinear model fitting.
4
Discussion and Conclusions
When multiple propagated segmentations of a brain image are regarded as classifiers and combined using a suitable decision fusion algorithm, the resulting fused segmentation can be more accurate than any of the constituent segmentations, as random errors in the source labelling as well as in the propagation process tend to cancel each other out. The result will still be subject to systematic bias arising from the input data and the process itself.

Fig. 3. Plots of SI versus n for various comparisons of simulated labels. The comparison between I_fused and I_orig is well fitted by the model. Overlaps between I_sys and I_orig are shown for comparison. Random parameters: σ_sys = 4 and σ_rand = 5.

Fig. 4. Fit parameters a and b in simulated data. Both parameters increased with increasing inconsistency of individual classifiers (as parameterized by σ_rand).

We investigated the relationship between the number of classifiers used to form a fused segmentation and the accuracy of the result, and found a model that describes this relation. By performing parameter fitting, it was possible to quantify the impact of the scale of the transformation used to propagate individual brain segmentations (affine, coarse nonrigid, fine nonrigid). The model thus provides a possible quality measure.

In Fig. 2, all the methods of label propagation show increasing precision with increasing n. Precision is thus not in itself a guarantee of appropriate labelling: fusion of many labels without any registration will lead to convergence to a mean label, but not a useful one. It is notable, however, that the approach to consistency with the number of fused labels was slower for less detailed registration methods. The b value was largest when no registration was employed and smallest when the nonrigid registration with the smallest control point spacing was used; b may therefore represent a means of comparing the accuracy of different registrations.

As Fig. 1 shows, the data from fused label propagation on the human brain were consistent with the model described by Equation 2. For comparisons between independent fused label volumes, the a parameter of the model fit was much smaller, indicating that the averaging that takes place in the label fusion process leads to highly precise (if not necessarily unbiased) results. The simulations reproduced all key features of the experimental results: the b parameter correlates with σ_rand (Fig. 4), showing that it can be used as an indication of the precision of the segmentations fused in the experiment. The a parameter describes the accumulated systematic error that cannot be eliminated by considering further classifiers in the fusion process. We conclude that the model (Equation 2) describes the core behaviour well.

The appropriateness of the model depends on the assumption that SI values produced for fused labels are normally distributed. Formally, this assumption cannot be correct, since SI has both upper and lower bounds. Nevertheless, SI values produced by random perturbations of the label boundary passed a test of consistency with a normal distribution, which makes the use of the chosen model plausible. Our results suggest that the model can be used to estimate the veracity of fused propagated labels in the absence of a gold standard segmentation. In future work, we plan to assess the usability of the model for assessing segmentation quality when using fused label propagation in clinical and scientific application scenarios.
References

1. Svarer, C., Madsen, K., Hasselbalch, S.G., Pinborg, L.H., Haugbol, S., Frokjaer, V.G., Holm, S., Paulson, O.B., Knudsen, G.M.: MR-based automatic delineation of volumes of interest in human brain PET images using probability maps. Neuroimage 24(4) (2005) 969–979
2. Iosifescu, D.V., Shenton, M.E., Warfield, S.K., Kikinis, R., Dengler, J., Jolesz, F.A., McCarley, R.W.: An automated registration algorithm for measuring MRI subcortical brain structures. Neuroimage 6(1) (1997) 13–25
3. Fischl, B., Salat, D.H., Busa, E., Albert, M., Dieterich, M., Haselgrove, C., van der Kouwe, A., Killiany, R., Kennedy, D., Klaveness, S., Montillo, A., Makris, N., Rosen, B., Dale, A.M.: Whole brain segmentation: automated labeling of neuroanatomical structures in the human brain. Neuron 33(3) (2002) 341–355
4. Kittler, J., Hatef, M., Duin, R.P.W., Matas, J.: On combining classifiers. IEEE Trans Pattern Analysis and Machine Intelligence 20(3) (1998) 226–239
5. Rohlfing, T., Brandt, R., Menzel, R., Maurer, C.R.: Evaluation of atlas selection strategies for atlas-based image segmentation with application to confocal microscopy images of bee brains. Neuroimage 21(4) (2004) 1428–1442
6. Klein, A., Mensh, B., Ghosh, S., Tourville, J., Hirsch, J.: Mindboggle: Automated brain labeling with multiple atlases. BMC Med Imaging 5(1) (2005)
7. Warfield, S.K., Zou, K.H., Wells, W.M.: Simultaneous truth and performance level estimation (STAPLE): an algorithm for the validation of image segmentation. IEEE Trans Med Imaging 23(7) (2004) 903–921
8. Rohlfing, T., Russakoff, D.B., Maurer, C.R.: Performance-based classifier combination in atlas-based image segmentation using expectation-maximization parameter estimation. IEEE Trans Med Imaging 23(8) (2004) 983–994
9. Hammers, A., Allom, R., Koepp, M.J., Free, S.L., Myers, R., Lemieux, L., Mitchell, T.N., Brooks, D.J., Duncan, J.S.: Three-dimensional maximum probability atlas of the human brain, with particular reference to the temporal lobe. Hum Brain Mapp 19(4) (2003) 224–247
10. Smith, S.M., Jenkinson, M., Woolrich, M.W., Beckmann, C.F., Behrens, T.E., Berg, J.H., Bannister, P.R., De Luca, M., Drobnjak, I., Flitney, D.E., Niazy, R.K., Saunders, J., Vickers, J., Zhang, Y., De Stefano, N., Brady, J.M., Matthews, P.M.: Advances in functional and structural MR image analysis and implementation as FSL. Neuroimage 23 Suppl 1 (2004)
11. Studholme, C., Hill, D.L.G., Hawkes, D.J.: An overlap invariant entropy measure of 3D medical image alignment. Pattern Recognition 32(1) (1999) 71–86
12. Rueckert, D., Sonoda, L.I., Hayes, C., Hill, D.L., Leach, M.O., Hawkes, D.J.: Nonrigid registration using free-form deformations: application to breast MR images. IEEE Trans Med Imaging 18(8) (1999) 712–721
13. Zijdenbos, A.P., Dawant, B.M., Margolin, R.A., Palmer, A.C.: Morphometric analysis of white matter lesions in MR images: method and validation. IEEE Trans Med Imaging 13(4) (1994)
Active Surface Approach for Extraction of the Human Cerebral Cortex from MRI

Simon F. Eskildsen and Lasse R. Østergaard

Dept. of Health Science and Technology, Aalborg University, Denmark

Abstract. Segmentation of the human cerebral cortex from MRI has been the subject of much attention during the last decade. Methods based on active surfaces for representing and extracting the cortical boundaries have shown promising results. We present an active surface method that extracts the inner and outer cortical boundaries using a combination of different vector fields and a local weighting method based on the intrinsic properties of the deforming surface. Our active surface model deforms polygonal meshes to fit the boundaries of the cerebral cortex using a force balancing scheme. As a result of the local weighting strategy and a self-intersection constraint, the method is capable of modelling tight sulci where the image edge is missing or obscured. The performance of the method is evaluated using both real and simulated MRI data.
1
Introduction
During the last decade, several methods for extracting the boundaries of the human cerebral cortex from magnetic resonance imaging (MRI) have been proposed [1,2,3,4,5,6,7,8]. Segmentation of the cerebral cortex may facilitate extraction of important anatomical features, such as the cortical thickness, which may be utilised in studying the progression of a wide range of neurodegenerative diseases and, in turn, may aid in diagnosing these diseases. Furthermore, anatomical models of the cortex may be useful in connection with surgery simulation, preoperative planning, and postoperative evaluation.

The human cerebral cortex is a complex, highly convolved sheet-like structure. In MRI the cortical boundaries are often obscured or partly missing because of poor contrast, noise, inhomogeneity artifacts and partial volume averaging originating from the acquisition. Opposite banks of tight sulci on the outer boundary may meet inside the sulcal folds and appear as connected in MRI. Active surfaces have the ability to compensate for obscured and incomplete image edges. However, in brain MRI, information on the outer cortical boundary may be completely missing in several tight sulci. The most promising methods for delineating the outer boundary use information of the white matter/grey matter (WM/GM) boundary to fit the surface to the outer cortical boundary.

MacDonald et al. used a coupled surface approach, where the inner and outer surface were deformed simultaneously under proximity constraints maintaining a predefined minimum and maximum distance between the inner and outer boundary [6]. Zeng et al. also used the coupled surfaces approach, in a level set framework [3]. The coupled surfaces approach has the advantage of explicitly using information of both
cortical boundaries to detect the outer boundary. This solves the problem of penetrating the deep narrow sulci. The drawbacks are the computational expense and the constraint of a predefined distance, which may prevent the detection of abnormally thin or thick areas of the cortex. Kim et al. proposed a modification to the method by MacDonald et al. which does not contain a coupled surface constraint [8]; this method has shown promising results. Another approach, by Dale et al., identified the inner cortical boundary and expanded this surface towards the outer boundary [4]. This has the advantage that all sulci are present in the initial state, and enables the preservation of the sulci during deformation, even though the evidence of the outer boundary may be missing in the MRI data. The tight sulcal folds are modelled by preventing self-intersections in the deforming surface; thus the delineation of the folds is placed equidistant from the sulcal walls of the inner boundary. This single surface approach is fast and captures all the tight sulci. However, the expansion of the surface towards the outer boundary is sensitive to small errors or irregularities in the initial surface, which may lead to modelling of non-existent sulci. Xu et al. used a Generalised Gradient Vector Flow (GGVF) to define a direction toward the central layer of the cortex [2]. This solution provided fast and consistent convergence of the surface, but tight sulci with no evidence of the outer boundary were not captured by this method. Recent work by Han et al. expands the surface from the central layer toward the outer boundary using a topology-preserving geometric deformable model [7]. In this approach the GGVF is only included in the model when reconstructing the central cortical layer.

This paper presents an active surface approach for cortex extraction characterised by the inclusion of a GGVF in the extraction of the outer cortical boundary and the use of a local weighting strategy based on the intrinsic properties of the deforming surface.
2
Methods
The strategy for reconstructing the cerebral cortex is to first extract the inner boundary, and then displace this surface towards the outer boundary under the influence of internal and external forces. The inner boundary is extracted using the method described in our earlier work [9], ensuring a surface topology of a sphere. The following explains the deformation that fits a surface to the outer cortical boundary, starting from a surface estimating the inner boundary.

2.1 Deformation Process
The active surface is a non-parametric triangular mesh. The surface is deformed by iteratively updating each vertex with a vector defined as the sum of the deformation forces. This deformation scheme has the advantage of being fast (O(n)) and eliminates the problems regarding granularity that are found in discrete methods. Even though convergence may be fast, absolute equilibrium is never reached, due to the iterative nature of the algorithm. Therefore, a threshold for the update vector is given that defines whether or not a vertex has moved during an
iteration. The stop criterion is met when a sufficiently small number of vertices are displaced during an iteration.

During surface deformation the surface is remeshed at prespecified intervals using a simple mesh adaption algorithm. The remeshing is based on the vertex density of the surface. This is done to avoid clustering of vertices and to allow the surface to expand where necessary, i.e. the distribution of vertices is kept uniform throughout the surface. The surface remeshing algorithm does not change the topology of the surface, but is allowed to alter the surface geometry. Finally, the surface is prevented from self-intersecting during deformation using the same principle as described in [4].

2.2 Internal Forces
Internal forces are applied to keep the vertices well-distributed and achieve a smooth characteristic of the surface. The internal forces used in this paper are similar to conventional smoothing forces [1,5] in the form of a tensile and a flexural force. The tensile force is calculated by an approximation of the Laplacian [5]:

L(i) = \frac{1}{m} \sum_{j \in N(i)} (x(j) - x(i)),    (1)
where x(i) is the position of vertex i, N(i) are the neighbour vertices of i, x(j) is the position of i's neighbour j, and m is the number of vertices in N(i). The flexural force is calculated by an approximation of the squared Laplacian [5]:

L^2(i) = \frac{1}{m} \sum_{j \in N(i)} (L(x(j)) - L(x(i)))    (2)
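On an adjacency-list mesh representation, the two forces might be computed as in the sketch below; the data layout (a vertex array plus per-vertex neighbour lists) is our assumption.

```python
import numpy as np

def laplacian(field, neighbors):
    """Equation 1 applied to any per-vertex field: mean over neighbours
    minus the vertex value. field: (N, 3) array; neighbors[i] = N(i)."""
    L = np.zeros_like(field)
    for i, nbrs in enumerate(neighbors):
        L[i] = field[nbrs].mean(axis=0) - field[i]
    return L

def tensile_and_flexural(vertices, neighbors):
    """Tensile force L(i) (Eq. 1) and flexural force L^2(i) (Eq. 2),
    the latter being the Laplacian of the Laplacian field."""
    L = laplacian(vertices, neighbors)
    return L, laplacian(L, neighbors)
```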
Both L and L^2 are decomposed into a tangential and a normal component of the force vector, as in the method of Dale et al. [4]. This enables adjustment of the contractive effect of the internal forces by weighting each component. The internal forces have the effect of smoothing and flattening the surface; however, as the target boundary is highly convolved with both peaked and flat areas, the internal forces should be relaxed in certain areas of the surface and increased in others. The deforming surface is used as a reference for the curvature of the target boundary to obtain local curvature weighting of the internal forces. To enable the surface to compensate for errors in the initial surface, and facilitate some degree of surface curvature alteration, the curvature values are recalculated at prespecified intervals during the deformation process. The curvature is estimated at each vertex of the deforming surface using the expressions:

\rho(i) = \begin{cases} \sigma(i), & \text{if } w(i) \cdot n(i) \le 0 \\ -\sigma(i), & \text{otherwise} \end{cases}    (3)

\sigma(i) = \pi - 2\cos^{-1}\!\left( \frac{1}{m} \sum_{j \in N_g(i)} \frac{x(j) - x(i)}{|x(j) - x(i)|} \cdot w(i) \right),    (4)
where N_g(i) is a geodesic neighbourhood around vertex i, w(i) is a unit vector pointing from i towards the centre of gravity of N_g(i), n(i) is the unit vector normal at i, and m is the number of vertices in N_g(i). Curvature values of zero are found in flat areas, positive values in convex areas and negative values in concave areas. Note that the size of N_g has great influence on the curvature values and should be chosen carefully. The curvature values are Gaussian filtered (σ = 1), normalised, and in this form used to weight the internal forces:

\hat{u}_{int}(i) = f(\rho(i))\, u_{int}(i),    (5)

where f is a weighting function defined as

f(x) = 1 - \frac{1}{2}\tan(|x|), \quad x \in [-1, 1]    (6)

2.3 External Forces
The outer boundary of the cerebral cortex follows approximately the same convexities and concavities as the inner cortical boundary. Hence, a surface delineating the inner cortical boundary is used as an initial estimate of the outer cortical boundary. This inner surface is displaced in the direction of the local surface normals until the surface meets itself (see figure 1), and thereby models sulci even though little or no image information is available. For this purpose a pressure force [1] is used. The force is similar to the external force used in [5,4], but is based on fuzzy memberships of the tissue classes as described in [2]. The fuzzy memberships are calculated using the fuzzy c-means algorithm [10]. The pressure force is expressed as:

p(i) = \Delta\mu(i)\, n(i), \qquad \Delta\mu(i) = \mu_{WM}(i) + \mu_{GM}(i) - \mu_{CSF}(i),    (7)

where \mu denotes the membership values (trilinearly interpolated) and n(i) is the unit vector normal at vertex i. A weighting function is applied to the membership difference to ensure a degree of freedom at membership differences close to zero:

\hat{p}(i) = c_1\, g(\Delta\mu(i))\, n(i),    (8)
Fig. 1. Illustration of how the pressure force enables modelling of narrow sulci with no CSF evident: (a) initial, (b) deforming, and (c) final surface. As the deformable surface (grey line) is pushed away from the WM/GM boundary in the direction of the local surface normals, it will eventually meet itself inside narrow sulci.
where c_1 is a weighting constant and

g(x) = x(2 - 2\cos(x)), \quad x \in [-1, 1]    (9)
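Equations 7–9 amount to a few array operations once the memberships are interpolated at the vertices; the sketch below assumes that interpolation has already been done, and the names are ours.

```python
import numpy as np

def pressure_force(mu_wm, mu_gm, mu_csf, normals, c1=1.0):
    """Weighted pressure force p_hat(i) = c1 * g(dmu(i)) * n(i) (Eqs. 7-9).

    mu_wm, mu_gm, mu_csf: (N,) fuzzy memberships at the vertices.
    normals: (N, 3) unit surface normals n(i).
    """
    dmu = mu_wm + mu_gm - mu_csf             # membership difference (Eq. 7)
    g = dmu * (2.0 - 2.0 * np.cos(dmu))      # weighting function (Eq. 9)
    return c1 * g[:, None] * normals
```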
Surface normals approximated from a discrete mesh may be misleading, as they can be perturbed by noise in the surface. This may erroneously cause modelling of non-existing features when the surface is displaced over larger distances. Increasing the influence of the internal forces resolves this problem, but also prevents the surface from reaching small concavities which are truly evident in the MRI. To solve the problem, the pressure force is combined with a GGVF force similar to the one used by Xu et al. [2], but with an edge map of the outer cortical boundary instead of the central cortical layer. This edge map is the first order derivative of the sum of the WM and GM fuzzy memberships. The GGVF force performs best at the gyri, where information on the GM/CSF boundary is evident in the MRI. The normal vector is therefore combined with the GGVF vector so that the GGVF vector dominates at the crowns and ridges of gyri, and the normal vector dominates along the fundi and walls of sulci. The local surface curvature, calculated in a geodesic neighbourhood, is used for balancing the influence of the GGVF vector and the normal vector:

u_{ext}(i) = c_2 \left( \hat{p}(i)\, \frac{1 - \rho(i)}{2} + g(i)\, \frac{1 + \rho(i)}{2} \right),    (10)

where \hat{p}(i) is the pressure force vector at vertex i, g(i) is the GGVF vector at vertex i, \rho(i) is the curvature value at i given in (3), and c_2 is a constant. Gradient information is used to scale the update vector u, so that the magnitude of the update vector is reduced when the magnitude of the gradient increases. This is done by mapping the normalised gradient magnitudes with the function

h(x) = \cos\!\left(\frac{\pi}{2} x\right), \quad x \in [0, 1]    (11)
and scaling the update vector u by the result. The update vector is unchanged when there is no gradient and greatly shortened when a strong gradient is present at the given vertex position. Information on the gradient is used only close to the GM/CSF boundary and suppressed far from the boundary. The weighted membership difference g(\Delta\mu(i)) in (8), which provides an estimate of the spatial position of the GM/CSF boundary, is therefore utilised to weight the influence of the gradient. The resulting update vector is given as a weighted sum of a gradient-weighted term and a non-gradient-weighted term:

u(i) = \left( (1 - \tau)\cos\!\left(\frac{\pi}{2} |\nabla(i)|\right) + \tau \right) (\hat{u}_{int}(i) + u_{ext}(i)), \qquad \tau = |g(\Delta\mu(i))|,    (12)
where \nabla(i) is the image gradient trilinearly interpolated at vertex i, \hat{u}_{int}(i) is the weighted sum of the internal forces given in (5), and \Delta\mu(i) is given in (7).
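Putting Equations 10–12 together, the per-vertex update might look as follows; all inputs (curvatures from Equation 3, GGVF vectors, gradient magnitudes normalised to [0, 1]) are assumed precomputed, and the function name is ours.

```python
import numpy as np

def update_vectors(p_hat, ggvf, rho, u_int_hat, grad_mag, g_dmu, c2=1.0):
    """Curvature-balanced external force (Eq. 10) and gradient-scaled
    update vector (Eq. 12). Shapes: (N, 3) for vectors, (N,) otherwise."""
    u_ext = c2 * (p_hat * ((1.0 - rho) / 2.0)[:, None]
                  + ggvf * ((1.0 + rho) / 2.0)[:, None])
    tau = np.abs(g_dmu)                    # proximity to the GM/CSF boundary
    h = np.cos(0.5 * np.pi * grad_mag)     # Equation 11
    scale = (1.0 - tau) * h + tau          # weighting of Equation 12
    return scale[:, None] * (u_int_hat + u_ext)
```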
3
Results
Simulated MRI scans of a brain phantom¹ [11] and 36 T1-weighted MRI datasets of young normal subjects from the International Consortium for Brain Mapping (ICBM) database were used for testing the general behaviour of the deformation. The same weighting constants were used in all test cases. Initial surfaces isomorphic to a sphere were generated and fitted to the inner cortical boundary. The initial surfaces consisted of approximately 1.5 · 10^5 vertices; during deformation this number was increased to approximately 2.0 · 10^5. The deformation process converged after 30–40 iterations with the stop criterion (#moved vertices) < (1 % of total vertices). The deformation of the outer surface required approximately 20 minutes on a 3 GHz Pentium 4 processor. The self-intersection tests performed throughout the deformation of the inner and outer surface were responsible for the majority of the processing time.

Figure 2 shows three different modes of the deformation process in a selected part of the simulated MRI. The three modes differ in their external forces; the internal forces are the same for all three modes. In the first mode, only the
Fig. 2. Outer surface deformation process using different external forces at different stages in the process. Left to right: deformation process at iterations 0, 5, 15 and 30. Top: only the pressure force is enabled. Middle: only the GGVF force is enabled. Bottom: combination of both forces balanced by the curvature weighting function.

¹ The brain phantom was provided by the McConnell Brain Imaging Centre at the Montreal Neurological Institute, http://www.bic.mni.mcgill.ca
Fig. 3. Example of a generated cortex from ICBM data. Left: Rendering of outer surface. Right: Inner (black) and outer (white) surfaces superimposed onto MRI.
pressure force is enabled, simulating the method by Dale et al. [4]. This clearly shows that the use of the pressure force alone results in irregularities in the surface, especially evident at the tops of gyri. In the second mode, only the GGVF force is enabled, simulating the method by Xu et al. [2]. In this case the surface does not reach the fundus of sulci without evidence of CSF. There is also undesirable behaviour in some of the sulci, because the surface is attracted to the nearest visible GM/CSF image edge. The last mode shows deformation with both the pressure force and the GGVF force enabled, balanced using the curvature weighting function. Now the tight sulci are modelled correctly while surface irregularities on top of gyri are avoided.

All 36 cortices from the ICBM database were automatically reconstructed and qualitatively assessed by visual inspection. An example of an extracted outer surface is shown in figure 3. As can be seen from the figure, the extracted surface appears smooth and realistic, and major gyri and sulci are easily recognised. The qualitative assessment of the accuracy of the extracted surfaces was made by superimposing the surfaces onto the MRI and visually inspecting the contours (figure 3, right). As can be observed from the figure, the outer cortical boundary is accurately delineated. Tight sulci are modelled even when the crowns of adjacent gyri are not separated in the image data, and the surface tends to be placed at a position equidistant to the WM walls when no CSF is evident in sulci. This indicates that the method follows the intended behaviour.
4
Summary and Conclusion
This paper presented a new method for extracting the outer boundary of the human cerebral cortex from MRI. The active surface approach combines a conventional pressure force based on fuzzy tissue classifications with a generalised gradient vector flow force, while locally weighting the forces based on the surface
curvature. Preliminary tests were conducted on both simulated data and real data of young normal subjects. The primary results of these tests indicate that the method is fast, robust and accurate for segmenting the cortical boundary in both simulated and real neuroanatomical data. Still, the method needs further validation, as it must be able to perform on data of varying quality and from a varying population if it is to be applicable in everyday clinical use. Future work includes a large scale validation on both healthy subjects and subjects with altered cortical morphology.
Acknowledgements

Test data were provided courtesy of the International Consortium of Brain Mapping and McConnell Brain Imaging Centre, Montreal Neurological Institute, McGill University.
References

1. Cohen, L.D., Cohen, I.: Finite-element methods for active contour models and balloons for 2D and 3D images. IEEE Trans. Pattern Analysis and Machine Intelligence (1993)
2. Xu, C., Pham, D.L., Rettmann, M.E., Yu, D.N., Prince, J.L.: Reconstruction of the human cerebral cortex from magnetic resonance images. IEEE Trans. Medical Imaging 18(6) (1999) 467–480
3. Zeng, X., Staib, L.H., Schultz, R.T., Duncan, J.S.: Segmentation and measurement of the cortex from 3-D MR images using coupled-surfaces propagation. IEEE Trans. Medical Imaging 18(10) (1999) 100–111
4. Dale, A.M., Fischl, B., Sereno, M.I.: Cortical surface-based analysis I: Segmentation and surface reconstruction. NeuroImage 9 (1999) 179–194
5. McInerney, T., Terzopoulos, D.: Topology adaptive deformable surfaces for medical image volume segmentation. IEEE Trans. Medical Imaging 18(10) (1999) 840–850
6. MacDonald, D., Kabani, N., Avis, D., Evans, A.C.: Automated 3-D extraction of inner and outer surfaces of cerebral cortex from MRI. NeuroImage 12 (2000) 340–356
7. Han, X., Pham, D., Tosun, D., Rettmann, M., Xu, C., Prince, J.: CRUISE: Cortical reconstruction using implicit surface evolution. NeuroImage 23(3) (2004) 997–1012
8. Kim, J.S., Singh, V., Lee, J.K., Lerch, J., Ad-Dab'bagh, Y., MacDonald, D., Lee, J.M., Kim, S.I., Evans, A.C.: Automated 3-D extraction and evaluation of the inner and outer cortical surfaces using a Laplacian map and partial volume effect classification. NeuroImage 27(1) (2005) 210–221
9. Eskildsen, S.F., Uldahl, M., Østergaard, L.R.: Extraction of the cerebral cortical boundaries from MRI for measurement of cortical thickness. Progress in Biomedical Optics and Imaging - Proceedings of SPIE 5747(2) (2005) 1400–1410
10. Pham, D., Prince, J.: Adaptive fuzzy segmentation of magnetic resonance images. IEEE Transactions on Medical Imaging 18(9) (1999) 737–752
11. Collins, D.L., Zijdenbos, A.P., Kollokian, V., Sled, J., Kabani, N., Holmes, C., Evans, A.: Design and construction of a realistic digital brain phantom. IEEE Transactions on Medical Imaging 17(3) (1998) 463–468
Integrated Graph Cuts for Brain MRI Segmentation

Zhuang Song, Nicholas Tustison, Brian Avants, and James C. Gee

Penn Image Computing and Science Lab, University of Pennsylvania, USA
[email protected]
Abstract. Brain MRI segmentation remains a challenging problem in spite of numerous existing techniques. To overcome the inherent difficulties associated with this segmentation problem, we present a new method of information integration in a graph-based framework. In addition to image intensity, tissue priors and local boundary information are integrated into the edge weight metrics in the graph. Furthermore, inhomogeneity correction is incorporated by adaptively adjusting the edge weights according to the intermediate inhomogeneity estimation. In validation experiments on simulated brain MRIs, the proposed method outperformed a segmentation method based on iterated conditional modes (ICM), a commonly used optimization method in medical image segmentation. In experiments on real neonatal brain MRIs, the results of the proposed method show good overlap with the manual segmentations by human experts.
1
Introduction
Accurate and efficient voxel-based segmentation of white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF) is essential for quantitative brain image analysis. The main challenge of automatic brain tissue segmentation stems from the absence of distinct boundaries between different brain tissues. This problem is due to a combination of multiple factors, such as imaging noise, field inhomogeneity, partial volume effect, and intrinsic tissue variation. To overcome these difficulties, it is necessary to integrate complementary information derived from the image as well as prior knowledge [1]. This paper presents a new method of information integration to facilitate brain tissue segmentation, in a graph-based framework [2].

Markov Random Field (MRF) theory provides a sound background for modelling context-dependent image segmentation, in which image segmentation is formulated as a problem of energy minimization. Let P denote the set of voxels in the image. The MRF-based energy function describes the total energy associated with assigning each voxel p ∈ P one of the labels in the label set L. There are various forms of energy functions to model specific image segmentation problems. Finding the energy minimum yields the desired image segmentation.
This work was supported in part by the USPHS under grants DA-015886, MZ402069, NS-045839, and MH-072576.
A successful image segmentation depends crucially on both the formulated energy function and the optimization method. Let an image be denoted by I, a defined neighborhood system by N, and the labeling of the image by f; the MRF-based energy function then has the basic form

E_I(f) = \lambda \sum_{p \in P} D_p(f_p) + \sum_{\{p,q\} \in N} V_{p,q}(f_p, f_q),    (1)
where the data term D_p(f_p) is a function derived from the observed data and measures how well the label f_p can be assigned to the voxel p based on a chosen probabilistic model. The smoothness term V_{p,q}(f_p, f_q) measures the neighborhood interaction by penalizing discontinuities between the voxel pair {p, q} in a specified neighborhood system N. The coefficient λ weights the relative contribution of the data term and the smoothness term.

For an MRF-based energy function, global optimization methods such as the Gibbs sampler [3] are often computationally too inefficient to be applicable to 3D medical image segmentation. On the other hand, iterated conditional modes (ICM) [4], a deterministic optimization method which has been widely applied to medical image segmentation, is well known to suffer from local minima trapping [5]. The graph cut method is a relatively recent development in minimizing context-dependent MRF problems [2,6]. It has been proven to be able to locate global minima for a certain class of two-label energy functions [5,7]. Although global minimization is NP-hard for multi-label energy functions, the graph cut method can guarantee strong approximation to global minima [2]. In the graph formulation, the MRF-based energy function in Equ.(1) is coded into the edge weights; the cost of a graph cut is equal to the total energy of the corresponding segmentation.

To address the specific problems in brain MRI tissue segmentation, we propose an automatic brain tissue segmentation method based on the multi-label graph cut method described in [2]. Our method integrates intensity, local boundary information, and tissue priors into the edge weight metrics. In addition, inhomogeneity correction is incorporated in the adaptive graph construction; this is done by updating the edge weights according to the intermediate inhomogeneity field estimation. The proposed method was validated on both simulated and real brain MRIs.
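To make Equation 1 concrete, the sketch below evaluates the energy of a labeling on a 2-D grid with a 4-neighborhood, using a simple Potts penalty as an illustrative choice of V_{p,q} (the paper's actual smoothness term is defined in Section 2.1).

```python
import numpy as np

def mrf_energy(data_cost, labels, lam=1.0):
    """Energy of Equation 1 for a labeling f on a 2-D grid.

    data_cost: (K, H, W) array with data_cost[l, p] = D_p(l).
    labels:    (H, W) integer labeling.
    Smoothness: Potts penalty (1 per unequal 4-neighbor pair), an
    illustrative stand-in for V_{p,q}.
    """
    h, w = labels.shape
    data = data_cost[labels, np.arange(h)[:, None], np.arange(w)].sum()
    smooth = (labels[1:, :] != labels[:-1, :]).sum() \
           + (labels[:, 1:] != labels[:, :-1]).sum()
    return lam * data + smooth
```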
2
Method
The binary graph cut method and the extension to multi-label graph cuts by the α-expansion algorithm are described in [6] and [2], respectively. Given an image with a set of voxels P, the goal of segmentation is to partition P into K disjoint sets P_i, i.e., P = \bigcup_{i=1}^{K} P_i and P_i \cap P_j = \emptyset if i \ne j. For the proposed algorithm, the image is represented by a weighted graph G = (V, E), where the set of nodes is denoted by V = P \cup L; L denotes the set of terminal nodes, which represent the labels. The set of edges is denoted by E = E_N \cup E_T, where E_N denotes the set of voxel-to-voxel edges in the defined neighborhood system
Fig. 1. Example of the graph with three terminals for brain MRI tissue segmentation of gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF). The set of nodes V includes all voxels and terminals. The set of edges E includes all n-links and t-links.
(n-links) and E_T denotes the set of voxel-to-terminal edges (t-links). A typical graph construction for brain tissue segmentation is illustrated in Fig. 1, where the three terminal nodes refer to the three brain tissue types. The set of edges E can be formulated such that a segmentation derived from the solution of the graph cut algorithm [2] minimizes certain MRF-based energy functions. In the graph construction, D_p(·) in Equ.(1) describes the edge weights of the t-links, whereas V_{p,q}(·) describes the edge weights of the n-links. A graph cut C ⊂ E is a set of edges such that the linked nodes are in disjoint sets, while each voxel node connects with only one terminal node, which corresponds to its label. The resulting graph is denoted by G = (V, E − C). The cost of the cut, |C|, is the sum of the edge weights in the edge set C.

2.1 Information Integration in Edge Weight Metrics
In this graph cut method, t-links are the major force in segmentation, while n-links enforce smoothness in a specified neighborhood. Intensity is the primary voxel-wise image feature used to define the edge weight metrics for both t-links and n-links. Generally, the weights of the t-links E_T are derived from a probabilistic model of the intensity distribution of each tissue type in the image, and the weights of the n-links are derived from the similarity measured between nodes. To cope with the complexity and variability in brain MRI, prior knowledge of brain anatomy and tissue properties is an important resource to be integrated into automatic segmentation. Furthermore, it is well understood that intensity-based methods are susceptible to field inhomogeneity in MRIs [1]. Local boundary information provides complementary information which helps to detect the actual boundary between different tissues. In the following, we describe the integration of tissue priors, local boundary information, and intensity information in the edge weight metrics.
Integration of Tissue Priors in T-Links. To integrate prior knowledge of brain tissues, the MRF energy function given in Equ.(1) is reformulated to include the atlas-based prior knowledge, which becomes

E(f) = \gamma E_I(f) + (1 - \gamma) E_A(f),    (2)
where the new energy E(f) is biased by the atlas-based energy term E_A(f), and the user-selected parameter γ ∈ [0, 1], derived empirically, moderates the two terms. The atlas-based energy E_A is derived from the probabilistic atlas priors, which are generated by registering a sufficient number of pre-segmented MRIs to a canonical atlas space. Trained individuals segment brain tissues on each of the MRIs used in the atlas construction. Effective atlas construction relies on a robust inter-subject registration method; a symmetric diffeomorphic flow-based registration method was used [8]. The prior probability that a voxel is assigned a label is estimated by averaging the manual labeling of that voxel over the set of registered MRIs within the canonical atlas space. This average is denoted as P_A. The energy contribution from each voxel labeling is defined as

E_A(f_p) = -\ln P_A(f_p),    (3)
where f_p ∈ L. Therefore, conjoining Equ.(1), (2), and (3) leads to the following total energy function to be minimized:

E(f) = \sum_{p \in P} \left( \lambda\gamma D_p(f_p) + (1 - \gamma) E_A(f_p) \right) + \gamma \sum_{\{p,q\} \in N} V_{p,q}(f_p, f_q).    (4)
Based on Equ.(4), the edge weight of the t-links in the graph is

T_p(L_i) = \lambda\gamma D_p(L_i) + (1 - \gamma) E_A(L_i),    (5)
where L_i ∈ L, and D_p(L_i) is defined as the negative log-likelihood of the image intensity distribution,

D_p(L_i) = -\ln P_I(I_p | L_i),    (6)

where P_I(I_p | L_i) is estimated by a Gaussian model of the intensity distribution of each tissue type.

Integration of Local Boundary Information in N-Links. Intensity and local boundary information are combined in the calculation of the edge weights of the n-links. Image intensity is denoted as I_p for voxel p ∈ P. The Lorentzian error norm [9], a type of robust metric, is used to measure the intensity difference between two voxel nodes p and q within a neighborhood:

\rho(p, q) = \ln\!\left( 1 + \frac{1}{2}\left( \frac{|I_p - I_q|}{\sigma} \right)^2 \right),

where the robust scale σ can be estimated from the input image [9]. This quantity is used to calculate the intensity-based component of the edge weights of the n-links, W^I_{p,q} = 1/(1 + \rho).
The boundary-based component of the n-link edge weight is derived from the intervening contour probability map B [10]. This map is computed through a transformation of the gradient image G: for any voxel p, B_p = 1 - \exp(-G_p / \sigma_G), where \sigma_G is a normalization factor. The probability of a gradient-based boundary between two voxels {p, q} is defined via the maximal intervening contour probability in the set M_{p,q} of voxels along the line joining p and q. The boundary-based component of the edge weight of the n-links is thus defined as

W^B_{p,q} = 1 - \max_{x \in M_{p,q}} (B_x).
Combining the intensity and boundary components, the n-link edge weight metric becomes

W_{p,q} = c\, W^I_{p,q} + (1 - c)\, W^B_{p,q},    (7)

where the coefficient c ∈ [0, 1] is used to weight the importance of intensity versus boundary information in enforcing the edge-preserving smoothness. Note that W_{p,q} is equivalent to V_{p,q}(f_p, f_q) in Equ.(4) after a label is assigned to each voxel.
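The two edge-weight metrics might be assembled as in the following sketch; the Gaussian likelihood parameters and atlas priors are assumed precomputed, the n-link weight is shown for an adjacent voxel pair only (so that M_{p,q} = {p, q}), and all names are ours.

```python
import numpy as np

def t_link_weights(intensity, means, stds, atlas_prior, lam=1.0, gamma=0.5):
    """T-link weights T_p(L_i) of Equation 5, one (H, W) map per label."""
    eps = 1e-10
    weights = []
    for prior_i, m, s in zip(atlas_prior, means, stds):
        log_pi = -0.5 * ((intensity - m) / s) ** 2 - np.log(s * np.sqrt(2 * np.pi))
        d_p = -log_pi                          # Equation 6
        e_a = -np.log(prior_i + eps)           # Equation 3
        weights.append(lam * gamma * d_p + (1.0 - gamma) * e_a)
    return np.stack(weights)

def n_link_weight(i_p, i_q, b_p, b_q, sigma, c=0.5):
    """N-link weight W_{p,q} of Equation 7 for an adjacent voxel pair."""
    rho = np.log(1.0 + 0.5 * ((i_p - i_q) / sigma) ** 2)  # Lorentzian norm
    w_int = 1.0 / (1.0 + rho)
    w_bnd = 1.0 - max(b_p, b_q)   # intervening contour over M_{p,q} = {p, q}
    return c * w_int + (1.0 - c) * w_bnd
```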
2.2 Adaptive Inhomogeneity Correction
The adaptive component of the segmentation combines inhomogeneity correction and segmentation in an iterative mode. MRI inhomogeneity is characterized by a low-frequency field in the image domain. The inhomogeneity field is assumed to be multiplicative and uniform across tissue types. Initial intensity means are estimated by K-d tree based K-means clustering. Given the intermediate segmentation results, the intensity mean μ_{L_i} of each label L_i is updated. The voxel value in the quotient image Q is computed by Q_p = I_p / μ_{f_p}, where I_p denotes the intensity of voxel p and f_p denotes the label assigned to voxel p. The image Q is the input to the field estimation function. A cubic B-spline approximation scheme for scattered data [11] is applied to estimate the inhomogeneity field H. The inhomogeneities are corrected by dividing the original image intensity by the estimated inhomogeneity field H(p) at each voxel. This subsequently updates D_p(L_i) of voxel p for every terminal node L_i in the t-link metric in Equ.(5).
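One round of the correction might be sketched as below; for self-containment we substitute a broad Gaussian smoothing of the quotient image for the cubic B-spline field approximation of [11], and we assume every label occurs in the current segmentation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def correct_inhomogeneity(image, labels, n_labels, smooth_sigma=20.0):
    """Estimate a smooth multiplicative field from the quotient image and
    divide it out; Gaussian smoothing stands in for the B-spline fit."""
    means = np.array([image[labels == l].mean() for l in range(n_labels)])
    quotient = image / means[labels]                 # Q_p = I_p / mu_{f_p}
    field = gaussian_filter(quotient, smooth_sigma)  # low-frequency field H
    return image / np.maximum(field, 1e-6)           # corrected intensities
```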
3
Experimental Results
The proposed method was validated on both simulated and real brain MRIs. The current implementation of the method is in 2-D; 3-D image segmentation is processed slice-by-slice, while inhomogeneity correction is performed on the 3-D volume. The 3-D graph cut algorithm has the same formulation, except that the neighborhood structure is in 3-D and the computation is thus more expensive. A volumetric overlap metric was used to compare the segmentation results of the proposed method and the ground truth. For each label L_i ∈ L, the volumetric overlap metric is defined as
O_{A,B}(L_i) = \frac{V_{A \cap B}(L_i)}{V_{A \cup B}(L_i)},

where V_{A∩B}(L_i) denotes the number of voxels labeled as L_i by both the proposed method and the ground truth, and V_{A∪B}(L_i) denotes the number of voxels labeled as L_i by either the proposed method or the ground truth.

Simulated images were generated from the BrainWeb MR simulator with 40 % field inhomogeneity and 1–9 % noise [13]. The simulated T1-weighted images had voxel dimensions 181 × 217 × 181, with a voxel size of 1 × 1 × 1 mm. As shown in Fig. 2, the graph cut method performs stably in spite of image noise and field inhomogeneity, and consistently outperforms an ICM-based segmentation method [12]. To make a fair comparison, the atlas weight γ in Equ.(2) was set to zero in this experiment; atlas priors were used in the following experiment on real images.

Fig. 2. Overlap metrics for the segmentation of (A) GM and (B) WM of BrainWeb simulated T1-weighted images with 40 % field inhomogeneity, using our integrated graph cut algorithm (IGC) and the method in [12], which uses ICM to minimize the energy function. Note that atlas priors are not used in this experiment, and both methods include the adaptive inhomogeneity correction component.

The real images used were MRI scans of neonatal brains, which are more difficult to segment than adult brains because of their greater intrinsic tissue variation. The neonates used in this study were term newborn infants aged less than 10 days, when myelination of white matter is at an early stage; image intensities of myelinated and non-myelinated white matter were therefore considered to be within the variance range of a single tissue type. It is noted that, in general, a T2-weighted neonatal MRI has better contrast than a T1-weighted MRI. Axial T2-weighted images were acquired using a spin-echo pulse sequence, with a voxel size of 0.35 × 0.35 × 3 mm. These images have a low contrast-to-noise ratio, a large partial volume effect, and severe field inhomogeneity.

Leave-one-out cross validation was applied to ten T2-weighted neonatal brain MRIs. Gray and white matter in all these MRIs were delineated by two human experts. Skull and other brain tissues were excluded from the images according to the manual delineation. In each round, nine of the ten MRIs were used to generate the probabilistic atlases for gray and white matter; the constructed atlas was then used in the segmentation of the remaining neonate. Fig. 3 shows
Fig. 3. Example axial slices of T2-weighted brain MRIs of three neonates and the corresponding segmentation results of the proposed method
some example slices of the segmentation results of the proposed method. The segmentation results were compared with the corresponding manual segmentations. The volumetric overlap metric was calculated for gray and white matter respectively, and the results for the ten neonates are summarized in Table 1. As shown in the table, the results of the proposed method for real neonatal images are approximately in the range of the inter-rater variance. The construction of the probabilistic atlas is difficult because of the low axial resolution and low contrast-to-noise ratio of the images. In addition, the performance of the method could be improved by using a more realistic probabilistic model D_p(·) in Equ.(6).
Table 1. Overlap metrics of the proposed method, the integrated graph cut (IGC), and the manual segmentation for the ten real neonatal brain MRIs. The manual segmentation is derived from the intersection of two manual segmentations by two raters. The second row of the table shows the overlap metrics of the two manual segmentations.

             GM              WM
IGC/Manual   0.593 ± 0.035   0.625 ± 0.055
InterRater   0.712 ± 0.022   0.710 ± 0.041

4
Discussion

This paper describes a new method of information integration in a graph-based framework [2] for brain MRI segmentation. Although the graph cut method has a provable capability to minimize energy functions, it requires careful formulation of the edge weight metrics to achieve successful performance for specific problems. In the proposed method, intensity, local boundary information, and tissue priors have been integrated in the edge weight metrics. Moreover, image inhomogeneity correction and image segmentation are combined in an iterative mode. This paper presents positive results of the proposed method applied to both simulated and real brain MRIs. More solid validation and comparison with other popular brain image segmentation methods will be done in future studies.
References

1. Ashburner, J., Friston, K.J.: Unified segmentation. Neuroimage 26 (2005) 839–851
2. Boykov, Y., Veksler, O., Zabih, R.: Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Anal. Machine Intell. 23(11) (2001) 1222–1239
3. Geman, S., Geman, D.: Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Machine Intell. 6 (1984) 721–741
4. Besag, J.: On the statistical analysis of dirty pictures. J. R. Stat. Soc., B 48(3) (1986) 259–302
5. Greig, D., Porteous, B.T., Seheult, A.H.: Exact maximum a posteriori estimation for binary images. J. R. Stat. Soc., B 2 (1989) 271–279
6. Boykov, Y., Kolmogorov, V.: An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Trans. Pattern Anal. Machine Intell. 26 (2004) 1124–1137
7. Kolmogorov, V., Zabih, R.: What energy functions can be minimized via graph cuts? IEEE Trans. Pattern Anal. Machine Intell. 26(2) (2004) 147–159
8. Avants, B.: Shape Optimizing Diffeomorphisms for Medical Image Analysis. PhD thesis, University of Pennsylvania (2005)
9. Rousseeuw, P.J., Leroy, A.M.: Robust Regression and Outlier Detection. New York: Wiley (1987)
10. Malik, J., Belongie, S., Leung, T., Shi, J.: Contour and texture analysis for image segmentation. International Journal of Computer Vision 43(1) (2001) 7–27
11. Lee, S., Wolberg, G., Shin, S.Y.: Scattered data interpolation with multilevel B-splines. IEEE Trans. Visualization and Computer Graphics 2(4) (1997) 337–354
12. Yan, M., Karp, J.: An adaptive Bayesian approach to three-dimensional MR brain segmentation. In: Proc. XIVth Int. Conf. Information Processing Medical Imaging. (1995) 201–213
13. Collins, D., Zijdenbos, A., Kollokian, V., Sled, J., Kabani, N., Holmes, C., Evans, A.C.: Design and construction of a realistic digital brain phantom. IEEE Trans. Med Imaging 17 (1998) 463–468
Validation of Image Segmentation by Estimating Rater Bias and Variance

Simon K. Warfield¹,², Kelly H. Zou², and William M. Wells²

¹ Computational Radiology Laboratory, Dept. Radiology, Children's Hospital
² Dept. Radiology, Brigham and Women's Hospital, Harvard Medical School, 75 Francis St., Boston, MA 02115 USA
{warfield, zou, sw}@bwh.harvard.edu

Abstract. The accuracy and precision of segmentations of medical images has been difficult to quantify in the absence of a "ground truth" or reference standard segmentation for clinical data. Although physical or digital phantoms can help by providing a reference standard, they do not allow the reproduction of the full range of imaging and anatomical characteristics observed in clinical data. An alternative assessment approach is to compare to segmentations generated by domain experts. Segmentations may be generated by raters who are trained experts or by automated image analysis algorithms. Typically these segmentations differ due to intra-rater and inter-rater variability. The most appropriate way to compare such segmentations has been unclear. We present here a new algorithm to enable the estimation of performance characteristics, and a true labeling, from observations of segmentations of imaging data where segmentation labels may be ordered or continuous measures. This approach may be used with, amongst others, surface, distance transform or level set representations of segmentations, and can be used to assess whether or not a rater consistently over-estimates or under-estimates the position of a boundary.
1 Introduction

Previous work for estimating a reference standard from segmentations has utilized the Expectation-Maximization (EM) algorithm to estimate performance characteristics and the hidden "true" segmentation from a collection of independent binary segmentations indicating presence or absence of a structure in a given image, and from a collection of segmentations with multi-category labellings, such as gray matter, white matter and cerebrospinal fluid [1]. The method has been used to characterize image segmentations [2] and to infer labellings from repeated registrations [3]. These approaches are not appropriate when an ordering is present in the segmentation labels, such as when the segmentation boundary is represented by a distance transform or level set.

The objective of this work was to generalize the existing methodology for simultaneous ground truth estimation and performance level estimation to segmentations with ordered labels. We propose here a model in which we summarize the quality of a rater-generated segmentation by the mean and variance of the distance between the segmentation boundary and a reference standard boundary. We propose an Expectation-Maximization algorithm to estimate the reference standard boundary, and the bias and variance of each rater. This enables interpretation of the quality of a segmentation generator, be it an expert human or an algorithm. A good quality rater will have a small
average distance from the true boundary (low bias) and high precision (a small variance of the distance from the true boundary). We applied the proposed algorithm to the assessment of brain tumor segmentations. We evaluated the algorithm by testing its operation on phantoms, to confirm its operation with a known reference standard, and we assessed the significance of the method by comparing and contrasting it with other means of measuring segmentation similarity, namely the Dice overlap measure [4] and STAPLE [1].
2 Method

2.1 Notations and Assumptions

We observe a set of segmentations of an image, created by some number of human expert or algorithmic raters. Our goal is to estimate the true labeling of the image, and to estimate performance characteristics of each rater in comparison. The true labeling is unknown (hidden); if it were not hidden, it would be straightforward to compute bias and variance parameters for each rater characterizing the way in which the rater labeling differs from the true labeling. Since the true labeling is hidden, we describe here an Expectation-Maximization algorithm [5] to estimate the true labeling and the rater performance parameters. Our EM algorithm proceeds iteratively, first estimating the complete data log-likelihood function, and then identifying the parameters that maximize the estimated complete data log-likelihood function. The estimation of the complete data log-likelihood involves computing the expected value of this function conditioned upon the observed rater labelings and previous estimates of the rater parameters. Computation of this expectation requires estimation of the conditional probability of the true score given the observed scores and rater performance parameters.

Let i = 1, . . . , I index the raters generating a segmentation of an image and j = 1, . . . , J index the voxels in the image. The label of each voxel is a continuous scalar, such as may be obtained from a distance transform or level set representation of a segmentation (but is not restricted to those sources), and is referred to as a score. The score s_{ij} assigned by rater i for voxel j has the following composition:

s_{ij} = \tau_j + \beta_i + \varepsilon_{ij},    (1)
where \tau_j is the underlying true score for this voxel, and \beta_i is the bias of rater i. The error term is assumed to have an uncorrelated normal distribution, i.e., \varepsilon_{ij} \sim N(0, \sigma_i^2) for each rater i. That is, we characterize the rater performance by a bias \beta_i and a variance \sigma_i^2. We assume that, given an image to be scored, the raters score the image independently. We wish to estimate the bias and variance that characterize the performance of each rater. If the true score were known, we could estimate the rater parameters \theta = (\sigma, \beta) by solving the complete data log-likelihood problem:

\hat{\theta} = \arg\max_{\theta} \log \Pr(s, \tau | \sigma, \beta)    (2)
Since the true score is unknown, we instead estimate the complete data log likelihood using Expectation-Maximization [6] by computing its expected value under a conditional probability density function for the true scores given the rater scores and previous estimates of the rater parameters. That is, we identify the rater parameters by solving:
\theta^{(k)} = \arg\max_{\theta} E\left[ \log \Pr(s, \tau | \sigma, \beta) \mid s, \theta^{(k-1)} \right]    (3)
In order to compute this expectation, we need a conditional probability density for the true scores given the rater scores and rater parameter estimates. We now derive the required densities and estimators. In the absence of spatial correlations between voxels, the joint distribution of the scores of all the voxels, conditional upon the true scores and rater parameters, is assumed to have the form

\Pr(s | \tau, \sigma, \beta) = \prod_{j=1}^{J} \prod_{i=1}^{I} \phi\!\left( \frac{s_{ij} - (\tau_j + \beta_i)}{\sigma_i} \right),    (4)

where \phi(\cdot) is the probability density function (pdf) of the standard normal distribution, N(0, 1). For notational simplicity, we write the pdf of N(\mu, \sigma^2) as \phi_\sigma(\mu).

2.2 A Conditional Probability Density Function for the True Scores

Bayes' theorem is applied as follows in order to derive the posterior distribution \Pr(\tau | s, \sigma, \beta) from the distribution of the observed scores \Pr(s | \tau, \sigma, \beta). Since the true score is independent of the rater bias and variance, the posterior distribution is

\Pr(\tau | s, \sigma, \beta) = \Pr(s | \tau, \sigma, \beta) \cdot \frac{\Pr(\tau, \sigma, \beta)}{\Pr(s, \sigma, \beta)}    (5)
= \Pr(s | \tau, \sigma, \beta) \cdot \frac{\Pr(\tau)\, \Pr(\sigma, \beta)}{\Pr(s, \sigma, \beta)}    (6)
= \Pr(s | \tau, \sigma, \beta) \cdot \frac{\Pr(\tau)}{\Pr(s | \sigma, \beta)}.    (7)
Upon integrating the expression of Equation 7 over τ, since the marginal distribution integrates to 1, we have τ
Pr(τ|s, σ, β)dτ =
τ
Pr(s|τ, σ, β) ·
Pr(τ) dτ, Pr(s|σ, β)
1 Pr(s|τ, σ, β) Pr(τ), Pr(s|σ, β) τ Pr(s|τ, σ, β) Pr(τ) Pr(τ|s, σ, β) = . τ Pr(s|τ, σ, β) Pr(τ)dτ 1=
(8) (9) (10)
We have considerable freedom in choosing the prior distribution on $\tau$, $\Pr(\tau) = \prod_{j=1}^{J} \Pr(\tau_j)$. A multivariate Gaussian distribution is a natural choice, and in the absence of other information we may choose a multivariate uniform distribution, with scale parameter $h$,

$$\Pr(\tau) = \prod_{j=1}^{J} U_h(\tau_j) = \frac{1}{h^J}, \qquad (11)$$

which simplifies the form of the posterior:

$$\Pr(\tau \mid s, \sigma, \beta) = \frac{\Pr(s \mid \tau, \sigma, \beta)\,\frac{1}{h^J}}{\frac{1}{h^J}\int_\tau \Pr(s \mid \tau, \sigma, \beta)\, d\tau} \qquad (12)$$
$$= \frac{\Pr(s \mid \tau, \sigma, \beta)}{\int_\tau \Pr(s \mid \tau, \sigma, \beta)\, d\tau}. \qquad (13)$$
Let the bias-adjusted score be denoted $\mu_{ij} = s_{ij} - \beta_i$. From Equation 4 and Equation 10, we have

$$\Pr(\tau \mid \sigma, \mu) = \frac{1}{Z} \prod_{j=1}^{J} \Pr(\tau_j) \prod_{i=1}^{I} \phi_{\sigma_i}(\mu_{ij} - \tau_j), \qquad (14)$$

where the normalizing constant is

$$Z = \int_{\tau_J} \cdots \int_{\tau_1} \prod_{j=1}^{J} \Pr(\tau_j) \prod_{i=1}^{I} \phi_{\sigma_i}(\mu_{ij} - \tau_j)\, d\tau_1 \cdots d\tau_J. \qquad (15)$$
Therefore we can write, for each voxel,

$$\Pr(\tau_j \mid \mu, \sigma) = \frac{1}{Z_j}\Pr(\tau_j)\prod_{i=1}^{I}\phi_{\sigma_i}(\mu_{ij} - \tau_j)$$
$$= \frac{1}{Z_j}\Pr(\tau_j)\prod_{i=1}^{I}\frac{1}{\sqrt{2\pi}\sigma_i}\exp\left(-\frac{(\mu_{ij} - \tau_j)^2}{2\sigma_i^2}\right)$$
$$= \frac{1}{Z_j}\left[\prod_{i=1}^{I}(2\pi\sigma_i^2)^{-1/2}\right]\exp(w_{ij})\Pr(\tau_j), \qquad (16)$$

which is a product of normal distributions with mean $\mu_{ij}$ and variance $\sigma_i^2$, and where

$$w_{ij} = -\frac{1}{2}\sum_{i=1}^{I}\frac{(\mu_{ij} - \tau_j)^2}{\sigma_i^2}$$
$$= -\frac{1}{2}\sum_{i=1}^{I}\frac{1}{\sigma_i^2}\left(\mu_{ij}^2 - 2\mu_{ij}\tau_j + \tau_j^2\right)$$
$$= -\frac{1}{2}\left[\sum_{i=1}^{I}\frac{\mu_{ij}^2}{\sigma_i^2} - 2\tau_j\sum_{i=1}^{I}\frac{\mu_{ij}}{\sigma_i^2} + \tau_j^2\sum_{i=1}^{I}\frac{1}{\sigma_i^2}\right]. \qquad (17)$$
If $\Pr(\tau_j)$ of Equation 16 is uniform, then completing the square and identifying the terms in a Gaussian probability density function yields the following variance and mean terms:

$$\frac{1}{\sigma^2} = \sum_{i=1}^{I}\frac{1}{\sigma_i^2}, \qquad (18)$$
$$\mu_j = \frac{\sum_{i=1}^{I}\frac{s_{ij} - \beta_i}{\sigma_i^2}}{\frac{1}{\sigma^2}}. \qquad (19)$$

If $\Pr(\tau_j)$ of Equation 16 is distributed $N(\mu_{\tau_j}, \sigma_{\tau_j}^2)$ then it acts analogously to another rater with the specified mean and variance.
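In practice, this per-voxel posterior reduces to an inverse-variance weighted average, which is inexpensive to compute. The following sketch is illustrative only (not the authors' implementation); it assumes the scores are held in an (I, J) array:

```python
import numpy as np

def posterior_true_score(scores, beta, var):
    """Per-voxel posterior mean and variance of the true score (Eqs. 18-19),
    assuming a uniform prior on tau_j.

    scores : (I, J) array of rater scores s_ij
    beta   : (I,) rater biases
    var    : (I,) rater variances sigma_i^2
    """
    mu = scores - beta[:, None]              # bias-adjusted scores mu_ij
    w = 1.0 / var                            # inverse-variance weights
    post_var = 1.0 / w.sum()                 # 1/sigma^2 = sum_i 1/sigma_i^2 (Eq. 18)
    post_mean = post_var * (w[:, None] * mu).sum(axis=0)   # Eq. 19
    return post_mean, post_var
```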
Observe that the mean score for voxel $j$ indicated by this distribution is an inverse rater variance weighted sum of the difference between the score of the rater and the rater bias, and that the variance of the distribution is the harmonic mean of the rater variances.

2.3 Estimating the Complete Data Log-Likelihood

The parameters maximizing the conditional expectation of the complete data log-likelihood can be found from

$$\theta^{(k)} = \arg\max_\theta E\left[\log \Pr(s, \tau \mid \sigma, \beta) \mid s, \theta^{(k-1)}\right] \qquad (20)$$
$$= \arg\max_\theta E\left[\log \Pr(s \mid \tau, \sigma, \beta) \mid s, \theta^{(k-1)}\right] \qquad (21)$$
$$= \arg\max_{\sigma,\beta} E\left[\sum_{i=1}^{I}\sum_{j=1}^{J}\log\frac{1}{(2\pi\sigma_i^2)^{1/2}}\exp\left(-\frac{1}{2\sigma_i^2}(\mu_{ij}-\tau_j)^2\right)\Big|\, s, \theta^{(k-1)}\right] \qquad (22)$$
$$= \arg\max_{\sigma,\beta} \sum_{i=1}^{I}\sum_{j=1}^{J}\left[-\log\sigma_i - \frac{1}{2\sigma_i^2}\left(\mu_{ij}^2 - 2\mu_{ij}E(\tau_j) + E(\tau_j^2)\right)\right]. \qquad (23)$$

Now, given the distribution $\Pr(\tau \mid s, \sigma, \beta)$ above, we have

$$E(\tau_j) = \mu_j^{(k-1)} \qquad (24)$$
$$E(\tau_j^2) = \mathrm{var}(\tau_j) + E(\tau_j)^2 \qquad (25)$$
$$= (\sigma^2)^{(k-1)} + (\mu_j^2)^{(k-1)}. \qquad (26)$$

Hence,

$$\theta^{(k)} = \arg\max_{\sigma,\beta} \sum_{i=1}^{I}\sum_{j=1}^{J}\left[-\log\sigma_i - \frac{1}{2\sigma_i^2}\left(\mu_{ij}^2 - 2\mu_{ij}\mu_j^{(k-1)} + (\sigma^2)^{(k-1)} + (\mu_j^{(k-1)})^2\right)\right]. \qquad (27)$$
On differentiating Equation 27 with respect to the parameters $\beta, \sigma$ and solving for a maximum, we find the following estimators for the rater performance parameters:

$$\beta_i^{(k)} = \frac{1}{J}\sum_{j=1}^{J}\left(s_{ij} - \tau_j^{(k-1)}\right), \qquad (28)$$
$$(\sigma_i^2)^{(k)} = \frac{1}{J}\sum_{j=1}^{J}\left[\left(s_{ij} - \beta_i^{(k)} - \tau_j^{(k-1)}\right)^2 + (\sigma^2)^{(k-1)}\right]. \qquad (29)$$
The rater bias estimator is the average observed difference between rater score and the previously estimated ‘true’ score over all the voxels, and the rater variance estimator is a sum of a term describing a natural empirical variance estimator (given estimates of the rater bias and the true score) and a term that is the harmonic mean of the previous estimates of the rater variances over all raters. Thus, the estimated variance for a rater cannot be smaller than the variance associated with the true score estimate.
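To make the iteration concrete, a minimal EM loop implementing Equations 18-19 and 28-29 might look as follows. This is an illustrative sketch, not the authors' code; the initialization and the convergence test are assumptions:

```python
import numpy as np

def em_rater_bias_variance(scores, n_iter=100, tol=1e-6):
    """EM estimation of rater bias/variance and true scores (Eqs. 18-19, 28-29).

    scores : (I, J) array, rater i's score at voxel j.
    Returns (beta, var, tau): per-rater bias, per-rater variance, true scores.
    """
    I, J = scores.shape
    beta = np.zeros(I)
    var = np.ones(I)
    tau = scores.mean(axis=0)                 # initialize true scores by the mean
    for _ in range(n_iter):
        # E-step: posterior of the true scores given current parameters
        w = 1.0 / var
        post_var = 1.0 / w.sum()                                           # Eq. 18
        tau_new = post_var * (w[:, None] * (scores - beta[:, None])).sum(axis=0)  # Eq. 19
        # M-step: update rater bias and variance
        beta = (scores - tau_new).mean(axis=1)                             # Eq. 28
        resid = scores - beta[:, None] - tau_new
        var = (resid ** 2).mean(axis=1) + post_var                         # Eq. 29
        if np.abs(tau_new - tau).sum() < tol:
            break
        tau = tau_new
    return beta, var, tau
```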
3 Results

We applied the proposed estimation scheme to a set of digital phantoms, generated by synthetic raters with pre-specified bias and variance parameters, in order to determine whether the estimation scheme would correctly identify the bias and variance of segmentation generators for which we knew the true parameter values. We applied the proposed estimation scheme to MRI scans of four different brain tumors. We compared the estimated true contour to that obtained from averaging the segmentations. We compared the segmentations identified as best and as worst by the algorithm to the original MRI scan visually, and by computing a spatial overlap measure between the reference standard and each segmentation.

3.1 Estimation of Parameters of Synthetic Raters Using a Digital Phantom

Segmentations were generated by randomly perturbing an image consisting of a rectangular region with intensity 100 and a rectangular region of intensity 200. Ten random segmentations were generated, drawing from five synthetic raters with a bias of +10 units and a variance of 100 units and five synthetic raters with a bias of -10 units and a variance of 50 units. The estimation scheme was executed and estimates of the true image and of the performance characteristics of each rater were obtained. This is illustrated in Figure 1. The estimated bias was 9.9968 ± 0.003 and -9.9968 ± 0.0002, which is a tight estimate around the true value. The estimated variance was 100.16 ± 0.28 and 50.112 ± 0.101, which again is very close to the specified values of the rater variances.
[Figure 1 panels: (a) synthetic true labeling; (b) rater segmentation with bias of +10, variance of 100; (c) estimated true labeling]

Fig. 1. Estimated true labels from synthetic raters. The specified labeled data was created with two regions of intensity equal to 100 and 200, respectively. Five raters with a bias of 10 and a variance of 100, and five raters with a bias of -10 and a variance of 50, were simulated to create synthetic segmentations. The estimation scheme was used to estimate each rater bias and variance, and to estimate the hidden true labeling. As can be seen in (c), despite noisy and limited observations such as that shown in (b), the estimated reference standard appears very similar to the specified data of (a). The closeness of the estimated parameters for each of the raters to the specified values confirms the estimate is effective.
3.2 Segmentations of Brain Tumor MRI by Human Raters

Each brain tumor was segmented up to three times by each of nine raters, for a maximum of twenty-seven segmentations. Each rater segmentation consisted of a closed contour delineating the extent of the tumor that the rater perceived in the MRI. A signed distance transform of each contour was computed in order to obtain a segmentation with each voxel representing the shortest distance to the boundary. The estimation scheme was executed to find the best overall reference standard utilizing all of the segmentations. The scheme was run for 100 iterations, resulting in the sum of total estimated true scores changing by less than 0.01 at the final iteration; this required less than 90 seconds of computation time on a PowerBook G4 in each case. One illustrative brain tumor is shown in Figure 2, together with the average contour, the estimated true contour, and the rater segmentations the algorithm indicated to be most like and most different from the true contour.
[Figure 2 panels: (a) MR image of tumor; (b) average contour; (c) estimated true contour; (d) best rater; (e) worst rater; (f) STAPLE binary estimate]

Fig. 2. Estimated true segmentation from 21 human rater segmentations. The estimated true contour is consistent with the human raters, and a better reflection of the true tumor boundary than the average. The best and worst rater segmentations as determined by the algorithm are displayed. For comparison, a binary estimate obtained with STAPLE is shown in (f).
We sought to summarize the spatial overlap between the rater segmentations, the average segmentation enclosed by the contour obtained by averaging each of the signed distance transforms, and the segmentation defined by the zero level set of the estimation
procedure. Different raters perceived the tumor boundary differently, and we measured the pairwise spatial overlap using the Dice measure. The coefficient of variation of the Dice overlap measures between the estimated reference segmentation and each of the rater segmentations was 0.15, which was smaller than the coefficient of variation of the Dice overlap measures between the average segmentation and the raters (0.17) and the coefficient of variation of the Dice overlap measures between each rater and the other raters (which ranged from 0.16 to 0.36). This indicates that there was considerable variability between the raters, but that the estimated segmentation was more consistent with the raters than both the average of the segmentations and the other rater segmentations, and so is a good summary of the segmentations.
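For reference, the Dice overlap used in these pairwise comparisons, 2|A∩B|/(|A|+|B|), can be computed from binary masks in a few lines (illustrative sketch, not the authors' code):

```python
import numpy as np

def dice(a, b):
    """Dice overlap between two binary masks: 2|A and B| / (|A| + |B|)."""
    a = a.astype(bool)
    b = b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0
```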
4 Discussion and Conclusion

Validation of image segmentations using an estimated reference standard is a valuable strategy for clinical imaging data where the true segmentation is unknown. Previous algorithms were designed for binary or unordered multi-category labels. Such methods strongly penalize segmentations that may differ by small mis-localizations. The novel approach developed here is suitable for segmentations represented by a surface or boundary from which a distance may be computed, or for boundaries represented directly by a level set. This may be especially well suited to complicated structures where the true boundary is challenging to estimate, such as in MRI of brain tumors, or where spatial mislocalization of the segmentation is expected and should be tolerated. An example of the latter situation would be the estimation of center lines of blood vessels or the colon, or the analysis of spicules in mammography.

We demonstrated that the estimation scheme was able to recover the true parameters of synthetic raters, and from this form a good estimate of the true image structure using digital phantoms. We demonstrated that from a collection of human segmentations of brain tumors, the estimation scheme was able to identify a reference standard that was closer to the rater segmentations than the raters were to each other. We compared the reference standard to an estimate obtained by averaging, and the results demonstrate that the average segmentation is less representative of the true anatomy. We demonstrated that the estimation scheme was able to rank the human raters, identifying the best and worst segmentations and providing valuable estimates of the rater bias and variance. Future work will examine the possibility of incorporating a model for spatial correlation in the true labeling, which will enable us to relax the assumption of voxelwise independence. It will also be interesting to examine alternatives for the prior probability of each label, such as may be derived from an anatomical atlas.

Acknowledgements. This investigation was supported by NSF ITR 0426558, a research grant from CIMIT, a research grant from the Whitaker Foundation, grant RG 347A2/2 from the NMSS, and by NIH grants R21 MH67054, R01 RR021885, P41 RR013218, and U41 RR019703.
References

1. S. K. Warfield, K. H. Zou, and W. M. Wells, "Simultaneous truth and performance level estimation (STAPLE): an algorithm for the validation of image segmentation," IEEE Trans Med Imag, vol. 23, pp. 903-921, 2004.
2. T. Rohlfing, D. B. Russakoff, and C. R. Maurer, "Expectation maximization strategies for multi-atlas multi-label segmentation," in Proceedings of the International Conference on Information Processing in Medical Imaging, pp. 210-221, 2003.
3. T. Rohlfing, D. B. Russakoff, and C. R. Maurer, "Performance-based classifier combination in atlas-based image segmentation using expectation-maximization parameter estimation," IEEE Transactions on Medical Imaging, vol. 23, pp. 983-994, August 2004.
4. L. R. Dice, "Measures of the amount of ecologic association between species," Ecology, vol. 26, no. 3, pp. 297-302, 1945.
5. A. Dempster, N. Laird, and D. Rubin, "Maximum-likelihood from incomplete data via the EM algorithm," J. Royal Statist. Soc. Ser. B, vol. 39, pp. 34-37, 1977.
6. G. J. McLachlan and T. Krishnan, The EM Algorithm and Extensions. New York: Wiley-Interscience, 1996.
A General Framework for Image Segmentation Using Ordered Spatial Dependency

Mikaël Rousson and Chenyang Xu

Department of Imaging and Visualization, Siemens Corporate Research, Princeton, NJ

Abstract. The segmentation problem appears in most medical imaging applications. Many research groups are pushing toward a whole body segmentation based on atlases. With a similar objective, we propose a general framework to segment several structures. Rather than inventing yet another segmentation algorithm, we introduce inter-structure spatial dependencies to work with existing segmentation algorithms. Ranking the structures according to their dependencies, we end up with a hierarchical approach that improves each individual segmentation and provides automatic initializations. The best ordering of the structures can be learned off-line. We apply this framework to the segmentation of several structures in brain MR images.
1 Introduction
Different anatomical structures often have strong spatial dependency on each other. This spatial dependency is usually present in a hierarchical manner, i.e., the shape and pose variations of one structure are fully or partially bounded by those of other, more stable structures. We refer to this type of spatial dependency as ordered spatial dependency due to its ordered nature. Radiologists routinely rely on ordered spatial dependency to help them locate and identify structures that have large variations in shape, pose, and appearance, by searching for their presence relative to other structures that are much easier to identify. In this paper, we take advantage of this inter-structure ordered spatial dependency in an explicit manner by proposing a novel general image segmentation framework. The proposed framework learns the ordered spatial dependency from pre-segmented training images and applies the learned model to improve both the performance and robustness of the individual segmentation algorithms utilized. A key benefit of the proposed framework is that it is not another new segmentation algorithm but a new general framework that can integrate any existing segmentation algorithm. The motivation of this work arises from the fact that many powerful and effective segmentation algorithms such as seeded region growing [1], watershed [16], active contours [9,3], and graph cuts [2,12] have been proposed and used widely, in particular for medical image segmentation applications. The topic of devising a segmentation framework that combines existing segmentation algorithms to achieve better results has recently emerged as a promising research direction [13]. The work [13] proposed a framework that computes an improved segmentation result based on optimizing parameter
values of multiple segmentation algorithms. Similarly, our proposed framework also improves the segmentation results over the individual algorithms utilized, but it achieves this in a different way, by focusing on segmenting objects in a hierarchical manner using the ordered spatial dependency.

Observing that structures often have a strong spatial dependency, we define a new spatial prior based on neighboring structures. This dependency is introduced by registering the structure of interest to a common reference coordinate system based on neighboring structures. This can be achieved by computing the elastic matching of the neighboring structures from one image to a reference one, and then applying it to the structure of interest. This modeling can be implemented for each structure, based on the ones already segmented. This leads us to the definition of a hierarchical segmentation framework. The proposed work has two important contributions: 1) the explicit modeling and utilization of ordered spatial dependency for segmentation; 2) the estimation of the optimal segmentation sequence for segmenting multiple structures.

Our work is closely related to atlas-based segmentation (cf. [5,11,8]), which treats segmentation as a registration problem by elastically matching a pre-segmented atlas to the target image. Atlas-based segmentation approaches are generally better suited for segmenting structures that are stable over the population of study. Our proposed framework uses elastic matching to enforce the spatial dependency and restrict the plausible segmentation space, rather than using it to obtain the final segmentation. The actual segmentation of each structure is still performed using a pre-selected segmentation algorithm. Other closely related works are the active shape and appearance models [6,7,10], which assume a statistical correlation between the shape or appearance of the organs over a population. Our proposed segmentation framework uses a weaker assumption by modeling the relative locations of the structures between one another. It is also worth noting that recent works on joint segmentation and registration [17,15,14], which also use segmentation and registration in an iterative manner, are primarily aimed at segmenting two or more images simultaneously and do not use ordered dependency. In fact, one could incorporate active shape and appearance models within our framework as building blocks, like any other segmentation algorithm. In the following sections, we detail the proposed segmentation framework and demonstrate its utility in segmenting multiple structures from MR brain images.
2 Modeling Inter-structure Spatial Dependency
Let $\{S_1, \ldots, S_N\}$ be the set of structures of interest in an image. We assume a dataset of $M$ annotated images to be available. We denote by $\{s_{ij};\ i = 1, \ldots, N,\ j = 1, \ldots, M\}$ the complete set of structures. Given a manual segmentation $s$ of a structure $S \in \{S_1, \ldots, S_N\}$, we propose a smooth approximation of the conditional probability of an image location $x$ to be inside the structure $s$:

$$p_S(x \mid s) \propto \exp(H_\epsilon(\phi(x)) - 1),$$
where $\phi$ is the distance transform of $s$ and $H_\epsilon$ is a regularized Heaviside function, with $\epsilon$ controlling the level of smoothness [3]. This distribution gives high probability to voxels inside $s$ and low probability to the ones outside, and the smoothness of the transition is related to the distance to the interface. Similarly, we write the conditional probability of a voxel $x$ to belong to the background of $s$ as $p_{\bar S}(x \mid s) \propto \exp(-H_\epsilon(\phi(x)))$.

Now, if we want to combine all the annotated instances of a structure $S_i$ to define the spatial prior probability of the structure $S_i$, we need to place the manual segmentations in a common reference¹. It is at this point that we consider the ordered spatial dependency, i.e., $S_i$'s dependency on known neighboring structures. The principle is to align all the instances $s_{ij}$ of $S_i$ to a common coordinate system using the known structures as anchors. This is done by estimating a warping between each instance of the anchor structure(s) and a chosen reference. These warpings are obtained using an image-based registration algorithm [4] applied on level set representations of each structure instance. If several anchor structures are available, they are merged to form a single shape composed of several components. This allows us to constrain even more the deformation field between the structures. Then, these warpings are applied to the corresponding structures $s_{ij}$. Let $\tilde s_{ij}$ be the segmentation transformed by the warping $\psi_{ij}$ and $\tilde\phi_{ij}$ its level set representation; the spatial prior probabilities of $S_i$ and its background $\bar S_i$ are defined in the reference image as the geometric mean of each individual prior:

$$p_{S_i}(x) \propto \left(\prod_{j=1}^{M}\exp\left(H_\epsilon(\tilde\phi_{ij}(x)) - 1\right)\right)^{1/M}, \quad p_{\bar S_i}(x) \propto \left(\prod_{j=1}^{M}\exp\left(-H_\epsilon(\tilde\phi_{ij}(x))\right)\right)^{1/M}.$$
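A sketch of this per-structure prior computation from aligned binary masks is given below. It is illustrative only; the signed-distance convention (positive inside the structure) and the arctangent form of the regularized Heaviside are assumptions:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance(mask):
    """Signed distance map, positive inside the binary mask."""
    mask = mask.astype(bool)
    return distance_transform_edt(mask) - distance_transform_edt(~mask)

def heaviside(phi, eps=1.0):
    """Regularized Heaviside H_eps: smooth step from 0 to 1 around phi = 0."""
    return 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(phi / eps))

def spatial_prior(warped_masks, eps=1.0):
    """Geometric-mean spatial prior p_Si(x) from M aligned training masks."""
    log_p = np.zeros(warped_masks[0].shape)
    for m in warped_masks:
        log_p += heaviside(signed_distance(m), eps) - 1.0
    return np.exp(log_p / len(warped_masks))   # geometric mean of exp(H - 1)
```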
Up to now, we did not give any detail about which anchor structures are considered to estimate the warping. A priori, we do not know which structures $S_i$ is spatially dependent on. We propose to learn these dependencies by defining the spatial probability of $S_i$ with respect to other structures $\{S_k, k \neq i\}$. We denote by $V_i$ all possible subsets of $\{S_k, k \neq i\}$, by $v_i \in V_i$ a subset of segmented structures, and by $v_{ij}$ the corresponding annotated structures in the training image $j$. With these notations, all structures $s_{ij}$ are registered to a reference image by estimating the warpings that align $v_{ij}$ to a reference set $v_{ir}$. Therefore, for each choice of subset $v_i$, we end up with different registrations and hence a different spatial prior probability for $S_i$. Since this probability is subject to the selected subset $v_i$, we will use the notations $p_{S_i}(x \mid v_i)$ and $p_{\bar S_i}(x \mid v_i)$ to denote respectively the prior probabilities of a voxel to be inside and outside the structure $S_i$, given a set of known segmentations $v_i$. The choice of the optimal subset of reference for each structure is studied in Section 3.2.

¹ The common reference is chosen arbitrarily in this work. It would be interesting to study how important this choice is.
In the following section, we incorporate this spatial prior in a general segmentation framework and we detail the complete framework obtained when a level set based approach is considered.
3 Integrating Spatial Prior in the Segmentation Process
Given an image $I$ and a set of segmented structures $v$, we want to extract another structure $s$ of the class $S$ (the class $S$ is one of the classes $S_i$ considered previously). We consider a statistical formulation of this segmentation problem using a maximum a posteriori estimation. This consists in maximizing the posterior conditional distribution $p(s \mid I, v)$. Making the assumption that $I$ and $v$ are not correlated, the optimal structure is the one maximizing $p(s \mid I, v) = p(s \mid I)\, p(s \mid v)$. The first term can be expressed with any statistically defined segmentation algorithm, whereas the other one can integrate the spatial prior learned in the previous section. To incorporate this prior knowledge, we make the assumption that the prior probabilities of the locations $x$ are independent and identically distributed. This allows us to incorporate the spatial prior probability term introduced in the previous section:

$$p(s \mid v) = \prod_{x \in s_{in}} p_S(x \mid v) \prod_{x \in s_{out}} p_{\bar S}(x \mid v),$$

where $s_{in}$ and $s_{out}$ are respectively the parts of the image inside and outside the structure $s$. This formulation is very general, and any efficient segmentation approach like graph cuts [2,12] or surface evolutions [9,3] can be considered. In the following, we develop our system using level set based surface evolutions, but this should not be seen as the only possibility.

3.1 Level Set Based Segmentation
In the level set framework, the structure of interest is represented as the zero crossing of an embedding function $\phi : \Omega \to \mathbb{R}$: $s = \{x \in \Omega \mid \phi(x) = 0\}$. Hence, the problem of finding the surface $s$ becomes the one of finding a real function $\phi$ that maximizes $p(s \mid I, v) \to p(\phi \mid I, v)$. Equivalently, the optimal solution can be obtained from the minimization of the energy:

$$E(\phi) = -\log p(\phi \mid I, v) = -\log p(\phi \mid I) - \log p(\phi \mid v).$$

We follow [3] to define the first term with a region-based criterion and a regularity constraint. To use the spatial prior, we first need to register the anchor structures from the current image to the reference ones used for modeling. Let $\psi$ be the obtained warping; the whole energy can be written as follows:

$$E(\phi) = -\int_\Omega \left(H_\phi \log p_{in}(I(x)) + (1 - H_\phi)\log p_{out}(I(x)) + \nu|\nabla H_\phi|\right) dx$$
$$\qquad - \lambda \int_\Omega \left(H_\phi \log p_S(\psi(x) \mid v) + (1 - H_\phi)\log p_{\bar S}(\psi(x) \mid v)\right) dx,$$
where $p_{in}$ and $p_{out}$ are the intensity distributions inside and outside the structures. They can be estimated on-line or a priori from the learning set. We minimize this energy using a gradient descent obtained from the Euler-Lagrange equations. This gives the following curve evolution:

$$\phi_t = \delta(\phi)\left[\nu\,\mathrm{div}\left(\frac{\nabla\phi}{|\nabla\phi|}\right) + \log\frac{p_{in}(I)}{p_{out}(I)} + \lambda\log\frac{p_S(\psi(x) \mid v)}{p_{\bar S}(\psi(x) \mid v)}\right]$$
$$= \delta(\phi)\left[\nu\,\mathrm{div}\left(\frac{\nabla\phi}{|\nabla\phi|}\right) + \log\frac{p_{in}(I)}{p_{out}(I)} + \frac{\lambda}{M}\sum_{j=1}^{M}\left(2H_\epsilon(\tilde\phi_{cj}(\psi(x))) - 1\right)\right],$$

where the $\tilde\phi_{cj}$ stand for the warpings estimated during the modeling phase for the current shape. The segmentation of $s$ is obtained by evolving $\phi$ according to this equation until convergence (the initialization is discussed in the next paragraph). For an efficient implementation, $\sum_{j=1}^{M}\left(2H_\epsilon(\tilde\phi_{cj}(x)) - 1\right)$ can be estimated off-line, and then warped to the current image domain using $\psi$.
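A compact 2D sketch of this evolution rule is shown below; `prior_map` stands for the precomputed and warped term $(\lambda/M)\sum_j(2H_\epsilon(\tilde\phi_{cj}(\psi(x))) - 1)$, `log_lik_ratio` for $\log(p_{in}(I)/p_{out}(I))$, and all parameter values are illustrative assumptions rather than the authors' settings:

```python
import numpy as np

def curvature(phi):
    """div(grad(phi)/|grad(phi)|) via central differences (2D sketch)."""
    gy, gx = np.gradient(phi)
    norm = np.sqrt(gx ** 2 + gy ** 2) + 1e-8
    dyy, _ = np.gradient(gy / norm)
    _, dxx = np.gradient(gx / norm)
    return dxx + dyy

def evolve(phi, log_lik_ratio, prior_map, nu=0.2, dt=0.5, n_iter=200, eps=1.5):
    """Gradient-descent evolution of the level set (illustrative sketch)."""
    for _ in range(n_iter):
        delta = (eps / np.pi) / (eps ** 2 + phi ** 2)   # regularized Dirac
        phi = phi + dt * delta * (nu * curvature(phi) + log_lik_ratio + prior_map)
    return phi
```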
3.2 Hierarchical Segmentation
For a given ordering, we can run the segmentation algorithm on each structure successively. This process can be initialized automatically if we are able to segment the first structure without any spatial prior. In most medical images, this can be done easily by starting with the envelope of the body. Then, to segment each structure, we also need to initialize the associated level set. A straightforward solution is to place seeds inside each structure. Obviously, this would give a good initialization, but it requires user interaction. The spatial prior can be used to make these initializations automatic by selecting the voxels with a prior probability greater than a threshold $\tau$. More precisely, the initial level set $\phi_i^0$ used to extract the structure $S_i$ is set as follows²:

$$\phi_i^0(x) = \begin{cases} +1, & \text{if } \log\dfrac{p_S(\psi(x) \mid v)}{p_{\bar S}(\psi(x) \mid v)} \geq \tau, \\ -1, & \text{otherwise.} \end{cases}$$

With this technique, the segmentation of all $N$ structures can be obtained automatically. Only the weights $\epsilon$, $\nu$, $\lambda$ and $\tau$ must be set before starting the process.
4 Estimation of the Optimal Segmentation Sequence
To learn the subset $v_i$ that helps the segmentation of $S_i$ the best, we propose to apply the segmentation on a second set of annotated images. For each $v_i$, we can measure the quality of the segmentation according to a chosen similarity measure $\mathcal{M}$ between the automatic and "true" segmentation.

² Once initialized with the spatial prior, the level set is projected to a signed distance function. This is repeated after each iteration of the level set evolution.
Fig. 1. Location priors corresponding to the optimal ordering. From left to right are shown the reference image, and the spatial priors of (1) the lateral ventricle given the skull, (2) the thalamus given the skull and the lateral ventricle, and (3) the caudate nucleus given all other structures.
Assuming that, if $S_j$ depends on $S_k$, $S_k$ cannot depend on $S_j$, the objective is to estimate the best ordering of the structures such that structures classified higher can be used to extract lower-classified ones. Once all the segmentations are obtained for a given ordering, we measure the overall quality of the process by comparing the results with the manual segmentations according to a similarity measure $\mathcal{M}$. The optimal ordering is then given by:

$$\hat O = \arg\max_{O \in \mathcal{O}} \sum_{i=1}^{N}\sum_{j=1}^{M} \mathcal{M}(s_{ij}, \hat s_{ij}(O)),$$
where $\mathcal{O}$ is the set of all permissible orderings, and $\hat s_{ij}(O)$ is the segmentation obtained automatically in image $j$ for the structure $S_i$ using the ordering $O$. As for the similarity measure, we use the Dice coefficient. In general, the number of structures to extract is relatively small (<10) and all combinations can be tested. If we fix the first structure, the number of combinations is equal to $(N-1)!$. Even though this number gets high for N = 10, this is an off-line process and the user can introduce heuristics to reduce the possible orderings. For example, if choosing a given structure at a high level leads to a bad segmentation of the next one, a whole set of possible orderings can be discarded. We are conducting further investigations to reduce the possible orderings efficiently in a more theoretical way.
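The exhaustive search itself is then straightforward; the sketch below assumes a hypothetical `segment_with_ordering` callback that runs the hierarchical pipeline for one ordering and returns the summed Dice score against the manual segmentations:

```python
from itertools import permutations

def best_ordering(structures, first, segment_with_ordering):
    """Exhaustive search for the ordering maximizing the summed Dice score.

    structures            : list of structure identifiers
    first                 : anchor structure segmented without priors
    segment_with_ordering : hypothetical callback, ordering -> total Dice
    """
    rest = [s for s in structures if s != first]
    best, best_score = None, -1.0
    for perm in permutations(rest):          # (N-1)! candidate orderings
        ordering = (first,) + perm
        score = segment_with_ordering(ordering)
        if score > best_score:
            best, best_score = ordering, score
    return best, best_score
```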
5 Hierarchical Segmentation of Brain Structures in MR Images
We validate this segmentation framework on several structures of the brain in MR images: the lateral ventricles, the caudate nucleus, the thalamus, and the skull. These structures were annotated manually in 13 different sagittal slices. The first step is the learning of the optimal ordering. To start this process, we must be able to segment the first anchor structure automatically. For the brain image shown in Figure 1, this is relatively simple if we consider the skull. Initializing the level set with a seed in the background is sufficient to obtain its segmentation.
Fig. 2. Location priors for each possible ordering - Each column shows the spatial prior obtained for one particular ordering. Priors corresponding to the optimal ordering are shown in the third column.
Fig. 3. Examples of automatic segmentations - From left to right: segmentation without prior and with manual initialization, automatic segmentation of the same image, and three different results obtained with the “optimal” ordering
Then, each structure can be initialized automatically by following Section 3.2. We set the threshold $\tau$ to the maximum value of the location map minus 0.1. This guarantees a seed inside the structure. Having 4 structures, the number of possible orderings is 6. We tested our algorithm for each possible choice. Figure 1 shows the sequence that maximizes the overall Dice coefficient between the obtained segmentations and the manual ones for the whole training set. Figure 2 shows the location maps estimated for each ordering. Once the optimal ordering is known, we can validate the approach using a leave-one-out strategy on the 13 available images. A few results are presented in Figure 3. As quantitative validation, we computed the Dice coefficient for each of the 52 automatically computed segmentations, giving an average above 0.8.
6 Conclusion
We have presented a novel image segmentation framework that learns the ordered spatial dependency among structures to be segmented and applies it in a hierarchical manner, both to provide automatic initializations and to improve each individual segmentation algorithm's performance. We demonstrated the efficacy of the proposed framework by applying it to MR brain image segmentation with a level set algorithm as its segmentation algorithm. Future work includes applying this framework to more applications with more types of segmentation algorithms. We believe that the paradigm of "boosting" segmentation performance
by combining existing segmentation algorithms into a systematic framework is a promising research direction and the work presented in this paper is one step along this direction.
References

1. R. Adams and L. Bischof. Seeded region growing. IEEE T. PAMI, 16(16):641-647, 1994.
2. Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. IEEE T. PAMI, 23(11):1222-1239, 2001.
3. T.F. Chan and L.A. Vese. Active contours without edges. IEEE T. IP, 10(2):266-277, 2001.
4. C. Chefd'hotel. Geometric Methods in Computer Vision and Image Processing: Contributions and Applications. PhD thesis, Ecole Normale Supérieure de Cachan, April 2005.
5. D.L. Collins, C.J. Holmes, T.M. Peters, and A.C. Evans. Automatic 3-D model-based neuroanatomical segmentation. Human Brain Mapping, 3:190-208, 1995.
6. T.F. Cootes, D. Cooper, C.J. Taylor, and J. Graham. Active shape models - their training and application. Comput. Vis. Image Und., 61(1):38-59, 1995.
7. T.F. Cootes, G.J. Edwards, and C.J. Taylor. Active appearance models. IEEE T. PAMI, 23(6):681-685, 2001.
8. S.L. Hartmann, M.H. Parks, P.R. Martin, and B.M. Dawant. Automatic 3-D segmentation of internal structures of the head in MR images using a combination of similarity and free-form transformations: part II, validation on severely atrophied brains. IEEE T. MI, 18(10):917-926, 1999.
9. M. Kass, A. Witkin, and D. Terzopoulos. Snakes: Active contour models. Int'l J. Comp. Vis., 1(4):321-331, 1988.
10. A. Kelemen, G. Székely, and G. Gerig. Elastic model-based segmentation of 3-D neuroradiological data sets. IEEE T. MI, 18(10):828-839, 1999.
11. R. Kikinis, M.E. Shenton, et al. A digital brain atlas for surgical planning, model-driven segmentation, and teaching. IEEE T. Vis. Comp. Graphics, 2(3):232-240, 1996.
12. H. Lombaert, Y. Sun, L. Grady, and C. Xu. A multilevel banded graph cuts method for fast image segmentation. In Proc. ICCV, pages 259-265, 2005.
13. M. Maddah, K.H. Zou, W.M. Wells, R. Kikinis, and S.K. Warfield. Automatic optimization of segmentation algorithms through simultaneous truth and performance level estimation (STAPLE). In Proc. MICCAI, pages 274-282, 2004.
14. G. Unal and G. Slabaugh. Coupled PDEs for non-rigid registration and segmentation. In Proc. CVPR, pages 168-175, 2005.
15. B. Vemuri and Y. Chen. Joint image registration and segmentation. In S. Osher and N. Paragios, editors, Geometric Level Set Methods in Imaging, Vision, and Graphics, pages 251-269. Springer Verlag, 2003.
16. L. Vincent and P. Soille. Watersheds in digital spaces: An efficient algorithm based on immersion simulations. IEEE T. PAMI, 13(6):583-598, 1991.
17. A. Yezzi, L. Zollei, and T. Kapur. A variational framework for joint segmentation and registration. In Proc. Workshop on MMBIA, pages 44-49, 2001.
Constructing a Probabilistic Model for Automated Liver Region Segmentation Using Non-contrast X-Ray Torso CT Images

Xiangrong Zhou¹, Teruhiko Kitagawa¹, Takeshi Hara¹, Hiroshi Fujita¹, Xuejun Zhang¹, Ryujiro Yokoyama², Hiroshi Kondo², Masayuki Kanematsu², and Hiroaki Hoshi²

¹ Department of Intelligent Image Information, Division of Regeneration and Advanced Medical Sciences, Graduate School of Medicine, Gifu University, Gifu 501-1194, Japan
{zxr, kitagawa, hara, fujita, zhang}@fjt.info.gifu-u.ac.jp
² Department of Radiology, Gifu University School of Medicine and University Hospital, Gifu 501-1194, Japan
Abstract. A probabilistic model was proposed in this research for fully-automated segmentation of the liver region in non-contrast X-ray torso CT images. This probabilistic model is composed of two kinds of probability that show the location and density (CT number) of the liver in CT images. The probability of the liver on spatial location was constructed from a number of CT scans in which the liver regions were pre-segmented manually as gold standards. The probability of the liver on density was estimated specifically using a Gaussian function. The proposed probabilistic model was used for automated liver segmentation from non-contrast CT images. 132 CT scans were used for the probabilistic model construction, and the model was then applied to segment the liver region based on a leave-one-out method. The performance of the probabilistic model was evaluated by comparing the segmented liver with the gold standard in each CT case. The validity and usefulness of the proposed model were demonstrated.
1 Introduction

With the development of CT technology, whole body scans using X-ray CT or MRI scanners are practicable for clinical purposes. Radiologists can now easily make a volumetric torso X-ray CT scan in as little as 20-30 seconds to examine health-related issues such as vascular problems, liver problems, and the detection of lesions in different organs and tissue within the torso region. However, interpreting such a torso CT scan, which includes over 1000 axial slices of CT images on a screen or films, without any oversight is tedious for radiologists. A computer-aided diagnosis (CAD) system that can show the 3-D anatomical structure of the human body and find the location of suspicious regions is strongly expected to reduce the burden and increase the accuracy of medical image interpretation. Among the different organ and tissue regions, the liver is one of the most important diagnostic target organs for a CAD system. Lesion detection and surgery planning for the liver always require the CAD system to first extract the liver region from CT images. However, without any prior knowledge or assistance from human operators, the
traditional image processing methods (such as gray-level thresholding, region growing, etc.) cannot provide a reliable automated extraction of the liver region, especially in non-contrast CT images. Prior knowledge (usually in the form of an atlas) that shows the spatial location, shape, and density (CT number) of the liver region is required to assist the automated liver segmentation process. In order to provide such prior knowledge for abdominal organ segmentations, Park et al. proposed a method for abdominal probabilistic atlas construction [1] using contrast abdominal CT images, and some further approaches were also reported [2,3]. However, how to use the atlas to realize a fully-automated liver segmentation has not been solved completely, and the performance evaluations of such atlases were limited to small databases [1,2]. In this paper, we propose a method to estimate the liver probability in each CT case specifically and use the estimated probability to segment the liver region automatically. The performance of the estimated liver probability and of the liver segmentation results was evaluated by a coincidence ratio measurement and a characteristic curve analysis based on manual extraction results of liver regions in 132 normal liver CT cases and 20 abnormal CT cases. In the following sections, we first describe the outline of our liver segmentation method in Section 2, and then show the details of the liver probability estimation method in Section 3. Experimental results are presented in Section 4, and discussions on the accuracy evaluations of the probability estimation and liver segmentation are given in Section 5. Finally, a conclusion is given in Section 6.
2 Outline of the Liver Segmentation

The outline of our proposed liver region segmentation scheme is shown in Fig. 1. The input of the scheme is a non-contrast CT scan covering the human torso region with an isotropic spatial resolution. The output of the scheme is a 3-D binary image in which the voxels of the liver region are labeled with '1' and the voxels of other regions are labeled with '0'. The proposed scheme is composed of 2 principal steps: (1) automated estimation of the liver probability in the input CT images: after this step, each voxel in the CT images carries a value (0.0-1.0) giving the probability that it belongs to the liver region; (2) automated segmentation of the liver region in the CT images based on the estimated liver probability.
[Figure 1: Input → liver probability estimation → liver probability → liver segmentation → Output]

Fig. 1. Outline of our automated liver segmentation scheme
3 Automated Estimation of Liver Probability

3.1 Definition of the Probabilistic Liver Model

The simplest model of the liver region (described as event A) can be considered as a 3-D connected component that appears in a particular location (event B) of the human body with a particular CT number (event C) in CT images. We define P(A) as the probability of each voxel belonging to the liver region in CT images, P(B) as the probability of the liver location in the spatial anatomical structure of the human torso, and P(C|B) as the probability of the density (CT number) in the liver region in CT images. Using the above definitions, we obtain the following equation:

$$P(A) = P(B \cap C) = P(B) \times P(C \mid B) \qquad (1)$$
The liver probability P(A) for a specific CT scan can be calculated from P(B) [the probability of the anatomical location of the liver region in the human torso] and P(C|B) [the probability of liver density in CT images]. This research develops methods to estimate P(B) and P(C|B) dynamically and to generate the probability P(A) for liver segmentation.

3.2 Construction of a Liver Atlas

The liver atlas used in this research is defined as a probability of liver location (including shape and position) under a normalized anatomical structure of the human body [3]. In order to guarantee the correctness and accuracy of the atlas, it is necessary to arrange the anatomical structure (composed of bone frame, diaphragm, and body surface) in different CT cases and to investigate the variation of the liver locations based on a large number of CT cases. A standard anatomical structure surrounding the liver region is defined for liver region arrangements [3]. We first deform the anatomical structures in different CT cases to the position of the standard anatomical structure, and then vote the liver region in each CT case to the spatial positions of the standard anatomical structure. The voting result, which shows the liver probability at each spatial position in the standard anatomical structure, is regarded as the liver atlas [3].

3.3 Estimation of the Liver Probability on Spatial Location

Using the liver atlas constructed in Section 3.2, we developed a procedure to estimate the probability of the liver region in an input CT case automatically, as shown in Fig. 2. This procedure includes the following 3 processing steps: (1) the bone region was extracted using a gray-level thresholding method [5] and the diaphragm was identified based on the shape of the air regions inside the lung [6]; (2) using a calibration of the landmarks (composed of 8 vertices of a circumscribed hexahedron of the bone frame and body surface, and 200-300 sampling points of the diaphragm surface) between the input CT case and the standard anatomical structure [3], a transformation matrix of the thin-plate spline (TPS) [4] was calculated; (3) the liver atlas in the standard anatomical structure was deformed by TPS using the calculated transformation matrix to show the probability P(B) of the liver location in the input CT images [Fig. 2].
[Figure 2: the anatomical structure of the input CT image is matched to the standard anatomical structure, yielding a TPS transformation matrix; warping the liver atlas with this matrix gives the probability of liver location]

Fig. 2. Estimation of the liver probability on spatial location (P(B)) based on the liver atlas
3.4 Estimation of the Liver Probability on Density

After the P(B) estimation, we proposed a method to estimate P(C|B), which can be considered as the liver probability on the density (CT number) feature. We assumed a Gaussian distribution to approximate the density distribution of the liver, given by the following equation:

$$p(y_i \mid i \in Liver) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(\frac{-(y_i - \mu)^2}{2\sigma^2}\right) \qquad (2)$$
The Gaussian parameters could be estimated by observing the density histogram of the region in which each voxel $i$ satisfies the condition $[P_i(B) > 0]$. The CT number at the maximum peak of the histogram was selected as the mean value $\mu$, and a Gaussian distribution $N(\mu, \sigma)$ having the same FWHM (full width at half maximum) as the observed density histogram was decided.

[Figure 3: left, the observed density distribution and the estimated Gaussian model (number of voxels vs. CT number [H.U.]); right, the probability transformation to liver probability]

Fig. 3. Estimation of the liver probability on density (P(C|B))
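An illustrative sketch of this µ/FWHM estimation from the masked histogram is given below (not the authors' implementation; the bin grid is an assumption):

```python
import numpy as np

def gaussian_from_histogram(image, prob_location, bins=np.arange(-200, 400, 2)):
    """Estimate (mu, sigma) of the liver density model: mu = histogram peak,
    sigma from the FWHM of the observed distribution,
    using FWHM = 2*sqrt(2*ln 2)*sigma for a Gaussian.
    """
    values = image[prob_location > 0]          # voxels with P_i(B) > 0
    hist, edges = np.histogram(values, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    peak = hist.argmax()
    mu = centers[peak]
    half = hist[peak] / 2.0
    above = np.where(hist >= half)[0]          # bins above half maximum
    fwhm = centers[above[-1]] - centers[above[0]]
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return mu, sigma
```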
Then, we calculated P(C|B) from the CT images by a probability transformation using $N(\mu, \sigma)$ as a characteristic curve [Fig. 3]. A likelihood image was generated to show the probability P(C|B), which indicates the probability of liver on CT numbers [Fig. 3].

3.5 Calculating the Liver Probability for Automated Segmentation
The probability P(A) for the CT images can be obtained simply by a multiplication of P(B) and P(C|B). In practice, we generated a total likelihood image showing the liver probability P(A) by an image multiplication. Based on P(A), the liver region can be extracted by simply selecting the voxels $i$ that satisfy the condition $[P_i(A) > th]$; the binary regions were then refined by binary morphological processing [using a ball kernel with radius $r$]. Finally, the largest 3-D connected component was taken as the liver region. The procedure was designed to run in a fully-automatic mode without any assistance from operators.
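An illustrative sketch of this final extraction step with scipy.ndimage (not the authors' implementation; a cubic kernel stands in for the ball kernel):

```python
import numpy as np
from scipy import ndimage

def extract_liver(p_b, p_cb, th=0.03, r=2):
    """Threshold P(A) = P(B) * P(C|B), refine with morphological closing,
    and keep the largest 3-D connected component."""
    p_a = p_b * p_cb
    binary = p_a > th
    kernel = np.ones((2 * r + 1,) * 3, dtype=bool)   # cubic stand-in for a ball kernel
    binary = ndimage.binary_closing(binary, structure=kernel)
    labels, n = ndimage.label(binary)
    if n == 0:
        return binary
    sizes = ndimage.sum(binary, labels, index=range(1, n + 1))
    return labels == (1 + int(np.argmax(sizes)))
```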
4 Experiments

132 normal liver cases of non-contrast torso CT images were used in the experiments. Each CT case has an isotropic spatial resolution of about 0.6 mm and a density (CT number) resolution of 12 bits. The ground truth of the liver region (gold standard) in each CT case was provided by two experienced radiologists (authors H.K. and M.K.) using a semi-automatic segmentation method, for atlas construction and accuracy evaluation of the segmentation results. The performance of our method was evaluated based on a leave-one-out method that selects 131 cases to construct a liver atlas and uses it to segment the liver region from the remaining case. The proposed liver segmentation method was also applied to an additional 20 abnormal liver cases (fatty liver) using the liver atlas constructed from the 132 normal cases. The parameters (th = 0.03, r = 2 voxels) were determined by our experience in liver segmentation. The experimental results for one CT case are shown in Fig. 4.
[Figure 4 panels (a)-(h)]

Fig. 4. An example of liver segmentation results. (a) original CT image (1 coronal slice), (b) liver atlas, (c) P(B): liver probability on spatial location, (d) P(C|B): liver probability on density, (e) P(A): liver probability, (f) liver segmentation result (1 coronal slice), (g) ground truth of liver (3-D view), (h) liver segmentation result (3-D view).
The coincidence ratio between the segmented liver region and the ground truth of the liver region was used to evaluate the accuracy of the liver segmentation method. A characteristic curve analysis showing the relationship between the liver extraction ratio and the over-extraction ratio was used to evaluate the performance and accuracy of the proposed liver model (liver probabilities).
5 Discussion

We found that the liver region was segmented correctly in each of the CT cases. The mean value of the coincidence ratio between segmentation results and gold standards was 0.932 for the normal liver cases and 0.898 for the fatty liver cases, respectively. These results show that the proposed method has the ability to segment the liver region in non-contrast CT images even for abnormal liver cases. The error was caused by the misrecognition of a part of the muscles and other tissues around the liver regions. Refining the liver segmentation results based on edge information of the liver will be applied to reduce such errors in the future. The performance and accuracy of the estimated liver probability were evaluated by the described characteristic curve analysis, which is similar to ROC analysis [Fig. 5]. The area A under the curve was used as the evaluation standard. The mean value of A was 0.99 with a standard deviation of 0.02 for the 132 normal cases using the leave-one-out method [Fig. 5]. We confirmed that the liver probability estimation based on the liver atlas can provide satisfactory information for liver segmentation from non-contrast CT images.
[Figure 5: characteristic curves of one CT case, plotting extraction ratio against over-extraction ratio for P(B), P(C|B), and P(A); for this case, Area(P(B)) = 0.852, Area(P(C|B)) = 0.515, Area(P(A)) = 0.986. Area values of the three probabilities estimated with the leave-one-out method using 132 normal liver cases:

            P(B)    P(C|B)   P(A)
Mean        0.90    0.61     0.99
Std. Dev.   0.05    0.14     0.02 ]

Fig. 5. Performance evaluation results of the liver probabilities using a characteristic curve analysis
We found that the density distribution of the ground truth of the liver region in each CT case was quite close to a Gaussian distribution [Fig. 2]. However, the mean value and standard deviation of the liver density varied considerably from case to case. Using our Gaussian parameter estimation, we confirmed that the mean value of the coincidence ratios between the real density distribution and the estimated Gaussian model was about 94%. The
margin of error was caused by vessels that were incorrectly recognized as part of the liver region during the semi-automatic segmentation process. The experimental results showed that the density of the liver region can be represented by a Gaussian distribution and that the Gaussian parameters should be estimated separately from the CT images for different patient cases. Normalizing the liver location across different patient cases was very important before the liver atlas construction; we compared the results of the liver atlas with and without the normalization [Fig. 6]. We measured the convergence of the probability of liver location and confirmed that our method (warping the diaphragm and bone structure to reduce the variance of liver location and shape in different CT images) was very effective in improving the accuracy of the atlas on liver location [Fig. 6].
[Figure 6: probability convergence curves for the two atlases, plotting convergence of probability against probability of liver existence]

Fig. 6. Liver atlas construction results. (a) liver atlas without anatomical structure arrangements, (b) liver atlas with anatomical structure arrangements, (c) probability convergences of (a) and (b)
6 Conclusion

We proposed a method to construct a probabilistic liver model for automated liver segmentation from non-contrast CT images. The proposed method was applied to segment liver regions in 152 CT cases for performance evaluation. We confirmed that the proposed liver model can improve the robustness and effectiveness of the liver segmentation process and provides a feasible solution for extracting the liver region from non-contrast CT images automatically.
Acknowledgements. The authors thank the members of the Fujita Laboratory, and especially Dr. G. Lee, for valuable discussion. This research was supported in part by research grants of the Grant-in-Aid for Scientific Research on Priority Areas, in part by the Ministry of Health, Labour, and Welfare under a Grant-in-Aid for Cancer Research, and in part by the Knowledge Cluster Initiative of the MEXT, Japanese Government.
References

[1] H. Park, P.H. Bland, and C.R. Meyer: Construction of an Abdominal Probabilistic Atlas and its Application in Segmentation, IEEE Trans. Med. Imag. 22 (4) (2003) 483-492.
[2] H. Kobatake, A. Shimizu, X. Hu, et al.: Simultaneous Segmentation of Multiple Organs in Multi-Dimensional Medical Images, Proc. of the First International Symposium on Intelligent Assistance in Diagnosis of Multi-dimensional Medical Images, Part I, (2005) 11-18.
[3] X. Zhou, T. Kitagawa, K. Okuo, et al.: Construction of a Probabilistic Atlas for Automated Liver Segmentation in Non-contrast Torso CT Images, Proc. of the 19th International Congress and Exhibition CARS 2005, International Conference Series 1281, Elsevier (2005) 1163-1168.
[4] F. L. Bookstein: Principal Warps: Thin-Plate Splines and the Decomposition of Deformations, IEEE Trans. PAMI, 11 (6) (1989) 567-585.
[5] X. Zhou, T. Hara, H. Fujita, et al.: Automated Segmentations of Skin, Soft-tissue and Skeleton from Torso CT images, Proc. of SPIE-Medical Imaging 2004, vol. 5370, (2004) 1634-1639.
[6] X. Zhou, T. Hara, H. Fujita, et al.: Preliminary study for automated recognition of anatomical structure from torso CT images, Proc. of the 2005 IEEE Engineering in Medicine and Biology 27th Annual Conference, Shanghai, China, paper #340, (2005).
Modeling of Intensity Priors for Knowledge-Based Level Set Algorithm in Calvarial Tumors Segmentation

Aleksandra Popovic¹, Ting Wu¹, Martin Engelhardt², and Klaus Radermacher¹

¹ Chair of Medical Engineering, Helmholtz-Institute for Biomedical Engineering, RWTH Aachen University, Germany
[email protected]
² Neurosurgical Clinic, Ruhr-University Bochum, Germany

Abstract. In this paper, an automatic knowledge-based framework for level set segmentation of 3D calvarial tumors from Computed Tomography images is presented. Calvarial tumors can be located in both soft and bone tissue, occupying a wide range of image intensities, making automatic segmentation and computational modeling a challenging task. The objective of this study is to analyze and validate different approaches to intensity priors modeling, with attention to multiclass problems. One-, two-, and three-class Gaussian mixture models and a discrete model are evaluated considering probability density modeling accuracy and segmentation outcome. Segmentation results were validated in comparison to manually segmented gold standards, using analysis in ROC (Receiver Operating Curve) space and the Dice similarity coefficient.
1 Introduction
Segmentation of calvarial tumors in Computed Tomography (CT) images is an important research topic in the framework of the CRANIO project for computer and robot assisted craniotomy [1]. Calvarial tumors can be located in both soft and bone tissue, occupying a wide range of image intensities, making automatic segmentation and computational modeling a challenging task. CT is the imaging modality of choice in the CRANIO project due to its superiority over Magnetic Resonance Imaging (MRI) in geometrical precision [2], required for intraoperative registration and robotic resection accuracy, and in depiction of cranial bone involvement [3]. In our previous studies, the advantage of the level set approach in comparison with other methods has been demonstrated [4], followed by the extension of the level set framework with intensity priors [5]. We have demonstrated improvements in segmentation accuracy and convergence of the level set propagation as prior knowledge about the intensity distribution of calvarial tumors is integrated into the level set speed function alongside image-specific gradients. Monomodal Gaussian intensity modeling was used. The objective of this study is to analyze methods to model the intensity distribution of calvarial tumors for establishing a belief map: the likelihood that an image element, i.e., voxel, belongs to the tumor class. The belief map is used as one of the
speed terms in the level set propagation and for seed point generation. Hence, it is an essential part of the automatic segmentation process presented here. Gaussian mixture models (GMMs) are commonly used in tissue classification problems under the hypothesis that each physical tissue type corresponds to one Gaussian distribution [6]. The Expectation-Maximization algorithm is a standard method to predict the parameters of a GMM. Touhami et al. [7] have proposed a discrete method to model prior intensities for knowledge-based segmentation of kidneys from 2D CT images, attractive for its computational simplicity in contrast to the complexity of parameter optimization for mixture models. Intensity priors have already been used within the level set framework, utilizing the different appearance of brain tumors in T1 and T2 weighted MRI images [8] or with trained mixture models [9,10]. In this paper, the following open issues will be tackled:

– Which method for modeling of intensity priors is the most suitable for calvarial tumors?
– If a Gaussian mixture model is used, how many classes should be assumed?
– What is the influence of this modeling on the accuracy and robustness of the segmentation algorithm?
2 Methods

2.1 Intensity Priors
Let $I(x)$, $I : \mathbb{R}^3 \to \mathbb{R}$, $x = [x, y, z]^T$, be a 3D gray-level CT image, and $S(x)$, $S : \mathbb{R}^3 \to \{0, 1\}$, the result of an expert manual segmentation. Then $T(x)$, $T : \mathbb{R}^3 \to \mathbb{R}$, an extracted tumor gray-level region, is defined as follows:

$$T(x) = \begin{cases} I(x), & \forall x \mid S(x) = 1 \\ 0, & \forall x \mid S(x) = 0 \end{cases} \qquad (1)$$

The objective is to model the intensities of manually extracted tumor regions $T_i(x)$, $i = 1 \ldots N$, where $N$ is the total number of extracted tumors in the training set. The Gaussian mixture model approach assumes that the intensities take values of $n$ Gaussian distributions, with probability density function (PDF):

$$PDF_g(t) = \sum_{i=0}^{n-1} \pi_i \cdot G_{\mu_i, \sigma_i}(t), \qquad (2)$$

where $\pi_i$ is the proportion of the $i$-th class and $G_{\mu,\sigma}(t)$ is a Gaussian distribution:

$$G_{\mu,\sigma}(t) = \frac{1}{\sigma\sqrt{2\pi}} \cdot e^{-\frac{(t-\mu)^2}{2\sigma^2}}. \qquad (3)$$
The optimization objective is to find three parameters for each class, i.e., mean value, standard deviation, and proportion. The Expectation-Maximization (EM) algorithm can optimize the likelihood of the distribution, but may fail to find the global maximum [11].
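For illustration, such a mixture can be fitted with a standard EM implementation; the sketch below uses scikit-learn's GaussianMixture as an assumed dependency (the paper's own pipeline is ITK-based), with multiple restarts to mitigate the initialization sensitivity noted above:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_intensity_gmm(tumor_intensities, n_classes=2, n_restarts=5):
    """Fit an n-class 1D Gaussian mixture (Eq. 2) to pooled tumor intensities.
    Returns (proportions, means, standard deviations)."""
    x = np.asarray(tumor_intensities, dtype=float).reshape(-1, 1)
    gmm = GaussianMixture(n_components=n_classes, n_init=n_restarts).fit(x)
    return gmm.weights_, gmm.means_.ravel(), np.sqrt(gmm.covariances_.ravel())
```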
It is important to note that the EM-GMM parameter estimation is performed on the entire training set, treating it as a single sample, rather than finding parameter sets for each tumor and averaging them, as in our previous study [5]. An alternative approach is to model the distribution discretely, as in [7]:

PDF_d(t) = (1/N) Σ_{i=0}^{N−1} [ (1/k) Σ_{x: Si(x)=1} δ(Ti(x) − t) ],  (4)
where k is the number of voxels x with Si(x) = 1 and δ is the Dirac delta function. An obvious advantage of this approach is its straightforward computation, without initialization problems. Following computation of the prior PDF from the training set, a belief map is generated for each image under investigation. For each voxel x of the image I(x), the assumed belief that the voxel lies inside the tumor region is defined as follows:

B(x) = PDF_{g,d}(I(x)).  (5)

Since probability densities for a given intensity are typically very small, the image obtained from Eq. 5 is rescaled to the intensity range [−1, 1]. The negative range acts as a self-correcting term for the surface evolution, allowing negative values of the speed function (see Section 2.2).
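A minimal sketch of the discrete prior of Eq. 4 and the belief map of Eq. 5 (illustrative only; the normalized histogram approximates the delta-function sum, and `bin_edges` is an assumed discretization of the HU range):

```python
import numpy as np

def discrete_prior_pdf(tumor_regions, bin_edges):
    """Average of per-tumor normalized intensity histograms (cf. Eq. 4)."""
    pdfs = [np.histogram(t[t != 0], bins=bin_edges, density=True)[0]
            for t in tumor_regions]
    return np.mean(pdfs, axis=0)

def belief_map(image, pdf, bin_edges):
    """Look up the prior PDF at each voxel intensity (Eq. 5) and rescale
    the result to [-1, 1] as described in the text."""
    idx = np.clip(np.digitize(image, bin_edges) - 1, 0, len(pdf) - 1)
    b = pdf[idx]
    return 2.0 * (b - b.min()) / (b.max() - b.min()) - 1.0
```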
2.2 Level Set Algorithm
Level sets provide a computational framework for tracking the evolution of closed surface(s) along the perpendicular direction under a given velocity field. For image segmentation problems, the movement rule is specified by a spatially varying speed function, F(x), dependent on image features. The surface evolution is defined as:

∂Ψ(x, t)/∂t = −∇Ψ(x, t) · F(x),  (6)

where Ψ(x, t) is a 4D implicit surface model. Here, the following framework is used for the speed function:

F(x) = Fp(x) + Fc(x) + Fa(x),  (7)

Fp(x) = α · (δ · B(x) + P_I(x)),  Fc(x) = −β · P_I(x) · κ(x),  Fa(x) = γ · ∇Q(x) · ∇Ψ/|∇Ψ|,  (8)

where κ(x) = ∇ · (∇Ψ/|∇Ψ|) is the curvature of the current front, P_I is an edge potential map, and Q is the advection force, defined as:

P_I(x) = e^{−|∇(Gσ ∗ I(x))|},  Q(x) = −|∇(Gσ ∗ I(x))|,  (9)

where Gσ is the Gaussian filter. Fp, Fc, and Fa are the propagation, curvature, and advection speed terms, respectively. Our prior-knowledge-specific contribution is the enhancement of the propagation term with the belief map, B(x).
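The edge potential and advection force of Eq. 9 can be sketched as follows (an illustration with SciPy, not the original ITK filters):

```python
import numpy as np
from scipy.ndimage import gaussian_gradient_magnitude

def edge_potential_and_advection(image, sigma=1.0):
    """Edge potential P_I and advection force Q of Eq. 9."""
    grad_mag = gaussian_gradient_magnitude(image.astype(float), sigma)
    P = np.exp(-grad_mag)   # close to 1 in flat regions, near 0 at edges
    Q = -grad_mag           # drives the advection towards strong edges
    return P, Q
```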
Initialization. The algorithm was initialized with a set of points rather than contours. This kind of initialization is possible because level set propagation can evolve as separate fronts that later merge. Voxels whose prior PDF exceeded a predefined threshold, and whose neighbors in the 2×2×2 neighborhood all exceeded the threshold as well, were selected as seed points. The threshold was empirically set to 0.8. The level set distance map was initialized with spheres (r = 3) around the seed points.
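A sketch of this seed selection, assuming `prior_map` holds the prior PDF values per voxel:

```python
import numpy as np
from scipy.ndimage import minimum_filter

def select_seed_points(prior_map, threshold=0.8):
    """Seed voxels whose own prior and whose 2x2x2 neighborhood all
    exceed the threshold (checked with a minimum filter)."""
    return np.argwhere(minimum_filter(prior_map, size=2) > threshold)
```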
2.3 Experimental Set-Up
Five patient datasets were available (A–E), four of them with diagnosed meningioma and one with a skull metastasis. All patients were scanned with a Siemens Somatom Plus CT, resolution 0.43 × 0.43 × 3 mm. Tumors were manually delineated by an expert. Performance of the learning models was estimated with leave-one-out cross-validation, so that each EM training set comprises the four remaining tumors. To address the issues introduced in Section 1, the following experiments were conducted:
1. Assessment of the optimal number of Gaussian classes in Eq. 2.
2. Accuracy and robustness analysis of the segmentation algorithm for the different PDF modeling approaches, Eq. 2 and Eq. 4.
The open-source Insight Segmentation and Registration Toolkit (ITK, version 2.4.1) was used to implement all image processing and segmentation filters needed for this study.
3 Results
3.1 Intensity Priors
For each training sample, one-, two-, and three-modal GMMs have been optimized, i.e. GMM1, {(μ₀, σ₀, π₀)}; GMM2, {(μ₀, σ₀, π₀), (μ₁, σ₁, π₁)}; GMM3, {(μ₀, σ₀, π₀), (μ₁, σ₁, π₁), (μ₂, σ₂, π₂)}. More than three classes could not be found. The quality of the GMM estimation and the comparison between the models were assessed using the Kolmogorov-Smirnov (KS) method in the CDF (Cumulative Distribution Function) domain. The KS differences (differences between the CDF values of the original and estimated samples over all intensities) show that the GMM modeling becomes more realistic as more classes are used, Table 1, Fig. 1. The first class was found around HU = 80, i.e. soft tissue, the second class around HU = 200, i.e. trabecular bone, and the third class around HU = 800, i.e. cortical bone. The soft-tissue portion was about 80% of the entire distribution. Based on the modeled intensity distributions, belief maps were generated for each patient dataset, Fig. 2. In the case of the one-class GMM, there is obviously no sharp distinction in soft tissue between tumorous and healthy tissue. This behavior is a consequence of the wide intensity range, which raises the standard deviation of the Gaussian distribution. The less visible differences between the other three models (GMM2, GMM3, and the delta priors of Eq. 4) were further analyzed by considering their influence on segmentation accuracy.
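The maximal KS difference can be computed as sketched below (illustrative; `gmm` is a fitted scikit-learn GaussianMixture as in the earlier sketch):

```python
import numpy as np
from scipy.stats import norm

def max_ks_difference(sample, gmm):
    """Maximal difference between the empirical CDF of a training sample
    and the CDF of a fitted 1D scikit-learn GaussianMixture."""
    x = np.sort(np.asarray(sample, dtype=float))
    ecdf = np.arange(1, x.size + 1) / x.size
    mu = gmm.means_.ravel()
    sd = np.sqrt(gmm.covariances_.ravel())
    model_cdf = sum(w * norm.cdf(x, m, s)
                    for w, m, s in zip(gmm.weights_, mu, sd))
    return np.max(np.abs(ecdf - model_cdf))
```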
Table 1. Maximal KS differences

        Training set 1  Training set 2  Training set 3  Training set 4  Training set 5
GMM1    0.34            0.36            0.37            0.34            0.33
GMM2    0.036           0.043           0.042           0.038           0.046
GMM3    0.017           0.025           0.026           0.024           0.026

[Fig. 1: two CDF plots (CDF vs. HU values), one for Training set 1 and one for Training set 2, each comparing GMM1, GMM2, GMM3 with the original distribution.]
Fig. 1. Cumulative distribution functions of the original image and three Gaussian mixture models for two different training sets
Fig. 2. Belief maps generated for different priors modeling: (A) original image, (B) delta priors (Eq. 4), (C) GMM with one class, (D) GMM with two classes, (E) GMM with three classes
3.2 Segmentation Accuracy
The segmentation accuracy was assessed by combining analysis in the Receiver Operating Characteristic (ROC) space with a spatial overlap metric, the Dice similarity coefficient (DSC) [12]. The range of the level set propagation parameters [α, β, γ, δ] was defined prior to the leave-one-out procedure and was selected wide enough that the best segmentation qualities for all cases lie in the middle of the parameter space. To illustrate the influence of the intensity priors, two characteristic patient datasets were selected: one with substantial bone involvement (patient C, skull metastasis from mamma carcinoma) and one with minor bone involvement (patient E, meningioma, WHO grade II).

Table 2. Mean and standard deviation of the DSC metric for two characteristic patient datasets. Patient C is diagnosed with skull metastasis and patient E with meningioma, WHO grade II.

Patient   GMM2              GMM3              Delta
C         0.622 ± 0.1308    0.801 ± 0.0045    0.730 ± 0.0061
E         0.873 ± 0.0171    0.893 ± 0.0016    0.902 ± 0.0016

[Fig. 3: two ROC plots (sensitivity vs. 1−specificity), one per patient, comparing GMM2, GMM3, and delta priors.]
Fig. 3. Analysis of the segmentation accuracy in ROC space for three different approaches to the generation of intensity priors
Fig. 4. Example segmentation results for patient C using delta priors, GMM2, and GMM3
Table 2 shows the results of the DSC evaluation for these two patients, and Figure 3 depicts the accuracy results in ROC space.
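For reference, the DSC used throughout this evaluation reduces to the following computation (a standard definition, shown here for clarity rather than taken from the authors' code):

```python
import numpy as np

def dice(seg, ref):
    """Dice similarity coefficient between two binary volumes."""
    seg, ref = np.asarray(seg, bool), np.asarray(ref, bool)
    return 2.0 * np.logical_and(seg, ref).sum() / (seg.sum() + ref.sum())
```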
4 Discussion and Conclusion
Assessment of the modeling accuracy of the Gaussian mixture models using the Kolmogorov-Smirnov method, Table 1, has unambiguously shown that the most accurate model is obtained with a three-class GMM. This statistical result, reflecting modeling accuracy only, was further validated by considering its influence on the outcome of the segmentation algorithm. Whereas in the case of substantial bone involvement GMM3 significantly outperformed the other methods, Table 2 and Fig. 3, in the case of minor bone involvement the difference between delta priors and the GMM is less obvious (mean DSC 0.90 and 0.89, respectively). This indicates that the delta priors method performs effectively in predominantly single-class problems, e.g. kidney segmentation [7]. GMM2 showed inadequate robustness towards segmentation parameter changes, Table 2. In conclusion, the three-class GMM is the most suitable of the models examined in this study for the complex multi-class problem of modeling intensity priors of calvarial tumors. The segmentation accuracy of the automated algorithm presented here can be regarded as very good (DSC > 0.7) according to [13]. Our further work will focus on integrating spatial information into the proposed intensity prior modeling and level set segmentation framework.
Acknowledgement

This project is funded in part by the German Research Foundation (DFG), Framework 'Surgical Navigation and Robotics' (DFG RA548/2-1).
References
1. Popovic, A., Engelhardt, M., Wu, T., Portheine, F., Schmieder, K., Radermacher, K.: CRANIO - Computer Assisted Planning for Navigation and Robot-assisted Surgery on the Skull. In Lemke, H., Vannier, M., Inamura, K., Farman, A., Doi, K., Reiber, J., eds.: Proceedings of the 17th International Congress and Exhibition (CARS). Volume 1256 of International Congress Series, Elsevier (2003) 1269–1276
2. Grunert, P., Darabi, K., Espinosa, J., Filippi, R.: Computer-aided Navigation in Neurosurgery. Neurosurg Rev 26 (2003) 73–99
3. Grover, S., Aggarwal, A., Uppal, P.S., Tandon, R.: The CT Triad of Malignancy in Meningioma - Redefinition, with a Report of Three New Cases. Neuroradiology 45 (2003) 799–803
4. Popovic, A., Engelhardt, M., Radermacher, K.: Segmentation of Skull-infiltrated Tumors Using ITK: Methods and Validation. In: ISC/NA-MIC/MICCAI Workshop on Open-Source Software at MICCAI 2005. (2005)
5. Popovic, A., Engelhardt, M., Radermacher, K.: Knowledge-based segmentation of calvarial tumors in computed tomography images. In: Bildverarbeitung für Medizin, BVM 2006. Informatik-Aktuell, Springer-Verlag (2006) 151–155
6. Ruf, A., Greenspan, H., Goldberger, J.: Tissue classification of noisy MR brain images using constrained GMM. In Duncan, J., Gerig, G., eds.: Proceedings of the Eighth International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2005. Volume 3750 of Lecture Notes in Computer Science, Springer-Verlag (2005) 790–797
7. Touhami, W., Boukerroui, D., Cocquerez, J.P.: Fully automatic kidneys detection in 2D CT images: A statistical approach. In Duncan, J., Gerig, G., eds.: Eighth International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI. Volume 3750 of Lecture Notes in Computer Science, Springer (2005) 262–269
8. Ho, S., Bullitt, E., Gerig, G.: Level Set Evolution with Region Competition: Automatic 3-D Segmentation of Brain Tumors. In: Proceedings of the 16th International Conference on Pattern Recognition (ICPR). Volume 1. (2002) 532–535
9. Colliot, O., Mansi, T., Bernasconi, N., Naessens, V., Klironomos, D., Bernasconi, A.: Segmentation of focal cortical dysplasia lesions using a feature-based level set. In Duncan, J., Gerig, G., eds.: Eighth International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI. Volume 3749 of Lecture Notes in Computer Science, Springer (2005) 375–382
10. Leventon, M., Grimson, E., Faugeras, O., Kikinis, R., Wells III, W.: Knowledge-based Segmentation of Medical Images. In: Geometric Level Set Methods in Imaging, Vision, and Graphics. Springer-Verlag (2003) 401–420
11. Marin, J.M., Mengersen, K., Robert, C.P.: Bayesian modelling and inference on mixtures of distributions. In: Handbook of Statistics, Volume 25. Elsevier (2005)
12. Zou, K.H., et al.: Statistical Validation of Image Segmentation Quality Based on a Spatial Overlap Index. Acad Radiol 11 (2004) 178–189
13. Zijdenbos, A., Dawant, B., Margolin, R., Palmer, A.: Morphometric Analysis of White Matter Lesions in MR Images: Method and Validation. IEEE Transactions on Medical Imaging 13 (1994) 716–724
A Comparison of Breast Tissue Classification Techniques

Arnau Oliver¹, Jordi Freixenet¹, Robert Martí¹, and Reyer Zwiggelaar²

¹ Institute of Informatics and Applications, University of Girona, Campus Montilivi, Ed. P-IV, 17071 Girona, Spain
{aoliver, jordif, marly}@eia.udg.es
² Department of Computer Science, University of Wales, Aberystwyth SY23 3DB, UK
[email protected]
Abstract. It is widely accepted in the medical community that breast tissue density is an important risk factor for the development of breast cancer. Thus, the development of reliable automatic methods for classification of breast tissue is justified and necessary. Although different approaches in this area have been proposed in recent years, only a few are based on the BIRADS classification standard. In this paper we review different strategies for extracting features in tissue classification systems and demonstrate not only the feasibility of estimating breast density using automatic computer vision techniques, but also the benefits of segmenting the breast based on internal tissue information. The evaluation of the methods is based on the full MIAS database classified according to BIRADS categories, and an agreement of 82% between automatic and manual classification was obtained.
1 Introduction
Breast cancer is considered a major health problem in western countries, and constitutes the most common cancer among women in the European Union. It is calculated that between one in eight and one in twelve women will develop breast cancer during their lifetime [1]. Mammography is still the preferred and most efficient method for detecting breast cancer at early stages, a crucial issue for a high survival rate. Computer Aided Diagnosis (CAD) systems are being developed to assist radiologists in the evaluation of mammographic images [2,3]. However, recent studies have shown that the sensitivity of these systems is significantly decreased as the density of the breast increases while the specificity of the systems remains relatively constant [4]. From a medical point of view, these studies are disappointing, because it is well-known that there is a strong positive correlation between breast parenchymal density in mammograms and breast cancer risk [5]. Therefore, automatic classification of breast tissue will be beneficial, not only to estimate the density of the breast, but also to establish an optimal strategy to follow if, for example, the radiologist is looking for masses.
2 Breast Density Classification
Automated parenchymal pattern classification is not a recent topic; a number of different proposals can be found in the literature [6,7]. However, only a small number of those papers have classified density according to the BIRADS categories [8,9,10,11], the recent standard in radiology developed by the American College of Radiology. All the developed methods are based on extracting features from the breast, related either to texture or simply to gray-level information. A main difference between these works is the regions from which the features are extracted. For instance, Bovis and Singh [8] extract texture features as if the breast had a homogeneous texture, Karssemeijer [7] extracts features from regions selected according to the distance to the skin-line, and Oliver et al. [10] segment the breast according to internal tissue information. The aim of this paper is to evaluate the application of a segmentation step prior to the extraction of features from the mammogram. Figure 1 shows these different strategies graphically, which are divided into (b) no segmentation of the breast, (c) segmentation of the breast according to the distance of each pixel to the skin-line, and (d-f) segmentation of the breast according to internal tissue information, where (d) uses Fuzzy C-Means, (e) fractal analysis, and (f) a statistical approach. The following subsection provides more details on these strategies, while the remaining subsections describe the extracted features as well as the classifier used in this work. However, the first step is the segmentation of the profile of the breast [12]. The aim of this initial segmentation is to isolate the breast area from other objects present in a mammographic image, such as background, annotations, and the pectoral muscle in MLO images. This segmentation results in a minor loss of skin-line pixels in the breast area, but these pixels are not relevant for tissue density estimation.
2.1 Strategies on Breast Tissue Density Segmentation
No Segmentation. The first approach is the extraction of features from the global breast, without any kind of segmentation. Note that this approach implicitly considers the breast area as homogeneous, with the same texture across the whole breast.

Segmentation According to the Distance to the Skin-Line. This approach, shown graphically in Figure 1(c), was first suggested by Karssemeijer [7]. The main idea behind this approach is the assumption that a strong correlation exists between tissue thickness and distance to the skin-line. To compute the distance to the skin-line of the breast, a distance transform is used. First, a binary object is formed by merging the breast tissue and the pectoral muscle (or the equivalent segmented region). This object is eroded repeatedly using a circular structuring element. The number of erosions required for a pixel to be removed from the object is taken as its distance to the skin-line.
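A sketch of this distance computation; instead of counting erosions explicitly, the Euclidean distance transform gives essentially the same per-pixel depth in one pass (an implementation convenience, not necessarily the authors' choice):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def skin_line_distance(breast_mask):
    """Per-pixel depth below the skin-line of a binary breast object;
    thresholding this map yields the distance bands used as regions."""
    return distance_transform_edt(breast_mask)
```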
Segmentation Through Fuzzy C-Means. In contrast with the previous approach, the approach proposed in [10] aims to divide the breast into two different clusters according to the internal breast density, as shown in Figure 1(d). The Fuzzy C-Means algorithm was used to group the pixels of the breast into two separate categories: fatty and dense tissue. As the authors suggested, instead of using a random placement of the initial seeds, the initialization is done by using the gray-level values that represent 15% and 85% of the cumulative histogram of the breast pixels (as illustrated in the sketch at the end of this subsection). Some mammograms do not have clearly determined dense and fatty components. In these cases, the result of the segmentation is one cluster grouping the breast tissue and another cluster grouping regions with less compressed tissue (near the skin-line). Thus, in these cases, the breast texture information is in the breast-tissue cluster, while the ribbon does not provide significant information to the system.

Segmentation Using Fractal Analysis. Another work segmenting the breast using breast tissue information is that of Raba et al. [13], in which the breasts were divided using a fractal scheme. The fractal coding recursively splits the image into quadrants depending on the information contained in each region. The decision whether a region should be split is based on a local histogram measure. A broad histogram means that the region contains intensities covering a large spectrum from dark to lighter values, and can be split into four regions. These regions are recursively evaluated until the decision function determines that a region does not need to be split, thus obtaining regions with uniform tissue properties. Figure 1(e) shows some results using this algorithm.

Segmentation Via Statistical Analysis. This novel approach is based on a statistical analysis of the breast. Windows of 25 × 25 pixels of one mammogram are extracted and used as the ground truth in order to segment the other breast mammograms. Some of the windows represent dense breast tissue while others represent non-dense tissue. Hence, using the fisherfaces approach [14], these windows are used to construct a model of each part of the mammogram, and subsequently each subwindow of the mammogram is classified as one of those regions. Thus, we finally obtain a segmentation of the breast into two regions, representing fatty and dense tissue, as shown in Figure 1(f).
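As a concrete illustration of the Fuzzy C-Means step described above, the following sketch runs a minimal two-cluster FCM on breast grey levels with the 15%/85% cumulative-histogram initialization (illustrative only; the fuzzifier m = 2 and the iteration count are assumptions):

```python
import numpy as np

def fuzzy_cmeans_2class(values, m=2.0, n_iter=50):
    """Minimal two-cluster Fuzzy C-Means on grey levels. Centroids are
    initialised at the 15% / 85% points of the cumulative histogram."""
    v = np.asarray(values, dtype=float)
    c = np.percentile(v, [15, 85])               # initial fatty/dense centroids
    for _ in range(n_iter):
        d = np.abs(v[:, None] - c[None, :]) + 1e-12
        u = d ** (-2.0 / (m - 1.0))              # unnormalised memberships
        u /= u.sum(axis=1, keepdims=True)
        c = (u ** m * v[:, None]).sum(axis=0) / (u ** m).sum(axis=0)
    return c, u                                  # centroids and memberships
```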
2.2 Extracted Features
Once the breast is divided into regions, a set of morphological and texture features is extracted from each one. As morphological features, the relative area and the first four histogram moments of the fatty and dense regions were calculated. A set of features derived from co-occurrence matrices [15] was used as texture features. Co-occurrence matrices are essentially two-dimensional histograms of the occurrence of pairs of gray-levels for a given displacement vector. Formally, the co-occurrence of gray levels can be specified as a matrix of relative frequencies Pij, in which two pixels separated by a distance d and angle θ have gray levels i and j. In this work we use four different directions: 0°, 45°, 90°, and 135°, and three distances equal to 1, 5, and 9 pixels. For each co-occurrence matrix the following statistics were used: contrast, energy, entropy, correlation, sum average, sum entropy, difference average, difference entropy, and homogeneity. Thus, we deal with a set of 113 features for each class.
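A sketch of the co-occurrence feature extraction with scikit-image (illustrative; `graycoprops` provides only a subset of the statistics listed above, so the remaining ones would have to be computed from the matrices directly):

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def cooccurrence_features(region, distances=(1, 5, 9), levels=256):
    """GLCM statistics at four directions and three distances.
    `region` must be an integer image with values in [0, levels)."""
    angles = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]
    glcm = graycomatrix(region, distances=list(distances), angles=angles,
                        levels=levels, symmetric=True, normed=True)
    props = ['contrast', 'energy', 'correlation', 'homogeneity']
    return np.concatenate([graycoprops(glcm, p).ravel() for p in props])
```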
2.3 Classification
In order to train the system, we used a Bayesian classifier which combines two well-known classifiers: the k-Nearest Neighbours (kNN) algorithm and the C4.5 decision tree. Briefly, the kNN classifier [16] classifies an unlabeled vector according to the k most similar vectors in the training set, while a decision tree recursively subdivides the feature space into different subspaces, using different thresholds in each dimension to separate the classes "as much as possible". In this work we used the C4.5 algorithm developed by Quinlan [17], improved by a boosting procedure. A new case is classified according to the classic Bayes equation:

P(x ∈ Bi | A(x)) = P(A(x) | x ∈ Bi) P(Bi) / Σ_{j=1..4} P(A(x) | x ∈ Bj) P(Bj)  (1)
In words, we consider the probability that mammogram x with feature set A(x) belongs to class Bi as the posterior. The prior is the probability of the mammogram belonging to a class before any observation of the mammogram is made; here we used the number of cases in the database for each class divided by the total number of cases. The likelihood is calculated using a non-parametric estimation, explained in the next paragraph. Finally, the evidence is just a normalization factor, needed to ensure that the posterior probabilities sum to one over the classes. The index j runs over the four BIRADS classes. The likelihood estimation is based on the combination of the classifiers explained above. This is achieved following a soft-assign approach, where binary (or discrete) classification results are transformed into continuous values which depict class membership. For the kNN classifier, the membership value of a class is proportional to the number of neighbours belonging to that class: the membership value for each class is the sum of the inverse Euclidean distances between the neighbouring patterns belonging to that class and the unclassified pattern, followed by a normalization to one over all membership values. In the traditional C4.5 decision tree, a new pattern is classified by using the vote of the different classifiers weighted by their accuracy. Thus, in order to obtain a membership for each class, instead of the voting criterion we take into account the result of each classifier; adding all the results for the same class and normalizing, the membership for each class is obtained.
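The soft kNN membership described above can be sketched as follows (illustrative; `train_X` and `train_y` are hypothetical training features and BIRADS labels coded 0-3):

```python
import numpy as np

def knn_soft_membership(x, train_X, train_y, k=5, n_classes=4):
    """Class memberships from the k nearest neighbours, weighted by
    inverse Euclidean distance and normalised to sum to one."""
    d = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(d)[:k]
    membership = np.zeros(n_classes)
    for i in nearest:
        membership[train_y[i]] += 1.0 / (d[i] + 1e-12)
    return membership / membership.sum()
```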
3 Results
In order to test the proposed methods we used the MIAS database [18], which is a public database of digitized mammograms. An expert has classified all the 322 mammograms according to the BIRADS lexicon, obtaining 129 BIRADS I,
Table 1. Confusion matrix for the classification of the mammograms of the MIAS database: (a) without segmentation and (b) segmenting according to the distance to the skin-line

(a) No segmentation (κ = 0.47)

Truth \ Automatic     I   II  III   IV
I                    98    9   15    7
II                   11   44   20    4
III                  11    8   46    5
IV                    5    9   18   12

(b) Distance to the skin-line (κ = 0.71)

Truth \ Automatic     I   II  III   IV
I                   117    6    6    0
II                   10   55   14    0
III                   8    6   55    1
IV                    0    3   13   28
79 BIRADS II, 70 BIRADS III, and 44 BIRADS IV samples. To evaluate the different methods, a leave-one-woman-out procedure was used, in which each sample was analyzed by a classifier trained using all other samples except those from the same woman. Note that this is necessary in order not to bias the results since, typically, both mammograms of a single woman have similar internal morphology. The confusion matrix for the first strategy (no segmentation) is shown in Table 1(a). This should be read as follows: rows indicate the object to be recognized (the true class) and columns indicate the label the automatic classifier associates with this object, so the correctly classified mammograms appear on the diagonal of the matrix. The performance of this approach is therefore 62%. We can see that the best classified mammograms are those belonging to BIRADS I, while mammograms belonging to BIRADS IV are the worst classified. Cohen's kappa (κ) coefficient [19] is also included in the results. A κ coefficient equal to one means a statistically perfect model, whereas a value equal to zero means every model value was different from the actual value. Table 1(b) shows the results obtained by the second approach, i.e. segmentation of the breast into regions according to the distance to the skin-line. Using this approach, the results improve drastically compared with the no-segmentation approach, reaching 79% correct classification and κ = 0.71. Although the performance for all classes is improved, the highest improvement is found for BIRADS IV.
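For completeness, Cohen's kappa as used in these tables follows directly from the confusion matrix (standard formula, shown for clarity):

```python
import numpy as np

def cohen_kappa(confusion):
    """Cohen's kappa from a confusion matrix (rows: truth, cols: automatic)."""
    cm = np.asarray(confusion, dtype=float)
    n = cm.sum()
    po = np.trace(cm) / n                            # observed agreement
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2  # chance agreement
    return (po - pe) / (1.0 - pe)
```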
Table 2. Confusion matrices for MIAS mammogram classification by using the internal breast density as a segmentation strategy: (a) Fuzzy C-Means, (b) Fractal, and (c) Statistical approaches
(a) Fuzzy C-Means (κ = 0.75)

Truth \ Automatic     I   II  III   IV
I                   118    7    4    0
II                    8   59    9    3
III                   6   11   53    0
IV                    0    3    6   35

(b) Fractal (κ = 0.73)

Truth \ Automatic     I   II  III   IV
I                   112   11    4    2
II                    9   61    7    2
III                   2    4   52   12
IV                    0    2    7   35

(c) Statistical (κ = 0.73)

Truth \ Automatic     I   II  III   IV
I                   116    8    5    0
II                    6   59   10    4
III                   2    8   52    8
IV                    0    3    8   33
4 Discussion and Conclusions
This paper has presented five different automatic methods to classify mammograms according to the BIRADS standard, focusing on the strategy used for breast tissue density segmentation. We distinguished between no breast segmentation, segmentation of the breast according to the distance to the skin-line, and segmentation of the breast according to internal breast tissue, and compared three different ways to segment the breast according to this latter strategy. Figure 1 shows the segmentation of the breast according to these strategies. The three last columns (which correspond to the segmentation algorithms that use breast tissue information) show similar results, and therefore the classification results for these strategies are also similar. Analysing the results in more detail, one can conclude that the Fractal approach results in a pixelated segmentation, while the Statistical approach obtains larger and clearly separated regions. The Fuzzy C-Means performance is an intermediate solution, and its classification results are slightly better than the other two. The different behaviour of the algorithms can be explained by their nature. The Fractal approach recursively splits the image using a quadtree structure; when there is insufficient variance in a quadrant, the algorithm stops splitting, thus producing quadrangular regions. With the statistical analysis of the breast, small tissue variations are insufficient to modify the assigned density class; in contrast, these small variations do affect the result of the Fuzzy C-Means algorithm. The obtained results show that the segmentation step increases the performance of the classification, improving the results by more than 20%. Moreover, we have shown that segmentation according to the breast tissue outperforms segmentation according to the distance to the skin-line. Finally, we have also noted that the strategy used to segment the internal breast tissue does not result in major variations in the results, with the Fuzzy C-Means based results slightly better than the others. Bovis and Singh [8] and Petroudi et al. [9] also classified breast tissue according to BIRADS categories. While Bovis and Singh reached 71% of correctly classified
Fig. 1. The reviewed strategies for dividing a mammogram into regions, applied to mammograms belonging to BIRADS I (upper row) and BIRADS IV (lower row): (b) segmentation using a single breast area, (c) based on the distance between the pixels and the skin-line, (d) based on Fuzzy C-Means clustering of pixels with similar appearance, (e) based on the fractalization of the image, and (f) using the Statistical approach.
mammograms, Petroudi et al. obtained 90%, 64%, 70%, and 78% correct classification for mammograms belonging to BIRADS I, BIRADS II, BIRADS III, and BIRADS IV respectively, achieving a total of 76% correct classification over all cases. In contrast, using a segmentation according to the internal breast tissue, these results are clearly improved, reaching 82% correct classification. It should be noted that the obtained results are based on a leave-one-woman-out methodology.
Acknowledgments

This work was partially supported by the Spanish Ministry of Education and Science under the grant TIN2005-08792-C03-01. We also thank Dr. Josep Pont for providing the ground truth for the experiments.
References
1. Eurostat: Health statistics atlas on mortality in the European Union. Official Journal of the European Union (2002)
2. Birdwell, R., Ikeda, D., O'Shaughnessy, K., Sickles, E.: Mammographic characteristics of 115 missed cancers later detected with screening mammography and the potential utility of computer-aided detection. Radiology 219 (2002) 192–202
3. Freer, T., Ulissey, M.: Screening mammography with computer-aided detection: Prospective study of 12860 patients in a community breast center. Radiology 220 (2001) 781–786
4. Ho, W., Lam, P.: Clinical performance of computer-assisted detection (CAD) system in detecting carcinoma in breasts of different densities. Clinical Radiology 58 (2003) 133–136
5. Wolfe, J.: Risk for breast cancer development determined by mammographic parenchymal pattern. Cancer 37 (1976) 2486–2492
6. Miller, P., Astley, S.: Classification of breast tissue by texture analysis. Image and Vision Computing 10 (1992) 277–282
7. Karssemeijer, N.: Automated classification of mammographic parenchymal pattern. Physics in Medicine and Biology 28 (1998) 365–378
8. Bovis, K., Singh, S.: Classification of mammographic breast density using a combined classifier paradigm. In: Proc. Medical Image Understanding and Analysis. (2002) 177–180
9. Petroudi, S., Kadir, T., Brady, M.: Automatic classification of mammographic parenchymal patterns: A statistical approach. In: IEEE Int. Conf. Engineering in Medicine and Biology Society. Volume 1. (2003) 798–801
10. Oliver, A., Freixenet, J., Zwiggelaar, R.: Automatic classification of breast density. In: IEEE Int. Conf. on Image Processing. Volume 2. (2005) 1258–1261
11. Bosch, A., Muñoz, X., Oliver, A., Martí, J.: Modeling and classifying breast tissue density in mammograms. In: IEEE Conf. on Computer Vision and Pattern Recognition. Volume 2. (2006) 1552–1558
12. Raba, D., Oliver, A., Martí, J., Peracaula, M., Espunya, J.: Breast segmentation with pectoral muscle suppression on digital mammograms. In: Lecture Notes in Computer Science. Volume 3523. (2005) 471–478
13. Raba, D., Martí, J., Martí, R., Peracaula, M.: Breast mammography asymmetry estimation based on fractal and texture analysis. In: Proc. Computer Assisted Radiology and Surgery, Berlin, Germany (2005)
14. Belhumeur, P., Hespanha, J., Kriegman, D.: Eigenfaces vs Fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (1997) 711–720
15. Haralick, R., Shanmugam, K., Dinstein, I.: Textural features for image classification. IEEE Transactions on Systems, Man, and Cybernetics 3 (1973) 610–621
16. Duda, R., Hart, P., Stork, D.: Pattern Classification. 2nd edn. John Wiley & Sons, New York (2001)
17. Quinlan, J.: C4.5: Programs for Machine Learning. Morgan Kaufmann, New York (1993)
18. Suckling, J., Parker, J., Dance, D., Astley, S., Hutt, I., Boggis, C., Ricketts, I., Stamatakis, E., Cerneaz, N., Kok, S., Taylor, P., Betal, D., Savage, J.: The Mammographic Image Analysis Society digital mammogram database. In: International Workshop on Digital Mammography. (1994) 211–221
19. Cohen, J.: A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20 (1960) 27–46
Analysis of Skeletal Microstructure with Clinical Multislice CT

Joel Petersson¹,², Torkel Brismar³, and Örjan Smedby¹,²,⁴
¹ Center for Medical Image Science and Visualization (CMIV), Linköping University, Sweden
² Department of Science and Technology (ITN), Linköping University, Sweden
³ Department of Radiology, Karolinska University Hospital Huddinge, Sweden
⁴ Department of Medicine and Care (IMV), Linköping University, Sweden

Abstract. In view of the great effects of osteoporosis on public health, it would be of great value to be able to measure the three-dimensional structure of trabecular bone in vivo as a means to diagnose and quantify the disease. The aim of this work was to implement a method for quantitative characterisation of trabecular bone structure using clinical CT. Several previously described parameters have been calculated from volumes acquired with a 64-slice clinical scanner. Using automated region growing, distance transforms and three-dimensional thinning, measures describing the number, thickness and spacing of bone trabeculae were obtained. Fifteen bone biopsies were analysed. The results were evaluated using micro-CT as reference. For most parameters studied, the absolute values did not agree well with the reference method, but several parameters were closely correlated with the reference method. The shortcomings appear to be due to the low resolution and high noise level. However, the high correlation found between clinical CT and micro-CT measurements suggests that it might be possible to monitor changes in the trabecular structure in vivo.
1 Introduction
The high prevalence and great societal costs of osteoporosis necessitate methods to quantify the disease in vivo, in particular when evaluating treatment. Currently, such quantification is most commonly made with dual energy X-ray absorptiometry (DXA) [1], a method which measures the mineral content of bone but fails to describe its trabecular structure. In post-mortem specimens, several parameters describing the structure in terms such as number, thickness and spacing of bone trabeculae have been identified, initially from micrographs of 2D sections, but later also from micro computed tomography [2]. For clinical as well as research purposes, it would be very attractive to be able to make similar measurements with a modality that can be used in vivo, such as clinical computed tomography (CT).
The authors are thankful to Andres Laib for providing micro-CT data, to Ingela Nystr¨ om for valuable methodological suggestions and to Johanna Petersson for technical assistance.
The aim of this work was to ascertain whether useful estimates of parameters describing bone structure in specimens can be obtained with clinical CT, by implementing such a method, testing it in vitro and correlating it to the reference method micro-CT.
2 Material and Methods
All the methods described in this section were implemented using MATLAB on a standard PC with two 1.7 GHz processors and 2 GB of RAM.
2.1 Samples
The samples consisted of 15 bone biopsies from the radius. The biopsies were approximately cubic with a side of 10 mm. Each cube included a thin slab of cortical bone on one side to facilitate orientation. One of the samples is shown in Figure 1. When imaged with clinical CT, the bone sample was placed in a test tube filled with water, and the tube was then placed in the centre of a paraffin cylinder with a diameter of approximately 10 cm, representing soft tissue to simulate measurement in vivo.
Fig. 1. One of the trabecular bone biopsies reconstructed from micro-CT data
2.2 Data Acquisition
The clinical CT used was a 64-slice LightSpeed VCT (GE Medical Systems, Milwaukee, WI, USA). The acquired data were reconstructed using the kernel called "boneplus". Acquisition parameters are shown in Table 1. Micro-CT data were acquired with a small desktop CT used for analysing biopsies and other specimens (µCT 40; SCANCO Medical AG, Bassersdorf, Switzerland). The specimens can have a maximum diameter of 36 mm and a maximum length of 80 mm. The micro-CT volumes were acquired at SCANCO using a tube voltage of 70 kVp and an isotropic resolution of 20 µm. The micro-CT data were used as a reference to evaluate the clinical CT data. A cube, approximately 8 mm on each side, containing only trabecular bone was extracted from each dataset for analysis. The coordinate system was placed with the z-axis along the axial sections and the y-axis along the cortical slab.
Table 1. Clinical CT acquisition protocol

Property             Value
Voxel size           0.188 × 0.188 × 0.1 mm
Slice thickness      0.625 mm
Tube voltage         120 kV
Effective mAs        130
Field of view        96 mm
Matrix               512 × 512
Convolution kernel   Boneplus
2.3 Segmentation
To obtain approximately isotropic voxels, each pair of sections was averaged into one single section in the volumes acquired with clinical CT, thus obtaining a resolution of 0.188 × 0.188 × 0.2 mm. Due to the averaging, some of the noise was also eliminated. The trabecular structure was then extracted using a modified version of the automated 3D region growing algorithm (ARG) developed by Revol-Muller et al. [3]. The ARG algorithm iterates a minimum variance region growing algorithm [4] until the optimal segmentation is achieved according to an assessment function. Several assessment functions are proposed in [3]; in this study the DGL function was used. The function is region-based and measures the distance between the grey-level function and a step function based on the average levels in the segmented region and the background, respectively. The original ARG method was developed for bimodal histograms. However, the noise level of clinical CT was too high and there was no minimum between the distributions corresponding to bone and water, respectively. Instead, the following method was used. The seed region, i.e. the first undersegmented volume, was created by thresholding at a level obtained by adding to the histogram maximum a constant large enough to ensure undersegmentation. The lower limit of the variance interval was chosen as the variance of the voxels corresponding to the undersegmented volume. The upper limit was chosen as the lower limit multiplied by a constant. For the micro-CT data, the histogram was bimodal and the two distributions could be approximated with two Gaussian functions. The intersection between the two Gaussians was used as a threshold. The variance of the thresholded volume was used as the centre of the variance interval, the length of the interval was chosen by multiplying the centre variance by a constant, and the initial seed region was obtained by performing an erosion of the thresholded volume. There was no need for individual thresholding of the volumes, and thus the segmentation could be performed in a completely automated mode.
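A sketch of the initialization described above (illustrative; `offset` and `factor` stand for the two unspecified empirical constants, and an even number of slices is assumed for the pairwise averaging):

```python
import numpy as np

def arg_initialisation(volume, offset, factor):
    """Seed region and variance interval for the clinical-CT ARG step."""
    # Average each pair of axial sections to get near-isotropic voxels.
    vol = 0.5 * (volume[:, :, ::2] + volume[:, :, 1::2])
    hist, edges = np.histogram(vol, bins=256)
    mode = edges[np.argmax(hist)]        # grey level of the histogram peak
    seed = vol > (mode + offset)         # undersegmented seed region
    v_low = vol[seed].var()              # lower limit of variance interval
    return vol, seed, (v_low, factor * v_low)
```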
2.4 Thinning
The thinning algorithm is a morphologic operation designed to erode a binary volume without changing its topology, i.e. the number of cavities, tunnels and components is preserved. The segmented structure was thinned to a unit-wide skeleton representing the midline of the structure. The method implemented in this study was a parallel algorithm based on the removal of simple points [5]. Using a thinning algorithm to estimate the midline, rather than, e.g., a centres-of-maximal-balls algorithm, guarantees that the skeleton is unit-wide with preserved topology. This makes it possible to estimate parameters such as the number of nodes per volume (Tb.Nd) and termini per volume (Tb.Tm). An example of the thinning process is shown in Figure 2.
Fig. 2. Left: Result of the segmentation of a trabecular bone cube acquired with clinical CT. Right: Result of the thinning process applied to the left image. The first ten voxel layers are brighter for clarity.
2.5 Distance Transforms
When a distance transform is applied to a binary region, the value in each background voxel is replaced by its distance to the closest component voxel. In this study, the exact Euclidean distance transform was used whenever feasible. However, for the centres-of-maximal-balls algorithm described below, which requires a locally derived distance map, the faster chamfer 3-4-5 distance transform [6] was used instead.
2.6 Estimation of Mean Distance
Two methods were used to estimate the mean distance: local maxima (LM) and centres of maximal balls (CMB).

Local Maxima (LM). The local maximum is defined by:

Definition 1. Let DT_Euc be the Euclidean distance transform of a binary volume. A voxel p ∈ {DT_Euc > 0} is considered a local maximum if all its 26-neighbours, q ∈ Ne26(p), have a distance label less than or equal to the distance label of p:

p ∈ LM ↔ DT_Euc(p) ≥ DT_Euc(q) ∀ q ∈ Ne26(p)  (1)
The mean distance is then estimated by averaging the distance labels of the detected LM.
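A sketch of this estimate, using a 26-neighbourhood maximum filter to realize Definition 1 (illustrative; `background_mask` is true off the trabeculae, so the distance labels measure separation):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt, maximum_filter

def local_maxima_mean_distance(background_mask):
    """Mean distance label over the local maxima of the EDT (Tb.Sp LM)."""
    dt = distance_transform_edt(background_mask)
    # A voxel is a local maximum if it equals the maximum over its
    # 3x3x3 neighbourhood, i.e. no 26-neighbour has a larger label.
    is_max = (dt > 0) & (dt == maximum_filter(dt, size=3))
    return dt[is_max].mean()
```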
Centres of Maximal Balls (CMB). Each value in a distance map can be considered as the radius of a sphere whose surface touches at least one object in the corresponding binary image. The centres-of-maximal-balls algorithm detects all spheres not completely covered by another sphere. As the CMB algorithm generalised to 3D [7] requires a locally defined distance transform, the chamfer 3-4-5 distance was used. The mean distance is then estimated by averaging the distance labels of the detected CMB.
2.7 Calculating Structural Parameters
All the parameters are calculated directly from the three-dimensional data. No models describing the shape of trabeculae were assumed. The parameters studied have been standardized by Parfitt et al. [2] except for Tb.Nd and Tb.Tm, which have been generalized to 3D in this study. Some of the parameters were estimated using both local maxima and centres of maximal balls. To distinguish these, LM or CMB have been added after the parameter abbreviation. The bone fraction (BV/TV) was calculated as the number of voxels identified as bone in the segmentation divided by the total number of voxels in the volume. Trabecular separation (Tb.Sp LM and Tb.Sp CMB) is the mean distance between the borders of the segmented trabeculae, calculated using local maxima and centres of maximal balls, respectively. The parameter is illustrated in Figure 3.
Fig. 3. The definition of Tb.Sp and Tb.N, respectively. The measure is three-dimensional but shown as a two-dimensional section for clarity.
Trabecular number (Tb.N) is the inverse of the mean spacing between the midlines of the trabeculae. The midlines of the trabeculae are estimated using the thinned structure. The spacing is calculated using both LM and CMB. The difference between Tb.Sp and Tb.N is illustrated in Figure 3. The thickness of the trabeculae (Tb.Th) was estimated by first calculating the Euclidean distance transform of the complement of the segmented volume; thus, each voxel value in the trabeculae is replaced with the distance to the nearest background voxel. The distance labels along the thinned structure are then extracted and averaged to estimate the mean thickness. Nodes per volume (Tb.Nd) is the number of trabecular intersections per volume. A node is identified in the thinned structure as a voxel with at least three neighbours in its 26-neighbourhood; detected nodes within a 5×5×5 region were treated as one. The total number of nodes is then divided by the total volume. A terminus is a free end of a trabecula. The number of termini per volume is denoted by Tb.Tm. This parameter is also calculated from the thinned structure, as the number of voxels with only one neighbour in their 26-neighbourhood divided by the total volume. To study dependence on orientation, the mean intercept length (MIL) was calculated in the three main directions using a three-voxel matching kernel. The kernel consists of two one-voxels and one zero-voxel. The kernel is traversed through the volume and intersections between bone and background are counted. The total traversed length is then divided by the number of water-to-bone intersections.
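Two of these parameters can be sketched compactly (illustrative; `skeletonize` stands in for the parallel thinning of Sect. 2.4 and handles 3D volumes in recent scikit-image versions, and whether the radius labels are doubled to a diameter is an assumption not fixed by the text):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt
from skimage.morphology import skeletonize

def trabecular_thickness(bone_mask):
    """Mean Tb.Th: EDT inside the bone sampled along the skeleton,
    so the estimate is weighted by trabecular length as in the text."""
    dt = distance_transform_edt(bone_mask)     # distance to background
    skel = skeletonize(bone_mask).astype(bool)
    return 2.0 * dt[skel].mean()               # label is a radius; x2 for diameter

def mean_intercept_length(bone_mask, axis=0):
    """MIL along one main direction: traversed length (in voxels) divided
    by the number of water-to-bone transitions; multiply by the voxel
    spacing along `axis` to obtain millimetres."""
    b = np.asarray(bone_mask, dtype=np.int8)
    crossings = (np.diff(b, axis=axis) == 1).sum()
    return b.size / max(crossings, 1)
```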
3 Results
The relationship between parameters estimated from clinical CT and micro-CT data is illustrated in Table 2. The results show considerable overestimation of thickness and distance parameters such as Tb.Th and Tb.Sp with clinical CT. Consequently, the parameter Tb.N, which is the inverse of the trabecular spacing, is underestimated. When the correlation coefficients between clinical CT and micro-CT were calculated, however, they were all above 0.7, except for Tb.Th and Tb.Tm.

Table 2. Results from clinical CT compared to micro-CT
Parameter         Clinical CT (mean ± SD)  Micro-CT (mean ± SD)  r (95% confidence limits)
Tb.Sp LM [mm]     0.797 ± 0.049            0.683 ± 0.126         0.89 (0.71; 0.96)
Tb.Sp CMB [mm]    0.830 ± 0.131            0.618 ± 0.128         0.77 (0.42; 0.92)
Tb.N LM [mm⁻¹]    0.677 ± 0.038            1.151 ± 0.167         0.77 (0.43; 0.92)
Tb.N CMB [mm⁻¹]   0.795 ± 0.071            1.357 ± 0.223         0.81 (0.51; 0.93)
Tb.Th Euc [mm]    0.633 ± 0.016            0.128 ± 0.015         0.61 (0.14; 0.85)
BV/TV [1]         0.424 ± 0.063            0.095 ± 0.033         0.93 (0.80; 0.98)
Tb.Nd [mm⁻³]      0.296 ± 0.049            4.919 ± 1.778         0.74 (0.37; 0.91)
Tb.Tm [mm⁻³]      0.147 ± 0.025            0.792 ± 0.248         0.08 (−0.45; 0.57)
MILx [mm]         3.068 ± 0.468            2.423 ± 0.894         0.83 (0.55; 0.94)
MILy [mm]         2.445 ± 0.397            2.026 ± 0.638         0.80 (0.48; 0.93)
MILz [mm]         5.140 ± 0.708            3.470 ± 1.039         0.76 (0.41; 0.92)
4 Discussion
In this study, a method for analysing trabecular structure was implemented and tested in bone samples. The results indicate that most of the parameters used for characterising trabecular bone in micro-CT can also be calculated for specimens imaged with clinical CT, with a result that is closely correlated to the reference
method, although yielding large systematic errors. Only for trabecular thickness and number of termini, which seem to be most sensitive to low resolution, were correlation coefficients below 0.7 obtained. A major reason for choosing the ARG segmentation was the possibility to automate the algorithm. Another consequence of the region growing is that small isolated structures are excluded. Simple thresholding and removal of small components would probably produce similar results; however, the choice of threshold is not obvious and the results would depend on the individual selecting it. The advantage of using a thinning algorithm is that the result is a one-voxel-wide structure. Often, the CMB algorithm is used to approximate midlines, which may result in Tb.N being biased, since the resulting CMB region is not one voxel wide. By using the thinned structure instead of LM or CMB for the Tb.Th calculations, the thickness is not biased by volume. Instead, Tb.Th is weighted by length, which is more intuitive when measuring an elongated object. Unfortunately, the attempt to use the resulting skeleton to estimate the number of nodes and ends resulted in a resolution-dependent method: a thicker structure, in number of voxels, results in more spurious branching [5]. Future research will address this matter. It is not completely clear how the measurements should be averaged. Currently, all detected LM or CMB are averaged. In particular with CMB, more voxels are detected in larger holes. This can be an advantage, as the measures are then to some degree weighted by volume. However, if volume weighting is desired, then the method presented by Hildebrand and Ruegsegger [8] is preferable, since each voxel in the background is set to the value of the CMB that covers it. It is not obvious how this could be done for LM, since the LM spheres do not cover the entire background volume as the CMB do. The Tb.Sp measures show an overestimation due to partial volume effects, leaving the thinnest trabeculae undetected. Tb.N shows an underestimation instead, since it is a measure of the number of trabeculae per length: with undetected trabeculae, or with several trabeculae treated as one, the result is fewer trabeculae. Although the absolute values are not correct, the correlation is surprisingly strong for both Tb.Sp and Tb.N. Consistently using the technique tested in this report might therefore enable reliable detection of changes in these parameters in longitudinal studies, provided that it can easily be transferred to use in vivo. Tb.Th shows a considerable overestimation and weak correlation, implying that the resolution is too low to correctly represent this aspect of trabecular structure. Similar results have previously been found by e.g. Laib and Ruegsegger [9]. The bone fraction, BV/TV, showed the strongest correlation despite the large overestimation. The overestimation is again a product of the partial volume effect associated with low resolution. The MIL parameters show that each direction has a relatively strong correlation to the reference. However, the absolute values are not equally
overestimated in the three directions. The lowest correlation between clinical CT and micro-CT is seen for MILz, probably due to the large slice thickness. This precludes meaningful comparisons between the different directions. In conclusion, the high correlations for the parameters describing trabecular separation, trabecular number and bone volume fraction indicate that it might be possible to monitor changes in the microarchitecture of trabecular bone with clinical CT, as a complement to measurements of bone mineral density. The results would probably improve if a smaller slice thickness and a lower noise level could be produced.
References
1. Cullum, I.D., Ell, P.J., Ryder, J.P.: X-ray dual-photon absorptiometry: a new method for the measurement of bone density. Br J Radiol 62(739) (1989) 587–592
2. Parfitt, A.M., Drezner, M.K., Glorieux, F.H., Kanis, J.A., Malluche, H., Meunier, P.J., Ott, S.M., Recker, R.R.: Bone histomorphometry: standardization of nomenclature, symbols, and units. Report of the ASBMR Histomorphometry Nomenclature Committee. J Bone Miner Res 2(6) (1987) 595–610
3. Revol-Muller, C., Peyrin, F., Carrillon, Y., Odet, C.: Automated 3D region growing algorithm based on an assessment function. Pattern Recognition Letters 23(1-3) (2002) 137–150
4. Revol-Muller, C., Jourlin, M.: A new minimum variance region growing algorithm for image segmentation. Pattern Recognition Letters 18(3) (1997) 249–258
5. Xie, W., Thompson, R.P., Perucchio, R.: A topology-preserving parallel 3D thinning algorithm for extracting the curve skeleton. Pattern Recognition 36(7) (2003) 1529–1544
6. Borgefors, G.: On digital distance transforms in three dimensions. Computer Vision and Image Understanding 64(3) (1996) 368–376
7. Svensson, S., di Baja, G.S.: Using distance transforms to decompose 3D discrete objects. Image Vision Comput 20(8) (2002) 529–540
8. Hildebrand, T., Ruegsegger, P.: A new method for the model-independent assessment of thickness in three-dimensional images. Journal of Microscopy 185(1) (1997) 67–75
9. Laib, A., Ruegsegger, P.: Comparison of structure extraction methods for in vivo trabecular bone measurements. Comput Med Imaging Graph 23(2) (1999) 69–74
An Energy Minimization Approach to the Data Driven Editing of Presegmented Images/Volumes

Leo Grady and Gareth Funka-Lea

Department of Imaging and Visualization, Siemens Corporate Research, Princeton, NJ, USA
[email protected],
[email protected]
Abstract. Fully automatic, completely reliable segmentation in medical images is an unrealistic expectation with today's technology. However, many automatic segmentation algorithms may achieve a near-correct solution, incorrect only in a small region. For these situations, an interactive editing tool is required, ideally in 3D, a task that is usually left to manual correction. We formulate the editing task as an energy minimization problem that may be solved with a modified version of either the graph cuts or the random walker 3D segmentation algorithms. Both algorithms employ a seeded user interface, which may be used in this scenario for a user to seed erroneous voxels as belonging to the foreground or the background. In our formulation, it is unnecessary for the user to specify both foreground and background seeds.
1 Introduction

Automatic segmentations of targets in images/volumes often require some degree of editing to meet the needs of a particular user (e.g., a physician). The question that we are concerned with is: Given a pre-existing segmentation (obtained through other means, e.g., an automatic algorithm), how can one edit the segmentation to correct problems? We formulate the editing task as an energy minimization problem and show how it may be optimized with a modified graph cuts [1] or random walker algorithm [2]. Another view of this work is that we detail how a prior segmentation may be seamlessly combined with graph cuts or the random walker algorithm to allow for editing, while maintaining the important property of both algorithms that an arbitrary segmentation may be achieved with enough interaction. We will use the term presegmentation to refer to the prior, incorrect, pre-existing segmentation that was obtained through other means and presented for editing. In our experience with clinicians, an editing tool is expected to have the following characteristics:
1. Operate locally to the interaction site.
2. Operate quickly.
3. Produce modifications in 3D, not just on the viewing slice.
4. Produce intuitive solutions.
The graph cuts and random walker interactive segmentation algorithms appear to be good candidates for editing. Both algorithms require the user to place marks with a mouse (hereafter referred to as seeds), to indicate a few pixels belonging to the foreground/background of the target object. These seed locations are then used to produce a full 3D segmentation (labeling). A characteristic of the algorithms is that an undesired segmentation may be easily modified by adding new seeds, which typically results in faster processing of the updated segmentation. In our problem of editing, we would like to preserve the quality of fast modification of the segmentation but, instead of being given a previous set of seeds, we are given a previous complete segmentation produced by another algorithm. Use of these algorithms for editing therefore requires that we preserve the character of these algorithms as fast, 3D, and intuitive, while enforcing a locality of operation. Implicit in the requirements for local operation and intuitive results is a requirement for stability. In particular, when editing is invoked the presegmented result should not change if the user chooses not to interact. Although obvious, this requirement is not met by many potential approaches. If the editing will be done at a pixel level and the presegmentation is at a subpixel resolution then the first step of the editing will be a pixelation of the presegmentation which will change the results due to sampling even if the user does not change the label of a single pixel. The editing we describe will be at a pixel level and we assume that the presegmentation is at the same level of resolution. Another effect can also lead to instability when initiating editing — if the data driven editing is formulated as an optimization, and the current presegmentation is not a local optimum, then the segmentation result will change with the optimization even without user input. Since the source of our presegmentation is unknown, we will formulate our optimization in such a way that the presegmentation is a global optimum without user input. As stated by Kang et al. [3], “...the literature on editing tools in general and on 3D tools in particular is sparse.” In fact, many publications on medical image segmentation explicitly assume the availability of a manual editing tool to correct undesirable results. Although some user interaction is clearly necessary, our goal is to provide a tool that requires minimal user interaction to achieve the desired, edited, result. Kang et al. [3] introduce three tools for interactive correction of a presegmented structure. The first tool allows the user to select a VOI, within which holes in the segmentation are filled. The second tool allows a user to bridge points in the segmentation to indicate to the hole-filling system that a volume in the segmentation should be filled. The final tool introduces control points on the presegmented surface that the user is allowed to drag/modify. Modification of a control point on the boundary introduces a displacement field on nearby control points that results in the displacement of a boundary region in the neighborhood of the modified control point. Each of these tools has the drawback that the image content is ignored in modifying the presegmentation. In the approach we present here, the user seeds, presegmentation and image content all impact the edited segmentation. There are a number of tools for interactive segmentation in 2D [4,5,6] and 3D [7,8,9]. 
However, none of these are formulated to make use of a presegmentation. In addition, there is a large body of literature in the computer-aided design and computer graphics communities that looks at graphical model editing, but the editing is done without the
influence of image data. Our work is related to the large body of work on image segmentation using shape priors (see, for instance, [10,11,12,13,14,15]). The presegmentation in our work has some of the aspects of a shape prior. However, shape priors are built from a sampled distribution of shapes while in the case of a presegmentation there is only one instance of a prior shape. So, for our problem, there is no learned uncertainty in the prior information. Instead, the goal is to deviate from the presegmentation locally to the interaction site and relative to the image data. This paper is organized as follows. Section 2 formulates the presegmentation editing problem as a graph cuts or random walker segmentation problem. Section 3 offers several 2D and 3D editing results. Finally, Section 4 presents concluding remarks.
2 Method

In order to meet the goal stated above that an unedited presegmentation returns the presegmentation as the optimum, we avoid a continuum formulation out of concern that the discretization step might alter the presegmentation. Therefore, our formulation will be on a discrete space or, in general, a graph. We begin by defining a precise notion for a graph. A graph [16] consists of a pair G = (V, E) with vertices (nodes) v ∈ V and edges e ∈ E ⊆ V × V. An edge, e, spanning two vertices, v_i and v_j, is denoted by e_{ij}. A weighted graph assigns a value to each edge called a weight. The weight of an edge, e_{ij}, is denoted by w(e_{ij}) or w_{ij} and is assumed to be positive. The degree of a vertex is d_i = \sum_{e_{ij}} w(e_{ij}), taken over all edges e_{ij} incident on v_i. We associate each pixel (voxel) with a node and we will assume that each pixel (voxel) is connected by an edge to its four (six) cardinal neighbors. Define an affinity weighting between pixels as given by the typical [7,2] Gaussian weighting function

w_{ij} = \exp\left(-\beta (g_i - g_j)^2\right),   (1)

where g_i represents the grayscale intensity at node (pixel) v_i. Define a presegmentation, p, determined by another process (e.g., a separate, automatic segmentation algorithm), as

p_i = \begin{cases} 1 & \text{if } v_i \text{ was presegmented as foreground}, \\ 0 & \text{if } v_i \text{ was presegmented as background}. \end{cases}   (2)

Given a presegmentation p, define the editing problem as the minimization of the energy functional

Q(x) = \sum_{e_{ij}} w_{ij} (x_i - x_j)^2 + \gamma \sum_i \left[ (1 - p_i)\, x_i + p_i\, (1 - x_i) \right],   (3)
with respect to the foreground indicator function x, defined on the nodes, where γ is a parameter indicating the strength of the presegmentation. This functional encourages agreement with the presegmentation (the second term) as well as a data-driven smoothness (the first term). Note that, with a sufficiently large γ, the presegmentation will always be returned. Given a user-defined editing set of nodes (possibly empty) marked as foreground seeds F ⊂ V and a user-defined editing set of nodes (possibly empty) marked
Fig. 1. Graphical interpretation of the editing formulation given a presegmentation. The dark red and blue pixels correspond to the presegmented foreground/background. The light red/blue pixels represent “supernodes” that are attached to the presegmented foreground/background with strength γ. The energy functional of (3) may be minimized by applying either the graph cuts (binary minimization) or the random walker (real-valued minimization) algorithm to this graph construction. User placed editing seeds may now be employed in the manner of the hard constraints used in the standard formulation of both algorithms.
as background seeds, B ⊂ V, such that F ∩ B = ∅, the seeds are incorporated into the minimization of (3) by performing a constrained minimization of Q(x) with respect to the constraints

x_i = \begin{cases} 1 & \text{if } v_i \in F, \\ 0 & \text{if } v_i \in B. \end{cases}   (4)

If the minimization of Q(x) in (3) is forced to give a binary-valued optimum, x, for all unseeded nodes, then the minimization of (3) is given by the graph cuts algorithm of [7] with the construction of Figure 1. In the language of [7], seeds are given by F, B, N-links have weight w_{ij} and each node, v_i, is connected via a T-link to a foreground (if p_i = 1) or background (if p_i = 0) "supernode" with weight γ. If the optimization of (3) is performed with respect to a real-valued x (afterward thresholded at 0.5 to produce a "hard" segmentation) the random walker algorithm may be performed with the same weighted graph construction given above for graph cuts [9]. Since the random walker has a provable robustness to noise [9] not offered by the graph cuts algorithm and because the "soft" confidence values for each node that are returned by the random walker algorithm are generally beneficial for visualization and smoothing purposes, we will focus here on the random walker solutions for the editing problem given (3).
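To make the real-valued route concrete, the following sketch minimizes (3) with the random walker on a 2D image using sparse linear algebra. It is our own illustration, not the authors' implementation: the supernode ties of Figure 1 are realized as a quadratic penalty γ(x_i − p_i)², which coincides with the unary term of (3) for binary x, and all function and parameter names (edit_segmentation, beta, gamma) are ours.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def edit_segmentation(img, p, fg, bg, beta=90.0, gamma=1.0):
    """img: 2D grayscale array; p: binary presegmentation (Eq. 2);
    fg, bg: boolean masks of user-placed foreground/background seeds."""
    h, w = img.shape
    n = h * w
    g = img.ravel().astype(float)
    idx = np.arange(n).reshape(h, w)
    # 4-connected lattice edges with the Gaussian weights of Eq. (1)
    i = np.concatenate([idx[:, :-1].ravel(), idx[:-1, :].ravel()])
    j = np.concatenate([idx[:, 1:].ravel(), idx[1:, :].ravel()])
    wij = np.exp(-beta * (g[i] - g[j]) ** 2)
    W = sparse.coo_matrix((np.r_[wij, wij], (np.r_[i, j], np.r_[j, i])),
                          shape=(n, n)).tocsr()
    L = sparse.diags(np.asarray(W.sum(axis=1)).ravel()) - W  # graph Laplacian
    # Supernode T-links of Fig. 1, as a quadratic tie gamma*(x_i - p_i)^2
    A = (L + gamma * sparse.identity(n)).tocsr()
    b = gamma * p.ravel().astype(float)
    # Hard seed constraints of Eq. (4): x = 1 on F, x = 0 on B
    x = np.zeros(n)
    x[fg.ravel()] = 1.0
    seeded = np.flatnonzero((fg | bg).ravel())
    free = np.flatnonzero(~(fg | bg).ravel())
    rhs = b[free] - A[free][:, seeded] @ x[seeded]
    x[free] = spsolve(A[free][:, free].tocsc(), rhs)
    soft = x.reshape(h, w)
    return soft > 0.5, soft   # thresholded segmentation and confidences
```

With no seeds placed, the solution satisfies (L + γI)x = γp, so x approaches p as γ grows, which matches the stability requirement after thresholding.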
One view of the editing formulation of (3) is as a statistical prior that is fed to the segmentation algorithm. Statistical priors may be incorporated into either the graph cuts or the random walker algorithm in the same manner — by connecting each node to a floating foreground/background (i.e., source/terminal) node with strength proportional to the prior [7,9]. Figure 1 gives an example of this construction. The formulation given above satisfies three of the four design criteria for an interactive editing algorithm. Modifications behave intuitively, are performed in 3D and the updates may be computed quickly (we refer the reader to [9] for the reasons why this computation is more efficient in the case of the random walker). However, the criterion of local operation is not incorporated into the above formulation, i.e., the segmentation could change at any location in the image. Therefore, we propose to make the presegmentation strength, γ, a function of distance from the seed locations¹, i.e.,

\gamma_i = \kappa \exp\left(-\frac{d(v_i, v_j)}{\sigma}\right),   (5)

where d(v_i, v_j) is the minimum distance from v_i to all v_j ∈ F ∪ B. Therefore, the κ parameter indicates the overall strength of consideration given to the presegmentation and the parameter σ controls the locality of the modification given by the seeds.
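A per-pixel γ_i of this form can be computed with a distance transform; the short sketch below is our own code (the name locality_gamma is hypothetical), using SciPy's Euclidean distance transform for d(·). The resulting array would replace the scalar γ in the solver sketch above, i.e., sparse.diags(gamma_i) instead of γI and gamma_i * p on the right-hand side.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def locality_gamma(fg, bg, kappa=1.0, sigma=10.0):
    """fg, bg: boolean seed masks. Returns the per-pixel strength of Eq. (5)."""
    seeds = fg | bg
    d = distance_transform_edt(~seeds)   # distance to the nearest seed pixel
    return kappa * np.exp(-d / sigma)
```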
3 Results

Validation of an editing algorithm such as this is difficult, since the ultimate metric of utility is the match between the intuition of the user and the result of the editing. However, it is possible to demonstrate that the editing tool satisfies the remaining design criteria of speed, locality and 3D operation. We will begin with 2D examples in order to characterize the algorithm behavior and report calculation speeds. Figure 2 shows six examples of images taken from different imaging modalities. For each image, an incorrect presegmentation was given from a separate system (e.g., an automatic segmentation algorithm) that a user would want to correct. The incorrect presegmentations are given in the second column, outlined in blue/gray. A user may place foreground seeds to include regions excluded in the presegmentation, indicated by green/dark gray marks, or background seeds to exclude regions included by the presegmentation, indicated by yellow/light gray marks. Although none of the examples in the second column include both foreground and background seeds, there is no limitation to using both seed types. These experiments were conducted on an Intel Xeon with a 2.40GHz processor using a conjugate gradients solver for the random walker algorithm. From top to bottom — the aortic aneurysm CT image had 512 × 512 pixels and the editing required 18.28s, the bone CT image had 512 × 512 pixels (cropped in the figure) and required 9.5s, the brain tumor MR image was 295 × 373 and required 1.79s, the lung tumor CT image was 512 × 512 and required 23.67s, the left ventricle CT image was 256 × 256 and required 1.95s and the left ventricle ultrasound image was 240 × 320 (cropped in the figure) and required 2.58s for editing. Figure 3 shows the same experiment run on a 3D dataset of a liver tumor in which the presegmentation erroneously included surrounding tissue, due to a vessel passing
¹ We would like to thank Reto Merges for this modification.
Fig. 2. 2D examples of our editing algorithm. From left to right — First column: The original image. Second column: The presegmentation (outlined in blue/gray) provided by another system (e.g., an automatic algorithm). Third column: The user-placed seeds used to correct the segmentation. Green/dark gray seeds indicate that the user wants to include this region in the segmentation while yellow/light gray seeds indicate that the user wants to exclude this region from the segmentation. Fourth column: The updated segmentation.
Fig. 3. A 3D example of our editing algorithm. In all slices the blue/gray lines indicate the segmentation border. Top row: Axial slices of the original segmentation of a liver tumor in CT with a leak into the surrounding tissue. Bottom row: The corrected segmentation, after placement of background (exclude) seeds in one slice by the user, given on the leftmost slice by the yellow/light gray seeds.
nearby. Background seeds were placed on a single slice but, because the energy minimization is formulated on an arbitrary graph (in this case, a 3D lattice), the resulting solution affects all slices. Given this volume, cropped to 89×90×29, producing an edited solution in 3D on the same computer as in the 2D experiment required 3.95s.
4 Conclusion

Perfectly reliable, automatic segmentation of an object in a medical image is unrealistic with today's segmentation technology. However, an automatic system may get close to the user-desired segmentation. Therefore, the availability of a smart editing tool is essential, since a manual correction is far too time consuming, especially in 3D data. The "scribble" interface of the graph cuts and random walker algorithms provides a natural user interface that allows the user to seed image regions to include or exclude in the edited segmentation. With the native definitions of these algorithms, both foreground and background regions must be seeded and information from a presegmentation cannot be used. In this paper, we have shown how the presegmentation may be used to frame the editing problem in an energy minimization framework that permits optimization with either the graph cuts or random walker algorithm, depending on whether or not segmentation confidences are required. Our energy minimization framework was tested on several 2D and 3D editing examples (using the random walker minimization) and the results were displayed in Figures 2 and 3 with timing information. Overall, the energy minimization framework provides a meaningful, smart, image-dependent method of editing a presegmented volume with known minimization techniques. Finally, the editing presented here satisfies our stated desired qualities of an editing algorithm in that the edited segmentation is obtained locally, quickly and intuitively, and operates in 3D.
References

1. Boykov, Y., Jolly, M.P.: Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images. In: Proc. of ICCV 2001. (2001) 105–112
2. Grady, L., Funka-Lea, G.: Multi-label image segmentation for medical applications based on graph-theoretic electrical potentials. In: Computer Vision and Mathematical Methods in Medical and Biomedical Image Analysis, ECCV 2004. LNCS 3117, Prague, Czech Republic, Springer (2004) 230–245
3. Kang, Y., Engelke, K., Kalender, W.A.: Interactive 3D editing tools for image segmentation. Medical Image Analysis 8 (2004) 35–46
4. Falcão, A.X., Udupa, J.K., Samarasekera, S., Sharma, S., Elliot, B.H., de A. Lotufo, R.: User-steered image segmentation paradigms: Live wire and live lane. Graphical Models and Image Processing 60 (1998) 233–260
5. Mortensen, E., Barrett, W.: Interactive segmentation with intelligent scissors. Graphical Models in Image Processing 60(5) (1998) 349–384
6. Elder, J.H., Goldberg, R.M.: Image editing in the contour domain. IEEE Transactions on Pattern Analysis and Machine Intelligence 23(3) (2001) 291–296
7. Boykov, Y., Jolly, M.P.: Interactive organ segmentation using graph cuts. In: Medical Image Computing and Computer-Assisted Intervention, Pittsburgh, PA (2000) 276–286
8. Lefohn, A.E., Cates, J.E., Whitaker, R.T.: Interactive, GPU-based level sets for 3D segmentation. In: Medical Image Computing and Computer Assisted Intervention (MICCAI). (2003) 564–572
9. Grady, L.: Multilabel random walker image segmentation using prior models. In: Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Volume 1 of CVPR., San Diego, IEEE (2005) 763–770
10. Cootes, T.F., Hill, A., Taylor, C.J., Haslam, J.: Use of active shape models for locating structures in medical images. Image Vision Comput. 12(6) (1994) 355–365
11. Leventon, M.E., Grimson, W.E.L., Faugeras, O.: Statistical shape influence in geodesic active contours. In: Proc. Conf. Computer Vision and Pattern Recognition. Volume 1., Hilton Head Island, SC (2000) 316–323
12. Cremers, D., Tischhäuser, F., Weickert, J., Schnörr, C.: Diffusion snakes: Introducing statistical shape knowledge into the Mumford-Shah functional. International Journal of Computer Vision 50(3) (2002) 295–313
13. Rousson, M., Paragios, N.: Shape priors for level set representations. In: Proc. of ECCV 2002. Volume 2. (2002) 78–92
14. Kumar, M.P., Torr, P.H., Zisserman, A.: OBJ CUT. In: Proc. of CVPR 2005. Volume 1. (2005) 18–25
15. Freedman, D., Zhang, T.: Interactive graph cut based segmentation with shape priors. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR). Volume 1. (2005) 755–762
16. Harary, F.: Graph Theory. Addison-Wesley (1994)
Accurate Banded Graph Cut Segmentation of Thin Structures Using Laplacian Pyramids

Ali Kemal Sinop and Leo Grady

Department of Imaging and Visualization, Siemens Corporate Research, Princeton, NJ, USA
[email protected],
[email protected]
Abstract. The Graph Cuts method of interactive segmentation has become very popular in recent years. This method performs at interactive speeds for smaller images/volumes, but an unacceptable amount of storage and computation time is required for the large images/volumes common in medical applications. The Banded Graph Cut (BGC) algorithm was proposed to drastically increase the computational speed of Graph Cuts, but is limited to the segmentation of large, roundish objects. In this paper, we propose a modification of BGC that uses the information from a Laplacian pyramid to include thin structures into the band. Therefore, we retain the computational efficiency of BGC while providing quality segmentations on thin structures. We make quantitative and qualitative comparisons with BGC on images containing thin objects. Additionally, we show that the new parameter introduced in our modification provides a smooth transition from BGC to traditional Graph Cuts.
1 Introduction

The Graph Cuts method of interactive segmentation [1] has become very popular in recent years due to its behavioral and theoretical properties. This method performs at interactive speeds for smaller images/volumes, but an unacceptable amount of computation time is required for the large images/volumes common in medical applications. Consequently, the Banded Graph Cuts algorithm (BGC) [2] was proposed in an attempt to preserve the segmentation quality of Graph Cuts while drastically increasing the computational speed. Although BGC performs well when the segmentation target is a large, blob-like object, the segmentation quality breaks down when the image contains thin structures. Since thin structures are ubiquitous in medical images (typically in the form of vessels), the BGC algorithm must be modified in order to apply to this problem domain. We present a modification of the BGC algorithm that preserves its computational advantages, but produces segmentations much closer to Graph Cuts when the segmentation target contains thin structures. This modification is based upon the use of a difference image when coarsening to identify thin structures that may be included into the band during the segmentation. Graph Cuts has risen to become an extremely popular algorithm for image segmentation (although we will employ the generic term "image" in this paper, we intend this
term to be understood as pertaining to data volumes as well). The Graph Cuts algorithm inputs two user-defined "seeds" (or seed groups) indicating samples of the foreground object and the background. The algorithm then proceeds to find the max-flow/min-cut between these seeds, viewing the image as a graph where the pixels are associated with nodes and the edges are weighted to reflect image gradients. Although standard max-flow/min-cut routines [3] may be used for this computation, faster specialty algorithms for the domain of image segmentation have been developed [4]. Part of the popularity of the Graph Cuts algorithm also stems from the statistical interpretation of the approach as the minimization of a certain Markov Random Field [5,6]. Additionally, the segmentation results are straightforward to predict and intuitive to use since the algorithm always finds the segmentation boundary at the location of the minimum cut (or surface, under the interpretation of [7]). The max-flow/min-cut algorithm introduced by [4] makes a great computational advance over the conventional methods for computing max-flow/min-cut. However, for high-resolution images or medical volumes, the computational burden is still too large for the algorithm to operate at an interactive speed. For example, it was reported in [2] that over six minutes were required to segment a medical volume of size 256×256×185. This large computational burden for the Graph Cuts algorithm prompted the introduction of the BGC algorithm [2]. The goal of the BGC algorithm was to provide the same cut (segmentation) of the native Graph Cuts algorithm by introducing a multilevel scheme for the computation. Although the BGC algorithm is not guaranteed to provide the minimum graph cut, it was convincingly shown in [2] that the (near) minimal graph cut was returned for practical problems in which the segmentation target was "blob-like". However, the authors of [2] state (and demonstrate) clearly that the BGC algorithm is inappropriate for thin objects. Unfortunately, the medical imaging domain frequently contains thin segmentation targets in the form of vascular structures. Despite this limitation to "blob-like" segmentation targets, the BGC algorithm produces computational speeds over an order of magnitude faster than those obtained when employing conventional Graph Cuts (i.e., using the algorithm of [4] on the full graph). The BGC algorithm is not the only approach to increasing the computational efficiency of Graph Cuts. The Lazy Snapping algorithm of [8] performs a presegmentation grouping of the pixels using a watershed transform [9] and treats each watershed basin as a supernode. Graph Cuts is then run on the graph of supernodes. By drastically reducing the number of nodes in the graph, this approach greatly increases the speed of Graph Cuts in practice. However, the computational savings are more difficult to predict with this approach since the number of supernodes is highly image dependent. Additionally, the authors of this approach offer no characterization of when to expect this coarsening approach to succeed or fail to produce the same cut given by full-resolution Graph Cuts. We propose a modification of the BGC algorithm that allows for the accurate segmentation (relative to full-resolution Graph Cuts) of thin objects while preserving the computational advantages of BGC. The modification is born of the observation that thin structures are lost as a result of the coarsening operation involved in BGC.
Therefore, we employ the idea of a Laplacian pyramid [10] to recover this lost information and extend the band such that high-threshold pixels are included. In this manner, the
additional computational burden is slight but thin structures are recovered. The extra computational burden comes only from considering a small number of additional pixels in the band and from computation of the Laplacian pyramid, which need not be computed explicitly and therefore requires simply an additional linear-time pass over the image. This paper is organized as follows: Section 2 gives details of the method and implementation, and Section 3 gives results compared to BGC and full-resolution Graph Cuts. Finally, Section 4 offers concluding statements.
2 Method

We begin by first introducing the graph structures used in this algorithm. Let (P, I) be an N-D image, consisting of a finite discrete set P of points in R^N and a mapping I from this set to a value I(p) in some value space. In this paper, we assume that I^0 = I is the base image. For an image I, we can construct the associated undirected weighted graph G = (V, E, W) consisting of nodes (vertices) v ∈ V, edges e ∈ E and nonnegative weights w ∈ W with a specified neighborhood structure. In our setting, each node is identified with one pixel and a 4-connected (6-connected in dimension 3) lattice structure of edges is assumed. For a set of images {I^0, ..., I^k}, a separate graph is constructed for each level, producing a set of graphs {G^0 = (V^0, E^0, W^0), ..., G^k = (V^k, E^k, W^k)}.

We begin by reviewing the BGC algorithm of [2] in more detail before proceeding to the proposed modifications via a Laplacian pyramid. For an input image, I^0, the BGC algorithm produces an image pyramid, {I^1, ..., I^K}, for a predetermined number of levels, K, via downsampling by a factor of two in each dimension, indicated here by the downsampling operation D(·), e.g., I^1 = D(I^0). The Graph Cuts algorithm of [4] is then applied to the coarsest-level image, I^K, to produce a minimum cut on the coarse level that assigns pixels to sets representing foreground, F^K ⊂ V^K, and background, B^K ⊂ V^K, such that F^K ∪ B^K = V^K and F^K ∩ B^K = ∅. Let ∂F^K be the boundary of the foreground segmentation, defined as ∂F^K = {v_i ∈ F^K | ∃ v_j ∈ B^K s.t. ||v_i − v_j|| ≤ 1}, where ||·|| is taken to represent a Euclidean distance of the node (pixel) coordinates. This coarse-level segmentation boundary is then upsampled to level (K − 1), indicated by the upsampling operation U(·), e.g., ∂F^{K−1} = U(∂F^K). This upsampled segmentation boundary is modified by dilating the segmentation boundary by a distance d to include a "band" of unsegmented pixels. The BGC approach then proceeds to treat the inside of the band as foreground seeds and the outside of the band as background seeds and compute the graph cut inside the band. Formally, this may be described by forming three sets of nodes on the coarse level, F_*^{K−1} ⊂ F^{K−1}, B_*^{K−1} ⊂ B^{K−1}, Q^{K−1} ⊂ V^{K−1}, where F_*^{K−1} is the set of foreground nodes on the interior of the band, B_*^{K−1} is the set of background nodes outside the band and Q^{K−1} is the set of nodes inside the band (the "questionable" nodes). For a given d, Q^{K−1} = {v_i ∈ V^{K−1} | ∃ v_j ∈ U(∂F^K) s.t. ||v_i − v_j|| ≤ d}. Note that the (coarsened) locations of the seed pixels are never allowed to belong to the questionable set Q^k. The BGC then performs a second max-flow/min-cut computation on I^{K−1}, where the nodes in F_*^{K−1} are treated as source seeds and the nodes in B_*^{K−1} are treated as background seeds. This operation is then continued to the next level until the fine-resolution level is reached and a fine-level segmentation is obtained.
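The band construction just described can be sketched in a few lines. The following 2D NumPy/SciPy fragment is our own code (all names are hypothetical, and it is not from the paper): it upsamples a coarse foreground mask, extracts its boundary, and dilates by d to obtain the band Q and the two seed sets.

```python
import numpy as np
from scipy import ndimage

def band_from_coarse(F_coarse, d=2):
    """F_coarse: boolean coarse-level foreground. Returns (F_star, B_star, Q)."""
    F_up = np.kron(F_coarse.astype(np.uint8),
                   np.ones((2, 2), dtype=np.uint8)).astype(bool)  # upsample by 2
    boundary = F_up & ~ndimage.binary_erosion(F_up)      # foreground boundary
    Q = ndimage.binary_dilation(boundary, iterations=d)  # band of width d
    F_star = F_up & ~Q    # interior of the band: treated as foreground seeds
    B_star = ~F_up & ~Q   # exterior of the band: treated as background seeds
    return F_star, B_star, Q
```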
Fig. 1. Overview of the algorithm. An initial circle with two thin attached structures (e.g., vessels) is to be segmented using the foreground seeds (red) and background seeds (blue). First, the image/seeds are coarsened, during which the thin structures are lost. The segmentation is then performed on the coarse level. The resulting segmentation is then upsampled and the original BGC band is computed (indicated by the gray crosshatched area). In our approach, this band is augmented to include the thin structures (indicated by the black crosshatched area) as found by pixels with a large magnitude in the difference image of the Laplacian pyramid.
The Laplacian pyramid introduced by Burt and Adelson [10] preserves the fine-level detail by storing a second image, S^k, defined as S^k = I^k − U(D(I^k)). This subtraction image preserves the fine details at level k obscured by the downsampling and upsampling operations. Since this image preserves the fine details, we may employ these values to introduce additional nodes into the band in BGC. Specifically, we introduce a threshold, α, and modify the definition of Q^k given above for the "band nodes" to state Q_*^{K−1} = {v_i ∈ V^{K−1} | S^{K−1}(v_i) ≥ α} ∪ Q^{K−1}, and then remove all the nodes in Q_*^{K−1} which are not in the same connected component as the original band Q^{K−1}. Here S^k(v_i) is taken to indicate the magnitude of the subtraction image at node (pixel) v_i. The parameter α may be interpreted as an "interpolation" parameter between the BGC algorithm (α = ∞) and the standard Graph Cuts algorithm (α = 0). In Section 3 we show that an α setting much closer to BGC preserves the speed advantages of this approach, but setting α < ∞ allows the recovery of thin structures previously missed by the BGC algorithm. We note that the image S need not be computed or stored explicitly, since for each pixel, we can compute the associated Laplacian value online in constant time. The assignment of nodes as foreground or background is changed, too: since the band may disconnect F^K and B^K, the final F_*^{K−1} and B_*^{K−1} are taken as the connected components containing the foreground and background seeds, respectively. Figure 1 illustrates the entire process. An initial circle with two thin attached structures (e.g., representing an idealized organ fed with blood vessels) is to be segmented using the indicated seeds. This image is coarsened and segmented, but the "vessels" are not properly segmented due to the coarsening process. When upsampling, the band (indicated by a gray crosshatch) is augmented by the pixels crossing threshold α (indicated by the black crosshatch). Now, when the segmentation is performed the "vessels" will be included. Figure 2 gives an example of the process for a real image, with its corresponding subtraction image.
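The augmentation and pruning step reads directly as code. The sketch below is ours (the name augment_band is hypothetical), under the assumption that the subtraction-image magnitude is available as an array: it adds the high-|S| pixels and keeps only the components connected to the original band.

```python
import numpy as np
from scipy import ndimage

def augment_band(S, Q, alpha):
    """S: subtraction image at this level; Q: boolean BGC band; alpha: threshold."""
    Q_star = (np.abs(S) >= alpha) | Q
    labels, _ = ndimage.label(Q_star)
    keep = np.unique(labels[Q])              # component ids touching the band
    return np.isin(labels, keep[keep > 0])
```

Note how the interpolation property follows: with alpha above max|S| nothing is added and plain BGC is recovered, while with alpha = 0 every pixel enters the band, i.e., full-resolution Graph Cuts.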
Fig. 2. Construction of a segmentation band. (a) The original image to be segmented (seeds not shown). (b) The subtraction (Laplacian pyramid) image obtained from upsampling the downsampled image and subtracting it from the original. Note that thin (high frequency) structures are strongly represented. (c) The unmodified band resulting from an upsampled segmentation. (d) The segmentation given after augmenting the band with the high-magnitude values in the subtraction image (b).
In accordance with [2], our graph was taken as a 4-connected lattice on all levels, with weights given by w_{ij} = \exp\left(-\frac{(I_i - I_j)^2}{2\sigma^2}\right). We define the upsampling and downsampling functions as U(I) = ↑(I ⊗ w) and D(I) = (↓I) ⊗ w. Here w is a Gaussian filter with σ = 1.0, ⊗ denotes convolution, ↑ samples each pixel twice and ↓ samples odd pixels.
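These operators and the subtraction image are easily reproduced. The sketch below is our reading of the definitions above (Gaussian smoothing with σ = 1.0, pixel duplication for ↑, decimation for ↓), not code from the paper, and the function names are ours.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def up(I):
    """U(I) = up-arrow(I * w): smooth, then duplicate each pixel."""
    J = gaussian_filter(I, sigma=1.0)
    return np.repeat(np.repeat(J, 2, axis=0), 2, axis=1)

def down(I):
    """D(I) = (down-arrow I) * w: decimate, then smooth."""
    return gaussian_filter(I[::2, ::2], sigma=1.0)

def subtraction_image(I):
    """S^k = I^k - U(D(I^k)); cropping handles odd-sized inputs."""
    R = up(down(I))[:I.shape[0], :I.shape[1]]
    return I - R
```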
3 Results

The validation of our augmented BGC algorithm addresses the following questions:

1. Does our augmentation of BGC perform qualitatively better on images containing thin structures (with respect to full-resolution GC)?
2. Does our augmentation of BGC perform quantitatively better on images containing thin structures (with respect to full-resolution GC)?
3. What is the tradeoff of speed/accuracy obtained by setting different values of α?

The experiments were performed for both 2D and 3D data. Figure 3 and Figure 4 address the issue of qualitative comparison for 2D and 3D data respectively. In Figure 3 two images containing vascular structures were given foreground/background seeds such that a full-resolution GC produces good segmentations. The images were then segmented using BGC and our augmented BGC using three levels. As indicated by the authors of BGC, the algorithm is unable to accurately find small/thin structures, especially when the structures are distant from the seed locations. In contrast, our augmented BGC recovers the full-resolution GC segmentations at speeds only marginally decreased from the high-performance BGC. Both images had resolution 512 × 512. The same experiment was performed in 3D on an aorta and a left atrium segmentation target, with volumes 256 × 224 × 32 and 124 × 184 × 185, respectively. In both cases, the augmented BGC was able to recover the full resolution GC solution exactly, even though additional levels were used (i.e., another coarsening step was applied before running GC).
Fig. 3. Qualitative comparison of BGC and augmented BGC (with respect to full-resolution GC) on two 2D images containing elongated structures. (a,e) Original images. (b,f) Full-resolution Graph Cuts segmentation. (c,g) Banded Graph Cuts solution with three levels — note that many elongated structures are incorrectly segmented. (d,h) Augmented Banded Graph Cut solution with three levels.
A quantitative analysis of the augmented BGC was combined with an investigation into the role of the new α parameter. We began with a set of six 2D images containing vessels or other elongated structures. In each image, sufficient seeds were given to the full resolution GC algorithm in order to produce a quality segmentation of the elongated target. For each image, the segmentation was then performed using BGC with 2–3 levels and augmented BGC using 3 levels. The α parameter was then varied between {0–max S} and the speed/memory was tracked with respect to the segmentation accuracy. The segmentation accuracy was computed with respect to full resolution Graph Cuts by histogram intersection. Figure 5 shows the mean values with an error bar indicating one standard deviation in both speed/memory and accuracy. In these plots, the three dots corresponding to BGC indicate the use of zero coarse levels (i.e., full-resolution GC) and one or two coarse levels. Two qualities of the augmented Graph Cuts are portrayed in Figure 5: 1) The α parameter provides a smooth transition from the BGC algorithm to full-resolution GC, 2) By setting α close to zero, large gains can be made over BGC in accuracy, for mild costs in speed/memory. This experiment was repeated a second time for an aorta volume illustrated in Figure 4. Although the increases in accuracy appear less drastic than in the 2D case, this is simply because the number of voxels involved in the elongated pieces is very small compared to the total number of voxels in the object. However, as shown in Figure 4, the qualitative difference in accuracy is significant. Note that the augmented BGC with α = 0 requires more time than the full resolution Graph Cuts, since the coarsening, etc. become wasted, time-consuming, operations in this case. Similarly, another experiment was performed on a set of three left atrium datasets. Although the sizes of the data volumes were the same, the volume of the left atrium in each case was different and therefore the memory consumption and time elapsed were normalized to unity. In this case, the bars show one standard deviation in the corresponding axis. The quantitative improvement in accuracy may seem slight, due to the fact that the inside of the heart also has high Laplacian values. To remedy this problem, the Laplacian band may be forced to stay outside of the foreground region.

Fig. 4. Qualitative comparison of the full resolution Graph Cuts, the BGC algorithm and the augmented BGC algorithm on volumetric data containing an aorta (top row) and a left atrium (bottom row). (a,d) The full resolution Graph Cuts result, requiring 10s and 300Mb of memory (a) and 40s/700Mb (d). (b,e) Banded Graph Cuts results with the same seeding using one coarse level, requiring 3s/36Mb (b) and 8s/86Mb (e). (c,f) Augmented Banded Graph Cuts results using two coarse levels, requiring 3s/33Mb (c) and 15s/216Mb (f).

Fig. 5. Memory/time and accuracy tradeoff plots (accuracy versus memory consumption and versus time elapsed, for the 2D angiography images, the aorta dataset and the left atrium datasets). The horizontal and vertical error bars represent one standard deviation in the corresponding axis. In all plots, the red line indicates the Banded Graph Cuts approach with three points corresponding to two coarse levels, one coarse level and zero coarse levels (i.e., full resolution Graph Cuts). The green line indicates the performance of the augmented Banded Graph Cuts approach with three levels, using an α parameter from {0–max S}.
4 Conclusion

The Graph Cuts algorithm has become a very popular algorithm in recent years for interactive segmentation. However, the speed/memory restrictions of this approach prevent its interactive use on many modern medical datasets, due to their size. For this purpose, the Banded Graph Cuts algorithm was introduced to alleviate the speed/memory concerns while largely providing the same segmentation as standard Graph Cuts. However, the BGC algorithm is unable to accurately handle segmentation tasks involving
elongated structures, which are very common in medical segmentation tasks. In this paper we introduced the Augmented Banded Graph Cuts approach that preserves the positive speed/memory qualities of BGC, while allowing for an accurate result when the segmentation task involves elongated structures. Specifically, the difference information from a (simulated) Laplacian pyramid is used to add to the segmentation band at subsequent levels in order to place elongated structures within the space of considered segmentations for the fine level Graph Cuts. Our proposal to augment Banded Graph Cuts with information from a (simulated) Laplacian pyramid was studied from the standpoint of qualitative and quantitative improvement and the role of the introduced α parameter. These studies were carried out on both 2D and 3D datasets containing segmentation targets with an elongated object. The studies concluded that it was possible to obtain nearly the same increase in speed/memory as was determined by the BGC algorithm, but to simultaneously achieve a segmentation much closer to the full-resolution Graph Cuts. Therefore, we can conclude that the Banded Graph Cuts algorithm should always use an augmented band, since the computational increase is negligible, while the accuracy increase is significant when elongated structures are present in the image.
References

1. Boykov, Y., Jolly, M.P.: Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images. In: Proc. of ICCV 2001. (2001) 105–112
2. Lombaert, H., Sun, Y., Grady, L., Xu, C.: A multilevel banded graph cuts method for fast image segmentation. In: Proceedings of ICCV 2005. Volume I., Beijing, China, IEEE (2005) 259–265
3. Gibbons, A.: Algorithmic Graph Theory. Cambridge University Press (1989)
4. Boykov, Y., Kolmogorov, V.: An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE PAMI 26(9) (2004) 1124–1137
5. Greig, D., Porteous, B., Seheult, A.: Exact maximum a posteriori estimation for binary images. Journal of the Royal Statistical Society, Series B 51(2) (1989) 271–279
6. Boykov, Y., Veksler, O., Zabih, R.: A new algorithm for energy minimization with discontinuities. In: Energy Minimization Methods in Computer Vision and Pattern Recognition. Second International Workshop, EMMCVPR'99, York, UK, 26-29 July 1999. (1999) 205–220
7. Boykov, Y., Kolmogorov, V.: Computing geodesics and minimal surfaces via graph cuts. In: Proceedings of International Conference on Computer Vision. Volume 1. (2003)
8. Li, Y., Sun, J., Tang, C., Shum, H.: Lazy snapping. In: Proceedings of ACM SIGGRAPH 2004. (2004) 303–308
9. Roerdink, J., Meijster, A.: The watershed transform: Definitions, algorithms, and parallelization strategies. Fund. Informaticae 41 (2000) 187–228
10. Burt, P.J., Adelson, E.H.: The Laplacian pyramid as a compact image code. IEEE Transactions on Communications COM-31(4) (1983) 532–540
Segmentation of Neck Lymph Nodes in CT Datasets with Stable 3D Mass-Spring Models

Jana Dornheim¹, Heiko Seim¹, Bernhard Preim¹, Ilka Hertel², and Gero Strauss²

¹ Otto-von-Guericke-Universität Magdeburg, Germany
² Hals-Nasen-Ohren-Universitätsklinik Leipzig, Germany
Abstract. The quantitative assessment of neck lymph nodes in the context of malign tumors requires an efficient segmentation technique for lymph nodes in tomographic 3D datasets. We present a Stable 3D Mass-Spring Model for lymph node segmentation in CT datasets. Our model for the first time represents concurrently the characteristic gray value range, directed contour information as well as shape knowledge, which leads to a much more robust and efficient segmentation process. Our model design and segmentation accuracy are both evaluated with lymph nodes from clinical CT neck datasets.
1 Motivation
The assessment of lymph nodes plays an important role in the diagnosis, staging, treatment and therapy control of malign tumors and their metastases. MRI and CT scans of the respective regions allow for a 3D assessment of the pathological situation; however, an exact quantitative analysis demands an efficient segmentation of the lymph nodes in the 3D datasets. Lymph node segmentation currently has to be performed manually, which is unfeasible in the time-critical clinical routine. We present an efficient model-based segmentation technique for enlarged lymph nodes in CT datasets of the neck. Automatic segmentation of neck lymph nodes is difficult due to the following problems:

– Neck lymph nodes often touch or infiltrate adjacent soft tissue (Fig. 1 (b), (c)), which lets all purely gray value-based segmentation techniques fail.
– Adjacent high-contrast structures often lead to high gradient magnitudes in the local neighbourhood (Fig. 1 (d)), which may distract model-based segmentation approaches based on contour strength.
– The lymph nodes' interior homogeneity may be disturbed by a central necrosis (Fig. 1 (e)).
– The size of metastatic lymph nodes varies significantly.
2 State of the Art
The segmentation of lymph nodes in CT image data has long been known as an open problem. Rogowska et al. [1] evaluated elementary segmentation techniques
Fig. 1. Different anatomical situations in CT head and neck data: (a) isolated neck lymph node, (b) neck lymph node adjacent to M. sternocleidomastoideus, (c) two adjacent lymph nodes, (d) lymph node touching high-contrast structure (blood vessel), (e) lymph node with a central necrosis (dark).
(e.g., thresholding, watershed transformation, ...) for lymph node segmentation, and conclude that a high degree of model knowledge is needed for a reliable segmentation. Honea et al. [2] used a slicewise Active Contour as a first model-based segmentation approach. They extended their approach by using a 3D Active Contour model [3]. Both their 2D and 3D approach are purely based on contour information (gradient magnitude) and shape knowledge (represented by a viscosity condition). Yan et al. [4] exploited the characteristic gray value range as well as the contour information of lymph nodes by using an improved 2D Fast Marching Method with an intensity-weighted speed term. However, no shape knowledge was used by their approach. Boundary leaking could only be avoided by a hard stopping criterion (a bounding circle around the lymph node). All discussed approaches require high interaction effort. No quantitative evaluation on clinical CT datasets was performed, and no approach integrates all of contour information, characteristic gray value range and shape knowledge. Besides lymph node segmentation techniques, newer approaches are known for the segmentation of small homogeneous soft tissue structures in CT data (e.g., pulmonary nodules [5]). However, these approaches presume a homogeneous background, which makes them inappropriate for the segmentation of neck lymph nodes in their complex anatomical environment.
3 Method
We employ Stable 3D Mass-Spring Models (SMSMs), introduced by Dornheim et al. [6], for neck lymph node segmentation; these allow for explicit modelling of shape and appearance, while leaving the modelled object size flexible.
3.1 Stable 3D Mass-Spring Models
Mass-Spring Models (MSMs) are physically-based models known from soft tissue simulation. They use spring forces to describe object shape by the distance of connected mass points. MSMs can also be applied as deformable models for segmentation. In this case, no simulation of a real physical situation is required. Instead, the model dynamics is used to adapt a deformable shape to image
information by a local optimization process. For this purpose however, the representation of shape by spring forces alone is not stable enough and often leads to model shape collapse, especially in 3D. SMSMs extend conventional MSMs by using not only the springs' rest lengths but also their rest directions to describe the model. Whenever a spring deviates from its rest direction, a torsion force is created to restore its original direction. Thus, the object shape is mainly represented by the springs' rest directions, while their rest lengths represent the object's scaling. The external model forces of an SMSM are created by sensors at the mass points. They transform the image information at their current position into a sensor force acting on the associated mass point. For the segmentation of lymph nodes, two types of sensors are of interest: intensity sensors, dragging their associated mass to voxels of a given gray value range, and direction-weighted gradient sensors, steering towards voxels at gradients of high magnitude and, preferably, of the expected direction.
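As an illustration of the internal forces, one force-evaluation step could look as follows. This is a simplified sketch of our own, not the formulation of [6]; in particular, the torsion term below is a crude linear stand-in for the true rotational restoring force, and all names are hypothetical.

```python
import numpy as np

def smsm_forces(pos, springs, rest_len, rest_dir, k_spring=1.0, k_torsion=1.0):
    """pos: (n,3) mass positions; springs: (m,2) index pairs;
    rest_len: (m,) rest lengths; rest_dir: (m,3) unit rest directions."""
    f = np.zeros_like(pos)
    d = pos[springs[:, 1]] - pos[springs[:, 0]]
    length = np.linalg.norm(d, axis=1, keepdims=True)
    u = d / np.maximum(length, 1e-12)                 # current spring direction
    # spring force: restore the rest length along the current direction
    fs = k_spring * (length - rest_len[:, None]) * u
    # torsion-like force: push the current direction back toward the rest direction
    ft = k_torsion * (rest_dir - u)
    np.add.at(f, springs[:, 0], fs - ft)
    np.add.at(f, springs[:, 1], -fs + ft)
    return f
```

Sensor forces would be added on top of this at each mass point, sampling the intensity or gradient image at the point's current position.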
3.2 Design of the Lymph Node Model
The manual segmentation of lymph nodes exploits a set of characteristic features, which all together allow the human observer to detect lymph nodes in CT datasets. The following features were confirmed by our clinical partner:

(i) Ellipsoid to spherical shape with a varying ratio of longitudinal and transversal radii. This ratio is approximately 2:1 for healthy lymph nodes. With metastatic lymph nodes, it approaches 1:1.
(ii) Medium strength of contour information for isolated lymph nodes. Contours of non-isolated lymph nodes can be interrupted by adjacent soft tissue or amplified by adjacent high-contrast structures.
(iii) Homogeneous gray value (soft tissue) in the lymph node interior (with the exception of lymph nodes with a central necrosis).
(iv) Size varies strongly: neck lymph nodes are clinically relevant at a size of 1 cm; the maximum size can be assumed as 3 cm, since larger lymph nodes must be considered "real" metastases, which are often not spherical.

To model the ellipsoid to spherical shape of the lymph nodes (i), a 3D SMSM was created as a polygonal sphere surface (Fig. 2 (a)). At each mass point, a direction-weighted gradient sensor was placed, in order to model the contour information (ii) including the local contour normal direction we expect to find here. To incorporate the additional knowledge about the inner gray value of the lymph nodes (iii) into the model, we created a second, inner sphere surface (Fig. 2 (b)) as a downscaled copy (scaling factor 0.9) of the first surface. At each mass point of this inner surface, an intensity sensor was placed, searching for the specific gray value of the target lymph node. We chose to arrange these inner sensors as another surface, instead of distributing them evenly across the lymph node interior, in order to segment lymph nodes with inner necrosis.
Fig. 2. Stages of the model construction (schematic 2D view): (a) First surface model with gradient sensors (dark gray). (b) An inner front of intensity sensors (white) is added. (c) Both fronts are connected by stiff springs, with the effect, that the associated gradient and intensity sensor act as a functional unit (d).
In a final modelling step, each gradient sensor of the outer surface was connected to its associated intensity sensor at the inner surface (Fig. 2 (c)). To make both associated sensors work as a functional unit, we assigned all springs between the two surfaces a very high stiffness (factor 1000 to all other springs) (Fig. 2 (d)). This way, each intensity sensor keeps its associated gradient sensor close to the homogeneous lymph node interior and prevents model distraction by false strong gradients (ii). All other springs, i.e. the springs belonging to one of the surface submodels, were assigned a very low stiffness, in order to represent the unknown size of the lymph nodes (iv). In connection with the shape-preserving torsion forces, model shape stability is guaranteed [6] and can be expected to bridge gaps in the lymph node contour (ii) by shape knowledge.
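The two-surface construction can be sketched as follows. This is an illustrative fragment of ours with hypothetical names; the authors' actual mesh and stiffness bookkeeping may differ. It builds an inner copy scaled by 0.9 and cross-links 1000 times stiffer than the surface springs.

```python
import numpy as np

def build_double_sphere(outer_pts, surface_springs, k_surface=1.0):
    """outer_pts: (n,3) points on a polygonal sphere centered at the origin;
    surface_springs: (m,2) index pairs of the outer surface mesh."""
    n = len(outer_pts)
    inner_pts = 0.9 * outer_pts                  # downscaled inner surface
    pts = np.vstack([outer_pts, inner_pts])
    springs, stiffness = [], []
    for (i, j) in surface_springs:               # springs on both surfaces
        springs += [(i, j), (i + n, j + n)]
        stiffness += [k_surface, k_surface]
    for i in range(n):                           # stiff sensor-pair cross links
        springs.append((i, i + n))
        stiffness.append(1000.0 * k_surface)
    return pts, np.array(springs), np.array(stiffness)
```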
3.3 Segmentation Process
The segmentation process consists of the following three steps:

Step 1: Initial placement of the model in the dataset. Since the model simulation is based on a local optimization process [6], the model should initially be placed inside the target lymph node. Thereto, the user specifies a seed point, at which a scaled model of minimal size (0.3 mm) is placed automatically, assuming that all lymph nodes are at least twice as big as this minimal size. Alternatively, the initial model can be specified by two seed points at the lymph node contour, whose distance is used as the initial model's diameter. This more precise initialization leads to faster model convergence. After model initialization, the rest lengths of all springs are set to their current lengths, so that the new model size is its default size.

Step 2: Determination of gray value range. The expected gray value range of the target lymph node is estimated from the given start position. Thereto, the gray value range of all pixels on the initial sphere diameter is used. The model's intensity sensors are then programmed to react only to this gray value range.

Step 3: Model adaptation. Finally, the model adaptation is run (Fig. 3) with the runtime parameters shown in Table 1. They were adjusted once by experimental
Fig. 3. 2D and 3D view of the model adaptation process
model runs and then used for all datasets during the evaluation phase. The model adaptation is automatically stopped when every mass point remains inside a certain ε environment for n simulation steps. Due to the use of a damping force, the model converges and stops.
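This stopping rule is easy to state in code. The fragment below is our own paraphrase (class and variable names are hypothetical), using ε = 0.01 mm and n = 5 from Table 1.

```python
import numpy as np

class StopCriterion:
    """Stop once every mass point has stayed within an epsilon-ball of its
    reference position for n consecutive simulation steps."""

    def __init__(self, pos0, eps=0.01, n=5):
        self.ref = pos0.copy()
        self.eps, self.n, self.count = eps, n, 0

    def converged(self, pos):
        if np.all(np.linalg.norm(pos - self.ref, axis=1) < self.eps):
            self.count += 1                     # still inside the environment
        else:
            self.ref, self.count = pos.copy(), 0  # moved: reset the window
        return self.count >= self.n
```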
4 Results and Evaluation
11 CT datasets of the neck region with 146 enlarged lymph nodes were available for the evaluation. The datasets stem from different CT devices; the quality of the datasets varied significantly with respect to signal-to-noise ratio, horizontal resolution and slice distance. For 40 lymph nodes in 5 of the 11 datasets, a gold standard (segmentation by a radiologist) was given, as well as manual segmentations by two experienced users. All evaluation was performed with the created SMSM (100 masses, 338 springs), without manual correction of the results.
4.1 Qualitative Evaluation
We first performed a qualitative study to validate the model design and to reveal which components of the model are essential. Thereto, we started the model simulation with different model components disabled on specific lymph nodes of the 11 test datasets. The following results could be verified.

Table 1. Model simulation parameters (according to [6])

Parameter Category              Parameter Description                   Symbol  Value
Model adaptation                Simulation step size                    t       0.02
                                Tolerance (stopping criterion)          ε       0.01 mm
                                Simulation steps (stopping criterion)   n       5
Weighting of force components   Torsion forces                          w_t     25
                                Spring forces                           w_f     20
                                Gradient sensor forces                  w_sg    9
                                Intensity sensor forces                 w_si    6
Fig. 4. Effect of the torsion forces on the segmentation results: (a) distraction of gradient sensors by stronger neighbouring gradients. In (b), this distraction is prevented by the shape-stabilizing effect of the torsion forces. In (c) and (d), the shape knowledge of the model extrapolates missing contour information.
Shape representation by torsion forces. The torsion forces proved to be suitable for the representation of the shape knowledge. We first tested this behaviour with lymph nodes touching high-contrast structures. The shape stability gained by the use of torsion forces prevents the model from being distracted by the strong gradients of these neighbouring structures (Fig. 4 (a) and (b)). In cases with lymph nodes adjacent to soft tissue structures, the torsion forces caused the model to extrapolate the organic lymph node surface by shape knowledge at places with weak or missing contour information (Fig. 4 (c) and (d)).

Adaptation to lymph node size. Our model proved to be able to adapt to lymph nodes of different sizes even for imprecise initial positioning. We verified this behaviour with 8 isolated lymph nodes and a very small model initialization. In all cases, the model adapted completely to the size of the target lymph node (Figure 3). Without torsion forces, the model collapsed in 4 of 8 cases.

Functional unit of gradient and intensity sensors. The combination of intensity and gradient sensors turned out to be essential for the successful segmentation. We verified this with 10 lymph nodes adjacent to high-contrast structures. Without intensity sensors, the strong neighbouring gradients often attracted single gradient sensors, which led to "leaking" into the high-contrast structure (Fig. 5 (a)). In cases of small lymph nodes, often the whole model adapted to the complete shape of the blood vessel (Fig. 5 (c)). With the combined use of gradient and intensity sensors, this behaviour was completely eliminated.
Fig. 5. Segmentation results with gradient sensors only (a, c), and with combined gradient and intensity sensors (b, d).
The results of our qualitative analysis show that only the combination of all above-described model components (shape knowledge, gray value and gradient information) enables a robust lymph node segmentation.
4.2 Quantitative Evaluation
To evaluate the accuracy of our model-based segmentation, we assembled a quantitative analysis based on the 40 lymph nodes for which a gold standard was available, which was the case for all problem classes of homogeneous lymph nodes (see Fig. 1 (a)–(d)). These 40 lymph nodes were equally distributed across 5 of the 11 CT neck datasets. We used the 2-point initialization to place the model in each of the 40 lymph nodes and started the model simulation with the parameters given in Table 1. The model simulation stopped automatically for all 40 lymph nodes after 2 to 30 seconds on a modern standard PC (3.2 GHz Pentium 4). The model's segmentation result, as well as the manual expert segmentations for all 40 lymph nodes, were then compared to the given gold standard (see Fig. 6 and Table 2 for results). Our analysis showed that the accuracy of our segmentation model is comparable with the accuracy of the given manual segmentations. The mean surface distance of the model's segmentation results varied between 0.3 mm and 0.6 mm, while the Hausdorff distance varied between 1.7 mm and a maximum of 3.9 mm. For the Hausdorff distance, the biggest errors stem from differing estimations of the upper and lower border of the lymph node, which is directly correlated with the slice distance (up to 3.9 mm). Since this deviation could also be observed for manual segmentations, this is still a general problem, which can only be reduced by smaller slice distances.
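For reference, the two surface measures reported in Table 2 can be computed from binary masks with distance transforms. The sketch below is our own (function names hypothetical), not the authors' evaluation code; it assumes anisotropic voxel spacing is passed explicitly.

```python
import numpy as np
from scipy import ndimage

def surface(mask):
    """Boundary voxels of a binary mask."""
    return mask & ~ndimage.binary_erosion(mask)

def surface_distances(a, b, spacing):
    """Symmetric mean surface distance and Hausdorff distance (in mm)
    between binary volumes a and b; spacing: voxel size per axis."""
    sa, sb = surface(a), surface(b)
    dt_b = ndimage.distance_transform_edt(~sb, sampling=spacing)
    dt_a = ndimage.distance_transform_edt(~sa, sampling=spacing)
    d_ab, d_ba = dt_b[sa], dt_a[sb]   # a-to-b and b-to-a surface distances
    mean_sd = (d_ab.sum() + d_ba.sum()) / (d_ab.size + d_ba.size)
    hausdorff = max(d_ab.max(), d_ba.max())
    return mean_sd, hausdorff
```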
5 Conclusion
We presented a new 3D segmentation technique for lymph nodes in CT datasets, using deformable models (SMSMs). This allows, for the first time, the three characteristic features of lymph nodes (gray value range, contour information and shape knowledge) to be incorporated into one single 3D model. This results in a robust and efficient segmentation process with minimal interaction needed. Our method was evaluated with 40 lymph nodes from clinical CT datasets and proved to be comparable to the manual segmentation results of two experienced users.
Fig. 6. Segmentation results of the lymph node model (white) and gold standard segmentation (light gray)
Table 2. Results of the quantitative evaluation

                                        Dataset:   1      2      3      4      5
Mean Surface Distance (mm)    User A             0.420  0.353  0.401  0.311  0.296
                              User B             0.488  0.406  0.444  0.271  0.279
                              Model              0.421  0.309  0.596  0.583  0.416
Hausdorff Distance (mm)       User A             3.513  2.401  3.238  1.213  1.575
                              User B             4.113  2.644  3.225  1.150  1.725
                              Model              3.913  2.350  3.825  1.700  1.925
Volumetric Segmentation       User A             45.88  69.38  43.25  35.13  37.00
Error (%)                     User B             45.63  65.25  50.88  29.88  33.13
                              Model              44.38  50.75  51.38  51.75  38.88
References

1. Rogowska, J., Batchelder, K., Gazelle, G., Halpern, E., Connor, W., Wolf, G.: Evaluation of selected two-dimensional segmentation techniques for computed tomography quantification of lymph nodes. In: Investigative Radiology. Volume 13. (1996) 138–145
2. Honea, D.M., Ge, Y., Snyder, W.E., Hemler, P.F., Vining, D.J.: Lymph node segmentation using active contours. In: Proc. SPIE Medical Imaging 1997: Image Processing. Volume 3034. (1997) 265–273
3. Honea, D., Snyder, W.E.: Three-dimensional active surface approach to lymph node segmentation. In: Proc. SPIE Medical Imaging 1999: Image Processing. Volume 3361. (1999) 1003–1011
4. Yan, J., Zhuang, T., Zhao, B., Schwartz, L.H.: Lymph node segmentation from CT images using fast marching method. Computerized Medical Imaging and Graphics 28 (2004) 33–38
5. Kuhnigk, J.M., Dicken, V., Bornemann, L., Wormanns, D., Krass, S., Peitgen, H.O.: Fast automated segmentation and reproducible volumetry of pulmonary metastases in CT-scans for therapy monitoring. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2004. Volume 3217. (2004) 933–941
6. Dornheim, L., Tönnies, K., Dornheim, J.: Stable dynamic 3D shape models. In: International Conference on Image Processing (ICIP). (2005)
Supervised Probabilistic Segmentation of Pulmonary Nodules in CT Scans

Bram van Ginneken

Image Sciences Institute, University Medical Center Utrecht, the Netherlands
[email protected]
Abstract. An automatic method for lung nodule segmentation from computed tomography (CT) data is presented that is different from previous work in several respects. Firstly, it is supervised; it learns how to obtain a reliable segmentation from examples in a training phase. Secondly, the method provides a soft, or probabilistic segmentation, thus taking into account the uncertainty inherent in this segmentation task. The method is trained and tested on a public data set of 23 nodules for which soft labelings are available. The new method is shown to outperform a previously published conventional method. By merely changing the training data, non-solid nodules can also be segmented.
1
Introduction
Lung cancer is by far the main cancer killer. Early detection of malignant nodules from computed tomography scans and subsequent surgery is considered one of the most promising strategies to reduce lung cancer mortality. Only a small fraction of nodules is cancerous, and the volume and growth rate of the lesions are the main indicators of malignancy. Thus, accurate segmentation of small pulmonary nodules is of paramount clinical importance. Many academic groups and companies have developed lung nodule segmentation schemes, see for example Refs. [1,2,3,4,5,6,7] and [8] for a more extensive overview. Most of these approaches are based on standard image processing techniques such as thresholding, region growing, component labeling, ray shooting, mathematical morphology, etc. Many are semi-automatic, requiring internal parameters to be tuned for every specific case. A major impediment for algorithm development and validation is the impossibility of obtaining a ground truth for nodule segmentations in real clinical data. Most algorithms have therefore been validated on phantom data, with good results. But these phantoms do not capture the enormous variety of pulmonary structures and scan artifacts encountered in real data. Our experience with several commercial algorithms from various vendors is that they regularly produce unsatisfactory segmentations, even on simple cases of isolated nodules. Reproducibility studies have shown [9] that 95% limits of agreement for volume measurements are around 20% for commercial software with manual interaction by a radiologist. This limits the possibility of unequivocally demonstrating
growth of nodules on short-term follow-up scans. Hence the problem of nodule segmentation is still far from solved. Recently, the Lung Image Database Consortium (LIDC) [10] published a set of 23 scans with manual nodule outlines. Analysis of this data, which is used in this work, reveals that only 15% of the voxels with a nonzero probability pi of belonging to a nodule have pi = 1, and only an additional 18% have pi ≥ 0.9. This suggests that nodule segmentation is not a hard classification task, and that it is preferable to perform probabilistic, or soft, segmentations, contrary to what published methods do. In this work we propose a new method for probabilistic nodule segmentation that takes the uncertainty about the voxel label explicitly into account. We use the LIDC data for training, and employ statistical techniques to infer the probability that voxels in unseen data belong to a nodule. The method is completely automatic if a seed point in the nodule is supplied, and can be applied to different types of nodules by only changing the training data.
2
Materials
The LIDC data consists of 23 scans in which one nodule has been manually segmented by 6 radiologists using manual tracing and two (unspecified) semi-automatic algorithms, making for a total of 18 manual segmentations per nodule. Together with the CT data, soft segmentations are available, derived by averaging the 18 hard segmentations. These soft segmentations are taken as truth in this work. Slice thickness is 0.625 mm, and in-plane resolution varies from 0.625 to 0.742 mm, making for nearly isotropic data. All volumes and length parameters in this work are given in voxels. The variety among the nodules is large: they range from very small to very large, and from round and solid to spiculated, ill-defined and non-solid.
3
Method
The method assumes that we are supplied with a CT scan and a seed point in the nodule to be segmented. The result is independent of the exact position of the seed point as long as the seed is inside the nodule. As a pre-processing step, the lung fields in the scan are segmented. This allows for correct segmentation of nodules in contact with the chest wall. The data does not consist of complete scans but only of sections around the nodule. Therefore a 2D lung field segmentation algorithm was used, similar to the one described in [11] and based on thresholding, component labeling, rule-based selection of lung field components, hole filling and smoothing of the lung fields by a morphological closing operation with a circular kernel. The results were satisfactory in all cases. All subsequent analysis is performed on voxels within the lung fields only.

3.1 Region Regression
The core of the soft segmentation algorithm consists of a region growing process starting from the supplied seed point. We used 6-connectivity. For every neighbor
encountered during the growing process, the probability pi for it belonging to the nodule is estimated using regression. All neighbors with pi > t are accepted. We used t = 0.15, but slight changes to this value are not critical. Regression is a general statistical technique to relate one or more predictor variables (features, which form a feature vector residing in feature space) to a single response variable (output value, pi). The functional relationship between the input and the output values is learned from examples and can then be used to predict the output of a new, previously unseen feature vector. We experimented with linear regression and two types of non-linear regression, k-nearest neighbor regression (kNNR) and support vector machine regression. Both non-linear methods gave similar results, better than linear regression. As kNNR was fastest and easiest to use, we report here results for this method only. The number of neighbors k was set to 15 (again, this setting was not critical). The average output of these neighbors is the output of kNNR.
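A minimal sketch of this probability-driven region growing follows; `knn` is assumed to be a fitted scikit-learn KNeighborsRegressor (k = 15) and `features` a precomputed (Z, Y, X, n_features) array, both hypothetical stand-ins for the trained system:

```python
import numpy as np
from collections import deque

def region_regression(features, lung_mask, seed, knn, t=0.15):
    prob = np.zeros(lung_mask.shape)
    visited = np.zeros(lung_mask.shape, dtype=bool)
    queue = deque([seed])
    visited[seed] = True
    while queue:
        v = queue.popleft()
        p = knn.predict(features[v][None, :])[0]   # kNNR: mean of the 15 neighbors
        prob[v] = p
        if p <= t:                   # below threshold: do not grow further here
            continue
        for d in [(1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1)]:
            n = tuple(np.add(v, d))  # 6-connected neighbor
            if all(0 <= n[i] < lung_mask.shape[i] for i in range(3)) \
                    and lung_mask[n] and not visited[n]:
                visited[n] = True
                queue.append(n)
    return prob
```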
3.2 Features
A proper choice of features is crucially important to obtain good performance. Density of voxels is the most obvious feature to use, as most lung nodules have tissue densities which distinguish them from the surrounding lung parenchyma. To incorporate gray level information from a spatial neighborhood, we used the original density as well as five blurred versions of the volume (Gaussian blurring with σ = 0.5, 1, 2, 4, 8) as features. To separate voxels on edges from those in homogeneous regions in feature space, we added the gradient magnitude, computed with Gaussian derivatives at the same scales. The main reason why simple region growing only produces good results in the case of isolated nodules is that other structures with tissue density, mainly vessels, are frequently in contact with the nodule. The remaining features aim to distinguish nodule voxels from those in attached structures. Vessels are elongated, and a classic way to distinguish nodular from elongated structures is to examine second order derivatives along principal directions. We computed the eigenvalues of the Hessian matrix, sorted from large to small, again at scales σ = 0.5, 1, 2, 4, 8. Another way to remove dense structures that have limited extent in any direction (like vessels or fissures) is by gray level openings. We performed a separable gray level opening with a uniform kernel of 3 voxels in length. This process was repeated four times, yielding four feature values. Finally, we applied an algorithm described as Iterative Morphological Filtering (IMF) in [2]. This is a sequence of morphological operations aimed at removing small vessels that are in contact with a nodule. First, a volume of interest around the nodule is thresholded at threshold T and stored as I. Next, a binary opening with a spherical kernel of diameter D is applied to I and only the component connected to the seed point in the nodule is retained. This result is iteratively dilated with a spherical kernel of size D, D/2, D/4 and so on until D = 1. After every dilation, the result is multiplied with I. With only the opening applied, the vessels are removed but the shape of the nodule boundary changes substantially. The iterated dilation and multiplication process restores as much as possible of the original nodule.
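A sketch of the per-voxel feature computation described above follows; the IMF-derived features are omitted for brevity, and the 3×3×3 gray-level opening is our approximation of the separable uniform opening:

```python
import numpy as np
from scipy import ndimage as ndi

def nodule_features(volume, scales=(0.5, 1, 2, 4, 8)):
    """Density, blurred density, gradient magnitude and sorted Hessian
    eigenvalues at several scales, plus four iterated gray-level openings."""
    vol = volume.astype(float)
    feats = [vol]
    for s in scales:
        feats.append(ndi.gaussian_filter(vol, s))              # blurred density
        feats.append(ndi.gaussian_gradient_magnitude(vol, s))  # edge strength
        H = np.empty(vol.shape + (3, 3))
        for i in range(3):                                     # Hessian entries
            for j in range(3):
                order = [0, 0, 0]
                order[i] += 1
                order[j] += 1
                H[..., i, j] = ndi.gaussian_filter(vol, s, order=order)
        eig = np.linalg.eigvalsh(H)                            # ascending
        for c in range(2, -1, -1):                             # large to small
            feats.append(eig[..., c])
    opened = vol
    for _ in range(4):  # four iterated 3-voxel gray-level openings
        opened = ndi.grey_opening(opened, size=3)
        feats.append(opened)
    return np.stack(feats, axis=-1)
```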
We applied IMF with T = -550 HU and D = 11 voxels and used the result, as well as five blurred versions (σ = 0.5, 1, 2, 4, 8), as feature volumes. A set of feature vectors from voxels with known output pi is needed to train the regression system. To sample a balanced learning set from the nodules and their surroundings, 1000 voxels were randomly selected from every training case for which pi > 0, and another 1000 were chosen among those with pi = 0 and a distance to the closest voxel with pi > 0 below 10 voxels.
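A sketch of this balanced sampling, assuming each case provides at least 1000 voxels in each class:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def sample_training_voxels(p, n_per_class=1000, max_dist=10, rng=None):
    """Balanced sample of flat voxel indices from one training case;
    `p` is the soft reference segmentation (values in [0, 1])."""
    rng = np.random.default_rng() if rng is None else rng
    inside = np.flatnonzero(p > 0)
    # Distance (in voxels) from each background voxel to the nearest p > 0 voxel.
    dist = distance_transform_edt(p == 0)
    near_outside = np.flatnonzero((p == 0) & (dist < max_dist))
    pos = rng.choice(inside, size=n_per_class, replace=False)
    neg = rng.choice(near_outside, size=n_per_class, replace=False)
    return np.concatenate([pos, neg])
```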
3.3 Nonsolid Nodules
For non-solid nodules, IMF is not effective, not even with different settings of T and D. The other features are also very different for these nodules. Solid nodules have fairly constant (low gradient) densities around 0 HU, while non-solid nodules have fluctuating (high gradient) densities around -500 HU. Training a regression system with predominantly solid nodules and applying it to non-solid nodules is therefore bound to give poor results. We developed a simple algorithm that inspects the density around the seed point and decides whether a nodule is solid or non-solid. We then trained different regression systems for solid and non-solid lesions. For non-solid nodules, IMF features were not used.

3.4 Conventional Method
As a reference method, a published lung nodule segmentation algorithm [2] was implemented, which we believe is representative of most existing methods. This method is based on the IMF algorithm. A volume of interest (VOI) around the nodule and a seed point are required. The VOI is first supersampled to 0.25 mm isotropic voxel size using tri-linear interpolation. Next, the IMF algorithm is applied. Finally, the VOI is subsampled to the original resolution by voxel averaging. Note that in this way a soft segmentation is obtained, but voxels with 0 < pi < 1 can only occur around the nodule boundary. We determined T = −550 HU and D = 31 voxels (in the supersampled data) to be the best parameter settings.
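A sketch of this reference pipeline follows; the `ball` helper, the exact kernel radii, and the linear-interpolation subsampling (standing in for exact voxel averaging) are our own assumptions:

```python
import numpy as np
from scipy import ndimage as ndi

def ball(radius):
    """Spherical structuring element (our own helper)."""
    z, y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1, -radius:radius + 1]
    return z**2 + y**2 + x**2 <= radius**2

def imf(voi, seed, T=-550, D=11):
    I = voi > T                                   # threshold and store as I
    opened = ndi.binary_opening(I, structure=ball(max(D // 2, 1)))
    labels, _ = ndi.label(opened)
    result = labels == labels[seed]               # component containing the seed
    d = D
    while d >= 1:                                 # dilate with D, D/2, D/4, ...
        result = ndi.binary_dilation(result, structure=ball(max(d // 2, 1)))
        result &= I                               # multiply with I after dilation
        d //= 2
    return result

def conventional_segmentation(voi, seed, voxel_size):
    factor = np.asarray(voxel_size) / 0.25        # supersample to 0.25 mm voxels
    hi = ndi.zoom(voi.astype(float), factor, order=1)       # tri-linear
    seed_hi = tuple((np.asarray(seed) * factor).astype(int))
    mask = imf(hi, seed_hi, T=-550, D=31)
    # Subsample back to the original grid; linear interpolation approximates
    # the paper's voxel averaging and yields the soft boundary values.
    return ndi.zoom(mask.astype(float), 1.0 / factor, order=1)
```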
4
Experiments and Results
A soft or probabilistic segmentation s is a volume in which each voxel has an associated probability 0 ≤ pi ≤ 1 of belonging to the object. To compute the object volume from s we use

$SV(s) = \sum_{i=1}^{n} p_i, \qquad (1)$
where the summation runs over all voxels. To compare two soft segmentations we define the soft overlap as

$SO(s_1, s_2) = \frac{\sum_{i=1}^{n} \min(p_{1i}, p_{2i})}{\sum_{i=1}^{n} \left( |p_{1i} - p_{2i}| + \min(p_{1i}, p_{2i}) \right)}. \qquad (2)$
This definition is a straightforward extension to the soft case of the common definition of overlap for hard segmentations as intersection divided by union. It is equal to 1 for complete agreement and has a minimum value of 0 for complete disagreement. Any soft segmentation s can be turned into a hard (binary) segmentation h by thresholding at 0.5. SO(s, h) is a measure of the 'softness' of a segmentation; it is the best possible soft overlap with a given s that could be obtained with a hard segmentation. We report the soft volume of each nodule and SO(s, h). The latter can be compared to SO for both the new and the conventional method. We also report the relative (soft) volume error RVE in %, which is the relative difference (positive for oversegmentation and negative for undersegmentation) in volume between the true and the automatically determined soft volume. As the number of nodule cases released by the LIDC is fairly small, all experiments were carried out in a leave-one-out regime. That means that for testing one case, all other cases are used for training. The number of non-solid cases turned out to be three, so there are 19 cases available for training for every solid nodule and only 2 for every non-solid nodule. The results of the experiments are given in Table 1 and illustrated for a number of cases in Figure 1. The overlap values are significantly higher for the proposed method than for the conventional method, which fails completely on all non-solid nodules, on case 1 (an attached fissure is included in the segmentation), and on case 3 (complex vascular attachment). For case 1 the new method provides an excellent segmentation. Case 3 is the only large failure of the proposed method, probably because there is no other case like this one available for training. Note how both methods produce good results for a large, solid, almost isolated nodule (case 15) and even for a complex nodule such as case 10 (although the soft volume obtained with the proposed method agrees much better with the truth than the result of the conventional method). The overlap obtained with the proposed method is in 17 cases higher than the maximum possible overlap of a hard segmentation (columns SO(s, h) vs. SO in Table 1). The relative volume errors are fairly large, but the proposed method achieves an RVE < 15% in 14 cases.
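The measures of Eqs. 1 and 2 are simple to compute; a minimal sketch, assuming Eq. 2 is read as the ratio of the summed soft intersection to the summed soft union:

```python
import numpy as np

def soft_volume(s):
    return s.sum()                                        # Eq. (1)

def soft_overlap(s1, s2):
    inter = np.minimum(s1, s2).sum()                      # soft intersection
    union = (np.abs(s1 - s2) + np.minimum(s1, s2)).sum()  # soft union
    return inter / union                                  # Eq. (2)

def softness(s):
    """SO(s, h): best overlap achievable by a hard segmentation."""
    return soft_overlap(s, (s > 0.5).astype(float))
```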
5
Discussion
The results clearly demonstrate the possibility of obtaining a fairly reliable soft nodule segmentation with a simple, voxel-wise methodology. The complete segmentation requires on the order of 20 to 30 seconds, including computation of feature images. There is ample room for improving the speed of the method. The differences between the proposed method and the conventional method are in many cases not large, which is understandable as the output of the IMF algorithm was used as a feature for the regression method. The results for non-solid nodules are encouraging, taking into account that just two nodules were used for training the system in these cases.
Table 1. Experimental results for all 23 nodule cases. The last two columns show the results of the conventional method. Non-solid nodules are indicated with an asterisk.

case #       SV    SO(s,h)     SO      RVE    SO [2]   RVE [2]
1          772.2    0.74     0.89      3.7     0.28     253.7
2          215.3    0.65     0.80      9.4     0.67      37.8
3          121.3    0.59     0.13    586.0     0.10     842.0
4          243.9    0.64     0.68      0.4     0.49      88.0
5          307.2    0.61     0.80     -1.1     0.67      22.0
6          144.2    0.59     0.73     27.4     0.68      34.0
7*         347.0    0.63     0.52    -44.6     0.01    7456.6
8          251.3    0.64     0.79     10.0     0.70      27.7
9         1171.0    0.66     0.67     11.5     0.58      67.5
10       21653.3    0.72     0.78      7.7     0.68      34.4
11         150.4    0.56     0.65     20.4     0.63      43.1
12         385.1    0.55     0.70     25.9     0.59       7.6
13          67.7    0.56     0.47    -52.4     0.50     -45.7
14         138.3    0.59     0.77    -10.2     0.68      -5.4
15        8125.5    0.84     0.90      0.2     0.85      15.6
16         541.8    0.41     0.54    -39.7     0.37     -48.8
17         113.1    0.54     0.70      7.8     0.64       4.3
18*        912.4    0.53     0.29      6.9     0.04    1845.5
19        4472.6    0.55     0.51    -32.9     0.56       4.2
20         464.7    0.67     0.79     -9.5     0.71      29.8
21*        623.6    0.46     0.60     23.4     0.08     977.7
22        1515.9    0.69     0.74      4.4     0.64      40.7
23       19497.3    0.79     0.77     -9.1     0.78      15.8
mean      2705.9    0.62     0.66     23.7     0.52     510.8
std       5928.6    0.10     0.18    124.5     0.25    1577.7
Using a larger training database is clearly a priority. Many of the current 23 cases are unique and therefore not represented in the training data in a leave-one-out test regime. There is no doubt that this has a detrimental effect on segmentation performance. Obvious issues for further research are the investigation of other feature sets, different regression schemes, and the use of feature selection and extraction methods. More interesting is the extension to a method that employs a second region regression based on features derived from the output of the first process. In this way, features that are more directly related to the shape of the resulting segmentation can be defined. In practice, the purpose of nodule segmentation is often growth rate determination. It has been observed [1] that volume errors in automatic segmentation methods are not necessarily problematic for accurate growth rate estimation as long as they are consistent. The current method should be tested on more data to determine its reproducibility and accuracy in growth rate determination.
Fig. 1. Segmentation results for various cases. (a) CT data, axial or coronal sections zoomed in on the nodule with a lung window level, W/L = 1400/−600 HU. (b) Truth. (c) Results of the proposed method. (d) Results of the conventional method [2].
6
Conclusion
The manual outlines in the LIDC data set make a strong case for using soft instead of hard segmentation schemes for pulmonary nodules. The proposed method is fully automated, provides soft segmentations, and learns from examples, contrary to existing approaches that use hard-coded rules, typically require correct choices of internal parameters to achieve good results, and produce binary segmentations. Although the system is not extensively optimized and was trained with only a small number of cases, the obtained results are very promising.
References

1. Mullally, W., Betke, M., Wang, J.: Segmentation of nodules on chest computed tomography for growth assessment. Med. Phys. 31 (2004) 839–848
2. Kostis, W., Reeves, A., Yankelevitz, D., Henschke, C.: Three-dimensional segmentation and growth rate estimation of small pulmonary nodules in helical CT images. IEEE Trans. Med. Imag. 22(10) (2003) 1259–1274
3. Fetita, C.I., Préteux, F., Beigelman-Aubry, C., Grenier, P.: 3D automated lung nodule segmentation in HRCT. In: MICCAI. Volume 2878 of LNCS. (2004) 626–634
4. Kuhnigk, J.M., Dicken, V., Bornemann, L., Wormanns, D., Krass, S., Peitgen, H.O.: Fast automated segmentation and reproducible volumetry of pulmonary metastases in CT-scans for therapy monitoring. In: MICCAI. Volume 3217 of LNCS. (2004) 933–941
5. Okada, K., Comaniciu, D., Krishnan, A.: Robust 3D segmentation of pulmonary nodules in multislice CT images. In: MICCAI. Volume 3217 of LNCS. (2004) 881–889
6. Okada, K., Comaniciu, D., Krishnan, A.: Robust anisotropic Gaussian fitting for volumetric characterization of pulmonary nodules in multislice CT. IEEE Trans. Med. Imag. 24 (2005) 409–423
7. Wiemker, R., Rogalla, P., Blaffert, T., Sifri, D., Hay, O., Shah, E., Truyen, R., Fleiter, T.: Aspects of computer-aided detection (CAD) and volumetry of pulmonary nodules using multislice CT. British Journal of Radiology 78 (2005) 46–56
8. Sluimer, I., Schilham, A., Prokop, M., van Ginneken, B.: Computer analysis of computed tomography scans of the lung: a survey. IEEE Trans. Med. Imag. 25(4) (2006) 385–405
9. Wormanns, D., Diederich, S.: Characterization of small pulmonary nodules by CT. Eur. Radiol. 14 (2004) 1380–1391
10. Armato, S.G., McLennan, G., McNitt-Gray, M.F., Meyer, C.R., Yankelevitz, D., Aberle, D.R., Henschke, C.I., Hoffman, E.A., Kazerooni, E.A., MacMahon, H., Reeves, A.P., Croft, B.Y., Clarke, L.P.: Lung Image Database Consortium: Developing a resource for the medical imaging research community. Radiology 232(3) (2004) 739–748
11. Armato, S.G., Giger, M.L., MacMahon, H.: Automated detection of lung nodules in CT scans: Preliminary results. Med. Phys. 28(8) (2001) 1552–1561
MR Image Segmentation Using Phase Information and a Novel Multiscale Scheme
Pierrick Bourgeat1, Jurgen Fripp1,3, Peter Stanwell2, Saadallah Ramadan2, and Sébastien Ourselin1
1 BioMedIA Lab, Autonomous Systems Laboratory, CSIRO ICT Centre, Sydney, Australia {pierrick.bourgeat, jurgen.fripp, sebastien.ourselin}@csiro.au
2 Institute for Magnetic Resonance Research, Sydney, Australia {stan, saad}@imrr.usyd.edu.au
3 School of ITEE, University of Queensland, Australia
Abstract. This paper considers the problem of automatic classification of textured tissues in 3D MRI. More specifically, it aims at validating the use of features extracted from the phase of the MR signal to improve texture discrimination in bone segmentation. This extra information provides better segmentation compared to using magnitude-only features. We also present a novel multiscale scheme to improve the speed of pixel-based classification algorithms, such as support vector machines. This scheme dramatically increases the speed of the segmentation process, by an order of magnitude, through a reduction of the number of pixels that need to be classified in the image.
1
Introduction
Segmentation of textured organs is a difficult problem, with the most successful approaches relying on the combination of shape and texture information [1,2,3]. However, texture features do not always provide sufficiently good discrimination to allow an accurate final segmentation [4]. This work aims at validating the use of features extracted from the phase of the MR image to improve texture discrimination in bone segmentation, which has broad clinical applications in diagnosis, detection of changes in longitudinal studies and surgical planning. For any type of MR image acquisition, magnitude is only one part of the MR signal, the latter being a complex signal acquired in k-space, with phase and magnitude components as shown in Fig. 1. For conventional anatomical imaging methods, the phase information is non-coherent, with dephasing caused by chemical shift variation and local magnetic susceptibility. The latter effect is due to differing magnetic susceptibilities within the body and/or instrumental imperfections. Some imaging sequences are more sensitive to this effect than others, and with a properly chosen sequence, phase can be used to provide information about tissue interfaces that is not available in the magnitude of the signal. Using phase to improve tissue contrast on magnitude images has prompted a growing interest in the MRI community, as highlighted by recent work from Haacke and Sehgal on susceptibility-weighted imaging [5,6,7].
Fig. 1. Magnitude and phase images with TE = 8.6 ms
For the remainder of the paper, we first describe the MR acquisition protocol, followed by the feature extraction method that enables features to be extracted from the phase without phase unwrapping, and we present a novel multiscale segmentation scheme designed to speed up the pixel-wise classification step. In the final section, we present results on segmentation accuracy using phase information, and the performance of the multiscale scheme in terms of speed and accuracy.
2
MR Acquisition
The images used in this paper were acquired on a whole-body 3T clinical scanner (Magnetom Trio, Siemens AG, Germany) using the manufacturer's transmit-receive quadrature extremity coil. Raw image data were acquired as a 3D volume with the use of a simple gradient-echo sequence (FLASH). Raw data was then processed to produce separate phase and magnitude images. Acquisition parameters were chosen to maximise signal-to-noise ratio while enhancing the inherent phase contrast of the knee joint. The following acquisition parameters were used: echo times (TE) of 4.9 and 8.6 ms, repetition time of 28 ms, flip angle of 15°, FOV of 150 mm, a 256 × 256 matrix, 1.5 mm slice thickness, and 64 partitions. The knees of 18 healthy volunteers were scanned, producing a set of 2 complex images per knee, corresponding to the 2 echo times. As presented in Fig. 1, the wrapped phase image shows strong textural information in the bone and the background, compared to the relatively smooth areas in the cartilage and the muscles. In areas of low intensity, such as the background, the signal does not contain enough information to produce an accurate measure of the phase.
3
Features Extraction
In Reyes-Aldasoro's approach [8], subband filtering is applied in the original k-space to extract texture features from the complex image, corresponding to a region of the spatial and frequency domain. The amplitude of the filter response
measures the signal energy within the selected region, and is strongly dependent on the local amplitude of the signal. Therefore, in areas of low amplitude, the phase information will not have any effect on the filtering. However, if the complex image is normalized to produce a complex phase image of constant unity magnitude, as we proposed in [9], the filter response will be directly related to the phase content of the image. Therefore, the same filters can be applied to the magnitude and the normalized phase data. The normalization is performed by dividing the complex image I(x, y, z) by the amplitude A(x, y, z), to generate a complex phase image Iϕ(x, y, z) of constant unity amplitude, and therefore only composed of phase information:

$I_\varphi(x, y, z) = \frac{I(x, y, z)}{A(x, y, z)} = e^{j\varphi(x, y, z)}. \qquad (1)$
Iϕ(x, y, z) is a complex image that can be Fourier transformed and then filtered in order to extract phase information. As phase wraps appear when the phase is mapped from the complex space to the real space, keeping the phase information in the complex domain removes the need for phase unwrapping. Variations in the phase induce variations in the frequency of the signal, and therefore a frequency analysis performed using Gabor filters on Iϕ(x, y, z) can extract textural information about the phase. Our subband filtering is similar to that of Zhan and Shen [2], with a bank of non-symmetric Gabor filters applied in the coronal and sagittal planes. Because of the anisotropic nature of the images, the Gabor filter bank must be designed in real space instead of pixel space. Therefore, a bank of filters with 6 orientations and 5 scales is used in the sagittal plane, and only 3 scales in the coronal plane to accommodate the poor high-frequency content in that plane. A 3D Gaussian filter is then applied to the magnitude of the output of each filter to smooth the response across slices. As it is important to preserve rotation invariance, the magnitude of the output of the filters is summed across all orientations in order to obtain a set of features that is rotation invariant. Fig. 2 presents the features obtained in the sagittal plane from the magnitude and phase images with TE = 8.6 ms. The feature images illustrate the discrimination power of the phase between bones and surrounding tissues.
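A minimal slice-wise 2D sketch of the normalization and of a rotation-invariant phase-texture feature follows; the kernel construction and sizes are our own illustrative choices, not the exact non-symmetric bank used in the paper:

```python
import numpy as np
from scipy.ndimage import convolve

def gabor_kernel_2d(freq, theta, sigma, size=15):
    """Simple complex 2D Gabor kernel (illustrative envelope and size)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return envelope * np.exp(2j * np.pi * freq * rot)

def phase_feature(complex_slice, freq, thetas, sigma):
    # Normalize to unit magnitude so the filters respond to phase only (Eq. 1).
    iphi = complex_slice / np.maximum(np.abs(complex_slice), 1e-8)
    resp = 0.0
    for theta in thetas:  # sum magnitudes over orientations
        k = gabor_kernel_2d(freq, theta, sigma)
        re = convolve(iphi.real, k.real) - convolve(iphi.imag, k.imag)
        im = convolve(iphi.real, k.imag) + convolve(iphi.imag, k.real)
        resp = resp + np.hypot(re, im)
    return resp  # rotation-invariant phase texture response
```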
4
Contour Refining Segmentation
The extracted features are classified using the support vector machine (SVM) classifier [10] in a pixel-wise fashion. The implementation relies on LIBSVM [11]. We use an RBF kernel, with the parameters (C, γ) optimised using five-fold cross-validation. SVMs are among the most effective methods for classification, but they usually require a large number of support vectors to address complicated non-linear problems, resulting in a high computational cost. Training methods have been designed to improve their efficiency through a smaller number of support vectors [12]. However, they usually require sacrificing some accuracy in order to significantly increase the decision speed. In our
Fig. 2. Magnitude features (top row) and phase features (bottom row). The features represent a 5-level decomposition, from low to high frequencies (presented from left to right)
approach, we consider the classification problem in terms of image processing, where bones are large compact objects and not all pixels need to be classified to get a good representation of their shape. Therefore, we propose a novel coarse-to-fine scheme that recursively refines the boundary of the objects, with a reduced number of pixels classified inside/outside the object. The standard multiscale approach is aimed at identifying homogeneous regions at the coarser scale and recursively refining this estimation using finer scales [13,14]. These techniques are designed to improve segmentation results while reducing sensitivity to noise, but do not typically target the speed of the segmentation process. Our implementation is specifically designed to reduce the number of pixels being classified by the SVM and improve the segmentation speed. We use a coarse-to-fine approach to get a representation of the objects at the lowest resolution, and use this representation to identify the pixels lying on the contour. This contour is recursively refined until the highest resolution is reached, thus only classifying the pixels lying on this contour. Consider a set of feature images F(x, y, z); a coarse segmentation of the bones can be obtained by subsampling F(x, y, z) by a scaling factor 2^k (k ∈ N), such that Fk = F ↓ 2^k, and classifying Fk through the SVM to produce a segmented mask Sk = SVM(Fk). Sk is then upsampled by a factor 2 to produce an upsampled mask Uk−1 such that Uk−1 = Sk ↑ 2. The upsampled mask Uk−1 is then eroded and dilated using a cubic structuring element A of 1 pixel radius to produce two masks, respectively Ek−1 = Uk−1 ⊖ A and Dk−1 = Uk−1 ⊕ A, and obtain a contour mask Ck−1 = Dk−1 − Ek−1. The contour mask Ck−1 identifies the pixels where the real boundary of the object is most likely to be located, and therefore needs to be refined. The feature images F(x, y, z) are then subsampled by a factor 2^(k−1) to produce Fk−1 and match the scale of Ck−1. The segmented image Sk−1 corresponding to scale k − 1 is obtained as the combination of the eroded image Ek−1 with the newly classified pixels of Fk−1 lying within Ck−1, such that Sk−1 = SVM(Fk−1 ∩ Ck−1) ∪ Ek−1. This process is repeated k times until the full resolution is reached. Fig. 3 illustrates the process with k = 4, showing the intermediate segmentations at each scale (Fig. 3(a-e)) and the corresponding contour masks (Fig. 3(k-n)) generated
through erosion and dilation of these segmented images. This approach dramatically reduces the number of pixels that need to be classified to obtain the full-resolution segmented image. Moreover, the number of pixels classified is further reduced by keeping a list of the pixels already classified at previous scales, which prevents reclassifying the same pixel several times. A diagram of the complete scheme is presented in Fig. 4. This type of approach could easily be adapted to a multiclass problem by computing the contour masks for each class and merging these masks into a single mask containing all the pixels that need to be refined for all classes.
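The scheme can be summarized in a short 2D sketch; `classify` is a stand-in for the trained SVM (returning a boolean label image for the pixels selected by a mask, or all pixels when the mask is None), and image sizes are assumed divisible by 2^k, with cropping handling any remainder:

```python
import numpy as np
from scipy import ndimage as ndi

def multiscale_segment(features, classify, k=4):
    """Coarse-to-fine contour refinement on a (H, W, n_feat) feature image."""
    S = classify(features[::2**k, ::2**k], None)          # coarsest segmentation
    A = np.ones((3, 3), bool)                             # 1-pixel-radius element
    for level in range(k - 1, -1, -1):
        U = np.kron(S.astype(np.uint8), np.ones((2, 2), np.uint8)).astype(bool)
        E = ndi.binary_erosion(U, structure=A)            # eroded mask E
        D = ndi.binary_dilation(U, structure=A)           # dilated mask D
        C = D & ~E                                        # contour mask C
        Fl = features[::2**level, ::2**level]
        h, w = Fl.shape[:2]
        C, E = C[:h, :w], E[:h, :w]                       # crop to feature grid
        S = E | (classify(Fl, C) & C)                     # refine only the contour
    return S
```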
Fig. 3. Segmented images with k = 4, presented from scale 4 to 0 (a-e), along with the corresponding probability images (f-j), the contour masks showing where the contour is refined at each scale (k-n), and the manual segmentation (o). The images are all scaled to the same size for clarity.
Fig. 4. Diagram of the multiscale scheme
The SVM can be trained to output a probability estimate along with the hard segmentation. Our coarse-to-fine approach can also be used to generate probability maps, where the initial probability is set by the segmentation at the coarsest scale and the probability estimate on the contour of the object is then recursively refined within the contour masks Ck. An example of the probability maps at each scale for k = 4 is presented in Fig. 3(f-j). These probability maps can then be used as an appearance model to drive an active shape model (ASM), with shape constraints used to generate a more anatomically correct representation of the bones, as presented in [1] and [2].
5
Experimental Results and Discussion
From the database of 18 knees, in which the bones have been manually segmented, 4 knees were used for training the classifier, with 10,000 points extracted from each image, and 14 knees were used for testing. For the first experiment, validating the usefulness of the phase in bone segmentation, the classifier was trained with 3 different sets of features (Gabor filtering applied to magnitude only, phase only, or magnitude and phase) using either the first echo only, the second echo only, or both echoes.

Table 1. Mean (standard deviation) of the DSC on the bone segmentation over the 14 testing datasets

                        TE1           TE2           TE1 & TE2
Magnitude               0.77 (0.05)   0.80 (0.04)   0.88 (0.03)
Phase                   0.67 (0.08)   0.73 (0.05)   0.83 (0.05)
Magnitude and Phase     0.83 (0.04)   0.85 (0.03)   0.90 (0.02)
Since we are looking at segmenting 4 large bones, a size filter is applied to the segmented image to preserve the 4 largest connected components and remove small misclassified volumes. The mean and standard deviation of the Dice similarity coefficient (DSC)¹ after filtering are presented in Table 1 for the 14 test images. The combination of phase and magnitude information improves the results by increasing the mean DSC and reducing the standard deviation. This leads to more accurate results with less variation across the dataset. This is an important result, as the phase is always acquired during any type of acquisition and can be readily included in segmentation algorithms. Also, in the case of an acquisition with 2 echoes, including both echoes dramatically improves the results (+0.08 on the magnitude, +0.1 on the phase and +0.05 on the magnitude and phase), to produce an excellent DSC of 0.9 with the phase-magnitude combination. For the second set of experiments, the multiscale approach is compared with regard to the number of scales k used (k = 0 being the full resolution) in terms
¹ DSC = 2|A ∩ B| / (|A| + |B|), where A and B are two binary objects.
of the percentage of pixels classified (Fig. 5(a)) and the DSC of the segmented images (Fig. 5(b)), keeping only the 4 largest connected components. The experiments are conducted over the 14 testing datasets using the two echoes and the three sets of features. Fig. 5(a) shows the exponential decrease in the number of pixels classified when k is increased. The improvement is even more pronounced when the segmentation is more accurate, as it creates large areas with fewer contours. Interestingly, the DSC is also improved when k is increased, as presented in Fig. 5(b), reaching a peak at k = 3. This corresponds to an optimum where small misclassified volumes inside the bone are correctly classified, as they are not detected at the coarser scales and therefore not refined. However, if k is increased above 3, small objects such as the thin fibula are not detected at the coarser level and are missed in the final segmentation. Therefore, k must be set according to the size of the smallest object to be segmented. With k = 3, only 8% of the pixels of the image are classified using magnitude, or combined phase and magnitude, features. As a result, a classification that would take over 1 hour on a 3.2 GHz desktop computer if all the pixels were classified can be performed in 5 minutes (the extra cost of the contour extraction accounting for only 10 seconds of this time).
Fig. 5. Performance results of the multiscale scheme for different values of k, in terms of speed (a: percentage of pixels classified vs. number of scales k) and accuracy (b: DICE score vs. number of scales k), for magnitude, phase, and combined magnitude-and-phase features
6
Conclusion
Phase can easily be integrated in a feature extraction scheme to provide superior tissue discrimination compared to magnitude-only features. Phase information is always acquired, and its incorporation into current image analysis techniques provides an exciting new field of research. In the case of bone segmentation, the use of phase improves the accuracy and robustness of the segmentation, resulting in a higher DSC and smaller standard deviation across the 14 testing datasets. The multiscale scheme dramatically improves the speed of the segmentation, speed being the most prominent drawback of SVM-based image segmentation algorithms. The segmented images can be used as an automatic initialisation of ASM models through registration with an atlas, or can be
directly used via the probability estimate to drive the ASM deformation. Future work will look into other approaches to extract phase information without phase unwrapping, and study other organs where phase can assist segmentation and analysis.
References

1. van Ginneken, B., Frangi, A., Staal, J., Romeny, B., Viergever, M.: Active shape model segmentation with optimal features. IEEE Trans Med Imaging 21(8) (2002) 924–933
2. Zhan, Y., Shen, D.: Automated segmentation of 3D US prostate images using statistical texture-based matching method. In: MICCAI'03. Volume 2878 of LNCS., Montreal, Canada, Springer Verlag (2003) 688–696
3. Xie, J., Jiang, Y., Tsui, H.T.: Segmentation of kidney from ultrasound images based on texture and shape priors. IEEE Trans Med Imaging 24(1) (2005) 45–57
4. Lorigo, L., Faugeras, O., Grimson, W., Keriven, R., Kikinis, R.: Segmentation of bone in clinical knee MRI using texture-based geodesic active contours. In: MICCAI'98. Volume 1496 of LNCS., Cambridge, MA (1998) 1195–1204
5. Haacke, E., Xu, Y., Cheng, Y.C., Reichenbach, J.: Susceptibility weighted imaging (SWI). Magn Reson Med 52(3) (2004) 612–618
6. Haacke, E., Xu, Y.: The role of voxel aspect ratio in determining apparent vascular phase behavior in susceptibility weighted imaging. Magn Reson Med 24(2) (2006) 155–160
7. Sehgal, V., Delproposto, Z., Haacke, E., Tong, K., Wycliffe, N., Kido, D., Xu, Y., Neelavalli, J., Haddar, D., Reichenbach, J.: Clinical applications of neuroimaging with susceptibility-weighted imaging. J Magn Reson Imaging 22(4) (2005) 439–450
8. Reyes-Aldasoro, C.C., Bhalerao, A.: Volumetric texture description and discriminant feature selection for MRI. In: IPMI'03. Volume 18., Ambleside, UK, Springer (2003) 282–293
9. Bourgeat, P., Fripp, J., Janke, A., Galloway, G., Crozier, S., Ourselin, S.: The use of unwrapped phase in MR image segmentation: a preliminary study. In: MICCAI'05. Volume 3750 of LNCS., Palm Springs, CA, Springer Verlag (2005) 813–820
10. Vapnik, V.: The nature of statistical learning theory. Springer Verlag, New York (1995)
11. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines (2001) Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
12. Zhan, Y., Shen, D.: An adaptive error penalization method for training an efficient and generalized SVM. Pattern Recognition 39(3) (2006) 342–350
13. Liang, K.H., Tjahjadi, T.: Adaptive scale fixing for multiscale texture segmentation. IEEE Trans Image Process 15(1) (2006) 249–256
14. Choi, H., Baraniuk, R.G.: Multiscale image segmentation using wavelet-domain hidden Markov models. IEEE Trans Image Process 10(9) (2001) 1309–1321
Multi-resolution Vessel Segmentation Using Normalized Cuts in Retinal Images Wenchao Cai and Albert C.S. Chung Lo Kwee-Seong Medical Image Analysis Laboratory, Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong {wenchao, achung}@cse.ust.hk Abstract. Retinal vessel segmentation is an essential step in the diagnosis of various eye diseases. In this paper, we propose an automatic, efficient and unsupervised method based on the gradient matrix, the normalized cut criterion and a tracking strategy. Making use of the gradient matrix of the Lucas-Kanade equation, which consists of only first-order derivatives, the proposed method can detect a candidate window where a vessel possibly exists. The normalized cut criterion, which measures both the similarity within groups and the dissimilarity between groups, is used to search for a local intensity threshold to segment the vessel in a candidate window. The tracking strategy makes it possible to extract thin vessels without being corrupted by noise. Using a multi-resolution segmentation scheme, vessels with different widths can be segmented at different resolutions, although the window size is fixed. Our method is tested on a public database and is demonstrated to be efficient and insensitive to initial parameters.
1
Introduction
The analysis of retinal images, also called fundus images, plays an important role in the diagnosis of various eye diseases. Useful features of color retinal images, e.g., the optic disk, exudates, and the structure and widths of vessels, are extracted and analyzed to help ophthalmologists. In this paper, we focus on vessel segmentation in retinal images. Vessel segmentation is very important in various medical diagnoses, and many algorithms for vessel segmentation in 2-D and 3-D data have been proposed. In broad terms, these algorithms can be divided into two categories: those that utilize filters to extract the boundary or ridge of the vessels, followed by further refinement (e.g. [1,2,3,4]), and those that employ a tracking strategy starting from given or detected seeds in the vessels (e.g. [5,6,7]). The algorithm presented by Chaudhuri et al. [2] was based on a directional 2-D matched filter that assumed a Gaussian vessel cross-sectional profile and small vessel radius variations. Hoover et al. [1] improved this method by a region-based threshold probing of the matched filter response. Jiang and Mojon [8] directly applied multi-threshold probing to the image through a verification procedure which made use of a curvilinear structure model. Koller et al. [9] introduced a nonlinear multiscale line filter based on the second derivative of the Gaussian
function. This filter could be used to detect lines of various widths in 2-D and 3-D image data. Similarly, some algorithms ([3,10,11,12]) used the Hessian matrix to extract local line features at different scales of a 3-D volume. Different from these filter-based algorithms, Tolias and Panas [5] developed a fuzzy C-means clustering algorithm; they started the fuzzy tracking algorithm from the optic disc located by the algorithm. Lalonde et al. [6] tracked a pair of points on the boundary of a vessel in the edge map, and Cree et al. [7] tracked the vessel by fitting a 2-D physical model. The methods above can extract the major parts of the vasculature well. For the thinner parts, however, segmentation is still an open problem because the image contrast around thin vessels is generally low. Recently, focusing on 2-D retinal images, several supervised methods [13,14,15] have been explored to get better results. At every pixel, they extract a feature vector in the neighborhood, which is classified by a kNN-classifier [13,14] or a Bayesian classifier with a class-conditional probability density function [15]. It can be labor-intensive to segment the training images manually, and the process of training needs to be repeated in different situations. Thus, we propose an unsupervised method in this paper. The proposed algorithm employs the gradient matrix of the Lucas-Kanade equation [16] to evaluate a local window, and determines whether there is a vessel in the window by eigen-decomposing the gradient matrix. Unlike methods using the Hessian matrix, since the proposed algorithm avoids computing second derivatives, it can detect thinner vessels in a window. For every candidate window, which is a region containing a vessel, the algorithm searches for an intensity threshold by minimizing the normalized cut criterion [17]. In order to segment thinner vessels and avoid selecting noisy candidate windows, the algorithm utilizes a tracking strategy. This makes it possible to set strict parameters for pre-selection and relax them dynamically. For vessels with different widths, the algorithm generates the Gaussian pyramid for an input image and segments vessels with different widths at different resolution levels. It is shown that the experimental results on a public database, DRIVE [13], are more accurate than those of a widely used unsupervised method [8], and comparable with two supervised methods [13,15].
2
Proposed Method
The overview of the proposed method is as follows:
1. Compute the Gaussian pyramid of the input image;
2. For images at different resolution levels, compute the gradient vector for every pixel by the Sobel operator;
3. Scan the image with a sliding window of fixed size, compute the gradient matrix of the window, and evaluate it by eigen-decomposing the matrix (Section 2.1);
4. Select a window whose eigenvalues satisfy certain conditions as a candidate window, and search for a local intensity threshold in the candidate window by which the binary segmentation with the smallest NCut in Eq. 3 produces the blood vessel (Section 2.2);
5. If a vessel is detected, track the vessel along the direction generated from the eigenvector of the gradient matrix and relax the parameters to select the next candidate window (Section 2.3); otherwise, continue scanning;
6. Combine all the results obtained from different resolution levels and refine the final result.

The details are explained in the following subsections.

2.1 Gradient Matrix
The gradient matrix originates from the problem of optical flow. Eq. 1 is the basic optical flow equation:

$\nabla I \cdot [u\ v]^T + I_t = 0. \qquad (1)$
Considering every pixel p in a small window under the assumption that the flow field is locally smooth, Lucas and Kanade [16] proposed a least squares method to obtain the solution [u, v] as follows:

$\begin{bmatrix} \sum_p I_x I_x & \sum_p I_x I_y \\ \sum_p I_y I_x & \sum_p I_y I_y \end{bmatrix} \cdot [u\ v]^T = - \begin{bmatrix} \sum_p I_x I_t \\ \sum_p I_y I_t \end{bmatrix}. \qquad (2)$

The left-hand side of Eq. 2 is called the gradient matrix in this paper, denoted by G. The gradient matrix G contains the texture information of the window. Table 1 shows the relationship between local features and the two eigenvalues of G, λ1 and λ2 (λ1 ≥ λ2).

Table 1. Relationship between local features and G

Local Feature    λ1      λ2      |Σ∇I|
High textured    Large   Large   –
Low textured     Small   Small   Small
Edge             Large   Small   Large
Line             Large   Small   Small
In Table 1, we see that the type of local feature can be determined by the values of λ1 and λ1/λ2. However, from the two eigenvalues alone, we cannot distinguish between a line and an edge. It is observed that, although both produce gradient matrices with large λ1 and small λ2, all the gradients along an edge are in the same orientation, while the gradients along a line are in two opposite orientations. For a line structure in the window, the magnitude of the sum of gradients (|Σ∇I|) will therefore be small, and the line structure can be detected by looking at the value of |Σ∇I|/max(|∇I|), where max(|∇I|) is the maximal magnitude of the gradient in the window; it is used to normalize |Σ∇I| in regions with different contrast.
Fig. 1. Illustration of selecting candidate windows and segmentation in candidate windows. The left image is the green channel of an original image; the middle images show zoomed-in views of the corresponding white windows in the left image; the right column shows the eigenvalues of the gradient matrices of the corresponding 7 × 7 white windows and the segmentation results. The third window, containing an edge, is rejected due to a large value of |Σ∇I|/max(|∇I|).
For example, in Figure 1, we consider three 7×7 windows. From top to bottom: there is a junction in the first window, so it is a high-texture region with a large λ1 = 64799.6 and a small λ1/λ2 = 2.4. There is a vessel (a line) crossing the second window, so λ1 = 118183.6 is large and λ1/λ2 = 45.4 means that λ2 is relatively small. There is an edge in the third window; λ1 = 1219631.3 is also very large, and λ1/λ2 = 50 also indicates a relatively small λ2. However, the values of |Σ∇I|/max(|∇I|) distinguish the second (with 2.4) and the third (with 20.7) windows. Therefore, according to these three values, the first and second windows are selected as candidate windows, while the third window is rejected.
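A sketch of this window test, assuming Sobel gradients as in step 2 of the overview; the function name and the small epsilon guards are our own:

```python
import numpy as np
from scipy.ndimage import sobel

def is_candidate_window(gx, gy, T1, T2, T3):
    """Evaluate one 7x7 window from its Sobel gradients gx, gy."""
    G = np.array([[(gx * gx).sum(), (gx * gy).sum()],
                  [(gx * gy).sum(), (gy * gy).sum()]])
    lam2, lam1 = np.linalg.eigvalsh(G)          # ascending, so lam1 >= lam2
    grad_mag = np.hypot(gx, gy)
    line_score = np.hypot(gx.sum(), gy.sum()) / max(grad_mag.max(), 1e-8)
    return lam1 > T1 and lam1 / max(lam2, 1e-8) > T2 and line_score < T3

# Whole-image gradients, computed once per resolution level:
# gx = sobel(image, axis=1); gy = sobel(image, axis=0)
```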
2.2 Normalized Cut
After selecting the candidate windows by the gradient matrix, the algorithm needs an efficient way to segment the vessel. In our work, we introduce the normalized cut criterion to search for the best intensity threshold for segmentation. The normalized cut (NCut) was proposed by Shi and Malik [17] based on graph theory and spectral clustering. For a two-way cut of a graph, the normalized cut criterion can be written as:

$NCut(A, B) = \sum_{p \in A, q \in B} w_{pq} \left( \frac{1}{\sum_{p \in A} D_p} + \frac{1}{\sum_{p \in B} D_p} \right), \qquad (3)$
where A and B are two segments of the original set V, wpq is the similarity between two vertices p and q according to intensity and distance, and Dp = Σ_{q∈V} wpq is defined as the degree of the vertex p. From Eq. 3, we can see that the normalized cut measures the total dissimilarity between the different groups, as well as the
total similarity within groups, which can help to search for the best threshold to segment the vessel in a given window. Although the problem of minimizing NCut is NP-hard [17], the authors presented an approximate solution, ignoring the discrete constraint, by eigen-decomposition. Because eigen-decomposition is computationally expensive, we only use Eq. 3 as a criterion to search, for every candidate window, for the intensity threshold that produces the lowest NCut. Although this is an approximation of the standard normalized cut, it works more efficiently, and our experiments show that most of the vessels can be well segmented by a local intensity threshold in candidate windows. Based on Eq. 3, the NCut tends to partition the window into two parts of similar size. Therefore, we assume the width of a vessel is around 2 to 3 pixels and set the size of the sliding window to 7 × 7 to suit this property of the normalized cut criterion. Because the size of the candidate windows (7 × 7) is too small to show a segmentation image, the segmentation results of the first and second windows in Figure 1 are shown as binary 7 × 7 matrices in the right column of Figure 1.
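A brute-force sketch of this threshold search for a single 7×7 window, using the similarity weights of Eq. 5 given later in Section 3.2; the function name is our own:

```python
import numpy as np

def best_threshold_ncut(window, sigma1_sq, sigma2_sq=49.0):
    """Exhaustive NCut (Eq. 3) threshold search in one small window."""
    h, w = window.shape
    yy, xx = np.mgrid[0:h, 0:w]
    coords = np.stack([yy.ravel(), xx.ravel()], axis=1).astype(float)
    vals = window.ravel().astype(float)
    dI2 = (vals[:, None] - vals[None, :]) ** 2                      # intensity
    dD2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)  # distance
    W = np.exp(-dI2 / sigma1_sq) * np.exp(-dD2 / sigma2_sq)         # Eq. 5
    deg = W.sum(axis=1)                                             # degrees D_p
    best_t, best_ncut = None, np.inf
    for t in np.unique(vals)[:-1]:        # keep both segments nonempty
        a = vals <= t
        cut = W[np.ix_(a, ~a)].sum()
        ncut = cut * (1.0 / deg[a].sum() + 1.0 / deg[~a].sum())
        if ncut < best_ncut:
            best_t, best_ncut = t, ncut
    return best_t
```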
2.3 Vessel Tracking and Multi-resolution
Based on the two steps described above, the algorithm can segment the vessels in an image. In order to obtain more accurate segmentation results, we have to set appropriate parameters to select candidate windows. Although loose parameters can help to select windows with thinner vessels, they also introduce more noisy windows, which greatly increase the computational load and segmentation error. To be insensitive to initial parameters, our method utilizes a tracking strategy to solve this problem. Since the first eigenvector of the gradient matrix indicates the normal direction of a vessel in the corresponding window, the method can track along the tangent direction of the vessel. Meanwhile, the parameters are relaxed to make it easy to select the next tracked window. As mentioned in Section 2.2, we fix the size of the sliding window to make full use of the advantages of the NCut; the proposed method works best when the width of a vessel is around 2 to 3 pixels. Considering the variation of widths, we construct the Gaussian pyramid for the input image, so that vessels with different widths are segmented at different resolution levels. Finally, the algorithm recovers the results from images of different resolutions to the original resolution; the pixels segmented out at any resolution level are combined as the final result.
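A sketch of the multi-resolution wrapper; the pyramid details (blur width, resampling order) are illustrative assumptions, and `segment_level` stands in for steps 2-5 of the overview:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def pyramid_level(image, n):
    """n = 0: upsampled x2; n = 1: original; n = 2: blurred and downsampled x2."""
    img = image.astype(float)
    if n == 0:
        return zoom(img, 2.0, order=1)
    if n == 1:
        return img
    return zoom(gaussian_filter(img, 1.0), 0.5, order=1)

def segment_multiresolution(image, segment_level, levels=(0, 1, 2)):
    """Combine binary masks from all levels at the original resolution."""
    combined = np.zeros(image.shape, dtype=bool)
    for n in levels:
        mask = segment_level(pyramid_level(image, n), n)
        factors = (image.shape[0] / mask.shape[0], image.shape[1] / mask.shape[1])
        back = zoom(mask.astype(float), factors, order=0) > 0.5
        h = min(back.shape[0], image.shape[0])
        w = min(back.shape[1], image.shape[1])
        combined[:h, :w] |= back[:h, :w]   # vessel if segmented at any level
    return combined
```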
3 Experiments and Results
3.1 Data and Evaluation
Our method was tested on a public database of retinal images, called DRIVE [13]. In total, there are 40 color images with 565 × 584 pixels. All the images are divided into two groups: 20 images, segmented by only one observer, for training, and the other 20 images, segmented by two observers, for testing. More details of
the database can be found in [13]. In our experiments, we took the segmentation of the first observer as the ground truth, with the second for reference. We only show the results on the testing set, to compare with other methods. As mentioned in [6], since the green channel provides the highest contrast of the image, we only used the green channel of the color images for testing. To evaluate the performance, we employed receiver operating characteristic (ROC) curves, which plot the true positive fraction Tp versus the false positive fraction Fp. The true positive and false positive fractions are defined as follows:

$T_p = \frac{N_{tp}}{N_{vessel}}, \qquad F_p = \frac{N_{fp}}{N_{non\text{-}vessel}}, \qquad (4)$

where Ntp and Nfp are the numbers of true positives and false positives, respectively, and Nvessel and Nnon-vessel are the total numbers of vessel and non-vessel pixels in the ground truth. The DRIVE database provides mask images for all the images, so only pixels inside the FOV were considered in the testing.
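Eq. 4 reduces to a few lines, assuming boolean masks for the segmentation, the ground truth, and the FOV:

```python
import numpy as np

def roc_point(seg, truth, fov):
    """True/false positive fractions of Eq. 4, restricted to the FOV mask."""
    s, r = seg[fov], truth[fov]          # boolean arrays
    tp = np.sum(s & r) / np.sum(r)       # Ntp / Nvessel
    fp = np.sum(s & ~r) / np.sum(~r)     # Nfp / Nnon-vessel
    return tp, fp
```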
3.2 Parameters Setting
In our experiments, the parameters of the algorithm were set as follows. For the normalized cut, we compute the similarity wij between two pixels i and j as:

$w_{ij} = e^{-I(i,j)^2 / \sigma_1^2} \, e^{-D(i,j)^2 / \sigma_2^2}, \qquad (5)$
where I(i, j) is the difference in intensity, D(i, j) is the distance between i and j, σ1² = 0.3 δI² (δI is the maximal variation of intensity in the current window), and σ2² = 49. A window is selected as a candidate if the following conditions are satisfied: λ1 > T1, λ1/λ2 > T2, and |Σ∇I|/max(|∇I|) < T3, where T1, T2 and T3 are parameters for selecting candidate windows. Based on our experiments, we made a tradeoff between the three parameters by using only one variable k: T1 = 12000 − 1000k, T2 = 2.5 − 0.1k, T3 = 2.0 + 0.7k. We changed k from 0 to 10 to generate the ROC curves. For different resolution levels, T1 and T2 are multiplied by a factor n + 2, where n is the resolution level. In the experiments, we use three resolution levels: a higher resolution (n = 0), the original resolution (n = 1), and a lower resolution (n = 2). At the lower resolution, the vessels to be segmented correspond to the thick vessels in the original image; in general, the image contrast around these thick vessels is high and they are easy to segment. Therefore, the parameters were directly fixed at the lower resolution with T1 = 60000, T2 = 2.5, T3 = 4 in all situations. In the tracking process, T1, T2 and T3 are multiplied by factors of 0.7, 0.7 and 1.5, respectively.
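The full parameter schedule just described condenses into one small function; the name and signature are our own:

```python
def window_thresholds(k, level=1, tracking=False):
    """Candidate-window thresholds as a function of the sweep variable k
    (0..10), the resolution level n, and whether a vessel is being tracked."""
    if level == 2:                       # fixed at the lower resolution
        T1, T2, T3 = 60000.0, 2.5, 4.0
    else:
        T1 = (12000.0 - 1000.0 * k) * (level + 2)
        T2 = (2.5 - 0.1 * k) * (level + 2)
        T3 = 2.0 + 0.7 * k
    if tracking:                         # relaxed along a tracked vessel
        T1, T2, T3 = 0.7 * T1, 0.7 * T2, 1.5 * T3
    return T1, T2, T3
```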
3.3 Results
Because the performance of some existing methods on this database has been reported in [14] and [15], we directly compared our results with the reported results. The ROC curves of all the methods are plotted in Figure 2.
Fig. 2. The ROC curves and Dc of four methods on the DRIVE
In [13] and [15], the area Az under the ROC curve was computed to evaluate the performance of the algorithms. However, we only plot the part of the ROC with Fp < 0.5, and compute the distance Dc between the ROC and the ideal point (0, 1) instead of Az; the closer to the ideal point, the better the algorithm. There are two reasons. First, unlike the supervised methods, our method cannot control only one parameter to threshold all the pixels, which means our curves hardly reach the lower-left and upper-right corners. Second, there is little need to compare Az, because the result is meaningless when Fp is too large (there would be more false positives than true positives). We pay more attention to the area around the point of segmentation by the second observer, as pointed out by the arrows in Figure 2. Although we do not have the exact testing results of the other methods, the values of Dc can still distinguish the performances of all the methods (see the legend of Figure 2). First of all, we have to point out that the methods of Staal et al. and Soares et al. are supervised; in particular, they did leave-one-out experiments on the STARE database. It is natural that their results are better than those of unsupervised methods. In Figure 2, although the curves of the proposed method are below the curves of the two supervised methods, they are closer to the ideal point (0, 1) and higher than the curves of Jiang et al. Besides that, the ROC curves show another feature of the method. Every marker on the ROC of the proposed method stands for an average evaluation of a set of parameters (the markers on the other curves are points we sampled from the reported results). We notice that, although we linearly adjust the parameters by the same step, the markers are very compact on the left part of the curve, which is the region close to the second observer and the ideal point (0, 1). That is because the algorithm uses the tracking strategy, which makes it less sensitive to initial parameters. The experiments also show the method is efficient. Although our implementation is experimental, the proposed method can segment an image of the DRIVE database in about 30 s (Pentium-IV 1.5 GHz), which is similar to the method of
Jiang et al., but much faster than the supervised method in [13] (about 15 min, Pentium-III 1.0 GHz). There is still considerable room to speed up our method, because most of the major parts of the vessels are segmented at all three resolution levels; the segmentation processes at different resolution levels could be combined to avoid this redundant computation.
4
Conclusion
In this paper, we have proposed an unsupervised method based on the normalized cut criterion and the gradient matrix for segmenting vessels in retinal images. The performance of the proposed method and of several existing methods has been evaluated using ROC curves. The evaluation demonstrates that our method improves segmentation accuracy compared with the other widely used unsupervised method, and the proposed method is computationally efficient.
References

1. Hoover, A., Kouznetsova, V., Goldbaum, M.H.: Locating blood vessels in retinal images by piece-wise threshold probing of a matched filter response. TMI 19 (2000) 203–210
2. Chaudhuri, S., Chatterjee, S., et al.: Detection of blood vessels in retinal images using two-dimensional matched filters. TMI (1989) 263–269
3. Krissian, K., Malandain, G., et al.: Model-based multiscale detection of 3D vessels. In: CVPR (1998) 722–727
4. Lowell, J., Hunter, A., et al.: Measurement of retinal vessel widths from fundus images based on 2-D modeling. TMI 23 (2004) 1196–1204
5. Tolias, Y.A., Panas, S.M.: A fuzzy vessel tracking algorithm for retinal images based on fuzzy clustering. TMI 17 (1998) 263–273
6. Lalonde, M., Gagnon, L., Boucher, M.C.: Non-recursive paired tracking for vessel extraction from retinal images. In: CVI 2000 (2000) 61–68
7. Cree, M., Cornforth, D., Jelinek, H.F.: Vessel segmentation and tracking using a two-dimensional model. In: IVC New Zealand (2005) 345–350
8. Jiang, X., Mojon, D.: Adaptive local thresholding by verification-based multithreshold probing with application to vessel detection in retinal images. PAMI 25 (2003) 131–137
9. Koller, T., Gerig, G., et al.: Multiscale detection of curvilinear structures in 2-D and 3-D image data. In: ICCV (1995) 864–869
10. Lorenz, C., Carlsen, I.C., et al.: Multi-scale line segmentation with automatic estimation of width, contrast and tangential direction in 2D and 3D medical images. In: CVRMed (1997) 233–242
11. Sato, Y., Nakajima, S., et al.: 3D multi-scale line filter for segmentation and visualization of curvilinear structures in medical images. In: CVRMed (1997) 213–222
12. Chen, J., Amini, A.A.: Quantifying 3-D vascular structures in MRA images using hybrid PDE and geometric deformable models. TMI 23 (2004) 1251–1262
13. Staal, J., Abràmoff, M.D., et al.: Ridge-based vessel segmentation in color images of the retina. TMI 23 (2004) 501–509
14. Niemeijer, M., Staal, J.J., et al.: Comparative study of retinal vessel segmentation methods on a new publicly available database. In: SPIE Medical Imaging. Volume 5370. (2004) 648–656
15. Soares, J.V.B., Leandro, J.J.G., et al.: Using the 2-D Morlet wavelet with supervised classification for retinal vessel segmentation. In: 18th Brazilian Symposium on Computer Graphics and Image Processing, Natal, RN (2005)
16. Lucas, B.D., Kanade, T.: An iterative image registration technique with an application to stereo vision. In: IJCAI (1981) 674–679
17. Shi, J., Malik, J.: Normalized cuts and image segmentation. PAMI 22 (2000) 888–905
Simulation of Local and Global Atrophy in Alzheimer's Disease Studies

Oscar Camara-Rey¹, Martin Schweiger¹, Rachael I. Scahill², William R. Crum¹, Julia A. Schnabel¹, Derek L.G. Hill¹, and Nick C. Fox²
¹ Centre for Medical Image Computing, University College London, UK
[email protected]
² Dementia Research Centre, Institute of Neurology, University College London, UK
Abstract. We propose a method for atrophy simulation in structural MR images based on finite-element methods, providing data for objective evaluation of atrophy measurement techniques. The modelling of diffuse global and regional atrophy is based on volumetric measurements from patients with known disease and guided by clinical knowledge of the relative pathological involvement of regions. The consequent biomechanical readjustment of structures is modelled using conventional physics-based techniques based on tissue properties and simulating plausible deformations with finite-element methods. Tissue characterization is performed by means of the meshing of a labelled brain atlas, creating a reference volumetric mesh, and a partial volume tissue model is used to reduce the impact of the mesh discretization. An example of simulated data is shown and a visual evaluation protocol used by experts has been developed to assess the degree of realism of the simulated images. First results demonstrate the potential of the proposed methodology.
1
Introduction
The pattern of cerebral atrophy on magnetic resonance imaging (MRI) is currently used to help differentiate the degenerative dementias, while the rate of atrophy may be used to detect early disease and potentially to assess disease-modifying effects of therapies. Algorithms referred to as computational anatomy [1] are computerized approaches that offer automated or semi-automated solutions for MRI analysis, including quantification of cerebral atrophy. There is a lack of a gold standard against which to judge these methods or to help refine them. There have been limited attempts [2,3,4] to model the complex effects on
O. Camara acknowledges support of the EPSRC GR/S48844/01, Modelling, Understanding and Predicting Structural Brain Change. W.R. Crum and M. Schweiger acknowledge support of the Medical Images and Signals IRC (EPSRC GR/N14248/01 and UK Medical Research Council Grant No. D2025/31). J.A. Schnabel acknowledges support of the EPSRC GR/S82503/01, Integrated Brain Image Modelling project. R.I. Scahill and N.C. Fox acknowledge support from the UK Medical Research Council, G90/86 and G116/143 respectively.
Fig. 1. Mesh generation procedure. Cutting plane of the volumetric finite-element mesh (left) and slices of the rasterization image generated from the labelled mesh, in which some of the used labels are shown (X11 colors): GM, light green; WM, cyan; cortical CSF, cornflowerblue; ventricles, greenyellow; MTL grey matter, red; subtentorial structures, yellow; hippocampi, limegreen; ERA, dark orange.
regional structural volumes and structural configuration that are a consequence of neurodegeneration. The main drawback of these approaches is that they do not take into account the interrelation of the different tissue types. We present a method for the simulation of atrophy in MR images of the brain in which atrophy is simulated in different tissue compartments and/or in different neuroanatomical structures with a phenomenological model. This model applies a differential pathological burden in different brain regions. Once volume changes have been defined in particular structures and tissues, the overall impact on brain shape is determined by a conventional physics-based biomechanical model. The consequent readjustment of structures is modelled using techniques based on tissue properties and simulating plausible deformations with finite-element methods (FEM). This validation framework was proposed by Schnabel et al. [5] for the analysis of MR mammography data. In our application, a thermoelastic model of tissue deformation is employed, controlling the rate of progression of atrophy by means of a set of thermal coefficients, each one corresponding to a different type of tissue. The proposed technique can be divided into three main steps: meshing of a reference labelled brain atlas; its introduction into a FEM solver that generates the simulated deformations; and the application of these deformations to the grey-level version of the brain atlas. A visual evaluation protocol used by experts has been developed to assess the degree of realism of the atrophy-simulated images.
2
Mesh Generation
In this work, we have combined information from two well-known brain atlases to generate the reference labelled image and mesh used by the FEM solver: the Montreal Neurological Institute (MNI) BrainWeb atlas (http://www.bic.mni.mcgill.ca/brainweb/) and the International Consortium for Brain Mapping (ICBM) atlases (http://www.loni.ucla.edu/ICBM/).
For our purposes, we have taken into account the tissue labels from the MNI atlas, including the cerebrospinal fluid (CSF), the grey matter (GM) and the white matter (WM). We have also incorporated labels for the hippocampi and the lateral ventricles, obtained from a semi-automatic segmentation by clinical experts [6], due to their importance as biomarkers in dementia. In addition, we have incorporated the cerebral lobe labels from the ICBM Probabilistic atlas, using an affine registration technique to place this atlas into the same geometrical space as the MNI one. Finally, two additional labels from the ICBM Template atlas have been included: a label covering the subtentorial structures, to specify boundary conditions of the FEM; and a label for the entorhinal area (ERA), because of its involvement in Alzheimer's disease. Some of these labels are illustrated in Figure 1. The first phase of the meshing consists of the generation of a surface mesh from the intracranial surface using the classical marching cubes algorithm [7], followed by some post-processing steps (decimation, smoothing and selection of the biggest connected component) to guarantee the creation of a continuous surface brain mesh. A volumetric (3D) finite-element (4-noded tetrahedra) mesh was then obtained from the intracranial surface mesh using the NETGEN open-source software (http://www.hpfem.jku.at/netgen/). We generated a tetrahedral mesh with very fine resolution (163765 nodes and 868404 elements, with 61132 surface elements). The final step is the labelling of the 3D mesh using the labelled voxel atlas, in which a set of particular biomechanical properties is assigned to each tetrahedron through a rasterization procedure [8]. In order to reduce the impact of the discretization of the mesh, we apply a partial volume tissue model to each tetrahedron in the FEM solver, as detailed in Section 3.3.
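To illustrate the first phase of the meshing, surface extraction from a binary intracranial mask can be sketched with off-the-shelf tools (a schematic example; scikit-image and the synthetic mask are our assumptions, whereas the authors' volumetric step used NETGEN):

```python
import numpy as np
from skimage import measure

# Synthetic stand-in for a binary intracranial mask (here, a ball).
x, y, z = np.mgrid[-32:32, -32:32, -32:32]
mask = (x**2 + y**2 + z**2 < 24**2).astype(float)

# Marching cubes extracts a triangulated surface at the 0.5 iso-level.
verts, faces, normals, values = measure.marching_cubes(mask, level=0.5)

# Decimation, smoothing and largest-connected-component selection would
# follow (e.g. with VTK), before handing the closed surface to a
# volumetric mesher such as NETGEN for the tetrahedral step.
print(f"surface: {len(verts)} vertices, {len(faces)} triangles")
```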
3
Finite-Element Deformation Model
3.1
Model
The deformation model employs a linear elastic finite-element method based on the freely available TOAST package (http://www.medphys.ucl.ac.uk/~martins/toast/index.html). In this work we considered structural displacements induced by atrophy that are small in relation to the total brain volume, so that the use of a linear model is justified. The implementation of the FEM model follows a standard approach, whereby each tetrahedral mesh element is assigned a set of elastic material properties represented by an elasticity matrix D. In the case of isotropic elastic deformations, D is symmetric and can be expressed in terms of two parameters, usually given as Young's modulus E and Poisson's ratio ν. The elastic coefficients are assumed time-invariant. The deformation of the mesh is induced by assigning an isotropic thermal expansion coefficient α^(i) to each element i, and simulating a global temperature change ΔT. The resulting isotropic thermal expansion enters the description of elastic deformation in the form of an initial element strain,
ε0^(i) = {α^(i)ΔT, α^(i)ΔT, α^(i)ΔT, 0, 0, 0},   (1)
where the relation between stresses σ and strains ε is given by

σ^(i) = D^(i) (ε^(i) − ε0^(i)) + σ0^(i).   (2)
Assembling all element contributions of the mesh leads to the linear system

K u + f′ + f = r,   K_ij = ∫_V B_i^T D B_j dV,   (3)

with stiffness matrix K. In an n-noded element, B is a 6 × 3n strain–displacement matrix B = {B_i}, and

f′ = ∫_V B^T D ε0 dV   (4)

contains the volume forces arising from the initial thermal strain, f combines all other surface and volume force terms, r defines explicit displacements, and u is the vector of nodal displacements.

3.2
Boundary Conditions and Mechanical Properties of Brain Tissue
The surface of the mesh was assumed to coincide with the inner surface of the skull, and homogeneous Dirichlet boundary conditions were introduced using a Payne–Irons method to suppress the displacements of boundary nodes. The same strategy was applied to the mesh nodes corresponding to the subtentorial structures, since atrophy induced by dementia is normally restricted to the supratentorial area. There is no consensus on the optimal Young's modulus E and Poisson's ratio ν of the different brain tissues. Nevertheless, the choice of the elastic properties is not critical in the model presented here, because the boundary conditions enforce a constant global volume, while the relative changes in tissue region volumes are driven by the ratio of thermal coefficients. Furthermore, the mechanical properties of brain tissue are directly relevant to how the brain responds to an external force, but are much less relevant to the response of the brain to the diffuse cell damage and death that results in active structural readjustment. Therefore, we have adopted the shear modulus values proposed by McCracken et al. [9]: 12 kPa for WM and 8 kPa for GM (correspondingly, Young's moduli of 3.86 kPa for WM and 2.57 kPa for GM). We consider the remaining structures, except the CSF, as being composed of an equal percentage of GM and WM, so their material properties are the average of those of GM and WM (E = 3.22 kPa). For the Poisson's ratio, we have chosen a value of 0.45 for each brain tissue, except for the CSF, which is assumed to be a very soft and compressible tissue, with a low Young's modulus (E = 0.1 kPa) and Poisson's ratio (ν = 0.05).
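For reference, the 6 × 6 isotropic elasticity matrix D follows directly from E and ν; the sketch below is the standard isotropic form in Voigt notation, not code from the authors:

```python
import numpy as np

def elasticity_matrix(E, nu):
    """6x6 isotropic elasticity matrix D relating stress and strain
    (Voigt notation) for Young's modulus E and Poisson's ratio nu."""
    lam = E * nu / ((1 + nu) * (1 - 2 * nu))  # Lame's first parameter
    mu = E / (2 * (1 + nu))                   # shear modulus
    D = np.zeros((6, 6))
    D[:3, :3] = lam
    D[np.diag_indices(3)] += 2 * mu           # normal-strain diagonal
    D[3:, 3:] = np.eye(3) * mu                # shear block
    return D

# Example: grey matter with the values adopted above (E in kPa).
D_gm = elasticity_matrix(E=2.57, nu=0.45)
```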
3.3
Thermal Coefficient Computation
Given a segmentation of domain Ω into T tissue types or anatomical structures Ωj , Ω = ∪j Ωj , j = 1..T , we assume the thermal coefficient α to be constant
Fig. 2. Relation between thermal coefficients and desired volume change. Left: percentage of GM element change when varying the thermal coefficients of the GM and WM (TCgm and TCwm, respectively). Right: relation between the desired and the obtained volume change of GM and WM.
within each region Ωj, so that α is expressed in a piecewise constant basis, α(r) = αj, r ∈ Ωj. Once the material properties have been defined, we need to estimate the thermal coefficients αj that will provide a mesh deformation consistent with the desired volume change ΔVj in each tissue segment Ωj. This volume change will provide the ground-truth information needed for validation. ΔVj is computed by integrating the differences in element volumes between the original and displaced meshes over each region Ωj. The method of recovering the thermal coefficients for a given set of target region volume changes is based on a linear inversion technique. The relationship between changes of αj and ΔVi in regions i and j is expressed by the Jacobian matrix

J_ij = ∂V_i / ∂α_j,   (5)

which we compute with a finite difference scheme, by explicitly perturbing each αj. We then form the linear system J [α] = [ΔV], which, taking into account the boundary conditions, is square of dimension (T − 1) × (T − 1) and can be solved by a standard LU decomposition. As illustrated in Figure 2, the assumption of linearity is appropriate. In this figure, we also show the good agreement between the desired and the obtained volume change for a set of 30 simulations (mean difference of 0.35±0.41 and 0.14±0.08 for GM and WM, respectively), confirming the appropriateness of the linear model. As pointed out in Section 2, our framework enables us to use the information about the percentages of tissue in each element in order to reduce the effect of the mesh discretization through partial volume weighting. Hence, for each mesh element i, the thermal coefficient α^(i) is a partial volume-weighted average of the region thermal coefficients contributing to the element:

α^(i) = Σ_j p_j^(i) α_j   (6)
where p_j^(i) is the fractional volume contribution of region j to element i.
4
Warping
In order to simulate atrophy in the atlas image, we need to use interpolation techniques to generate a displacement field at each voxel of the image. We do this by weighting the displacement vectors of the nodes in the element containing a given voxel by the element linear shape function [10]; the new image intensities can then be interpolated using a truncated sinc interpolation kernel. We have adopted a different interpolation strategy outside the mesh, where no deformation information is available from the FEM model. Here we employ a scattered data interpolation technique based on multilevel B-splines, as in Schnabel et al. [5]. Using such an interpolation scheme, we can warp the noise-free grey-level version of the MNI atlas with the dense displacement field generated by the FEM solver. Finally, we add Rician-distributed noise to the atrophy-simulated image.
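Applying the dense displacement field to the atlas image amounts to resampling; a schematic sketch using cubic spline resampling (the paper uses a truncated sinc kernel, and the names here are ours):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_image(image, displacement):
    """Warp a 3D image by a dense displacement field.

    displacement: array of shape (3, *image.shape), in voxel units,
    giving for each output voxel the offset to its source position.
    """
    grid = np.indices(image.shape).astype(float)
    coords = grid + displacement  # pull-back sampling positions
    # Cubic spline resampling stands in for the truncated sinc kernel.
    return map_coordinates(image, coords, order=3, mode="nearest")
```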
5
Simulation Example
The illustrative example shown in this section is focused on Alzheimer's disease (AD), as it is the most common cause of dementia. The longitudinal example is therefore based on the generation of a set of simulated images approximately mimicking the distribution of regional change in AD according to the severity of the disease. Three different atrophy-simulated images are generated: in the first one (t1), only changes in the hippocampi and the entorhinal area are applied; the second image (t2) also includes GM changes in the medial temporal lobe and an expansion of the lateral ventricles; finally, a global reduction of the whole brain is applied to generate the third simulated image (t3). In addition, we have generated a corresponding set of simulated images at the three timepoints representing volume change induced by normal aging. We chose a range of values (see Table 1) that included cerebral volume losses, as a percentage of baseline, consistent with published rates of atrophy (e.g. for the hippocampus and whole brain) and study durations of 6–24 months [11,12]. Figure 3 shows the difference images between the normal aging and the AD simulated images at each timepoint.
6
Measuring Degree of Realism
The aim of this modelling is to generate simulated atrophy for the validation of longitudinal or cross-sectional techniques for the quantification of local and global atrophy. In order to determine whether our simulated atrophy was plausible, we designed an experiment in which a visual protocol is used by experts to differentiate between real and simulated MRI data of controls and patients with atrophy. The experiment uses two matched cohorts of 20 healthy elderly controls and 20 individuals with a diagnosis of probable Alzheimer’s disease, together with nine atrophy simulated images. T1-weighted volumetric MR images were acquired on a 1.5 Tesla Signa unit (General Electric, Milwaukee).
Table 1. Volume changes (% with respect to the baseline) simulated in the longitudinal example. MTL: medial temporal lobe. The most significant volume changes in the AD patients at each timepoint are marked in bold.

Structures           Normal aging           Alzheimer's disease
                     t1     t2     t3       t1     t2     t3
Cortical CSF         1.90   3.76   5.66     1.98   5.27   16.82
Ventricles           2.18   4.31   6.49     2.23   5.82   19.34
Grey Matter         -0.65  -1.28  -1.92    -0.65  -1.30   -5.31
White Matter        -0.17  -0.32  -0.49    -0.16  -0.32   -1.33
Hippocampi          -1.40  -2.79  -4.21    -3.33  -6.39   -9.31
Subtentorial         0.02   0.03   0.05     0.02   0.04    0.13
Entorhinal Area     -2.91  -5.80  -8.67    -5.11  -9.60  -13.68
MTL Grey Matter     -0.63  -1.28  -1.92    -0.65  -4.68   -8.59
Whole Brain         -0.36  -0.71  -1.08    -0.38  -1.01   -3.20
Fig. 3. Results of the longitudinal simulation example. Coronal slices of the difference images between the normal aging and the AD simulated images at each timepoint (from left to right, t1, t2, t3).
A mesh warping technique [13] is employed to put the volumetric mesh presented in Section 2 into the anatomical space of nine of the controls, thus generating nine patient-specific meshes. Thereafter, they are introduced into the FEM solver, together with information about the desired volume change, which in this work we obtained from volumetric difference estimates between the two cohorts, previously computed using manual tracing techniques. The output of the FEM solver is a set of deformations that are applied to the control images to generate a group of atrophy-simulated images equivalent to the cohort of mild AD subjects in terms of regional atrophy. The three sets of real and simulated data are then presented to the neurologists, who were blinded to scan information and were asked to classify each scan on a five-point scale, with one being definite control and five being definite AD. The mean sensitivity and specificity for visual assessment of true AD and control scans were 90% and 77%, respectively. The inter-rater agreement was moderate, with a kappa score of 0.52. The mean score for each scan category is (mean±STD): 1.95±1.42 for controls; 4.23±3.19 for real AD; 3.19±1.36 for simulated AD. The mean rating for the simulated atrophy scans was between that given for the control and the AD groups. The simulated scans were not consistently classified as AD, but our results show that even true AD cases may
be misidentified in one in ten cases. This may be due to the large inter-subject variation which exists when these subject groups are compared cross-sectionally. These results demonstrate that our simulation is sufficiently realistic to cause experienced neurologists to mis-classify some of our simulated images, and this supports the use of our simulated data in the validation of algorithms that quantify atrophy.
7
Conclusions
We have proposed a method to generate valuable gold standard data for the testing and objective quantitative validation of atrophy measurement techniques. Our technique uses expert clinical knowledge of the rate and spatial distribution of atrophy in dementia to create a phenomenological model of the appearance of atrophy in MRI based on finite-element methods. The model includes structural remodelling concomitant with atrophy. Future work will focus on the generation and dissemination of several cohorts of data simulating the pattern of changes induced in the human brain by different types of dementia such as fronto-temporal dementia, vascular disease or multiple sclerosis. The simulated scans will be made available to the wider research community.
References

1. Ashburner, J., Csernansky, J., Davatzikos, C., Fox, N., et al.: Computer-assisted imaging to assess brain structure in healthy and diseased brains. Lancet Neurology 2 (2003) 79–88
2. Lao, Z., Shen, D., Xue, Z., Karacali, B., et al.: Morphological classification of brains via high-dimensional shape transformations and machine learning methods. Neuroimage 21 (2004) 46–57
3. Chen, K., Reiman, E., Alexander, G., Bandy, D., et al.: An automated algorithm for the computation of brain volume change from sequential MRIs using an iterative principal component analysis and its evaluation for the assessment of whole-brain atrophy rates in patients with probable Alzheimer's disease. Neuroimage 22 (2004) 134–143
4. Xue, Z., Shen, D., Davatzikos, C.: CLASSIC: Consistent Longitudinal Alignment and Segmentation for Serial Image Computing. Neuroimage 21 (2005) 46–57
5. Schnabel, J., Tanner, C., Castellano-Smith, A., Leach, M., et al.: Validation of non-rigid registration using finite element methods. In: Information Processing in Medical Imaging (IPMI'01). Volume 2082. (2001) 183–189
6. Freeborough, P., Fox, N., Kitney, R.: Interactive algorithms for the segmentation and quantitation of 3-D MRI brain scans. Computer Methods and Programs in Biomedicine 53 (1997) 15–25
7. Lorensen, W., Cline, H.: Marching cubes: a high resolution 3D surface construction algorithm. In: International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH'87). Volume 21. (1987) 163–169
8. Sermesant, M., Forest, C., Pennec, X., Delingette, H., et al.: Deformable biomechanical models: Application to 4D cardiac image analysis. Medical Image Analysis 7 (2003) 475–488
9. McCracken, P., Manduca, A., Felmlee, J., Ehman, R.: Mechanical transient-based magnetic resonance elastography. Magnetic Resonance in Medicine 53 (2005) 628–639
10. Davies, A.J.: The Finite Element Method: A First Approach. Oxford University Press (1980)
11. Jack Jr., C., Shiung, M., Gunter, J., O'Brien, P., et al.: Comparison of different MRI brain atrophy rate measures with clinical disease progression in AD. Neurology 62 (2004) 591–600
12. Schott, J., Price, S., Frost, C., Whitwell, J., et al.: Measuring atrophy in Alzheimer disease: A serial MRI study over 6 and 12 months. Neurology 52 (1999) 1687–1689
13. Camara, O., Crum, W., Schnabel, J., Lewis, E., Schweiger, M., Hill, D., Fox, N.: Assessing the quality of mesh warping in normal and abnormal neuroanatomy. In: Medical Image Understanding and Analysis (MIUA'05). (2005) In press.
Brain Surface Conformal Parameterization with Algebraic Functions

Yalin Wang¹,², Xianfeng Gu³, Tony F. Chan¹, Paul M. Thompson², and Shing-Tung Yau⁴

¹ Mathematics Department, UCLA, Los Angeles, CA 90095, USA
² Lab. of Neuro Imaging, UCLA School of Medicine, Los Angeles, CA 90095, USA
³ Comp. Sci. Department, SUNY at Stony Brook, Stony Brook, NY 11794, USA
⁴ Department of Mathematics, Harvard University, Cambridge, MA 02138, USA
[email protected]
Abstract. In medical imaging, parameterized 3D surface models are of great interest for anatomical modeling and visualization, statistical comparisons of anatomy, and surface-based registration and signal processing. Here we introduce a parameterization method based on algebraic functions. By solving the Yamabe equation with the Ricci flow method, we can conformally map a brain surface to a multi-hole disk. The resulting parameterizations do not have any singularities and are intrinsic and stable. To illustrate the technique, we computed parameterizations of several types of anatomical surfaces in MRI scans of the brain, including the hippocampi and the cerebral cortices with various landmark curves labeled. For the cerebral cortical surfaces, we show the parameterization results are consistent with selected landmark curves and can be matched to each other using constrained harmonic maps. Unlike previous planar conformal parameterization methods, our algorithm does not introduce any singularity points. It also offers a method to explicitly match landmark curves between anatomical surfaces such as the cortex, and to compute conformal invariants for statistical comparisons of anatomy.
1
Introduction
Surface-based modeling is valuable in brain imaging to help analyze anatomical shape, to statistically combine or compare 3D anatomical models across subjects, and to map and compare functional imaging parameters localized on anatomical surfaces. Parameterization of these surface models involves computing a smooth (differentiable) one-to-one mapping of regular 2D coordinate grids onto the 3D surfaces, so that numerical quantities can be computed easily from the resulting models [1]. The mesh-based work contrasts with implicit methods, which typically define a surface as the level set of a higher-dimensional function [2]. Relative to level set methods, surface meshes can allow regular 2D grids to be imposed on complex structures, transforming a difficult 3D problem into a 2D planar problem, with simpler data structures, discretization schemes, and rapid data access and navigation. Here we present a new method to parameterize brain
surfaces based on algebraic functions. We find a planar conformal parameterization without any singularities by solving the Yamabe equation with the Ricci flow method. This method can compute conformal invariants of brain surfaces, which can be used to compare and classify brain surface structures. Compared with previous brain conformal parameterization work [3,4], the parameterization provided by our algorithm does not have any zero points, so there is less area distortion. By solving a harmonic map in the parameter domain, our algorithm provides smooth correspondence fields for the matching of different brain surfaces while explicitly matching labeled sets of landmark curves.

1.1
Previous Work
Brain surface parameterization has been studied intensively. Schwartz et al. [5] and Timsari and Leahy [6] computed quasi-isometric flat maps of the cerebral cortex. Drury et al. [7] presented a multiresolution method for flattening the cerebral cortex. Hurdal and Stephenson [8] reported a discrete mapping approach that uses circle packings to produce "flattened" images of cortical surfaces on the sphere, the Euclidean plane, and the hyperbolic plane; the obtained maps are quasi-conformal approximations of classical conformal maps. Haker et al. [9] implemented a finite element approximation for parameterizing brain surfaces via conformal mappings. They select a point on the cortex to map to the north pole of the Riemann sphere and conformally map the rest of the cortical surface to the complex plane by stereographic projection of the Riemann sphere onto the complex plane. Gu et al. [10] proposed a method to find a unique conformal mapping between any two genus zero manifolds by minimizing the harmonic energy of the map; they demonstrated this method by conformally mapping a cortical surface to a sphere. Ju et al. [11] presented a least squares conformal mapping method for cortical surface flattening. Joshi et al. [12] proposed a scheme to parameterize the surface of the cerebral cortex by minimizing an energy functional in the p-th norm. Wang et al. [3,4] have used holomorphic 1-forms to parameterize anatomical surfaces with complex (possibly branching) topology. Recently, Ju et al. [13] reported the results of a quantitative comparison of FreeSurfer [14], CirclePack, and least squares conformal mapping (LSCM) with respect to geometric distortion and computational speed. They found that FreeSurfer performs best with respect to a global measurement of metric distortion, whereas LSCM performs best with respect to angular distortion, and best in all but one case with a local measurement of metric distortion. Among the three approaches, FreeSurfer provides a more homogeneous distribution of metric distortion across the whole cortex than CirclePack and LSCM. LSCM is the most computationally efficient algorithm for generating spherical maps, while CirclePack is extremely fast for generating planar maps from patches.

1.2
Theoretical Background and Definitions
Given a multi-hole genus zero punctured surface, it is conformally equivalent to a special type of algebraic function,

ω² = (z − z0)(z − z1)(z − z2)...(z − z_{2g+1}),   (1)
where z0, z1, ..., z_{2g+1} ∈ R⁺ are 2g + 2 real numbers [15]. On the other hand, an algebraic function is conformally equivalent to a multi-hole punctured disk. Thus, for any genus zero surface with punctures, it can be shown that there exists a multi-hole punctured disk and a conformal map between the surface and the punctured disk. The disk is not unique, but any two such punctured disks can be mapped to each other by a Möbius transformation τ of the form τ = (az + b)/(cz + d), a, b, c, d ∈ R, z ∈ C, ad − bc = 1. The challenge of computing conformal maps between a punctured sphere and a multi-hole punctured disk lies in the fact that the range surface is underdetermined. Although it is clear that the boundaries of the domain surface will be mapped to circles on the plane, it is totally unknown what the centers and radii of these circles should be. These parameters need to be estimated during the computation process. In the following, we formulate the conformal mapping problem in rigorous terms. Suppose M is a genus zero surface with n + 1 holes, N is an n-hole punctured disk, φ : M → N is a conformal map between them, and g is the metric tensor on M, usually induced by the ambient Euclidean metric of R³. From the theory of complex geometry, the problem of finding a conformal map φ : M → N is equivalent to finding a function u : M → R such that g̃ = e^{2u} g is the metric on M induced by φ, which satisfies the Yamabe equation [16]:

  K̃ = 0
  Δu − K + e^{2u} K̃ = 0   (2)
  k_g̃ |∂M = const

where K̃ is the target Gaussian curvature and K is the original Gaussian curvature. Basically, the domain surface M is conformally deformed such that the Gaussian curvature of the interior points is zero everywhere and the boundaries become circles. We can solve Equation 2 with the Ricci flow [17],

  du(t)/dt = K̃ − K(t)   (3)

where K̃ is the target Gaussian curvature and K(t) is the Gaussian curvature at time t. We apply a conformal deformation that deforms the surface accordingly until the maximal Gaussian curvature error max_t |K̃ − K(t)| falls below a given threshold.
2
Algorithms
In Section 2.1, we explain the details of the algorithm that conformally maps an open boundary genus zero surface to a multi-hole punctured disk. In Section 2.2, we explain how to match two open boundary genus zero surfaces via their conformal parameterizations using multi-hole punctured disks.
2.1
Conformal Mapping to a Multi-hole Punctured Disk
The algorithm is equivalent to solving Equation 2, which describes a conformal deformation. We use the Ricci flow method [17] to solve this equation. We compute the Gaussian curvature in the same way as [18].

Algorithm 1. Conformal Mapping to a Multi-hole Punctured Disk
Input: mesh M, step length ε, energy difference threshold δK;
Output: h : M → D. Here D ∈ R², and D is a multi-hole disk.
1. Compute an initial radius γi for each vertex and an angle φij for each edge eij, such that l_ij = √(γi² + γj² − 2γiγj cos φij).
2. Compute the boundary loops, denoted Γ0, Γ1, ..., Γn, where Γ0 is the exterior boundary.
3. Set the target Gaussian curvature of each interior vertex to zero, K̃i = 0.
4. For any vertex vk ∈ Γ0, set its target Gaussian curvature to K̃k = 2π/|Γ0|, where |Γ0| denotes the number of vertices in Γ0.
5. For any vertex vk ∈ Γi, i ≠ 0, set its target Gaussian curvature to K̃k = −2π/|Γi|, where |Γi| denotes the number of vertices in Γi.
6. Update the vertex radii with the Ricci flow, γi(t + 1) = γi(t) + ε (K̃i(t) − Ki(t)) γi(t).
7. Update the target Gaussian curvature of the boundary vertices: for vk ∈ Γi with e_{k−1,k}, e_{k,k+1} ∈ Γi, let Si = Σ_{e_pq ∈ Γi} l_pq and set
   K̃k = (l_{k−1,k} + l_{k,k+1}) / (2Si) × Ci,
   where Ci = 2π if i = 0 and Ci = −2π if i ≠ 0.
8. Repeat steps 6 and 7 until the maximal Gaussian curvature error, maxi |Ki − K̃i|, is less than δK.
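A minimal sketch of the core update loop (steps 6 and 8) is given below; gaussian_curvature is a hypothetical helper standing in for the curvature computation of [18], and the boundary update of step 7 is omitted, so this is an outline rather than the authors' implementation:

```python
import numpy as np

def ricci_flow(radii, target_K, gaussian_curvature,
               eps=1e-2, dK=1e-5, max_iter=100000):
    """Evolve per-vertex radii until the curvature error is below dK.

    radii: initial circle-packing radii, shape (num_vertices,).
    target_K: target Gaussian curvature per vertex (zero in the interior).
    gaussian_curvature(radii) -> current curvature per vertex.
    """
    gamma = radii.copy()
    for _ in range(max_iter):
        K = gaussian_curvature(gamma)
        if np.max(np.abs(target_K - K)) < dK:
            break
        # Step 6: gradient-descent-like update of the radii.
        gamma += eps * (target_K - K) * gamma
        # Step 7 (boundary target-curvature redistribution) omitted.
    return gamma
```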
2.2
Surface Matching with Punctured Disk Parameterization
After computing conformal parameterizations of open boundary genus zero surfaces with a multi-hole punctured disk, we can compute the direct correspondence of two surfaces by solving a constrained harmonic mapping problem [4]. Given two surfaces S1 and S2 with punctured disk parameterizations τ1 : S1 → R² and τ2 : S2 → R², we want to compute a map φ : S1 → S2. Instead of computing φ directly, we can easily find a harmonic map between the parameter domains. We look for a harmonic map τ : R² → R² such that τ ∘ τ1(S1) = τ2(S2), τ ∘ τ1(∂S1) = τ2(∂S2), Δτ = 0. Then the map φ can be obtained as φ = τ2⁻¹ ∘ τ ∘ τ1. Since τ is a harmonic map while τ1 and τ2 are conformal maps, the resulting φ is a harmonic map.
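A discrete harmonic map with prescribed boundary values can be approximated by iteratively averaging neighbouring parameter coordinates; the following uniform-Laplacian sketch is ours and ignores any mesh weighting scheme the authors may use:

```python
import numpy as np

def harmonic_map(uv, neighbors, is_boundary, iters=5000):
    """Relax interior uv-coordinates toward a discrete harmonic map.

    uv: (num_vertices, 2) array of initial parameter coordinates;
        boundary vertices hold their prescribed target positions.
    neighbors: list of neighbor index lists per vertex.
    is_boundary: boolean array marking constrained (fixed) vertices.
    """
    uv = uv.copy()
    interior = np.where(~is_boundary)[0]
    for _ in range(iters):
        for v in interior:
            # Discrete Laplace equation: each interior vertex sits at
            # the mean of its neighbors.
            uv[v] = uv[neighbors[v]].mean(axis=0)
    return uv
```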
3
Experimental Results
We applied our algorithms to parameterize various anatomic surfaces extracted from 3D MRI scans of the brain. We tested our algorithm on a left hippocampal surface, a structure in the medial temporal lobe of the brain. The original surface is shown in Figure 1(a). We leave two holes on the front and back of the hippocampal surface, representing its anterior junction with the amygdala, and its posterior limit as it turns into the white matter of the fornix. It can be logically represented as an open boundary genus one surface, i.e., a cylinder (note that spherical harmonic representations would also be possible, if ends were closed [19]). Its conformal map to a 1-hole disk is illustrated in Figure 1(d). For the two boundaries of the hippocampal surface, one boundary is mapped to the exterior circle and the other is mapped to the internal circle.
Fig. 1. Illustrates conformal maps from hippocampal and cortical surfaces to multi-hole disks. (a) shows the front view of a hippocampal surface and (d) shows its conformal map to a 1-hole disk. (b) and (c) are two cerebral cortical surfaces. Two central sulci are labeled as yellow curves on each of them. After cutting along the landmark curves, each of these two surfaces can be conformally mapped to a 1-hole disk ((e) and (f)). The radii of the inner circles are conformal invariants of two surfaces and can be used as shape index to compare and classify brain surfaces. (g)-(j) show a conformal map from a left hemisphere cortex with 5 labeled landmarks to a 4-hole disk. (g) and (h) show the front and back side of the surface. (j) shows its conformal map on a 4-hole disk.
We also applied our algorithm to parameterize the surface of the cerebral cortex. The cerebral cortex and landmark data are the same as those used in [20]. We tested our algorithm with different landmark sets. Figure 1(b) and (c) show two cortical surfaces with two central sulci on the two hemispheres. After we cut the cortical surface open along the two central sulci, the cortical surface is topologically equivalent to an open boundary genus one surface. Figure 1(e) and (f) show their conformal parameterizations in a 1-hole disk. One of the two landmark curves is mapped to the exterior circular boundary and the other is mapped to the inner circular boundary. The circle center positions and radii are two conformal invariants of the cortical surfaces studied. For example, the inner circle centers of (e) and (f) are both at the origin, but the inner circle radius of (e) is 0.48 while that of (f) is 0.53. These parameters are surface conformal invariants and may be used as a shape index to classify and compare different cortical surfaces. Figure 1(g) and (h) show a left hemisphere cortex with 5 landmarks labeled (thick lines show the precentral sulcus, the superior temporal sulcus, the intraparietal sulcus, the anterior segment of the cingulate sulcus and the corpus callosum boundary at the midsagittal plane). After cutting along these landmark curves, the cortical surface becomes an open boundary genus-4 surface. Our algorithm conformally maps the surface to a 4-hole disk (Figure 1(j)). The boundary circle of the corpus callosum is mapped to the exterior circular disk boundary and the other four landmark curves are mapped to the inner circle boundaries of the disk. Figure 2 illustrates how our algorithm is used to match two left hemisphere cortical surfaces. As shown in Figure 2(a), (b), (d) and (e), we selected four major landmark curves on two different cortices to illustrate the approach (thick lines show the precentral and postcentral sulci, the superior temporal sulcus, and the corpus callosum boundary at the midsagittal plane). By cutting the surface along these landmark curves, we obtain two genus-3 open-boundary surfaces. Figure 2(c) and (f) show their conformal maps to a 3-hole disk. Because of the shape difference between the two cortices, the centers and the radii of the inner circles are different. By computing a constrained harmonic map from (f) to (c), we obtain a new parameterization (Figure 2(g)) of the cortex in the second row ((d) and (e)). The inner circle centers and radii of the new parameterization are identical to those of the parameterization in (c). With the new 3-hole disk as the canonical space, we can easily compute a direct surface correspondence between the two surfaces (a) and (d). Because the inner circles and the exterior circle are identical for the two parameterizations, the landmark curves lying in the surface are exactly matched to each other. Figure 2(h)-(m) illustrate the direct surface correspondence by morphing between these two cortical surfaces. Figure 2(h) and (m) show surfaces (a) and (d), respectively, viewed from a different viewpoint. (j), (k) and (l) are the intermediate shapes obtained when linearly interpolating the surface correspondence vector field between (h) and (m). We can see that although the surface shape changes substantially from (h) to (m), the relative locations of the three landmarks remain the same. This verifies that our algorithm provides a method to perform surface matching while explicitly matching sulcal curves or other landmarks lying in the surface.
Fig. 2. Illustrates direct surface matching between two different cerebral cortical surfaces while explicitly matching landmark curves. (a)-(b) show a left cerebral cortex with four labeled landmarks and (c) shows its conformal map to a 3-hole disk. (d)-(e) show another left hemisphere model of the cerebral cortex with the same landmarks labeled and (f) shows its conformal map to a 3-hole disk. (g) is the parameterization of surface (d)-(e) after a constrained harmonic map from (f) to (c) is built. (h)-(m) show a morphing sequence from surface (a)-(b) to surface (d)-(e). (j)-(l) are the intermediate shapes when we linearly interpolate surface correspondence vector field between two surfaces (h) and (m). Although the cortical surface shape changes considerably, the relative positions of the selected landmark curves do not change.
Note that although some previous approaches [3,4] had the same motivation as ours, their work introduced singularities at so-called zero points; hence their surface matching results have some errors and inevitable distortions in the areas around the zero points. Our approach provides an improved method for global surface matching with exact landmark matching capability.
4
Conclusion and Future Work
In this paper, we have presented a brain surface conformal parameterization method based on algebraic functions. With the Ricci flow method, we solve the Yamabe equation to obtain a conformal deformation that maps open boundary surfaces to multi-hole disks. We tested our algorithm on the hippocampus and on surface models of the cerebral cortex. Used as a canonical space, the multi-hole disk conformal parameterization provides a brain surface matching approach that can exactly match landmark curves lying on the surfaces. Compared with other work that conformally maps brain surfaces to parallelograms, our algorithm offers some advantages because it does not introduce any singular points. Our future work will include algebraic function computation based on the multi-hole disk parameterization and the empirical application of these Yamabe flow concepts to medical applications in computational anatomy.
References

1. Thompson, P.M., Giedd, J.N., Woods, R.P., MacDonald, D., Evans, A.C., Toga, A.W.: Nature 404(6774) (2000) 190–193
2. Osher, S., Sethian, J.A.: J. Comput. Phys. 79 (1988) 12–49
3. Wang, Y., Gu, X., Hayashi, K.M., Chan, T.F., Thompson, P.M., Yau, S.T.: In: MICCAI. Volume II, Palm Springs, CA, USA (2005) 26–29
4. Wang, Y., Gu, X., Hayashi, K.M., Chan, T.F., Thompson, P.M., Yau, S.T.: In: 10th IEEE ICCV (2005) 1061–1066
5. Schwartz, E., Shaw, A., Wolfson, E.: IEEE TPAMI 11(9) (1989) 1005–1008
6. Timsari, B., Leahy, R.M.: In: Proceedings of SPIE, Medical Imaging, San Diego, CA (2000)
7. Drury, H.A., Van Essen, D.C., Anderson, C.H., Lee, C.W., Coogan, T.A., Lewis, J.W.: J. Cognitive Neurosciences 8 (1996) 1–28
8. Hurdal, M.K., Stephenson, K.: NeuroImage 23 (2004) S119–S128
9. Angenent, S., Haker, S., Tannenbaum, A., Kikinis, R.: Med. Image Comput. Comput.-Assist. Intervention (1999) 271–278
10. Gu, X., Wang, Y., Chan, T.F., Thompson, P.M., Yau, S.T.: IEEE TMI 23(8) (2004) 949–958
11. Ju, L., Stern, J., Rehm, K., Schaper, K., Hurdal, M.K., Rottenberg, D.: In: IEEE ISBI, Arlington, VA, USA (2004) 77–80
12. Joshi, A.A., Leahy, R.M., Thompson, P.M., Shattuck, D.W.: In: IEEE ISBI, Arlington, VA, USA (2004) 428–431
13. Ju, L., Hurdal, M.K., Stern, J., Rehm, K., Schaper, K., Rottenberg, D.: NeuroImage 28(4) (2005) 869–880
14. Fischl, B., Sereno, M.I., Dale, A.M.: NeuroImage 9 (1999) 179–194
15. Springer, G.: Introduction to Riemann Surfaces. AMS Chelsea (2000)
16. Schoen, R., Yau, S.T.: Lectures on Differential Geometry. International Press (1994)
17. Hamilton, R.S.: The Ricci flow on surfaces. Contemp. Math. 71 (1988) 237–262
18. Wang, Y., Chiang, M.C., Thompson, P.M.: In: Med. Image Comp. Comput.-Assist. Intervention, Proceedings, Part II (2005) 666–674
19. Kelemen, A., Székely, G., Gerig, G.: IEEE TMI 18(10) (1999) 828–839
20. Thompson, P.M., Lee, A., Dutton, R., Geaga, J., Hayashi, K., Eckert, M., Bellugi, U., Galaburda, A., Korenberg, J., Mills, D., Toga, A., Reiss, A.: J. Neuroscience 25(16) (2005) 4146–4158
Logarithm Odds Maps for Shape Representation

Kilian M. Pohl¹,², John Fisher², Martha Shenton¹, Robert W. McCarley¹, W. Eric L. Grimson², Ron Kikinis¹, and William M. Wells¹,²

¹ Surgical Planning Laboratory, Harvard Medical School and Brigham and Women's Hospital, Boston, MA, USA
{shenton, mccarley, kikinis, sw}@bwh.harvard.edu
² Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
{pohl, fisher, welg}@csail.mit.edu
Abstract. The concept of the Logarithm of the Odds (LogOdds) is frequently used in areas such as artificial neural networks, economics, and biology. Here, we utilize LogOdds for a shape representation that demonstrates desirable properties for medical imaging. For example, the representation encodes the shape of an anatomical structure as well as the variations within that structure. These variations are embedded in a vector space that relates to a probabilistic model. We apply our representation to a voxel-based segmentation algorithm. We do so by embedding the manifold of Signed Distance Maps (SDM) into the linear space of LogOdds. The LogOdds variant is superior to the SDM model in an experiment segmenting 20 subjects into subcortical structures. We also use LogOdds in the non-convex interpolation between space conditioned distributions. We apply this model to a longitudinal schizophrenia study using quadratic splines. The resulting time-continuous simulation of the schizophrenic aging process has a higher accuracy than a model based on convex interpolation.
1 Introduction

Statistical shape representation in medical imaging applications is an important and challenging problem. The challenge arises from a number of issues, including the need for a representation which lends itself to efficient statistical inference while simultaneously capturing the intrinsic properties of shapes. Signed distance maps (SDMs) are a popular shape representation that has recently been utilized in combination with principal components analysis (PCA) for statistical shape modeling. Examples of such approaches include [1,2,3,4,5], which, with some variation, essentially treat SDMs as elements of the vector space R^N (N being the number of voxels), perform PCA over a set of example SDMs and then fit a statistical model to the resulting PCA coefficients. The resulting model is then useful for inference in a variety of tasks, including segmentation and longitudinal studies of shape variation. The primary advantage of the approach is that it projects high-dimensional SDMs into a lower dimensional representation which lends itself to concise statistical modeling and inference. The utility of such approximate modeling has been demonstrated successfully in varying degrees. One drawback of this approach, as noted by [5], however, is the implicit approximation of the manifold
of SDMs as a linear vector space. Consequently, samples or compositions of samples from the inferred statistical models are in general not valid SDMs. This detail is usually dealt with by further approximation, necessitating an additional computational step that projects samples from the distribution over PCA coefficients back onto the manifold of shapes. Additionally, like many shape models [6], SDMs do not capture space conditioned probabilities (SCPs). SCPs are often used in voxel-based segmentation (such as [7,8,9,10]) to describe the anatomical variations within a population. Here we suggest an alternative representation, the log-odds map (LogOdds), that explicitly addresses the issues above. LogOdds maps form a vector space and as such can be composed via linear methods while simultaneously maintaining the intrinsic properties of shapes. This allows for computationally efficient approaches to modeling and inference without the need for projecting the representation back onto the manifold of shapes. There is a straightforward mapping of the LogOdds representation to SCPs, as each entry in the LogOdds representation is the logarithm of the odds of an anatomical label versus the alternative. The specific parameterization of LogOdds maps presented here provides a notion of certainty about the boundary of the inferred shape. Thus, LogOdds not only describe the shape of a single structure but also capture some aspects of variations within a structure across populations and expert segmentations. SDMs can be used directly as LogOdds maps; consequently, no additional computational burden is incurred by their use. For example, entries with value zero correspond to the boundary of the structure under either interpretation. The computational burden of mathematical operations over SCPs is reduced by performing them in the vector space of LogOdds. Arsigny et al. [11] have made the same observation with respect to using the logarithm on tensor data, although the logarithms of tensor data are not comparable to LogOdds. In Section 2, we review and establish some of the relevant properties of LogOdds maps. We also provide a probabilistic interpretation for the mathematical operations of the vector space. In Section 3, we demonstrate the utility of LogOdds in two experiments. In the first experiment, a Bayesian classifier [12] segments 20 sets of MR images into subcortical structures. Each subject is automatically segmented using the deformable atlas based on both the SDM and the LogOdds representation, and the results are compared. The second experiment is based on a longitudinal schizophrenia study in which an atlas represents the aging process of that population. The atlas is comprised of three SCPs representing different time points in the development of the population. We generate a non-convex interpolation between the three time points within the space of LogOdds, with superior results to a convex interpolation within the original space of SCPs.
2 LogOdds and Its Properties

In this section we review some properties of LogOdds representations. In so doing, we focus on the use of LogOdds for statistical inference over shapes. Specifically, we establish mappings between various representations and the means by which one performs probabilistic computations on LogOdds representations.
Fig. 1. The binary map of the circle in (a) corresponds to the SDM D in (b). In D, positive values (bright) define the circle, while negative values (dark) represent the background. Panel (c) shows the corresponding SCP. Voxels inside the circle have probabilities greater than 0.5.
2.1 An Introduction to LogOdds

The LogOdds of a probability p ∈ P ≜ {p | p ∈ [0, 1], i.e. p is a probability} is the logarithm of the odds between the probability p and its complement p̄ = 1 − p:

logit(p) ≜ log(p / (1 − p)) = log p − log(1 − p).

In the classification problem of medical images, for example, p is the probability that a voxel is assigned to a particular anatomical structure, while the complement p̄ is the probability of assignment to any other structure. The function logit(p) is simply the logarithm of their ratio. The LogOdds space is defined as L ≜ {logit(p) | p ∈ P}. Consequently, L = R and Lⁿ defines a vector space through its equivalence to Rⁿ. The inverse of the log odds function logit(·) is the standard logistic function
P(t) ≜ 1 / (1 + e^(−t)).
P(·) maps each element t ∈ L to a unique probability p ∈ P; thus, the function logit(·) and its inverse comprise a homomorphism between P and L. Again, the equivalence of Lⁿ to Pⁿ is straightforward when P(·) is applied element-wise to Lⁿ. SDMs are elements of Lⁿ. The value of the SDM [D]_x at a voxel x is the distance from x to the closest point on the boundary. Figure 1(b) shows an example of an SDM corresponding to the binary map in Figure 1(a). It can be shown that any monotonic transformation V(·) of D is in Lⁿ. An obvious choice is the identity transformation [V(D)]_x ≜ [D]_x. In the LogOdds space, [D]_x is a representation for the probability p_x ≜ P([D]_x) ∈ P that voxel x is assigned to the foreground. The corresponding vector p ≜ (p1, . . . , pn) therefore comprises an SCP of the foreground, where n is the number of voxels in the image. The relationship between SDMs and SCPs defines a meaningful probabilistic model. The foreground is represented by the positive values of the SDM, corresponding to probabilities greater than 0.5 in the SCP (see Figure 1(c)). Conversely, negative values of the SDM define the background, which maps to probabilities less than 0.5.
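Both maps are trivial to implement; a small sketch (ours, for illustration) converting an SDM to an SCP:

```python
import numpy as np

def logit(p):
    """Map probabilities in (0, 1) to the LogOdds space L = R."""
    p = np.asarray(p, dtype=float)
    return np.log(p) - np.log1p(-p)

def logistic(t):
    """Inverse of logit: map LogOdds values back to probabilities."""
    return 1.0 / (1.0 + np.exp(-np.asarray(t, dtype=float)))

# A signed distance map is itself a LogOdds map: positive distances
# (inside the shape) map to probabilities above 0.5, the zero level
# set to exactly 0.5, and negative distances to values below 0.5.
sdm = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
scp = logistic(sdm)  # [0.119, 0.378, 0.5, 0.622, 0.881] (rounded)
```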
2.2 The Relationship Between LogOdds and SCPs

We now make use of the Euclidean space Lⁿ to induce a vector space on Pⁿ. This results in a probabilistic model for addition and scalar multiplication in the LogOdds space Lⁿ. The derivations are performed with respect to L and P for simplicity; they are easily extended to Lⁿ and Pⁿ by performing the operations individually for each vector entry.

Addition in P: Based on the relationship between P and L, the probabilistic addition ⊕ of p1, p2 ∈ P is equivalent to the logistic function of the sum of the LogOdds logit(p1) and logit(p2):
p1 · p2 p1 · p2 + (1 − p1)(1 − p2)
(1)
As P is closed under ⊕, (P, ⊕) forms an Abelian group with 0.5 as the neutral element and (1 − p) the additive inverse of the element p. The complement of p1 ⊕ p2 is the probabilistic addition of the complements of p1 and p2 , that is 1 − p1 ⊕ p2 =
(1 − p1) · (1 − p2) = (1 − p1) ⊕ (1 − p2) p1 · p2 + (1 − p1)(1 − p2)
Using suitably normalized likelihoods, the composition is equivalent to Bayes’ rule. ¯ and the corresponding Let the prior of the random variable X be p1 P(X ) = 1 − P(X) ¯ P(A|X) P(A|X) normalized likelihood be p2 P(A|X)+P(A|X) ¯ = 1 − P(A|X)+P(A|X) ¯ , then p1 ⊕ p 2 =
P(A|X) P (X ) P(A|X)+P(A| ¯ X)
P (X)
P(A|X) ¯ P(A|X)+P(A|X)
¯ P(A|X) ¯ + P(X) ¯ P(A|X)+P(A|X)
=
P(A|X)P (X) = P(X|A) P (A)
Scalar Multiplication in P: The probabilistic scalar multiplication ⊛ within the space of P is equivalent to scalar multiplication, ∗, in the vector space L. Similar to ⊕, the multiplication between a scalar α ∈ R and a probability p ∈ P is the logistic function of the product between α and the LogOdds logit(p):

α ⊛ p ≜ P(α ∗ logit(p)) = 1 / (1 + e^(−α·log(p/(1−p)))) = 1 / (1 + ((1−p)/p)^α) = p^α / (p^α + (1 − p)^α)

(P, ⊕, ⊛) defines a vector space, which is equivalent to (L, +, ∗). The complement of the probabilistic scalar multiplication α ⊛ p is equivalent to the scalar multiplication of the complement, or to negating α:

1 − α ⊛ p = (1 − p)^α / (p^α + (1 − p)^α) = (−α) ⊛ p = α ⊛ (1 − p)

Figure 2(a) shows the influence of α on α ⊛ p. Changing α has the effect of increasing (α > 1) or decreasing (α < 1) the certainty of the boundary of an SCP.
Fig. 2. (a) The effect of α on α ⊛ p. (b) The resulting SCP, SCPres, from combining two SCPs of circles (SCPl and SCPr) with different centers, radii, and α values.
Probabilistic addition and scalar multiplication are illustrated in Figure 2(b) using a synthetic shape description example. The figure depicts level sets of SCPl, SCPr ∈ Pⁿ that describe circles which differ in their means, radii, and boundary certainty (as captured by αl, αr). Figure 2(b) shows how α affects the composition of the maps via the ⊕ operator. In the first row of Figure 2(b), αl = 20 (left circle) and αr = 30 (right circle). The level sets are seen to be more closely spaced for the right circle. Additionally, the composition of the SCPs using ⊕ results in a combined SCP which is biased towards the circle on the right. In the second row of Figure 2(b), the values of αl and αr are reversed, resulting in a bias towards the circle on the left. Despite the circles having different radii, if αl = αr the level sets of the combined SCP are ellipses, although the foci of the ellipses depend on the relative radii. In summary, we have discussed basic properties of the vector space of log-odds Lⁿ, relating them to the vector space of probabilities Pⁿ via the logistic function. This provided a probabilistic interpretation of scalar multiplication and vector addition in Lⁿ. We used LogOdds to describe binary maps of shapes; we note that the representation can be extended to multiple label maps.
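In practice the two operations are one-liners; a sketch mirroring Equation 1 and the scalar product above (our own illustration):

```python
import numpy as np

def prob_add(p1, p2):
    """Probabilistic addition (Eq. 1): logistic(logit(p1) + logit(p2))."""
    num = p1 * p2
    return num / (num + (1 - p1) * (1 - p2))

def prob_scale(alpha, p):
    """Probabilistic scalar multiplication: logistic(alpha * logit(p))."""
    pa, qa = p ** alpha, (1 - p) ** alpha
    return pa / (pa + qa)

# 0.5 is the neutral element and (1 - p) the additive inverse:
assert np.isclose(prob_add(0.7, 0.5), 0.7)
assert np.isclose(prob_add(0.7, 0.3), 0.5)
```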
3 Applications in Medical Imaging

We now apply LogOdds to two relevant applications in medical imaging. The first experiment measures the robustness of a shape-based segmentation algorithm. In the second experiment, we build a temporal interpolation function from the data of a longitudinal study. Both examples are based on MR images acquired by a 1.5-T MRI system (GE Medical System; three-dimensional Fourier transformation spoiled gradient-recalled acquisition; matrix = 256×256×124; voxel dimension = 0.9375×0.9375×1.5 mm).

3.1 The Accuracy of a Segmenter with Respect to the Deformable Shape Atlas

We investigate the impact of our shape representation on the accuracy of a voxel-based segmenter [12]. The algorithm is guided by a deformable shape atlas that represents shape variation by PCA on SDMs. We analyze two implementations of the algorithm, which differ in the interpretation of the maps D generated by the PCA atlas.
Fig. 3. Front and side views of a 3D model of the thalamus (gray), the caudate (black), and the ventricles (light gray). The model is based on a segmentation generated by IMPL-P. The graph to the right (Test Results) summarizes the results of our experiment. For both structures, IMPL-P performs much better than IMPL-H.
The first implementation (IMPL-H) views D as level set functions in a higher-dimensional space. Each entry [D]_{a,x} of the SDM assigns the voxel x to the inside or outside of a structure a using the Heaviside function H(·), where H(v) is one if v ≥ 0 and zero otherwise. A natural definition of the SCP is

$$[SCP_H]_{a,x} \triangleq \frac{H([D]_{a,x})}{\sum_{a'} H([D]_{a',x})},$$

where $\sum_{a'} H([D]_{a',x})$ is the sum over all structures. In general, [SCP_H]_a corresponds to an SCP with a steep slope along the structure's boundary. Thus, the algorithm segments the structure according to [SCP_H]_a, which only allows shapes within the PCA subspace.

The second implementation (IMPL-P) interprets the maps D as a vector of LogOdds Ln. According to Section 2, the logistic function P([D]_{a,x}) defines the probability that voxel x is assigned to structure a. This approach is extended to multiple structures by defining the SCP as

$$[SCP_P]_{a,x} \triangleq \frac{P([D]_{a,x})}{\sum_{a'} P([D]_{a',x})}.$$

[SCP_P]_a is characterized by a gradual slope along the boundary of the structure. This provides the algorithm with some flexibility in determining the course of the contour: it allows the image evidence to nudge the resulting shape somewhat away from the PCA subspace.

Both variants were used to segment 20 test cases into the ventricles (light gray), the thalamus (gray), and the caudate (black) (see Figure 3). Afterward, we compared the automatic segmentations of the thalamus and the caudate to the manual segmentations via the Dice metric [13]. We focus on these two structures as they are characterized by different types of shapes (see Figure 3) and are partly invisible on MR images; thus, the accuracy of the implementations is highly impacted by the shape model. The graph in Figure 3 shows the result of the experiment for IMPL-H (thalamus: 85.3 ± 1.2%, caudate: 74.3 ± 1.6%; mean Dice score ± standard error) and IMPL-P (thalamus: 88.4 ± 1.0%, caudate: 84.9 ± 0.8%). For both structures, IMPL-P achieves a much higher average score than IMPL-H. In addition, the segmentation results of IMPL-P are characterized by a low standard error (in comparison to IMPL-H), indicating high consistency of the segmentation results. The differing accuracy of the implementations is mostly due to the different interpretations of the information in the shape atlas. The atlas underrepresents the shape variations within a population due to the limited amount of training data. The probabilistic model of IMPL-P addresses this issue by providing flexibility around the boundary, while IMPL-H is constrained to the limited-dimension PCA subspace.
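For illustration, here is a minimal NumPy sketch of the two SCP normalizations and the Dice comparison described above (the function names and array layout are our assumptions; per the Heaviside definition, D ≥ 0 is taken as inside):

```python
import numpy as np

def scp_heaviside(D):
    """IMPL-H style: [SCP_H]_{a,x} = H([D]_{a,x}) / sum_a' H([D]_{a',x}).
    D has shape (num_structures, ...) and holds signed distance values."""
    H = (D >= 0).astype(float)
    return H / np.maximum(H.sum(axis=0, keepdims=True), 1e-12)

def scp_logistic(D):
    """IMPL-P style: [SCP_P]_{a,x} = P([D]_{a,x}) / sum_a' P([D]_{a',x}),
    with P the logistic function applied to the LogOdds maps."""
    P = 1.0 / (1.0 + np.exp(-D))
    return P / P.sum(axis=0, keepdims=True)

def dice(seg, ref):
    """Dice overlap [13] between two binary masks."""
    seg, ref = seg.astype(bool), ref.astype(bool)
    return 2.0 * np.logical_and(seg, ref).sum() / (seg.sum() + ref.sum())
```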
This shortcoming of the atlas only slightly impacts the segmentation of the thalamus, due to its low shape variation within the healthy population. The bending of the horn-shaped caudate, however, can vary substantially among subjects. This may explain the much lower score of IMPL-H in comparison to IMPL-P for this structure. We note that the input for the PCA was restricted to SDMs, a particular choice of LogOdds function. The impact of different LogOdds classes on the segmentations would be interesting but is outside the scope of this paper.

3.2 Defining an Interpolation Function from Time-Related Data Samples

Neuroscientists carry out longitudinal studies to better understand the aging process of a population. These studies are often defined by a set of subjects who have been scanned repeatedly at specific time points. The variations of the population at a specific point in time can be captured via SCPs. In this section, we explore interpolation between the SCPs within the space of LogOdds, which results in continuous-time atlases. This experiment is based on a longitudinal data set consisting of eight schizophrenic patients. Each patient was scanned three times, with an average separation of 14 months between the first and second scans and 23 months between the second and third. We generate the SCPs of each time point by first aligning the 24 MR images towards their central tendency using the population registration method of Zöllei et al. [14]. Afterward, we segment each aligned MR image into three brain tissue classes using the method of Pohl et al. [15]. We then compute the SCP of each anatomical structure at each time point based on the eight corresponding segmentations. We can interpolate among atlases in either SCP or LogOdds space; however, for the interpolated SCPs to remain valid we are limited to convex combinations (CCs), while the broader class of linear combinations is available in the LogOdds vector space. In the following we compare two temporal interpolation methods that are suited to the analysis of a small number of time points. One uses linear CCs of SCPs, while the other uses a quadratic spline interpolation (which is not a CC) of LogOdds: it maps the SCPs into LogOdds space via the logit function, applies the quadratic spline there, and transfers the interpolated function back into the space of SCPs via the logistic function.

The first row of Figure 4 shows sample slices of the quadratic spline interpolation of the SCPs of gray matter; bright indicates high and dark low probability of gray matter. An area of particular interest is the region around the thalamus (in the center of the image), as the corresponding SCP changes substantially throughout the scans. The second row of Figure 4 shows a magnified version of the SCP of the thalamus region (a), and the corresponding CC (b)+(d) and quadratic spline interpolations (c)+(e). In the graphs, the z-axis symbolizes the probability, the x-axis is the time axis, and the y-axis represents the row of voxels highlighted by the black line in Figure 4(a). The graph of Figure 4(b) is the CC of the SCPs from the three time points. The smoother interpolation in Figure 4(c) is generated from quadratic splines within the LogOdds space; unlike (b), this interpolation is differentiable over time. In order to compare the accuracy of the interpolations, we repeat the experiment but now compute both functions on the basis of the first and third scans only. Samples of the results are shown in the graphs of Figure 4(d)+(e).
We then calculate the sum of squared errors of the two interpolations by comparing them to the SCP of the second scan.
Fig. 4. The first row shows a sample slice of the quadratic spline interpolation of a longitudinal schizophrenia study. Each image represents an SCP of the gray matter at a specific point in time of the study. Bright indicates high and dark low probability of the gray matter. The second row shows the SCP of the thalamus (a), with black indicating the voxels that are interpolated over three time points in (b)+(c) and two time points in (d)+(e). Graphs (b)+(d) were produced by convex interpolation, while the smoother quadratic spline interpolation is shown in (c)+(e).
The quadratic spline interpolation achieves a 2% lower error for the gray matter and a 5% lower error for the white matter in comparison to the CC. The quadratic spline interpolation may achieve this higher accuracy because it is smoother. In this section, we applied our representation to the automatic segmentation of MR images as well as to the interpolation of longitudinal data sets. We note that LogOdds maps can be used for a variety of other problems in medical imaging. For example, we have found the representation very helpful for capturing the variations between expert segmentations, though we do not discuss this topic here due to space limitations.
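As an illustration of the interpolation scheme compared above, the following is a minimal sketch (our code, using SciPy's make_interp_spline as the spline routine; the clipping epsilon is an implementation detail not specified in the paper):

```python
import numpy as np
from scipy.interpolate import make_interp_spline

def interpolate_scp(scps, times, query_times, eps=1e-6):
    """Interpolate a temporal series of SCPs within LogOdds space:
    logit -> quadratic spline over time -> logistic.
    scps: array of shape (T, ...) holding voxel-wise probabilities
    at T time points."""
    p = np.clip(scps, eps, 1.0 - eps)                # keep log odds finite
    logodds = np.log(p / (1.0 - p))                  # map SCPs into LogOdds space
    spline = make_interp_spline(times, logodds, k=2, axis=0)  # quadratic spline
    return 1.0 / (1.0 + np.exp(-spline(query_times)))         # back to SCPs
```

With three scans per subject (T = 3), a quadratic spline (k = 2) is the highest order the data supports, matching the splines used in this experiment.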
4 Conclusion

We presented a new shape representation called LogOdds. The representation can encode a single shape as well as its variations. A generalization of distance maps, this representation defines a vector space with a probabilistic interpretation. We evaluated LogOdds within a voxel-based segmentation algorithm: 20 subjects were automatically segmented into subcortical structures using both the LogOdds and the signed distance map model. The algorithm performs best with our new shape representation, as LogOdds better capture the boundary constraints of anatomical structures. In a second experiment, we used quadratic splines to interpolate SCPs from a longitudinal schizophrenia study. The splines are computed within the LogOdds space to simplify the calculations. We show that this model achieves a higher accuracy than the convex combination of SCPs in the original space.

Acknowledgments. This study was supported by the Department of Veterans Affairs Merit Awards, and grants from the NIH (R01 MH 40799, K05 MH 70047, R01 MH
50747, NIBIB NAMIC U54 EB005149, NCRR mBIRN U24-RR021382, NCRR NAC P41-RR13218, NINDS R01-NS051826) and NSF (JHU ERC CISST). We thank Mark Dreusicke from the Psychiatry Neuroimaging Laboratory, Harvard Medical School, for organizing the longitudinal study data and Torsten Rohlfing for his helpful comments.
References

1. M. Leventon, W. Grimson, and O. Faugeras, "Statistical shape influence in geodesic active contours," in CVPR, pp. 1316–1323, 2000.
2. A. Tsai, A. Yezzi, W. Wells, C. Tempany, D. Tucker, A. Fan, W. Grimson, and A. Willsky, "A shape-based approach to the segmentation of medical imagery using level sets," TMI, vol. 22, no. 2, pp. 137–154, 2003.
3. J. Yang, L. H. Staib, and J. S. Duncan, "Neighbor-constrained segmentation with level set based 3D deformable models," TMI, vol. 23, no. 8, pp. 940–948, 2004.
4. K. Pohl, J. Fisher, J. Levitt, M. Shenton, R. Kikinis, W. Grimson, and W. Wells, "A unifying approach to registration, segmentation, and intensity correction," in MICCAI, LNCS, pp. 310–318, 2005.
5. P. Golland, W. Grimson, M. Shenton, and R. Kikinis, "Detection and analysis of statistical differences in anatomical shape," MIA, vol. 9, pp. 69–86, 2005.
6. S. Bouix, Medial Surfaces. PhD thesis, McGill University, 2003.
7. D. Collins, A. Zijdenbos, W. Barre, and A. Evans, "ANIMAL+INSECT: Improved cortical structure segmentation," IPMI, vol. 1613, 1999.
8. K. Van Leemput, F. Maes, D. Vandermeulen, and P. Suetens, "Automated model-based tissue classification of MR images of the brain," TMI, vol. 18, no. 10, pp. 897–908, 1999.
9. B. Fischl, A. van der Kouwe, C. Destrieux, E. Halgren, F. Ségonne, D. Salat, E. Busa, L. Seidman, J. Goldstein, D. Kennedy, V. Caviness, N. Makris, B. Rosen, and A. Dale, "Automatically parcellating the human cerebral cortex," Cerebral Cortex, vol. 14, pp. 11–22, 2004.
10. K. M. Pohl, J. Fisher, W. Grimson, R. Kikinis, and W. Wells, "A Bayesian model for joint segmentation and registration," NeuroImage, vol. 31, no. 1, pp. 228–239, 2006.
11. V. Arsigny, P. Fillard, X. Pennec, and N. Ayache, "Geometric means in a novel vector space structure on symmetric positive-definite matrices," SIAM JMAA, 2006.
12. K. Pohl, J. Fisher, R. Kikinis, W. Grimson, and W. Wells, "Shape based segmentation of anatomical structures in magnetic resonance images," in ICCV, vol. 3765 of LNCS, pp. 489–498, Springer-Verlag, 2005.
13. L. Dice, "Measures of the amount of ecologic association between species," Ecology, vol. 26, no. 3, pp. 297–302, 1945.
14. L. Zöllei, E. Learned-Miller, W. Grimson, and W. Wells, "Efficient population registration of 3D data," in ICCV, vol. 3765 of LNCS, pp. 291–301, 2005.
15. K. Pohl, S. Bouix, R. Kikinis, and W. Grimson, "Anatomical guided segmentation with nonstationary tissue class distributions in an expectation-maximization framework," in ISBI, (Arlington, VA, USA), pp. 81–84, 2004.
Multi-modal Image Registration Using the Generalized Survival Exponential Entropy

Shu Liao and Albert C.S. Chung

Lo Kwee-Seong Medical Image Analysis Laboratory, Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong
[email protected],
[email protected]
Abstract. This paper introduces a new similarity measure for multi-modal image registration tasks. The measure is based on the generalized survival exponential entropy (GSEE) and mutual information (GSEE-MI). Since GSEE is estimated from the cumulative distribution function instead of the density function, it is observed that interpolation artifacts are reduced. The method has been tested on four real MR-CT data sets. The experimental results show that the GSEE-MI-based method is more robust than the conventional MI-based method. The accuracy is comparable for both methods.
1 Introduction
Image registration has been an important issue in medical image analysis. Given multiple registered images, the complementary and useful image information obtained from different modalities can be combined. The goal of image registration is to estimate a geometric transformation such that two medical images can be aligned accurately. Mutual information (MI) has been used widely as a similarity measure for multi-modal image registration tasks [1,2,3]. In this paper, we introduce the use of the generalized survival exponential entropy (GSEE) for the estimation of mutual information (GSEE-MI) in image registration, and compare the performance of MI and GSEE-MI in terms of robustness and accuracy. The GSEE-MI-based image registration method utilizes the generalized survival exponential entropy, which is estimated from the cumulative distribution function instead of the density function. In some real-world applications, the cumulative distribution function is more natural than the density function because it is defined in integral form, which can give more reliable estimation (i.e., it always exists even when the density function does not exist [4]). The GSEE-MI-based method has been tested on four MR-CT data sets obtained from the Retrospective Image Registration Evaluation project. The performance of GSEE-MI has been examined along the translation (X-, Y- and Z-axis) and rotation axes. It is observed that the values of GSEE-MI decrease smoothly away from the optimal transformation. Moreover, because of the use of the cumulative distribution function, the interpolation artifact is reduced in the GSEE-MI-based method. These factors can improve the robustness of an image registration
method. It is experimentally shown that the GSEE-MI-based method is more robust than the conventional MI-based method. The accuracy of both methods is comparable.
2 Methodology

2.1 Joint Intensity Distribution Acquisition
Let Gf and Gr be the floating and reference images respectively, which are obtained from the same or different acquisitions. Assume that If and Ir represent their intensity values, and Xf and Xr their image domains. With a hypothesized transformation T, samples of intensity pairs I = {ir(x), if(T(x)) | ir ∈ Ir, if ∈ If} can be drawn randomly from If and Ir, where x are the pixel coordinates, x ∈ Ω and Ω ⊂ Xf ∪ Xr; that is, the sampling domain lies inside Xf ∪ Xr. The joint intensity distribution is denoted P^T(Ir, If). It depends on the specific hypothesized transformation T and changes during the registration process. There are two methods to approximate P^T(Ir, If): Parzen windowing and histogramming [5]. This paper employs histogramming as it is more computationally efficient; the bin size of the histogram is set to 32. The histogram partial volume (PV) interpolation method is employed [6].
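A minimal sketch of the histogramming step (our helper, assuming the floating image has already been resampled under the hypothesized transformation T; the paper distributes each sample over neighbouring bins with PV interpolation [6] rather than the nearest-neighbour binning shown here):

```python
import numpy as np

def joint_distribution(ref, flt, bins=32):
    """Histogram estimate of the joint intensity distribution P^T(I_r, I_f)
    from paired intensity samples of the reference and floating images."""
    h, _, _ = np.histogram2d(ref.ravel(), flt.ravel(), bins=bins)
    return h / h.sum()
```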
2.2 Generalized Survival Exponential Entropy: A New Measure of Information
In this section, the survival exponential entropy (SEE) and the generalized survival exponential entropy (GSEE) [7] are introduced. GSEE is a new information measure for random variables. Their definitions are given as follows.

Definition 1: For a random vector X in R^m, the survival exponential entropy of order α is [7]:

$$M_\alpha(X) = \left(\int_{\mathbb{R}^m_+} \bar{F}^{\alpha}_{|X|}(x)\,dx\right)^{\frac{1}{1-\alpha}} \quad (1)$$

for α ≥ 0, where m is the number of dimensions of X.

Definition 2: For a random vector X in R^m, the generalized survival exponential entropy of order (α, β) is [7]:

$$S_{\alpha,\beta}(X) = \left(\frac{\int_{\mathbb{R}^m_+} \bar{F}^{\alpha}_{|X|}(x)\,dx}{\int_{\mathbb{R}^m_+} \bar{F}^{\beta}_{|X|}(x)\,dx}\right)^{\frac{1}{\beta-\alpha}} \quad (2)$$

for α, β ≥ 0 and α ≠ β, where X = (X_1, ..., X_m) is a random vector in R^m, and |X| denotes the random vector with components |X_1|, ..., |X_m|. The notation
|X| > x means that |X_i| > x_i for x_i ≥ 0, i = 1, ..., m. The multivariate survival function $\bar{F}_{|X|}(x)$ of the random vector |X| is defined by:

$$\bar{F}_{|X|}(x) = P(|X| > x) = P(|X_1| > x_1, \ldots, |X_m| > x_m) \quad (3)$$

for $x \in \mathbb{R}^m_+$ with $\mathbb{R}^m_+ = \{x \in \mathbb{R}^m : x = (x_1, \ldots, x_m),\ x_i \geq 0,\ i = 1, \ldots, m\}$.

Compared to the conventional information measure, the Shannon entropy H(X), SEE and GSEE replace the density function in Shannon's definition [8] with the cumulative distribution. The distribution function is more regular than the density function in that it is defined in integral form; the density function is its derivative. The extension of the Shannon entropy to continuous distributions is called the differential entropy [4]. SEE and GSEE have several advantages over the Shannon entropy and the differential entropy: (1) SEE and GSEE have consistent definitions in both the continuous and discrete domains; (2) SEE and GSEE are always nonnegative; (3) SEE and GSEE are easy to compute from sample data; (4) the Shannon entropy is based on the density p(X) of the random variable which, as pointed out in [4,9], may not exist in general; even when it exists, it needs to be estimated, and the estimate converges to the true density only under certain conditions [4].

For applications in image registration, the SEE-based mutual information (SEE-MI) and the GSEE-based mutual information (GSEE-MI) are introduced. The mutual information (MI) similarity measure of images Gr and Gf, I(Gr, T(Gf)), using the Shannon entropy is defined as [6]:

$$I(G_r, T(G_f)) = H(p(G_r)) - E\left[H\big(p(G_r)/p(T(G_f))\big)\right] \quad (4)$$
where p(Gr) and p(T(Gf)) are the marginal distributions of the joint probability histogram of the reference image and the floating image, respectively. The conventional MI measure suffers from the drawbacks of the Shannon entropy mentioned above. To overcome these shortcomings, SEE-MI and GSEE-MI are defined as follows:

$$\text{SEE-MI}(G_r, T(G_f)) = M_\alpha(p(G_r)) - E\left[M_\alpha\big(p(T(G_f))/p(G_r)\big)\right] \quad (5)$$

for α ≥ 0, and

$$\text{GSEE-MI}(G_r, T(G_f)) = S_{\alpha,\beta}(p(G_r)) - E\left[S_{\alpha,\beta}\big(p(T(G_f))/p(G_r)\big)\right] \quad (6)$$
for α, β ≥ 0 and α ≠ β, where M_α(p(T(G_f))/p(G_r)) and S_{α,β}(p(T(G_f))/p(G_r)) are the conditional SEE and GSEE, defined as [7]:

$$M_\alpha(X \mid Y) = \left(\int_0^{\infty} \bar{F}^{\alpha}_{X|Y}(x \mid y)\,dx\right)^{\frac{1}{1-\alpha}} \quad (7)$$

for α ≥ 0, and

$$S_{\alpha,\beta}(X \mid Y) = \left(\frac{\int_0^{\infty} \bar{F}^{\alpha}_{X|Y}(x \mid y)\,dx}{\int_0^{\infty} \bar{F}^{\beta}_{X|Y}(x \mid y)\,dx}\right)^{\frac{1}{\beta-\alpha}} \quad (8)$$

for α, β ≥ 0 and α ≠ β.

2.3 Optimization of SEE-MI and GSEE-MI
Given the definitions of SEE-MI and GSEE-MI in Equations 5 and 6, the goal of the registration is to find the optimal transformation T̂ by maximizing the SEE-MI or GSEE-MI value, which is formulated as:

$$\hat{T} = \arg\max_{T} \text{SEE-MI}(G_r, T(G_f)) \quad (9)$$

for α ≥ 0, and

$$\hat{T} = \arg\max_{T} \text{GSEE-MI}(G_r, T(G_f)) \quad (10)$$

for α, β ≥ 0 and α ≠ β. The proposed method is an unsupervised registration method which does not need any pre-aligned training pairs in advance. In this paper, only GSEE-MI is used in the experiments, as it is more general than SEE-MI and more flexible, since it has two parameters, α and β, for the maximization of mutual information. The value of GSEE-MI is maximized using Powell's method [10], which searches for the maximum of GSEE-MI iteratively along each parameter axis of T while the others are kept constant. The search step ∂T is set to 0.02 mm for the translation parameters along the X, Y and Z axes and 0.2 degrees for the rotation parameters about the X, Y and Z axes. During each iteration of Powell's method, a grid search over the range from $2^{-5}$ to $2^{5}$ with multiplicative step $2^{1/4}$ is performed to find the optimal pair of α and β that maximizes the value of GSEE-MI. The iterative search stops when the value of GSEE-MI has converged (i.e., when the change in GSEE-MI is sufficiently small; the threshold is set to 0.001 in this paper).
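A minimal sketch (ours, not the authors' implementation) of a discrete GSEE estimate together with the (α, β) grid described above; the one-dimensional histogram form with unit bin width is an assumption for illustration:

```python
import numpy as np

def gsee(p, alpha, beta):
    """Discrete sketch of the GSEE of Eq. (2) for a 1-D probability
    histogram p: the survival function is estimated from the cumulative
    sum, and the integrals become sums over bins (unit bin width)."""
    assert alpha != beta and alpha >= 0 and beta >= 0
    F_bar = np.clip(1.0 - np.cumsum(p), 0.0, 1.0)   # survival function P(X > x)
    num = np.sum(F_bar ** alpha)
    den = np.sum(F_bar ** beta)
    return (num / den) ** (1.0 / (beta - alpha))

# The (alpha, beta) values searched in each Powell iteration:
# 2^-5 .. 2^5 with multiplicative step 2^(1/4).
grid = 2.0 ** np.arange(-5.0, 5.0 + 0.25, 0.25)
```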
3 Experimental Results

3.1 Probing Performance of MR-CT (3D-3D) Registration
Four pairs of CT and MR image volumes are obtained from the Retrospective Image Registration Evaluation project.¹ One of the pairs of 2D CT and MR image slices is shown in Figure 1. We chose the CT images as the reference images Gr and the MR images as the floating images Gf.

¹ The images were provided as part of the project "Retrospective Image Registration Evaluation", National Institutes of Health, Project Number 8R01EB002124-03, Principal Investigator J. Michael Fitzpatrick, Vanderbilt University, Nashville, TN.

In order to study the performance of the objective function,
Fig. 1. One pair of 2D CT and MR image slices

Fig. 2. MI: shifting probes along (a) X-axis, (b) Y-axis and (c) Z-axis; GSEE-MI: shifting probes along (d) X-axis, (e) Y-axis and (f) Z-axis. (Each panel plots MI or GSEE-MI values against offset values in mm.)
Gf was shifted along and rotated about the X, Y and Z axes while Gr was fixed. For a specific transformation T, if a pixel xi of Gf fell between the voxel positions of Gr, then PV interpolation [6] was applied to achieve subvoxel accuracy. Figure 2 plots the translational probes of shifting along the three axes for MI and GSEE-MI respectively; Figure 3 plots the corresponding rotational probes. As shown in Figure 2, for the conventional MI measure there are obvious local maxima when the misalignment of the two images is relatively large (i.e., when the floating image is shifted from -100 to -200 mm or 100 to 200 mm along the X, Y and Z axes). The main reason is that the estimate of the density function underlying the Shannon entropy used in conventional MI does not converge to the true density in such cases. However, from Figure 2 it is observed that the probing curves based on GSEE-MI are improved: they do not have local maxima even when the two
Fig. 3. MI: rotational probes about (a) X-axis, (b) Y-axis and (c) Z-axis; GSEE-MI: rotational probes about (d) X-axis, (e) Y-axis and (f) Z-axis. (Each panel plots MI or GSEE-MI values against offset values in degrees.)
images have a large misalignment, because the density function is replaced by the distribution function. This strongly suggests the robustness of the GSEE-MI approach. From the rotational probing curves (see Figure 3) we can see that, although the MI curves do not have any local maxima, the GSEE-MI probing curves are smoother than the MI curves. Therefore, in this case as well, GSEE-MI performs more stably than the conventional MI measure. To further demonstrate the robustness of the conventional MI and GSEE-MI similarity measures against PV interpolation artifacts, the probing curves over the shifting range [-2,2] mm for both methods are plotted in Figure 4; the smoother GSEE-MI curve implies that GSEE-MI is also more robust against the interpolation artifacts.

3.2 Registration Performance of MR-CT (3D-3D) Registration
To study and compare the registration robustness of the proposed GSEE-MI-based and the conventional MI-based image registration methods, a series of randomized experiments was designed to evaluate both methods. The testing images are the aforementioned four MR-CT pair datasets (#1, #2, #3 and #4). Each pre-obtained ground-truth registration MR-CT pair² was perturbed by six uniformly distributed random offsets, one for each translational and rotational axis.

² In the experiments, all registered pairs were examined by the RIRE project. The median registration errors were all less than 1 mm.
Fig. 4. (a) MI translational probes along the X-axis in the range [-2,2] mm; (b) GSEE-MI translational probes along the X-axis in the range [-2,2] mm
Then the perturbed registration image pair was used as the starting alignment. For each MR-CT pair, the experiment was repeated 100 times. The translational random offsets for the X and Y axes were drawn from [-150,150] mm, and the random offset for the Z axis from [-70,70] mm. The rotational random offsets for the X, Y and Z axes were drawn from [-20,20] degrees. For each MR-CT pair, the same set of randomized starting alignments was used for both methods for a fair comparison. In the experiment, the translation errors (the root-sum-square of the differences along the three translational axes) and the rotational errors (derived from the real part of the error quaternion) were computed. The threshold vector for assessing registration success was set to (2 mm, 2°), as registration errors below 2 mm and 2° are generally acceptable to experienced clinicians [11,12]. The success rates of the conventional MI approach and the proposed GSEE-MI approach are listed in Table 1. As can be seen, the GSEE-MI approach has a higher success rate than the conventional MI approach on each MR-CT pair dataset.

Table 1. Success rates of the conventional MI-based approach and the proposed GSEE-MI-based approach

Testing dataset    MI success rate    GSEE-MI success rate
#1                 64%                95%
#2                 63%                93%
#3                 72%                97%
#4                 65%                89%
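As a concrete reading of the success criterion above, here is a minimal sketch (ours; in particular, converting the quaternion real part q_real = cos(θ/2) into an angle is our interpretation, since the paper reports the real part itself):

```python
import numpy as np

def registration_success(delta_t, q_real, t_tol=2.0, r_tol=2.0):
    """Translation error: root-sum-square of the per-axis differences (mm).
    Rotation error: angle (degrees) recovered from the real part of the
    error quaternion. Success requires both errors below (2 mm, 2 deg)."""
    trans_err = np.sqrt(np.sum(np.asarray(delta_t, dtype=float) ** 2))
    rot_err = np.degrees(2.0 * np.arccos(np.clip(q_real, -1.0, 1.0)))
    return (trans_err < t_tol) and (rot_err < r_tol)
```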
To demonstrate the registration accuracy of the proposed method more precisely, the mean value and standard deviation of the registration errors over the successful registrations in all four MR-CT pairs were calculated for both the MI and GSEE-MI methods and are listed in Table 2. According to Table 2, the accuracies of the conventional MI approach and the proposed GSEE-MI approach for the successful registrations are comparable and acceptably high.
Table 2. Registration accuracy of the conventional MI approach and the proposed GSEE-MI approach

Method     Translation (10⁻³ mm)                    Rotation (10⁻³ degrees)
           Δtx          Δty          Δtz            Δθx          Δθy          Δθz
MI         0.64±1.84    -0.37±0.73   1.16±1.32      0.85±1.84    -1.05±1.47   0.86±1.63
GSEE-MI    0.85±1.63    -0.62±0.34   1.05±1.57      1.32±0.62    -0.83±1.58   0.52±1.15
However, we should bear in mind that the number of successful registrations of the GSEE-MI-based approach is significantly higher than that of the conventional MI-based method, as shown in Table 1.
4 Conclusion
In this paper, we have proposed a new multi-modal image registration method using the generalized survival exponential entropy and mutual information (GSEE-MI). The experimental results show that the GSEE-MI-based method reduces interpolation artifacts and is more robust than the conventional MI-based method. A future direction is to extend the current method and apply it to non-rigid image registration.
References

1. Wells, W., Viola, P., et al.: Multi-Modal Volume Registration by Maximization of Mutual Information. MedIA 1(1) (1996) 35–51
2. Maes, F., Collignon, A., et al.: Multimodality Image Registration by Maximization of Mutual Information. TMI 16(2) (1997) 187–198
3. Pluim, J., Maintz, J., Viergever, M.: Mutual-information-based registration of medical images: a survey. TMI 22(8) (2003) 986–1004
4. Cover, T.M., Thomas, J.A.: Elements of Information Theory. John Wiley and Sons (1991)
5. Bishop, C.: Neural Networks for Pattern Recognition. Oxford U. Press (1995)
6. Maes, F., Collignon, A., et al.: Multimodality Image Registration by Maximization of Mutual Information. TMI 16(2) (1997) 187–198
7. Zografos, K., Nadarajah, S.: Survival Exponential Entropies. IEEE Trans. on Information Theory 51(3) (2004) 1239–1246
8. Shannon, C.: A mathematical theory of communication. Bell Syst. Tech. J. 27 (1948) 379–432
9. Wang, F., Vemuri, B., Rao, M., Chen, Y.: A New & Robust Information Theoretic Measure and its Application to Image Alignment. In: The 18th IPMI. (2003) 388–400
10. Press, W., Teukolsky, S., et al.: Numerical Recipes in C. Cambridge University Press (1992)
11. Hajnal, J.V., Hill, D.L.G., Hawkes, D.J.: Medical Image Registration. CRC Press LLC (2001)
12. Zhu, Y., Cochoff, S.: Likelihood Maximization Approach to Image Registration. TIP 11 (2002) 1417–1426